It's the same issue: JavaScript. You have to remove those extra values yourself if you want the plain URL.
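For example, in the link you quoted the real destination sits in the `q` query parameter of Google's `/url` redirect, so the standard library can pull it out. This is a minimal sketch for Python 3's `urllib.parse`, assuming every redirect link has the `/url?q=...` shape shown in your message; `clean_google_href` is just a hypothetical helper name, not part of Scrapy:

```python
from urllib.parse import urlparse, parse_qs


def clean_google_href(href):
    """Return the destination hidden in a Google '/url?q=...' redirect link.

    Assumption: the target URL is carried in the 'q' query parameter, as in
    the href quoted in this thread. Any other href is returned unchanged.
    """
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs splits the query on '&' and percent-decodes each value,
        # so the extra sa/ei/ved/usg parameters are simply ignored.
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]
    return href
```

In your `parse` callback you could run each href extracted with `//h3/a/@href` through this function before storing it in the item.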
On Monday, December 23, 2013 6:16:00 PM UTC+5, Gaurang shah wrote:
>
> Thanks Umair,
>
> It worked, however I am facing another problem.
>
> When I use the XPath //h3/a/@href in Firefox it gives me the URL of the
> search result. However, when I use the scraper it gives me the URL with some
> other values which I am really not interested in.
>
> I would really appreciate it if you would let me know why I am getting these
> other values in the URL and also how to get rid of them.
>
> using XPath:
> http://jquery.com
>
> using scraper:
> /url?q=https://jquery.org/&sa=U&ei=9jW4UoW3FoGMrQfIsYD4Bg&ved=0CEYQFjAI&usg=AFQjCNF682LcCp3OrQvjAsqaCAJkke9gfQ
>
>
> GauranG Shah
>
> On Monday, 23 December 2013 15:12:42 UTC+5:30, [email protected]:
>>
>> Google uses JavaScript to load results, and because Scrapy itself doesn't
>> support JavaScript, you can't see the results.
>>
>> Use this URL: https://www.google.com/search?output=search&q=jquery
>>
>> Umair
>>
>>
>> On Sunday, December 22, 2013 1:11:44 PM UTC+5, Gaurang shah wrote:
>>>
>>> Hi Guys,
>>>
>>> I have just started with Scrapy and am facing a problem. Let me first
>>> tell you what I am trying to develop.
>>>
>>> - search for the keyword on Google:
>>>   https://www.google.com/?q=keyword/#q=keyword
>>> - get all the URLs
>>>
>>> However, the problem is that when I hit the above URL in the browser I am
>>> able to get the results for the keyword, but when I use it in the scraper
>>> it shows me the Google home page only.
>>> I would really appreciate it if someone would help me understand what is
>>> going on.
>>>
>>> Following is my code:
>>>
>>> from scrapy.spider import BaseSpider
>>> from scrapy.selector import HtmlXPathSelector
>>> from firstScrapper.items import FirstscrapperItem
>>>
>>> class googleSpider(BaseSpider):
>>>     name = "googleSpider"
>>>     allowed_domains = ["google.co.in"]
>>>     start_urls = ["https://www.google.com/?q=selenium/#q=selenium"]
>>>
>>>
>>>     def parse(self, response):
>>>         hxs = HtmlXPathSelector(response)
>>>         links = hxs.select("//a")
>>>
>>>         items = []
>>>         for link in links:
>>>             item = FirstscrapperItem()
>>>             item["urls"] = link.select("@href").extract()
>>>             item["title"] = link.select("text()").extract()
>>>             items.append(item)
>>>         return items
>>>         #print link.select("text()").extract()
>>>         # print link.select("@href").extract()

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/groups/opt_out.
