Thanks Umair, it worked. However, I am facing another problem.
When I use the XPath //h3/a/@href in Firefox, it gives me the URL of the search result. However, when I use the scraper, it gives me a URL with some extra values that I am not interested in. I would really appreciate it if you could let me know why I am getting these extra values in the URL and how to get rid of them.

Using XPath: http://jquery.com
Using scraper: /url?q=https://jquery.org/&sa=U&ei=9jW4UoW3FoGMrQfIsYD4Bg&ved=0CEYQFjAI&usg=AFQjCNF682LcCp3OrQvjAsqaCAJkke9gfQ

GauranG Shah

On Monday, 23 December 2013 15:12:42 UTC+5:30, [email protected] wrote:
>
> Google uses JavaScript to load results, and because Scrapy itself doesn't
> support JavaScript, you can't see the results.
>
> Use this URL: https://www.google.com/search?output=search&q=jquery
>
> Umair
>
>
> On Sunday, December 22, 2013 1:11:44 PM UTC+5, Gaurang shah wrote:
>>
>> Hi Guys,
>>
>> I have just started with Scrapy and am facing a problem. Let me first
>> tell you what I am trying to develop.
>>
>> - Search for the keyword on Google:
>>   https://www.google.com/?q=keyword/#q=keyword
>> - Get all the URLs
>>
>> However, the problem is that when I open the above URL in the browser I
>> get the results for the keyword, but when I use it in the scraper it shows
>> me the Google home page only.
>> I would really appreciate it if someone would help me understand what is
>> going on.
>>
>> Following is my code:
>>
>> from scrapy.spider import BaseSpider
>> from scrapy.selector import HtmlXPathSelector
>> from firstScrapper.items import FirstscrapperItem
>>
>> class googleSpider(BaseSpider):
>>     name = "googleSpider"
>>     allowed_domains = ["google.co.in"]
>>     start_urls = ["https://www.google.com/?q=selenium/#q=selenium"]
>>
>>
>>     def parse(self, response):
>>         hxs = HtmlXPathSelector(response)
>>         links = hxs.select("//a")
>>
>>         items = []
>>         for link in links:
>>             item = FirstscrapperItem()
>>             item["urls"] = link.select("@href").extract()
>>             item["title"] = link.select("text()").extract()
>>             items.append(item)
>>         return items
>>         #print link.select("text()").extract()
>>         # print link.select("@href").extract()
>>
>>

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.
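
Regarding the /url?q=... value reported at the top of this message: it looks like a redirect link, where the real destination is carried in the q query parameter and the remaining parameters (sa, ei, ved, usg) are extra values that can be ignored. A minimal sketch of pulling the destination back out, assuming that link shape; extract_target_url is a hypothetical helper name, not part of Scrapy, and it uses only the Python 3 standard library:

```python
from urllib.parse import urlparse, parse_qs


def extract_target_url(href):
    """Return the destination wrapped in a '/url?q=...' redirect link.

    Hypothetical helper: assumes the destination is stored in the 'q'
    query parameter. Any other link is returned unchanged.
    """
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs maps each parameter name to a list of decoded values.
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    return href


# The href value reported by the scraper in this thread.
raw = ("/url?q=https://jquery.org/&sa=U&ei=9jW4UoW3FoGMrQfIsYD4Bg"
       "&ved=0CEYQFjAI&usg=AFQjCNF682LcCp3OrQvjAsqaCAJkke9gfQ")
print(extract_target_url(raw))  # prints https://jquery.org/
```

In the spider, the same helper could be applied to each value returned by link.select("@href").extract() before storing it in the item.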
