Re: My First scrapper not working

umair Mon, 23 Dec 2013 06:34:12 -0800

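Same issue: JavaScript. In Firefox, Google's JavaScript gives you the clean link, but Scrapy only sees the raw HTML, where each result href is a /url?q=... redirect with extra tracking parameters. You have to strip those values yourself if you don't want them.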
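
One way to strip them, as a minimal sketch: it assumes the hrefs always have the /url?q=<target>&sa=... shape shown in the quoted message below, and that you are on Python 3 (urllib.parse). The helper name clean_google_href is hypothetical, not part of Scrapy.

```python
from urllib.parse import urlparse, parse_qs


def clean_google_href(href):
    """Return the target URL from a Google redirect href.

    Hypothetical helper: hrefs that do not look like /url?q=... are
    returned unchanged.
    """
    parsed = urlparse(href)
    if parsed.path != "/url":
        # Not a redirect link, so there is nothing to strip.
        return href
    # parse_qs maps each query parameter to a list of values.
    targets = parse_qs(parsed.query).get("q")
    return targets[0] if targets else href


raw = ("/url?q=https://jquery.org/&sa=U&ei=9jW4UoW3FoGMrQfIsYD4Bg"
       "&ved=0CEYQFjAI&usg=AFQjCNF682LcCp3OrQvjAsqaCAJkke9gfQ")
print(clean_google_href(raw))
```

You could call something like this on each extracted @href before storing it in the item. Google may change the link format at any time, so treat the /url check as an assumption.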


On Monday, December 23, 2013 6:16:00 PM UTC+5, Gaurang shah wrote:
>
> Thanks Umair, 
>
> It worked; however, I am facing another problem. 
>
> When I use the XPath //h3/a/@href in Firefox it gives me the URL for the 
> search result. However, when I use the scraper it gives me the URL with some 
> other values which I am really not interested in. 
>
> I would really appreciate it if you could let me know why I am getting these 
> other values in the URL and also how to get rid of them. 
>
> Using XPath in Firefox:
> http://jquery.com
>
> Using the scraper:
> /url?q=https://jquery.org/&sa=U&ei=9jW4UoW3FoGMrQfIsYD4Bg&ved=0CEYQFjAI&usg=AFQjCNF682LcCp3OrQvjAsqaCAJkke9gfQ
>
>
> *GauranG Shah*
> On Monday, 23 December 2013 15:12:42 UTC+5:30, [email protected]:
>>
>> Google uses JavaScript to load the results, and because Scrapy itself 
>> doesn't support JavaScript, you can't see them.
>>
>> Use this url: https://www.google.com/search?output=search&q=jquery
>>
>> Umair
>>
>>
>> On Sunday, December 22, 2013 1:11:44 PM UTC+5, Gaurang shah wrote:
>>>
>>> Hi Guys, 
>>>
>>> I have just started with Scrapy and am facing a problem. Let me first 
>>> tell you what I am trying to develop:
>>>
>>>    - search for a keyword on Google:
>>>      https://www.google.com/?q=keyword/#q=keyword
>>>    - get all the URLs
>>>
>>> However, the problem is that when I open the above URL in the browser I 
>>> get the results for the keyword, but when I use it in the scraper it shows 
>>> me the Google home page only. 
>>> I would really appreciate it if someone could help me understand what is 
>>> going on. 
>>>
>>> My code follows:
>>>
>>> from scrapy.spider import BaseSpider
>>> from scrapy.selector import HtmlXPathSelector
>>> from firstScrapper.items import FirstscrapperItem
>>>
>>> class googleSpider(BaseSpider):
>>>     name = "googleSpider"
>>>     allowed_domains = ["google.co.in"]
>>>     start_urls = ["https://www.google.com/?q=selenium/#q=selenium"]
>>>
>>>     def parse(self, response):
>>>         hxs = HtmlXPathSelector(response)
>>>         links = hxs.select("//a")
>>>
>>>         items = []
>>>         for link in links:
>>>             item = FirstscrapperItem()
>>>             item["urls"] = link.select("@href").extract()
>>>             item["title"] = link.select("text()").extract()
>>>             items.append(item)
>>>         return items
>>>         # print link.select("text()").extract()
>>>         # print link.select("@href").extract()
>>>
>>>
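
For reference, the search URL suggested earlier in the thread can be built for any keyword instead of being hard-coded. This is only a sketch: the output=search and q parameters are taken from the URL quoted above, Python 3 is assumed, and the function name google_search_url is hypothetical.

```python
from urllib.parse import urlencode


def google_search_url(keyword):
    """Build the search URL quoted in the thread for a given keyword."""
    # urlencode escapes spaces and special characters in the keyword.
    query = urlencode({"output": "search", "q": keyword})
    return "https://www.google.com/search?" + query


print(google_search_url("jquery"))
```

The result could go into start_urls in place of the https://www.google.com/?q=... form, which only returns the home page to Scrapy.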

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
