Re: My First scrapper not working

Gaurang shah Mon, 23 Dec 2013 05:16:46 -0800

Thanks, Umair,

It worked; however, I am facing another problem.


When I use the XPath //h3/a/@href in Firefox, it gives me the URL of the search
result. However, when I use the scraper, it gives me a URL with some other
values that I am not interested in.

I would really appreciate it if you could tell me why I am getting these
other values in the URL and how to get rid of them.

Using XPath in Firefox:
http://jquery.com

Using the scraper:
/url?q=https://jquery.org/&sa=U&ei=9jW4UoW3FoGMrQfIsYD4Bg&ved=0CEYQFjAI&usg=AFQjCNF682LcCp3OrQvjAsqaCAJkke9gfQ
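
For the question above, here is a minimal sketch. It assumes that the page Scrapy receives wraps each result link in a redirect of the form /url?q=<target>&..., as in the example link. If so, the real destination is the value of the q query parameter, and the other parameters (sa, ei, ved, usg) can be dropped. This is inferred from the link shown, not from documented Google behaviour. The helper name unwrap_google_redirect is hypothetical, and only Python's standard urllib.parse module is used.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_google_redirect(href):
    """Return the target of a '/url?q=<target>&...' redirect link.

    Assumption: the real destination is stored in the 'q' query parameter.
    Any href that is not of that form is returned unchanged.
    """
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs maps each parameter name to a list of decoded values
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]
    return href


href = ("/url?q=https://jquery.org/&sa=U&ei=9jW4UoW3FoGMrQfIsYD4Bg"
        "&ved=0CEYQFjAI&usg=AFQjCNF682LcCp3OrQvjAsqaCAJkke9gfQ")
print(unwrap_google_redirect(href))                # https://jquery.org/
print(unwrap_google_redirect("http://jquery.com"))  # http://jquery.com
```

Inside the spider, the helper could be applied to each extracted value, for example [unwrap_google_redirect(h) for h in hxs.select("//h3/a/@href").extract()].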


GauranG Shah
On Monday, 23 December 2013 15:12:42 UTC+5:30, [email protected] wrote:
>
> Google uses JavaScript to load the results, and because Scrapy itself
> doesn't support JavaScript, you can't see them.
>
> Use this url: https://www.google.com/search?output=search&q=jquery
>
> Umair
>
>
> On Sunday, December 22, 2013 1:11:44 PM UTC+5, Gaurang shah wrote:
>>
>> Hi Guys, 
>>
>> I have just started with Scrapy and am facing a problem. Let me first
>> tell you what I am trying to develop.
>>
>>    - Search for the keyword on Google:
>>      https://www.google.com/?q=keyword/#q=keyword
>>    - Get all the URLs.
>>
>> However, the problem is that when I open the above URL in the browser, I
>> get the results for the keyword, but when I use it in the scraper, it
>> shows me only the Google home page.
>> I would really appreciate it if someone could help me understand what is
>> going on.
>>
>> Here is my code:
>>
>> from scrapy.spider import BaseSpider
>> from scrapy.selector import HtmlXPathSelector
>> from firstScrapper.items import FirstscrapperItem
>>
>> class googleSpider(BaseSpider):
>>     name = "googleSpider"
>>     # should match the domain that is actually being crawled
>>     allowed_domains = ["google.com"]
>>     start_urls = ["https://www.google.com/?q=selenium/#q=selenium"]
>>
>>     def parse(self, response):
>>         hxs = HtmlXPathSelector(response)
>>         links = hxs.select("//a")
>>
>>         items = []
>>         for link in links:
>>             item = FirstscrapperItem()
>>             item["urls"] = link.select("@href").extract()
>>             item["title"] = link.select("text()").extract()
>>             items.append(item)
>>             # print(link.select("text()").extract())
>>             # print(link.select("@href").extract())
>>         return items
>>
>>
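
On the original question: the start URL https://www.google.com/?q=selenium/#q=selenium carries the keyword after the '#'. The fragment part of a URL is handled by the browser and is never sent to the server in the HTTP request. A client without JavaScript, such as Scrapy, therefore only requests /?q=selenium/ and gets back whatever the server renders for that, here the home page. This fits Umair's explanation that the results are filled in by JavaScript. The sketch below uses only Python's standard library. The names server_visible_part and google_search_url are hypothetical helpers, and the /search?output=search&q= pattern is simply the one Umair suggested, not a documented Google API.

```python
from urllib.parse import urlencode, urlsplit


def server_visible_part(url):
    """Return the path and query string an HTTP client sends for this URL.

    The fragment (everything after '#') stays in the client and is dropped here.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def google_search_url(keyword):
    """Hypothetical helper: put the keyword in the query string, not the fragment.

    Follows the /search?output=search&q=... pattern suggested in the thread.
    """
    query = urlencode({"output": "search", "q": keyword})
    return "https://www.google.com/search?" + query


original = "https://www.google.com/?q=selenium/#q=selenium"
print(server_visible_part(original))  # /?q=selenium/
print(google_search_url("jquery"))    # https://www.google.com/search?output=search&q=jquery
```

For example, the spider's start_urls could then be set to [google_search_url("selenium")].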

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/groups/opt_out.
