Hi Chetan,

What happens when you only have the URL for page 2 in your start_urls?
That page seems to load fine without javascript, so I'm not convinced you
need any sort of ajax support.
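
If it does, you don't need anything fancy -- just request each page
directly. One thing to watch: everything after the '#' in your URL is a
fragment, and fragments are never sent to the server, so page has to be a
real query parameter. Here's a minimal, untested sketch along those lines
(the spider name is made up, and ItemscountItem is the item class from
your own items.py):

    from scrapy.spider import Spider
    from scrapy.selector import Selector
    # from yourproject.items import ItemscountItem  # adjust to your project

    class SteamIdsSpider(Spider):
        name = 'steam_ids'  # placeholder name
        allowed_domains = ['store.steampowered.com']  # domain only, no scheme
        # page=N goes before any '#' so the server actually sees it
        start_urls = [
            'http://store.steampowered.com/search/'
            '?sort_by=Released_DESC&os=win&page=%d' % n
            for n in range(1, 354)  # pages 1 through 353
        ]

        def parse(self, response):
            sel = Selector(response)
            ids = sel.xpath(".//div[@id='search_result_container']"
                            "//a/@data-ds-appid").extract()
            for appid in ids:
                item = ItemscountItem()
                item['GameID'] = appid
                yield item  # yield each id; returning would stop at the first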

Please provide the output you expect from the script and the actual output
you get -- that will help us work out whether the bug is in your
understanding of Scrapy's internals (something that happens to me a lot!
It can be a confusing piece of software because there is so much going
on...) or somewhere else entirely.
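
An easy way to sanity-check the page=2 URL by hand is the Scrapy shell:

    scrapy shell 'http://store.steampowered.com/search/?sort_by=Released_DESC&os=win&page=2'
    >>> sel.xpath(".//a/@data-ds-appid").extract()[:5]

If those ids differ from the ones on page 1, the server paginates on the
plain query parameter and you can forget about ajax entirely.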

Cheers,
Travis

On Mon, Dec 1, 2014 at 5:07 PM, Chetan Motamarri <[email protected]> wrote:

> Hi All
>
> I need to extract the *ids of games* from
> http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC
>
> I was able to extract the game ids on the first page, but I have no idea
> how to move to the next pages and extract the ids there. My code is:
>
> from scrapy.spider import BaseSpider
> from scrapy.selector import Selector
> # ItemscountItem is defined in the project's items.py
>
> class ScrapePriceSpider(BaseSpider):
>
>     name = 'UpdateGames'
>     allowed_domains = ['store.steampowered.com']
>     start_urls = [
>         'http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=1'
>     ]
>
>     def parse(self, response):
>         hxs = Selector(response)
>
>         path = hxs.xpath(".//div[@id='search_result_container']")
>         item = ItemscountItem()
>
>         for ids in path:
>             # extracting all game ids in the results container
>             gameIds = ids.xpath('.//a/@data-ds-appid').extract()
>             item["GameID"] = str(gameIds)
>             return item
>
> My goal is to extract all the game ids across the *353 pages* listed
> there. I think AJAX is used for pagination; I was not able to extract
> game ids from the 2nd page onwards. I tried putting
> http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=2
> in start_urls, but it was no use.
>
>
> Please help me with this.
>
>
> Thanks
> Chetan Motamarri
