Re: Scrapy shell returns empty list!?

DataScience Mon, 16 Mar 2015 09:20:43 -0700

Actually, I've checked "response.body" and it doesn't match the 
content that I see on the webpage in my browser.
I am really confused. What can I do in this case?
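
One way to pin down the mismatch is to check whether the ids and classes used in the XPath expressions appear at all in the raw body that was downloaded. The snippet below is only a rough sketch using the Python standard library; the names `AttrCollector`, `missing_markers` and `sample_body` are made up for illustration, and `sample_body` is a hypothetical stand-in for `response.body`, not the real page.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect every id and class value seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.add(value)
            elif name == "class":
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def missing_markers(body, ids=(), classes=()):
    """Return the expected ids/classes that are absent from the raw HTML."""
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    collector = AttrCollector()
    collector.feed(body)
    collector.close()
    return {
        "ids": sorted(set(ids) - collector.ids),
        "classes": sorted(set(classes) - collector.classes),
    }


# Hypothetical stand-in for response.body: a page served without the job list.
sample_body = b'<html><body><div id="login"><p class="notice">Sign in</p></div></body></html>'
report = missing_markers(sample_body, ids=["results-rail"], classes=["jobs", "title"])
print(report)
```

In the Scrapy shell the same function could be called with `response.body` instead of `sample_body`. If the expected ids and classes come back as missing, the server most likely sent a different page than the one shown in the browser, and `view(response)` in the shell opens the downloaded copy for a visual comparison.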


On Monday, 16 March 2015 17:15:14 UTC+1, Travis Leleu wrote:
>
> It doesn't look to me like it's writing the HTML to the DOM with JavaScript, as 
> you noted.
>
> The big concern I have is that you are assuming the HTML content in your 
> browser is the same as what your code receives.  How have you verified this?
>
> On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]> wrote:
>
>> Thank you, Travis, for your quick feedback.
>>
>> I am testing Scrapy on this specific webpage and trying to get the job 
>> offers (not the profiles).
>> I read in some forums that this can happen when a website uses 
>> JavaScript to build most of the page, so the elements I want do not 
>> appear in the HTML source of the page. I checked by disabling 
>> JavaScript and reloading the page, but the results were still displayed on 
>> the page (I also checked the network traffic in Firebug, filtering by XHR 
>> and looking at the POST requests, and found nothing).
>>
>> Any help would be more than welcome.
>> Thank you.
>>
>>
>> On Monday, 16 March 2015 16:26:41 UTC+1, Travis Leleu wrote:
>>>
>>> LinkedIn can be a tough site to scrape, as they generally don't want 
>>> their data in other people's hands.  You will need to use a user-agent 
>>> switcher (you don't mention what UA you are sending), and you will most 
>>> likely need a proxy as well.
>>>
>>> If you are looking to scrape the entirety of LinkedIn, that is more than 
>>> 30 million profiles.  I've found it more economical to purchase a LinkedIn 
>>> data dump from scrapinghub.com than to scrape it myself.
>>>
>>> On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]> wrote:
>>>
>>>> Hi Scrapy Guys,
>>>>
>>>> Scrapy returns an empty list when I use the shell to pick a simple 
>>>> "title" field from this web page: http://goo.gl/dBR8P4
>>>> I've used:
>>>>
>>>>    - sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>>    - sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>>    - ...
>>>>
>>>> I checked the XHR POST requests using Firebug, and I don't think 
>>>> the data I want is being generated by JavaScript code (what 
>>>> do you think?).
>>>>
>>>> Can you please help me figure out this problem?
>>>> Thank you in advance.
>>>>
>>>> Best Regards,
>>>> K.
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "scrapy-users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>

