It doesn't look to me like it's writing the HTML to the DOM with JavaScript,
as you noted.

The big concern I have is that you are assuming the HTML content in your
browser is the same as what your code receives.  How have you verified this?
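
One way to check, as a minimal sketch: take the body Scrapy actually
downloaded and look for the element your XPath targets. The helper name
has_element_with_id and the sample HTML below are my own illustrations (not
anything from your spider), and it only relies on the standard library's
html.parser.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if dict(attrs).get("id") == self.wanted_id:
            self.found = True


def has_element_with_id(html_text, wanted_id):
    """Return True if html_text contains a tag whose id equals wanted_id."""
    finder = _IdFinder(wanted_id)
    finder.feed(html_text)
    return finder.found


# Hypothetical sample standing in for the body Scrapy downloaded.
sample = '<html><body><div id="results-rail"><ul class="jobs"></ul></div></body></html>'
print(has_element_with_id(sample, "results-rail"))  # True
print(has_element_with_id("<html><body>Please sign in</body></html>", "results-rail"))  # False
```

In the Scrapy shell you would pass the downloaded body (response.text in
recent Scrapy versions) instead of the sample; the shell's view(response)
shortcut also opens what Scrapy received in your browser. If the id is
missing there, the server is probably sending Scrapy a different page than
the one your browser gets.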

On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]> wrote:

> Thank you Travis for your quick feedback.
>
> I am testing Scrapy on this specific webpage and trying to get the job
> offers (and not profiles).
> I read in some forums that the problem may be that the website uses
> JavaScript to build most of the page, so the elements I want do not
> appear in the HTML source of the page. I checked by disabling
> JavaScript and reloading the page, but the results were still displayed on
> the page (I also checked the network panel in Firebug, filtering on XHR and
> looking into the POST requests, and found nothing).
>
> Any help would be more than welcome.
> Thank you.
>
>
> On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:
>>
>> LinkedIn can be a tough site to scrape, as they generally don't want
>> their data in other people's hands.  You will need to use a user-agent
>> switcher (you don't mention what UA you are sending), and you will most
>> likely need a proxy as well.
>>
>> If you are looking to scrape the entirety of LinkedIn, it's > 30 million
>> profiles.  I've found it more economical to purchase a LinkedIn data dump
>> from scrapinghub.com than to scrape it myself.
>>
>> On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]> wrote:
>>
>>> Hi Scrapy Guys,
>>>
>>> Scrapy returns an empty list when I use the shell to pick a simple
>>> "title" field from this web page: http://goo.gl/dBR8P4
>>> I've used:
>>>
>>>    - sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]
>>>    /div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>    - sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/
>>>    li[1]/div/span/a').extract()
>>>    - ...
>>>
>>> I checked the XHR POST requests using Firebug, and I don't think the
>>> content I want is generated by JS code (what do you think?).
>>>
>>> Can you please help me figure out this problem?
>>> Thank you in advance.
>>>
>>> Best Regards,
>>> K.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "scrapy-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
