Actually, I've checked "response.body" and it doesn't match the content that I see on the web page. I am really confused; what can I do in this case?
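
One way to narrow this down is to check which fragments of the expected markup are actually present in the raw HTML that Scrapy downloaded. The snippet below is only a minimal sketch: `find_markers` is a hypothetical helper (not part of Scrapy), and `sample_body` is made-up data standing in for `response.body`.

```python
# Hypothetical helper (not part of Scrapy): report which expected
# fragments are present in the raw HTML that the spider received.
def find_markers(body, markers, encoding="utf-8"):
    """Return a dict mapping each marker string to True/False."""
    text = body.decode(encoding, errors="replace")
    return {marker: marker in text for marker in markers}


# Made-up sample standing in for response.body from the Scrapy shell.
sample_body = (
    b'<html><body><div id="results-rail">'
    b'<ul class="jobs"></ul></div></body></html>'
)

report = find_markers(sample_body, ['id="results-rail"', 'class="title"'])
for marker, present in report.items():
    print(marker, "found" if present else "MISSING")
```

In the Scrapy shell, the same helper could be called as `find_markers(response.body, [...])`. The shell's `view(response)` shortcut also opens the downloaded response in a browser, which makes it easier to compare it with the live page.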
On Monday, March 16, 2015 at 17:15:14 UTC+1, Travis Leleu wrote:
>
> It doesn't look to me like it's writing the HTML to the DOM with JS, as
> you noted.
>
> The big concern I have is that you are assuming the HTML content in your
> browser is the same as in your code. How have you asserted this?
>
> On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]> wrote:
>
>> Thank you, Travis, for your quick feedback.
>>
>> I am testing Scrapy on this specific web page and trying to get the job
>> offers (and not profiles).
>> I read in some forums that it may be due to the website using
>> JavaScript to build most of the page, so the elements I want do not
>> appear in the HTML source of the page. I've checked by disabling
>> JavaScript and reloading the page, but the results were still displayed
>> on the page (I've also checked the network in Firebug by filtering XHR
>> and looked into the POST... and nothing).
>>
>> Any help would be more than welcome.
>> Thank you.
>>
>> On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:
>>>
>>> LinkedIn can be a tough site to scrape, as they generally don't want
>>> their data in other people's hands. You will need to use a user-agent
>>> switcher (you don't mention what UA you are sending), and will most
>>> likely need a proxy as well.
>>>
>>> If you are looking to scrape the entirety of LinkedIn, it's more than
>>> 30 million profiles. I've found it more economical to purchase a
>>> LinkedIn data dump from scrapinghub.com than to scrape it myself.
>>>
>>> On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]> wrote:
>>>
>>>> Hi Scrapy Guys,
>>>>
>>>> Scrapy returns an empty list when I use the shell to pick a simple
>>>> "title" field from this web page: http://goo.gl/dBR8P4
>>>> I've used:
>>>>
>>>> - sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>> - sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>> - ...
>>>>
>>>> I checked the XHR POST requests using Firebug, and I don't think the
>>>> problem is related to content generated by JS code (what do you
>>>> think?).
>>>>
>>>> Can you please help me figure out this problem?
>>>> Thank you in advance.
>>>>
>>>> Best Regards,
>>>> K.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "scrapy-users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>> For more options, visit https://groups.google.com/d/optout.
