I used 'scrapy shell' and your XPath worked fine! When I changed 'li[1]' to 'li', it scraped all the job titles.
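For reference, this is roughly what the working session looked like (just a sketch of what I ran; it's my original XPath with li[1] relaxed to li so every job item matches):

# started with:  scrapy shell "http://goo.gl/dBR8P4"
# 'sel' is the selector the shell provides; on newer Scrapy versions response.xpath(...) works the same way
titles = sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li'
                   '/div[@class="content"]/span/a[@class="title"]/text()').extract()
print(len(titles))   # one entry per job listed on the page
print(titles[:3])    # quick sanity check on the first few titles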
On Monday, March 16, 2015 at 6:19:01 PM UTC+2, DataScience wrote:
>
> Actually, I've checked "response.body" and it doesn't match the content
> that I see in the webpage.
> I am really confused; what can I do in this case?
>
> On Monday, March 16, 2015 at 17:15:14 UTC+1, Travis Leleu wrote:
>>
>> It doesn't look to me like it's writing the HTML to the DOM with JS, as
>> you noted.
>>
>> The big concern I have is that you are assuming the HTML content in your
>> browser is the same as in your code. How have you verified this?
>>
>> On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]> wrote:
>>
>>> Thank you, Travis, for your quick feedback.
>>>
>>> I am testing Scrapy on this specific webpage and trying to get the job
>>> offers (not the profiles).
>>> I read in some forums that the problem may be that the website builds
>>> most of the page with JavaScript, so the elements I want would not
>>> appear in the HTML source. I checked by disabling JavaScript and
>>> reloading the page, but the results were still displayed (I also
>>> filtered for XHR in Firebug's Network tab and looked at the POST
>>> requests... and found nothing).
>>>
>>> Any help would be more than welcome.
>>> Thank you.
>>>
>>> On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:
>>>>
>>>> LinkedIn can be a tough site to scrape, as they generally don't want
>>>> their data in other people's hands. You will need to use a user-agent
>>>> switcher (you don't mention what UA you are sending), and most likely
>>>> a proxy as well.
>>>>
>>>> If you are looking to scrape the entirety of LinkedIn, it's > 30
>>>> million profiles. I've found it more economical to purchase a LinkedIn
>>>> data dump from scrapinghub.com than to scrape it myself.
>>>>
>>>> On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]> wrote:
>>>>
>>>>> Hi Scrapy Guys,
>>>>>
>>>>> Scrapy returns an empty list when I use the shell to pick a simple
>>>>> "title" field from this web page: http://goo.gl/dBR8P4
>>>>> I've used:
>>>>>
>>>>> - sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]
>>>>>   /div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>>> - sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/
>>>>>   li[1]/div/span/a').extract()
>>>>> - ...
>>>>>
>>>>> I checked the XHR/POST requests with Firebug, and I don't think the
>>>>> data is generated by JS code (what do you think?).
>>>>>
>>>>> Can you please help me figure out this problem?
>>>>> Thank you in advance.
>>>>>
>>>>> Best Regards,
>>>>> K.
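In case it helps anyone else who hits the same mismatch between response.body and what the browser shows, here is a rough sketch of two quick checks from scrapy shell (the USER_AGENT string below is only an illustrative example, not a recommendation):

# started with:  scrapy shell "http://goo.gl/dBR8P4"
view(response)                        # opens the HTML Scrapy actually received in your browser
b'results-rail' in response.body      # is the container we target present in the raw response at all?

# If the served page differs because of Scrapy's default user agent,
# it can be overridden in settings.py (example value only):
USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; rv:36.0) Gecko/20100101 Firefox/36.0'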
