I obtain the following output:

    https://www.linkedin.com/job/jobs-in-san-fransisco-ca/?page_num=1

Regards,
K.
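This output is the point of the check below: in a Scrapy shell session, request.url is the URL you asked for and response.url is the URL the final response actually came from, so a mismatch like the one above (note the trk query parameter is gone) usually means the site redirected the request. A minimal sketch of that comparison, using only names the Scrapy shell exposes (Python 2 print syntax, matching the thread):

    # Inside a session started with: scrapy shell <url>
    print request.url       # the URL originally requested
    print response.url      # the URL after any redirects
    print response.status   # final HTTP status of the downloaded page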
On Mar 17, 2015, at 12:42 GMT+01:00, Morad Edwar wrote:

> Please do it again, but after step one run the following code:
>
>     print response.url
>
> and give us the output.
>
> Morad Edwar,
> Software Developer | Bkam.com

On Mar 17, 2015, at 1:13 PM, Kais DAI wrote:

> This is what I did:
>
> 1. I opened the command line in Windows and ran the following command:
>
>     scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1
>
> 2. Then I ran this command:
>
>     sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>
> In this case an empty list is returned: []. The same thing happens with this XPath selection:
>
>     sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>
> Did you obtain a result by following the same steps?
> Thank you for your help.
>
> Regards,
> K.

On Mar 17, 2015, at 11:34 GMT+01:00, Morad Edwar wrote:

> I used scrapy shell and your XPath worked fine! And when I changed li[1] to li, it scraped all the job titles.

On Monday, March 16, 2015 at 6:19:01 PM UTC+2, DataScience wrote:

> Actually, I've checked response.body and it doesn't match the content that I see on the web page.
> I am really confused; what can I do in this case?

On Monday, March 16, 2015 at 17:15:14 UTC+1, Travis Leleu wrote:

> It doesn't look to me like it's writing the HTML to the DOM with JS, as you noted.
>
> The big concern I have is that you are assuming the HTML content in your browser is the same as what your code receives. How have you verified this?

On Mon, Mar 16, 2015 at 9:02 AM, DataScience wrote:

> Thank you, Travis, for your quick feedback.
>
> I am testing Scrapy on this specific web page, trying to get the job offers (not the profiles). I read in some forums that the problem may come from the website using JavaScript to build most of the page, so that the elements I want do not appear in the HTML source. I checked by disabling JavaScript and reloading the page, but the results were still displayed (I also checked the network tab in Firebug, filtering on XHR, and looked into the POST requests... nothing).
>
> Any help would be more than welcome. Thank you.

On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:

> LinkedIn can be a tough site to scrape, as they generally don't want their data in other people's hands. You will need to use a user-agent switcher (you don't mention what UA you are sending), and will most likely need a proxy as well.
>
> If you are looking to scrape the entirety of LinkedIn, that's more than 30 million profiles. I've found it more economical to purchase a LinkedIn data dump from scrapinghub.com than to scrape it myself.
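A minimal sketch of the settings Travis is describing; the user-agent string and proxy address below are placeholder values for illustration, not values anyone in this thread used:

    # settings.py -- illustrative values only
    USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; rv:36.0) Gecko/20100101 Firefox/36.0'

    # A proxy can also be set per request via request.meta; Scrapy's
    # built-in HttpProxyMiddleware additionally picks up the standard
    # http_proxy/https_proxy environment variables.
    # request.meta['proxy'] = 'http://127.0.0.1:8118'  # placeholder address

Rotating through several user agents (rather than a single fixed one) usually requires a small downloader middleware that sets the User-Agent header on each outgoing request.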
On Mon, Mar 16, 2015 at 8:05 AM, DataScience wrote:

> Hi Scrapy guys,
>
> Scrapy returns an empty list when I use the shell to pick a simple "title" field from this web page: http://goo.gl/dBR8P4
>
> I've used:
>
>     sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>     sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>     ...
>
> I checked the POST/XHR traffic with Firebug, and I think the information is not generated by JS code (what do you think?).
>
> Can you please help me figure out this problem?
> Thank you in advance.
>
> Best Regards,
> K.
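Travis's later question ("is the HTML in your browser the same as what your code gets?") can be answered directly from the shell. A minimal sketch, assuming the results-rail container id from the XPath above and the sel/view names the Scrapy shell provides:

    # Inside a session started with: scrapy shell <url>
    view(response)                          # open exactly what Scrapy downloaded in a browser
    print 'results-rail' in response.body   # is the container div in the raw HTML at all?
    print len(sel.xpath('//div[@id="results-rail"]//li'))  # how many job <li> nodes match?

If view(response) shows a page without the job listings while the browser shows them, the content is being added client-side (or the server is sending different HTML to Scrapy than to the browser).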
