Yes, I saw the difference. I changed the URL to the one you suggested, then to another one (https://www.linkedin.com/job/all-jobs/?sort=date) => I obtained the same output when I run *print response.url*, but I still get an empty list as the result of the *sel.xpath* call. Please find a screenshot explaining the procedure I followed here: http://fr.tinypic.com/view.php?pic=yi9dv&s=8
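A minimal sketch of the quoting issue being discussed (the escaped invocation Morad gives below is equivalent; the `scrapy shell` line itself is commented out here so the sketch fetches nothing):

```shell
# The shell splits an unquoted URL at '&' (which starts a background job),
# so `scrapy shell https://...?page_num=1&trk=...` only receives the part
# before the '&'. Quoting (or backslash-escaping) preserves the full URL.
url='https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1'

# Hypothetical invocation, commented out so nothing is fetched:
# scrapy shell "$url"

echo "$url"
```

Running `print response.url` inside the shell, as suggested, then confirms whether the full query string actually reached Scrapy.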
Regards, K.

2015-03-17 12:59 GMT+01:00 Morad Edwar <[email protected]>:
> Do you see the difference? scrapy shell didn't parse the full URL because of the special chars in the URL. Try the following:
>
> scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/\?page_num\=1\&trk\=jserp_pagination_1
>
> On Tuesday, March 17, 2015 at 1:52:20 PM UTC+2, DataScience wrote:
>> I obtain the following output:
>> https://www.linkedin.com/job/jobs-in-san-fransisco-ca/?page_num=1
>>
>> Regards, K.
>>
>> 2015-03-17 12:42 GMT+01:00 Morad Edwar <[email protected]>:
>>> Please do it again, but after step one run the following code:
>>> print response.url
>>> and give us the output.
>>>
>>> Morad Edwar, Software Developer | Bkam.com
>>> On Mar 17, 2015 1:13 PM, "Kais DAI" <[email protected]> wrote:
>>>> This is what I did:
>>>>
>>>> 1. I opened the command line in Windows and ran the following command:
>>>>    scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1
>>>> 2. Then I ran this command:
>>>>    sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>>    In this case, an empty list is returned: []. The same thing happens with this XPath selection:
>>>>    sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>>
>>>> Did you obtain a result by following the same steps?
>>>> Thank you for your help.
>>>>
>>>> Regards, K.
>>>>
>>>> 2015-03-17 11:34 GMT+01:00 Morad Edwar <[email protected]>:
>>>>> I used 'scrapy shell' and your xpath worked fine!
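An illustration of how that XPath behaves on a toy version of the page structure. This uses Python's standard-library ElementTree as a stand-in for Scrapy's selectors (which support fuller XPath), and the markup below is a guess at the relevant fragment, not the real LinkedIn page:

```python
import xml.etree.ElementTree as ET

# Toy stand-in for the jobs page structure under discussion.
html = """
<html><body>
  <div id="results-rail">
    <ul class="jobs">
      <li><div class="content"><span><a class="title">Data Engineer</a></span></div></li>
      <li><div class="content"><span><a class="title">Web Developer</a></span></div></li>
    </ul>
  </div>
</body></html>
"""
root = ET.fromstring(html)

# li[1] selects only the first job entry...
first = root.findall(
    ".//div[@id='results-rail']/ul[@class='jobs']/li[1]"
    "/div[@class='content']/span/a[@class='title']")

# ...while li with no index selects every job entry.
every = root.findall(
    ".//div[@id='results-rail']/ul[@class='jobs']/li"
    "/div[@class='content']/span/a[@class='title']")

print([a.text for a in first])  # ['Data Engineer']
print([a.text for a in every])  # ['Data Engineer', 'Web Developer']
```

If the same expression returns `[]` against the live response, the selector is fine and the mismatch is in the HTML Scrapy actually received.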
>>>>> When I changed 'li[1]' to 'li' it scraped all the job titles.
>>>>>
>>>>> On Monday, March 16, 2015 at 6:19:01 PM UTC+2, DataScience wrote:
>>>>>> Actually, I've checked the response.body and it doesn't match the content that I see in the web page.
>>>>>> I am really confused; what can I do in this case?
>>>>>>
>>>>>> On Monday, March 16, 2015 at 17:15:14 UTC+1, Travis Leleu wrote:
>>>>>>> It doesn't look to me like it's writing the HTML to the DOM with JS, as you noted.
>>>>>>>
>>>>>>> The big concern I have is that you are assuming the HTML content in your browser is the same as in your code. How have you asserted this?
>>>>>>>
>>>>>>> On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]> wrote:
>>>>>>>> Thank you, Travis, for your quick feedback.
>>>>>>>>
>>>>>>>> I am testing Scrapy on this specific web page, trying to get the job offers (not profiles).
>>>>>>>> I read in some forums that it may be because the website uses JavaScript to build most of the page, so the elements I want do not appear in the HTML source. I checked by disabling JavaScript and reloading the page, but the result was still displayed on the page (I also checked the network tab in Firebug, filtering on XHR, and looked into the POST requests... nothing).
>>>>>>>>
>>>>>>>> Any help would be more than welcome. Thank you.
>>>>>>>>
>>>>>>>> On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:
>>>>>>>>> LinkedIn can be a tough site to scrape, as they generally don't want their data in other people's hands. You will need to use a user-agent switcher (you don't mention what UA you are sending), and will most likely need a proxy as well.
>>>>>>>>>
>>>>>>>>> If you are looking to scrape the entirety of LinkedIn, it's > 30 million profiles.
>>>>>>>>> I've found it more economical to purchase a LinkedIn data dump from scrapinghub.com than to scrape it myself.
>>>>>>>>>
>>>>>>>>> On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]> wrote:
>>>>>>>>>> Hi Scrapy Guys,
>>>>>>>>>>
>>>>>>>>>> Scrapy returns an empty list when I use the shell to pick a simple "title" field from this web page: http://goo.gl/dBR8P4
>>>>>>>>>> I've used:
>>>>>>>>>>
>>>>>>>>>> - sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>>>>>>>> - sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>>>>>>>> - ...
>>>>>>>>>>
>>>>>>>>>> I checked the POST/XHR traffic with Firebug, and I don't think the information is generated by JS code (what do you think?).
>>>>>>>>>>
>>>>>>>>>> Can you please help me figure out this problem?
>>>>>>>>>> Thank you in advance.
>>>>>>>>>>
>>>>>>>>>> Best Regards, K.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the Google Groups "scrapy-users" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
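Travis's user-agent suggestion would look something like this in a project's settings.py. `USER_AGENT` and `DOWNLOAD_DELAY` are real Scrapy settings; the values below are purely illustrative:

```python
# Hypothetical settings.py overrides for a Scrapy project.
# Send a browser-like User-Agent instead of Scrapy's default.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36"
)

# Slow requests down to look less like a bot; a proxy, as Travis suggests,
# would be configured separately (e.g. via downloader middleware).
DOWNLOAD_DELAY = 2.0
```

Note that even with these set, JS-rendered content still won't appear in `response.body`; this only addresses the bot-detection side.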
