Do you see the difference? The command line cut the URL off at the '&', so scrapy shell never received the full URL. Try the following:
scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/\?page_num\=1\&trk\=jserp_pagination_1

The backslashes only work in a Unix-style shell. In the Windows command prompt, wrap the whole URL in double quotes instead:

scrapy shell "https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1"

On Tuesday, March 17, 2015 at 1:52:20 PM UTC+2, DataScience wrote:
>
> I obtain the following output:
> https://www.linkedin.com/job/jobs-in-san-fransisco-ca/?page_num=1
>
> Regards,
> K.
>
> 2015-03-17 12:42 GMT+01:00 Morad Edwar <[email protected]>:
>
>> Please do it again, but after step one run the following code:
>> print response.url
>> And give us the output.
>>
>> Morad Edwar,
>> Software Developer | Bkam.com
>> On Mar 17, 2015 1:13 PM, "Kais DAI" <[email protected]> wrote:
>>
>>> This is what I did:
>>>
>>> 1. I opened the command line in Windows and ran the following command:
>>>
>>> scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1
>>>
>>> 2. Then, I ran this command:
>>>
>>> sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>
>>> In this case, an empty list is returned: []
>>> The same thing happens with this XPath selection:
>>>
>>> sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>
>>> Did you obtain a result by following the same steps?
>>> Thank you for your help.
>>>
>>> Regards,
>>> K.
>>>
>>> 2015-03-17 11:34 GMT+01:00 Morad Edwar <[email protected]>:
>>>
>>>> I used 'scrapy shell' and your XPath worked fine!
>>>> And when I changed 'li[1]' to 'li' it scraped all the job titles.
>>>>
>>>> On Monday, March 16, 2015 at 6:19:01 PM UTC+2, DataScience wrote:
>>>>>
>>>>> Actually, I've checked the "response.body" and it doesn't match the
>>>>> content that I have in the webpage.
>>>>> I am really confused, what can I do in this case?
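On the 'li[1]' versus 'li' remark quoted above, here is a small sketch of the difference. The markup is made up to mirror the path in the XPath (it is not LinkedIn's real page), and it uses the standard library's xml.etree.ElementTree rather than Scrapy's selector. ElementTree only supports a subset of XPath, so the title is read from .text instead of text().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup that mirrors the structure the XPath targets.
SAMPLE = """
<html><body>
  <div id="results-rail">
    <ul class="jobs">
      <li><div class="content"><span><a class="title">Data Scientist</a></span></div></li>
      <li><div class="content"><span><a class="title">Software Engineer</a></span></div></li>
    </ul>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Shared head and tail of the path; only the <li> step differs between the two queries.
base = ".//div[@id='results-rail']/ul[@class='jobs']/"
tail = "/div[@class='content']/span/a[@class='title']"

# li[1] keeps only the first list item, plain li matches every list item.
first_only = [a.text for a in root.findall(base + "li[1]" + tail)]
all_titles = [a.text for a in root.findall(base + "li" + tail)]

print(first_only)  # ['Data Scientist']
print(all_titles)  # ['Data Scientist', 'Software Engineer']
```

If both lists come back empty on the real page, the selector is probably not the problem and the response itself is worth checking first.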
>>>>>
>>>>> On Monday, March 16, 2015 at 17:15:14 UTC+1, Travis Leleu wrote:
>>>>>>
>>>>>> It doesn't look to me like it's writing the HTML to the DOM with
>>>>>> JS, as you noted.
>>>>>>
>>>>>> The big concern I have is that you are assuming the HTML content in
>>>>>> your browser is the same as in your code. How have you asserted this?
>>>>>>
>>>>>> On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you Travis for your quick feedback.
>>>>>>>
>>>>>>> I am testing Scrapy on this specific webpage and trying to get the
>>>>>>> job offers (and not profiles).
>>>>>>> I read in some forums that it may be due to the website using
>>>>>>> JavaScript to build most of the page, so the elements I want
>>>>>>> do not appear in the HTML source of the page. I've checked by
>>>>>>> disabling JavaScript and reloading the page, but the results were
>>>>>>> still displayed on the page (I've also checked the network in
>>>>>>> Firebug by filtering XHR and looked into the POST...and nothing).
>>>>>>>
>>>>>>> Any help would be more than welcome.
>>>>>>> Thank you.
>>>>>>>
>>>>>>> On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:
>>>>>>>>
>>>>>>>> LinkedIn can be a tough site to scrape, as they generally don't
>>>>>>>> want their data in other people's hands. You will need to use a
>>>>>>>> user-agent switcher (you don't mention what UA you are sending),
>>>>>>>> and will most likely require a proxy in addition.
>>>>>>>>
>>>>>>>> If you are looking to scrape the entirety of LinkedIn, it's > 30
>>>>>>>> million profiles. I've found it more economical to purchase a
>>>>>>>> LinkedIn data dump from scrapinghub.com than to scrape it myself.
>>>>>>>>
>>>>>>>> On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Scrapy Guys,
>>>>>>>>>
>>>>>>>>> Scrapy returns an empty list when I use the shell to pick a simple
>>>>>>>>> "title" field from this web page: http://goo.gl/dBR8P4
>>>>>>>>> I've used:
>>>>>>>>>
>>>>>>>>> - sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>>>>>>> - sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>>>>>>> - ...
>>>>>>>>>
>>>>>>>>> I checked the POST requests with XHR using Firebug, and I think
>>>>>>>>> the issue is not related to information generated by JS code
>>>>>>>>> (what do you think?).
>>>>>>>>>
>>>>>>>>> Can you please help me figure out this problem?
>>>>>>>>> Thank you in advance.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> K.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "scrapy-users" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
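P.S. A quick way to confirm this kind of truncation is to compare the query string of the URL you meant to fetch with the one printed by response.url. The sketch below uses only the standard library. The helper name missing_query_params is my own invention, and the "received" URL is simply the intended one cut at the '&', which is what the shell passes on when it treats '&' as a command separator.

```python
from urllib.parse import urlsplit, parse_qs


def missing_query_params(intended_url, received_url):
    """Return the query parameter names found in intended_url but not in received_url."""
    intended = parse_qs(urlsplit(intended_url).query)
    received = parse_qs(urlsplit(received_url).query)
    return sorted(set(intended) - set(received))


# The URL typed on the command line.
intended = (
    "https://www.linkedin.com/job/jobs-in-san-francisco-ca/"
    "?page_num=1&trk=jserp_pagination_1"
)
# The same URL cut at the '&', i.e. what an unquoted command line hands to scrapy.
received = "https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1"

print(missing_query_params(intended, received))  # ['trk']
```

An empty list means every parameter made it through. Anything else means the URL was changed before Scrapy saw it, or the site redirected the request.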
