It's the same problem. Run *print response.url* and you will see that it points to a different link, because of the special characters in the URL.
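The failure mode here is the shell, not Scrapy: unquoted, & tells the shell to run the command in the background, so *scrapy shell* receives only the part of the URL before the first & and the trk parameter is silently dropped. A small stdlib sketch of what gets lost (URL taken from the thread):

```python
from urllib.parse import urlsplit

url = ("https://www.linkedin.com/job/jobs-in-san-francisco-ca/"
       "?page_num=1&trk=jserp_pagination_1")

# An unquoted shell command is cut at the first '&' (the shell
# treats '&' as "run in background"), so scrapy shell only sees:
truncated = url.split("&", 1)[0]

print(urlsplit(url).query)        # page_num=1&trk=jserp_pagination_1
print(urlsplit(truncated).query)  # page_num=1
```

Quoting the whole URL avoids the problem without backslash-escaping each character: scrapy shell "https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1"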
On Tuesday, March 17, 2015 at 2:54:21 PM UTC+2, DataScience wrote:

Yes, I saw the difference. In this sense, I've changed the URL to the one you suggested, then to another one (https://www.linkedin.com/job/all-jobs/?sort=date). I obtained the same output when I run *print response.url*, but I still get an empty list as the result of *sel.xpath*. Please find a screenshot explaining the procedure I followed here: http://fr.tinypic.com/view.php?pic=yi9dv&s=8

Regards,
K.

2015-03-17 12:59 GMT+01:00 Morad Edwar <[email protected]>:

Do you see the difference? scrapy shell didn't parse the full URL because of the special characters in it. Try the following:

scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/\?page_num\=1\&trk\=jserp_pagination_1

On Tuesday, March 17, 2015 at 1:52:20 PM UTC+2, DataScience wrote:

I obtain the following output:
https://www.linkedin.com/job/jobs-in-san-fransisco-ca/?page_num=1

Regards,
K.

2015-03-17 12:42 GMT+01:00 Morad Edwar <[email protected]>:

Please do it again, but after step one run the following code:

print response.url

And give us the output.

Morad Edwar,
Software Developer | Bkam.com

On Mar 17, 2015 1:13 PM, "Kais DAI" <[email protected]> wrote:

This is what I did:

1. I opened the command line in Windows and ran the following command:

scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1

2. Then, I ran this command:

sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()

In this case, an empty list is returned: []. The same happens with this XPath selection:

sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()

Did you obtain a result by following the same steps? Thank you for your help.

Regards,
K.

2015-03-17 11:34 GMT+01:00 Morad Edwar <[email protected]>:

I used scrapy shell and your XPath worked fine! And when I changed li[1] to li, it scraped all the job titles.

On Monday, March 16, 2015 at 6:19:01 PM UTC+2, DataScience wrote:

Actually, I've checked *response.body* and it doesn't match the content that I see on the web page. I am really confused; what can I do in this case?

On Monday, March 16, 2015 at 17:15:14 UTC+1, Travis Leleu wrote:

It doesn't look to me like it's writing the HTML to the DOM with JS, as you noted. The big concern I have is that you are assuming the HTML content in your browser is the same as in your code. How have you asserted this?

On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]> wrote:

Thank you, Travis, for your quick feedback. I am testing Scrapy on this specific web page, trying to get the job offers (not profiles). I read in some forums that it may be due to the website using JavaScript to build most of the page, so that the elements I want do not appear in the HTML source of the page. I've checked by disabling JavaScript and reloading the page, but the results were still displayed on the page (I've also checked the network tab in Firebug, filtered on XHR, and looked into the POST requests... and found nothing). Any help would be more than welcome. Thank you.

On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:

LinkedIn can be a tough site to scrape, as they generally don't want their data in other people's hands. You will need to use a user-agent switcher (you don't mention what UA you are sending), and most likely a proxy in addition. If you are looking to scrape the entirety of LinkedIn, it's > 30 million profiles. I've found it more economical to purchase a LinkedIn data dump from scrapinghub.com than to scrape it myself.

On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]> wrote:

Hi Scrapy Guys,

Scrapy returns an empty list when I use the shell to pick a simple "title" field from this web page: http://goo.gl/dBR8P4
I've used:

- sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
- sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
- ...

I verified the issue of the POST with XHR using Firebug, and I think there is no relationship with information generated by JS code (what do you think?).

Can you please help me figure out this problem? Thank you in advance.
Best Regards,
K.

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
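Morad's point above, that the first XPath is correct and that dropping the [1] index from li[1] returns every job title, can be reproduced offline. This is a minimal sketch using the stdlib's ElementTree (not Scrapy's selector) against a hypothetical snippet shaped like the markup the XPath expects; the job titles are invented. It shows the selector logic itself is sound, so an empty list points at the response body, not the XPath:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the structure the XPath expects;
# the real LinkedIn markup may differ or be built by JavaScript.
snippet = """
<body>
  <div id="results-rail">
    <ul class="jobs">
      <li><div class="content"><span><a class="title">Data Engineer</a></span></div></li>
      <li><div class="content"><span><a class="title">Web Developer</a></span></div></li>
    </ul>
  </div>
</body>
"""
root = ET.fromstring(snippet)

path = (".//div[@id='results-rail']/ul[@class='jobs']/li{}"
        "/div[@class='content']/span/a[@class='title']")
first_only = [a.text for a in root.findall(path.format("[1]"))]
all_jobs = [a.text for a in root.findall(path.format(""))]

print(first_only)  # ['Data Engineer']
print(all_jobs)    # ['Data Engineer', 'Web Developer']
```

If the same path returns [] in the shell, a quick sanity check is whether the container is served at all, e.g. "results-rail" in response.body; when it isn't, the markup is built client-side and no XPath on the raw response will match.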

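As for Travis's user-agent advice: Scrapy announces its own default UA unless you override it (the USER_AGENT setting, or a headers= dict on a Request), and a site may serve a non-browser agent different or empty markup, which would explain a response.body that does not match the browser. A stdlib sketch of attaching a browser-like UA to a request (the UA string is illustrative, and no request is actually sent):

```python
from urllib.request import Request

URL = "https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1"
# Illustrative desktop-browser UA string; any current browser UA works.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:36.0) Gecko/20100101 Firefox/36.0"

# The header travels with the request instead of urllib's default agent.
req = Request(URL, headers={"User-Agent": BROWSER_UA})
print(req.get_header("User-agent"))
```

In Scrapy the equivalent is scrapy shell -s USER_AGENT="..." <url>, or setting USER_AGENT in the project settings; if the body still differs from the browser, a proxy (as Travis suggests) is the next thing to try.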