Yes, I saw the difference. I changed the URL to the one you suggested, then to another one (https://www.linkedin.com/job/all-jobs/?sort=date) => I obtained the same output when I run *print response.url*, but I still get an empty list as the result of the *sel.xpath* call. Please find a screenshot explaining the procedure I followed here: http://fr.tinypic.com/view.php?pic=yi9dv&s=8
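A minimal sketch of the quoting issue being discussed (the escaped invocation Morad gives below is equivalent; the `scrapy shell` line itself is commented out here so the sketch fetches nothing):

```shell
# The shell splits an unquoted URL at '&' (which starts a background job),
# so `scrapy shell https://...?page_num=1&trk=...` only receives the part
# before the '&'. Quoting (or backslash-escaping) preserves the full URL.
url='https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1'

# Hypothetical invocation, commented out so nothing is fetched:
# scrapy shell "$url"

echo "$url"
```

Running `print response.url` inside the shell, as suggested, then confirms whether the full query string actually reached Scrapy.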
Regards, K.

2015-03-17 12:59 GMT+01:00 Morad Edwar <[email protected]>:
> Do you see the difference? scrapy shell didn't parse the full URL because of the special chars in the URL. Try the following:
>
> scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/\?page_num\=1\&trk\=jserp_pagination_1
>
> On Tuesday, March 17, 2015 at 1:52:20 PM UTC+2, DataScience wrote:
>> I obtain the following output:
>> https://www.linkedin.com/job/jobs-in-san-fransisco-ca/?page_num=1
>>
>> Regards, K.
>>
>> 2015-03-17 12:42 GMT+01:00 Morad Edwar <[email protected]>:
>>> Please do it again, but after step one run the following code:
>>> print response.url
>>> and give us the output.
>>>
>>> Morad Edwar, Software Developer | Bkam.com
>>> On Mar 17, 2015 1:13 PM, "Kais DAI" <[email protected]> wrote:
>>>> This is what I did:
>>>>
>>>> 1. I opened the command line in Windows and ran the following command:
>>>>    scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1
>>>> 2. Then I ran this command:
>>>>    sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>>    In this case, an empty list is returned: []. The same thing happens with this XPath selection:
>>>>    sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>>
>>>> Did you obtain a result by following the same steps?
>>>> Thank you for your help.
>>>>
>>>> Regards, K.
>>>>
>>>> 2015-03-17 11:34 GMT+01:00 Morad Edwar <[email protected]>:
>>>>> I used 'scrapy shell' and your xpath worked fine!
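An illustration of how that XPath behaves on a toy version of the page structure. This uses Python's standard-library ElementTree as a stand-in for Scrapy's selectors (which support fuller XPath), and the markup below is a guess at the relevant fragment, not the real LinkedIn page:

```python
import xml.etree.ElementTree as ET

# Toy stand-in for the jobs page structure under discussion.
html = """
<html><body>
  <div id="results-rail">
    <ul class="jobs">
      <li><div class="content"><span><a class="title">Data Engineer</a></span></div></li>
      <li><div class="content"><span><a class="title">Web Developer</a></span></div></li>
    </ul>
  </div>
</body></html>
"""
root = ET.fromstring(html)

# li[1] selects only the first job entry...
first = root.findall(
    ".//div[@id='results-rail']/ul[@class='jobs']/li[1]"
    "/div[@class='content']/span/a[@class='title']")

# ...while li with no index selects every job entry.
every = root.findall(
    ".//div[@id='results-rail']/ul[@class='jobs']/li"
    "/div[@class='content']/span/a[@class='title']")

print([a.text for a in first])  # ['Data Engineer']
print([a.text for a in every])  # ['Data Engineer', 'Web Developer']
```

If the same expression returns `[]` against the live response, the selector is fine and the mismatch is in the HTML Scrapy actually received.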
>>>>> When I changed 'li[1]' to 'li' it scraped all the job titles.
>>>>>
>>>>> On Monday, March 16, 2015 at 6:19:01 PM UTC+2, DataScience wrote:
>>>>>> Actually, I've checked the response.body and it doesn't match the content that I see in the web page.
>>>>>> I am really confused; what can I do in this case?
>>>>>>
>>>>>> On Monday, March 16, 2015 at 17:15:14 UTC+1, Travis Leleu wrote:
>>>>>>> It doesn't look to me like it's writing the HTML to the DOM with JS, as you noted.
>>>>>>>
>>>>>>> The big concern I have is that you are assuming the HTML content in your browser is the same as in your code. How have you asserted this?
>>>>>>>
>>>>>>> On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]> wrote:
>>>>>>>> Thank you, Travis, for your quick feedback.
>>>>>>>>
>>>>>>>> I am testing Scrapy on this specific web page, trying to get the job offers (not profiles).
>>>>>>>> I read in some forums that it may be because the website uses JavaScript to build most of the page, so the elements I want do not appear in the HTML source. I checked by disabling JavaScript and reloading the page, but the result was still displayed on the page (I also checked the network tab in Firebug, filtering on XHR, and looked into the POST requests... nothing).
>>>>>>>>
>>>>>>>> Any help would be more than welcome. Thank you.
>>>>>>>>
>>>>>>>> On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:
>>>>>>>>> LinkedIn can be a tough site to scrape, as they generally don't want their data in other people's hands. You will need to use a user-agent switcher (you don't mention what UA you are sending), and will most likely need a proxy as well.
>>>>>>>>>
>>>>>>>>> If you are looking to scrape the entirety of LinkedIn, it's > 30 million profiles.
>>>>>>>>> I've found it more economical to purchase a LinkedIn data dump from scrapinghub.com than to scrape it myself.
>>>>>>>>>
>>>>>>>>> On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]> wrote:
>>>>>>>>>> Hi Scrapy Guys,
>>>>>>>>>>
>>>>>>>>>> Scrapy returns an empty list when I use the shell to pick a simple "title" field from this web page: http://goo.gl/dBR8P4
>>>>>>>>>> I've used:
>>>>>>>>>>
>>>>>>>>>> - sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>>>>>>>> - sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>>>>>>>> - ...
>>>>>>>>>>
>>>>>>>>>> I checked the POST/XHR traffic with Firebug, and I don't think the information is generated by JS code (what do you think?).
>>>>>>>>>>
>>>>>>>>>> Can you please help me figure out this problem?
>>>>>>>>>> Thank you in advance.
>>>>>>>>>>
>>>>>>>>>> Best Regards, K.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the Google Groups "scrapy-users" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
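Travis's user-agent suggestion would look something like this in a project's settings.py. `USER_AGENT` and `DOWNLOAD_DELAY` are real Scrapy settings; the values below are purely illustrative:

```python
# Hypothetical settings.py overrides for a Scrapy project.
# Send a browser-like User-Agent instead of Scrapy's default.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36"
)

# Slow requests down to look less like a bot; a proxy, as Travis suggests,
# would be configured separately (e.g. via downloader middleware).
DOWNLOAD_DELAY = 2.0
```

Note that even with these set, JS-rendered content still won't appear in `response.body`; this only addresses the bot-detection side.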
