I obtain the following output:

    https://www.linkedin.com/job/jobs-in-san-fransisco-ca/?page_num=1

Regards,
K.
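This output is the point of the check below: in a Scrapy shell session, request.url is the URL you asked for and response.url is the URL the final response actually came from, so a mismatch like the one above (note the trk query parameter is gone) usually means the site redirected the request. A minimal sketch of that comparison, using only names the Scrapy shell exposes (Python 2 print syntax, matching the thread):

    # Inside a session started with: scrapy shell <url>
    print request.url       # the URL originally requested
    print response.url      # the URL after any redirects
    print response.status   # final HTTP status of the downloaded page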
On Mar 17, 2015, at 12:42 GMT+01:00, Morad Edwar wrote:

> Please do it again, but after step one run the following code:
>
>     print response.url
>
> and give us the output.
>
> Morad Edwar,
> Software Developer | Bkam.com

On Mar 17, 2015, at 1:13 PM, Kais DAI wrote:

> This is what I did:
>
> 1. I opened the command line in Windows and ran the following command:
>
>     scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1
>
> 2. Then I ran this command:
>
>     sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>
> In this case an empty list is returned: []. The same thing happens with this XPath selection:
>
>     sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>
> Did you obtain a result by following the same steps?
> Thank you for your help.
>
> Regards,
> K.

On Mar 17, 2015, at 11:34 GMT+01:00, Morad Edwar wrote:

> I used scrapy shell and your XPath worked fine! And when I changed li[1] to li, it scraped all the job titles.

On Monday, March 16, 2015 at 6:19:01 PM UTC+2, DataScience wrote:

> Actually, I've checked response.body and it doesn't match the content that I see on the web page.
> I am really confused; what can I do in this case?

On Monday, March 16, 2015 at 17:15:14 UTC+1, Travis Leleu wrote:

> It doesn't look to me like it's writing the HTML to the DOM with JS, as you noted.
>
> The big concern I have is that you are assuming the HTML content in your browser is the same as what your code receives. How have you verified this?

On Mon, Mar 16, 2015 at 9:02 AM, DataScience wrote:

> Thank you, Travis, for your quick feedback.
>
> I am testing Scrapy on this specific web page, trying to get the job offers (not the profiles). I read in some forums that the problem may come from the website using JavaScript to build most of the page, so that the elements I want do not appear in the HTML source. I checked by disabling JavaScript and reloading the page, but the results were still displayed (I also checked the network tab in Firebug, filtering on XHR, and looked into the POST requests... nothing).
>
> Any help would be more than welcome. Thank you.

On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:

> LinkedIn can be a tough site to scrape, as they generally don't want their data in other people's hands. You will need to use a user-agent switcher (you don't mention what UA you are sending), and will most likely need a proxy as well.
>
> If you are looking to scrape the entirety of LinkedIn, that's more than 30 million profiles. I've found it more economical to purchase a LinkedIn data dump from scrapinghub.com than to scrape it myself.
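A minimal sketch of the settings Travis is describing; the user-agent string and proxy address below are placeholder values for illustration, not values anyone in this thread used:

    # settings.py -- illustrative values only
    USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; rv:36.0) Gecko/20100101 Firefox/36.0'

    # A proxy can also be set per request via request.meta; Scrapy's
    # built-in HttpProxyMiddleware additionally picks up the standard
    # http_proxy/https_proxy environment variables.
    # request.meta['proxy'] = 'http://127.0.0.1:8118'  # placeholder address

Rotating through several user agents (rather than a single fixed one) usually requires a small downloader middleware that sets the User-Agent header on each outgoing request.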
On Mon, Mar 16, 2015 at 8:05 AM, DataScience wrote:

> Hi Scrapy guys,
>
> Scrapy returns an empty list when I use the shell to pick a simple "title" field from this web page: http://goo.gl/dBR8P4
>
> I've used:
>
>     sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>     sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>     ...
>
> I checked the POST/XHR traffic with Firebug, and I think the information is not generated by JS code (what do you think?).
>
> Can you please help me figure out this problem?
> Thank you in advance.
>
> Best Regards,
> K.
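Travis's later question ("is the HTML in your browser the same as what your code gets?") can be answered directly from the shell. A minimal sketch, assuming the results-rail container id from the XPath above and the sel/view names the Scrapy shell provides:

    # Inside a session started with: scrapy shell <url>
    view(response)                          # open exactly what Scrapy downloaded in a browser
    print 'results-rail' in response.body   # is the container div in the raw HTML at all?
    print len(sel.xpath('//div[@id="results-rail"]//li'))  # how many job <li> nodes match?

If view(response) shows a page without the job listings while the browser shows them, the content is being added client-side (or the server is sending different HTML to Scrapy than to the browser).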
