Re: Scrapy shell returns empty list!?

Morad Edwar Tue, 17 Mar 2015 04:59:39 -0700

Do you see the difference?
scrapy shell did not receive the full URL because of the special characters 
in it: the output stops right before the '&'. Try the following:


scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/\?page_num\=1\&trk\=jserp_pagination_1
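
To make the difference easier to see, here is a minimal Python sketch that compares the URL that was typed with what scrapy appears to have received. It assumes the command line cut the command at the unquoted '&' (simulated below with a plain string split); the names `intended`, `received`, and `query_keys` are only illustrative. Note that backslash escapes are a Unix-shell convention, so on the Windows command line wrapping the URL in double quotes is probably the safer form.

```python
import shlex
from urllib.parse import parse_qs, urlsplit

# URL taken from the thread.
intended = (
    "https://www.linkedin.com/job/jobs-in-san-francisco-ca/"
    "?page_num=1&trk=jserp_pagination_1"
)

# Assumption: the shell treats an unquoted "&" as the end of the command,
# so only the text before it is handed to scrapy. Simulate that here.
received = intended.split("&", 1)[0]


def query_keys(url):
    """Return the set of query-string parameter names in a URL."""
    return set(parse_qs(urlsplit(url).query))


# Parameters that never reached scrapy.
missing = query_keys(intended) - query_keys(received)
print("received:", received)
print("missing parameters:", sorted(missing))

# shlex.quote wraps the URL in single quotes for POSIX shells.
# Windows cmd.exe does not understand single quotes; use double quotes there.
print("scrapy shell " + shlex.quote(intended))
```

If `response.url` inside the shell matches `received` rather than `intended`, the URL was truncated before scrapy ever saw it.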


On Tuesday, March 17, 2015 at 1:52:20 PM UTC+2, DataScience wrote:
>
> I obtain the following output:
> https://www.linkedin.com/job/jobs-in-san-fransisco-ca/?page_num=1
>
> Regards,
> K.
> 2015-03-17 12:42 GMT+01:00 Morad Edwar <[email protected]>:
>
>> Please do it again, but after step 1 run the following code:
>>     print(response.url)
>> And give us the output.
>>
>> Morad Edwar,
>> Software Developer | Bkam.com
>> On Mar 17, 2015 1:13 PM, "Kais DAI" <[email protected]> wrote:
>>
>>> This is what I did:
>>>
>>>    1. I opened the command line in Windows and ran the following 
>>>    command:
>>>    scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1
>>>    2. Then I ran this command:
>>>    sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>    In this case, an empty list is returned: []
>>>    The same thing happens with this XPath selection:
>>>    sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>
>>> Did you obtain a result by following the same steps?
>>> Thank you for your help.
>>>
>>> Regards,
>>> K.
>>>
>>> 2015-03-17 11:34 GMT+01:00 Morad Edwar <[email protected]>:
>>>
>>>> I used 'scrapy shell' and your XPath worked fine!
>>>> And when I changed 'li[1]' to 'li', it scraped all the job titles.
>>>>
>>>>
>>>> On Monday, March 16, 2015 at 6:19:01 PM UTC+2, DataScience wrote:
>>>>>
>>>>> Actually, I've checked "response.body" and it doesn't match the 
>>>>> content that I see on the web page.
>>>>> I am really confused. What can I do in this case?
>>>>>
>>>>> On Monday, March 16, 2015 at 5:15:14 PM UTC+1, Travis Leleu wrote:
>>>>>>
>>>>>> It doesn't look to me like it's writing the HTML to the DOM with 
>>>>>> j.s., as you noted.
>>>>>>
>>>>>> The big concern I have is that you are assuming the HTML content in 
>>>>>> your browser is the same as in your code.  How have you asserted this?
>>>>>>
>>>>>> On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]> 
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you, Travis, for your quick feedback.
>>>>>>>
>>>>>>> I am testing Scrapy on this specific web page and trying to get the job 
>>>>>>> offers (not the profiles).
>>>>>>> I read in some forums that this may happen when a website uses 
>>>>>>> JavaScript to build most of the page, so the elements I want 
>>>>>>> do not appear in the HTML source of the page. I checked by disabling 
>>>>>>> JavaScript and reloading the page, and the results were still displayed 
>>>>>>> on the page. (I also checked the network traffic in Firebug, filtering 
>>>>>>> on XHR and looking into the POST requests, and found nothing.)
>>>>>>>
>>>>>>> Any help would be more than welcome.
>>>>>>> Thank you.
>>>>>>>
>>>>>>>
>>>>>>> On Monday, March 16, 2015 at 4:26:41 PM UTC+1, Travis Leleu wrote:
>>>>>>>>
>>>>>>>> LinkedIn can be a tough site to scrape, as they generally don't 
>>>>>>>> want their data in other people's hands.  You will need to use a 
>>>>>>>> user-agent switcher (you don't mention what UA you are sending), and 
>>>>>>>> you will most likely need a proxy as well.
>>>>>>>>
>>>>>>>> If you are looking to scrape the entirety of LinkedIn, it's > 30 
>>>>>>>> million profiles.  I've found it more economical to purchase a 
>>>>>>>> LinkedIn data dump from scrapinghub.com than to scrape it myself.
>>>>>>>>
>>>>>>>> On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Scrapy Guys,
>>>>>>>>>
>>>>>>>>> Scrapy returns an empty list when I use the shell to pick a simple 
>>>>>>>>> "title" field from this web page: http://goo.gl/dBR8P4
>>>>>>>>> I've used:
>>>>>>>>>
>>>>>>>>>    - sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>>>>>>>    - sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>>>>>>>    - ...
>>>>>>>>>
>>>>>>>>> I checked the XHR POST requests using Firebug, and I don't think 
>>>>>>>>> the problem is related to content generated by JS code 
>>>>>>>>> (what do you think?).
>>>>>>>>>
>>>>>>>>> Can you please help me figure out this problem?
>>>>>>>>> Thank you in advance.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> K.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "scrapy-users" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>
