It's the same problem. Run *print response.url* and you will see that it points to a different link, because of the special characters in the URL.
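The failure mode here is the shell, not Scrapy: unquoted, & tells the shell to run the command in the background, so *scrapy shell* receives only the part of the URL before the first & and the trk parameter is silently dropped. A small stdlib sketch of what gets lost (URL taken from the thread):

```python
from urllib.parse import urlsplit

url = ("https://www.linkedin.com/job/jobs-in-san-francisco-ca/"
       "?page_num=1&trk=jserp_pagination_1")

# An unquoted shell command is cut at the first '&' (the shell
# treats '&' as "run in background"), so scrapy shell only sees:
truncated = url.split("&", 1)[0]

print(urlsplit(url).query)        # page_num=1&trk=jserp_pagination_1
print(urlsplit(truncated).query)  # page_num=1
```

Quoting the whole URL avoids the problem without backslash-escaping each character: scrapy shell "https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1"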
On Tuesday, March 17, 2015 at 2:54:21 PM UTC+2, DataScience wrote:

Yes, I saw the difference. In this sense, I've changed the URL to the one you suggested, then to another one (https://www.linkedin.com/job/all-jobs/?sort=date). I obtained the same output when I run *print response.url*, but I still get an empty list as the result of *sel.xpath*. Please find a screenshot explaining the procedure I followed here: http://fr.tinypic.com/view.php?pic=yi9dv&s=8

Regards,
K.

2015-03-17 12:59 GMT+01:00 Morad Edwar <[email protected]>:

Do you see the difference? scrapy shell didn't parse the full URL because of the special characters in it. Try the following:

scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/\?page_num\=1\&trk\=jserp_pagination_1

On Tuesday, March 17, 2015 at 1:52:20 PM UTC+2, DataScience wrote:

I obtain the following output:
https://www.linkedin.com/job/jobs-in-san-fransisco-ca/?page_num=1

Regards,
K.

2015-03-17 12:42 GMT+01:00 Morad Edwar <[email protected]>:

Please do it again, but after step one run the following code:

print response.url

And give us the output.

Morad Edwar,
Software Developer | Bkam.com

On Mar 17, 2015 1:13 PM, "Kais DAI" <[email protected]> wrote:

This is what I did:

1. I opened the command line in Windows and ran the following command:

scrapy shell https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1

2. Then, I ran this command:

sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()

In this case, an empty list is returned: []. The same happens with this XPath selection:

sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()

Did you obtain a result by following the same steps? Thank you for your help.

Regards,
K.

2015-03-17 11:34 GMT+01:00 Morad Edwar <[email protected]>:

I used scrapy shell and your XPath worked fine! And when I changed li[1] to li, it scraped all the job titles.

On Monday, March 16, 2015 at 6:19:01 PM UTC+2, DataScience wrote:

Actually, I've checked *response.body* and it doesn't match the content that I see on the web page. I am really confused; what can I do in this case?

On Monday, March 16, 2015 at 17:15:14 UTC+1, Travis Leleu wrote:

It doesn't look to me like it's writing the HTML to the DOM with JS, as you noted. The big concern I have is that you are assuming the HTML content in your browser is the same as in your code. How have you asserted this?

On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]> wrote:

Thank you, Travis, for your quick feedback. I am testing Scrapy on this specific web page, trying to get the job offers (not profiles). I read in some forums that it may be due to the website using JavaScript to build most of the page, so that the elements I want do not appear in the HTML source of the page. I've checked by disabling JavaScript and reloading the page, but the results were still displayed on the page (I've also checked the network tab in Firebug, filtered on XHR, and looked into the POST requests... and found nothing). Any help would be more than welcome. Thank you.

On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:

LinkedIn can be a tough site to scrape, as they generally don't want their data in other people's hands. You will need to use a user-agent switcher (you don't mention what UA you are sending), and most likely a proxy in addition. If you are looking to scrape the entirety of LinkedIn, it's > 30 million profiles. I've found it more economical to purchase a LinkedIn data dump from scrapinghub.com than to scrape it myself.

On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]> wrote:

Hi Scrapy Guys,

Scrapy returns an empty list when I use the shell to pick a simple "title" field from this web page: http://goo.gl/dBR8P4
I've used:

- sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
- sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
- ...

I verified the issue of the POST with XHR using Firebug, and I think there is no relationship with information generated by JS code (what do you think?).

Can you please help me figure out this problem? Thank you in advance.
Best Regards,
K.

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
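Morad's point above, that the first XPath is correct and that dropping the [1] index from li[1] returns every job title, can be reproduced offline. This is a minimal sketch using the stdlib's ElementTree (not Scrapy's selector) against a hypothetical snippet shaped like the markup the XPath expects; the job titles are invented. It shows the selector logic itself is sound, so an empty list points at the response body, not the XPath:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the structure the XPath expects;
# the real LinkedIn markup may differ or be built by JavaScript.
snippet = """
<body>
  <div id="results-rail">
    <ul class="jobs">
      <li><div class="content"><span><a class="title">Data Engineer</a></span></div></li>
      <li><div class="content"><span><a class="title">Web Developer</a></span></div></li>
    </ul>
  </div>
</body>
"""
root = ET.fromstring(snippet)

path = (".//div[@id='results-rail']/ul[@class='jobs']/li{}"
        "/div[@class='content']/span/a[@class='title']")
first_only = [a.text for a in root.findall(path.format("[1]"))]
all_jobs = [a.text for a in root.findall(path.format(""))]

print(first_only)  # ['Data Engineer']
print(all_jobs)    # ['Data Engineer', 'Web Developer']
```

If the same path returns [] in the shell, a quick sanity check is whether the container is served at all, e.g. "results-rail" in response.body; when it isn't, the markup is built client-side and no XPath on the raw response will match.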

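As for Travis's user-agent advice: Scrapy announces its own default UA unless you override it (the USER_AGENT setting, or a headers= dict on a Request), and a site may serve a non-browser agent different or empty markup, which would explain a response.body that does not match the browser. A stdlib sketch of attaching a browser-like UA to a request (the UA string is illustrative, and no request is actually sent):

```python
from urllib.request import Request

URL = "https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1"
# Illustrative desktop-browser UA string; any current browser UA works.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:36.0) Gecko/20100101 Firefox/36.0"

# The header travels with the request instead of urllib's default agent.
req = Request(URL, headers={"User-Agent": BROWSER_UA})
print(req.get_header("User-agent"))
```

In Scrapy the equivalent is scrapy shell -s USER_AGENT="..." <url>, or setting USER_AGENT in the project settings; if the body still differs from the browser, a proxy (as Travis suggests) is the next thing to try.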