Hello!


Using Scrapy, I have been trying to scrape footwear data from a website. I 
want to scrape only the sizes that are available.

However, the spider scrapes every size listed for a product, including the 
unavailable ones.

Could someone help me with this? The details of my problem are below.


For example: 

These are the sizes displayed on the website:


<https://lh3.googleusercontent.com/-4yOJMcwWLgE/V3Zjj8tbFhI/AAAAAAAAADA/QXVzEo9pcq0-brDF6sS1iZDhtqUcJd8mACLcB/s1600/sizes.png>
 







The unavailable sizes cannot be clicked and are shown with a strikethrough.

 

After scraping, I get values of all the sizes.


<https://lh3.googleusercontent.com/-_Z8fwXWbT30/V3Zj5GE-5OI/AAAAAAAAADE/q3o6AokEJ5AOIChH2cuOR7pv5brQoE9_gCLcB/s1600/size1.png>




 

Instead, I want ONLY the sizes that are *available* (i.e. sizes 7, 9 and 10).


The result should look like this:

<https://lh3.googleusercontent.com/-556L7yD8Fvw/V3ZkNFU9GzI/AAAAAAAAADI/s2bKMHoYiNs4-EymfaGLnXgVVstafVq4wCLcB/s1600/size2.png>
 




In the Elements tab of the browser's developer tools, the unavailable sizes 
have *"disabled"* in the li class value, and their a element has 
*data-quantity="0"*. Can this be used to solve the problem?

 

   
 <li class="first popover-options disabled"><a href="#" 
style="border-color:rgb(247,247,247)" data-trigger="hover" 
class="btn-popover swatch-item" data-placement="top" data-price="3295" 
data-special-price="2142" data-simple-sku="SOME_VALUE_1" data-discount="" 
data-quantity="0" data-low-inventory="0" data-original-title="" title="" 
data-content="<span class=&quot;popover-close hidden-xs&quot;></span><p>Euro 
Size 42</p>"><span>8</span></a><div class="content"><span class="popover-close 
hidden-xs"></span><p>Euro Size 42</p></div></li>

 

Note: the available sizes do not have "disabled" in their class value.

<li class="first popover-options "><a 
data-gaq-event="PDP~$~Size~$~BU024SH53NBYINDFAS-4705979|JWG0623821d28edd03d4d463319b7da981d3afae34117998753178d3952dc051bb06|7"
 
href="#" style="border-color:rgb(247,247,247)" data-trigger="hover" 
class="btn-popover swatch-item " data-placement="top" data-price="3295" 
data-special-price="2142" data-simple-sku="SOME_VALUE_2" data-discount="35" 
data-quantity="1" data-low-inventory="1" data-original-title="" title="" 
data-content="<span class=&quot;popover-close hidden-xs&quot;></span><p>Euro 
Size 41</p>"><span>7</span></a><div class="content"><span class="popover-close 
hidden-xs"></span><p>Euro Size 41</p></div></li>


 

Also, the XPaths of the unavailable and available sizes differ only in 
their index numbers.


*Available product size*

//*[@id="size-block"]/div[1]/ul/li[2]

*Unavailable product size*

//*[@id="size-block"]/div[1]/ul/li[3]


This is a sample of the code I have used:
 item['SizeA'] = sel.xpath(
'//*[@id="size-block"]/div[1]/ul/li[1]/a/span/text()').extract()

 item['SizeB'] = sel.xpath(
'//*[@id="size-block"]/div[1]/ul/li[2]/a/span/text()').extract()

 item['SizeC'] = sel.xpath(
'//*[@id="size-block"]/div[1]/ul/li[3]/a/span/text()').extract()
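
One way the "disabled" class and data-quantity attribute might be used is to 
filter on them instead of addressing each li by index. The sketch below is 
only an illustration under the assumption that the markup looks like the two 
li snippets above. The XPath string is what could be tried inside the spider 
(untested against the real site), and the item key 'Sizes' is hypothetical. 
The small parser underneath applies the same filter using only the standard 
library, on a trimmed copy of the markup, so the logic can be checked without 
running a crawl.

```python
from html.parser import HTMLParser

# XPath that could be tried in the spider (assumes the markup shown above):
# keep only <li> elements without "disabled" in their class, and only <a>
# elements whose data-quantity is not "0".
AVAILABLE_SIZES_XPATH = (
    '//*[@id="size-block"]//li[not(contains(@class, "disabled"))]'
    '/a[not(@data-quantity="0")]/span/text()'
)
# Hypothetical usage inside the spider:
#     item['Sizes'] = sel.xpath(AVAILABLE_SIZES_XPATH).extract()

# Trimmed copy of the two <li> snippets from this post (most attributes removed).
SAMPLE_HTML = """
<div id="size-block"><div><ul>
<li class="first popover-options "><a href="#" data-quantity="1"><span>7</span></a>
<div class="content"><span class="popover-close hidden-xs"></span><p>Euro Size 41</p></div></li>
<li class="first popover-options disabled"><a href="#" data-quantity="0"><span>8</span></a>
<div class="content"><span class="popover-close hidden-xs"></span><p>Euro Size 42</p></div></li>
</ul></div></div>
"""


class AvailableSizeParser(HTMLParser):
    """Collect the size text from li > a > span for sizes that are in stock."""

    def __init__(self):
        super().__init__()
        self.sizes = []
        self._li_available = False  # current <li> has no "disabled" class
        self._a_available = False   # current <a> passed both checks
        self._in_a = False
        self._in_size_span = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li":
            classes = (attributes.get("class") or "").split()
            self._li_available = "disabled" not in classes
        elif tag == "a":
            self._in_a = True
            self._a_available = (
                self._li_available and attributes.get("data-quantity") != "0"
            )
        elif tag == "span" and self._in_a and self._a_available:
            self._in_size_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_size_span = False
        elif tag == "a":
            self._in_a = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_size_span and text:
            self.sizes.append(text)


def available_sizes(html):
    """Return the list of size labels that are not marked as unavailable."""
    parser = AvailableSizeParser()
    parser.feed(html)
    parser.close()
    return parser.sizes


print(available_sizes(SAMPLE_HTML))
```

With a filter like this, a single expression would return all available 
sizes as one list, so separate SizeA, SizeB and SizeC fields addressed by 
li index would no longer be needed.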


 

PS. I am new to web scraping. 


Thanks in advance,

Mrun
