Hi guys. First and foremost, I am a total newbie to Scrapy and web crawling 
in general. I have a crawler I am working on that extracts static and 
dynamic data from YouTube videos (such as the uploader's name, date the 
video was published, number of views, likes and dislikes, etc.). The 
crawler does that perfectly, but I am kind of stuck on how to make it crawl 
continuously without breaking. As you might know from the YouTube layout, 
when viewing a video there is a list of "related" videos on the right side 
of the main window. 

<https://lh3.googleusercontent.com/-0SkjBy57tEI/WLcHxTPoXcI/AAAAAAAADkc/ZuTP1eOZmoEP8FOInTVypk9dI0lE4GcpQCLcB/s1600/youtube_data4.png>



From the image above, the highlighted line is the container I am using, 
with *id = "watch-related"*, to capture the links of these videos so as to 
extract the same data from them. From the results I am getting, the crawler 
is evidently not picking up all the links on the current seeded URL, and it 
finishes crawling after a while. The only way so far that I have tried and 
succeeded in getting it to crawl recursively is the *dont_filter=True* 
option, which after a while starts crawling the same pages indefinitely, 
and that is not what I need the crawler to do. Below is my very simple 
crawler. Again, I am not good at this, so my apologies for my poor coding 
skills. If someone could show me a simple way to get the crawler to scrape 
recursively while skipping already-extracted URLs, I'll be forever 
grateful. Thank you in advance.

<https://lh3.googleusercontent.com/-Lk1s6XEIb_Q/WLcMBAsjraI/AAAAAAAADkk/81SFWG4wEnwk-EdFhK8PUW4gLeUcVw0XACLcB/s1600/scrapy_1.JPG>


<https://lh3.googleusercontent.com/-Af-rJdiG15g/WLcMOMUYiDI/AAAAAAAADks/ZoU967U35MkJuMQNvm8bLOiuNbJZNIEQgCLcB/s1600/scrapy_2.JPG>



-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to scrapy-users+unsubscr...@googlegroups.com.
To post to this group, send email to scrapy-users@googlegroups.com.
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.