On Thursday, March 2, 2017 at 5:38:43 AM UTC+8, JoeJoe wrote:
>
>
> Hi guys. First and foremost, I am a total newbie to Scrapy and web 
> crawling in general. I have a crawler that extracts static and dynamic 
> data from YouTube videos (such as the uploader's name, the date the 
> video was published, the number of views, likes and dislikes, etc.). 
> The crawler does that perfectly, but I am stuck on how to make it crawl 
> continuously without breaking. As you may know from YouTube's layout, 
> when viewing a video there is a list of "related" videos on the right 
> side of the main window. 
>
>
> <https://lh3.googleusercontent.com/-0SkjBy57tEI/WLcHxTPoXcI/AAAAAAAADkc/ZuTP1eOZmoEP8FOInTVypk9dI0lE4GcpQCLcB/s1600/youtube_data4.png>
>
>
>
> From the image above, the highlighted line is the container I am using, 
> with the *id = "watch-related"*, to capture the links to these videos 
> so as to extract the same data from them. From the results I am getting, 
> the crawler is evidently not picking up all the links on the current 
> seed URL, and it finishes crawling after a while. The only way I have 
> found to make it crawl recursively is the *dont_filter=True* option, 
> but with that it starts crawling the same pages indefinitely after a 
> while, which is not what I need. Below is my very simple crawler. 
> Again, I am not good at this, so my apologies for my poor coding 
> skills. If someone could show me a simple way to get the crawler to 
> scrape recursively while skipping the already-extracted URLs, I would 
> be forever grateful. Thank you in advance.
>
>
>
<https://lh3.googleusercontent.com/-DR73pTJS3oA/WLfsWePhm-I/AAAAAAAADk8/q2FkDBcz45wXeDoP75L_oe6ybhZM1wW2wCLcB/s1600/scrapy_3.JPG>
  
  

>
> <https://lh3.googleusercontent.com/-Af-rJdiG15g/WLcMOMUYiDI/AAAAAAAADks/ZoU967U35MkJuMQNvm8bLOiuNbJZNIEQgCLcB/s1600/scrapy_2.JPG>
>
>
>  
>
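Since the spider itself is only available as screenshots above, here is a minimal pure-Python sketch of the deduplication idea the question is asking for: keep a set of already-seen URLs and only follow links that are not in it. This is essentially what Scrapy's built-in duplicate filter already does for you when you yield `Request(url)` *without* `dont_filter=True`, so the usual fix is simply to drop that flag. All names below are illustrative, not taken from the original spider.

```python
# Sketch of "skip already-extracted URLs" logic (plain Python, no Scrapy
# required). Scrapy's default dupefilter performs the same check on every
# scheduled Request when dont_filter is left at its default of False.

def new_links(candidate_urls, seen):
    """Return only the URLs not crawled yet, marking them as seen."""
    fresh = []
    for url in candidate_urls:
        if url not in seen:
            seen.add(url)   # remember it so it is never followed twice
            fresh.append(url)
    return fresh

seen = set()
print(new_links(["/watch?v=a", "/watch?v=b"], seen))  # both are new
print(new_links(["/watch?v=b", "/watch?v=c"], seen))  # only /watch?v=c is new
```

In a Scrapy `parse` callback, the equivalent is to extract the related-video hrefs (e.g. from the `#watch-related` container) and yield a `Request` for each without `dont_filter=True`; the scheduler then silently drops any URL it has already seen, so the crawl keeps expanding instead of looping over the same page.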
