Yes, there are a large number of URLs that contain 'hello', and they are all different. I just need to fetch the first URL that contains 'hello' and ignore all the rest.
Currently I fetch the first URL that contains 'hello' and set a boolean flag to ignore the other URLs containing 'hello', but the problem is that all the ignored URLs are still parsed as well. I need a way to update the rules (deny_urls) as soon as I get the required URL containing 'hello'.

Thanks,
Sunny

On Wednesday, June 4, 2014 5:48:13 PM UTC+1, Lhassan Baazzi wrote:
>
> Hi
>
> Are there many URLs that contain "hello"? If not, Scrapy already filters
> duplicate requests, i.e. duplicate URLs.
>
> Cheers.
> On 4 June 2014 14:45, "sunny arora" <sunnya...@gmail.com> wrote:
>
>> Hi All,
>>
>> Is it possible in Scrapy to crawl a URL which contains 'hello' only once,
>> then update the rules dynamically to exclude it and continue scraping the
>> rest of the URLs and following them?
>>
>> Any suggestions/help is appreciated.
>>
>> Thanks,
>> Sunny
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to scrapy-users...@googlegroups.com.
>> To post to this group, send email to scrapy...@googlegroups.com.
>> Visit this group at http://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>>
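
One possible way to get this behaviour without rewriting the rules at runtime is to filter the extracted links before they ever become requests. The sketch below is only an illustration under some assumptions: `FirstMatchFilter`, `only_first_hello` and the example.com URLs are made-up names, and the `Link` namedtuple is a stand-in for Scrapy's link objects (which expose a `.url` attribute). The filter keeps a flag and drops every matching link after the first one, so the later 'hello' URLs should never be scheduled or parsed.

```python
from collections import namedtuple


class FirstMatchFilter:
    """Let through only the first link whose URL contains `needle` (hypothetical helper)."""

    def __init__(self, needle):
        self.needle = needle
        self.seen = False  # becomes True once one matching link has been let through

    def __call__(self, links):
        kept = []
        for link in links:
            if self.needle in link.url:
                if self.seen:
                    continue  # a matching link was already kept: drop this one
                self.seen = True
            kept.append(link)  # non-matching links always pass
        return kept


# Stand-in for Scrapy's link objects (assumption: only .url is needed here)
Link = namedtuple("Link", "url")

only_first_hello = FirstMatchFilter("hello")

# Links as they might be extracted from two different pages
page1 = [
    Link("http://example.com/a"),
    Link("http://example.com/hello/1"),
    Link("http://example.com/hello/2"),
]
page2 = [Link("http://example.com/hello/3"), Link("http://example.com/b")]

first = [link.url for link in only_first_hello(page1)]
second = [link.url for link in only_first_hello(page2)]
print(first)   # ['http://example.com/a', 'http://example.com/hello/1']
print(second)  # ['http://example.com/b']
```

In a CrawlSpider this callable could probably be passed as the `process_links` argument of a `Rule`, e.g. `Rule(<your link extractor>, follow=True, process_links=only_first_hello)`, since `process_links` receives the list of links extracted from each response and returns the ones to follow. Note that the flag lives in memory only, so it would not survive a restart of the crawl.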