Re: fetch fails at reduce stage because can not sense heartbeat for 600 seconds

Andrzej Bialecki Thu, 19 Oct 2006 13:12:31 -0700

Mike Smith wrote:

Then I tried a local crawl using these URLs and added some logging at
RegexURLFilter.java:86. I found that the regex (-.*(/.+?)/.*?\1/.*?\1/)
takes more than 10 minutes. The problem is that the JavaScript parser parses
some bogus links like this:
http://www.discountedboots.com/<SELECT%20%20NAME%3D%22EDIT_BROWSE%22
………



These links are very long and contain many / characters. They are
created from scripts like this:

Yes, the JS parser needs to be fixed - it's been on my TODO list for a long time now, but my TODO list is very long nowadays ... if someone else wants to give it a try, I won't object ;)
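
Until the parser is fixed, one possible stopgap is to bound the cost of the regex itself. The following is only a minimal sketch, not the actual Nutch code: the class BoundedRegexFilter, its helper types, the length cap, and the policy of rejecting a URL on timeout are all hypothetical. It assumes the rule is applied with find() and that the leading '-' only marks the rule as an exclusion, so it is dropped from the compiled pattern. The time limit relies on java.util.regex reading its input through CharSequence.charAt().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundedRegexFilter {

    // The rule quoted above, minus the leading '-' (assumed to be the
    // exclude marker of the filter file rather than part of the regex).
    static final Pattern REPEATED_SEGMENT =
        Pattern.compile(".*(/.+?)/.*?\\1/.*?\\1/");

    /** Hypothetical exception used to abort a match that runs too long. */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException() { super("regex match exceeded time budget"); }
    }

    /** Wraps the input so every character read checks the deadline. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override public char charAt(int index) {
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException();
            }
            return inner.charAt(index);
        }

        @Override public int length() { return inner.length(); }

        @Override public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override public String toString() { return inner.toString(); }
    }

    /**
     * Returns true if the URL should be dropped: it is longer than
     * maxLength, it matches the repeated-segment rule, or matching did
     * not finish within timeoutMillis (an assumed, conservative policy).
     */
    static boolean isRejected(String url, int maxLength, long timeoutMillis) {
        if (url.length() > maxLength) {
            return true; // cheap guard, the regex never runs on huge URLs
        }
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        Matcher m = REPEATED_SEGMENT.matcher(new DeadlineCharSequence(url, deadline));
        try {
            return m.find();
        } catch (RegexTimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected("http://example.com/a/b/c", 2048, 1000));
        System.out.println(isRejected("http://example.com/a/x/a/y/a/", 2048, 1000));
    }
}
```

The length check alone would probably have avoided the 10-minute match on links like the one above; the deadline is a second line of defence for shorter URLs that still trigger heavy backtracking. Neither replaces fixing the JS parser so that it stops emitting these links.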


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
