Thank you, Ye, for updating us with your findings. It is best to use the latest version of Nutch, since each release brings updates and fixes.
On Sun, Mar 17, 2013 at 3:48 AM, ytthet <[email protected]> wrote:

> Hi Folks,
>
> I found out where the issue was. I thought it might be useful for
> others.
>
> The parse performance issue I was facing was caused by the regex URL
> filter (the urlfilter-regex plugin) combined with some funny URLs. One of
> the regular expressions was taking a long... very long time to process
> certain funny URLs.
>
> Removing the rule "-.*(/[^/]+)/[^/]+\1/[^/]+\1/" from regex-urlfilter.txt
> in the conf directory saved tons of parsing time.
>
> The following thread and JIRA issue discuss a similar problem:
> http://lucene.472066.n3.nabble.com/Reduce-Error-during-fetch-td609736.html
> https://issues.apache.org/jira/browse/NUTCH-233
>
> Cheers,
>
> Ye

--
Kiran Chitturi
<http://www.linkedin.com/in/kiranchitturi>
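The removed rule rejects URLs in which the same path segment keeps reappearing at every other position (for example `/a/b/a/c/a/`), a typical sign of a link loop. The Python sketch below is illustrative only: Nutch evaluates these rules with `java.util.regex`, and Python's `re` is used here simply because it is also a backtracking engine that supports backreferences. `has_repeating_segment` is a hypothetical helper, not part of Nutch, showing one way the same condition could be checked in a single pass over the URL's segments, assuming you wanted to keep some loop protection without the expensive regex.

```python
import re

# The rule removed from regex-urlfilter.txt, without its leading "-"
# (in that file, "-" marks a pattern whose matching URLs are rejected).
LOOP_RULE = re.compile(r".*(/[^/]+)/[^/]+\1/[^/]+\1/")


def rule_matches(url: str) -> bool:
    """Regex version: may backtrack heavily on long, non-matching URLs."""
    return LOOP_RULE.search(url) is not None


def has_repeating_segment(url: str) -> bool:
    """Hypothetical single-pass equivalent of LOOP_RULE.

    Returns True if the URL contains /X/Y/X/Z/X/ for non-empty segments
    X, Y, Z, i.e. one segment recurring at every other position and
    followed by a '/'.
    """
    parts = url.split("/")
    # parts[0] is not preceded by '/', so candidates for X start at index 1.
    # Requiring parts[i + 5] to exist guarantees a '/' after the third X.
    for i in range(1, len(parts) - 5):
        x = parts[i]
        if (
            x
            and parts[i + 1]
            and parts[i + 3]
            and parts[i + 2] == x
            and parts[i + 4] == x
        ):
            return True
    return False


if __name__ == "__main__":
    samples = [
        "http://example.com/a/b/a/c/a/",
        "http://example.com/a/b/a/c/a",
        "http://example.com/docs/2013/03/index.html",
    ]
    for url in samples:
        # Both checks should agree on every URL.
        print(url, rule_matches(url), has_repeating_segment(url))
```

Because the helper walks the segment list once, its cost stays proportional to the number of segments, whereas a backtracking engine may retry the `.*` prefix and the backreferences at many positions before giving up on a long URL that does not match.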

