Hi folks, I found out where the issue was and thought it might be useful for others.
The performance issue I was facing in parse was caused by the regular expression URL filter (the "regex-urlfilter" plugin) combined with some odd URLs. One of the regular expressions was taking a very long time to evaluate against those URLs.

Removing the rule "-.*(/[^/]+)/[^/]+\1/[^/]+\1/" from regex-urlfilter.txt in the conf directory saved a lot of time in parsing.

The following thread and JIRA issue discuss a similar problem:
http://lucene.472066.n3.nabble.com/Reduce-Error-during-fetch-td609736.html
https://issues.apache.org/jira/browse/NUTCH-233

Cheers,
Ye
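
P.S. In case anyone wants to check whether this rule hurts on their own URLs, here is a rough sketch for timing it outside of Nutch. The class name, the helper methods and the example.com URLs are all made up for illustration, and I am assuming the filter applies the rule with find() rather than matches(). The leading "-" is dropped because it only marks the rule as an exclusion in regex-urlfilter.txt and is not part of the regex itself.

```java
import java.util.regex.Pattern;

public class RegexFilterTiming {
    // The rule from regex-urlfilter.txt without the leading '-'.
    // It is meant to catch a path segment repeated three times, e.g. /a/b/a/c/a/
    static final Pattern LOOP_RULE =
            Pattern.compile(".*(/[^/]+)/[^/]+\\1/[^/]+\\1/");

    // Assumption: the filter searches the URL with find().
    static boolean matches(String url) {
        return LOOP_RULE.matcher(url).find();
    }

    // Wall-clock time in milliseconds for one evaluation of the rule.
    static long timeMillis(String url) {
        long start = System.nanoTime();
        matches(url);
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Hypothetical "funny" URL: many distinct path segments, so the rule
    // never matches and the regex engine has to exhaust its backtracking.
    static String longUrl(int segments) {
        StringBuilder sb = new StringBuilder("http://example.com");
        for (int i = 0; i < segments; i++) {
            sb.append("/seg").append(i);
        }
        return sb.append('/').toString();
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 200, 400, 800}) {
            String url = longUrl(n);
            System.out.println(n + " segments, " + url.length()
                    + " chars: " + timeMillis(url) + " ms");
        }
    }
}
```

If the time grows much faster than the URL length on your data, the rule is probably what is slowing down your parse step too. Swapping longUrl() for a few real URLs from your crawl should give a more honest picture.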

