Do you already have that rule configured? Is it one of the first simple
expressions you have? How many records are you processing each time, and is
that roughly the same for all segments? And are you running on a Hadoop
cluster, in pseudo-distributed mode, or locally?
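
To illustrate why the position of the rule matters, here is a minimal sketch in Python (not Nutch code) of a first-match-wins filter using the -^.{350,}$ rule from the quoted message. The catch-all "+." rule, the function names, and the "no match means reject" fallback are assumptions made for illustration; check them against your own regex-urlfilter.txt and Nutch version.

```python
import re

# Hypothetical stand-in for regex-urlfilter.txt: a leading '-' rejects,
# a leading '+' accepts, and the first rule whose pattern matches decides.
RULES = [
    "-^.{350,}$",  # the length rule quoted in this thread
    "+.",          # assumed catch-all accept rule
]


def compile_rules(lines):
    """Turn rule lines into (allow, compiled_pattern) pairs, keeping order."""
    compiled = []
    for line in lines:
        sign, pattern = line[0], line[1:]
        compiled.append((sign == "+", re.compile(pattern)))
    return compiled


def accepts(url, rules):
    """Return the verdict of the first matching rule (first match wins)."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # assumption: a URL that matches no rule is rejected


rules = compile_rules(RULES)
short_url = "http://example.com/" + "a" * 100  # 119 characters
long_url = "http://example.com/" + "a" * 400   # 419 characters

print(accepts(short_url, rules))
print(accepts(long_url, rules))

# With the catch-all listed first, the length rule is never reached.
reordered = compile_rules(["+.", "-^.{350,}$"])
print(accepts(long_url, reordered))
```

In this sketch the long URL is rejected only when the length rule comes before any broader accept rule, which is why it is worth checking where the rule sits in the file.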

 
 
-----Original message-----
> From:sidbatra <[email protected]>
> Sent: Mon 02-Jul-2012 22:44
> To: [email protected]
> Subject: RE: ParseSegment taking a long time to finish
> 
> I'll run more experiments on that segment. My regex-urlfilter.txt removes
> URLs longer than 350 chars.
> 
> -^.{350,}$
> 
> Any recommendations for a max URL length? Or any other hypotheses that I
> can test to confirm the problem?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/ParseSegment-taking-a-long-time-to-finish-tp3758053p3992601.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
