Hi,

Could you also try the parsechecker tool on that last URL? It's possible that the file has a problem, or it may simply be a bug.
Remi

On Sunday, February 19, 2012, Magnús Skúlason <[email protected]> wrote:
> Hi,
>
> According to my logs, a really long time (over 2 hours) elapses between
> parsing the last page in a segment and the moment ParseSegment finishes, as
> can be seen here:
>
> 2012-02-19 00:51:43,471 INFO parse.ParseSegment - Parsing: http:// ....
> 2012-02-19 03:15:18,604 INFO parse.ParseSegment - ParseSegment: finished at 2012-02-19 03:15:18, elapsed: 02:57:24
>
> Since the total time of the parse job is just around 3 hours, this
> represents a huge portion of the overall time.
>
> Is it normal that the last step in the job takes such a long time, and
> is there anything I can do to speed it up? I have been running the
> generator with -topN 20000; I wouldn't have expected that to be a big
> enough value to cause a problem. I have now reconfigured my script to
> skip the -topN parameter to see what happens.
>
> best regards,
> Magnus
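To put a number on "a huge portion", here is a minimal Python sketch that works only from the two log lines quoted above. It measures the gap between the last "Parsing:" line and the "finished" line, then compares that gap with the reported elapsed time. The helper names `log_time` and `parse_elapsed` are hypothetical and are not part of Nutch. The sketch also assumes that the leading timestamp uses the `yyyy-MM-dd HH:mm:ss,SSS` layout shown in those lines.

```python
from datetime import datetime, timedelta

# Assumed layout of the leading timestamp in the quoted log lines,
# e.g. "2012-02-19 00:51:43,471" (milliseconds after the comma).
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_time(line: str) -> datetime:
    """Parse the timestamp made of the first two whitespace-separated fields."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_FORMAT)


def parse_elapsed(hms: str) -> timedelta:
    """Convert an 'HH:MM:SS' elapsed string into a timedelta."""
    hours, minutes, seconds = (int(x) for x in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# The two log lines from the quoted message (URL left elided as in the original).
last_parse = log_time(
    "2012-02-19 00:51:43,471 INFO parse.ParseSegment - Parsing: http:// ...."
)
finished = log_time(
    "2012-02-19 03:15:18,604 INFO parse.ParseSegment - ParseSegment: "
    "finished at 2012-02-19 03:15:18, elapsed: 02:57:24"
)
total = parse_elapsed("02:57:24")

tail = finished - last_parse  # time spent after the last "Parsing:" line
share = tail / total          # timedelta / timedelta -> float ratio

print(f"tail after last page: {tail}")
print(f"share of total job time: {share:.1%}")
```

On these numbers, the tail comes to roughly 2 h 23 min, which is about 81% of the 2 h 57 min job. Per-page parsing therefore accounts for only a small fraction of the run. The time is going into whatever the job does after the last page is parsed.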

