Hi,

I am running Nutch 1.0. I can run a few rounds of generating, fetching
and db updating, producing a few segments totaling ~6 GB. However,
during segment merging I see Hadoop creating roughly 10x more data
(~60 GB) than the size of the segments being merged, and it just keeps
growing. The merge takes longer than the fetching rounds and seems
endless - I was not patient enough to let it finish.
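For reference, here is roughly the cycle I am running (the crawl/
directory names are just examples from my setup):

```shell
# One generate/fetch/updatedb round (crawl/ is an example root dir)
bin/nutch generate crawl/crawldb crawl/segments

# Pick the segment that generate just created (newest directory)
SEGMENT=`ls -d crawl/segments/* | tail -1`
bin/nutch fetch $SEGMENT
bin/nutch updatedb crawl/crawldb $SEGMENT

# After a few rounds, merge all segments into a single one
bin/nutch mergesegs crawl/merged_segments -dir crawl/segments
```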

Is it normal for segment merging to take this much disk space and
time? Will the merging process actually ever finish? It seems
unreasonable, and it is definitely not feasible: it would mean Nutch
users need to keep at least 90% of their disk space free just to do
segment merging.

I would appreciate it if someone could point me to a solution.

Thanks!