Hi, I am running Nutch 1.0. I can run a few rounds of generating, fetching, and updating the db, producing a few segments totaling ~6 GB. However, during segment merging I see Hadoop creating roughly 10x more data (~60 GB) than the size of the segments being merged, and it just keeps growing. The merge takes longer than the fetch rounds and seems endless; I was not patient enough to let it finish.
Is it normal for segment merging to take this much disk space and time? Will the merge ever actually finish? It seems unreasonable, and it is certainly not feasible: it would mean Nutch users need to keep at least 90% of their disk free just to do segment merging. I would appreciate it if someone could point me to a solution. Thanks!
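For reference, here is roughly the sequence of commands I am running; the crawl/ paths and segment names are placeholders for my actual directories, not anything special:

```shell
# One generate/fetch/updatedb round (repeated a few times):
bin/nutch generate crawl/crawldb crawl/segments
bin/nutch fetch crawl/segments/20090101000000        # the segment just generated
bin/nutch updatedb crawl/crawldb crawl/segments/20090101000000

# Then the merge that blows up disk usage (~6 GB of segments -> ~60 GB and growing):
bin/nutch mergesegs crawl/merged_segments -dir crawl/segments
```

If it matters, the intermediate data appears under Hadoop's local temp directories during the merge job, not in the output directory.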

