I am running into some problems.
I have 8 segments, each with approximately 250K URLs (~2 million in
total). I am trying to merge them into one.
But the merge takes forever: it had been running for about 3 days before I
stopped it. It had also used 904 GB in the /tmp directory.
The machine it is running on has dual quad-core 2.8 GHz Intel CPUs
and 24 GB of RAM. CPU utilization stays at about 20%.
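To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes exactly 250K URLs per segment and decimal gigabytes, and the function and variable names are made up for illustration.

```python
# Rough figures taken from the numbers quoted in this message.
SEGMENTS = 8
URLS_PER_SEGMENT = 250_000          # approximate count per segment
TMP_BYTES = 904 * 10**9             # 904 GB in /tmp, assuming decimal GB
RUNTIME_SECONDS = 3 * 24 * 3600     # about 3 days before the job was stopped


def merge_stats(segments, urls_per_segment, tmp_bytes, runtime_seconds):
    """Return total URLs, temp bytes per URL, and an upper bound on URLs/second."""
    total_urls = segments * urls_per_segment
    return {
        "total_urls": total_urls,
        "tmp_bytes_per_url": tmp_bytes / total_urls,
        # The merge never finished, so the real rate is lower than this.
        "max_urls_per_second": total_urls / runtime_seconds,
    }


stats = merge_stats(SEGMENTS, URLS_PER_SEGMENT, TMP_BYTES, RUNTIME_SECONDS)
print(f"total URLs:         {stats['total_urls']:,}")
print(f"tmp space per URL:  {stats['tmp_bytes_per_url'] / 1000:.0f} KB")
print(f"URLs/second (max):  {stats['max_urls_per_second']:.1f}")
```

That works out to roughly 452 KB of temporary space per URL and fewer than 8 URLs per second, which seems far out of proportion for a merge of this size.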
Any ideas? I went through the Nutch configs and didn't see anything
that looked like it would give this task more memory, workers, etc.
Any help would be greatly appreciated.
Thank you,
-John
John Martyniak
President/CEO
Before Dawn Solutions, Inc.
9457 S. University Blvd #266
Highlands Ranch, CO 80126
o: 877-499-1562
c: 303-522-1756
e: j...@beforedawnsoutions.com
w: http://www.beforedawnsolutions.com