Hi, I'm trying to copy the Nutch database.
It seems to be enough to list all pages by MD5 and fetch all links of those pages. I open a reader on the db directory, create a new db directory, and open a writer for it. When I copy the whole database in one pass, there isn't enough disk space to merge the temp file for big databases, although it works for small ones.

So I tried to do it in pieces: close the writer after a certain number of pages and reopen it again. The pages come through fine, but now there aren't enough links in the new db. The more pages and links I process in one round, the more links end up in the new db.

Can somebody help me with this? Is there a possibility to avoid it?

Regards,
Jakob
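
P.S. For reference, here is a rough sketch of the one-pass copy loop described above. The class and method names (IWebDBReader.pages(), getLinks(MD5Hash), IWebDBWriter.addPage()/addLink(), WebDBWriter.createWebDB()) are assumed from the 0.x WebDB API and may not match your Nutch version exactly; in older releases the package is net.nutch.db instead of org.apache.nutch.db.

import java.io.File;
import java.util.Enumeration;

import org.apache.nutch.db.IWebDBReader;
import org.apache.nutch.db.IWebDBWriter;
import org.apache.nutch.db.Link;
import org.apache.nutch.db.Page;
import org.apache.nutch.db.WebDBReader;
import org.apache.nutch.db.WebDBWriter;
import org.apache.nutch.fs.NutchFileSystem;

public class CopyWebDB {

  // Copies every page and its outlinks from oldDb to newDb in a single pass.
  // NOTE: the WebDB API calls used here are assumptions based on the 0.x
  // codebase and may need adjusting for your version.
  public static void copy(NutchFileSystem fs, File oldDb, File newDb) throws Exception {
    IWebDBReader reader = new WebDBReader(fs, oldDb);
    WebDBWriter.createWebDB(fs, newDb);            // create the empty target db
    IWebDBWriter writer = new WebDBWriter(fs, newDb);
    try {
      // walk all pages, keyed by content MD5
      for (Enumeration e = reader.pages(); e.hasMoreElements(); ) {
        Page page = (Page) e.nextElement();
        writer.addPage(page);
        // fetch the links of this page via its MD5 and copy them as well
        Link[] links = reader.getLinks(page.getMD5());
        for (int i = 0; i < links.length; i++) {
          writer.addLink(links[i]);
        }
      }
    } finally {
      writer.close();   // this close triggers the big temp-file merge
      reader.close();
    }
  }
}

The piecewise variant simply closes the writer and reopens it every N pages inside that loop; the pages survive, but links added in later chunks seem to get lost.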
