> I'm trying to use the SegmentWriter class to append data to an existing segment, but I can't seem to construct an instance of it with an existing segment directory. I tried setting the "force" argument to true, but the constructor still bombs out when it hits the MapFile.Writer constructors for writing to the data files in the fetcher/, content/, etc. directories.
Segments are not meant to be appended to. Once created and closed, they are immutable. You can, however, create new segments by copying the data from the old segments and appending new data while the new segment is not yet closed.
> I checked the source code for MapFile.Writer, where I found the following code:
>     File dir = new File(dirName);
>     if (nfs.exists(dir)) {
>       throw new IOException("already exists: " + dir);
>     }
>     nfs.mkdirs(dir);
> Thus, MapFile.Writer can NEVER write to an existing directory. The SequenceFile.Writer instances created in the MapFile.Writer constructor throw the same exception in a couple more places. Is there any way to work around this without rewriting all these writer classes? If not, then the "force" option is effectively useless, and a segment can never be modified after it is created.
The meaning of the "force" flag in the SegmentWriter constructors is that, if it is true, the previously existing segment data is DELETED first. Apparently this does not happen, so the current behaviour must be fixed. However, the flag was never supposed to mean that you could append to an already existing segment.
You may want to look at the SegmentSlicer tool for rearranging segment data.
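The "force" semantics described above (delete the old segment data first, then write from scratch) can be sketched with plain java.nio on a local directory. This is a hypothetical helper, not the actual SegmentWriter code: the class and method names are made up for illustration, and the real writers go through the file-system abstraction (`nfs`) seen in the MapFile.Writer snippet rather than java.nio.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of the Nutch API) modelling the intended
// "force" semantics for a segment directory on the local file system.
final class SegmentDirs {
    private SegmentDirs() {}

    // force == false: refuse an existing directory, as MapFile.Writer does.
    // force == true:  delete the previous segment data first, then recreate it.
    static Path prepareSegmentDir(Path dir, boolean force) throws IOException {
        if (Files.exists(dir)) {
            if (!force) {
                throw new FileAlreadyExistsException(dir.toString(), null, "already exists");
            }
            deleteRecursively(dir);
        }
        return Files.createDirectories(dir);
    }

    // Deletes a whole directory tree. Sorting paths in reverse order puts
    // every child before its parent, so directories are empty when deleted.
    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Note that even with force == true nothing is appended: the old data is gone and the segment is written anew, which is exactly the distinction drawn above.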
As I said, segments are immutable once created, so it's not possible to fix SegmentWriter to append to them. However, the SegmentWriter behaviour you described, which relates to the original meaning of the "force" flag, should be fixed anyway...
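The copy-then-append workaround from the first reply can be sketched with a toy segment layout: one directory holding one text file of records. The layout, the DATA_FILE name and the copyAndAppend method are all hypothetical; a real Nutch segment has several MapFile directories (fetcher/, content/, etc.), each of which would need the same copy-then-append treatment through the real writer classes.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Toy model (hypothetical, not the Nutch API): a "segment" is a directory
// holding one text file of records that is written once and never modified.
final class SegmentCopy {
    static final String DATA_FILE = "records.txt"; // hypothetical layout

    private SegmentCopy() {}

    // Builds a NEW segment: copies every record from oldSeg, then appends the
    // extra records, all while the new segment is still open for writing.
    // The old segment is only read, so it stays immutable.
    static void copyAndAppend(Path oldSeg, Path newSeg, List<String> extra) throws IOException {
        // Fails if newSeg already exists, mirroring MapFile.Writer's check.
        Files.createDirectory(newSeg);
        try (BufferedReader in = Files.newBufferedReader(oldSeg.resolve(DATA_FILE), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(newSeg.resolve(DATA_FILE),
                     StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW)) {
            String line;
            while ((line = in.readLine()) != null) { // copy the old records
                out.write(line);
                out.newLine();
            }
            for (String record : extra) {            // append the new records
                out.write(record);
                out.newLine();
            }
        } // closing the writer "closes" the new segment; it is read-only from here on
    }
}
```

Once copyAndAppend returns, the caller can switch to the new segment and discard the old one; at no point is a closed segment reopened for writing.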
PS. Having said that, this is just a computer program, so of course if you hack your way around it, there is always a way to append new records to the segment data... ;-) But the current API doesn't allow this, because there is no use case for it.
-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
