Daniel Russo wrote:
> I'm trying to use the SegmentWriter class to append data to an
> existing segment, but I can't seem to construct an instance of it with
> an existing segment directory.  I tried setting the "force" argument

Segments are not meant to be appended to. Once created and closed, they are immutable. You can, however, create a new segment by copying the data from the old segments and appending new data to it before it is closed.
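To illustrate the copy-and-append approach, here is a minimal Java sketch. SketchSegment is a hypothetical in-memory stand-in, NOT the Nutch SegmentWriter or MapFile API; it only models the rule that a segment accepts writes until it is closed, after which the only way to "add" data is to copy it into a fresh, still-open segment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a segment: writable until close(), immutable afterwards. */
final class SketchSegment {
    private final List<String> records = new ArrayList<>();
    private boolean closed = false;

    /** Appends a record; fails once the segment has been closed. */
    void append(String record) {
        if (closed) {
            throw new IllegalStateException("segment is closed and immutable");
        }
        records.add(record);
    }

    /** Closes the segment; no further appends are accepted. */
    void close() {
        closed = true;
    }

    /** Read-only view of the records. */
    List<String> records() {
        return Collections.unmodifiableList(records);
    }

    /** "Appending" to a closed segment: copy its records into a new, open segment. */
    static SketchSegment copyOf(SketchSegment old) {
        SketchSegment fresh = new SketchSegment();
        for (String r : old.records()) {
            fresh.append(r);
        }
        return fresh; // caller appends new data, then closes it
    }
}
```

Usage: build the old segment, close it, then `SketchSegment next = SketchSegment.copyOf(old); next.append("new record"); next.close();`. The old segment is left untouched.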


> to true, but the constructor still bombs out when it hits the
> MapFile.Writer constructors for writing to the data files in the
> fetcher/, content/, etc. directories.  I checked the source code for
> MapFile.Writer, where I found the following code:
>
>       File dir = new File(dirName);
>       if (nfs.exists(dir)) {
>           throw new IOException("already exists: " + dir);
>       }
>       nfs.mkdirs(dir);
>
> Thus, MapFile.Writer can NEVER write to an existing directory.  The
> SequenceFile.Writer instances created in the MapFile.Writer
> constructor throw the same exception in a couple more places.  Is
The "force" flag in the SegmentWriter constructors means that, if it is true, any previously existing segment data is DELETED first. Apparently this does not happen, so the current behaviour needs to be fixed. However, the flag was never meant to let you append to an existing segment.
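The intended "force" semantics can be sketched in plain Java with java.io.File. SegmentDirs and prepare() are hypothetical names, not part of Nutch; the sketch assumes a local filesystem rather than Nutch's NutchFileSystem, and only shows the order of operations: without force an existing directory is an error (as in the MapFile.Writer check quoted above), with force the old data is deleted before the directory is recreated.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper illustrating the intended meaning of the "force" flag. */
final class SegmentDirs {
    private SegmentDirs() {}

    /** Prepares an empty segment directory; deletes old data first only if force is true. */
    static void prepare(File dir, boolean force) throws IOException {
        if (dir.exists()) {
            if (!force) {
                // Same outcome as the MapFile.Writer check: refuse to reuse a directory.
                throw new IOException("already exists: " + dir);
            }
            deleteRecursively(dir); // "force": previously existing data is DELETED
        }
        if (!dir.mkdirs()) {
            throw new IOException("cannot create: " + dir);
        }
    }

    /** Deletes a file or directory tree, failing loudly if anything cannot be removed. */
    private static void deleteRecursively(File f) throws IOException {
        File[] children = f.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        if (!f.delete()) {
            throw new IOException("cannot delete: " + f);
        }
    }
}
```

Since MapFile.Writer performs its own existence check, a deletion step like this would presumably have to run before any of the writers are constructed.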


You may want to look at the SegmentSlicer tool for rearranging segment data.

> there any way to work around this without rewriting all these writer
> classes?  If not, then the "force" option is effectively useless, and
> a segment can never be modified after it is created.

As I said, segments are immutable once created, so SegmentWriter cannot be "fixed" to append to them. However, the SegmentWriter behaviour you described, which concerns the original meaning of the "force" flag, should be fixed anyway...


PS. Having said that, this is just a computer program, so if you hack your way around it, there is always a way to append new records to the segment data... ;-) But the current API doesn't allow it, because there is no use case for it.

--
Best regards,
Andrzej Bialecki
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers