Hi,

When running the Fetcher, it sometimes dies with the following exception:

050517 212854 SEVERE error writing output:java.io.IOException: key out of order: 3420 after 3420
java.io.IOException: key out of order: 3420 after 3420
        at org.apache.nutch.io.MapFile$Writer.checkKey(MapFile.java:128)
        at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:114)
        at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:275)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:240)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:149)



This points to the following section in Fetcher:

    private void outputPage(FetcherOutput fo, Content content,
                            ParseText text, ParseData parseData) {
      try {
        synchronized (fetcherWriter) {
          fetcherWriter.append(fo);        // line 274
          contentWriter.append(content);   // <------ line 275
          if (Fetcher.this.parsing) {
            parseTextWriter.append(text);
            parseDataWriter.append(parseData);
          }
        }

This exception is quite a mystery: the ArrayFile.Writer.append() method is itself synchronized, and it increments the key inside the synchronized section, so two appends should never see the same key (here, 3420 after 3420). Any thoughts?
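
To make that reasoning concrete, here is a minimal sketch of a writer that assigns keys from a counter inside a synchronized append() and rejects any key that is not strictly greater than the previous one. SequentialWriter and SequentialWriterDemo are hypothetical stand-ins written for illustration; they are not the Nutch ArrayFile or MapFile classes, and the check is only modeled on the message in the stack trace.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an array-style writer; NOT the Nutch ArrayFile.Writer.
class SequentialWriter {
    private long count = 0;     // next key to assign
    private long lastKey = -1;  // last key accepted by checkKey

    // Rejects any key that is not strictly greater than the previous one.
    private void checkKey(long key) throws IOException {
        if (key <= lastKey) {
            throw new IOException("key out of order: " + key + " after " + lastKey);
        }
        lastKey = key;
    }

    // Key check and counter increment happen under the same monitor.
    public synchronized void append(Object value) throws IOException {
        checkKey(count);
        count++;
    }

    public synchronized long size() {
        return count;
    }
}

class SequentialWriterDemo {
    // Appends from several threads and returns the final record count.
    // Any IOException raised by a worker is rethrown as IllegalStateException.
    static long runConcurrentAppends(int threads, int perThread) {
        SequentialWriter writer = new SequentialWriter();
        List<Thread> workers = new ArrayList<>();
        List<IOException> failures = Collections.synchronizedList(new ArrayList<>());
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                try {
                    for (int i = 0; i < perThread; i++) {
                        writer.append("record");
                    }
                } catch (IOException e) {
                    failures.add(e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        if (!failures.isEmpty()) {
            throw new IllegalStateException(failures.get(0));
        }
        return writer.size();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrentAppends(4, 1000));
    }
}
```

In this sketch concurrent callers cannot trigger the ordering error, because the check and the increment are never interleaved between threads. That is why the exception above is surprising if the real writer behaves the same way.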

However, I think there is another potential problem here. The critical section quoted above is synchronized only on the fetcherWriter, so I believe other threads can enter this section as soon as the current thread is done executing line 274. That is too soon: the changes to all writers should be made as one atomic operation. So I propose synchronizing the whole method.
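
A minimal sketch of what synchronizing the whole method could look like follows. ParallelOutputs and ParallelOutputsDemo are hypothetical stand-ins that use in-memory lists instead of the Nutch writers, and the sketch assumes the method lives on one object shared by all fetcher threads. Two Java facts are worth keeping in mind when applying this: a synchronized block holds its monitor until the block exits, and a synchronized method on a per-thread object locks only that thread's own instance, so the lock has to be the same object for every thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the four parallel writers; NOT the Nutch classes.
class ParallelOutputs {
    private final List<String> fetcherOut = new ArrayList<>();
    private final List<String> contentOut = new ArrayList<>();
    private final List<String> parseTextOut = new ArrayList<>();
    private final List<String> parseDataOut = new ArrayList<>();
    private final boolean parsing;

    ParallelOutputs(boolean parsing) {
        this.parsing = parsing;
    }

    // The whole method is synchronized on the shared instance, so all
    // appends for one page happen as a single atomic step.
    synchronized void outputPage(String fo, String content,
                                 String text, String parseData) {
        fetcherOut.add(fo);
        contentOut.add(content);
        if (parsing) {
            parseTextOut.add(text);
            parseDataOut.add(parseData);
        }
    }

    // True if entry i of every output belongs to the same page.
    synchronized boolean aligned() {
        if (fetcherOut.size() != contentOut.size()) return false;
        if (parsing && (parseTextOut.size() != fetcherOut.size()
                || parseDataOut.size() != fetcherOut.size())) return false;
        for (int i = 0; i < fetcherOut.size(); i++) {
            String id = fetcherOut.get(i);
            if (!id.equals(contentOut.get(i))) return false;
            if (parsing && (!id.equals(parseTextOut.get(i))
                    || !id.equals(parseDataOut.get(i)))) return false;
        }
        return true;
    }

    synchronized int size() {
        return fetcherOut.size();
    }
}

class ParallelOutputsDemo {
    // Writes pages from several threads, then reports whether the outputs line up.
    static boolean runConcurrentPages(int threads, int perThread) {
        ParallelOutputs outputs = new ParallelOutputs(true);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int threadId = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    String pageId = "page-" + threadId + "-" + i;
                    outputs.outputPage(pageId, pageId, pageId, pageId);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return outputs.aligned() && outputs.size() == threads * perThread;
    }

    public static void main(String[] args) {
        System.out.println(runConcurrentPages(4, 1000));
    }
}
```

The same page identifier is written to every output here only so that alignment is easy to check; in the real code each writer would receive its own record for the page.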

--
Best regards,
Andrzej Bialecki
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


