Hi,
When running Fetcher, it sometimes dies with the following exception:
050517 212854 SEVERE error writing output:java.io.IOException: key out of order: 3420 after 3420
java.io.IOException: key out of order: 3420 after 3420
        at org.apache.nutch.io.MapFile$Writer.checkKey(MapFile.java:128)
        at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:114)
        at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:275)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:240)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:149)
This points to the following section in Fetcher:
  private void outputPage(FetcherOutput fo, Content content,
                          ParseText text, ParseData parseData) {
    try {
      synchronized (fetcherWriter) {
        fetcherWriter.append(fo);                // line 274
        contentWriter.append(content);           // <------ line 275
        if (Fetcher.this.parsing) {
          parseTextWriter.append(text);
          parseDataWriter.append(parseData);
        }
      }

That exception is quite a mystery: the ArrayFile.Writer.append() method is itself synchronized, and it increments the key inside the synchronized section, so this should never occur. Any thoughts?
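
To make the reasoning about the synchronized append concrete, here is a minimal sketch. It is not the Nutch source: CheckedWriter, CountingWriter and CountingWriterDemo are hypothetical names, and the "strictly increasing key" rule is an assumption inferred from the message "key out of order: 3420 after 3420". It only shows that when the key is generated and appended under one lock, concurrent callers cannot submit the same key twice.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a key-checked writer; not the Nutch source. */
class CheckedWriter {
    private long lastKey = -1;
    private int size = 0;

    // Assumed rule, inferred from the message "3420 after 3420":
    // each key must be strictly greater than the previous one.
    void append(long key) throws IOException {
        if (key <= lastKey) {
            throw new IOException("key out of order: " + key + " after " + lastKey);
        }
        lastKey = key;
        size++;
    }

    int size() { return size; }
}

/** Hypothetical array-style writer that generates its own keys. */
class CountingWriter {
    private final CheckedWriter out = new CheckedWriter();
    private long count = 0;

    // Key generation and the inner append happen under one lock,
    // so two threads can never submit the same key.
    synchronized void append() throws IOException {
        out.append(count);
        count++;
    }

    synchronized int size() { return out.size(); }
}

class CountingWriterDemo {
    /** Appends from several threads; returns {records written, failures}. */
    static int[] run(int threads, int perThread) {
        CountingWriter writer = new CountingWriter();
        AtomicInteger failures = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    try {
                        writer.append();
                    } catch (IOException e) {
                        failures.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { writer.size(), failures.get() };
    }

    public static void main(String[] args) {
        int[] r = run(8, 1000);
        System.out.println("written=" + r[0] + " failures=" + r[1]);
    }
}
```

Under these assumptions the check never fires, which is why the exception in the trace looks so odd if concurrency alone were the cause.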
However, I think there is another potential problem here. The critical section quoted above is synchronized only on fetcherWriter, so it is possible for other threads to enter this section as soon as the current thread is done executing line 274. That is too soon: changes to all writers should be made as one atomic operation. So, I propose to synchronize the whole method.
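
A rough sketch of what "all writers updated as one atomic operation" could look like follows. All names here (ListWriter, SharedOutput, AtomicOutputDemo) are hypothetical and the records are plain integers; it is an illustration of the idea, not a patch against Fetcher. One assumption matters: the synchronized method lives on a single object shared by all threads, because a synchronized instance method locks only on its own instance.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only writer; stands in for the writers in the excerpt. */
class ListWriter {
    private final List<Integer> records = new ArrayList<>();
    synchronized void append(int record) { records.add(record); }
    synchronized int get(int i) { return records.get(i); }
    synchronized int size() { return records.size(); }
}

/** Hypothetical output object: ONE instance is shared by all fetcher threads. */
class SharedOutput {
    final ListWriter fetcherWriter = new ListWriter();
    final ListWriter contentWriter = new ListWriter();
    final ListWriter parseTextWriter = new ListWriter();
    final ListWriter parseDataWriter = new ListWriter();
    private final boolean parsing;

    SharedOutput(boolean parsing) { this.parsing = parsing; }

    // The whole method is synchronized on the shared instance, so the
    // appends for one page are never interleaved with another page's.
    synchronized void outputPage(int pageId) {
        fetcherWriter.append(pageId);
        contentWriter.append(pageId);
        if (parsing) {
            parseTextWriter.append(pageId);
            parseDataWriter.append(pageId);
        }
    }

    /** True if entry i refers to the same page in every writer. */
    synchronized boolean aligned() {
        int n = fetcherWriter.size();
        if (contentWriter.size() != n) return false;
        if (parsing && (parseTextWriter.size() != n || parseDataWriter.size() != n)) return false;
        for (int i = 0; i < n; i++) {
            int id = fetcherWriter.get(i);
            if (contentWriter.get(i) != id) return false;
            if (parsing && (parseTextWriter.get(i) != id || parseDataWriter.get(i) != id)) return false;
        }
        return true;
    }
}

class AtomicOutputDemo {
    /** Writes pages from several threads and returns the shared output. */
    static SharedOutput run(int threads, int perThread) {
        SharedOutput out = new SharedOutput(true);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;   // distinct page ids per thread
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) out.outputPage(base + i);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SharedOutput out = run(4, 1000);
        System.out.println("pages=" + out.fetcherWriter.size() + " aligned=" + out.aligned());
    }
}
```

If the method were instead declared synchronized on a per-thread object, each thread would take its own lock and the writers would not be protected, so a shared lock object (or a shared instance, as in this sketch) is the part that matters.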
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
