On Wed, Dec 4, 2013 at 7:29 PM, Terry P. <[email protected]> wrote:

> Hi Eric,
> Thanks for your reply, I'm just now getting back to this as I had more of
> these the past two days. No tserver failures or master halts. With previous
> errors we were still experiencing network issues that were indeed taking
> tabletservers down, but now that they fixed a bad line card in a switch
> that had been rebooting itself (but not failing over), those issues are all
> gone (finally, knock on wood).
>
> Now that I see them again in isolation with no other errors, in the main
> tserver log these bloom-loader thread failures appear to happen out of the
> blue with no other issues surrounding them.
>
> However, I just checked the debug log and see they are occurring right at
> the time of a Major Compaction. E.g. from one of the tservers' debug logs:
>
> 2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock 0.00 secs, wait 0.00 secs
> 2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d (NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] --> [/t-0000aa9/C0000zn4.rf_tmp
> 2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread "bloom-loader-41" died File /accumulo/tables/2/t-0000aa9/C0000zmf.rf is closed
>
> The rest of the stack looks like what I posted earlier. The very next
> debug log message after the bloom loader exception shows that the
> Compaction completed successfully in 0.112 seconds.
>
> So it looks like the bloom loader is trying to open an rfile 41ms after a
> compaction started, and the file was likely just compacted during that gap
> between the calls. If that's the case, can this error be safely ignored?
>

It's probably safe to ignore. Bloom filters are loaded lazily by a background
thread, and it's possible the file will be closed by the time the background
thread gets around to loading it. However, it should log a debug message in
this case, so I am curious why an ERROR is logged. Is there a stack trace
associated with the message 'Thread "bloom-loader-41" ...'?

>
> Thanks,
> Terry
>
>
>
> On Mon, Nov 18, 2013 at 8:56 PM, Eric Newton <[email protected]> wrote:
>
>> This is an educated guess...
>>
>> When a process dies "gracefully" there's a shutdown hook that closes the
>> FileSystem. That can result in messages like this. It's likely there's an
>> error before this about a zookeeper session being lost, or a halt issued by
>> the master. See if this tserver died shortly after this message. If so,
>> ignore the message.
>>
>> -Eric
>>
>>
>>
>> On Fri, Nov 15, 2013 at 4:31 PM, Terry P. <[email protected]> wrote:
>>
>>> Greetings folks,
>>> In my Accumulo 1.4.2 cluster I am seeing ERRORS about bloom loader
>>> threads dying due to an rfile being closed. I can't copy/paste the error
>>> as it's on an air-gapped system, but it starts with:
>>>
>>> ERROR Thread "bloom-loader-2147" died File /accumulo/tables/2/t-0000aa4/F0000q3g.rf is closed
>>> java.lang.IllegalStateException: File /accumulo/tables/2/t-0000aa4/F0000q3g.rf is closed
>>> at org.apache.accumulo.core.file.blockfile.impl.CacheableBlockFile$Reader.getBCFile(CacheableBlockFile.java:244)
>>> at org.apache.accumulo.core.file.blockfile.impl.CacheableBlockFile$Reader.access$000(CacheableBlockFile.java:142)
>>> (10 more java files ... ends with java.lang.Thread.run(UnknownSource) )
>>>
>>> No real rhyme or reason as to when they occur; we are predominantly
>>> ingest heavy with light reads by rowkey with ~10 entries per rowkey. I
>>> don't really know if client programs are getting errors when these occur or
>>> not.
>>>
>>> I didn't find any JIRAs related to these. Should I be concerned about
>>> these?
>>> >> >> >
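
To make the race described in the reply concrete, here is a minimal sketch of a lazily loaded bloom filter whose file is closed before the background thread gets to it. This is not Accumulo's BloomFilterLayer code. Every name in it (SketchFileReader, LazyBloomLoader, Outcome, BloomLoaderDemo) and the demo paths are hypothetical, and the "close before load" flag is only an assumption used to force the ordering that a finishing compaction would produce by chance.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Logger;

/** Hypothetical stand-in for an rfile reader; not Accumulo's actual class. */
class SketchFileReader {
    private final String path;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    SketchFileReader(String path) {
        this.path = path;
    }

    void close() {
        closed.set(true);
    }

    boolean isClosed() {
        return closed.get();
    }

    /** Pretends to read the bloom filter block; fails if the file was closed. */
    byte[] readBloomBlock() {
        if (closed.get()) {
            throw new IllegalStateException("File " + path + " is closed");
        }
        return new byte[] {1, 2, 3};
    }
}

/** Hypothetical lazy loader: the filter is read on a background thread. */
class LazyBloomLoader {
    enum Outcome { LOADED, SKIPPED_FILE_CLOSED, FAILED }

    private static final Logger LOG = Logger.getLogger("bloom-loader-sketch");

    /** Body of the background task. */
    static Outcome load(SketchFileReader reader) {
        try {
            reader.readBloomBlock();
            return Outcome.LOADED;
        } catch (IllegalStateException e) {
            if (reader.isClosed()) {
                // Expected race: the file went away first, so log quietly.
                LOG.fine("Skipping bloom filter load: " + e.getMessage());
                return Outcome.SKIPPED_FILE_CLOSED;
            }
            // Anything else is unexpected and deserves an error-level message.
            LOG.severe("Bloom filter load failed: " + e.getMessage());
            return Outcome.FAILED;
        }
    }

    /** Submits the load to a background thread and waits for its outcome. */
    static Outcome loadInBackground(SketchFileReader reader, boolean closeBeforeLoad) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            if (closeBeforeLoad) {
                // Assumption: simulates a compaction closing the file first.
                reader.close();
            }
            return CompletableFuture.supplyAsync(() -> load(reader), pool).join();
        } finally {
            pool.shutdown();
        }
    }
}

class BloomLoaderDemo {
    public static void main(String[] args) {
        System.out.println(LazyBloomLoader.loadInBackground(
                new SketchFileReader("/t-demo/open.rf"), false));
        System.out.println(LazyBloomLoader.loadInBackground(
                new SketchFileReader("/t-demo/closed.rf"), true));
    }
}
```

In this sketch the closed-file case is treated as an expected outcome and logged at a debug-like level (`fine`), which is the behavior the reply says it expected. Why the real tserver logged an ERROR instead is exactly the open question in the thread.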
