On Thu, Dec 5, 2013 at 3:37 PM, Terry P. <[email protected]> wrote: > Hi Keith, > Here is the stack trace in the tserver DEBUG log for this most recent > exception. The exception section is the same as what's in the main tserver > log, but of course the MajC bits don't appear in the main log. This is > hand-typed, but I'm pretty sure it's right. > > > 2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock > 0.00 secs, wait 0.00 secs > 2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d > (NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] --> > [/t-0000aa9/C0000zn4.rf_tmp > 2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread > "bloom-loader-41" died File /accumulo/tables/2/t-0000aa9/C0000zmf.rf is > closed > java.lang.IllegalStateException: File > /accumulo/tables/2/t-0000aa9/C0000zmf.rf is closed > at > org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:244) > at > org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$000(CachableBlockFile.java:142) > at > org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:211) > at > org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:307) > at > org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:357) > at > org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:142) > at > org.apache.accumulo.core.file.rfile.RFile$Reader.getMetaStore(Rfile.java:927) > at > org.apache.accumulo.core.file.BloomFilterLayer$BloomFilterLoader$1.run(BloomFilterLayer.java:210) > at > org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > 2013-12-03 11:48:14,900 [tabletserver.Compactor] DEBUG: Compaction 2;f;d > 280 read | 280 written | 2,500 entries/sec | 0.112 secs > 2013-12-03 11:48:14,924 [tabletserver.Tablet] DEBUG: MajC finish lock 0.00 > secs > > The biggest bummer here is just that it appears on the Monitor GUI as an > error, and we all know how Operators don't like "errors" on their screens > ;-) But if this is one that can be safely ignored, we'll just have to > write that up in a procedure somewhere. >
The code in BloomFilterLayer$BloomFilterLoader$1.run() logs IOExceptions at debug when the file is closed. Because this is an IllegalStateException its not being ignored. Would you like to open a bug for this? > > > > On Thu, Dec 5, 2013 at 9:50 AM, Keith Turner <[email protected]> wrote: > >> >> >> >> On Wed, Dec 4, 2013 at 7:29 PM, Terry P. <[email protected]> wrote: >> >>> Hi Eric, >>> Thanks for your reply, I'm just now getting back to this as I had more >>> of these the past two days. No tserver failures or master halts. With >>> previous errors we were still experiencing network issues that were indeed >>> taking tabletservers down, but now that they fixed a bad line card in a >>> switch that had been rebooting itself (but not failing over), those issues >>> are all gone (finally, knock on wood). >>> >>> Now that I see them again in isolation with no other errors, in the main >>> tserver log these bloom-loader thread failures appear to happen out of the >>> blue with no other issues surrounding them. >>> >>> However, I just checked the debug log and see they are occurring right >>> at the time of a Major Compaction. E.g. from one of the tservers debug log: >>> >>> 2013-12-03 11:48:14,738 [tabletserver.Tablet] DEBUG: MajC initiate lock >>> 0.00 secs, wait 0.00 secs >>> 2013-12-03 11:48:14,739 [tabletserver.Tablet] DEBUG: Starting MajC 2;f;d >>> (NORMAL) [/t-0000aa9/C0000zmf.rf, <several more rfiles listed> ] --> >>> [/t-0000aa9/C0000zn4.rf_tmp >>> 2013-12-03 11:48:14,780 [file.BloomFilterLayer] ERROR: Thread >>> "bloom-loader-41" died File /accumulo/tables/2/t-0000aa9/C0000zmf.rf is >>> closed >>> >>> The rest of the stack looks like what I posted earlier. The very next >>> debug log message after the bloom loader exception is shows that the >>> Compaction completed successfully in 0.112 seconds. >>> >>> So it looks like the bloom loader is trying to open an rfile 41ms after >>> a compaction started, and the file was likely just compacted during that >>> gap between the calls. If that's the case, can this error be safely ignored? >>> >> >> Its probably safe to ignore. Bloomfilters are loaded lazily by a >> background thread and its possible the file will be closed by the time the >> background thread gets around to loading it. However it should log a debug >> in this case, so I am curious why an ERROR is logged. Is there a stack >> trace associated with the message 'Thread "bloom-loader-41" ...' ? >> >> >>> >>> Thanks, >>> Terry >>> >>> >>> >>> On Mon, Nov 18, 2013 at 8:56 PM, Eric Newton <[email protected]>wrote: >>> >>>> This is an educated guess... >>>> >>>> When a process dies "gracefully" there's a shutdown hook that closes >>>> the FileSystem. That can result in messages like this. It's likely >>>> there's an error before this about a zookeeper session being lost, or a >>>> halt issued by the master. See if this tserver died shortly after this >>>> message. If so, ignore the message. >>>> >>>> -Eric >>>> >>>> >>>> >>>> On Fri, Nov 15, 2013 at 4:31 PM, Terry P. <[email protected]> wrote: >>>> >>>>> Greetings folks, >>>>> In my Accumulo 1.4.2 cluster I am seeing ERRORS about bloom loader >>>>> threads dying due to an rfile being closed. I can't copy/paste the error >>>>> as it's on an air-gapped system, but it starts with: >>>>> >>>>> ERROR Thread "bloom-loader-2147" died File >>>>> /accumulo/tables/2/t-0000aa4/F0000q3g.rf is closed >>>>> java.lang.IllegalStateException: File >>>>> /accumulo/tables/2/t-0000aa4/F0000q3g.rf is closed >>>>> at >>>>> org.apache.accumulo.core.file.blockfile.impl.CacheableBlockFile$Reader.getBCFile(CacheableBlockFile.java:244) >>>>> at >>>>> org.apache.accumulo.core.file.blockfile.impl.CacheableBlockFile$Reader.access$000(CacheableBlockFile.java:142) >>>>> (10 more java files ... ends with java.lang.Thread.run(UnknownSource) ) >>>>> >>>>> No real rhyme or reason as to when they occur; we are predominantly >>>>> ingest heavy with light reads by rowkey with ~10 entries per rowkey. I >>>>> don't really know if client programs are getting errors when these occur >>>>> or >>>>> not. >>>>> >>>>> I didn't find any JIRAs related to these. Should I be concerned about >>>>> these? >>>>> >>>> >>>> >>> >> >
