Meanwhile you can mitigate as described above by temporarily disabling expired-file deletion. Please report if it doesn't work...
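[For anyone landing on this thread: the mitigation refers to the `hbase.store.delete.expired.storefile` setting discussed further down. A sketch of the hbase-site.xml entry, assuming a server-wide override followed by a region server restart — per-table overrides may need a different mechanism on 0.94:]

```xml
<!-- hbase-site.xml on the affected region server; restart the RS afterwards -->
<property>
  <name>hbase.store.delete.expired.storefile</name>
  <value>false</value>
</property>
```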
On Tue, Sep 24, 2013 at 4:08 PM, Jean-Marc Spaggiari <[email protected]> wrote:

Hi Tom,

Thanks for reporting this and for providing all this information.

I have attached a patch on the JIRA that Sergey opened.

This will need to be reviewed, and we will need a committer to push it if it's accepted.

JM

2013/9/24 Tom Brown <[email protected]>:

I tried the workaround, and it is working very well. The number of store files for all regions is now sane (it went from about 8000 total store files to 1000), and scans are now much more efficient.

Thanks for all your help, Jean-Marc and Sergey!

--Tom

On Tue, Sep 24, 2013 at 2:11 PM, Jean-Marc Spaggiari <[email protected]> wrote:

Hi Tom,

Thanks for this information and the offer. I think we have enough to start looking at this issue. I'm still trying to reproduce it locally. In the meantime, I sent a patch to fix the NullPointerException you faced before.

I will post back here if I'm able to reproduce it. Have you tried Sergey's workaround?

JM

2013/9/24 Tom Brown <[email protected]>:

Yes, it is empty.

13/09/24 13:03:03 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 2.9g
13/09/24 13:03:03 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
13/09/24 13:03:03 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path /fca0882dc7624342a8f4fce4b89420ff. Expecting at least 5 path components.
13/09/24 13:03:03 WARN snappy.LoadSnappy: Snappy native library is available
13/09/24 13:03:03 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/09/24 13:03:03 INFO snappy.LoadSnappy: Snappy native library loaded
13/09/24 13:03:03 INFO compress.CodecPool: Got brand-new decompressor
Stats:
no data available for statistics
Scanned kv count -> 0

If you want to examine the actual file, I would be happy to email it to you directly.

--Tom

On Tue, Sep 24, 2013 at 12:42 PM, Jean-Marc Spaggiari <[email protected]> wrote:

Can you try with fewer parameters and see if you are able to get something from it? This exception is caused by "printMeta", so if you remove -m it should be OK. However, printMeta was what I was looking for ;)

getFirstKey for this file seems to return null. So it might simply be an empty file, not necessarily a corrupted one.

2013/9/24 Tom Brown <[email protected]>:

/usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
13/09/24 12:33:40 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
Scanning -> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
13/09/24 12:33:41 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 2.9g
13/09/24 12:33:41 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
13/09/24 12:33:41 WARN snappy.LoadSnappy: Snappy native library is available
13/09/24 12:33:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/09/24 12:33:41 INFO snappy.LoadSnappy: Snappy native library loaded
13/09/24 12:33:41 INFO compress.CodecPool: Got brand-new decompressor
Block index size as per heapsize: 336
Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:716)
        at org.apache.hadoop.hbase.io.hfile.AbstractHFileReader.toStringFirstKey(AbstractHFileReader.java:138)
        at org.apache.hadoop.hbase.io.hfile.AbstractHFileReader.toString(AbstractHFileReader.java:149)
        at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.printMeta(HFilePrettyPrinter.java:318)
        at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:234)
        at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:189)
        at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:756)

Does this mean the problem might have been caused by a corrupted file(s)?

--Tom
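[The stack trace shows keyToString throwing because an empty HFile has no first key, so getFirstKey() returns null. A minimal standalone sketch of the kind of null guard such a fix would need — the method names echo the HBase ones, but this is an illustration, not JM's actual patch:]

```java
import java.nio.charset.StandardCharsets;

public class FirstKeyGuard {
    // Stand-in for KeyValue.keyToString (assumption: the real method
    // pretty-prints the key; here we just decode it as UTF-8 text).
    static String keyToString(byte[] key) {
        return new String(key, StandardCharsets.UTF_8);
    }

    // Null-safe wrapper: an empty HFile has no first key, so passing the
    // null straight to keyToString() is what produced the NPE above.
    static String toStringFirstKey(byte[] firstKey) {
        return firstKey == null ? "null" : keyToString(firstKey);
    }

    public static void main(String[] args) {
        System.out.println(toStringFirstKey(null)); // empty file: prints "null", no NPE
        System.out.println(toStringFirstKey("row1".getBytes(StandardCharsets.UTF_8)));
    }
}
```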
On Tue, Sep 24, 2013 at 12:21 PM, Jean-Marc Spaggiari <[email protected]> wrote:

One more, Tom.

When you have been able to capture the HFile locally, please run the HFile class on it to see the number of keys (is it empty?) and the other specific information.

bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f HFILENAME

Thanks,

JM

2013/9/24 Jean-Marc Spaggiari <[email protected]>:

We get -1 because of this:

          byte [] timerangeBytes = metadataMap.get(TIMERANGE_KEY);
          if (timerangeBytes != null) {
            this.reader.timeRangeTracker = new TimeRangeTracker();
            Writables.copyWritable(timerangeBytes, this.reader.timeRangeTracker);
          }

this.reader.timeRangeTracker will return -1 for the maximumTimestamp value. So now, we need to figure out whether it's normal for TIMERANGE_KEY to be missing here.

I have created the same table locally on 0.94.10 with the same attributes and I'm not facing this issue.

We need to look at the related HFile, but files are rolled VERY quickly, so it might be difficult to get one.

Maybe something like

hadoop fs -get hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/* .

might help to get the file? Then we can start to look at it and see what exactly triggers this behaviour.

JM

2013/9/24 Sergey Shelukhin <[email protected]>:

Yeah, I think the c3580bdb62d64e42a9eeac50f1c582d2 store file is a good example.
Can you grep for c3580bdb62d64e42a9eeac50f1c582d2 and post the log, just to be sure? Thanks.
It looks like an interaction of deleting expired files and

          // Create the writer even if no kv (Empty store file is also ok),
          // because we need record the max seq id for the store file, see
          // HBASE-6059

in the compactor.
The newly created file is immediately collected the same way and replaced by another file, which does not seem like intended behavior, even though both pieces of code are technically correct (the empty file is expired, and the new file is generally needed).

I filed HBASE-9648.

On Tue, Sep 24, 2013 at 10:55 AM, Sergey Shelukhin <[email protected]> wrote:

To mitigate, you can change hbase.store.delete.expired.storefile to false on one region server, or for the entire table, and restart this RS.
This will trigger a different compaction, hopefully.
We'd need to find what the bug is. My gut feeling (which is known to be wrong often) is that it has to do with it selecting one file, probably an invalid check somewhere, or an interaction with the code that ensures that at least one file needs to be written to preserve metadata; it might be just cycling through such files.

On Tue, Sep 24, 2013 at 10:20 AM, Jean-Marc Spaggiari <[email protected]> wrote:

So. Looking at the code, this, to me, sounds like a bug.

I will try to reproduce it locally. It seems to be related to the combination of TTL + BLOOM.

Creating a table for that right now; will keep you posted very shortly.

JM
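[The loop Sergey describes hinges on the expired-file check: a brand-new empty store file carries no TIMERANGE metadata, so its max timestamp defaults to -1, which is older than any TTL cutoff. A simplified standalone model of that predicate — not the actual CompactSelection code; the names and signature are illustrative:]

```java
public class ExpiredFileLoop {
    // Simplified model of the expired-storefile test: a file counts as
    // "expired" when the newest cell it holds is older than (now - TTL).
    static boolean isExpired(long maxTimestamp, long ttlMs, long now) {
        long maxExpiredTimestamp = now - ttlMs;
        return maxTimestamp < maxExpiredTimestamp;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long ttlMs = 8640000L * 1000L; // the table's 100-day TTL, in milliseconds

        // A freshly written file is not expired.
        System.out.println(isExpired(now, ttlMs, now));  // false

        // An empty store file has no TIMERANGE metadata, so its max
        // timestamp stays at the default of -1: it always tests as
        // expired, gets deleted, an empty replacement is written to keep
        // the max seq id (HBASE-6059), and the cycle repeats forever.
        System.out.println(isExpired(-1L, ttlMs, now));  // true
    }
}
```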
2013/9/24 Tom Brown <[email protected]>:

-rw-------  1 hadoop supergroup     2194 2013-09-21 14:32 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1
-rw-------  1 hadoop supergroup    31321 2013-09-24 05:49 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1305d625bd4a4be39a98ae4d91a66140
-rw-------  1 hadoop supergroup     1350 2013-09-24 10:31 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1352e0828f974f08b1f3d7a9dff04abd
-rw-------  1 hadoop supergroup     4194 2013-09-21 10:38 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/17a546064bd840619816809ae0fc4c49
-rw-------  1 hadoop supergroup     1061 2013-09-20 22:55 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1cb3df115da244288bd076968ab4ccf6
-rw-------  1 hadoop supergroup     1375 2013-08-24 10:17 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1e41a96c49fc4e5ab59392d26935978d
-rw-------  1 hadoop supergroup    96296 2013-08-26 15:48 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/22d72fd897e34424b5420a96483a571e
-rw-------  1 hadoop supergroup     1356 2013-08-26 15:23 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/25fee1ffadbe42549bd0b7b13d782b72
-rw-------  1 hadoop supergroup     6229 2013-09-21 11:14 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/26289c777ec14dc5b7021b4d6b1050c5
-rw-------  1 hadoop supergroup     1223 2013-09-21 02:42 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2757d7ba9c8448d6a3d5d46bd4d59758
-rw-------  1 hadoop supergroup  5302248 2013-08-24 02:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2ec40943787246ea983608dd6591db24
-rw-------  1 hadoop supergroup     1596 2013-08-24 03:37 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3157fd1cabe4483aaa4d9a21f75e4d88
-rw-------  1 hadoop supergroup     1338 2013-09-22 04:25 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/36b0f80a4a7b492f97358b64d879a2df
-rw-------  1 hadoop supergroup     3264 2013-09-21 12:05 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/39e249fcb532400daed73aed6689ceeb
-rw-------  1 hadoop supergroup     4549 2013-09-21 08:56 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3bc9e2a566ad460a9b0ed336b2fb5ed9
-rw-------  1 hadoop supergroup     1630 2013-09-22 03:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/48026d08aae748f08aad59e4eea903be
-rw-------  1 hadoop supergroup   105395 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/53198825f085401cbbd4322faa0e3aae
-rw-------  1 hadoop supergroup     3859 2013-09-21 09:09 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/71c2f9b2a8ff4c049fcc5a9a22af5cfe
-rw-------  1 hadoop supergroup   311688 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/97ff16d6da974c30835c6e0acc7c737a
-rw-------  1 hadoop supergroup     1897 2013-08-24 08:43 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/a172d7577641434d82abcce88a433213
-rw-------  1 hadoop supergroup     3380 2013-09-21 13:04 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/be678e5c60534c65a012a798fbc7e284
-rw-------  1 hadoop supergroup    43710 2013-09-22 02:15 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/e2508a23acf1491f9d38b9a8594e41e8
-rw-------  1 hadoop supergroup     5409 2013-09-21 10:10 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f432846182714b93a1c3df0f5835c09b
-rw-------  1 hadoop supergroup      491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f7d8669cf7a047b98c1d3b13c16cfaec
-rw-------  1 hadoop supergroup      491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fa1b8f6cc9584eb28365dcd8f10d3f0a
-rw-------  1 hadoop supergroup      491 2013-09-13 11:28 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff

On Tue, Sep 24, 2013 at 11:14 AM, Jean-Marc Spaggiari <[email protected]> wrote:

TTL seems to be fine.

-1 is the default value for TimeRangeTracker.maximumTimestamp.

Can you run:
hadoop fs -lsr hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/

Thanks,

JM

2013/9/24 Tom Brown <[email protected]>:

1. Hadoop version is 1.1.2.
2. All servers are synched with NTP.
3. Table definition is:

'compound0', {
  NAME => 'd',
  DATA_BLOCK_ENCODING => 'NONE',
  BLOOMFILTER => 'ROW',
  REPLICATION_SCOPE => '0',
  VERSIONS => '1',
  COMPRESSION => 'SNAPPY',
  MIN_VERSIONS => '0',
  TTL => '8640000',
  KEEP_DELETED_CELLS => 'false',
  BLOCKSIZE => '65536',
  IN_MEMORY => 'false',
  ENCODE_ON_DISK => 'true',
  BLOCKCACHE => 'true'
}

The TTL is supposed to be 100 days.
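[As a quick sanity check on that value: TTL is expressed in seconds, and 8,640,000 seconds is exactly 100 days:]

```java
public class TtlMath {
    public static void main(String[] args) {
        long ttlSeconds = 8640000L;          // TTL from the table definition
        long secondsPerDay = 24L * 60 * 60;  // 86,400 seconds per day
        System.out.println(ttlSeconds / secondsPerDay + " days"); // 100 days
    }
}
```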
--Tom

On Tue, Sep 24, 2013 at 10:53 AM, Jean-Marc Spaggiari <[email protected]> wrote:

Another important piece of information which might be the root cause of this issue...

Do you have any TTL defined for this table?

JM

2013/9/24 Jean-Marc Spaggiari <[email protected]>:

Strange.

A few questions then.
1) What is your Hadoop version?
2) Is the clock on all your servers synched with NTP?
3) What is your table definition? Bloom filters, etc.?

This is the reason why it keeps compacting:

2013-09-24 10:04:00,548 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/7426f128469242ec8ee09f3965fd5a1a whose maxTimeStamp is -1 while the max expired timestamp is 1371398640548

maxTimeStamp = -1

Each time, there is a comparison between maxTimeStamp for this store file and the configured maxExpiredTimeStamp, and since maxTimeStamp returns -1, it's always elected for a compaction. Now, we need to find why...

JM

2013/9/24 Tom Brown <[email protected]>:

My cluster is fully distributed (2 regionserver nodes).

Here is a snippet of log entries that may explain why it started: http://pastebin.com/wQECif8k. I had to go back 2 days to find when it started for this region.

This is not the only region experiencing this issue (but this is the smallest one it's happened to).

--Tom

On Tue, Sep 24, 2013 at 10:13 AM, Jean-Marc Spaggiari <[email protected]> wrote:

Can you paste logs from a bit before that? To see if anything triggered the compaction?
Before the 1M compaction entries.

Also, what is your setup? Are you running Standalone? Pseudo-Dist? Fully-Dist?

Thanks,

JM

2013/9/24 Tom Brown <[email protected]>:

There is one column family, d. Each row has about 10 columns, and each row's total data size is less than 2K.

Here is a small snippet of logs from the region server: http://pastebin.com/S2jE4ZAx

--Tom

On Tue, Sep 24, 2013 at 9:59 AM, Bharath Vissapragada <[email protected]> wrote:

It would help if you can show your RS log (via pastebin?). Are there frequent flushes for this region too?

On Tue, Sep 24, 2013 at 9:20 PM, Tom Brown <[email protected]> wrote:

I have a region that is very small, only 5MB. Despite its size, it has 24 store files. The logs show that it's compacting (over and over again).

The odd thing is that even though there are 24 store files, it only does one at a time. Even more strange is that my logs are filling up with compactions of this one region. In the last 10 hours, there have been 1,876,200 log entries corresponding to compacting this region alone.

My cluster is 0.94.10, using almost all default settings. Only a few are not default:
hbase.hregion.max.filesize = 4294967296
hbase.hstore.compaction.min = 6

I am at a total loss as to why this behavior is occurring. Any help is appreciated.
--Tom

--
Bharath Vissapragada
<http://www.cloudera.com>

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
