I tried the workaround, and it is working very well. The number of store
files for all regions is now sane (went from about 8000 total store files
to 1000), and scans are now much more efficient.

Thanks for all your help, Jean-Marc and Sergey!

--Tom


On Tue, Sep 24, 2013 at 2:11 PM, Jean-Marc Spaggiari <
[email protected]> wrote:

> Hi Tom,
>
> Thanks for this information and the offer. I think we have enough to start
> looking at this issue. I'm still trying to reproduce it locally. In the
> meantime, I sent a patch to fix the NullPointerException you faced before.
>
> I will post back here if I'm able to reproduce it. Have you tried Sergey's
> workaround?
>
> JM
>
>
> 2013/9/24 Tom Brown <[email protected]>
>
> > Yes, it is empty.
> >
> > 13/09/24 13:03:03 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 2.9g
> > 13/09/24 13:03:03 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
> > 13/09/24 13:03:03 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path /fca0882dc7624342a8f4fce4b89420ff. Expecting at least 5 path components.
> > 13/09/24 13:03:03 WARN snappy.LoadSnappy: Snappy native library is available
> > 13/09/24 13:03:03 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> > 13/09/24 13:03:03 INFO snappy.LoadSnappy: Snappy native library loaded
> > 13/09/24 13:03:03 INFO compress.CodecPool: Got brand-new decompressor
> > Stats:
> > no data available for statistics
> > Scanned kv count -> 0
> >
> > If you want to examine the actual file, I would be happy to email it to
> > you directly.
> >
> > --Tom
> >
> >
> > On Tue, Sep 24, 2013 at 12:42 PM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> > > Can you try with fewer parameters and see if you are able to get
> > > something from it? This exception is caused by the "printMeta", so if
> > > you remove -m it should be ok. However, printMeta was what I was
> > > looking for ;)
> > >
> > > getFirstKey for this file seems to return null. So it might simply be
> > > an empty file, not necessarily a corrupted one.
> > >
> > >
> > > 2013/9/24 Tom Brown <[email protected]>
> > >
> > > > /usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
> > > > 13/09/24 12:33:40 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
> > > > Scanning -> /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
> > > > 13/09/24 12:33:41 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 2.9g
> > > > 13/09/24 12:33:41 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
> > > > 13/09/24 12:33:41 WARN snappy.LoadSnappy: Snappy native library is available
> > > > 13/09/24 12:33:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> > > > 13/09/24 12:33:41 INFO snappy.LoadSnappy: Snappy native library loaded
> > > > 13/09/24 12:33:41 INFO compress.CodecPool: Got brand-new decompressor
> > > > Block index size as per heapsize: 336
> > > > Exception in thread "main" java.lang.NullPointerException
> > > >         at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:716)
> > > >         at org.apache.hadoop.hbase.io.hfile.AbstractHFileReader.toStringFirstKey(AbstractHFileReader.java:138)
> > > >         at org.apache.hadoop.hbase.io.hfile.AbstractHFileReader.toString(AbstractHFileReader.java:149)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.printMeta(HFilePrettyPrinter.java:318)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:234)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:189)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:756)
> > > >
> > > >
> > > > Does this mean the problem might have been caused by a corrupted
> > > > file(s)?
> > > >
> > > > --Tom
> > > >
> > > >
> > > > On Tue, Sep 24, 2013 at 12:21 PM, Jean-Marc Spaggiari <
> > > > [email protected]> wrote:
> > > >
> > > > > One more thing, Tom.
> > > > >
> > > > > When you have been able to capture the HFile locally, please run the
> > > > > HFile class on it to see the number of keys (is it empty?) and the
> > > > > other specific information:
> > > > >
> > > > > bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f HFILENAME
> > > > >
> > > > > Thanks,
> > > > >
> > > > > JM
> > > > >
> > > > >
> > > > > 2013/9/24 Jean-Marc Spaggiari <[email protected]>
> > > > >
> > > > > > We get -1 because of this:
> > > > > >
> > > > > >       byte [] timerangeBytes = metadataMap.get(TIMERANGE_KEY);
> > > > > >       if (timerangeBytes != null) {
> > > > > >         this.reader.timeRangeTracker = new TimeRangeTracker();
> > > > > >         Writables.copyWritable(timerangeBytes, this.reader.timeRangeTracker);
> > > > > >       }
> > > > > >
> > > > > > this.reader.timeRangeTracker will return -1 for the maximumTimestamp
> > > > > > value. So now, we need to figure out whether or not it's normal to
> > > > > > have TIMERANGE_KEY not null here.
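To illustrate the mechanics being discussed, here is a minimal, hypothetical model (not the actual HBase classes): maximumTimestamp starts at -1 and is only raised when cells are actually tracked, so a store file written with no KeyValues reports -1, which falls below any positive expiry cutoff.

```java
// Hypothetical sketch of the behaviour described above; class and method
// names are illustrative, not HBase's real API.
public class TimeRangeSketch {
    static class TimeRangeTracker {
        long maximumTimestamp = -1; // default value, per the discussion above

        void includeTimestamp(long ts) {
            if (ts > maximumTimestamp) maximumTimestamp = ts;
        }
    }

    // A file is elected as expired when its max timestamp is below the cutoff.
    static boolean electedAsExpired(long maxTimestamp, long maxExpiredTimestamp) {
        return maxTimestamp < maxExpiredTimestamp;
    }

    public static void main(String[] args) {
        long cutoff = 1371398640548L; // an example expiry cutoff

        TimeRangeTracker emptyFile = new TimeRangeTracker(); // no cells written
        TimeRangeTracker normalFile = new TimeRangeTracker();
        normalFile.includeTimestamp(cutoff + 1000L); // a recent cell

        System.out.println(electedAsExpired(emptyFile.maximumTimestamp, cutoff));  // true
        System.out.println(electedAsExpired(normalFile.maximumTimestamp, cutoff)); // false
    }
}
```

In other words, any store file whose tracker was never updated (an empty file) is permanently "expired" by this comparison.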
> > > > > >
> > > > > > I have created the same table locally on 0.94.10 with the same
> > > > > > attributes and I'm not facing this issue.
> > > > > >
> > > > > > We need to look at the related HFile, but files are rolled VERY
> > > > > > quickly, so it might be difficult to get one.
> > > > > >
> > > > > > Maybe something like
> > > > > >
> > > > > > hadoop fs -get hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/* .
> > > > > >
> > > > > > might help to get the file? Then we can start to look at it and see
> > > > > > what exactly triggers this behaviour?
> > > > > >
> > > > > > JM
> > > > > >
> > > > > >
> > > > > > 2013/9/24 Sergey Shelukhin <[email protected]>
> > > > > >
> > > > > >> Yeah, I think the c3580bdb62d64e42a9eeac50f1c582d2 store file is a
> > > > > >> good example. Can you grep for c3580bdb62d64e42a9eeac50f1c582d2 and
> > > > > >> post the log, just to be sure? Thanks.
> > > > > >> It looks like an interaction of deleting expired files and
> > > > > >>           // Create the writer even if no kv(Empty store file is also ok),
> > > > > >>           // because we need record the max seq id for the store file, see
> > > > > >>           // HBASE-6059
> > > > > >> in the compactor.
> > > > > >> The newly created file is immediately collected the same way and
> > > > > >> replaced by another file, which does not seem like intended behavior,
> > > > > >> even though both pieces of code are technically correct (the empty
> > > > > >> file is expired, and the new file is generally needed).
> > > > > >>
> > > > > >> I filed HBASE-9648
> > > > > >>
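The interaction Sergey describes can be sketched as a loop. This is a hypothetical, simplified illustration (not HBase code): each expired-file compaction must still write a store file to record the max sequence id (HBASE-6059), and that empty output carries no useful timerange, so its max timestamp reads as -1 and it is elected as expired again on the next selection.

```java
// Illustrative model only; StoreFile, compactExpired and isExpired are
// invented names, not HBase's real API.
public class CompactionCycleSketch {
    static class StoreFile {
        final long maxTimestamp;
        StoreFile(long maxTimestamp) { this.maxTimestamp = maxTimestamp; }
    }

    // One "delete expired" round: the output file is empty, hence -1 again.
    static StoreFile compactExpired(StoreFile expired) {
        return new StoreFile(-1L);
    }

    static boolean isExpired(StoreFile f, long cutoff) {
        return f.maxTimestamp < cutoff;
    }

    public static void main(String[] args) {
        long cutoff = 1371398640548L;
        StoreFile f = new StoreFile(-1L);
        int rounds = 0;
        // Cap the loop: without the cap it would spin forever, which is the bug.
        while (isExpired(f, cutoff) && rounds < 5) {
            f = compactExpired(f);
            rounds++;
        }
        System.out.println(rounds); // prints 5: the file is re-elected every round
    }
}
```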
> > > > > >>
> > > > > >> On Tue, Sep 24, 2013 at 10:55 AM, Sergey Shelukhin
> > > > > >> <[email protected]>wrote:
> > > > > >>
> > > > > >> > To mitigate, you can change hbase.store.delete.expired.storefile
> > > > > >> > to false on one region server, or for the entire table, and
> > > > > >> > restart that RS. This will trigger a different compaction,
> > > > > >> > hopefully.
> > > > > >> > We'd need to find what the bug is. My gut feeling (which is known
> > > > > >> > to be wrong often) is that it has to do with the way it selects
> > > > > >> > one file, probably an invalid check somewhere, or an interaction
> > > > > >> > with the code that ensures that at least one file needs to be
> > > > > >> > written to preserve metadata; it might be just cycling through
> > > > > >> > such files.
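For the per-region-server form of Sergey's workaround, the property would be set in that server's hbase-site.xml, along these lines (a sketch; only the property name comes from the thread, the surrounding file layout is the standard Hadoop configuration format):

```xml
<!-- hbase-site.xml: disable the "delete expired store file" compaction -->
<property>
  <name>hbase.store.delete.expired.storefile</name>
  <value>false</value>
</property>
```

After editing, restart that region server so the setting takes effect.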
> > > > > >> >
> > > > > >> >
> > > > > >> > On Tue, Sep 24, 2013 at 10:20 AM, Jean-Marc Spaggiari <
> > > > > >> > [email protected]> wrote:
> > > > > >> >
> > > > > >> >> So, looking at the code, this sounds like a bug to me.
> > > > > >> >>
> > > > > >> >> I will try to reproduce it locally. It seems to be related to
> > > > > >> >> the combination of TTL + BLOOM.
> > > > > >> >>
> > > > > >> >> I'm creating a table for that right now, and will keep you
> > > > > >> >> posted very shortly.
> > > > > >> >>
> > > > > >> >> JM
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> 2013/9/24 Tom Brown <[email protected]>
> > > > > >> >>
> > > > > >> >> > -rw-------   1 hadoop supergroup       2194 2013-09-21 14:32 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1
> > > > > >> >> > -rw-------   1 hadoop supergroup      31321 2013-09-24 05:49 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1305d625bd4a4be39a98ae4d91a66140
> > > > > >> >> > -rw-------   1 hadoop supergroup       1350 2013-09-24 10:31 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1352e0828f974f08b1f3d7a9dff04abd
> > > > > >> >> > -rw-------   1 hadoop supergroup       4194 2013-09-21 10:38 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/17a546064bd840619816809ae0fc4c49
> > > > > >> >> > -rw-------   1 hadoop supergroup       1061 2013-09-20 22:55 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1cb3df115da244288bd076968ab4ccf6
> > > > > >> >> > -rw-------   1 hadoop supergroup       1375 2013-08-24 10:17 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1e41a96c49fc4e5ab59392d26935978d
> > > > > >> >> > -rw-------   1 hadoop supergroup      96296 2013-08-26 15:48 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/22d72fd897e34424b5420a96483a571e
> > > > > >> >> > -rw-------   1 hadoop supergroup       1356 2013-08-26 15:23 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/25fee1ffadbe42549bd0b7b13d782b72
> > > > > >> >> > -rw-------   1 hadoop supergroup       6229 2013-09-21 11:14 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/26289c777ec14dc5b7021b4d6b1050c5
> > > > > >> >> > -rw-------   1 hadoop supergroup       1223 2013-09-21 02:42 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2757d7ba9c8448d6a3d5d46bd4d59758
> > > > > >> >> > -rw-------   1 hadoop supergroup    5302248 2013-08-24 02:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2ec40943787246ea983608dd6591db24
> > > > > >> >> > -rw-------   1 hadoop supergroup       1596 2013-08-24 03:37 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3157fd1cabe4483aaa4d9a21f75e4d88
> > > > > >> >> > -rw-------   1 hadoop supergroup       1338 2013-09-22 04:25 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/36b0f80a4a7b492f97358b64d879a2df
> > > > > >> >> > -rw-------   1 hadoop supergroup       3264 2013-09-21 12:05 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/39e249fcb532400daed73aed6689ceeb
> > > > > >> >> > -rw-------   1 hadoop supergroup       4549 2013-09-21 08:56 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3bc9e2a566ad460a9b0ed336b2fb5ed9
> > > > > >> >> > -rw-------   1 hadoop supergroup       1630 2013-09-22 03:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/48026d08aae748f08aad59e4eea903be
> > > > > >> >> > -rw-------   1 hadoop supergroup     105395 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/53198825f085401cbbd4322faa0e3aae
> > > > > >> >> > -rw-------   1 hadoop supergroup       3859 2013-09-21 09:09 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/71c2f9b2a8ff4c049fcc5a9a22af5cfe
> > > > > >> >> > -rw-------   1 hadoop supergroup     311688 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/97ff16d6da974c30835c6e0acc7c737a
> > > > > >> >> > -rw-------   1 hadoop supergroup       1897 2013-08-24 08:43 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/a172d7577641434d82abcce88a433213
> > > > > >> >> > -rw-------   1 hadoop supergroup       3380 2013-09-21 13:04 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/be678e5c60534c65a012a798fbc7e284
> > > > > >> >> > -rw-------   1 hadoop supergroup      43710 2013-09-22 02:15 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/e2508a23acf1491f9d38b9a8594e41e8
> > > > > >> >> > -rw-------   1 hadoop supergroup       5409 2013-09-21 10:10 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f432846182714b93a1c3df0f5835c09b
> > > > > >> >> > -rw-------   1 hadoop supergroup        491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f7d8669cf7a047b98c1d3b13c16cfaec
> > > > > >> >> > -rw-------   1 hadoop supergroup        491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fa1b8f6cc9584eb28365dcd8f10d3f0a
> > > > > >> >> > -rw-------   1 hadoop supergroup        491 2013-09-13 11:28 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > On Tue, Sep 24, 2013 at 11:14 AM, Jean-Marc Spaggiari <
> > > > > >> >> > [email protected]> wrote:
> > > > > >> >> >
> > > > > >> >> > > TTL seems to be fine.
> > > > > >> >> > >
> > > > > >> >> > > -1 is the default value for TimeRangeTracker.maximumTimestamp.
> > > > > >> >> > >
> > > > > >> >> > > Can you run:
> > > > > >> >> > > hadoop fs -lsr hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/
> > > > > >> >> > >
> > > > > >> >> > > Thanks,
> > > > > >> >> > >
> > > > > >> >> > > JM
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > > 2013/9/24 Tom Brown <[email protected]>
> > > > > >> >> > >
> > > > > >> >> > > > 1. Hadoop version is 1.1.2.
> > > > > >> >> > > > 2. All servers are synched with NTP.
> > > > > >> >> > > > 3. Table definition is: 'compound0', {
> > > > > >> >> > > > NAME => 'd',
> > > > > >> >> > > > DATA_BLOCK_ENCODING => 'NONE',
> > > > > >> >> > > > BLOOMFILTER => 'ROW',
> > > > > >> >> > > > REPLICATION_SCOPE => '0',
> > > > > >> >> > > > VERSIONS => '1',
> > > > > >> >> > > > COMPRESSION => 'SNAPPY',
> > > > > >> >> > > > MIN_VERSIONS => '0',
> > > > > >> >> > > > TTL => '8640000',
> > > > > >> >> > > > KEEP_DELETED_CELLS => 'false',
> > > > > >> >> > > > BLOCKSIZE => '65536',
> > > > > >> >> > > > IN_MEMORY => 'false',
> > > > > >> >> > > > ENCODE_ON_DISK => 'true',
> > > > > >> >> > > > BLOCKCACHE => 'true'
> > > > > >> >> > > > }
> > > > > >> >> > > >
> > > > > >> >> > > > The TTL is supposed to be 100 days.
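The TTL in the definition above is in seconds, so the "100 days" claim can be checked with a line of arithmetic (a trivial helper, not HBase code):

```java
// Sanity check: HBase column-family TTLs are in seconds.
public class TtlCheck {
    static long secondsToDays(long seconds) {
        return seconds / (24L * 60 * 60); // 86400 seconds per day
    }

    public static void main(String[] args) {
        System.out.println(secondsToDays(8_640_000L)); // prints 100
    }
}
```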
> > > > > >> >> > > >
> > > > > >> >> > > > --Tom
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > > > On Tue, Sep 24, 2013 at 10:53 AM, Jean-Marc Spaggiari <
> > > > > >> >> > > > [email protected]> wrote:
> > > > > >> >> > > >
> > > > > >> >> > > > > Another important piece of information that might be the
> > > > > >> >> > > > > root cause of this issue...
> > > > > >> >> > > > >
> > > > > >> >> > > > > Do you have any TTL defined for this table?
> > > > > >> >> > > > >
> > > > > >> >> > > > > JM
> > > > > >> >> > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > > > 2013/9/24 Jean-Marc Spaggiari <
> [email protected]
> > >
> > > > > >> >> > > > >
> > > > > >> >> > > > > > Strange.
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > Few questions then.
> > > > > >> >> > > > > > 1) What is your hadoop version?
> > > > > >> >> > > > > > 2) Are the clocks on all your servers synched with NTP?
> > > > > >> >> > > > > > 3) What is your table definition? Bloom filters, etc.?
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > This is the reason why it keeps compacting:
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > 2013-09-24 10:04:00,548 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/7426f128469242ec8ee09f3965fd5a1a whose maxTimeStamp is -1 while the max expired timestamp is 1371398640548
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > maxTimeStamp = -1
> > > > > >> >> > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > Each time there is a comparison between the maxTimeStamp
> > > > > >> >> > > > > > for this store file and the configured
> > > > > >> >> > > > > > maxExpiredTimeStamp, and since maxTimeStamp returns -1,
> > > > > >> >> > > > > > it's always elected for a compaction. Now, we need to
> > > > > >> >> > > > > > find out why...
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > JM
> > > > > >> >> > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > 2013/9/24 Tom Brown <[email protected]>
> > > > > >> >> > > > > >
> > > > > >> >> > > > > >> My cluster is fully distributed (2 regionserver
> > > nodes).
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >> Here is a snippet of log entries that may explain why
> > > > > >> >> > > > > >> it started: http://pastebin.com/wQECif8k. I had to go
> > > > > >> >> > > > > >> back 2 days to find when it started for this region.
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >> This is not the only region experiencing this issue
> > > > > >> >> > > > > >> (but this is the smallest one it's happened to).
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >> --Tom
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >> On Tue, Sep 24, 2013 at 10:13 AM, Jean-Marc
> > Spaggiari
> > > <
> > > > > >> >> > > > > >> [email protected]> wrote:
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >> > Can you paste the logs from a bit before that, to
> > > > > >> >> > > > > >> > see if anything triggered the compaction? Before the
> > > > > >> >> > > > > >> > 1M compaction entries.
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> > Also, what is your setup? Are you running in
> > > > > >> >> > > > > >> > Standalone? Pseudo-Dist? Fully-Dist?
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> > Thanks,
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> > JM
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> > 2013/9/24 Tom Brown <[email protected]>
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >> > > There is one column family, d. Each row has about
> > > > > >> >> > > > > >> > > 10 columns, and each row's total data size is less
> > > > > >> >> > > > > >> > > than 2K.
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> > > Here is a small snippet of logs from the
> region
> > > > > server:
> > > > > >> >> > > > > >> > > http://pastebin.com/S2jE4ZAx
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> > > --Tom
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> > > On Tue, Sep 24, 2013 at 9:59 AM, Bharath
> > > > Vissapragada
> > > > > <
> > > > > >> >> > > > > >> > > [email protected]
> > > > > >> >> > > > > >> > > > wrote:
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> > > > It would help if you can show your RS log (via
> > > > > >> >> > > > > >> > > > pastebin?). Are there frequent flushes for this
> > > > > >> >> > > > > >> > > > region too?
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > > On Tue, Sep 24, 2013 at 9:20 PM, Tom Brown <
> > > > > >> >> > > > [email protected]>
> > > > > >> >> > > > > >> > wrote:
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > > > I have a region that is very small, only 5MB.
> > > > > >> >> > > > > >> > > > > Despite its size, it has 24 store files. The
> > > > > >> >> > > > > >> > > > > logs show that it's compacting (over and over
> > > > > >> >> > > > > >> > > > > again).
> > > > > >> >> > > > > >> > > > >
> > > > > >> >> > > > > >> > > > > The odd thing is that even though there are 24
> > > > > >> >> > > > > >> > > > > store files, it only does one at a time. Even
> > > > > >> >> > > > > >> > > > > more strange is that my logs are filling up
> > > > > >> >> > > > > >> > > > > with compactions of this one region. In the
> > > > > >> >> > > > > >> > > > > last 10 hours, there have been 1,876,200 log
> > > > > >> >> > > > > >> > > > > entries corresponding to compacting this
> > > > > >> >> > > > > >> > > > > region alone.
> > > > > >> >> > > > > >> > > > >
> > > > > >> >> > > > > >> > > > > My cluster is 0.94.10, using almost all
> > > > > >> >> > > > > >> > > > > default settings. Only a few are not default:
> > > > > >> >> > > > > >> > > > > hbase.hregion.max.filesize = 4294967296
> > > > > >> >> > > > > >> > > > > hbase.hstore.compaction.min = 6
> > > > > >> >> > > > > >> > > > >
> > > > > >> >> > > > > >> > > > > I am at a total loss as to why this behavior
> > > > > >> >> > > > > >> > > > > is occurring. Any help is appreciated.
> > > > > >> >> > > > > >> > > > >
> > > > > >> >> > > > > >> > > > > --Tom
> > > > > >> >> > > > > >> > > > >
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > > > --
> > > > > >> >> > > > > >> > > > Bharath Vissapragada
> > > > > >> >> > > > > >> > > > <http://www.cloudera.com>
> > > > > >> >> > > > > >> > > >
> > > > > >> >> > > > > >> > >
> > > > > >> >> > > > > >> >
> > > > > >> >> > > > > >>
> > > > > >> >> > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
