To mitigate, you can set hbase.store.delete.expired.storefile to false
on one region server, or for the entire table, and restart that RS.
Hopefully this will trigger a different compaction path.
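For reference, a minimal sketch of what that override might look like in hbase-site.xml on the affected region server (sketch only; the property name is from the suggestion above, and you would need to restart that RS for it to take effect):

```xml
<!-- Sketch: disable expired-store-file deletion during compaction
     selection on this region server as a temporary mitigation. -->
<property>
  <name>hbase.store.delete.expired.storefile</name>
  <value>false</value>
</property>
```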
We'd need to find what the bug is. My gut feeling (which is often wrong)
is that the selection keeps picking the same file, probably due to an
invalid check somewhere, or an interaction with the code that ensures at
least one file is written to preserve metadata; it might just be cycling
through such files.
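A minimal sketch of the suspected failure mode, assuming the expired-file check compares each store file's maxTimeStamp against an expiration cutoff (the names and shape here are illustrative, not the actual HBase 0.94 internals): a file whose maxTimeStamp is the unset sentinel -1 compares as older than any cutoff, so it is elected for compaction on every pass.

```java
// Sketch of the suspected selection bug: a store file whose max timestamp
// was never tracked (sentinel -1) always looks "expired", so it is
// re-selected for compaction on every pass. Illustrative only; not the
// actual CompactSelection code.
import java.util.ArrayList;
import java.util.List;

public class ExpiredSelectionSketch {
    static final long UNSET = -1L;

    // Elect files whose newest cell is older than the expiration cutoff.
    static List<Long> selectExpired(List<Long> maxTimestamps, long maxExpiredTs) {
        List<Long> elected = new ArrayList<>();
        for (long maxTs : maxTimestamps) {
            // Suspected bug: no guard for the -1 sentinel, so an untracked
            // file (maxTs == -1) is always < maxExpiredTs and always elected.
            if (maxTs < maxExpiredTs) {
                elected.add(maxTs);
            }
        }
        return elected;
    }

    public static void main(String[] args) {
        long cutoff = 1371398640548L; // cutoff value taken from the log line quoted in the thread
        List<Long> files = List.of(UNSET, 1379800000000L);
        // The -1 file is elected on every pass; if compaction rewrites it
        // into another -1 file, the region compacts endlessly.
        System.out.println(selectExpired(files, cutoff)); // prints [-1]
    }
}
```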


On Tue, Sep 24, 2013 at 10:20 AM, Jean-Marc Spaggiari <
[email protected]> wrote:

> So. Looking at the code, this, for me, sounds like a bug.
>
> I will try to reproduce it locally. Seems to be related to the combination
> of TTL + BLOOM.
>
> Creating a table for that right now, will keep you posted very shortly.
>
> JM
>
>
> 2013/9/24 Tom Brown <[email protected]>
>
> > -rw-------   1 hadoop supergroup       2194 2013-09-21 14:32 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1
> > -rw-------   1 hadoop supergroup      31321 2013-09-24 05:49 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1305d625bd4a4be39a98ae4d91a66140
> > -rw-------   1 hadoop supergroup       1350 2013-09-24 10:31 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1352e0828f974f08b1f3d7a9dff04abd
> > -rw-------   1 hadoop supergroup       4194 2013-09-21 10:38 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/17a546064bd840619816809ae0fc4c49
> > -rw-------   1 hadoop supergroup       1061 2013-09-20 22:55 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1cb3df115da244288bd076968ab4ccf6
> > -rw-------   1 hadoop supergroup       1375 2013-08-24 10:17 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1e41a96c49fc4e5ab59392d26935978d
> > -rw-------   1 hadoop supergroup      96296 2013-08-26 15:48 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/22d72fd897e34424b5420a96483a571e
> > -rw-------   1 hadoop supergroup       1356 2013-08-26 15:23 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/25fee1ffadbe42549bd0b7b13d782b72
> > -rw-------   1 hadoop supergroup       6229 2013-09-21 11:14 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/26289c777ec14dc5b7021b4d6b1050c5
> > -rw-------   1 hadoop supergroup       1223 2013-09-21 02:42 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2757d7ba9c8448d6a3d5d46bd4d59758
> > -rw-------   1 hadoop supergroup    5302248 2013-08-24 02:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2ec40943787246ea983608dd6591db24
> > -rw-------   1 hadoop supergroup       1596 2013-08-24 03:37 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3157fd1cabe4483aaa4d9a21f75e4d88
> > -rw-------   1 hadoop supergroup       1338 2013-09-22 04:25 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/36b0f80a4a7b492f97358b64d879a2df
> > -rw-------   1 hadoop supergroup       3264 2013-09-21 12:05 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/39e249fcb532400daed73aed6689ceeb
> > -rw-------   1 hadoop supergroup       4549 2013-09-21 08:56 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3bc9e2a566ad460a9b0ed336b2fb5ed9
> > -rw-------   1 hadoop supergroup       1630 2013-09-22 03:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/48026d08aae748f08aad59e4eea903be
> > -rw-------   1 hadoop supergroup     105395 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/53198825f085401cbbd4322faa0e3aae
> > -rw-------   1 hadoop supergroup       3859 2013-09-21 09:09 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/71c2f9b2a8ff4c049fcc5a9a22af5cfe
> > -rw-------   1 hadoop supergroup     311688 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/97ff16d6da974c30835c6e0acc7c737a
> > -rw-------   1 hadoop supergroup       1897 2013-08-24 08:43 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/a172d7577641434d82abcce88a433213
> > -rw-------   1 hadoop supergroup       3380 2013-09-21 13:04 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/be678e5c60534c65a012a798fbc7e284
> > -rw-------   1 hadoop supergroup      43710 2013-09-22 02:15 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/e2508a23acf1491f9d38b9a8594e41e8
> > -rw-------   1 hadoop supergroup       5409 2013-09-21 10:10 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f432846182714b93a1c3df0f5835c09b
> > -rw-------   1 hadoop supergroup        491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f7d8669cf7a047b98c1d3b13c16cfaec
> > -rw-------   1 hadoop supergroup        491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fa1b8f6cc9584eb28365dcd8f10d3f0a
> > -rw-------   1 hadoop supergroup        491 2013-09-13 11:28 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
> >
> >
> >
> > On Tue, Sep 24, 2013 at 11:14 AM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> > > TTL seems to be fine.
> > >
> > > -1 is the default value for TimeRangeTracker.maximumTimestamp.
> > >
> > > Can you run:
> > > hadoop fs -lsr hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/
> > >
> > > Thanks,
> > >
> > > JM
> > >
> > >
> > > 2013/9/24 Tom Brown <[email protected]>
> > >
> > > > 1. Hadoop version is 1.1.2.
> > > > 2. All servers are synched with NTP.
> > > > 3. Table definition is: 'compound0', {
> > > > NAME => 'd',
> > > > DATA_BLOCK_ENCODING => 'NONE',
> > > > BLOOMFILTER => 'ROW',
> > > > REPLICATION_SCOPE => '0',
> > > > VERSIONS => '1',
> > > > COMPRESSION => 'SNAPPY',
> > > > MIN_VERSIONS => '0',
> > > > TTL => '8640000',
> > > > KEEP_DELETED_CELLS => 'false',
> > > > BLOCKSIZE => '65536',
> > > > IN_MEMORY => 'false',
> > > > ENCODE_ON_DISK => 'true',
> > > > BLOCKCACHE => 'true'
> > > > }
> > > >
> > > > The TTL is supposed to be 100 days.
> > > >
> > > > --Tom
> > > >
> > > >
> > > > On Tue, Sep 24, 2013 at 10:53 AM, Jean-Marc Spaggiari <
> > > > [email protected]> wrote:
> > > >
> > > > > Another important piece of information that might be the root cause
> > > > > of this issue...
> > > > >
> > > > > Do you have any TTL defined for this table?
> > > > >
> > > > > JM
> > > > >
> > > > >
> > > > > 2013/9/24 Jean-Marc Spaggiari <[email protected]>
> > > > >
> > > > > > Strange.
> > > > > >
> > > > > > Few questions then.
> > > > > > 1) What is your Hadoop version?
> > > > > > 2) Are the clocks on all your servers synched with NTP?
> > > > > > 3) What is your table definition? Bloom filters, etc.?
> > > > > >
> > > > > > This is the reason why it keeps compacting:
> > > > > >
> > > > > > 2013-09-24 10:04:00,548 INFO
> > > > > > org.apache.hadoop.hbase.regionserver.compactions.CompactSelection:
> > > > > > Deleting the expired store file by compaction:
> > > > > > hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/7426f128469242ec8ee09f3965fd5a1a
> > > > > > whose maxTimeStamp is -1 while the max expired timestamp is
> > > > > > 1371398640548
> > > > > >
> > > > > > maxTimeStamp = -1
> > > > > >
> > > > > >
> > > > > > Each time, the maxTimeStamp for this store file is compared
> > > > > > against the configured maxExpiredTimeStamp, and since maxTimeStamp
> > > > > > returns -1, the file is always elected for compaction. Now we need
> > > > > > to find why...
> > > > > >
> > > > > > JM
> > > > > >
> > > > > >
> > > > > > 2013/9/24 Tom Brown <[email protected]>
> > > > > >
> > > > > >> My cluster is fully distributed (2 regionserver nodes).
> > > > > >>
> > > > > >> Here is a snippet of log entries that may explain why it started:
> > > > > >> http://pastebin.com/wQECif8k. I had to go back 2 days to find
> > > > > >> when it started for this region.
> > > > > >>
> > > > > >> This is not the only region experiencing this issue (but this
> > > > > >> is the smallest one it's happened to).
> > > > > >>
> > > > > >> --Tom
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Sep 24, 2013 at 10:13 AM, Jean-Marc Spaggiari <
> > > > > >> [email protected]> wrote:
> > > > > >>
> > > > > >> > Can you paste logs from a bit before that, to see if anything
> > > > > >> > triggered the compaction? Before the 1M compaction entries.
> > > > > >> >
> > > > > >> > Also, what is your setup? Are you running in Standalone?
> > > > Pseudo-Dist?
> > > > > >> > Fully-Dist?
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> > JM
> > > > > >> >
> > > > > >> >
> > > > > >> > 2013/9/24 Tom Brown <[email protected]>
> > > > > >> >
> > > > > >> > > There is one column family, d. Each row has about 10
> > > > > >> > > columns, and each row's total data size is less than 2K.
> > > > > >> > >
> > > > > >> > > Here is a small snippet of logs from the region server:
> > > > > >> > > http://pastebin.com/S2jE4ZAx
> > > > > >> > >
> > > > > >> > > --Tom
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Tue, Sep 24, 2013 at 9:59 AM, Bharath Vissapragada <
> > > > > >> > > [email protected]
> > > > > >> > > > wrote:
> > > > > >> > >
> > > > > >> > > > It would help if you can show your RS log (via pastebin?).
> > > > > >> > > > Are there frequent flushes for this region too?
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Tue, Sep 24, 2013 at 9:20 PM, Tom Brown <
> > > > [email protected]>
> > > > > >> > wrote:
> > > > > >> > > >
> > > > > >> > > > > I have a region that is very small, only 5MB. Despite its
> > > > > >> > > > > size, it has 24 store files. The logs show that it's
> > > > > >> > > > > compacting (over and over again).
> > > > > >> > > > >
> > > > > >> > > > > The odd thing is that even though there are 24 store
> > > > > >> > > > > files, it only does one at a time. Even more strange is
> > > > > >> > > > > that my logs are filling up with compacting this one
> > > > > >> > > > > region. In the last 10 hours, there have been 1,876,200
> > > > > >> > > > > log entries corresponding to compacting this region alone.
> > > > > >> > > > >
> > > > > >> > > > > My cluster is 0.94.10, using almost all default
> > > > > >> > > > > settings. Only a few are not default:
> > > > > >> > > > > hbase.hregion.max.filesize = 4294967296
> > > > > >> > > > > hbase.hstore.compaction.min = 6
> > > > > >> > > > >
> > > > > >> > > > > I am at a total loss as to why this behavior is
> > > > > >> > > > > occurring. Any help is appreciated.
> > > > > >> > > > >
> > > > > >> > > > > --Tom
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > --
> > > > > >> > > > Bharath Vissapragada
> > > > > >> > > > <http://www.cloudera.com>
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

