We get -1 because of this:

    byte[] timerangeBytes = metadataMap.get(TIMERANGE_KEY);
    if (timerangeBytes != null) {
      this.reader.timeRangeTracker = new TimeRangeTracker();
      Writables.copyWritable(timerangeBytes, this.reader.timeRangeTracker);
    }

this.reader.timeRangeTracker returns -1 for the maximumTimestamp value.
So now we need to figure out whether or not it is normal for TIMERANGE_KEY to
be non-null here.
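A minimal, hypothetical sketch (class and method names are mine, not the actual HBase classes) of why a tracker that never saw a cell reports -1, and why the expired-file check then always matches it:

```java
// Hypothetical sketch, NOT the real TimeRangeTracker: illustrates why an
// empty store file ends up reporting -1 as its maximum timestamp.
public class TimeRangeSketch {
    // Mirrors the default: no cell has been recorded yet.
    static final long INITIAL_MAX = -1L;

    long minimumTimestamp = Long.MAX_VALUE;
    long maximumTimestamp = INITIAL_MAX;

    // Called once per cell written; never called for an empty file,
    // so maximumTimestamp stays at -1.
    void includeTimestamp(long ts) {
        if (ts > maximumTimestamp) maximumTimestamp = ts;
        if (ts < minimumTimestamp) minimumTimestamp = ts;
    }

    // The expired-file shape of check: a file is "expired" when its newest
    // cell is older than the cutoff. With maximumTimestamp == -1 this is
    // always true, so the file is elected on every compaction pass.
    boolean isExpired(long maxExpiredTimestamp) {
        return maximumTimestamp < maxExpiredTimestamp;
    }

    public static void main(String[] args) {
        TimeRangeSketch emptyFile = new TimeRangeSketch();
        System.out.println(emptyFile.maximumTimestamp);          // -1
        System.out.println(emptyFile.isExpired(1371398640548L)); // true
    }
}
```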
I have created the same table locally on 0.94.10 with the same attributes
and I'm not facing this issue.
We need to look at the related HFile, but files are rolled VERY quickly, so
it might be difficult to get one. Maybe something like

    hadoop fs -get hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/* .

might help to get the file? Then we can start to look at it and see what
exactly triggers this behaviour.
JM
2013/9/24 Sergey Shelukhin <[email protected]>
> Yeah, I think c3580bdb62d64e42a9eeac50f1c582d2 store file is a good
> example.
> Can you grep for c3580bdb62d64e42a9eeac50f1c582d2 and post the log just to
> be sure? Thanks.
> It looks like an interaction of deleting expired files and
> // Create the writer even if no kv(Empty store file is also ok),
> // because we need record the max seq id for the store file, see
> // HBASE-6059
> in compactor.
> The newly created file is immediately collected the same way and replaced
> by another file, which does not seem like intended behavior, even though
> both pieces of code are technically correct (the empty file is expired, and
> the new file is generally needed).
>
> I filed HBASE-9648
>
>
> On Tue, Sep 24, 2013 at 10:55 AM, Sergey Shelukhin
> <[email protected]>wrote:
>
> > To mitigate, you can change hbase.store.delete.expired.storefile to false
> > on one region server, or for the entire table, and restart that RS.
> > This will hopefully trigger a different compaction.
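For reference, a sketch of what that workaround might look like in hbase-site.xml (the property name comes from the thread; the placement and surrounding structure are illustrative):

```xml
<!-- hbase-site.xml on the affected region server -->
<!-- Workaround from the thread: stop deleting expired store files outright, -->
<!-- so the looping "delete expired file" compaction is no longer triggered. -->
<property>
  <name>hbase.store.delete.expired.storefile</name>
  <value>false</value>
</property>
```

Restart the region server after the change so the setting takes effect.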
> > We'd need to find what the bug is. My gut feeling (which is often known to
> > be wrong) is that it has to do with the selection of one file: probably an
> > invalid check somewhere, or an interaction with the code that ensures at
> > least one file is written to preserve metadata; it might just be cycling
> > through such files.
> >
> >
> > On Tue, Sep 24, 2013 at 10:20 AM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> >> So, looking at the code, this sounds like a bug to me.
> >>
> >> I will try to reproduce it locally. It seems to be related to the
> >> combination of TTL + BLOOM.
> >>
> >> Creating a table for that right now, will keep you posted very shortly.
> >>
> >> JM
> >>
> >>
> >> 2013/9/24 Tom Brown <[email protected]>
> >>
> >> > -rw------- 1 hadoop supergroup 2194 2013-09-21 14:32 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1
> >> > -rw------- 1 hadoop supergroup 31321 2013-09-24 05:49 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1305d625bd4a4be39a98ae4d91a66140
> >> > -rw------- 1 hadoop supergroup 1350 2013-09-24 10:31 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1352e0828f974f08b1f3d7a9dff04abd
> >> > -rw------- 1 hadoop supergroup 4194 2013-09-21 10:38 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/17a546064bd840619816809ae0fc4c49
> >> > -rw------- 1 hadoop supergroup 1061 2013-09-20 22:55 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1cb3df115da244288bd076968ab4ccf6
> >> > -rw------- 1 hadoop supergroup 1375 2013-08-24 10:17 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1e41a96c49fc4e5ab59392d26935978d
> >> > -rw------- 1 hadoop supergroup 96296 2013-08-26 15:48 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/22d72fd897e34424b5420a96483a571e
> >> > -rw------- 1 hadoop supergroup 1356 2013-08-26 15:23 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/25fee1ffadbe42549bd0b7b13d782b72
> >> > -rw------- 1 hadoop supergroup 6229 2013-09-21 11:14 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/26289c777ec14dc5b7021b4d6b1050c5
> >> > -rw------- 1 hadoop supergroup 1223 2013-09-21 02:42 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2757d7ba9c8448d6a3d5d46bd4d59758
> >> > -rw------- 1 hadoop supergroup 5302248 2013-08-24 02:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2ec40943787246ea983608dd6591db24
> >> > -rw------- 1 hadoop supergroup 1596 2013-08-24 03:37 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3157fd1cabe4483aaa4d9a21f75e4d88
> >> > -rw------- 1 hadoop supergroup 1338 2013-09-22 04:25 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/36b0f80a4a7b492f97358b64d879a2df
> >> > -rw------- 1 hadoop supergroup 3264 2013-09-21 12:05 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/39e249fcb532400daed73aed6689ceeb
> >> > -rw------- 1 hadoop supergroup 4549 2013-09-21 08:56 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3bc9e2a566ad460a9b0ed336b2fb5ed9
> >> > -rw------- 1 hadoop supergroup 1630 2013-09-22 03:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/48026d08aae748f08aad59e4eea903be
> >> > -rw------- 1 hadoop supergroup 105395 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/53198825f085401cbbd4322faa0e3aae
> >> > -rw------- 1 hadoop supergroup 3859 2013-09-21 09:09 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/71c2f9b2a8ff4c049fcc5a9a22af5cfe
> >> > -rw------- 1 hadoop supergroup 311688 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/97ff16d6da974c30835c6e0acc7c737a
> >> > -rw------- 1 hadoop supergroup 1897 2013-08-24 08:43 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/a172d7577641434d82abcce88a433213
> >> > -rw------- 1 hadoop supergroup 3380 2013-09-21 13:04 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/be678e5c60534c65a012a798fbc7e284
> >> > -rw------- 1 hadoop supergroup 43710 2013-09-22 02:15 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/e2508a23acf1491f9d38b9a8594e41e8
> >> > -rw------- 1 hadoop supergroup 5409 2013-09-21 10:10 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f432846182714b93a1c3df0f5835c09b
> >> > -rw------- 1 hadoop supergroup 491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f7d8669cf7a047b98c1d3b13c16cfaec
> >> > -rw------- 1 hadoop supergroup 491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fa1b8f6cc9584eb28365dcd8f10d3f0a
> >> > -rw------- 1 hadoop supergroup 491 2013-09-13 11:28 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
> >> >
> >> >
> >> >
> >> > On Tue, Sep 24, 2013 at 11:14 AM, Jean-Marc Spaggiari <
> >> > [email protected]> wrote:
> >> >
> >> > > TTL seems to be fine.
> >> > >
> >> > > -1 is the default value for TimeRangeTracker.maximumTimestamp.
> >> > >
> >> > > Can you run:
> >> > > hadoop fs -lsr hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/
> >> > >
> >> > > Thanks,
> >> > >
> >> > > JM
> >> > >
> >> > >
> >> > > 2013/9/24 Tom Brown <[email protected]>
> >> > >
> >> > > > 1. Hadoop version is 1.1.2.
> >> > > > 2. All servers are synched with NTP.
> >> > > > 3. Table definition is: 'compound0', {
> >> > > > NAME => 'd',
> >> > > > DATA_BLOCK_ENCODING => 'NONE',
> >> > > > BLOOMFILTER => 'ROW',
> >> > > > REPLICATION_SCOPE => '0',
> >> > > > VERSIONS => '1',
> >> > > > COMPRESSION => 'SNAPPY',
> >> > > > MIN_VERSIONS => '0',
> >> > > > TTL => '8640000',
> >> > > > KEEP_DELETED_CELLS => 'false',
> >> > > > BLOCKSIZE => '65536',
> >> > > > IN_MEMORY => 'false',
> >> > > > ENCODE_ON_DISK => 'true',
> >> > > > BLOCKCACHE => 'true'
> >> > > > }
> >> > > >
> >> > > > The TTL is supposed to be 100 days (8640000 s / 86400 s per day = 100).
> >> > > >
> >> > > > --Tom
> >> > > >
> >> > > >
> >> > > > On Tue, Sep 24, 2013 at 10:53 AM, Jean-Marc Spaggiari <
> >> > > > [email protected]> wrote:
> >> > > >
> >> > > > > Another important piece of information that might be the root cause
> >> > > > > of this issue...
> >> > > > >
> >> > > > > Do you have any TTL defined for this table?
> >> > > > >
> >> > > > > JM
> >> > > > >
> >> > > > >
> >> > > > > 2013/9/24 Jean-Marc Spaggiari <[email protected]>
> >> > > > >
> >> > > > > > Strange.
> >> > > > > >
> >> > > > > > Few questions then.
> >> > > > > > 1) What is your hadoop version?
> >> > > > > > 2) Is the clock on all your servers synched with NTP?
> >> > > > > > 3) What is your table definition? Bloom filters, etc.?
> >> > > > > >
> >> > > > > > This is the reason why it keeps compacting:
> >> > > > > >
> >> > > > > > 2013-09-24 10:04:00,548 INFO
> >> > > > > > org.apache.hadoop.hbase.regionserver.compactions.CompactSelection:
> >> > > > > > Deleting the expired store file by compaction:
> >> > > > > > hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/7426f128469242ec8ee09f3965fd5a1a
> >> > > > > > whose maxTimeStamp is -1 while the max expired timestamp is
> >> > > > > > 1371398640548
> >> > > > > >
> >> > > > > > maxTimeStamp = -1
> >> > > > > >
> >> > > > > >
> >> > > > > > Each time, there is a comparison between maxTimeStamp for this
> >> > > > > > store file and the configured maxExpiredTimeStamp, and since
> >> > > > > > maxTimeStamp returns -1, the file is always elected for
> >> > > > > > compaction. Now we need to find out why...
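That comparison can be sketched as follows (method and class names are mine, not HBase's actual CompactSelection API): files whose newest timestamp falls below the expiration cutoff are elected for deletion-by-compaction, so a file stuck at -1 is elected on every pass.

```java
// Hypothetical sketch of the expired-file election, not the real HBase code.
import java.util.ArrayList;
import java.util.List;

public class ExpiredSelectionSketch {
    // Returns the max-timestamps of the files elected as "expired".
    static List<Long> selectExpired(List<Long> maxTimestamps, long maxExpiredTimestamp) {
        List<Long> elected = new ArrayList<>();
        for (long maxTs : maxTimestamps) {
            // Same shape as the check behind the log message:
            // "whose maxTimeStamp is -1 while the max expired timestamp is ..."
            if (maxTs < maxExpiredTimestamp) {
                elected.add(maxTs);
            }
        }
        return elected;
    }

    public static void main(String[] args) {
        long cutoff = 1371398640548L;
        // A healthy file newer than the cutoff, plus the broken -1 file.
        List<Long> files = List.of(cutoff + 1000L, -1L);
        // Only the -1 file is elected, on this pass and every later one.
        System.out.println(selectExpired(files, cutoff)); // [-1]
    }
}
```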
> >> > > > > >
> >> > > > > > JM
> >> > > > > >
> >> > > > > >
> >> > > > > > 2013/9/24 Tom Brown <[email protected]>
> >> > > > > >
> >> > > > > >> My cluster is fully distributed (2 regionserver nodes).
> >> > > > > >>
> >> > > > > >> Here is a snippet of log entries that may explain why it
> >> > > > > >> started: http://pastebin.com/wQECif8k. I had to go back 2 days
> >> > > > > >> to find when it started for this region.
> >> > > > > >>
> >> > > > > >> This is not the only region experiencing this issue (but this
> >> is
> >> > the
> >> > > > > >> smallest one it's happened to).
> >> > > > > >>
> >> > > > > >> --Tom
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> On Tue, Sep 24, 2013 at 10:13 AM, Jean-Marc Spaggiari <
> >> > > > > >> [email protected]> wrote:
> >> > > > > >>
> >> > > > > >> > Can you paste logs from a bit before that, to see if anything
> >> > > > > >> > triggered the compaction? Before the 1M compaction entries.
> >> > > > > >> >
> >> > > > > >> > Also, what is your setup? Are you running in Standalone?
> >> > > > Pseudo-Dist?
> >> > > > > >> > Fully-Dist?
> >> > > > > >> >
> >> > > > > >> > Thanks,
> >> > > > > >> >
> >> > > > > >> > JM
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> > 2013/9/24 Tom Brown <[email protected]>
> >> > > > > >> >
> >> > > > > >> > > There is one column family, d. Each row has about 10
> >> > > > > >> > > columns, and each row's total data size is less than 2K.
> >> > > > > >> > >
> >> > > > > >> > > Here is a small snippet of logs from the region server:
> >> > > > > >> > > http://pastebin.com/S2jE4ZAx
> >> > > > > >> > >
> >> > > > > >> > > --Tom
> >> > > > > >> > >
> >> > > > > >> > >
> >> > > > > >> > > On Tue, Sep 24, 2013 at 9:59 AM, Bharath Vissapragada <
> >> > > > > >> > > [email protected]
> >> > > > > >> > > > wrote:
> >> > > > > >> > >
> >> > > > > >> > > > It would help if you can show your RS log (via pastebin?).
> >> > > > > >> > > > Are there frequent flushes for this region too?
> >> > > > > >> > > >
> >> > > > > >> > > >
> >> > > > > >> > > > On Tue, Sep 24, 2013 at 9:20 PM, Tom Brown <
> >> > > > [email protected]>
> >> > > > > >> > wrote:
> >> > > > > >> > > >
> >> > > > > >> > > > > I have a region that is very small, only 5MB. Despite
> >> > > > > >> > > > > its size, it has 24 store files. The logs show that it's
> >> > > > > >> > > > > compacting (over and over again).
> >> > > > > >> > > > >
> >> > > > > >> > > > > The odd thing is that even though there are 24 store
> >> > > > > >> > > > > files, it only does one at a time. Even stranger, my
> >> > > > > >> > > > > logs are filling up with compaction messages for this
> >> > > > > >> > > > > one region. In the last 10 hours, there have been
> >> > > > > >> > > > > 1,876,200 log entries corresponding to compacting this
> >> > > > > >> > > > > region alone.
> >> > > > > >> > > > >
> >> > > > > >> > > > > My cluster is 0.94.10, using almost all default
> >> > > > > >> > > > > settings. Only a few are not default:
> >> > > > > >> > > > > hbase.hregion.max.filesize = 4294967296
> >> > > > > >> > > > > hbase.hstore.compaction.min = 6
> >> > > > > >> > > > >
> >> > > > > >> > > > > I am at a total loss as to why this behavior is
> >> > > > > >> > > > > occurring. Any help is appreciated.
> >> > > > > >> > > > >
> >> > > > > >> > > > > --Tom
> >> > > > > >> > > > >
> >> > > > > >> > > >
> >> > > > > >> > > >
> >> > > > > >> > > >
> >> > > > > >> > > > --
> >> > > > > >> > > > Bharath Vissapragada
> >> > > > > >> > > > <http://www.cloudera.com>
> >> > > > > >> > > >
> >> > > > > >> > >
> >> > > > > >> >
> >> > > > > >>
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>