Same thing in pastebin: http://pastebin.com/tApr5CDX
On Tue, Sep 24, 2013 at 11:18 AM, Tom Brown <[email protected]> wrote:

> -rw------- 1 hadoop supergroup    2194 2013-09-21 14:32 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1
> -rw------- 1 hadoop supergroup   31321 2013-09-24 05:49 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1305d625bd4a4be39a98ae4d91a66140
> -rw------- 1 hadoop supergroup    1350 2013-09-24 10:31 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1352e0828f974f08b1f3d7a9dff04abd
> -rw------- 1 hadoop supergroup    4194 2013-09-21 10:38 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/17a546064bd840619816809ae0fc4c49
> -rw------- 1 hadoop supergroup    1061 2013-09-20 22:55 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1cb3df115da244288bd076968ab4ccf6
> -rw------- 1 hadoop supergroup    1375 2013-08-24 10:17 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/1e41a96c49fc4e5ab59392d26935978d
> -rw------- 1 hadoop supergroup   96296 2013-08-26 15:48 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/22d72fd897e34424b5420a96483a571e
> -rw------- 1 hadoop supergroup    1356 2013-08-26 15:23 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/25fee1ffadbe42549bd0b7b13d782b72
> -rw------- 1 hadoop supergroup    6229 2013-09-21 11:14 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/26289c777ec14dc5b7021b4d6b1050c5
> -rw------- 1 hadoop supergroup    1223 2013-09-21 02:42 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2757d7ba9c8448d6a3d5d46bd4d59758
> -rw------- 1 hadoop supergroup 5302248 2013-08-24 02:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/2ec40943787246ea983608dd6591db24
> -rw------- 1 hadoop supergroup    1596 2013-08-24 03:37 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3157fd1cabe4483aaa4d9a21f75e4d88
> -rw------- 1 hadoop supergroup    1338 2013-09-22 04:25 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/36b0f80a4a7b492f97358b64d879a2df
> -rw------- 1 hadoop supergroup    3264 2013-09-21 12:05 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/39e249fcb532400daed73aed6689ceeb
> -rw------- 1 hadoop supergroup    4549 2013-09-21 08:56 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/3bc9e2a566ad460a9b0ed336b2fb5ed9
> -rw------- 1 hadoop supergroup    1630 2013-09-22 03:22 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/48026d08aae748f08aad59e4eea903be
> -rw------- 1 hadoop supergroup  105395 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/53198825f085401cbbd4322faa0e3aae
> -rw------- 1 hadoop supergroup    3859 2013-09-21 09:09 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/71c2f9b2a8ff4c049fcc5a9a22af5cfe
> -rw------- 1 hadoop supergroup  311688 2013-09-20 21:12 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/97ff16d6da974c30835c6e0acc7c737a
> -rw------- 1 hadoop supergroup    1897 2013-08-24 08:43 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/a172d7577641434d82abcce88a433213
> -rw------- 1 hadoop supergroup    3380 2013-09-21 13:04 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/be678e5c60534c65a012a798fbc7e284
> -rw------- 1 hadoop supergroup   43710 2013-09-22 02:15 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/e2508a23acf1491f9d38b9a8594e41e8
> -rw------- 1 hadoop supergroup    5409 2013-09-21 10:10 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f432846182714b93a1c3df0f5835c09b
> -rw------- 1 hadoop supergroup     491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/f7d8669cf7a047b98c1d3b13c16cfaec
> -rw------- 1 hadoop supergroup     491 2013-09-24 11:18 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fa1b8f6cc9584eb28365dcd8f10d3f0a
> -rw------- 1 hadoop supergroup     491 2013-09-13 11:28 /hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/fca0882dc7624342a8f4fce4b89420ff
>
> On Tue, Sep 24, 2013 at 11:14 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>
>> TTL seems to be fine.
>>
>> -1 is the default value for TimeRangeTracker.maximumTimestamp.
>> Can you run:
>> hadoop fs -lsr hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/
>>
>> Thanks,
>>
>> JM
>>
>> 2013/9/24 Tom Brown <[email protected]>
>>
>>> 1. Hadoop version is 1.1.2.
>>> 2. All servers are synched with NTP.
>>> 3. Table definition is:
>>>
>>> 'compound0', {
>>>   NAME => 'd',
>>>   DATA_BLOCK_ENCODING => 'NONE',
>>>   BLOOMFILTER => 'ROW',
>>>   REPLICATION_SCOPE => '0',
>>>   VERSIONS => '1',
>>>   COMPRESSION => 'SNAPPY',
>>>   MIN_VERSIONS => '0',
>>>   TTL => '8640000',
>>>   KEEP_DELETED_CELLS => 'false',
>>>   BLOCKSIZE => '65536',
>>>   IN_MEMORY => 'false',
>>>   ENCODE_ON_DISK => 'true',
>>>   BLOCKCACHE => 'true'
>>> }
>>>
>>> The TTL is supposed to be 100 days.
>>>
>>> --Tom
>>>
>>> On Tue, Sep 24, 2013 at 10:53 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>>>
>>>> Another important piece of information which might be the root cause of this issue...
>>>>
>>>> Do you have any TTL defined for this table?
>>>>
>>>> JM
>>>>
>>>> 2013/9/24 Jean-Marc Spaggiari <[email protected]>
>>>>
>>>>> Strange.
>>>>>
>>>>> A few questions then.
>>>>> 1) What is your hadoop version?
>>>>> 2) Is the clock on all your servers synched with NTP?
>>>>> 3) What is your table definition? Bloom filters, etc.?
>>>>> This is the reason why it keeps compacting:
>>>>>
>>>>> 2013-09-24 10:04:00,548 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/7426f128469242ec8ee09f3965fd5a1a whose maxTimeStamp is -1 while the max expired timestamp is 1371398640548
>>>>>
>>>>> maxTimeStamp = -1
>>>>>
>>>>> Each time, the maxTimeStamp of this store file is compared against the configured maxExpiredTimeStamp, and since maxTimeStamp returns -1, the file is always elected for a compaction. Now we need to find out why...
>>>>>
>>>>> JM
>>>>>
>>>>> 2013/9/24 Tom Brown <[email protected]>
>>>>>
>>>>>> My cluster is fully distributed (2 regionserver nodes).
>>>>>>
>>>>>> Here is a snippet of log entries that may explain why it started: http://pastebin.com/wQECif8k. I had to go back 2 days to find when it started for this region.
>>>>>>
>>>>>> This is not the only region experiencing this issue (but it is the smallest one it's happened to).
>>>>>>
>>>>>> --Tom
>>>>>>
>>>>>> On Tue, Sep 24, 2013 at 10:13 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>>>>>>
>>>>>>> Can you paste logs from a bit before that, to see if anything triggered the compaction? Before the 1M compaction entries.
>>>>>>>
>>>>>>> Also, what is your setup? Are you running in Standalone? Pseudo-Dist? Fully-Dist?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> JM
>>>>>>>
>>>>>>> 2013/9/24 Tom Brown <[email protected]>
>>>>>>>
>>>>>>>> There is one column family, d. Each row has about 10 columns, and each row's total data size is less than 2K.
>>>>>>>> Here is a small snippet of logs from the region server: http://pastebin.com/S2jE4ZAx
>>>>>>>>
>>>>>>>> --Tom
>>>>>>>>
>>>>>>>> On Tue, Sep 24, 2013 at 9:59 AM, Bharath Vissapragada <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It would help if you can show your RS log (via pastebin?). Are there frequent flushes for this region too?
>>>>>>>>>
>>>>>>>>> On Tue, Sep 24, 2013 at 9:20 PM, Tom Brown <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I have a region that is very small, only 5MB. Despite its size, it has 24 store files. The logs show that it's compacting (over and over again).
>>>>>>>>>>
>>>>>>>>>> The odd thing is that even though there are 24 store files, it only does one at a time. Even more strange is that my logs are filling up with messages about compacting this one region. In the last 10 hours, there have been 1,876,200 log entries corresponding to compacting this region alone.
>>>>>>>>>>
>>>>>>>>>> My cluster is 0.94.10, using almost all default settings. Only a few are not default:
>>>>>>>>>> hbase.hregion.max.filesize = 4294967296
>>>>>>>>>> hbase.hstore.compaction.min = 6
>>>>>>>>>>
>>>>>>>>>> I am at a total loss as to why this behavior is occurring. Any help is appreciated.
>>>>>>>>>> --Tom
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Bharath Vissapragada
>>>>>>>>> <http://www.cloudera.com>
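[Editor's note] The selection logic JM describes above boils down to a single comparison that a sentinel maxTimeStamp of -1 always satisfies. The sketch below illustrates that comparison in Python; the real check lives in HBase's Java class CompactSelection, and the function and variable names here are illustrative, not HBase's:

```python
def is_expired(max_timestamp_ms, now_ms, ttl_seconds):
    """Simplified sketch of the expired-store-file check: a file whose
    newest cell is older than (now - TTL) contains only expired data."""
    max_expired_timestamp = now_ms - ttl_seconds * 1000
    return max_timestamp_ms < max_expired_timestamp

# The table's TTL is 8640000 seconds, i.e. 100 days, as intended.
ttl_seconds = 8640000
assert ttl_seconds // 86400 == 100

# The "max expired timestamp" in the log line is 1371398640548 ms,
# so "now" at that moment was that value plus the TTL in milliseconds.
now_ms = 1371398640548 + ttl_seconds * 1000

# A sentinel maxTimeStamp of -1 is smaller than any real cutoff,
# so the file is elected for compaction on every pass.
print(is_expired(-1, now_ms, ttl_seconds))             # True
# A file whose newest cell is recent is never elected this way.
print(is_expired(now_ms - 1000, now_ms, ttl_seconds))  # False
```

Since -1 is the uninitialized default for TimeRangeTracker.maximumTimestamp, any store file that never recorded a real maximum timestamp is re-elected on every compaction pass, which matches the endless compaction loop in the logs.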
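[Editor's note] The `hadoop fs -lsr` listing near the top of the thread can be tallied to confirm Tom's description of the region; a quick sketch over the sizes copied from that listing:

```python
# Store file sizes (bytes) copied from the directory listing above.
sizes = [
    2194, 31321, 1350, 4194, 1061, 1375, 96296, 1356, 6229, 1223,
    5302248, 1596, 1338, 3264, 4549, 1630, 105395, 3859, 311688,
    1897, 3380, 43710, 5409, 491, 491, 491,
]
print(len(sizes))  # 26 store files at the time of the listing
print(sum(sizes))  # 5938035 bytes, roughly 5.7 MiB
# One ~5 MB file dominates; most of the rest are tiny, consistent with
# a small region whose fresh flush outputs keep piling up while the
# "expired file" compactions churn without merging them.
```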
