No. It happened in our production environment after running counter increments every 5 minutes for a few weeks. I could try to reproduce it in a test cluster, but that would mean running for weeks as well... I will keep digging and let you know if it happens again and/or if I have more information or insights on the issue.
Thanks.

On Wed, Apr 17, 2013 at 8:18 PM, ramkrishna vasudevan wrote:
> Are there any test cases that try to reproduce your issue?
>
> Regards,
> Ram
>
> On Wed, Apr 17, 2013 at 9:47 PM, ramkrishna vasudevan wrote:
>> There is a hint mechanism available when scanning happens, but I don't think there should be much difference between a scan that happens during a flush and a normal scan.
>>
>> I will look through the code and come back on this.
>>
>> Regards,
>> Ram
>>
>> On Wed, Apr 17, 2013 at 9:40 PM, Amit Sela wrote:
>>> No, no encoding.
>>>
>>> On Wed, Apr 17, 2013 at 6:56 PM, ramkrishna vasudevan wrote:
>>>> @Lars, do you have any suggestions on this?
>>>>
>>>> @Amit, do you have any encoder enabled, like prefix encoding?
>>>> There was one optimization added recently, but it is not in 0.94.2.
>>>>
>>>> Regards,
>>>> Ram
>>>>
>>>> On Wed, Apr 17, 2013 at 5:17 PM, Amit Sela wrote:
>>>>> I scanned this counter with and without a column specification, and everything looks OK now.
>>>>> I have no CPs (coprocessors) on this table.
>>>>> Is there some kind of hint mechanism in HBase's internal scan? It's weird that ScanWildcardColumnTracker.checkColumn says that the column is smaller than the previous column: *imprersions_ALL_2013041617*. There is no "imprersions", only "impressions", and "r" is indeed smaller than "s". Could it be some kind of hint bug? I don't think I know enough about HBase internals to fully understand that...
>>>>>
>>>>> On Wed, Apr 17, 2013 at 1:42 PM, ramkrishna vasudevan wrote:
>>>>>> Hi Amit,
>>>>>>
>>>>>> Checking the code, this is possible when the qualifiers are not sorted. Do you have any CPs in your path that try to play with the KVs?
>>>>>>
>>>>>> It seems to be a very weird thing.
>>>>>> Can you try doing a scan on the KV just before this happens? That will tell you which KVs are present.
>>>>>>
>>>>>> Even now, if you still have the cluster, you can try scanning the region for which the flush happened. That will give us some more info.
>>>>>>
>>>>>> Regards,
>>>>>> Ram
>>>>>>
>>>>>> On Wed, Apr 17, 2013 at 2:36 PM, Amit Sela wrote:
>>>>>>> The cluster runs Hadoop 1.0.4 and HBase 0.94.2.
>>>>>>>
>>>>>>> I have three families in this table: weekly, daily and hourly. Each family has the following qualifiers:
>>>>>>> Weekly - impressions_{countrycode}_{week#}, where the country code is 0, 1 or ALL (an aggregation of both 0 and 1).
>>>>>>> Daily and hourly are the same, but with yyyyMMdd and yyyyMMddhh respectively.
>>>>>>>
>>>>>>> Just before the exception, the regionserver StoreFile logged the following:
>>>>>>>
>>>>>>> 2013-04-16 17:56:06,769 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://hadoop-master.infolinks.com:8000/hbase/URL_COUNTERS/af2760e4d04a9e3025d1fb53bdba8acf/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc: CompoundBloomFilterWriter
>>>>>>> 2013-04-16 17:56:07,331 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://hbase-master-address:8000/hbase/URL_COUNTERS/af2760e4d04a9e3025d1fb53bdba8acf/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc)
>>>>>>> 2013-04-16 17:56:07,331 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=210517246, memsize=39.3m, into tmp file hdfs://hbase-master:8000/hbase/URL_COUNTERS/af2760e4d04a9e3025d1fb53bdba8acf/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc
>>>>>>> 2013-04-16 17:56:07,357 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://hbase-master:8000/hbase/URL_COUNTERS/af2760e4d04a9e3025d1fb53bdba8acf/.tmp/3fa7993dcb294be1bca5e4d7357f4003: CompoundBloomFilterWriter
>>>>>>> 2013-04-16 17:56:07,608 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://hbase-master:8000/hbase/URL_COUNTERS/af2760e4d04a9e3025d1fb53bdba8acf/.tmp/3fa7993dcb294be1bca5e4d7357f4003)
>>>>>>> 2013-04-16 17:56:07,608 [regionserver8041.cacheFlusher] FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server region-server-address,8041,1364993168088: Replay of HLog required. Forcing server shutdown
>>>>>>> DroppedSnapshotException: region: TABLE,ROWKEY,1364317591568.af2760e4d04a9e3025d1fb53bdba8acf.
>>>>>>> ....
>>>>>>> ....
>>>>>>> ...
>>>>>>>
>>>>>>> On Wed, Apr 17, 2013 at 11:47 AM, ramkrishna vasudevan wrote:
>>>>>>>> Seems interesting. Can you tell us which families and qualifiers are in your schema?
>>>>>>>>
>>>>>>>> Are there any other interesting logs before this?
>>>>>>>>
>>>>>>>> BTW, which version of HBase are you running? If we can track it down, we can then file a JIRA if it is a bug.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ram
>>>>>>>>
>>>>>>>> On Wed, Apr 17, 2013 at 2:00 PM, Amit Sela wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I had a regionserver crash during counter increments. Looking at the regionserver log, I saw:
>>>>>>>>>
>>>>>>>>> org.apache.hadoop.hbase.DroppedSnapshotException: region: TABLE_NAME, ROW_KEY...
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1472)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1351)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1292)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
>>>>>>>>>     at java.lang.Thread.run(Thread.java:722)
>>>>>>>>> Caused by: java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into a column actually smaller than the previous column: *QUALIFIER*
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkColumn(ScanWildcardColumnTracker.java:104)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:354)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:362)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:311)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:738)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:673)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.Store.access$400(Store.java:108)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2276)
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1447)
>>>>>>>>>
>>>>>>>>> The strange thing is that the *QUALIFIER* name, as it appears in the log, is misspelled... There is no such qualifier name, and there never was.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Amit.
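The thread states two facts: the scan during a flush throws when a qualifier sorts before the one seen just before it, and "r" is smaller than "s". The sketch below checks both against the qualifier schema Amit described. It is a minimal sketch under several assumptions. Java 8+ is assumed. The model covers a single row in a single family. The class `QualifierOrderCheck` and its methods are hypothetical and are not part of HBase. It models the ordering rule behind the exception; it is not the real `ScanWildcardColumnTracker` code. HBase sorts qualifiers by unsigned lexicographic byte comparison. The thread writes the hourly pattern as `yyyyMMddhh`. In Java's `DateTimeFormatter`, lowercase `hh` is the 12-hour clock, so the sketch assumes `HH`, which matches the logged qualifier ending in hour 17.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical helper for illustration only; this is NOT the HBase source.
// It models, in simplified form, the rule behind "ScanWildcardColumnTracker.
// checkColumn ran into a column actually smaller than the previous column".
final class QualifierOrderCheck {

    // Assumed hourly pattern: uppercase HH is the 24-hour clock, matching the
    // logged qualifier that ends in hour 17 (lowercase hh would yield "05").
    static final DateTimeFormatter HOURLY = DateTimeFormatter.ofPattern("yyyyMMddHH");

    // Builds impressions_{countrycode}_{yyyyMMddHH}; per the thread,
    // countryCode is "0", "1" or "ALL".
    static String hourlyQualifier(String countryCode, LocalDateTime hour) {
        return "impressions_" + countryCode + "_" + HOURLY.format(hour);
    }

    // Unsigned lexicographic byte comparison, the order HBase keeps
    // qualifiers in. Returns <0, 0 or >0 like Comparator.compare.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat each byte as 0..255
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    // Walks the qualifiers of one row in one family in the order a scanner
    // hands them over. An equal qualifier (another version) is accepted;
    // one that sorts before its predecessor raises an IOException.
    static void checkOrder(List<String> qualifiers) throws IOException {
        byte[] previous = null;
        for (String q : qualifiers) {
            byte[] current = q.getBytes(StandardCharsets.UTF_8);
            if (previous != null && compareUnsigned(current, previous) < 0) {
                throw new IOException(
                        "column actually smaller than the previous column: " + q);
            }
            previous = current;
        }
    }
}
```

Correctly spelled qualifiers sort as `impressions_0_...`, `impressions_1_...`, `impressions_ALL_...` because '0' < '1' < 'A' in byte order, so they pass the check. The logged `imprersions_ALL_2013041617` differs from `impressions_ALL_2013041617` at the sixth character. 'r' (0x72) sorts before 's' (0x73), so a scanner that has already emitted the correctly spelled qualifier would fail exactly this check. The sketch only shows why the check fires. It says nothing about how the misspelled bytes ended up in the memstore.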
