Re: Region split caching issue
Thanks for the response! On Thu, Feb 15, 2018 at 11:52 PM, Stack wrote: > On Wed, Feb 14, 2018 at 3:19 PM, Zach York > wrote: > > > Hello, > > > > I have been trying for some time to figure out an issue where splits fail > > due to trying to cache the same key, but different value. > > > Whats the exception look like Mighty Zach? > 2018-01-12 16:45:17,263 INFO [regionserver/ip-172-31-46-118.us-west-2.compute.internal/172.31.46.118:16020-splits-1515775482373] regionserver.SplitRequest: Running rollback/cleanup of failed split of test,9hp9mum6k2r9j19dbvax2s35f0eaoxzb8nqwz6*024566245231038769,1510040627287.6d1496d3c9ac6ebc09b02b7b959e5c5f.; Failed ,16020,1515773679237-daughterOpener=c5ff63db3b59a7c8ebe7491811a28660 java.io.IOException: Failed ,16020,1515773679237-daughterOpener=c5ff63db3b59a7c8ebe7491811a28660 at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.openDaughters(SplitTransactionImpl.java:504) at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsAfterPONR(SplitTransactionImpl.java:597) at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:580) at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82) at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:test=c46e62f66866b1df8d7e18c8551dbec3-3908e0f20ad3410aadf4e4e8809f90f4.6d1496d3c9ac6ebc09b02b7b959e5c5f_131435429 at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:954) at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:829) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:804) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6717) at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.openDaughterRegion(SplitTransactionImpl.java:731) at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl$DaughterOpener.run(SplitTransactionImpl.java:711) ... 1 more Caused by: java.io.IOException: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:test=c46e62f66866b1df8d7e18c8551dbec3-3908e0f20ad3410aadf4e4e8809f90f4.6d1496d3c9ac6ebc09b02b7b959e5c5f_131435429 at org.apache.hadoop.hbase.regionserver.HStore.openStoreFiles(HStore.java:560) at org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:514) at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:277) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5194) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:928) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:925) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more Caused by: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:test=c46e62f66866b1df8d7e18c8551dbec3-3908e0f20ad3410aadf4e4e8809f90f4.6d1496d3c9ac6ebc09b02b7b959e5c5f_131435429 at org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:368) at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:66) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:462) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:271) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:651) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:601) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:190) at org.apache.hadoop.hbase.io.HalfStoreFileReader.getFirstKey(HalfStoreFileReader.java:361) at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:511) at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:528) at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:518) at org.apache.had
Re: Region split caching issue
On Wed, Feb 14, 2018 at 3:19 PM, Zach York wrote: > Hello, > > I have been trying for some time to figure out an issue where splits fail > due to trying to cache the same key, but different value. Whats the exception look like Mighty Zach? ... > What I know/have observed: > This occurs when prefetch on open is true > Both daugher store openers will try to access and cache the first and last > values in the store (HalfStoreFile). [1] [2] [3] > Both daugher store openers will access the same key (the splitkey, I > believe). > The lastkey in one daughter will be the firstkey in the other? > * However they will access two different paths and retrieve two different > values. > Different values? Or is that the same path (the half store files resolve to same backing storefile) but different values somehow? > Then when they try to cache these values, it fails because the other thread > cached it's value and the two values aren't equivalent. [4] > This is past the point of no return for the region split so it tries to > roll forward, but fails. These regions are then corrupted and cannot be > recovered. > > One thing I have notices is that these blocks are all leaf index blocks. I > guess these index blocks would be different if it was in a different file. > This is on HBase 1.3.1. > > It is definitely a concurrency bug as both daughter openers need to try to > get the block from disk before the other caches the block. Why would it be > trying to access two different paths for the same key? Is that normal in a > reference file? > > The two paths are probably the two reference file paths, one for each daughter. That the paths are different should be fine (even if same value); the key the block cache uses is the based off filename. But you seem to be saying the key is the same so ultimately the file and offset we are getting the value with -- used to make the block cache key -- are the same but somehow the value gotten differs? > Does anyone have any idea as to what might be occurring? > If you need more info, I can show you the code path that it is running > through. > > Say more Zach. Perhaps we can help out. Thanks, S > Thanks, > Zach > > [1] > https://github.com/apache/hbase/blob/rel/1.3.1/hbase- > server/src/main/java/org/apache/hadoop/hbase/io/ > HalfStoreFileReader.java#L357 > [2] > https://github.com/apache/hbase/blob/rel/1.3.1/hbase- > server/src/main/java/org/apache/hadoop/hbase/io/hfile/ > HFileReaderV2.java#L454 > [3] > https://github.com/apache/hbase/blob/rel/1.3.1/hbase- > server/src/main/java/org/apache/hadoop/hbase/io/hfile/ > HFileReaderV2.java#L461 > [4] > https://github.com/apache/hbase/blob/rel/1.3.1/hbase- > server/src/main/java/org/apache/hadoop/hbase/io/hfile/ > LruBlockCache.java#L367 >
Region split caching issue
Hello, I have been trying for some time to figure out an issue where splits fail due to trying to cache the same key, but different value. At this point I am making very slow progress so I figured I would try to reach out for help. I will try to explain all that I know/have found out: What I know/have observed: This occurs when prefetch on open is true Both daugher store openers will try to access and cache the first and last values in the store (HalfStoreFile). [1] [2] [3] Both daugher store openers will access the same key (the splitkey, I believe). * However they will access two different paths and retrieve two different values. Then when they try to cache these values, it fails because the other thread cached it's value and the two values aren't equivalent. [4] This is past the point of no return for the region split so it tries to roll forward, but fails. These regions are then corrupted and cannot be recovered. One thing I have notices is that these blocks are all leaf index blocks. I guess these index blocks would be different if it was in a different file. This is on HBase 1.3.1. It is definitely a concurrency bug as both daughter openers need to try to get the block from disk before the other caches the block. Why would it be trying to access two different paths for the same key? Is that normal in a reference file? Does anyone have any idea as to what might be occurring? If you need more info, I can show you the code path that it is running through. Thanks, Zach [1] https://github.com/apache/hbase/blob/rel/1.3.1/hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java#L357 [2] https://github.com/apache/hbase/blob/rel/1.3.1/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java#L454 [3] https://github.com/apache/hbase/blob/rel/1.3.1/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java#L461 [4] https://github.com/apache/hbase/blob/rel/1.3.1/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java#L367