Re: Region split caching issue

2018-02-20 Thread Zach York
Thanks for the response!


On Thu, Feb 15, 2018 at 11:52 PM, Stack  wrote:

> On Wed, Feb 14, 2018 at 3:19 PM, Zach York 
> wrote:
>
> > Hello,
> >
> > I have been trying for some time to figure out an issue where splits fail
> > due to trying to cache the same key, but different value.
>
>
> Whats the exception look like Mighty Zach?
>

2018-01-12 16:45:17,263 INFO
[regionserver/ip-172-31-46-118.us-west-2.compute.internal/172.31.46.118:16020-splits-1515775482373]
regionserver.SplitRequest: Running rollback/cleanup of failed split of
test,9hp9mum6k2r9j19dbvax2s35f0eaoxzb8nqwz6*024566245231038769,1510040627287.6d1496d3c9ac6ebc09b02b7b959e5c5f.;
Failed ,16020,1515773679237-daughterOpener=c5ff63db3b59a7c8ebe7491811a28660
java.io.IOException: Failed
,16020,1515773679237-daughterOpener=c5ff63db3b59a7c8ebe7491811a28660
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.openDaughters(SplitTransactionImpl.java:504)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsAfterPONR(SplitTransactionImpl.java:597)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:580)
at 
org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
at 
org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.io.IOException:
java.lang.RuntimeException: Cached block contents differ, which should
not have 
happened.cacheKey:test=c46e62f66866b1df8d7e18c8551dbec3-3908e0f20ad3410aadf4e4e8809f90f4.6d1496d3c9ac6ebc09b02b7b959e5c5f_131435429
at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:954)
at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:829)
at 
org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:804)
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6717)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.openDaughterRegion(SplitTransactionImpl.java:731)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl$DaughterOpener.run(SplitTransactionImpl.java:711)
... 1 more
Caused by: java.io.IOException: java.lang.RuntimeException: Cached
block contents differ, which should not have
happened.cacheKey:test=c46e62f66866b1df8d7e18c8551dbec3-3908e0f20ad3410aadf4e4e8809f90f4.6d1496d3c9ac6ebc09b02b7b959e5c5f_131435429
at 
org.apache.hadoop.hbase.regionserver.HStore.openStoreFiles(HStore.java:560)
at 
org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:514)
at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:277)
at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5194)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:928)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:925)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: java.lang.RuntimeException: Cached block contents differ,
which should not have
happened.cacheKey:test=c46e62f66866b1df8d7e18c8551dbec3-3908e0f20ad3410aadf4e4e8809f90f4.6d1496d3c9ac6ebc09b02b7b959e5c5f_131435429
at 
org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:368)
at 
org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:66)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:462)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:271)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:651)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:601)
at 
org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:190)
at 
org.apache.hadoop.hbase.io.HalfStoreFileReader.getFirstKey(HalfStoreFileReader.java:361)
at 
org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:511)
at 
org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:528)
at 
org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:518)
at 
org.apache.had

Re: Region split caching issue

2018-02-15 Thread Stack
On Wed, Feb 14, 2018 at 3:19 PM, Zach York 
wrote:

> Hello,
>
> I have been trying for some time to figure out an issue where splits fail
> due to trying to cache the same key, but different value.


Whats the exception look like Mighty Zach?

...


> What I know/have observed:
> This occurs when prefetch on open is true
> Both daugher store openers will try to access and cache the first and last
> values in the store (HalfStoreFile). [1] [2] [3]
> Both daugher store openers will access the same key (the splitkey, I
> believe).
>


The lastkey in one daughter will be the firstkey in the other?




> * However they will access two different paths and retrieve two different
> values.
>

Different values? Or is that the same path (the half store files resolve to
same backing storefile) but different values somehow?



> Then when they try to cache these values, it fails because the other thread
> cached it's value and the two values aren't equivalent. [4]
> This is past the point of no return for the region split so it tries to
> roll forward, but fails. These regions are then corrupted and cannot be
> recovered.
>
> One thing I have notices is that these blocks are all leaf index blocks. I
> guess these index blocks would be different if it was in a different file.
> This is on HBase 1.3.1.
>
> It is definitely a concurrency bug as both daughter openers need to try to
> get the block from disk before the other caches the block. Why would it be
> trying to access two different paths for the same key? Is that normal in a
> reference file?
>
>
The two paths are probably the two reference file paths, one for each
daughter. That the paths are different should be fine (even if same value);
the key the block cache uses is the based off filename.

But you seem to be saying the key is the same so ultimately the file and
offset we are getting the value with -- used to make the block cache key --
are the same but somehow the value gotten differs?



> Does anyone have any idea as to what might be occurring?
> If you need more info, I can show you the code path that it is running
> through.
>
>
Say more Zach. Perhaps we can help out.

Thanks,

S



> Thanks,
> Zach
>
> [1]
> https://github.com/apache/hbase/blob/rel/1.3.1/hbase-
> server/src/main/java/org/apache/hadoop/hbase/io/
> HalfStoreFileReader.java#L357
> [2]
> https://github.com/apache/hbase/blob/rel/1.3.1/hbase-
> server/src/main/java/org/apache/hadoop/hbase/io/hfile/
> HFileReaderV2.java#L454
> [3]
> https://github.com/apache/hbase/blob/rel/1.3.1/hbase-
> server/src/main/java/org/apache/hadoop/hbase/io/hfile/
> HFileReaderV2.java#L461
> [4]
> https://github.com/apache/hbase/blob/rel/1.3.1/hbase-
> server/src/main/java/org/apache/hadoop/hbase/io/hfile/
> LruBlockCache.java#L367
>


Region split caching issue

2018-02-14 Thread Zach York
Hello,

I have been trying for some time to figure out an issue where splits fail
due to trying to cache the same key, but different value. At this point I
am making very slow progress so I figured I would try to reach out for
help. I will try to explain all that I know/have found out:

What I know/have observed:
This occurs when prefetch on open is true
Both daugher store openers will try to access and cache the first and last
values in the store (HalfStoreFile). [1] [2] [3]
Both daugher store openers will access the same key (the splitkey, I
believe).
* However they will access two different paths and retrieve two different
values.
Then when they try to cache these values, it fails because the other thread
cached it's value and the two values aren't equivalent. [4]
This is past the point of no return for the region split so it tries to
roll forward, but fails. These regions are then corrupted and cannot be
recovered.

One thing I have notices is that these blocks are all leaf index blocks. I
guess these index blocks would be different if it was in a different file.
This is on HBase 1.3.1.

It is definitely a concurrency bug as both daughter openers need to try to
get the block from disk before the other caches the block. Why would it be
trying to access two different paths for the same key? Is that normal in a
reference file?

Does anyone have any idea as to what might be occurring?
If you need more info, I can show you the code path that it is running
through.

Thanks,
Zach

[1]
https://github.com/apache/hbase/blob/rel/1.3.1/hbase-server/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java#L357
[2]
https://github.com/apache/hbase/blob/rel/1.3.1/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java#L454
[3]
https://github.com/apache/hbase/blob/rel/1.3.1/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java#L461
[4]
https://github.com/apache/hbase/blob/rel/1.3.1/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java#L367