[jira] [Commented] (HBASE-15484) Correct the semantic of batch and partial
[ https://issues.apache.org/jira/browse/HBASE-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656439#comment-15656439 ]

Duo Zhang commented on HBASE-15484:
-----------------------------------

setBatch is almost the same as allowPartial, since we do not guarantee that the number of cells returned is always exactly the batch you set. You can use setMaxResultSize and allowPartial instead, I think. The same reasoning applies to caching: you can just use setMaxResultSize to limit the size of the cells returned. And I think the 'limit' option is more useful because the RS can close the scanner once the returned results reach the limit.

> Correct the semantic of batch and partial
> ------------------------------------------
>
>                 Key: HBASE-15484
>                 URL: https://issues.apache.org/jira/browse/HBASE-15484
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 2.0.0
>
>      Attachments: HBASE-15484-v1.patch, HBASE-15484-v2.patch, HBASE-15484-v3.patch, HBASE-15484-v4.patch
>
>
> Follow-up to HBASE-15325. As discussed, the meaning of setBatch and setAllowPartialResults should not be the same; we should not regard setBatch as setAllowPartialResults.
> And isPartial should be defined accurately. (Considering getBatch==MaxInt if we don't setBatch.) If result.rawcells.length is less than the number of cells in the row, isPartial==true; otherwise isPartial==false. So if the user doesn't setAllowPartialResults(true), isPartial should always be false.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
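[Editorial aside] The isPartial rule proposed above (a Result is partial iff it holds fewer cells than its row) can be sketched in plain Java. This is an illustrative model only, not the actual HBase client code: `Result` here is a stand-in class, and the size-based splitting is a simplification of what setMaxResultSize does on the server.

```java
import java.util.ArrayList;
import java.util.List;

public class PartialResultSketch {
    /** Simplified stand-in for an HBase Result: one slice of a single row's cells. */
    static class Result {
        final List<Integer> cells;   // cell sizes in bytes (stand-in for real Cells)
        final boolean isPartial;     // true iff this Result does not hold the whole row
        Result(List<Integer> cells, boolean isPartial) {
            this.cells = cells;
            this.isPartial = isPartial;
        }
    }

    /** Split one row's cells into Results of at most maxResultSize bytes each. */
    static List<Result> splitRow(int[] cellSizes, long maxResultSize, boolean allowPartial) {
        List<Result> out = new ArrayList<>();
        if (!allowPartial) {
            // Without setAllowPartialResults(true), the whole row comes back in one
            // Result and isPartial is always false, per the proposed semantics.
            List<Integer> all = new ArrayList<>();
            for (int s : cellSizes) all.add(s);
            out.add(new Result(all, false));
            return out;
        }
        List<Integer> chunk = new ArrayList<>();
        long size = 0;
        for (int s : cellSizes) {
            if (!chunk.isEmpty() && size + s > maxResultSize) {
                out.add(new Result(chunk, true)); // more cells of this row remain
                chunk = new ArrayList<>();
                size = 0;
            }
            chunk.add(s);
            size += s;
        }
        // Last chunk: partial only if the row was actually split.
        out.add(new Result(chunk, !out.isEmpty()));
        return out;
    }
}
```

A row of three 100-byte cells split under a 150-byte limit yields three partial Results; under a large enough limit it yields a single non-partial Result.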
[jira] [Commented] (HBASE-15484) Correct the semantic of batch and partial
[ https://issues.apache.org/jira/browse/HBASE-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656408#comment-15656408 ]

Anoop Sam John commented on HBASE-15484:
----------------------------------------

Sorry, I missed this one for a long time. Thanks for reviving the discussion. setBatch might still be useful for a paging kind of results presentation? But maybe even then, let users fetch data based on the max result size; that will come to the client result cache anyway, and only the needed data gets displayed. Ya, now we have caching, batch, max result size... too complicated. But caching is still useful?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17073) Increase the max number of buffers in ByteBufferPool
Anoop Sam John created HBASE-17073:
--------------------------------------

             Summary: Increase the max number of buffers in ByteBufferPool
                 Key: HBASE-17073
                 URL: https://issues.apache.org/jira/browse/HBASE-17073
             Project: HBase
          Issue Type: Sub-task
    Affects Versions: 2.0.0
            Reporter: Anoop Sam John
            Assignee: Anoop Sam John
             Fix For: 2.0.0

Before the HBASE-15525 fix, we had variable-sized buffers in our buffer pool; the max size to which one buffer could grow was 2 MB. Now we have changed it to a fixed-size BBPool, with 64 KB as the default size of each buffer. But the max number of BBs allowed in the pool was not changed, i.e. twice the number of handlers. Maybe we should increase it now? To match the old 2 MB capacity, we would need 32 * 2 * handlers. There is no initial #BBs anyway. 2 MB is the default max response size we have, and for write reqs too, with BufferedMutator, 2 MB is the default flush limit. We can make 32 * #handlers the default max #BBs, I believe.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
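[Editorial aside] The sizing argument above reduces to simple arithmetic: 2 MB / 64 KB = 32 buffers to cover one max-sized response, multiplied by the handler count. A small sketch (the constants come from the description; the method names are ours, not ByteBufferPool's API):

```java
public class BufferPoolSizing {
    static final long OLD_MAX_BUFFER_SIZE = 2L * 1024 * 1024; // 2 MB: old per-buffer max, also the default max response size
    static final long NEW_BUFFER_SIZE = 64L * 1024;           // 64 KB: fixed per-buffer size in the new pool

    /** Buffers needed to hold one 2 MB response using 64 KB buffers. */
    static long buffersPerResponse() {
        return OLD_MAX_BUFFER_SIZE / NEW_BUFFER_SIZE; // 32
    }

    /** Proposed default cap: 32 buffers per handler. */
    static long proposedMaxBuffers(int handlerCount) {
        return buffersPerResponse() * handlerCount;
    }

    /** Cap that would exactly match the old "2 MB each, 2 * handlers" capacity. */
    static long oldEquivalentMaxBuffers(int handlerCount) {
        return buffersPerResponse() * 2 * handlerCount;
    }
}
```

With 30 handlers, the proposed default allows 960 buffers (60 MB), half of the 1920 buffers that would match the old worst-case capacity.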
[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656390#comment-15656390 ]

stack commented on HBASE-17072:
-------------------------------

Nice analysis. The header prefetch can save a seek when scans cross hfile block boundaries. The thread local caching has been there a long time. Lets revisit in light of the findings here (Yeah, we don't carry over the header to cache anymore). At a minimum add the [~esteban] suggestion.

> CPU usage starts to climb up to 90-100% when using G1GC
> --------------------------------------------------------
>
>                 Key: HBASE-17072
>                 URL: https://issues.apache.org/jira/browse/HBASE-17072
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance, regionserver
>    Affects Versions: 1.0.0, 1.2.0
>            Reporter: Eiichi Sato
>      Attachments: disable-block-header-cache.patch, mat-threadlocals.png, mat-threads.png, metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg
>
>
> h5. Problem
> CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts to gradually get higher up to nearly 90-100% when using G1GC. We've also run into this problem on CDH 5.7.3 and CDH 5.8.2.
> In our production cluster, it normally takes a few weeks for this to happen after restarting a RS. We reproduced this on our test cluster and attached the results. Please note that, to make it easy to reproduce, we did some "anti-tuning" on a table when running tests.
> In metrics.png, soon after we started running some workloads against a test cluster (CDH 5.8.2) at about 7 p.m., CPU usage of the two RSs started to rise. Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of each RS process around 10:30 a.m. the next day.
> After investigating heapdumps from another occurrence on a test cluster running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of contiguous entries of {{HFileBlock$PrefetchedHeader}}, probably due to primary clustering. This caused more loops in {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU time. What is worse is that the method is called from RPC metrics code, which means even a small amount of per-RPC time soon adds up to a huge amount of CPU time.
> This is very similar to the issue in HBASE-16616, but we have many {{HFileBlock$PrefetchedHeader}} instances, not only {{Counter$IndexHolder}} instances.
> Here are some OQL counts from Eclipse Memory Analyzer (MAT). This shows the number of ThreadLocal instances in the ThreadLocalMap of a single handler thread.
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
>               FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader"
> #=> 10980 instances
> {code}
> {code}
> SELECT *
> FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value
>               FROM OBJECTS 0x4ee380430) obj
> WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder"
> #=> 2052 instances
> {code}
> Although, as described in HBASE-16616, this somewhat seems to be an issue on the G1GC side regarding weakly-reachable objects, we should keep ThreadLocal usage minimal and avoid creating an indefinite number (in this case, one per HFile) of ThreadLocal instances.
> HBASE-16146 removes ThreadLocals from the RPC metrics code. That may solve the issue (I just saw the patch, never tested it at all), but the {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which may cause issues in the future again.
> h5. Our Solution
> We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and fortunately we didn't notice any performance degradation for our production workloads.
> Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are handled randomly in any of the handlers, small Get or small Scan RPCs do not benefit from the caching (See HBASE-10676 and HBASE-11402 for the details).
> Probably, we need to see how much the caching saves reads for large Scan or Get RPCs, and especially for compactions, if we really remove the caching. It's probably better if we can remove ThreadLocals without breaking the current caching behavior.
> FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
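[Editorial aside] The growth pattern reported above — one ThreadLocal instance per HFile reader, each installing an entry in every handler thread's ThreadLocalMap that touches it — can be modeled in plain Java. This is illustrative only: `Reader` is a stand-in for {{HFileBlock}}'s per-reader prefetched-header ThreadLocal, not the real class, and the 33-byte header size is just a small illustrative value.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadLocalGrowthSketch {
    /** Simplified stand-in for a per-HFile reader with its own header cache. */
    static class Reader {
        // A ThreadLocal *instance* per Reader: if a handler thread touches R
        // readers, its ThreadLocalMap accumulates R entries, which must later
        // be expunged one by one (the loops seen in expungeStaleEntries()).
        private final ThreadLocal<byte[]> prefetchedHeader =
            ThreadLocal.withInitial(() -> new byte[33]); // illustrative header size

        byte[] header() { return prefetchedHeader.get(); }
    }

    /** Touch n distinct readers from the current thread; each get() installs one map entry. */
    static int touch(int n) {
        List<Reader> readers = new ArrayList<>(); // keep readers reachable, as open HFiles would be
        int installed = 0;
        for (int i = 0; i < n; i++) {
            Reader r = new Reader();
            readers.add(r);
            if (r.header() != null) installed++; // entry now lives in this thread's ThreadLocalMap
        }
        return installed;
    }
}
```

Touching 10980 readers (the MAT count from the report) leaves 10980 live entries in the calling thread's map, each staying until its ThreadLocal is garbage-collected and the stale slot expunged.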
[jira] [Commented] (HBASE-17071) Do not initialize MemstoreChunkPool when use mslab option is turned off
[ https://issues.apache.org/jira/browse/HBASE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656382#comment-15656382 ]

Yu Li commented on HBASE-17071:
-------------------------------

+1 for the patch; reasonable to move {{USEMSLAB_KEY}} from {{SegmentFactory}} into {{MemStoreLAB}}

> Do not initialize MemstoreChunkPool when use mslab option is turned off
> ------------------------------------------------------------------------
>
>                 Key: HBASE-17071
>                 URL: https://issues.apache.org/jira/browse/HBASE-17071
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>
>      Attachments: HBASE-17071.patch
>
>
> This is a 2.0-only issue, induced by HBASE-16407.
> We are initializing the MSLAB chunk pool along with RS start itself now (to pass it as a HeapMemoryTuneObserver).
> When MSLAB is turned off (i.e. hbase.hregion.memstore.mslab.enabled is configured false) we should not be initializing the MSLAB chunk pool at all. By default the initial chunk count to be created will be 0 anyway, but it is still better to avoid.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
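[Editorial aside] For reference, the switch discussed here is the hbase-site.xml boolean named in the description. A minimal fragment turning MSLAB off (after this fix, the chunk pool should not be initialized at all when this is false):

```xml
<!-- hbase-site.xml -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>false</value>
</property>
```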
[jira] [Commented] (HBASE-17055) Disabling table not getting enabled after clean cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656378#comment-15656378 ]

Stephen Yuan Jiang commented on HBASE-17055:
--------------------------------------------

Thanks, [~sreenivasulureddy]. I will look at the code for this. This is interesting; for some reason SSH does not re-assign the region.

> Disabling table not getting enabled after clean cluster restart.
> -----------------------------------------------------------------
>
>                 Key: HBASE-17055
>                 URL: https://issues.apache.org/jira/browse/HBASE-17055
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 1.3.0
>            Reporter: Y. SREENIVASULU REDDY
>            Assignee: Stephen Yuan Jiang
>             Fix For: 1.3.0
>
>
> Scenario:
> 1. Disable the table; while the disable is still in progress,
> 2. restart the whole HBase service.
> 3. Then enable the table.
> The above operations lead to the region being in RIT continuously.
> Please find the logs below for understanding.
> While disabling the table, the whole HBase service went down. The following is from the master logs:
> {noformat}
> 2016-11-09 19:32:55,102 INFO [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] master.HMaster: Client=seenu//host-1 disable testTable
> 2016-11-09 19:32:55,257 DEBUG [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] procedure2.ProcedureExecutor: Procedure DisableTableProcedure (table=testTable) id=8 owner=seenu state=RUNNABLE:DISABLE_TABLE_PREPARE added to the store.
> 2016-11-09 19:32:55,264 DEBUG [ProcedureExecutor-5] > lock.ZKInterProcessLockBase: Acquired a lock for > /hbase/table-lock/testTable/write-master:165 > 2016-11-09 19:32:55,285 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=8 > 2016-11-09 19:32:55,386 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=8 > 2016-11-09 19:32:55,513 INFO [ProcedureExecutor-5] > zookeeper.ZKTableStateManager: Moving table testTable state from DISABLING to > DISABLING > 2016-11-09 19:32:55,587 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=8 > 2016-11-09 19:32:55,628 INFO [ProcedureExecutor-5] > procedure.DisableTableProcedure: Offlining 1 regions. > . > . > . > . > . > . > . > . > 2016-11-09 19:33:02,871 INFO [AM.ZK.Worker-pool2-t7] master.RegionStates: > Offlined 1890fa9c085dcc2ee0602f4bab069d10 from host-1,16040,1478690163056 > Wed Nov 9 19:33:02 CST 2016 Terminating master > {noformat} > here we need to observe > {color:red} Offlined 1890fa9c085dcc2ee0602f4bab069d10 from > host-1,16040,1478690163056 {color} > then hmaster went down, all regionServers also made down. > After hmaster and regionserver are restarted > executed enable Table operation on the table. > {panel:title=HMaster > Logs|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} > {noformat} > 2016-11-09 19:49:57,059 INFO > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] master.HMaster: > Client=seenu//host-1 enable testTable > 2016-11-09 19:49:57,325 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > procedure2.ProcedureExecutor: Procedure EnableTableProcedure > (table=testTable) id=9 owner=seenu state=RUNNABLE:ENABLE_TABLE_PREPARE added > to the store. 
> 2016-11-09 19:49:57,333 DEBUG [ProcedureExecutor-2] > lock.ZKInterProcessLockBase: Acquired a lock for > /hbase/table-lock/testTable/write-master:168 > 2016-11-09 19:49:57,335 DEBUG [hconnection-0x745317ee-shared--pool3-t11] > ipc.RpcClientImpl: Use SIMPLE authentication for service ClientService, > sasl=false > 2016-11-09 19:49:57,335 DEBUG [hconnection-0x745317ee-shared--pool3-t11] > ipc.RpcClientImpl: Connecting to host-1:16040 > 2016-11-09 19:49:57,347 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=9 > 2016-11-09 19:49:57,449 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=9 > 2016-11-09 19:49:57,579 INFO [ProcedureExecutor-2] > procedure.EnableTableProcedure: Attempting to enable the table testTable > 2016-11-09 19:49:57,580 INFO [ProcedureExecutor-2] > zookeeper.ZKTableStateManager: Moving table testTable state from DISABLED to > ENABLING > 2016-11-09 19:49:57,655 DEBUG > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16000] > master.MasterRpcServices: Checking to see if procedure is done procId=9 > 2016-11-09 19:49:57,707 INFO [ProcedureExecutor-2] > procedure.EnableTableProcedure: Table 'testTable' has 1 regions, of which 1
[jira] [Updated] (HBASE-17071) Do not initialize MemstoreChunkPool when use mslab option is turned off
[ https://issues.apache.org/jira/browse/HBASE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-17071:
-----------------------------------
    Hadoop Flags: Reviewed
          Status: Patch Available (was: Open)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17071) Do not initialize MemstoreChunkPool when use mslab option is turned off
[ https://issues.apache.org/jira/browse/HBASE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656374#comment-15656374 ]

stack commented on HBASE-17071:
-------------------------------

+1

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656376#comment-15656376 ]

Anoop Sam John commented on HBASE-17072:
----------------------------------------

Ya, when blocks are there in the BC, we won't deal with this ThreadLocal variable at all. When we cache the blocks into the BC, do we cache the next block's header also? I think no. Some issue fixed by Stack removed this, I believe.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656372#comment-15656372 ]

Esteban Gutierrez commented on HBASE-17072:
-------------------------------------------

[~sato_eiichi], take a look at HBASE-17017; probably removing the per-region metrics could help.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17056) Remove checked in PB generated files
[ https://issues.apache.org/jira/browse/HBASE-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-17056: -- Attachment: 0002-HBASE-17056-Remove-checked-in-PB-generated-files.patch Example. Does a few modules. Has protos generated on the fly. Removed from hbase-endpoint, hbase-rsgroup, and hbase-spark. > Remove checked in PB generated files > - > > Key: HBASE-17056 > URL: https://issues.apache.org/jira/browse/HBASE-17056 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar > Fix For: 2.0.0 > > Attachments: > 0002-HBASE-17056-Remove-checked-in-PB-generated-files.patch > > > Now that we have the new PB maven plugin, there is no need to have the PB > files checked in to the repo. The reason we did that was to ease up developer > env setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656368#comment-15656368 ]

Esteban Gutierrez commented on HBASE-17072:
-------------------------------------------

Thinking about this problem as described by [~sato_eiichi], I think there is some value in optionally disabling the prefetching of headers for some workloads (lots of regions, very large HFiles, SSDs), and it could be done via an HCD attribute like PREFETCH_BLOCKS_ON_OPEN. However, regarding the CPU usage, I think the counters and our per-region metrics are very expensive in general.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17071) Do not initialize MemstoreChunkPool when use mslab option is turned off
[ https://issues.apache.org/jira/browse/HBASE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John updated HBASE-17071:
-----------------------------------
    Attachment: HBASE-17071.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17047) Add an API to get HBase connection cache statistics
[ https://issues.apache.org/jira/browse/HBASE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656355#comment-15656355 ]

Hadoop QA commented on HBASE-17047:
-----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 10s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} scaladoc {color} | {color:green} 0m 26s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} scalac {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 27m 54s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} scaladoc {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s {color} | {color:green} hbase-spark in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 6s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 6s {color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838514/HBASE-17047_v2.patch |
| JIRA Issue | HBASE-17047 |
| Optional Tests | asflicense scalac scaladoc unit compile |
| uname | Linux 3fd26d956c44 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / f9c6b66 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/4431/testReport/ |
| modules | C: hbase-spark U: hbase-spark |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/4431/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.

> Add an API to get HBase connection cache statistics
> ----------------------------------------------------
>
>                 Key: HBASE-17047
>                 URL: https://issues.apache.org/jira/browse/HBASE-17047
>             Project: HBase
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Weiqing Yang
>            Assignee: Weiqing Yang
>            Priority: Minor
>      Attachments: HBASE-17047_v1.patch, HBASE-17047_v2.patch
>
>
> This patch will add a function "getStat" for the user to get the statistics of the HBase connection cache.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15484) Correct the semantic of batch and partial
[ https://issues.apache.org/jira/browse/HBASE-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656344#comment-15656344 ] Duo Zhang commented on HBASE-15484: --- I think only allowPartial and maxResultSize are needed. And for implementing small scan, we need to add a 'limit' option to scan. > Correct the semantic of batch and partial > - > > Key: HBASE-15484 > URL: https://issues.apache.org/jira/browse/HBASE-15484 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.0, 1.1.3 >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0 > > Attachments: HBASE-15484-v1.patch, HBASE-15484-v2.patch, > HBASE-15484-v3.patch, HBASE-15484-v4.patch > > > Follow-up to HBASE-15325, as discussed, the meaning of setBatch and > setAllowPartialResults should not be the same. We should not regard setBatch as > setAllowPartialResults. > And isPartial should be defined accurately. > (Considering getBatch==MaxInt if we don't setBatch.) If > result.rawCells().length is less than the number of cells in the whole row, isPartial==true, otherwise isPartial == false. So if the user doesn't > setAllowPartialResults(true), isPartial should always be false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
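The model Duo Zhang argues for above (drop batch; keep maxResultSize plus allowPartial, and add a server-side 'limit') can be sketched as follows. This is a minimal illustrative model, not HBase source: cell counts stand in for the byte-based maxResultSize, and all names here are made up for the example.

```java
// Illustration of size-limit/partial semantics: the server returns chunks of a
// row bounded by the limit, and every chunk except the last one of the row is
// flagged partial. maxCellsPerResponse stands in for the byte-based limit.
import java.util.ArrayList;
import java.util.List;

public class PartialChunker {
    /** Returns {cellCount, isPartialFlag} pairs for one row of rowCells cells. */
    public static List<int[]> chunk(int rowCells, int maxCellsPerResponse) {
        List<int[]> chunks = new ArrayList<>();
        int remaining = rowCells;
        while (remaining > 0) {
            int n = Math.min(remaining, maxCellsPerResponse);
            remaining -= n;
            // Partial unless this chunk finishes the row.
            chunks.add(new int[] { n, remaining > 0 ? 1 : 0 });
        }
        return chunks;
    }
}
```

With a 10-cell row and a limit of 4, the row comes back as three chunks of 4, 4, and 2 cells, with only the last one not partial, which matches the "limit size for a large row" use case discussed in the thread.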
[jira] [Commented] (HBASE-17056) Remove checked in PB generated files
[ https://issues.apache.org/jira/browse/HBASE-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656330#comment-15656330 ] stack commented on HBASE-17056: --- If we don't check in the shaded and generated files, then IDEs will report classes as not findable since they are not in our src tree. To get them, you'll need to run a mvn build first. That ok to require of our IDE users? On timing, for the smaller modules where there are five or six protos -- e.g. hbase-endpoint -- then the build time for a 'clean install -DskipTests' goes from 5 seconds to 7 seconds. Acceptable I'd say. On checkStaleness, it is set already but I'd think that rather than generate the protos into our src tree, instead, we'd generate under target dir at pre-compile and then tell the compiler to pick up the protos along w/ src/main/java when it goes to compile. Means that a mvn clean will remove the generated protos. But no protos dirtying our src? (But IDEs will report missing files when you look at endpoints, etc.) > Remove checked in PB generated files > - > > Key: HBASE-17056 > URL: https://issues.apache.org/jira/browse/HBASE-17056 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar > Fix For: 2.0.0 > > > Now that we have the new PB maven plugin, there is no need to have the PB > files checked in to the repo. The reason we did that was to ease up developer > env setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
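The approach stack sketches above (generate protos under the target dir at pre-compile and tell the compiler to pick them up alongside src/main/java) is commonly wired up with the build-helper-maven-plugin. A hedged sketch follows; the output path is an assumption for illustration and would depend on how the proto plugin is configured:

```xml
<!-- Sketch only: attach generated protobuf sources under target/ as a
     compile source root, so `mvn clean` removes them and nothing lands
     in src/. The <source> path below is an illustrative assumption. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>add-generated-protobuf-sources</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <sources>
          <source>${project.build.directory}/generated-sources/protobuf/java</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
```

IDEs that import the Maven model will pick up the extra source root, which addresses the "classes not findable" concern once a first `mvn generate-sources` has run.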
[jira] [Comment Edited] (HBASE-15484) Correct the semantic of batch and partial
[ https://issues.apache.org/jira/browse/HBASE-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656306#comment-15656306 ] Phil Yang edited comment on HBASE-15484 at 11/11/16 6:44 AM: - These days [~Apache9] is doing some work on async scan. It may be time to reconsider this issue? In the current implementation, we consider setBatch and setAllowPartialResults(true) as having the same meaning. My original idea was to distinguish them, but I agree with [~enis] that we can remove batch/cache. Like caching, batching may also be an old-style feature? We have allowPartialResults so we can use this to limit size/time for a large row. What do you think? Thanks. was (Author: yangzhe1991): These days [~Apache9] is doing some work on async scan. It may be time to reconsider this issue? In the current implementation, we consider setBatch and setAllowPartialResults(true) as having the same meaning. Like caching, batching may also be an old-style feature? We have allowPartialResults so we can use this to limit size/time for a large row. Should we distinguish the two methods or remove setBatch in 2.0? What do you think? Thanks. > Correct the semantic of batch and partial > - > > Key: HBASE-15484 > URL: https://issues.apache.org/jira/browse/HBASE-15484 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.0, 1.1.3 >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0 > > Attachments: HBASE-15484-v1.patch, HBASE-15484-v2.patch, > HBASE-15484-v3.patch, HBASE-15484-v4.patch > > > Follow-up to HBASE-15325, as discussed, the meaning of setBatch and > setAllowPartialResults should not be the same. We should not regard setBatch as > setAllowPartialResults. > And isPartial should be defined accurately. > (Considering getBatch==MaxInt if we don't setBatch.) If > result.rawCells().length is less than the number of cells in the whole row, isPartial==true, otherwise isPartial == false. So if the user doesn't > setAllowPartialResults(true), isPartial should always be false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
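The isPartial semantics debated in this issue reduce to a small predicate. A minimal model of the proposed definition, assuming illustrative names (this is not the HBase implementation):

```java
// Model of the proposed isPartial definition: a Result is partial only when
// partial results are allowed AND it carries fewer cells than the whole row.
public class IsPartialSemantics {
    public static boolean isPartial(boolean allowPartialResults,
                                    int cellsInResult, int cellsInWholeRow) {
        // Without setAllowPartialResults(true), the client reassembles rows,
        // so a Result always carries the whole row and is never partial.
        if (!allowPartialResults) {
            return false;
        }
        // With partials allowed, a Result is partial exactly when it holds
        // fewer cells than the whole row it belongs to.
        return cellsInResult < cellsInWholeRow;
    }
}
```

This captures the last sentence of the quoted description: if the user never calls setAllowPartialResults(true), isPartial is always false regardless of batch.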
[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656321#comment-15656321 ] ramkrishna.s.vasudevan commented on HBASE-17072: Maybe this is more helpful when the blocks are not cached rather than when they are cached? > CPU usage starts to climb up to 90-100% when using G1GC > --- > > Key: HBASE-17072 > URL: https://issues.apache.org/jira/browse/HBASE-17072 > Project: HBase > Issue Type: Bug > Components: Performance, regionserver >Affects Versions: 1.0.0, 1.2.0 >Reporter: Eiichi Sato > Attachments: disable-block-header-cache.patch, mat-threadlocals.png, > mat-threads.png, metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg > > > h5. Problem > CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts > to gradually get higher up to nearly 90-100% when using G1GC. We've also run > into this problem on CDH 5.7.3 and CDH 5.8.2. > In our production cluster, it normally takes a few weeks for this to happen > after restarting a RS. We reproduced this on our test cluster and attached > the results. Please note that, to make it easy to reproduce, we did some > "anti-tuning" on a table when running tests. > In metrics.png, soon after we started running some workloads against a test > cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. > Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of > each RS process around 10:30 a.m. the next day. > After investigating heapdumps from another occurrence on a test cluster > running CDH 5.7.3, we found that the ThreadLocalMap contain a lot of > contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary > clustering. This caused more loops in > {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU > time. 
What is worse is that the method is called from RPC metrics code, > which means even a small amount of per-RPC time soon adds up to a huge amount > of CPU time. > This is very similar to the issue in HBASE-16616, but we have many > {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances. > Here are some OQL counts from Eclipse Memory Analyzer (MAT). This shows a > number of ThreadLocal instances in the ThreadLocalMap of a single handler > thread. > {code} > SELECT * > FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value > FROM OBJECTS 0x4ee380430) obj > WHERE obj.@clazz.@name = > "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader" > #=> 10980 instances > {code} > {code} > SELECT * > FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value > FROM OBJECTS 0x4ee380430) obj > WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder" > #=> 2052 instances > {code} > Although as described in HBASE-16616 this somewhat seems to be an issue in > G1GC side regarding weakly-reachable objects, we should keep ThreadLocal > usage minimal and avoid creating an indefinite number (in this case, a number > of HFiles) of ThreadLocal instances. > HBASE-16146 removes ThreadLocals from the RPC metrics code. That may solve > the issue (I just saw the patch, never tested it at all), but the > {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which > may cause issues in the future again. > h5. Our Solution > We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and > fortunately we didn't notice any performance degradation for our production > workloads. > Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are > handled randomly in any of the handlers, small Get or small Scan RPCs do not > benefit from the caching (See HBASE-10676 and HBASE-11402 for the details). 
> Probably, we need to see how well reads are saved by the caching for large > Scan or Get RPCs and especially for compactions if we really remove the > caching. It's probably better if we can remove ThreadLocals without breaking > the current caching behavior. > FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
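The root cause described above is one ThreadLocal allocated per HFile, so each handler thread's ThreadLocalMap accumulates an entry for every file it ever touched. The attached patch simply removes the caching; purely as an illustration of a bounded alternative (an assumption of this sketch, not what the patch does), a single shared ThreadLocal holding a weak-keyed map caps the per-thread ThreadLocalMap at one entry total:

```java
// Illustration only: one ThreadLocal for the whole process instead of one per
// owner object. WeakHashMap lets entries for garbage-collected owners (e.g.
// closed HFile readers) disappear without growing the ThreadLocalMap.
import java.util.Map;
import java.util.WeakHashMap;

public class PerOwnerThreadCache {
    private static final ThreadLocal<Map<Object, byte[]>> CACHE =
        ThreadLocal.withInitial(WeakHashMap::new);

    /** Returns this thread's cached value for owner, or null if none. */
    public static byte[] get(Object owner) {
        return CACHE.get().get(owner);
    }

    /** Caches a value for owner, visible only to the current thread. */
    public static void put(Object owner, byte[] header) {
        CACHE.get().put(owner, header);
    }
}
```

Note this still only helps when the same thread revisits the same owner, which is exactly the weakness the reporter identifies: RPCs land on random handlers, so small Gets and Scans rarely hit the cache anyway.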
[jira] [Commented] (HBASE-15484) Correct the semantic of batch and partial
[ https://issues.apache.org/jira/browse/HBASE-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656313#comment-15656313 ] Hadoop QA commented on HBASE-15484: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s {color} | {color:red} HBASE-15484 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12797329/HBASE-15484-v4.patch | | JIRA Issue | HBASE-15484 | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/4432/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Correct the semantic of batch and partial > - > > Key: HBASE-15484 > URL: https://issues.apache.org/jira/browse/HBASE-15484 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.0, 1.1.3 >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0 > > Attachments: HBASE-15484-v1.patch, HBASE-15484-v2.patch, > HBASE-15484-v3.patch, HBASE-15484-v4.patch > > > Follow-up to HBASE-15325, as discussed, the meaning of setBatch and > setAllowPartialResults should not be the same. We should not regard setBatch as > setAllowPartialResults. > And isPartial should be defined accurately. > (Considering getBatch==MaxInt if we don't setBatch.) If > result.rawCells().length is less than the number of cells in the whole row, isPartial==true, otherwise isPartial == false. So if the user doesn't > setAllowPartialResults(true), isPartial should always be false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17071) Do not initialize MemstoreChunkPool when use mslab option is turned off
[ https://issues.apache.org/jira/browse/HBASE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656311#comment-15656311 ] Yu Li commented on HBASE-17071: --- +1 for the idea, makes sense. > Do not initialize MemstoreChunkPool when use mslab option is turned off > --- > > Key: HBASE-17071 > URL: https://issues.apache.org/jira/browse/HBASE-17071 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > > This is a 2.0 only issue and induced by HBASE-16407. > We are initializing MSLAB chunk pool along with RS start itself now. (To pass > it as a HeapMemoryTuneObserver). > When MSLAB is turned off (ie. hbase.hregion.memstore.mslab.enabled is > configured false) we should not be initializing MSLAB chunk pool at all. By > default the initial chunk count to be created will be 0 only. Still better > to avoid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15484) Correct the semantic of batch and partial
[ https://issues.apache.org/jira/browse/HBASE-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656306#comment-15656306 ] Phil Yang commented on HBASE-15484: --- These days [~Apache9] is doing some work on async scan. It may be time to reconsider this issue? In the current implementation, we consider setBatch and setAllowPartialResults(true) as having the same meaning. Like caching, batching may also be an old-style feature? We have allowPartialResults so we can use this to limit size/time for a large row. Should we distinguish the two methods or remove setBatch in 2.0? What do you think? Thanks. > Correct the semantic of batch and partial > - > > Key: HBASE-15484 > URL: https://issues.apache.org/jira/browse/HBASE-15484 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.0, 1.1.3 >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0 > > Attachments: HBASE-15484-v1.patch, HBASE-15484-v2.patch, > HBASE-15484-v3.patch, HBASE-15484-v4.patch > > > Follow-up to HBASE-15325, as discussed, the meaning of setBatch and > setAllowPartialResults should not be the same. We should not regard setBatch as > setAllowPartialResults. > And isPartial should be defined accurately. > (Considering getBatch==MaxInt if we don't setBatch.) If > result.rawCells().length is less than the number of cells in the whole row, isPartial==true, otherwise isPartial == false. So if the user doesn't > setAllowPartialResults(true), isPartial should always be false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656297#comment-15656297 ] Anoop Sam John commented on HBASE-17072: So the prefetching of the new block's header is not useful at all as per the patch. Ya in case of compaction it would have been beneficial at least. I agree to the point of handling of RPCs by the random handler. Large scans might be impacted. Single RPC next itself touching more than one block. Nice digging in.. > CPU usage starts to climb up to 90-100% when using G1GC > --- > > Key: HBASE-17072 > URL: https://issues.apache.org/jira/browse/HBASE-17072 > Project: HBase > Issue Type: Bug > Components: Performance, regionserver >Affects Versions: 1.0.0, 1.2.0 >Reporter: Eiichi Sato > Attachments: disable-block-header-cache.patch, mat-threadlocals.png, > mat-threads.png, metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg > > > h5. Problem > CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts > to gradually get higher up to nearly 90-100% when using G1GC. We've also run > into this problem on CDH 5.7.3 and CDH 5.8.2. > In our production cluster, it normally takes a few weeks for this to happen > after restarting a RS. We reproduced this on our test cluster and attached > the results. Please note that, to make it easy to reproduce, we did some > "anti-tuning" on a table when running tests. > In metrics.png, soon after we started running some workloads against a test > cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. > Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of > each RS process around 10:30 a.m. the next day. > After investigating heapdumps from another occurrence on a test cluster > running CDH 5.7.3, we found that the ThreadLocalMap contain a lot of > contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary > clustering. 
This caused more loops in > {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU > time. What is worse is that the method is called from RPC metrics code, > which means even a small amount of per-RPC time soon adds up to a huge amount > of CPU time. > This is very similar to the issue in HBASE-16616, but we have many > {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances. > Here are some OQL counts from Eclipse Memory Analyzer (MAT). This shows a > number of ThreadLocal instances in the ThreadLocalMap of a single handler > thread. > {code} > SELECT * > FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value > FROM OBJECTS 0x4ee380430) obj > WHERE obj.@clazz.@name = > "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader" > #=> 10980 instances > {code} > {code} > SELECT * > FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value > FROM OBJECTS 0x4ee380430) obj > WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder" > #=> 2052 instances > {code} > Although as described in HBASE-16616 this somewhat seems to be an issue in > G1GC side regarding weakly-reachable objects, we should keep ThreadLocal > usage minimal and avoid creating an indefinite number (in this case, a number > of HFiles) of ThreadLocal instances. > HBASE-16146 removes ThreadLocals from the RPC metrics code. That may solve > the issue (I just saw the patch, never tested it at all), but the > {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which > may cause issues in the future again. > h5. Our Solution > We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and > fortunately we didn't notice any performance degradation for our production > workloads. > Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are > handled randomly in any of the handlers, small Get or small Scan RPCs do not > benefit from the caching (See HBASE-10676 and HBASE-11402 for the details). 
> Probably, we need to see how well reads are saved by the caching for large > Scan or Get RPCs and especially for compactions if we really remove the > caching. It's probably better if we can remove ThreadLocals without breaking > the current caching behavior. > FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17047) Add an API to get HBase connection cache statistics
[ https://issues.apache.org/jira/browse/HBASE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656291#comment-15656291 ] Weiqing Yang commented on HBASE-17047: -- Thanks for the review, [~stack] > Add an API to get HBase connection cache statistics > --- > > Key: HBASE-17047 > URL: https://issues.apache.org/jira/browse/HBASE-17047 > Project: HBase > Issue Type: Improvement > Components: spark >Reporter: Weiqing Yang >Assignee: Weiqing Yang >Priority: Minor > Attachments: HBASE-17047_v1.patch, HBASE-17047_v2.patch > > > This patch will add a function "getStat" for the user to get the statistics > of the HBase connection cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17047) Add an API to get HBase connection cache statistics
[ https://issues.apache.org/jira/browse/HBASE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656290#comment-15656290 ] Weiqing Yang commented on HBASE-17047: -- The second patch is to fix the scaladoc warning "warning: Could not find any member to link for 'HBaseConnectionCache'". > Add an API to get HBase connection cache statistics > --- > > Key: HBASE-17047 > URL: https://issues.apache.org/jira/browse/HBASE-17047 > Project: HBase > Issue Type: Improvement > Components: spark >Reporter: Weiqing Yang >Assignee: Weiqing Yang >Priority: Minor > Attachments: HBASE-17047_v1.patch, HBASE-17047_v2.patch > > > This patch will add a function "getStat" for the user to get the statistics > of the HBase connection cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17020) keylen in midkey() dont computed correctly
[ https://issues.apache.org/jira/browse/HBASE-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656282#comment-15656282 ] Hudson commented on HBASE-17020: SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #69 (See [https://builds.apache.org/job/HBase-1.2-JDK7/69/]) HBASE-17020 keylen in midkey() dont computed correctly (liyu: rev bf9614f72e3104ec0110ed018fb0b6d0174c6366) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java > keylen in midkey() dont computed correctly > -- > > Key: HBASE-17020 > URL: https://issues.apache.org/jira/browse/HBASE-17020 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Yu Sun >Assignee: Yu Sun > Fix For: 2.0.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17020-branch-0.98.patch, HBASE-17020-v1.patch, > HBASE-17020-v2.patch, HBASE-17020-v2.patch, HBASE-17020-v3-branch1.1.patch, > HBASE-17020.branch-0.98.patch, HBASE-17020.branch-0.98.patch, > HBASE-17020.branch-1.1.patch > > > in CellBasedKeyBlockIndexReader.midkey(): > {code} > ByteBuff b = midLeafBlock.getBufferWithoutHeader(); > int numDataBlocks = b.getIntAfterPosition(0); > int keyRelOffset = b.getIntAfterPosition(Bytes.SIZEOF_INT * > (midKeyEntry + 1)); > int keyLen = b.getIntAfterPosition(Bytes.SIZEOF_INT * (midKeyEntry > + 2)) - keyRelOffset; > {code} > the local variable keyLen computed this way is actually the total length of: > SECONDARY_INDEX_ENTRY_OVERHEAD + firstKey.length; > the code is: > {code} > void add(byte[] firstKey, long blockOffset, int onDiskDataSize, > long curTotalNumSubEntries) { > // Record the offset for the secondary index > secondaryIndexOffsetMarks.add(curTotalNonRootEntrySize); > curTotalNonRootEntrySize += SECONDARY_INDEX_ENTRY_OVERHEAD
> + firstKey.length; > {code} > when the midkey is the last entry of a leaf-level index block, this may throw: > {quote} > 2016-10-01 12:27:55,186 ERROR [MemStoreFlusher.0] > regionserver.MemStoreFlusher: Cache flusher failed for entry [flush region > pora_6_item_feature,0061:,1473838922457.12617bc4ebbfd171018bf96ac9bdd2a7.] > java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:936) > at > org.apache.hadoop.hbase.nio.SingleByteBuff.toBytes(SingleByteBuff.java:303) > at > org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.midkey(HFileBlockIndex.java:419) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.midkey(HFileReaderImpl.java:1519) > at > org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1520) > at > org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:706) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:126) > at > org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:1983) > at > org.apache.hadoop.hbase.regionserver.ConstantFamilySizeRegionSplitPolicy.getSplitPoint(ConstantFamilySizeRegionSplitPolicy.java:77) > at > org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7756) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:513) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:756) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
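The arithmetic behind the bug quoted above can be checked in isolation: each secondary-index entry stores a fixed overhead (the description shows a long plus an int, 12 bytes) before the key bytes, so the delta between two consecutive offsets overshoots the key length by exactly that overhead. A hedged model, with constants mirrored from the quoted description rather than taken from HBase source:

```java
// Model of the keyLen bug: the offset delta between consecutive secondary
// index entries is overhead + key bytes, so the key length is the delta
// minus SECONDARY_INDEX_ENTRY_OVERHEAD (SIZEOF_LONG + SIZEOF_INT = 12).
public class MidkeyLen {
    static final int SECONDARY_INDEX_ENTRY_OVERHEAD = 8 + 4;

    /** On-disk size of one secondary-index entry holding a keyLen-byte key. */
    static int entrySize(int keyLen) {
        return SECONDARY_INDEX_ENTRY_OVERHEAD + keyLen;
    }

    /** Buggy computation: treats the whole offset delta as the key length. */
    static int buggyKeyLen(int offA, int offB) {
        return offB - offA;
    }

    /** Corrected computation: subtracts the per-entry overhead. */
    static int fixedKeyLen(int offA, int offB) {
        return offB - offA - SECONDARY_INDEX_ENTRY_OVERHEAD;
    }
}
```

The buggy value overshoots by 12 bytes, which is why reading the midkey can run past the end of the buffer and raise the ArrayIndexOutOfBoundsException shown in the stack trace.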
[jira] [Updated] (HBASE-17047) Add an API to get HBase connection cache statistics
[ https://issues.apache.org/jira/browse/HBASE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiqing Yang updated HBASE-17047: - Attachment: HBASE-17047_v2.patch > Add an API to get HBase connection cache statistics > --- > > Key: HBASE-17047 > URL: https://issues.apache.org/jira/browse/HBASE-17047 > Project: HBase > Issue Type: Improvement > Components: spark >Reporter: Weiqing Yang >Assignee: Weiqing Yang >Priority: Minor > Attachments: HBASE-17047_v1.patch, HBASE-17047_v2.patch > > > This patch will add a function "getStat" for the user to get the statistics > of the HBase connection cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16838) Implement basic scan
[ https://issues.apache.org/jira/browse/HBASE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656277#comment-15656277 ] Duo Zhang commented on HBASE-16838: --- {quote} Is a Scan always against a Region (is the Region redundant?). {quote} A scan can cross regions, the AsyncScanRegionRpcRetryingCaller is only used to scan one region. During a scan we may create many AsyncScanRegionRpcRetryingCallers. AsyncScanOneRegionRpcRetryingCaller maybe better? {quote} Looking at the Response, we need Scan in there? The Scan in Response is different from originalScan? {quote} They are the same. But I think it is a little confusing that we use the originalScan in AsyncClientScanner but modify it at another place. But anyway, we do change it so people can not use it as 'original'... Let me add some javadoc here... {quote} I think you should stick the above comment on the scan timeout so it is clear what the scan timeout means. It helps. {quote} I've added some comments in AsyncConnectionConfiguration. Let me add the above comments too. {quote} Is there example code on how I'd do an async Scan? I create a ScanConsumer and pass it in then it will get called with Results as the Scan progresses? The AsyncTable#scan returns immediately? Perhaps stick it in javadoc for the scan method? Is SimpleScanObserver a good example or just a stop gap with its queue? {quote} I plan to introduce an example when implementing getScanner, where I plan to add flow control support. This method is used to write high-performance event-driven programs so it is not very user friendly... [~carp84] also claimed that even for other methods such as get and put, completing the CompletableFuture inside the thread of the rpc framework is not safe as the user may also do time consuming work when consuming the CompletableFuture. Maybe we need a new 'SafeAsyncTable' interface? Or change the name of this interface to 'RawAsyncTable' or 'UnsafeAsyncTable'? 
As the async client is still marked as Unstable, I think we can do this in a follow-on issue. {quote} Don't kill me but should ScanConsumer be ScanResultConsumer (can do in followup if makes sense) or just ScanResult? {quote} I think it should be ScanResultConsumer as I've already introduced a 'ScanResultCache'(What's wrong with my brain...) {quote} CompleteResultScanResultCache should be CompleteScanResultCache to match AllowPartialScanResultCache? {quote} Fine. Will change. > Implement basic scan > > > Key: HBASE-16838 > URL: https://issues.apache.org/jira/browse/HBASE-16838 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-16838-v1.patch, HBASE-16838-v2.patch, > HBASE-16838-v3.patch, HBASE-16838.patch > > > Implement a scan that works like the grpc streaming call in that all returned results > will be passed to a ScanConsumer. The methods of the consumer will be called > directly in the rpc framework threads so it is not allowed to do time > consuming work in the methods. So in general only experts or the > implementation of other methods in AsyncTable can call this method directly, > that's why I call it 'basic scan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
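The contract discussed in this thread — consumer callbacks invoked directly on rpc-framework threads, so they must not block or do time-consuming work — can be sketched with an illustrative interface. All names and signatures below are stand-ins for the interface under discussion, not the actual HBase API:

```java
// Sketch of an event-driven scan consumer contract. Callbacks run on rpc
// threads, so implementations should only hand results off quickly; heavy
// processing belongs on the caller's own threads.
import java.util.ArrayList;
import java.util.List;

public class ScanConsumerSketch {
    interface ScanResultConsumer {
        /** Called on an rpc thread for each result; must return quickly. */
        void onNext(String result);
        /** Called once when the scan finishes normally. */
        void onComplete();
        /** Called once if the scan fails. */
        void onError(Throwable error);
    }

    /** A consumer that only buffers, deferring real work to another thread. */
    static final class BufferingConsumer implements ScanResultConsumer {
        final List<String> results = new ArrayList<>();
        boolean done;

        @Override public void onNext(String result) { results.add(result); }
        @Override public void onComplete() { done = true; }
        @Override public void onError(Throwable error) { done = true; }
    }
}
```

This is why the thread debates a 'RawAsyncTable'/'SafeAsyncTable' split: a buffering consumer like the one above is safe, but a user-supplied callback doing slow work would stall the rpc framework's threads.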
[jira] [Updated] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
[ https://issues.apache.org/jira/browse/HBASE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eiichi Sato updated HBASE-17072: Attachment: mat-threadlocals.png metrics.png disable-block-header-cache.patch mat-threads.png slave1.svg slave2.svg slave3.svg slave4.svg > CPU usage starts to climb up to 90-100% when using G1GC > --- > > Key: HBASE-17072 > URL: https://issues.apache.org/jira/browse/HBASE-17072 > Project: HBase > Issue Type: Bug > Components: Performance, regionserver >Affects Versions: 1.0.0, 1.2.0 >Reporter: Eiichi Sato > Attachments: disable-block-header-cache.patch, mat-threadlocals.png, > mat-threads.png, metrics.png, slave1.svg, slave2.svg, slave3.svg, slave4.svg > > > h5. Problem > CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts > to gradually get higher up to nearly 90-100% when using G1GC. We've also run > into this problem on CDH 5.7.3 and CDH 5.8.2. > In our production cluster, it normally takes a few weeks for this to happen > after restarting a RS. We reproduced this on our test cluster and attached > the results. Please note that, to make it easy to reproduce, we did some > "anti-tuning" on a table when running tests. > In metrics.png, soon after we started running some workloads against a test > cluster (CDH 5.8.2) at about 7 p.m. CPU usage of the two RSs started to rise. > Flame Graphs (slave1.svg to slave4.svg) are generated from jstack dumps of > each RS process around 10:30 a.m. the next day. > After investigating heapdumps from another occurrence on a test cluster > running CDH 5.7.3, we found that the ThreadLocalMap contain a lot of > contiguous entries of {{HFileBlock$PrefetchedHeader}} probably due to primary > clustering. This caused more loops in > {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU > time. 
What is worse is that the method is called from RPC metrics code, > which means even a small amount of per-RPC time soon adds up to a huge amount > of CPU time. > This is very similar to the issue in HBASE-16616, but we have many > {{HFileBlock$PrefetchedHeader}} not only {{Counter$IndexHolder}} instances. > Here are some OQL counts from Eclipse Memory Analyzer (MAT). This shows a > number of ThreadLocal instances in the ThreadLocalMap of a single handler > thread. > {code} > SELECT * > FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value > FROM OBJECTS 0x4ee380430) obj > WHERE obj.@clazz.@name = > "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader" > #=> 10980 instances > {code} > {code} > SELECT * > FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value > FROM OBJECTS 0x4ee380430) obj > WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder" > #=> 2052 instances > {code} > Although as described in HBASE-16616 this somewhat seems to be an issue in > G1GC side regarding weakly-reachable objects, we should keep ThreadLocal > usage minimal and avoid creating an indefinite number (in this case, a number > of HFiles) of ThreadLocal instances. > HBASE-16146 removes ThreadLocals from the RPC metrics code. That may solve > the issue (I just saw the patch, never tested it at all), but the > {{HFileBlock$PrefetchedHeader}} are still there in the ThreadLocalMap, which > may cause issues in the future again. > h5. Our Solution > We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and > fortunately we didn't notice any performance degradation for our production > workloads. > Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are > handled randomly in any of the handlers, small Get or small Scan RPCs do not > benefit from the caching (See HBASE-10676 and HBASE-11402 for the details). 
> Probably, we need to see how well reads are saved by the caching for large > Scan or Get RPCs and especially for compactions if we really remove the > caching. It's probably better if we can remove ThreadLocals without breaking > the current caching behavior. > FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
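The accumulation described above can be illustrated with a minimal sketch (plain Java, not the actual HBase code): every object that owns its own ThreadLocal installs one entry in the ThreadLocalMap of each thread that touches it, and those entries only disappear via expungeStaleEntries() after the owners become weakly reachable. The class and method names below are illustrative stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadLocalGrowth {
    // Stand-in for HFileBlock's per-reader cached header.
    static class PrefetchedHeader {
        final byte[] header = new byte[33];
    }

    // Each "reader" owns its own ThreadLocal, mirroring the pattern in the report.
    static class Reader {
        final ThreadLocal<PrefetchedHeader> prefetchedHeader =
            ThreadLocal.withInitial(PrefetchedHeader::new);
    }

    // Touch numReaders readers from the current thread and report how many
    // ThreadLocalMap entries that installed in this one thread.
    public static int touch(int numReaders) {
        List<Reader> readers = new ArrayList<>();
        for (int i = 0; i < numReaders; i++) {
            Reader r = new Reader();
            r.prefetchedHeader.get(); // installs one entry in THIS thread's map
            readers.add(r);
        }
        // The entries linger until expungeStaleEntries() runs after the
        // readers (and their ThreadLocals) are garbage collected.
        return readers.size();
    }

    public static void main(String[] args) {
        System.out.println(touch(10000)); // one map entry per "HFile"
    }
}
```

With one reader per HFile and hundreds of handler threads, the per-thread maps grow with the number of HFiles, which matches the ~10980 PrefetchedHeader entries observed in a single handler thread above.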
[jira] [Created] (HBASE-17072) CPU usage starts to climb up to 90-100% when using G1GC
Eiichi Sato created HBASE-17072: --- Summary: CPU usage starts to climb up to 90-100% when using G1GC Key: HBASE-17072 URL: https://issues.apache.org/jira/browse/HBASE-17072 Project: HBase Issue Type: Bug Components: Performance, regionserver Affects Versions: 1.2.0, 1.0.0 Reporter: Eiichi Sato h5. Problem CPU usage of a region server in our CDH 5.4.5 cluster, at some point, starts to gradually climb to nearly 90-100% when using G1GC. We've also run into this problem on CDH 5.7.3 and CDH 5.8.2. In our production cluster, it normally takes a few weeks for this to happen after restarting an RS. We reproduced this on our test cluster and attached the results. Please note that, to make it easy to reproduce, we did some "anti-tuning" on a table when running tests. In metrics.png, soon after we started running some workloads against a test cluster (CDH 5.8.2) at about 7 p.m., CPU usage of the two RSs started to rise. Flame Graphs (slave1.svg to slave4.svg) were generated from jstack dumps of each RS process around 10:30 a.m. the next day. After investigating heap dumps from another occurrence on a test cluster running CDH 5.7.3, we found that the ThreadLocalMap contains a lot of contiguous entries of {{HFileBlock$PrefetchedHeader}}, probably due to primary clustering. This caused more loops in {{ThreadLocalMap#expungeStaleEntries()}}, consuming a certain amount of CPU time. What is worse is that the method is called from RPC metrics code, which means even a small amount of per-RPC time soon adds up to a huge amount of CPU time. This is very similar to the issue in HBASE-16616, but we have many {{HFileBlock$PrefetchedHeader}} instances, not only {{Counter$IndexHolder}} instances. Here are some OQL counts from Eclipse Memory Analyzer (MAT). This shows the number of ThreadLocal instances in the ThreadLocalMap of a single handler thread. 
{code} SELECT * FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value FROM OBJECTS 0x4ee380430) obj WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.io.hfile.HFileBlock$PrefetchedHeader" #=> 10980 instances {code} {code} SELECT * FROM OBJECTS (SELECT AS RETAINED SET OBJECTS value FROM OBJECTS 0x4ee380430) obj WHERE obj.@clazz.@name = "org.apache.hadoop.hbase.util.Counter$IndexHolder" #=> 2052 instances {code} Although, as described in HBASE-16616, this somewhat seems to be an issue on the G1GC side regarding weakly-reachable objects, we should keep ThreadLocal usage minimal and avoid creating an indefinite number (in this case, the number of HFiles) of ThreadLocal instances. HBASE-16146 removes ThreadLocals from the RPC metrics code. That may solve the issue (I just saw the patch, never tested it at all), but the {{HFileBlock$PrefetchedHeader}} instances are still there in the ThreadLocalMap, which may cause issues in the future again. h5. Our Solution We simply removed the whole {{HFileBlock$PrefetchedHeader}} caching and fortunately we didn't notice any performance degradation for our production workloads. Because the PrefetchedHeader caching uses ThreadLocal and because RPCs are handled randomly in any of the handlers, small Get or small Scan RPCs do not benefit from the caching (see HBASE-10676 and HBASE-11402 for the details). Probably, we need to see how well reads are saved by the caching for large Scan or Get RPCs, and especially for compactions, if we really remove the caching. It's probably better if we can remove ThreadLocals without breaking the current caching behavior. FWIW, I'm attaching the patch we applied. It's for CDH 5.4.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17062) RegionSplitter throws ClassCastException
[ https://issues.apache.org/jira/browse/HBASE-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656268#comment-15656268 ] Hadoop QA commented on HBASE-17062: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 31s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 50s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 29m 18s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 46s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 119m 0s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Timed out junit tests | org.apache.hadoop.hbase.security.access.TestCellACLWithMultipleVersions | | | org.apache.hadoop.hbase.security.access.TestAccessController | | | org.apache.hadoop.hbase.security.access.TestTablePermissions | | | org.apache.hadoop.hbase.security.access.TestScanEarlyTermination | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838498/HBASE-17062.003.patch | | JIRA Issue | HBASE-17062 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux 84868f85ec0c 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 62e3b1e | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/4430/artifact/patchprocess/patch-unit-hbase-server.txt | | unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/4430/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4430/testReport/ | | modules | C:
[jira] [Commented] (HBASE-16962) Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API
[ https://issues.apache.org/jira/browse/HBASE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656255#comment-15656255 ] stack commented on HBASE-16962: --- Don't think so. Open one I'd say. > Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API > -- > > Key: HBASE-16962 > URL: https://issues.apache.org/jira/browse/HBASE-16962 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16956.branch-1.001.patch, > HBASE-16956.master.006.patch, HBASE-16962.master.001.patch, > HBASE-16962.master.002.patch, HBASE-16962.master.003.patch, > HBASE-16962.master.004.patch, HBASE-16962.rough.patch > > > Similar to HBASE-15759, I would like to add readPoint to the > preCompactScannerOpen() API. > I have a CP where I create a StoreScanner() as part of the > preCompactScannerOpen() API. I need the readpoint which was obtained in > Compactor.compact() method to be consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17047) Add an API to get HBase connection cache statistics
[ https://issues.apache.org/jira/browse/HBASE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656243#comment-15656243 ] stack commented on HBASE-17047: --- +1 for commit. Thanks. > Add an API to get HBase connection cache statistics > --- > > Key: HBASE-17047 > URL: https://issues.apache.org/jira/browse/HBASE-17047 > Project: HBase > Issue Type: Improvement > Components: spark >Reporter: Weiqing Yang >Assignee: Weiqing Yang >Priority: Minor > Attachments: HBASE-17047_v1.patch > > > This patch will add a function "getStat" for the user to get the statistics > of the HBase connection cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16962) Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API
[ https://issues.apache.org/jira/browse/HBASE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-16962: -- Hadoop Flags: Reviewed (was: Incompatible change,Reviewed) When is it safe to remove the old APIs? > Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API > -- > > Key: HBASE-16962 > URL: https://issues.apache.org/jira/browse/HBASE-16962 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16956.branch-1.001.patch, > HBASE-16956.master.006.patch, HBASE-16962.master.001.patch, > HBASE-16962.master.002.patch, HBASE-16962.master.003.patch, > HBASE-16962.master.004.patch, HBASE-16962.rough.patch > > > Similar to HBASE-15759, I would like to add readPoint to the > preCompactScannerOpen() API. > I have a CP where I create a StoreScanner() as part of the > preCompactScannerOpen() API. I need the readpoint which was obtained in > Compactor.compact() method to be consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16962) Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API
[ https://issues.apache.org/jira/browse/HBASE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656240#comment-15656240 ] stack commented on HBASE-16962: --- True. > Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API > -- > > Key: HBASE-16962 > URL: https://issues.apache.org/jira/browse/HBASE-16962 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16956.branch-1.001.patch, > HBASE-16956.master.006.patch, HBASE-16962.master.001.patch, > HBASE-16962.master.002.patch, HBASE-16962.master.003.patch, > HBASE-16962.master.004.patch, HBASE-16962.rough.patch > > > Similar to HBASE-15759, I would like to add readPoint to the > preCompactScannerOpen() API. > I have a CP where I create a StoreScanner() as part of the > preCompactScannerOpen() API. I need the readpoint which was obtained in > Compactor.compact() method to be consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14882) Provide a Put API that adds the provided family, qualifier, value without copying
[ https://issues.apache.org/jira/browse/HBASE-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656231#comment-15656231 ] Anoop Sam John commented on HBASE-14882: Sorry, missed this somehow. Will look at the latest patch today. Thanks. > Provide a Put API that adds the provided family, qualifier, value without > copying > - > > Key: HBASE-14882 > URL: https://issues.apache.org/jira/browse/HBASE-14882 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Jerry He >Assignee: Xiang Li > Fix For: 2.0.0 > > Attachments: HBASE-14882.master.000.patch, > HBASE-14882.master.001.patch, HBASE-14882.master.002.patch, > HBASE-14882.master.003.patch > > > In the Put API, we have addImmutable() > {code} > /** >* See {@link #addColumn(byte[], byte[], byte[])}. This version expects >* that the underlying arrays won't change. It's intended >* for usage internal HBase to and for advanced client applications. >*/ > public Put addImmutable(byte [] family, byte [] qualifier, byte [] value) > {code} > But in the implementation, the family, qualifier and value are still being > copied locally to create a kv. > Hopefully we should provide an API that truly uses immutable family, > qualifier and value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
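The copy-versus-reference distinction the issue draws can be shown in miniature. The two helpers below are hypothetical stand-ins, not the real Put methods: one copies the caller's array into a fresh buffer (what the current implementation effectively does), the other simply keeps the reference (what a truly immutable add would do).

```java
import java.util.Arrays;

public class ImmutableAddSketch {
    // Current behavior: the bytes are duplicated into a new array.
    public static byte[] addWithCopy(byte[] value) {
        return Arrays.copyOf(value, value.length);
    }

    // Desired behavior: trust the caller's promise that the array won't
    // change, and keep the reference with zero copying.
    public static byte[] addImmutable(byte[] value) {
        return value;
    }

    public static void main(String[] args) {
        byte[] v = "v1".getBytes();
        System.out.println(addWithCopy(v) == v);  // false: a fresh array
        System.out.println(addImmutable(v) == v); // true: same reference, no copy
    }
}
```

The reference-keeping variant is only safe under the javadoc's contract that the underlying arrays won't change after the call, which is why it is aimed at internal and advanced users.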
[jira] [Commented] (HBASE-16962) Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API
[ https://issues.apache.org/jira/browse/HBASE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656226#comment-15656226 ] Anoop Sam John commented on HBASE-16962: Changed the release notes a little. It is not that we won't call the old CP API at all. If the user has not implemented the new hook but has implemented the old one, we will still call the old one. The BaseRegionObserver will do the old API call from its default impl of the new one. So this is not an incompatible change. [~saint@gmail.com] We can remove that marker also, I believe. > Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API > -- > > Key: HBASE-16962 > URL: https://issues.apache.org/jira/browse/HBASE-16962 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16956.branch-1.001.patch, > HBASE-16956.master.006.patch, HBASE-16962.master.001.patch, > HBASE-16962.master.002.patch, HBASE-16962.master.003.patch, > HBASE-16962.master.004.patch, HBASE-16962.rough.patch > > > Similar to HBASE-15759, I would like to add readPoint to the > preCompactScannerOpen() API. > I have a CP where I create a StoreScanner() as part of the > preCompactScannerOpen() API. I need the readpoint which was obtained in the > Compactor.compact() method to be consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
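The compatibility mechanism Anoop describes (the default implementation of the new hook delegating to the old one, so coprocessors that only override the old signature still get called) can be sketched generically. The names below are illustrative stand-ins, not the real RegionObserver signatures.

```java
public class HookCompatSketch {
    public interface Observer {
        // Old hook, kept for backward compatibility.
        default String preFlushScannerOpen(String store) {
            return "base:" + store;
        }

        // New hook with readPoint; its default falls back to the old hook,
        // so the framework can always call the new signature.
        default String preFlushScannerOpen(String store, long readPoint) {
            return preFlushScannerOpen(store);
        }
    }

    // A coprocessor written against the old API: overrides only the old hook.
    public static class OldStyle implements Observer {
        @Override
        public String preFlushScannerOpen(String store) {
            return "old:" + store;
        }
    }

    public static void main(String[] args) {
        Observer o = new OldStyle();
        // The framework calls the new hook; the old override still takes effect.
        System.out.println(o.preFlushScannerOpen("f", 42L)); // prints "old:f"
    }
}
```

Because the delegation happens in the default implementation, nothing breaks for existing coprocessors, which is the argument for dropping the "Incompatible change" flag.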
[jira] [Commented] (HBASE-16838) Implement basic scan
[ https://issues.apache.org/jira/browse/HBASE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656227#comment-15656227 ] stack commented on HBASE-16838: --- Looking. It's much nicer now, I think. We can work on the one-rpc small scan over in the linked issue. Pity names get to be like this: AsyncScanRegionRpcRetryingCaller... which runs AsyncScanRegionRpcRetryingCallables. The RpcRetryingCaller stuff and RpcRetryingCallable were all there before you, so not your fault... just saying. Is a Scan always against a Region (is the Region redundant?). Looking at the Response, we need Scan in there? The Scan in Response is different from originalScan? nvm... I see this Scan carries the state of the general Scan as we progress. locateToPreviousRegion is for when we do a reverse Scan? Thanks for the comment on scan timeout. So scan timeout is different from operation timeout? It is at least according to your comment up in rb: "As now we have heartbeat support for scan, ideally a scan will never timeout unless the RS is crash. The RS will always return something before the rpc timeout or scan timeout to tell the client that it is still alive. The scan timeout is used as operation timeout for every operations in a scan, such as openScanner or next." I think you should stick the above comment on the scan timeout so it is clear what the scan timeout means. It helps. Update doc on commit: "The basic scan API uses the observer pattern. All results that match the given scan object will be passed to the given {@code scanObserver} by calling {@link ScanConsumer#onNext(Result[])}." ... you changed observer to be a consumer. Is there example code on how I'd do an async Scan? I create a ScanConsumer and pass it in, then it will get called with Results as the Scan progresses? The AsyncTable#scan returns immediately? Perhaps stick it in the javadoc for the scan method? Is SimpleScanObserver a good example or just a stopgap with its queue? 
Don't kill me, but should ScanConsumer be ScanResultConsumer (can do in a followup if it makes sense) or just ScanResult? CompleteResultScanResultCache should be CompleteScanResultCache to match AllowPartialScanResultCache? I'd be good with committing this as is and addressing what remains in a follow-on. +1 > Implement basic scan > > > Key: HBASE-16838 > URL: https://issues.apache.org/jira/browse/HBASE-16838 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-16838-v1.patch, HBASE-16838-v2.patch, > HBASE-16838-v3.patch, HBASE-16838.patch > > > Implement a scan that works like the grpc streaming call, where all returned results > will be passed to a ScanConsumer. The methods of the consumer will be called > directly in the rpc framework threads, so it is not allowed to do time-consuming > work in the methods. So in general only experts or the > implementation of other methods in AsyncTable can call this method directly; > that's why I call it 'basic scan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
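To stack's question about example code: a toy sketch of how the consumer-style API under discussion could look from the caller's side. The ScanConsumer shape here is an assumption based on this thread (onNext invoked from RPC framework threads, so the callbacks must stay cheap), not the final HBase API, and plain Strings stand in for Result so the sketch is runnable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BasicScanSketch {
    // Assumed consumer shape; in the real API onNext would take Result[].
    public interface ScanConsumer {
        void onNext(String[] results); // runs on RPC threads: keep it cheap
        void onComplete();
    }

    // Toy stand-in for AsyncTable#scan: the real call would return
    // immediately; here the batches are fed synchronously for simplicity.
    public static void scan(List<String[]> batches, ScanConsumer consumer) {
        for (String[] batch : batches) {
            consumer.onNext(batch);
        }
        consumer.onComplete();
    }

    // Caller side: hand in a consumer, collect what it is fed.
    public static List<String> collect(List<String[]> batches) {
        List<String> out = new ArrayList<>();
        scan(batches, new ScanConsumer() {
            @Override public void onNext(String[] results) {
                out.addAll(Arrays.asList(results));
            }
            @Override public void onComplete() {
                out.add("<done>");
            }
        });
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(Arrays.asList(
            new String[] {"row1", "row2"}, new String[] {"row3"})));
    }
}
```

This also illustrates why only experts (or other AsyncTable methods) should call the basic scan directly: any blocking work inside onNext would stall the RPC framework thread it runs on.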
[jira] [Updated] (HBASE-16962) Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API
[ https://issues.apache.org/jira/browse/HBASE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-16962: --- Release Note: The following RegionObserver methods are deprecated InternalScanner preFlushScannerOpen(final ObserverContext c, final Store store, final KeyValueScanner memstoreScanner, final InternalScanner s) throws IOException; InternalScanner preCompactScannerOpen(final ObserverContext c, final Store store, List scanners, final ScanType scanType, final long earliestPutTs, final InternalScanner s, CompactionRequest request) Instead, use the following methods: InternalScanner preFlushScannerOpen(final ObserverContext c, final Store store, final KeyValueScanner memstoreScanner, final InternalScanner s, final long readPoint) throws IOException; InternalScanner preCompactScannerOpen(final ObserverContext c, final Store store, List scanners, final ScanType scanType, final long earliestPutTs, final InternalScanner s, final CompactionRequest request, final long readPoint) throws IOException was: The following RegionObserver methods are deprecated and would no longer be called in hbase 2.0: InternalScanner preFlushScannerOpen(final ObserverContext c, final Store store, final KeyValueScanner memstoreScanner, final InternalScanner s) throws IOException; InternalScanner preCompactScannerOpen(final ObserverContext c, final Store store, List scanners, final ScanType scanType, final long earliestPutTs, final InternalScanner s, CompactionRequest request) Instead, use the following methods: InternalScanner preFlushScannerOpen(final ObserverContext c, final Store store, final KeyValueScanner memstoreScanner, final InternalScanner s, final long readPoint) throws IOException; InternalScanner preCompactScannerOpen(final ObserverContext c, final Store store, List scanners, final ScanType scanType, final long earliestPutTs, final InternalScanner s, final CompactionRequest request, final long readPoint) throws IOException > Add 
readPoint to preCompactScannerOpen() and preFlushScannerOpen() API > -- > > Key: HBASE-16962 > URL: https://issues.apache.org/jira/browse/HBASE-16962 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16956.branch-1.001.patch, > HBASE-16956.master.006.patch, HBASE-16962.master.001.patch, > HBASE-16962.master.002.patch, HBASE-16962.master.003.patch, > HBASE-16962.master.004.patch, HBASE-16962.rough.patch > > > Similar to HBASE-15759, I would like to add readPoint to the > preCompactScannerOpen() API. > I have a CP where I create a StoreScanner() as part of the > preCompactScannerOpen() API. I need the readpoint which was obtained in > Compactor.compact() method to be consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16962) Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API
[ https://issues.apache.org/jira/browse/HBASE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-16962: --- Resolution: Fixed Hadoop Flags: Incompatible change,Reviewed (was: Incompatible change) Status: Resolved (was: Patch Available) Pushed to branch-1 and master. Thanks for the patch. Thanks all for the reviews and discussion. > Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API > -- > > Key: HBASE-16962 > URL: https://issues.apache.org/jira/browse/HBASE-16962 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16956.branch-1.001.patch, > HBASE-16956.master.006.patch, HBASE-16962.master.001.patch, > HBASE-16962.master.002.patch, HBASE-16962.master.003.patch, > HBASE-16962.master.004.patch, HBASE-16962.rough.patch > > > Similar to HBASE-15759, I would like to add readPoint to the > preCompactScannerOpen() API. > I have a CP where I create a StoreScanner() as part of the > preCompactScannerOpen() API. I need the readpoint which was obtained in > Compactor.compact() method to be consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17020) keylen in midkey() dont computed correctly
[ https://issues.apache.org/jira/browse/HBASE-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656203#comment-15656203 ] Hudson commented on HBASE-17020: SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #63 (See [https://builds.apache.org/job/HBase-1.2-JDK8/63/]) HBASE-17020 keylen in midkey() dont computed correctly (liyu: rev bf9614f72e3104ec0110ed018fb0b6d0174c6366) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java > keylen in midkey() dont computed correctly > -- > > Key: HBASE-17020 > URL: https://issues.apache.org/jira/browse/HBASE-17020 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Yu Sun >Assignee: Yu Sun > Fix For: 2.0.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17020-branch-0.98.patch, HBASE-17020-v1.patch, > HBASE-17020-v2.patch, HBASE-17020-v2.patch, HBASE-17020-v3-branch1.1.patch, > HBASE-17020.branch-0.98.patch, HBASE-17020.branch-0.98.patch, > HBASE-17020.branch-1.1.patch > > > in CellBasedKeyBlockIndexReader.midkey(): > {code} > ByteBuff b = midLeafBlock.getBufferWithoutHeader(); > int numDataBlocks = b.getIntAfterPosition(0); > int keyRelOffset = b.getIntAfterPosition(Bytes.SIZEOF_INT * > (midKeyEntry + 1)); > int keyLen = b.getIntAfterPosition(Bytes.SIZEOF_INT * (midKeyEntry > + 2)) - keyRelOffset; > {code} > the value the local variable keyLen gets here should be the total length of: > SECONDARY_INDEX_ENTRY_OVERHEAD + firstKey.length; > the code is: > {code} > void add(byte[] firstKey, long blockOffset, int onDiskDataSize, > long curTotalNumSubEntries) { > // Record the offset for the secondary index > secondaryIndexOffsetMarks.add(curTotalNonRootEntrySize); > curTotalNonRootEntrySize += SECONDARY_INDEX_ENTRY_OVERHEAD 
> + firstKey.length; > {code} > when the midkey is the last entry of a leaf-level index block, this may throw: > {quote} > 2016-10-01 12:27:55,186 ERROR [MemStoreFlusher.0] > regionserver.MemStoreFlusher: Cache flusher failed for entry [flush region > pora_6_item_feature,0061:,1473838922457.12617bc4ebbfd171018bf96ac9bdd2a7.] > java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:936) > at > org.apache.hadoop.hbase.nio.SingleByteBuff.toBytes(SingleByteBuff.java:303) > at > org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.midkey(HFileBlockIndex.java:419) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.midkey(HFileReaderImpl.java:1519) > at > org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1520) > at > org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:706) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:126) > at > org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:1983) > at > org.apache.hadoop.hbase.regionserver.ConstantFamilySizeRegionSplitPolicy.getSplitPoint(ConstantFamilySizeRegionSplitPolicy.java:77) > at > org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7756) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:513) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:756) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
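The boundary condition behind the ArrayIndexOutOfBoundsException can be sketched independently of HBase: when an entry's length is recovered from a secondary offset index as the next entry's offset minus its own, the last entry has no "next" offset, so its length must be derived from the total entry size instead. The helper below is a hypothetical illustration of that fix, not the actual HFileBlockIndex code.

```java
public class SecondaryIndexSketch {
    // Length of entry i in a buffer whose entries are described by a
    // secondary offset index. Reading offsets[i + 1] works for every entry
    // except the last one, which must use totalSize as its end marker.
    public static int entryLength(int[] offsets, int totalSize, int i) {
        int start = offsets[i];
        int end = (i == offsets.length - 1) ? totalSize : offsets[i + 1];
        return end - start;
    }

    public static void main(String[] args) {
        int[] offsets = {0, 10, 25};            // three entries of sizes 10, 15, 15
        System.out.println(entryLength(offsets, 40, 1)); // 15: uses offsets[2]
        System.out.println(entryLength(offsets, 40, 2)); // 15: needs totalSize
    }
}
```

Computing the last entry's length by reading one offset past the end of the index is exactly the kind of out-of-bounds access the stack trace shows when the midkey lands on the last entry of a leaf-level block.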
[jira] [Commented] (HBASE-16962) Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API
[ https://issues.apache.org/jira/browse/HBASE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656189#comment-15656189 ] Anoop Sam John commented on HBASE-16962: Am not sure. [~saint@gmail.com], [~mbertozzi] ? > Add readPoint to preCompactScannerOpen() and preFlushScannerOpen() API > -- > > Key: HBASE-16962 > URL: https://issues.apache.org/jira/browse/HBASE-16962 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16956.branch-1.001.patch, > HBASE-16956.master.006.patch, HBASE-16962.master.001.patch, > HBASE-16962.master.002.patch, HBASE-16962.master.003.patch, > HBASE-16962.master.004.patch, HBASE-16962.rough.patch > > > Similar to HBASE-15759, I would like to add readPoint to the > preCompactScannerOpen() API. > I have a CP where I create a StoreScanner() as part of the > preCompactScannerOpen() API. I need the readpoint which was obtained in > Compactor.compact() method to be consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16838) Implement basic scan
[ https://issues.apache.org/jira/browse/HBASE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656185#comment-15656185 ] Duo Zhang commented on HBASE-16838: --- So what's your opinion on the patch, sir? Thanks. > Implement basic scan > > > Key: HBASE-16838 > URL: https://issues.apache.org/jira/browse/HBASE-16838 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-16838-v1.patch, HBASE-16838-v2.patch, > HBASE-16838-v3.patch, HBASE-16838.patch > > > Implement a scan that works like the grpc streaming call, where all returned results > will be passed to a ScanConsumer. The methods of the consumer will be called > directly in the rpc framework threads, so it is not allowed to do time-consuming > work in the methods. So in general only experts or the > implementation of other methods in AsyncTable can call this method directly; > that's why I call it 'basic scan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15788) Use Offheap ByteBuffers from BufferPool to read RPC requests.
[ https://issues.apache.org/jira/browse/HBASE-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656183#comment-15656183 ] Anoop Sam John commented on HBASE-15788: As of now I cannot see any such need. As of now I will leave it as is in the patch? If you strongly feel we should not, I can remove it. I am not thinking that it is going to affect things too much. > Use Offheap ByteBuffers from BufferPool to read RPC requests. > - > > Key: HBASE-15788 > URL: https://issues.apache.org/jira/browse/HBASE-15788 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: ramkrishna.s.vasudevan >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15788.patch, HBASE-15788_V4.patch, > HBASE-15788_V5.patch, HBASE-15788_V6.patch, HBASE-15788_V7.patch, > HBASE-15788_V8.patch > > > Right now, when an RPC request reaches RpcServer, we read the request into an > on demand created byte[]. When it is a write request including many > mutations, the request size will be somewhat larger and we end up creating > many temp on heap byte[]s, causing more GCs. > We have a ByteBufferPool of fixed sized off heap BBs. This is used at > RpcServer while sending read responses only. We can make use of the same while > reading reqs also. Instead of reading the whole of the request bytes into a > single BB, we can read into N BBs (based on the req size). When a BB is not > available from the pool, we will fall back to the old way of on demand on heap byte[] > creation. > Remember these are off heap BBs. We read many proto objects from these read > request bytes (like header, Mutation protos etc). Thanks to PB 3 and our > shading work, as it supports off heap BB now. Also the payload cells are also > in these DBBs now. The whole of our write path works with Cells now. At the time of addition to > the memstore, these cells are by default copied to MSLAB (off heap based pooled > MSLAB issue to follow this one). 
If an MSLAB copy is not possible, we will do a > copy to an on heap byte[]. > One possible downside of this is: > Before adding to the Memstore, we write to the WAL. So the Cells created out of > the offheap BBs (Codec#Decoder) will be used to write to the WAL. The default > FSHLog works with an OutputStream obtained from DFSClient. This will have only standard > OutputStream write APIs, which are byte[] based. So just to write to the WAL, we will end up > with a temp on heap copy for each of the Cells. The other WAL impl (i.e. AsyncWAL) > supports writing offheap Cells directly. We have work in progress to make > AsyncWAL the default. Also we can raise an HDFS request to support BB based write > APIs in their client OutputStream? Until then, will try for a temp workaround solution. > Patch to say more on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
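The read path described (fill N pooled, fixed-size offheap buffers for a request, falling back to the old on-demand heap byte[] when the pool runs dry) can be sketched as follows. Pool size, buffer size, and method names are assumptions for illustration, not HBase's actual ByteBufferPool API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class PooledRequestRead {
    static final int BUF_SIZE = 64 * 1024; // illustrative pooled buffer size

    // Hypothetical stand-in for a fixed-size pool of offheap buffers.
    public static ArrayDeque<ByteBuffer> newPool(int buffers) {
        ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();
        for (int i = 0; i < buffers; i++) {
            pool.add(ByteBuffer.allocateDirect(BUF_SIZE));
        }
        return pool;
    }

    // Pick buffers to hold a reqSize-byte request: pooled direct buffers
    // while they last, then one on-heap byte[] for whatever remains.
    public static List<ByteBuffer> buffersFor(ArrayDeque<ByteBuffer> pool, int reqSize) {
        List<ByteBuffer> bufs = new ArrayList<>();
        int remaining = reqSize;
        while (remaining > 0) {
            ByteBuffer bb = pool.poll();
            if (bb == null) {
                bufs.add(ByteBuffer.wrap(new byte[remaining])); // heap fallback
                remaining = 0;
            } else {
                bb.clear();
                remaining -= Math.min(remaining, BUF_SIZE);
                bufs.add(bb);
            }
        }
        return bufs;
    }

    public static void main(String[] args) {
        // A 300 KB request against a 4 x 64 KB pool: 4 pooled DBBs + 1 heap tail.
        System.out.println(buffersFor(newPool(4), 300 * 1024).size());
    }
}
```

The same shape explains the WAL downside discussed above: once the request bytes live in several direct buffers, any byte[]-only consumer (such as a plain OutputStream) forces a temporary on-heap copy.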
[jira] [Commented] (HBASE-15788) Use Offheap ByteBuffers from BufferPool to read RPC requests.
[ https://issues.apache.org/jira/browse/HBASE-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656180#comment-15656180 ] Anoop Sam John commented on HBASE-15788: Ya, all BBs may not be from the pool, so we will need extensions for SBB and MBB which can track whether a buffer is from the pool or not. We cannot mark a single BB as from the pool or not. Otherwise we would have to go with the hard assumption that if a BB is a DBB it is from the pool. Ya, that will hold true, but I dislike that kind of hard assumption in code. Ya, it is one place, and I renamed that method to better convey that we have a side effect in the method, and added some comments also. Will go with this way. Thanks boss. > Use Offheap ByteBuffers from BufferPool to read RPC requests. > - > > Key: HBASE-15788 > URL: https://issues.apache.org/jira/browse/HBASE-15788 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: ramkrishna.s.vasudevan >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15788.patch, HBASE-15788_V4.patch, > HBASE-15788_V5.patch, HBASE-15788_V6.patch, HBASE-15788_V7.patch, > HBASE-15788_V8.patch > > > Right now, when an RPC request reaches RpcServer, we read the request into an > on-demand created byte[]. When it is a write request including many > mutations, the request size will be somewhat larger and we end up creating > many temp on-heap byte[]s, causing more GCs. > We have a ByteBufferPool of fixed-sized off-heap BBs. This is used at > RpcServer while sending read responses only. We can make use of the same while > reading requests also. Instead of reading the whole of the request bytes into a > single BB, we can read into N BBs (based on the req size). When a BB is not > available from the pool, we will fall back to the old way of on-demand on-heap byte[] > creation. > Remember these are off-heap BBs. We read many proto objects from these read > request bytes (like header, Mutation protos etc). 
Thanks to PB 3 and our > shading work, as it supports off-heap BBs now. Also the payload cells are > in these DBBs now. The codec decoder can work on these and create off-heap > BBs. The whole of our write path works with Cells now. At the time of addition to > memstore, these cells are by default copied to MSLAB (off-heap based pooled > MSLAB issue to follow this one). If an MSLAB copy is not possible, we will do a > copy to an on-heap byte[]. > One possible downside of this is: > Before adding to the Memstore, we write to the WAL. So the Cells created out of > the off-heap BBs (Codec#Decoder) will be used to write to the WAL. The default > FSHLog works with an OutputStream obtained from DFSClient. This has only standard > OutputStream write APIs, which are byte[] based. So just to write to the WAL, we will end up > with a temp on-heap copy for each of the Cells. The other WAL implementation (i.e. AsyncWAL) > supports writing off-heap Cells directly. We have work in progress to make > AsyncWAL the default. Also we can raise an HDFS request to support BB-based write > APIs in their client OutputStream. Until then, we will try for a temp workaround solution. > The patch says more on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
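The bookkeeping Anoop describes above — not assuming "direct buffer means pooled" but carrying explicit pool-origin information — could be sketched like this. The class and method names are illustrative stand-ins, not HBase's actual SBB/MBB types:

```java
import java.nio.ByteBuffer;
import java.util.Queue;

// Sketch: pair each buffer with a fromPool flag so release() knows
// which buffers must be handed back to the pool and which are
// heap fallbacks that the GC simply reclaims.
class TrackedBuffer {
  final ByteBuffer buf;
  final boolean fromPool;

  TrackedBuffer(ByteBuffer buf, boolean fromPool) {
    this.buf = buf;
    this.fromPool = fromPool;
  }

  /** Returns true if the buffer was handed back to the pool. */
  boolean release(Queue<ByteBuffer> pool) {
    if (fromPool) {
      pool.add(buf);
      return true;
    }
    return false; // heap fallback buffer: nothing to return
  }
}
```

This avoids the "hard assumption" the comment objects to: a direct buffer that did not come from the pool is never returned to it by accident.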
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656173#comment-15656173 ] Anoop Sam John commented on HBASE-16417: bq. I did a mistake when running data compaction in previous rounds. I turned off the mslab flag but did not remove the chunk pool parameters and as a result a chunk pool was allocated but not used. That is not fully your mistake. When MSLAB is turned off by config, even if pool-related configs are present, we should not init the pool; it will never get used anyway. This is the way it used to work. Only in trunk did this behavior change. My bad. HBASE-16407 is the culprit. Raised HBASE-17071 to correct it in trunk. Thanks for the nice find. > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-16417-benchmarkresults-20161101.pdf, > HBASE-16417-benchmarkresults-20161110.pdf > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16783) Use ByteBufferPool for the header and message during Rpc response
[ https://issues.apache.org/jira/browse/HBASE-16783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656172#comment-15656172 ] ramkrishna.s.vasudevan commented on HBASE-16783: Linking related JIRAs > Use ByteBufferPool for the header and message during Rpc response > - > > Key: HBASE-16783 > URL: https://issues.apache.org/jira/browse/HBASE-16783 > Project: HBase > Issue Type: Improvement >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-16783.patch, HBASE-16783_1.patch, > HBASE-16783_2.patch, HBASE-16783_3.patch, HBASE-16783_4.patch, > HBASE-16783_5.patch, HBASE-16783_6.patch, HBASE-16783_7.patch, > HBASE-16783_7.patch > > > With ByteBufferPool in place we could avoid the byte[] creation in > RpcServer#createHeaderAndMessageBytes and try using the Buffer from the pool > rather than creating byte[] every time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17071) Do not initialize MemstoreChunkPool when use mslab option is turned off
Anoop Sam John created HBASE-17071: -- Summary: Do not initialize MemstoreChunkPool when use mslab option is turned off Key: HBASE-17071 URL: https://issues.apache.org/jira/browse/HBASE-17071 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 This is a 2.0-only issue, induced by HBASE-16407. We now initialize the MSLAB chunk pool along with RS start itself (to pass it as a HeapMemoryTuneObserver). When MSLAB is turned off (i.e. hbase.hregion.memstore.mslab.enabled is configured false) we should not be initializing the MSLAB chunk pool at all. By default the initial chunk count to be created will be 0, but it is still better to avoid the initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
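The intended guard can be sketched in a few lines. The config key matches the one named in the issue; the pool object and the plain-Map config are stand-ins, not MemStoreChunkPool or HBase's Configuration class:

```java
import java.util.Map;

// Sketch of HBASE-17071: only build the chunk pool when the MSLAB
// config flag is on; with MSLAB disabled the pool must never be created.
class ChunkPoolInitializer {
  static final String MSLAB_KEY = "hbase.hregion.memstore.mslab.enabled";

  static Object initIfEnabled(Map<String, String> conf) {
    boolean mslabEnabled =
        Boolean.parseBoolean(conf.getOrDefault(MSLAB_KEY, "true")); // MSLAB defaults on
    if (!mslabEnabled) {
      return null; // MSLAB off: do not touch the pool at all
    }
    return new Object(); // stands in for the actual chunk pool creation
  }
}
```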
[jira] [Commented] (HBASE-15324) Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy and trigger unexpected split
[ https://issues.apache.org/jira/browse/HBASE-15324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656170#comment-15656170 ] huaxiang sun commented on HBASE-15324: -- Agree with [~liyu]. Or we can just check overflow if (jitterValue > 0). > Jitter may cause desiredMaxFileSize overflow in ConstantSizeRegionSplitPolicy > and trigger unexpected split > -- > > Key: HBASE-15324 > URL: https://issues.apache.org/jira/browse/HBASE-15324 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.1.3 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 1.1.8 > > Attachments: HBASE-15324.patch, HBASE-15324_v2.patch, > HBASE-15324_v3.patch, HBASE-15324_v3.patch > > > We introduce jitter for region split decision in HBASE-13412, but the > following line in {{ConstantSizeRegionSplitPolicy}} may cause long value > overflow if MAX_FILESIZE is specified to Long.MAX_VALUE: > {code} > this.desiredMaxFileSize += (long)(desiredMaxFileSize * (RANDOM.nextFloat() - > 0.5D) * jitter); > {code} > In our case we specify MAX_FILESIZE to Long.MAX_VALUE to prevent target > region to split. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
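The overflow check suggested in the comment above ("just check overflow if (jitterValue > 0)") can be sketched as follows; this is a simplified stand-alone version of the expression from ConstantSizeRegionSplitPolicy, not the patched HBase code itself:

```java
import java.util.Random;

// Sketch of the HBASE-15324 guard: compute the jitter delta first and
// clamp instead of adding when it would overflow past Long.MAX_VALUE
// (the case where MAX_FILESIZE is set to Long.MAX_VALUE to block splits).
class JitteredSize {
  static long applyJitter(long desiredMaxFileSize, float jitter, Random rnd) {
    long jitterValue =
        (long) (desiredMaxFileSize * (rnd.nextFloat() - 0.5D) * jitter);
    // Only a positive delta can overflow upward; check before adding.
    if (jitterValue > 0 && desiredMaxFileSize > Long.MAX_VALUE - jitterValue) {
      return Long.MAX_VALUE;
    }
    return desiredMaxFileSize + jitterValue;
  }
}
```

With this guard the jittered size never wraps negative, so the unexpected-split trigger described in the issue disappears.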
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656167#comment-15656167 ] ramkrishna.s.vasudevan commented on HBASE-16417: In the figure that represents the write-only workload, do you see more GC when there is MSLAB (with no compaction)? How many regions in your PE tool and YCSB runs? You have only 16G memory, and 0.42 of it is 6.72G for the blocking memstore. So the number of regions may be important to check here; otherwise you can easily overload a region with a lot of blocking updates. Just saying. > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-16417-benchmarkresults-20161101.pdf, > HBASE-16417-benchmarkresults-20161110.pdf > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16972) Log more details for Scan#next request when responseTooSlow
[ https://issues.apache.org/jira/browse/HBASE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656165#comment-15656165 ] Hudson commented on HBASE-16972: FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #77 (See [https://builds.apache.org/job/HBase-1.3-JDK8/77/]) HBASE-16972 Log more details for Scan#next request when responseTooSlow (liyu: rev 996b4847fa3867e9b69e6f35727732836354f7a3) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServerInterface.java > Log more details for Scan#next request when responseTooSlow > --- > > Key: HBASE-16972 > URL: https://issues.apache.org/jira/browse/HBASE-16972 > Project: HBase > Issue Type: Improvement > Components: Operability >Affects Versions: 1.2.3, 1.1.7 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: HBASE-16972.patch, HBASE-16972.v2.patch, > HBASE-16972.v3.patch > > > Currently, if responseTooSlow happens on the scan.next call, we will get a > warn log like below: > {noformat} > 2016-10-31 11:43:23,430 WARN > [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] > ipc.RpcServer(2574): > (responseTooSlow): > {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)", > "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id: > 11 number_of_rows: 2147483647 > close_scanner: false next_call_seq: 0 client_handles_partials: true > client_handles_heartbeats: true > track_scan_metrics: false renew: > false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster"} > {noformat} > From this we only have a {{scanner_id}}, and it is impossible to know what exactly > this scan is about, like against which region of which table. 
> After this JIRA, we will improve the message to something like below (notice > the last line): > {noformat} > 2016-10-31 11:43:23,430 WARN > [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] > ipc.RpcServer(2574): > (responseTooSlow): > {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)", > "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id: > 11 number_of_rows: 2147483647 > close_scanner: false next_call_seq: 0 client_handles_partials: true > client_handles_heartbeats: true > track_scan_metrics: false renew: > false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster", > "scandetails":"table: hbase:meta region: hbase:meta,,1.1588230740"} > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
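The improvement quoted above — attaching a human-readable "scandetails" entry to the responseTooSlow report — can be sketched like this. The field names mirror the log lines in the issue, but the method is a hypothetical stand-in, not RpcServer's actual logging code; resolving scanner_id to a region happens elsewhere on the RegionServer:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build the responseTooSlow report map and, for Scan calls,
// append a "scandetails" entry naming the table and region.
class SlowScanLogger {
  static Map<String, Object> buildReport(String method, long processingTimeMs,
      String table, String region) {
    Map<String, Object> report = new LinkedHashMap<>();
    report.put("method", method);
    report.put("processingtimems", processingTimeMs);
    if ("Scan".equals(method) && table != null) {
      report.put("scandetails", "table: " + table + " region: " + region);
    }
    return report;
  }
}
```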
[jira] [Commented] (HBASE-17047) Add an API to get HBase connection cache statistics
[ https://issues.apache.org/jira/browse/HBASE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656155#comment-15656155 ] Weiqing Yang commented on HBASE-17047: -- Thanks for the reply. HBaseConnectionCacheStat calculates statistics of the HBaseConnectionCache only. Spark users who care about the number of concurrent HBase connections or the cost of database connect/disconnect may want to use this API. > Add an API to get HBase connection cache statistics > --- > > Key: HBASE-17047 > URL: https://issues.apache.org/jira/browse/HBASE-17047 > Project: HBase > Issue Type: Improvement > Components: spark >Reporter: Weiqing Yang >Assignee: Weiqing Yang >Priority: Minor > Attachments: HBASE-17047_v1.patch > > > This patch will add a function "getStat" for the user to get the statistics > of the HBase connection cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
[ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656146#comment-15656146 ] ramkrishna.s.vasudevan commented on HBASE-16417: bq. I also ran no-compaction option with no mslabs and no chunk pool which turned out to be the best performing setting. (See full details in the latest report.) Can you tell us more about this? We recently found that in the default memstore case, enabling MSLAB and the chunk pool had gains in terms of PE's latency and GC. > In-Memory MemStore Policy for Flattening and Compactions > > > Key: HBASE-16417 > URL: https://issues.apache.org/jira/browse/HBASE-16417 > Project: HBase > Issue Type: Sub-task >Reporter: Anastasia Braginsky >Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-16417-benchmarkresults-20161101.pdf, > HBASE-16417-benchmarkresults-20161110.pdf > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17043) parallelize select() work in mob compaction
[ https://issues.apache.org/jira/browse/HBASE-17043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656128#comment-15656128 ] huaxiang sun commented on HBASE-17043: -- Correcting a typo: it should be 700k files instead of 70k files. > parallelize select() work in mob compaction > --- > > Key: HBASE-17043 > URL: https://issues.apache.org/jira/browse/HBASE-17043 > Project: HBase > Issue Type: Improvement > Components: mob >Affects Versions: 2.0.0 >Reporter: huaxiang sun >Assignee: huaxiang sun >Priority: Minor > > Today in > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java#L141, > the select() is single-threaded. Given a large number of files, it will take > several seconds to finish the job. We will see how this work can be divided to > speed up the processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-17043) parallelize select() work in mob compaction
[ https://issues.apache.org/jira/browse/HBASE-17043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648370#comment-15648370 ] huaxiang sun edited comment on HBASE-17043 at 11/11/16 4:48 AM: For 700k files, we found that it took 6 ~ 7 seconds to finish the select logic. Compared with the file compaction (I/O) it is nothing, but we will still see how to speed it up and reduce this 6 ~ 7 second time. was (Author: huaxiang): For 70k files, found that it took 6 ~ 7 seconds to finish the select logic. Compared with the file compact (I/O), it is nothing, still will see how to speed up to reduce this 6 ~ 7 seconds time. > parallelize select() work in mob compaction > --- > > Key: HBASE-17043 > URL: https://issues.apache.org/jira/browse/HBASE-17043 > Project: HBase > Issue Type: Improvement > Components: mob >Affects Versions: 2.0.0 >Reporter: huaxiang sun >Assignee: huaxiang sun >Priority: Minor > > Today in > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java#L141, > the select() is single-threaded. Given a large number of files, it will take > several seconds to finish the job. We will see how this work can be divided to > speed up the processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
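One way the single-threaded select() pass above could be divided is to partition the candidate file list across a small thread pool and merge the per-thread selections. This is purely illustrative: PartitionedMobCompactor's real selection criteria are not reproduced, so the predicate here is a dummy one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: split the file list into chunks, select each chunk on its own
// thread, then concatenate the results in submission order.
class ParallelSelect {
  static List<String> select(List<String> files, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      int chunk = (files.size() + threads - 1) / threads;
      List<Future<List<String>>> futures = new ArrayList<>();
      for (int start = 0; start < files.size(); start += chunk) {
        final List<String> slice =
            files.subList(start, Math.min(start + chunk, files.size()));
        futures.add(pool.submit(() -> {
          List<String> picked = new ArrayList<>();
          for (String f : slice) {
            if (f.endsWith(".mob")) { // dummy selection predicate
              picked.add(f);
            }
          }
          return picked;
        }));
      }
      List<String> result = new ArrayList<>();
      for (Future<List<String>> f : futures) {
        result.addAll(f.get());
      }
      return result;
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Whether this wins anything for 6 ~ 7 seconds of CPU-bound work depends on how much of select() is actually parallelizable; the comment itself notes the cost is small next to compaction I/O.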
[jira] [Commented] (HBASE-16838) Implement basic scan
[ https://issues.apache.org/jira/browse/HBASE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656086#comment-15656086 ] stack commented on HBASE-16838: --- Linked to the issue that addresses Enis's remarks. > Implement basic scan > > > Key: HBASE-16838 > URL: https://issues.apache.org/jira/browse/HBASE-16838 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-16838-v1.patch, HBASE-16838-v2.patch, > HBASE-16838-v3.patch, HBASE-16838.patch > > > Implement a scan that works like a gRPC streaming call: all returned results > will be passed to a ScanConsumer. The methods of the consumer will be called > directly on the rpc framework threads, so it is not allowed to do time-consuming > work in them. So in general only experts or the > implementations of other methods in AsyncTable should call this method directly; > that's why I call it 'basic scan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
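The consumer-style "basic scan" described above can be sketched as a push interface: results are delivered to a callback as they arrive, gRPC-streaming style, and the callback must return quickly because it runs on framework threads. The interface and method names below are illustrative, not the actual AsyncTable API.

```java
import java.util.List;

// Sketch of a push-based scan: the scanner drives the consumer,
// and the consumer can stop the scan early by returning false.
interface ScanConsumer {
  /** Called once per result; return false to stop the scan early. */
  boolean onNext(String result);

  /** Called once, after the last result (or after an early stop). */
  void onComplete();
}

class BasicScanner {
  static void scan(List<String> rows, ScanConsumer consumer) {
    for (String row : rows) {
      if (!consumer.onNext(row)) {
        break; // consumer asked to stop; a server could close the scanner here
      }
    }
    consumer.onComplete();
  }
}
```

A higher-level scan API could then be built on top by supplying a consumer that buffers results into a queue for the caller.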
[jira] [Updated] (HBASE-17062) RegionSplitter throws ClassCastException
[ https://issues.apache.org/jira/browse/HBASE-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeongdae Kim updated HBASE-17062: - Attachment: HBASE-17062.003.patch > RegionSplitter throws ClassCastException > > > Key: HBASE-17062 > URL: https://issues.apache.org/jira/browse/HBASE-17062 > Project: HBase > Issue Type: Bug > Components: util >Reporter: Jeongdae Kim >Priority: Minor > Attachments: HBASE-17062.001.patch, HBASE-17062.002.patch, > HBASE-17062.003.patch > > > RegionSplitter throws Exception as below. > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.hbase.ServerName cannot be cast to java.lang.String > at java.lang.String.compareTo(String.java:108) > at java.util.TreeMap.getEntry(TreeMap.java:346) > at java.util.TreeMap.get(TreeMap.java:273) > at > org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:504) > at > org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:502) > at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324) > at java.util.TimSort.sort(TimSort.java:189) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.hbase.util.RegionSplitter.rollingSplit(RegionSplitter.java:502) > at > org.apache.hadoop.hbase.util.RegionSplitter.main(RegionSplitter.java:367) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17062) RegionSplitter throws ClassCastException
[ https://issues.apache.org/jira/browse/HBASE-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeongdae Kim updated HBASE-17062: - Status: Patch Available (was: Open) > RegionSplitter throws ClassCastException > > > Key: HBASE-17062 > URL: https://issues.apache.org/jira/browse/HBASE-17062 > Project: HBase > Issue Type: Bug > Components: util >Reporter: Jeongdae Kim >Priority: Minor > Attachments: HBASE-17062.001.patch, HBASE-17062.002.patch, > HBASE-17062.003.patch > > > RegionSplitter throws Exception as below. > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.hbase.ServerName cannot be cast to java.lang.String > at java.lang.String.compareTo(String.java:108) > at java.util.TreeMap.getEntry(TreeMap.java:346) > at java.util.TreeMap.get(TreeMap.java:273) > at > org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:504) > at > org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:502) > at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324) > at java.util.TimSort.sort(TimSort.java:189) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.hbase.util.RegionSplitter.rollingSplit(RegionSplitter.java:502) > at > org.apache.hadoop.hbase.util.RegionSplitter.main(RegionSplitter.java:367) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17062) RegionSplitter throws ClassCastException
[ https://issues.apache.org/jira/browse/HBASE-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeongdae Kim updated HBASE-17062: - Status: Open (was: Patch Available) > RegionSplitter throws ClassCastException > > > Key: HBASE-17062 > URL: https://issues.apache.org/jira/browse/HBASE-17062 > Project: HBase > Issue Type: Bug > Components: util >Reporter: Jeongdae Kim >Priority: Minor > Attachments: HBASE-17062.001.patch, HBASE-17062.002.patch > > > RegionSplitter throws Exception as below. > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.hbase.ServerName cannot be cast to java.lang.String > at java.lang.String.compareTo(String.java:108) > at java.util.TreeMap.getEntry(TreeMap.java:346) > at java.util.TreeMap.get(TreeMap.java:273) > at > org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:504) > at > org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:502) > at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324) > at java.util.TimSort.sort(TimSort.java:189) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.hbase.util.RegionSplitter.rollingSplit(RegionSplitter.java:502) > at > org.apache.hadoop.hbase.util.RegionSplitter.main(RegionSplitter.java:367) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17062) RegionSplitter throws ClassCastException
[ https://issues.apache.org/jira/browse/HBASE-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656038#comment-15656038 ] Ted Yu commented on HBASE-17062: Almost there. {code} 23 import java.util.*; {code} Please don't use '*' in import. > RegionSplitter throws ClassCastException > > > Key: HBASE-17062 > URL: https://issues.apache.org/jira/browse/HBASE-17062 > Project: HBase > Issue Type: Bug > Components: util >Reporter: Jeongdae Kim >Priority: Minor > Attachments: HBASE-17062.001.patch, HBASE-17062.002.patch > > > RegionSplitter throws Exception as below. > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.hbase.ServerName cannot be cast to java.lang.String > at java.lang.String.compareTo(String.java:108) > at java.util.TreeMap.getEntry(TreeMap.java:346) > at java.util.TreeMap.get(TreeMap.java:273) > at > org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:504) > at > org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:502) > at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324) > at java.util.TimSort.sort(TimSort.java:189) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.hbase.util.RegionSplitter.rollingSplit(RegionSplitter.java:502) > at > org.apache.hadoop.hbase.util.RegionSplitter.main(RegionSplitter.java:367) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17021) Use RingBuffer to reduce the contention in AsyncFSWAL
[ https://issues.apache.org/jira/browse/HBASE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656035#comment-15656035 ] Hudson commented on HBASE-17021: ABORTED: Integrated in Jenkins build HBase-Trunk_matrix #1942 (See [https://builds.apache.org/job/HBase-Trunk_matrix/1942/]) HBASE-17021 Use RingBuffer to reduce the contention in AsyncFSWAL (zhangduo: rev 3b629d632ae660b618422b3e2f67533a6fdc7106) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestAsyncFSWAL.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/RingBufferTruck.java * (edit) pom.xml * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractFSWAL.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestAsyncWALReplay.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WAL.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/AbstractTestWALReplay.java > Use RingBuffer to reduce the contention in AsyncFSWAL > - > > Key: HBASE-17021 > URL: https://issues.apache.org/jira/browse/HBASE-17021 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: AsyncWAL_disruptor_7.patch, HBASE-17021-v1.patch, > HBASE-17021-v2.patch, HBASE-17021-v3.patch, HBASE-17021.patch > > > The WALPE result in HBASE-16890 shows that with disruptor's RingBuffer we can > get a better performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17020) keylen in midkey() dont computed correctly
[ https://issues.apache.org/jira/browse/HBASE-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656032#comment-15656032 ] Hudson commented on HBASE-17020: ABORTED: Integrated in Jenkins build HBase-1.4 #528 (See [https://builds.apache.org/job/HBase-1.4/528/]) HBASE-17020 keylen in midkey() dont computed correctly (liyu: rev 18b31fdd32cbd59da0e41ec1083473023746f264) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java > keylen in midkey() dont computed correctly > -- > > Key: HBASE-17020 > URL: https://issues.apache.org/jira/browse/HBASE-17020 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Yu Sun >Assignee: Yu Sun > Fix For: 2.0.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17020-branch-0.98.patch, HBASE-17020-v1.patch, > HBASE-17020-v2.patch, HBASE-17020-v2.patch, HBASE-17020-v3-branch1.1.patch, > HBASE-17020.branch-0.98.patch, HBASE-17020.branch-0.98.patch, > HBASE-17020.branch-1.1.patch > > > in CellBasedKeyBlockIndexReader.midkey(): > {code} > ByteBuff b = midLeafBlock.getBufferWithoutHeader(); > int numDataBlocks = b.getIntAfterPosition(0); > int keyRelOffset = b.getIntAfterPosition(Bytes.SIZEOF_INT * > (midKeyEntry + 1)); > int keyLen = b.getIntAfterPosition(Bytes.SIZEOF_INT * (midKeyEntry > + 2)) - keyRelOffset; > {code} > the local varible keyLen get this should be total length of: > SECONDARY_INDEX_ENTRY_OVERHEAD + firstKey.length; > the code is: > {code} > void add(byte[] firstKey, long blockOffset, int onDiskDataSize, > long curTotalNumSubEntries) { > // Record the offset for the secondary index > secondaryIndexOffsetMarks.add(curTotalNonRootEntrySize); > curTotalNonRootEntrySize += SECONDARY_INDEX_ENTRY_OVERHEAD > + 
firstKey.length; > {code} > when the midkey last entry of a leaf-level index block, this may throw: > {quote} > 2016-10-01 12:27:55,186 ERROR [MemStoreFlusher.0] > regionserver.MemStoreFlusher: Cache flusher failed for entry [flush region > pora_6_item_feature,0061:,1473838922457.12617bc4ebbfd171018bf96ac9bdd2a7.] > java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:936) > at > org.apache.hadoop.hbase.nio.SingleByteBuff.toBytes(SingleByteBuff.java:303) > at > org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.midkey(HFileBlockIndex.java:419) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.midkey(HFileReaderImpl.java:1519) > at > org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1520) > at > org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:706) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:126) > at > org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:1983) > at > org.apache.hadoop.hbase.regionserver.ConstantFamilySizeRegionSplitPolicy.getSplitPoint(ConstantFamilySizeRegionSplitPolicy.java:77) > at > org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7756) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:513) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:756) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
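Per the description above, the offset delta between consecutive secondary-index entries covers SECONDARY_INDEX_ENTRY_OVERHEAD plus the key bytes, so the key length must subtract that overhead rather than use the raw delta. A minimal sketch of that arithmetic, with an illustrative overhead value rather than bytes read from a real HFile block:

```java
// Sketch of the midkey() keyLen fix: the raw offset delta includes the
// per-entry overhead, so strip it to get only the key's byte length.
class MidKeyLen {
  // Illustrative: in HFileBlockIndex this is derived from the entry layout.
  static final int SECONDARY_INDEX_ENTRY_OVERHEAD = 12;

  static int keyLen(int keyRelOffset, int nextKeyRelOffset) {
    return nextKeyRelOffset - keyRelOffset - SECONDARY_INDEX_ENTRY_OVERHEAD;
  }
}
```

Using the raw delta instead over-reads past the key bytes, which matches the ArrayIndexOutOfBoundsException in the stack trace when the midkey is the last entry of a leaf-level index block.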
[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656036#comment-15656036 ] Hudson commented on HBASE-17017: ABORTED: Integrated in Jenkins build HBase-Trunk_matrix #1942 (See [https://builds.apache.org/job/HBase-Trunk_matrix/1942/]) HBASE-17017 Remove the current per-region latency histogram metrics (enis: rev 03bc884ea085197a651b50ddc21f575560854f1c) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegion.java * (edit) hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSource.java * (edit) hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * (edit) hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSource.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.3.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.3.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch, hbase-17017_v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves
[ https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656034#comment-15656034 ] Hudson commented on HBASE-17039: ABORTED: Integrated in Jenkins build HBase-1.4 #528 (See [https://builds.apache.org/job/HBase-1.4/528/]) HBASE-17039 SimpleLoadBalancer schedules large amount of invalid region (liyu: rev d248d6b0b3d3f6f0b7a265d5f8607d5f5c62eefb) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java > SimpleLoadBalancer schedules large amount of invalid region moves > - > > Key: HBASE-17039 > URL: https://issues.apache.org/jira/browse/HBASE-17039 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 2.0.0, 1.3.0, 1.1.7, 1.2.4 >Reporter: Charlie Qiangeng Xu >Assignee: Charlie Qiangeng Xu > Fix For: 2.0.0, 1.4.0, 1.2.5, 1.1.8 > > Attachments: HBASE-17039.patch > > > After increasing one of our clusters to 1600 nodes, we observed a large > amount of invalid region moves(more than 30k moves) fired by the balance > chore. Thus we simulated the problem and printed out the balance plan, only > to find out many servers that had two regions for a certain table(we use by > table strategy), sent out both regions to other two servers that have zero > region. > In the SimpleLoadBalancer's balanceCluster function, > the code block that determines the underLoadedServers might have a problem: > {code} > if (load >= min && load > 0) { > continue; // look for other servers which haven't reached min > } > int regionsToPut = min - load; > if (regionsToPut == 0) > { > regionsToPut = 1; > } > {code} > if min is zero, some server that has load of zero, which equals to min would > be marked as underloaded, which would cause the phenomenon mentioned above. > Since we increased the cluster's size to 1600+, many tables that only have > 1000 regions, now would encounter such issue. > By fixing it up, the balance plan went back to normal. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
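To see why min == 0 misfires in the quoted check, the logic can be reduced to a self-contained sketch. The class and method names below are hypothetical illustration, not the actual SimpleLoadBalancer code:

```java
// Standalone sketch of the underloaded-server check quoted in HBASE-17039.
// Names (load, min, regionsToPut) mirror the snippet in the issue; this is
// illustrative code, not the real balancer implementation.
import java.util.ArrayList;
import java.util.List;

public class UnderloadCheckSketch {

    // Old check: a server with load == min == 0 falls through the guard
    // (because of the extra "load > 0" clause) and is told to receive
    // regionsToPut = 1 region, producing an invalid move.
    static List<Integer> underloadedOld(int[] loads, int min) {
        List<Integer> targets = new ArrayList<>();
        for (int load : loads) {
            if (load >= min && load > 0) {
                continue; // skips only servers with positive load
            }
            int regionsToPut = min - load;
            if (regionsToPut == 0) {
                regionsToPut = 1; // zero-load server still marked underloaded
            }
            targets.add(regionsToPut);
        }
        return targets;
    }

    // Fixed check: any server that has already reached min is skipped,
    // including when min == 0.
    static List<Integer> underloadedFixed(int[] loads, int min) {
        List<Integer> targets = new ArrayList<>();
        for (int load : loads) {
            if (load >= min) {
                continue;
            }
            targets.add(min - load);
        }
        return targets;
    }

    public static void main(String[] args) {
        int[] loads = {0, 0, 2}; // two empty servers, one with 2 regions
        int min = 0;             // fewer regions than servers per table
        System.out.println(underloadedOld(loads, min).size());   // prints 2
        System.out.println(underloadedFixed(loads, min).size()); // prints 0
    }
}
```

With loads {0, 0, 2} and min == 0, the old check nominates both empty servers to receive one region each, which is exactly the flood of invalid moves described above; skipping every server with load >= min yields no bogus targets.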
[jira] [Commented] (HBASE-16985) TestClusterId failed due to wrong hbase rootdir
[ https://issues.apache.org/jira/browse/HBASE-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656033#comment-15656033 ] Hudson commented on HBASE-16985: ABORTED: Integrated in Jenkins build HBase-1.4 #528 (See [https://builds.apache.org/job/HBase-1.4/528/]) HBASE-16985 TestClusterId failed due to wrong hbase rootdir (stack: rev e929156f96de004b2b8a0535463eff7fe8c38116) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java > TestClusterId failed due to wrong hbase rootdir > --- > > Key: HBASE-16985 > URL: https://issues.apache.org/jira/browse/HBASE-16985 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 1.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 1.1.8 > > Attachments: HBASE-16985-branch-1.1.patch, > HBASE-16985-branch-1.2.patch, HBASE-16985-branch-1.patch, > HBASE-16985-branch-1.patch, HBASE-16985-v1.patch, HBASE-16985-v1.patch, > HBASE-16985.patch > > > https://builds.apache.org/job/PreCommit-HBASE-Build/4253/testReport/org.apache.hadoop.hbase.regionserver/TestClusterId/testClusterId/ > {code} > java.io.IOException: Shutting down > at > org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:230) > at > org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:409) > at > org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:227) > at > org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:96) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1071) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1037) > at > org.apache.hadoop.hbase.regionserver.TestClusterId.testClusterId(TestClusterId.java:85) > {code} > The cluster can not start up because there is no active master. 
The active > master can not finish initializing because the hbase:namespace region can not > be assigned. > In the TestClusterId unit test, TEST_UTIL.startMiniHBaseCluster sets a new hbase > root dir. But the regionserver thread which started first used a different > hbase root dir. If the hbase:namespace region is assigned to this regionserver, the > region can not be assigned because there is no tableinfo under the wrong hbase root > dir. > When the regionserver reports to the master, it gets back some new config. But > FSTableDescriptors has already been initialized, so its root dir is not changed. > {code} > if (LOG.isDebugEnabled()) { > LOG.info("Config from master: " + key + "=" + value); > } > {code} > I think FSTableDescriptors needs to update its rootdir when the regionserver gets the > report back from the master. > The master branch has the same problem, too. But the balancer always assigns the > hbase:namespace region to the master, so this unit test can pass on the master > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16570) Compute region locality in parallel at startup
[ https://issues.apache.org/jira/browse/HBASE-16570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656028#comment-15656028 ] Hudson commented on HBASE-16570: ABORTED: Integrated in Jenkins build HBase-1.4 #528 (See [https://builds.apache.org/job/HBase-1.4/528/]) HBASE-16570 Compute region locality in parallel at startup (addendum) (liyu: rev dac73eceb03bf871ce6def7982b39950e68be1e2) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestRegionLocationFinder.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/RegionLocationFinder.java > Compute region locality in parallel at startup > -- > > Key: HBASE-16570 > URL: https://issues.apache.org/jira/browse/HBASE-16570 > Project: HBase > Issue Type: Sub-task >Reporter: binlijin >Assignee: binlijin > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16570-master_V1.patch, > HBASE-16570-master_V2.patch, HBASE-16570-master_V3.patch, > HBASE-16570-master_V4.patch, HBASE-16570.branch-1.3-addendum.patch, > HBASE-16570_addnum.patch, HBASE-16570_addnum_v2.patch, > HBASE-16570_addnum_v3.patch, HBASE-16570_addnum_v4.patch, > HBASE-16570_addnum_v5.patch, HBASE-16570_addnum_v6.patch, > HBASE-16570_addnum_v7.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16938) TableCFsUpdater maybe failed due to no write permission on peerNode
[ https://issues.apache.org/jira/browse/HBASE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656030#comment-15656030 ] Hudson commented on HBASE-16938: ABORTED: Integrated in Jenkins build HBase-1.4 #528 (See [https://builds.apache.org/job/HBase-1.4/528/]) HBASE-16938 TableCFsUpdater maybe failed due to no write permission on (enis: rev a6397e3b0c5a9c938c0a00cb5d3cd762d498afd1) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/master/TableCFsUpdater.java > TableCFsUpdater maybe failed due to no write permission on peerNode > --- > > Key: HBASE-16938 > URL: https://issues.apache.org/jira/browse/HBASE-16938 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16938.patch, HBASE-16938.patch > > > After HBASE-11393, replication table-cfs use a PB object. So the > old string config needs to be copied to the new PB object when upgrading a cluster. In our use case, we > have different kerberos for different clusters, e.g. an online serving cluster and > an offline processing cluster. And we use a unified global admin kerberos for all > clusters. The peer node is created by the client, so only the global admin has > write permission on it. When upgrading the cluster, HMaster doesn't have write > permission on the peer node, so it may fail to copy the old table-cfs string to the new > PB object. I think we need a client-side tool to do this copy job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17054) Compactor#preCreateCoprocScanner should be passed user
[ https://issues.apache.org/jira/browse/HBASE-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656027#comment-15656027 ] Hudson commented on HBASE-17054: ABORTED: Integrated in Jenkins build HBase-1.4 #528 (See [https://builds.apache.org/job/HBase-1.4/528/]) HBASE-17054 Compactor#preCreateCoprocScanner should be passed user (tedyu: rev 1e322e68a5f383b59011d50c7f09257c459831c3) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/Compactor.java > Compactor#preCreateCoprocScanner should be passed user > -- > > Key: HBASE-17054 > URL: https://issues.apache.org/jira/browse/HBASE-17054 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.4.0 > > Attachments: 17054.v1.txt > > > As Anoop mentioned at the end of HBASE-16962: > {code} > ScanType scanType = scannerFactory.getScanType(request); > scanner = preCreateCoprocScanner(request, scanType, fd.earliestPutTs, > scanners); > {code} > user should be passed to preCreateCoprocScanner(). > Otherwise null User would be used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656011#comment-15656011 ] Hudson commented on HBASE-17017: ABORTED: Integrated in Jenkins build HBase-1.4 #527 (See [https://builds.apache.org/job/HBase-1.4/527/]) HBASE-17017 Remove the current per-region latency histogram metrics (enis: rev 123d26ed907a9d1532386ce965ff2c388e44fe39) * (edit) hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSourceImpl.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java * (edit) hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSource.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegion.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * (edit) hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionSource.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.3.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.3.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch, hbase-17017_v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17062) RegionSplitter throws ClassCastException
[ https://issues.apache.org/jira/browse/HBASE-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656002#comment-15656002 ] Hadoop QA commented on HBASE-17062: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 52s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 30s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 27m 28s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 13s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 133m 16s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.TestJMXListener | | Timed out junit tests | org.apache.hadoop.hbase.master.procedure.TestDeleteTableProcedure | | | org.apache.hadoop.hbase.master.procedure.TestMasterProcedureSchedulerConcurrency | | | org.apache.hadoop.hbase.master.procedure.TestRestoreSnapshotProcedure | | | org.apache.hadoop.hbase.master.procedure.TestMasterProcedureWalLease | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838475/HBASE-17062.002.patch | | JIRA Issue | HBASE-17062 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux e00239b26f40 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 62e3b1e | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/4428/artifact/patchprocess/patch-unit-hbase-server.txt | | unit test logs | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4428/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results |
[jira] [Commented] (HBASE-17068) Procedure v2 - inherit region locks
[ https://issues.apache.org/jira/browse/HBASE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655972#comment-15655972 ] Hadoop QA commented on HBASE-17068: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 44s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | 
{color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 14s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 59s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 109m 6s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Timed out junit tests | org.apache.hadoop.hbase.util.TestMiniClusterLoadEncoded | | | org.apache.hadoop.hbase.util.TestMergeTable | | | org.apache.hadoop.hbase.util.TestMergeTool | | | org.apache.hadoop.hbase.util.TestConnectionCache | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838481/HBASE-17068-v1.patch | | JIRA Issue | HBASE-17068 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux 477969be7b6b 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 62e3b1e | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/4429/artifact/patchprocess/patch-unit-hbase-server.txt | | unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/4429/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/4429/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/4429/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was
[jira] [Updated] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves
[ https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-17039: -- Resolution: Fixed Fix Version/s: (was: 1.3.1) Status: Resolved (was: Patch Available) Closing this one and will track the backport for 1.3.1 through HBASE-17069. Thanks [~xharlie] for the patch and thanks all for review. > SimpleLoadBalancer schedules large amount of invalid region moves > - > > Key: HBASE-17039 > URL: https://issues.apache.org/jira/browse/HBASE-17039 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 2.0.0, 1.3.0, 1.1.7, 1.2.4 >Reporter: Charlie Qiangeng Xu >Assignee: Charlie Qiangeng Xu > Fix For: 2.0.0, 1.4.0, 1.2.5, 1.1.8 > > Attachments: HBASE-17039.patch > > > After increasing one of our clusters to 1600 nodes, we observed a large > amount of invalid region moves(more than 30k moves) fired by the balance > chore. Thus we simulated the problem and printed out the balance plan, only > to find out many servers that had two regions for a certain table(we use by > table strategy), sent out both regions to other two servers that have zero > region. > In the SimpleLoadBalancer's balanceCluster function, > the code block that determines the underLoadedServers might have a problem: > {code} > if (load >= min && load > 0) { > continue; // look for other servers which haven't reached min > } > int regionsToPut = min - load; > if (regionsToPut == 0) > { > regionsToPut = 1; > } > {code} > if min is zero, some server that has load of zero, which equals to min would > be marked as underloaded, which would cause the phenomenon mentioned above. > Since we increased the cluster's size to 1600+, many tables that only have > 1000 regions, now would encounter such issue. > By fixing it up, the balance plan went back to normal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves
[ https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655937#comment-15655937 ] Yu Li edited comment on HBASE-17039 at 11/11/16 2:54 AM: - Closing this one and will track the backport for 1.3.1 through HBASE-17059. Thanks [~xharlie] for the patch and thanks all for review. was (Author: carp84): Closing this one and will track the backport for 1.3.1 through HBASE-17069. Thanks [~xharlie] for the patch and thanks all for review. > SimpleLoadBalancer schedules large amount of invalid region moves > - > > Key: HBASE-17039 > URL: https://issues.apache.org/jira/browse/HBASE-17039 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 2.0.0, 1.3.0, 1.1.7, 1.2.4 >Reporter: Charlie Qiangeng Xu >Assignee: Charlie Qiangeng Xu > Fix For: 2.0.0, 1.4.0, 1.2.5, 1.1.8 > > Attachments: HBASE-17039.patch > > > After increasing one of our clusters to 1600 nodes, we observed a large > amount of invalid region moves(more than 30k moves) fired by the balance > chore. Thus we simulated the problem and printed out the balance plan, only > to find out many servers that had two regions for a certain table(we use by > table strategy), sent out both regions to other two servers that have zero > region. > In the SimpleLoadBalancer's balanceCluster function, > the code block that determines the underLoadedServers might have a problem: > {code} > if (load >= min && load > 0) { > continue; // look for other servers which haven't reached min > } > int regionsToPut = min - load; > if (regionsToPut == 0) > { > regionsToPut = 1; > } > {code} > if min is zero, some server that has load of zero, which equals to min would > be marked as underloaded, which would cause the phenomenon mentioned above. > Since we increased the cluster's size to 1600+, many tables that only have > 1000 regions, now would encounter such issue. > By fixing it up, the balance plan went back to normal. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-17039) SimpleLoadBalancer schedules large amount of invalid region moves
[ https://issues.apache.org/jira/browse/HBASE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652918#comment-15652918 ] Yu Li edited comment on HBASE-17039 at 11/11/16 2:53 AM: - Created HBASE-17059 for backporting to 1.3.1 was (Author: carp84): Create sub-task for backporting to 1.3.1 and update fix version relatively. Leave this JIRA open until sub-task done. > SimpleLoadBalancer schedules large amount of invalid region moves > - > > Key: HBASE-17039 > URL: https://issues.apache.org/jira/browse/HBASE-17039 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 2.0.0, 1.3.0, 1.1.7, 1.2.4 >Reporter: Charlie Qiangeng Xu >Assignee: Charlie Qiangeng Xu > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 1.1.8 > > Attachments: HBASE-17039.patch > > > After increasing one of our clusters to 1600 nodes, we observed a large > amount of invalid region moves(more than 30k moves) fired by the balance > chore. Thus we simulated the problem and printed out the balance plan, only > to find out many servers that had two regions for a certain table(we use by > table strategy), sent out both regions to other two servers that have zero > region. > In the SimpleLoadBalancer's balanceCluster function, > the code block that determines the underLoadedServers might have a problem: > {code} > if (load >= min && load > 0) { > continue; // look for other servers which haven't reached min > } > int regionsToPut = min - load; > if (regionsToPut == 0) > { > regionsToPut = 1; > } > {code} > if min is zero, some server that has load of zero, which equals to min would > be marked as underloaded, which would cause the phenomenon mentioned above. > Since we increased the cluster's size to 1600+, many tables that only have > 1000 regions, now would encounter such issue. > By fixing it up, the balance plan went back to normal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17059) backport HBASE-17039 to 1.3.1
[ https://issues.apache.org/jira/browse/HBASE-17059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-17059: -- Affects Version/s: 1.3.0 Fix Version/s: (was: 1.1.8) (was: 1.2.5) (was: 1.4.0) (was: 2.0.0) 1.3.1 > backport HBASE-17039 to 1.3.1 > - > > Key: HBASE-17059 > URL: https://issues.apache.org/jira/browse/HBASE-17059 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 1.3.0 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 1.3.1 > > > Currently branch-1.3 code is frozen for the 1.3.0 release; we need to backport > HBASE-17039 to 1.3.1 afterwards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17059) backport HBASE-17039 to 1.3.1
[ https://issues.apache.org/jira/browse/HBASE-17059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-17059: -- Issue Type: Bug (was: Sub-task) Parent: (was: HBASE-17039) > backport HBASE-17039 to 1.3.1 > - > > Key: HBASE-17059 > URL: https://issues.apache.org/jira/browse/HBASE-17059 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.4.0, 1.2.5, 1.1.8 > > > Currently branch-1.3 code is frozen for the 1.3.0 release; we need to backport > HBASE-17039 to 1.3.1 afterwards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17020) keylen in midkey() dont computed correctly
[ https://issues.apache.org/jira/browse/HBASE-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-17020: -- Resolution: Fixed Status: Resolved (was: Patch Available) Pushed the commit into branch-0.98 since HadoopQA looks good, and opened HBASE-17070 for branch-1.3. Closing this JIRA since all work here is done. Thanks [~haoran] for the patch and thanks all for review. > keylen in midkey() dont computed correctly > -- > > Key: HBASE-17020 > URL: https://issues.apache.org/jira/browse/HBASE-17020 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Yu Sun >Assignee: Yu Sun > Fix For: 2.0.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17020-branch-0.98.patch, HBASE-17020-v1.patch, > HBASE-17020-v2.patch, HBASE-17020-v2.patch, HBASE-17020-v3-branch1.1.patch, > HBASE-17020.branch-0.98.patch, HBASE-17020.branch-0.98.patch, > HBASE-17020.branch-1.1.patch > > > in CellBasedKeyBlockIndexReader.midkey(): > {code} > ByteBuff b = midLeafBlock.getBufferWithoutHeader(); > int numDataBlocks = b.getIntAfterPosition(0); > int keyRelOffset = b.getIntAfterPosition(Bytes.SIZEOF_INT * > (midKeyEntry + 1)); > int keyLen = b.getIntAfterPosition(Bytes.SIZEOF_INT * (midKeyEntry > + 2)) - keyRelOffset; > {code} > the value the local variable keyLen gets here is actually the total length of: > SECONDARY_INDEX_ENTRY_OVERHEAD + firstKey.length; > the code that writes the entries is: > {code} > void add(byte[] firstKey, long blockOffset, int onDiskDataSize, > long curTotalNumSubEntries) { > // Record the offset for the secondary index > secondaryIndexOffsetMarks.add(curTotalNonRootEntrySize); > curTotalNonRootEntrySize += SECONDARY_INDEX_ENTRY_OVERHEAD > + firstKey.length; > {code} > when the midkey is the last entry of a leaf-level index block, this may throw: > {quote} > 2016-10-01 12:27:55,186 ERROR [MemStoreFlusher.0] > regionserver.MemStoreFlusher: Cache flusher failed for entry [flush region > 
pora_6_item_feature,0061:,1473838922457.12617bc4ebbfd171018bf96ac9bdd2a7.] > java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:936) > at > org.apache.hadoop.hbase.nio.SingleByteBuff.toBytes(SingleByteBuff.java:303) > at > org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.midkey(HFileBlockIndex.java:419) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.midkey(HFileReaderImpl.java:1519) > at > org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1520) > at > org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:706) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:126) > at > org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:1983) > at > org.apache.hadoop.hbase.regionserver.ConstantFamilySizeRegionSplitPolicy.getSplitPoint(ConstantFamilySizeRegionSplitPolicy.java:77) > at > org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7756) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:513) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:756) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
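The arithmetic behind the bad keyLen can be sketched independently of HFileBlockIndex: each secondary-index entry's offset mark is the cumulative size including SECONDARY_INDEX_ENTRY_OVERHEAD, so subtracting adjacent marks yields overhead + key length, not the key length alone. The sketch below is hypothetical illustration (the 12-byte overhead assumes a long block offset plus an int on-disk size); it is not the patched HBase code:

```java
// Sketch of the secondary-index offset arithmetic discussed in HBASE-17020.
// Names follow the quoted snippet; this is illustrative, not real HBase code.
public class MidKeyLenSketch {
    // Assumed value: long block offset (8) + int on-disk size (4).
    static final int SECONDARY_INDEX_ENTRY_OVERHEAD = 12;

    // Cumulative offset marks after adding entries with the given key lengths,
    // mirroring how add() grows curTotalNonRootEntrySize per entry.
    static int[] offsetMarks(int[] keyLens) {
        int[] marks = new int[keyLens.length + 1];
        for (int i = 0; i < keyLens.length; i++) {
            marks[i + 1] = marks[i] + SECONDARY_INDEX_ENTRY_OVERHEAD + keyLens[i];
        }
        return marks;
    }

    // Buggy computation: the difference of adjacent marks still contains the
    // per-entry overhead, so reading this many bytes for the key overruns the
    // buffer when the midkey is the last entry of the leaf block.
    static int keyLenBuggy(int[] marks, int i) {
        return marks[i + 1] - marks[i];
    }

    // Fixed computation: subtract the per-entry overhead back out.
    static int keyLenFixed(int[] marks, int i) {
        return marks[i + 1] - marks[i] - SECONDARY_INDEX_ENTRY_OVERHEAD;
    }
}
```

For keys of length 10 and 20, the buggy difference for the second entry is 32 bytes rather than 20; for any entry before the last one the extra 12 bytes silently read into the next entry's data, which is why the ArrayIndexOutOfBoundsException only surfaces when the midkey falls on the last entry.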
[jira] [Created] (HBASE-17070) backport HBASE-17020 to 1.3.1
Yu Li created HBASE-17070: - Summary: backport HBASE-17020 to 1.3.1 Key: HBASE-17070 URL: https://issues.apache.org/jira/browse/HBASE-17070 Project: HBase Issue Type: Bug Affects Versions: 1.3.0 Reporter: Yu Li Assignee: Yu Li Fix For: 1.3.1 As titled, backport HBASE-17020 after 1.3.0 is released. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17020) keylen in midkey() dont computed correctly
[ https://issues.apache.org/jira/browse/HBASE-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655917#comment-15655917 ] Yu Li commented on HBASE-17020: --- Thanks for the confirm, opened HBASE-17070 to track this. > keylen in midkey() dont computed correctly > -- > > Key: HBASE-17020 > URL: https://issues.apache.org/jira/browse/HBASE-17020 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Yu Sun >Assignee: Yu Sun > Fix For: 2.0.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17020-branch-0.98.patch, HBASE-17020-v1.patch, > HBASE-17020-v2.patch, HBASE-17020-v2.patch, HBASE-17020-v3-branch1.1.patch, > HBASE-17020.branch-0.98.patch, HBASE-17020.branch-0.98.patch, > HBASE-17020.branch-1.1.patch > > > in CellBasedKeyBlockIndexReader.midkey(): > {code} > ByteBuff b = midLeafBlock.getBufferWithoutHeader(); > int numDataBlocks = b.getIntAfterPosition(0); > int keyRelOffset = b.getIntAfterPosition(Bytes.SIZEOF_INT * > (midKeyEntry + 1)); > int keyLen = b.getIntAfterPosition(Bytes.SIZEOF_INT * (midKeyEntry > + 2)) - keyRelOffset; > {code} > the local varible keyLen get this should be total length of: > SECONDARY_INDEX_ENTRY_OVERHEAD + firstKey.length; > the code is: > {code} > void add(byte[] firstKey, long blockOffset, int onDiskDataSize, > long curTotalNumSubEntries) { > // Record the offset for the secondary index > secondaryIndexOffsetMarks.add(curTotalNonRootEntrySize); > curTotalNonRootEntrySize += SECONDARY_INDEX_ENTRY_OVERHEAD > + firstKey.length; > {code} > when the midkey last entry of a leaf-level index block, this may throw: > {quote} > 2016-10-01 12:27:55,186 ERROR [MemStoreFlusher.0] > regionserver.MemStoreFlusher: Cache flusher failed for entry [flush region > pora_6_item_feature,0061:,1473838922457.12617bc4ebbfd171018bf96ac9bdd2a7.] 
> java.lang.ArrayIndexOutOfBoundsException > at > org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:936) > at > org.apache.hadoop.hbase.nio.SingleByteBuff.toBytes(SingleByteBuff.java:303) > at > org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.midkey(HFileBlockIndex.java:419) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.midkey(HFileReaderImpl.java:1519) > at > org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1520) > at > org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:706) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:126) > at > org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:1983) > at > org.apache.hadoop.hbase.regionserver.ConstantFamilySizeRegionSplitPolicy.getSplitPoint(ConstantFamilySizeRegionSplitPolicy.java:77) > at > org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7756) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:513) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259) > at java.lang.Thread.run(Thread.java:756) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
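The off-by-overhead arithmetic behind the exception above can be modeled with a small sketch. This is a simplified stand-in for the leaf-level secondary index (plain arrays, an assumed overhead value), not the actual HBase classes: the writer records cumulative entry offsets that include a fixed per-entry overhead, so the difference between two adjacent offsets is the whole entry size, and reading it back as the key length overshoots by exactly that overhead.

```java
// Simplified model of a leaf-level secondary index. The constant and method
// names are illustrative assumptions, not the real HFileBlockIndex code.
public class MidkeyLenSketch {
    static final int SECONDARY_INDEX_ENTRY_OVERHEAD = 12; // assumed fixed per-entry overhead

    // marks[i] = cumulative size of entries 0..i-1 (what the writer records)
    static int[] offsets(int[] keyLens) {
        int[] marks = new int[keyLens.length + 1];
        for (int i = 0; i < keyLens.length; i++) {
            marks[i + 1] = marks[i] + SECONDARY_INDEX_ENTRY_OVERHEAD + keyLens[i];
        }
        return marks;
    }

    // Buggy read side: the difference of marks is the whole entry, overhead included,
    // so reading this many key bytes can run off the end of the buffer.
    static int buggyKeyLen(int[] marks, int i) {
        return marks[i + 1] - marks[i];
    }

    // Fixed read side: subtract the per-entry overhead to get just the key length.
    static int fixedKeyLen(int[] marks, int i) {
        return marks[i + 1] - marks[i] - SECONDARY_INDEX_ENTRY_OVERHEAD;
    }
}
```

For the last entry of a block the buggy length extends past the final key bytes, which is consistent with the ArrayIndexOutOfBoundsException only surfacing when the midkey is the last entry.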
[jira] [Commented] (HBASE-17020) keylen in midkey() not computed correctly
[ https://issues.apache.org/jira/browse/HBASE-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655906#comment-15655906 ] Hadoop QA commented on HBASE-17020: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 33s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 10s {color} | {color:green} 0.98 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} 0.98 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 22s {color} | {color:green} 0.98 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} 0.98 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 38s {color} | {color:red} hbase-server in 0.98 has 84 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} 0.98 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 12m 17s {color} | {color:green} The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 117m 44s {color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 156m 59s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:568b3f7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838464/HBASE-17020.branch-0.98.patch | | JIRA Issue | HBASE-17020 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux eed811a9368f 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/hbase.sh | | git revision | 0.98 / 5f9cd86 | | Default Java | 1.7.0_80 | | findbugs | v2.0.1 | | findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/4427/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/4427/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/4427/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > keylen in midkey() dont computed correctly > -- > > Key: HBASE-17020 > URL: https://issues.apache.org/jira/browse/HBASE-17020 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Yu Sun >
[jira] [Updated] (HBASE-16972) Log more details for Scan#next request when responseTooSlow
[ https://issues.apache.org/jira/browse/HBASE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-16972: -- Fix Version/s: 1.3.0 Pushed commit into branch-1.3, thanks [~mantonov] for the message in HBASE-17011.
> Log more details for Scan#next request when responseTooSlow
> -----------------------------------------------------------
>
> Key: HBASE-16972
> URL: https://issues.apache.org/jira/browse/HBASE-16972
> Project: HBase
> Issue Type: Improvement
> Components: Operability
> Affects Versions: 1.2.3, 1.1.7
> Reporter: Yu Li
> Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-16972.patch, HBASE-16972.v2.patch,
> HBASE-16972.v3.patch
>
>
> Currently, if responseTooSlow happens on a scan.next call, we get a warning
> log like the one below:
> {noformat}
> 2016-10-31 11:43:23,430 WARN
> [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193]
> ipc.RpcServer(2574):
> (responseTooSlow):
> {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)",
> "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id:
> 11 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 0 client_handles_partials: true
> client_handles_heartbeats: true
> track_scan_metrics: false renew:
> false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster"}
> {noformat}
> From this we only have a {{scanner_id}}, and it is impossible to know what
> exactly the scan is about, e.g. against which region of which table.
> After this JIRA, we will improve the message to something like below (notice > the last line): > {noformat} > 2016-10-31 11:43:23,430 WARN > [RpcServer.FifoWFPBQ.priority.handler=5,queue=1,port=60193] > ipc.RpcServer(2574): > (responseTooSlow): > {"call":"Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)", > "starttimems":1477885403428,"responsesize":52,"method":"Scan","param":"scanner_id: > 11 number_of_rows: 2147483647 > close_scanner: false next_call_seq: 0 client_handles_partials: true > client_handles_heartbeats: true > track_scan_metrics: false renew: > false","processingtimems":2,"client":"127.0.0.1:60254","queuetimems":0,"class":"HMaster", > "scandetails":"table: hbase:meta region: hbase:meta,,1.1588230740"} > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
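The gist of the improvement is attaching the scanner's table/region context to the slow-response log entry, keyed by the {{scanner_id}} that was already being logged. A rough sketch of composing that extra field follows; the class and method names here are assumptions for illustration, not the actual RpcServer code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: resolve a scanner id to its region so the responseTooSlow warning
// can say which table/region the Scan#next call was against.
public class ScanDetailSketch {
    private final Map<Long, String> scannerToRegion = new HashMap<>();

    // Called when a scanner is opened on the server.
    public void register(long scannerId, String regionName) {
        scannerToRegion.put(scannerId, regionName);
    }

    // Extra field appended to the responseTooSlow JSON entry.
    public String scanDetails(long scannerId, String tableName) {
        String region = scannerToRegion.getOrDefault(scannerId, "unknown");
        return "\"scandetails\":\"table: " + tableName + " region: " + region + "\"";
    }
}
```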
[jira] [Resolved] (HBASE-17011) backport HBASE-16972 to 1.3.1
[ https://issues.apache.org/jira/browse/HBASE-17011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li resolved HBASE-17011. --- Resolution: Invalid Fix Version/s: (was: 1.3.1) Marking this JIRA as invalid; will commit to branch-1.3 in HBASE-16972 itself.
> backport HBASE-16972 to 1.3.1
> -----------------------------
>
> Key: HBASE-17011
> URL: https://issues.apache.org/jira/browse/HBASE-17011
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Reporter: Yu Li
> Assignee: Yu Li
>
> As discussed in HBASE-16972, holding commits for now until 1.3.0 comes out.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-17011) backport HBASE-16972 to 1.3.1
[ https://issues.apache.org/jira/browse/HBASE-17011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655894#comment-15655894 ] Yu Li commented on HBASE-17011: --- Sure, no problem, let me get this into 1.3.0. Thanks for the message sir [~mantonov] > backport HBASE-16972 to 1.3.1 > - > > Key: HBASE-17011 > URL: https://issues.apache.org/jira/browse/HBASE-17011 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.3.0 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 1.3.1 > > > As discussed in HBASE-16972, holding commits for now until 1.3.0 comes out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16841) Data loss in MOB files after cloning a snapshot and deleting that snapshot
[ https://issues.apache.org/jira/browse/HBASE-16841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-16841: - Attachment: HBASE-16841-V5.patch Thanks [~tedyu]! Uploaded a new patch V5 according to Ted's comments, with the checkstyle issues fixed.
> Data loss in MOB files after cloning a snapshot and deleting that snapshot
> --------------------------------------------------------------------------
>
> Key: HBASE-16841
> URL: https://issues.apache.org/jira/browse/HBASE-16841
> Project: HBase
> Issue Type: Bug
> Components: mob, snapshots
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBASE-16841-V2.patch, HBASE-16841-V3.patch,
> HBASE-16841-V4.patch, HBASE-16841-V5.patch, HBASE-16841.patch
>
> Running the following steps will probably lose MOB data when working with
> snapshots.
> 1. Create a mob-enabled table by running create 't1', {NAME => 'f1', IS_MOB
> => true, MOB_THRESHOLD => 0}.
> 2. Put millions of rows of data.
> 3. Run {{snapshot 't1','t1_snapshot'}} to take a snapshot of table t1.
> 4. Run {{clone_snapshot 't1_snapshot','t1_cloned'}} to clone this snapshot.
> 5. Run {{delete_snapshot 't1_snapshot'}} to delete this snapshot.
> 6. Run {{disable 't1'}} and {{delete 't1'}} to delete the table.
> 7. Now go to the archive directory of t1; the number of .link directories
> differs from the number of hfiles, which means some data will be lost after
> the hfile cleaner runs.
> This is because, when taking a snapshot on an enabled mob table, each region
> flushes itself and takes a snapshot, and the mob snapshot is taken only if
> the current region is the first region of the table. At that time, the
> flushing of some regions might not be finished, and some mob files are not
> flushed to disk yet. Eventually some mob files are not recorded in the
> snapshot manifest.
> To solve this, we need to take the mob snapshot last, after the snapshots of
> all the online and offline regions are finished in
> {{EnabledTableSnapshotHandler}}.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
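The fix described above, taking the MOB snapshot only after every region snapshot has completed, is essentially an ordering barrier. A minimal sketch of that barrier follows; the class and method names are hypothetical, not the actual EnabledTableSnapshotHandler code, and each region snapshot is reduced to a Runnable.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Ordering-barrier sketch: snapshot every region first (possibly in parallel),
// and only then record the MOB files, so no still-unflushed MOB file is
// missed by the snapshot manifest.
public class MobSnapshotOrderSketch {
    public static void snapshotTable(List<Runnable> regionSnapshots, Runnable mobSnapshot) {
        CountDownLatch done = new CountDownLatch(regionSnapshots.size());
        for (Runnable r : regionSnapshots) {
            new Thread(() -> { r.run(); done.countDown(); }).start();
        }
        try {
            done.await(); // barrier: all region snapshots (and flushes) finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        mobSnapshot.run(); // MOB snapshot taken last
    }
}
```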
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655835#comment-15655835 ] Andrew Purtell commented on HBASE-17069: Sounds fair. I don't think HBASE-17044 is a blocker; nobody seems to have hit it. > RegionServer writes invalid META entries for split daughters in some > circumstances > -- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.4 >Reporter: Andrew Purtell >Priority: Critical > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > 
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: 
blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95,
[jira] [Commented] (HBASE-17011) backport HBASE-16972 to 1.3.1
[ https://issues.apache.org/jira/browse/HBASE-17011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655829#comment-15655829 ] Mikhail Antonov commented on HBASE-17011: - [~carp84] since I found a flaky/broken compaction test (HBASE-16852) on 1.3.0RC0 and am looking into it, do you guys want to piggyback on that and get this one into 1.3.0? Seems good to me at the moment (won't wait specifically, but would be ok to commit). Sorry for any additional labor this could have caused :)
> backport HBASE-16972 to 1.3.1
> -----------------------------
>
> Key: HBASE-17011
> URL: https://issues.apache.org/jira/browse/HBASE-17011
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Reporter: Yu Li
> Assignee: Yu Li
> Fix For: 1.3.1
>
>
> As discussed in HBASE-16972, holding commits for now until 1.3.0 comes out.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-16838) Implement basic scan
[ https://issues.apache.org/jira/browse/HBASE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655825#comment-15655825 ] Duo Zhang commented on HBASE-16838: --- Ping [~stack]. Will commit later today if no other objections. Thanks.
> Implement basic scan
> --------------------
>
> Key: HBASE-16838
> URL: https://issues.apache.org/jira/browse/HBASE-16838
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.0
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-16838-v1.patch, HBASE-16838-v2.patch,
> HBASE-16838-v3.patch, HBASE-16838.patch
>
>
> Implement a scan that works like a gRPC streaming call: all returned results
> are passed to a ScanConsumer. The consumer's methods are called directly on
> the rpc framework threads, so they are not allowed to do time-consuming
> work. In general only experts, or the implementations of the other methods
> in AsyncTable, should call this method directly; that's why I call it a
> 'basic scan'.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
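The push-style scan described above can be sketched as a consumer-driven loop. The interface and method names below are illustrative assumptions rather than the actual AsyncTable API: a fake scanner pushes rows into a consumer whose callbacks must return quickly, since in the real design they would run on RPC framework threads.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a push-style ("basic") scan: the scanner drives a consumer whose
// callbacks would run on RPC threads, so they must not block.
public class BasicScanSketch {
    interface ScanConsumer {
        boolean onNext(String row);   // return false to stop the scan early
        void onComplete();
    }

    // Fake scanner: pushes rows to the consumer until it stops or runs out.
    static void scan(List<String> rows, ScanConsumer consumer) {
        for (String row : rows) {
            if (!consumer.onNext(row)) break;
        }
        consumer.onComplete();
    }

    // The kind of convenience wrapper a higher-level method could build on top
    // of the basic scan: collect at most 'limit' rows, keeping callbacks cheap.
    static List<String> collect(List<String> rows, int limit) {
        List<String> out = new ArrayList<>();
        scan(rows, new ScanConsumer() {
            public boolean onNext(String row) {
                out.add(row);
                return out.size() < limit;
            }
            public void onComplete() { }
        });
        return out;
    }
}
```

The `collect` wrapper illustrates the comment's point: ordinary callers would use higher-level methods built on the basic scan rather than implementing a consumer themselves.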
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655813#comment-15655813 ] Mikhail Antonov commented on HBASE-17069: - [~apurtell] I don't remember seeing that, but it might just be a factor of having a slightly different configuration for ITBLL. My point was more that, since this issue (and possibly others) is present in 1.2 and we moved the stable pointer to 1.2, and assuming it's also present in 1.3 (which I don't know yet for sure, just assuming), what's your call on rolling RC0 with that outstanding? I guess, barring objections, I'd aim to roll the first RC regardless and get more feedback.
> RegionServer writes invalid META entries for split daughters in some
> circumstances
> --------------------------------------------------------------------
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.4
> Reporter: Andrew Purtell
> Priority: Critical
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log,
> daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz,
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x.
[jira] [Commented] (HBASE-16938) TableCFsUpdater may fail due to no write permission on peerNode
[ https://issues.apache.org/jira/browse/HBASE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655806#comment-15655806 ] Guanghao Zhang commented on HBASE-16938: - We have used the same tool on our production cluster. Thanks [~enis].
> TableCFsUpdater may fail due to no write permission on peerNode
> ---------------------------------------------------------------
>
> Key: HBASE-16938
> URL: https://issues.apache.org/jira/browse/HBASE-16938
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.0.0, 1.4.0
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16938.patch, HBASE-16938.patch
>
>
> After HBASE-11393, replication table-cfs use a PB object, so the old string
> config needs to be copied to the new PB object when upgrading a cluster. In
> our use case, we have different Kerberos setups for different clusters, e.g.
> the online serving cluster and the offline processing cluster, and we use a
> unified global admin Kerberos principal for all clusters. The peer node is
> created by the client, so only the global admin has write permission on it.
> When upgrading the cluster, HMaster doesn't have write permission on the
> peer node, so it may fail to copy the old table-cfs string to the new PB
> object. I thought a client-side tool was needed to do this copy job.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
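The migration itself is a copy from the old string form of table-cfs to a structured object. As a rough sketch of that conversion — the `"table1:cf1,cf2;table2"` grammar and names below are assumptions for illustration, not the exact replication config format:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: convert an old-style "table1:cf1,cf2;table2" table-cfs string into
// a structured map, the shape a PB-based config would carry.
public class TableCfsMigrationSketch {
    static Map<String, List<String>> parse(String tableCfs) {
        Map<String, List<String>> out = new HashMap<>();
        for (String entry : tableCfs.split(";")) {
            if (entry.isEmpty()) continue;
            String[] parts = entry.split(":", 2);
            List<String> cfs = new ArrayList<>();
            if (parts.length == 2 && !parts[1].isEmpty()) {
                for (String cf : parts[1].split(",")) cfs.add(cf.trim());
            }
            out.put(parts[0].trim(), cfs); // empty list = replicate all CFs
        }
        return out;
    }
}
```

Running such a conversion as a client-side tool (under the admin principal that owns the peer znode) sidesteps the HMaster permission problem the comment describes.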
[jira] [Commented] (HBASE-14882) Provide a Put API that adds the provided family, qualifier, value without copying
[ https://issues.apache.org/jira/browse/HBASE-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655802#comment-15655802 ] Xiang Li commented on HBASE-14882: -- [~anoop.hbase] Would you please help to review patch 003 when you have time ^_^
> Provide a Put API that adds the provided family, qualifier, value without
> copying
> -------------------------------------------------------------------------
>
> Key: HBASE-14882
> URL: https://issues.apache.org/jira/browse/HBASE-14882
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 1.2.0
> Reporter: Jerry He
> Assignee: Xiang Li
> Fix For: 2.0.0
>
> Attachments: HBASE-14882.master.000.patch,
> HBASE-14882.master.001.patch, HBASE-14882.master.002.patch,
> HBASE-14882.master.003.patch
>
>
> In the Put API, we have addImmutable():
> {code}
> /**
>  * See {@link #addColumn(byte[], byte[], byte[])}. This version expects
>  * that the underlying arrays won't change. It's intended
>  * for usage internal HBase to and for advanced client applications.
>  */
> public Put addImmutable(byte [] family, byte [] qualifier, byte [] value)
> {code}
> But in the implementation, the family, qualifier and value are still being
> copied locally to create the kv. We should provide an API that truly uses
> the immutable family, qualifier and value without copying.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
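The copy-vs-wrap distinction behind the request above can be shown with a small sketch. The class and method names are illustrative, not the HBase Put API: a copying add defensively clones the caller's byte arrays, while a truly immutable add just keeps references and trusts the caller not to mutate them afterwards.

```java
import java.util.Arrays;

// Sketch: contrast a defensive-copy add with a zero-copy "immutable" add.
public class ImmutableAddSketch {
    static class Cell {
        final byte[] value;
        Cell(byte[] value) { this.value = value; }
    }

    // Copying version: safe even if the caller mutates its array later,
    // but pays an allocation and a copy per cell.
    static Cell addWithCopy(byte[] value) {
        return new Cell(Arrays.copyOf(value, value.length));
    }

    // Zero-copy version: wraps the caller's array directly; the caller must
    // guarantee the array never changes (the addImmutable contract).
    static Cell addImmutable(byte[] value) {
        return new Cell(value);
    }
}
```

The JIRA's complaint is that the real addImmutable documented the zero-copy contract but still behaved like `addWithCopy` internally.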
[jira] [Updated] (HBASE-17063) Cleanup TestHRegion : remove duplicate variables for method name and two unused params in initRegion
[ https://issues.apache.org/jira/browse/HBASE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Appy updated HBASE-17063: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~stack] for the review. > Cleanup TestHRegion : remove duplicate variables for method name and two > unused params in initRegion > > > Key: HBASE-17063 > URL: https://issues.apache.org/jira/browse/HBASE-17063 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Minor > Labels: cleanup, tests > Fix For: 2.0.0 > > Attachments: HBASE-17063.master.001.patch > > > - Replaces test function local tablename and method names with those > initialized in setup() function > - Remove unused params from initHRegion(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17063) Cleanup TestHRegion : remove duplicate variables for method name and two unused params in initRegion
[ https://issues.apache.org/jira/browse/HBASE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Appy updated HBASE-17063: - Fix Version/s: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17063) Cleanup TestHRegion : remove duplicate variables for method name and two unused params in initRegion
[ https://issues.apache.org/jira/browse/HBASE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Appy updated HBASE-17063: - Labels: cleanup tests (was: ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17068) Procedure v2 - inherit region locks
[ https://issues.apache.org/jira/browse/HBASE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17068: Attachment: HBASE-17068-v1.patch
> Procedure v2 - inherit region locks
> -----------------------------------
>
> Key: HBASE-17068
> URL: https://issues.apache.org/jira/browse/HBASE-17068
> Project: HBase
> Issue Type: Sub-task
> Components: master, proc-v2
> Reporter: Matteo Bertozzi
> Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-17068-v0.patch, HBASE-17068-v1.patch
>
>
> Add support for inherited region locks.
> e.g. Split will have Assign/Unassign as children, which will take the lock
> on the same region the split is running on.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
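The inheritance idea above, where a child procedure may take a region lock its parent already holds, can be sketched as a lock table keyed by the root procedure id. The names here are illustrative assumptions, not the actual MasterProcedureScheduler implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of inherited region locks: a lock is owned by a procedure *family*
// (the root procedure and all of its children share the same rootProcId), so
// a child Assign/Unassign can re-acquire the lock its parent Split holds.
public class InheritedRegionLockSketch {
    // region name -> rootProcId of the family holding the exclusive lock
    private final Map<String, Long> owners = new HashMap<>();

    public synchronized boolean tryLock(String region, long rootProcId) {
        Long owner = owners.get(region);
        if (owner == null) {
            owners.put(region, rootProcId);
            return true;
        }
        return owner == rootProcId; // same family: the lock is inherited
    }

    public synchronized void unlock(String region, long rootProcId) {
        owners.remove(region, rootProcId); // only the owning family releases it
    }
}
```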
[jira] [Commented] (HBASE-17068) Procedure v2 - inherit region locks
[ https://issues.apache.org/jira/browse/HBASE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655753#comment-15655753 ]

Hadoop QA commented on HBASE-17068:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 46s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 7s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 41s {color} | {color:red} hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 43s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 10s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 120m 44s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | Inconsistent synchronization of org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler$Queue.exclusiveLockProcIdOwner; locked 75% of time. Unsynchronized access at MasterProcedureScheduler.java:[line 1153] |
| Timed out junit tests | org.apache.hadoop.hbase.util.TestIdLock |
| | org.apache.hadoop.hbase.util.TestHBaseFsckReplicas |
| | org.apache.hadoop.hbase.util.TestRegionSplitter |
| | org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel |
| | org.apache.hadoop.hbase.util.TestIdReadWriteLock |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838456/HBASE-17068-v0.patch |
| JIRA Issue | HBASE-17068 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 7e8ca7003313 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 12eec5b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/4426/artifact/patchprocess/new-findbugs-hbase-server.html |
[jira] [Updated] (HBASE-17062) RegionSplitter throws ClassCastException
[ https://issues.apache.org/jira/browse/HBASE-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeongdae Kim updated HBASE-17062:
---------------------------------
    Status: Open  (was: Patch Available)

> RegionSplitter throws ClassCastException
> ----------------------------------------
>
>                 Key: HBASE-17062
>                 URL: https://issues.apache.org/jira/browse/HBASE-17062
>             Project: HBase
>          Issue Type: Bug
>          Components: util
>            Reporter: Jeongdae Kim
>            Priority: Minor
>         Attachments: HBASE-17062.001.patch, HBASE-17062.002.patch
>
> RegionSplitter throws Exception as below.
> Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hbase.ServerName cannot be cast to java.lang.String
> 	at java.lang.String.compareTo(String.java:108)
> 	at java.util.TreeMap.getEntry(TreeMap.java:346)
> 	at java.util.TreeMap.get(TreeMap.java:273)
> 	at org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:504)
> 	at org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:502)
> 	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
> 	at java.util.TimSort.sort(TimSort.java:189)
> 	at java.util.TimSort.sort(TimSort.java:173)
> 	at java.util.Arrays.sort(Arrays.java:659)
> 	at java.util.Collections.sort(Collections.java:217)
> 	at org.apache.hadoop.hbase.util.RegionSplitter.rollingSplit(RegionSplitter.java:502)
> 	at org.apache.hadoop.hbase.util.RegionSplitter.main(RegionSplitter.java:367)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HBASE-17062) RegionSplitter throws ClassCastException
[ https://issues.apache.org/jira/browse/HBASE-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeongdae Kim updated HBASE-17062:
---------------------------------
    Status: Patch Available  (was: Open)

> RegionSplitter throws ClassCastException
> ----------------------------------------
>
>                 Key: HBASE-17062
>                 URL: https://issues.apache.org/jira/browse/HBASE-17062
>             Project: HBase
>          Issue Type: Bug
>          Components: util
>            Reporter: Jeongdae Kim
>            Priority: Minor
>         Attachments: HBASE-17062.001.patch, HBASE-17062.002.patch
>
> RegionSplitter throws Exception as below.
> Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hbase.ServerName cannot be cast to java.lang.String
> 	at java.lang.String.compareTo(String.java:108)
> 	at java.util.TreeMap.getEntry(TreeMap.java:346)
> 	at java.util.TreeMap.get(TreeMap.java:273)
> 	at org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:504)
> 	at org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:502)
> 	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
> 	at java.util.TimSort.sort(TimSort.java:189)
> 	at java.util.TimSort.sort(TimSort.java:173)
> 	at java.util.Arrays.sort(Arrays.java:659)
> 	at java.util.Collections.sort(Collections.java:217)
> 	at org.apache.hadoop.hbase.util.RegionSplitter.rollingSplit(RegionSplitter.java:502)
> 	at org.apache.hadoop.hbase.util.RegionSplitter.main(RegionSplitter.java:367)
[jira] [Updated] (HBASE-17062) RegionSplitter throws ClassCastException
[ https://issues.apache.org/jira/browse/HBASE-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeongdae Kim updated HBASE-17062:
---------------------------------
    Attachment: HBASE-17062.002.patch

> RegionSplitter throws ClassCastException
> ----------------------------------------
>
>                 Key: HBASE-17062
>                 URL: https://issues.apache.org/jira/browse/HBASE-17062
>             Project: HBase
>          Issue Type: Bug
>          Components: util
>            Reporter: Jeongdae Kim
>            Priority: Minor
>         Attachments: HBASE-17062.001.patch, HBASE-17062.002.patch
>
> RegionSplitter throws Exception as below.
> Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hbase.ServerName cannot be cast to java.lang.String
> 	at java.lang.String.compareTo(String.java:108)
> 	at java.util.TreeMap.getEntry(TreeMap.java:346)
> 	at java.util.TreeMap.get(TreeMap.java:273)
> 	at org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:504)
> 	at org.apache.hadoop.hbase.util.RegionSplitter$1.compare(RegionSplitter.java:502)
> 	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
> 	at java.util.TimSort.sort(TimSort.java:189)
> 	at java.util.TimSort.sort(TimSort.java:173)
> 	at java.util.Arrays.sort(Arrays.java:659)
> 	at java.util.Collections.sort(Collections.java:217)
> 	at org.apache.hadoop.hbase.util.RegionSplitter.rollingSplit(RegionSplitter.java:502)
> 	at org.apache.hadoop.hbase.util.RegionSplitter.main(RegionSplitter.java:367)
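The HBASE-17062 stack trace above is a classic raw-key `TreeMap` lookup failure: a `ServerName` reaches a map ordered by `String` comparison, and the incompatible types only collide at runtime inside the comparator. A minimal sketch of this failure mode follows; `FakeServerName` is a hypothetical stand-in for `org.apache.hadoop.hbase.ServerName`, and this is not the actual RegionSplitter code (where the exact frames differ), only the same class of bug:

```java
import java.util.TreeMap;

public class ClassCastDemo {
    // Hypothetical stand-in for org.apache.hadoop.hbase.ServerName.
    // Note it does NOT implement Comparable, and is not a String.
    static final class FakeServerName {
        final String hostAndPort;
        FakeServerName(String hostAndPort) { this.hostAndPort = hostAndPort; }
    }

    static boolean lookupWithWrongKeyTypeThrows() {
        TreeMap<String, Integer> regionCountPerServer = new TreeMap<>();
        regionCountPerServer.put("node-1.cluster,16020,1478578389506", 3);
        try {
            // Map.get takes Object, so this compiles cleanly even though the
            // map is keyed by String. TreeMap then tries to compare the
            // incompatible key and throws ClassCastException at runtime.
            regionCountPerServer.get(new FakeServerName("node-1.cluster"));
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupWithWrongKeyTypeThrows()); // prints "true"
    }
}
```

Because `java.util.Map#get(Object)` is not generic in the key, the compiler cannot catch this mismatch; the usual fix is to make the lookup key and the map's key type agree (or sort with an explicit `Comparator` over the right type), which is what the attached patches aim at.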
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655722#comment-15655722 ]

Andrew Purtell commented on HBASE-17069:
----------------------------------------

I can't say, [~mantonov]. I haven't tested 1.3 and up; pretty busy here. I'm assuming you have not seen it? I will aim to get a test of the latest 1.3.0 in tomorrow, but can't promise it.

> RegionServer writes invalid META entries for split daughters in some
> circumstances
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-17069
>                 URL: https://issues.apache.org/jira/browse/HBASE-17069
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.4
>            Reporter: Andrew Purtell
>            Priority: Critical
>         Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, parent-393d2bfd8b1c52ce08540306659624f2.log
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure:
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, currentSize=14996112, freeSize=12823716208, maxSize=12838712320, heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for big: blockCache=LruBlockCache{blockCount=34, currentSize=14996112, freeSize=12823716208, maxSize=12838712320, heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, currentSize=17187656, freeSize=12821524664, maxSize=12838712320, heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, currentSize=19178440, freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, currentSize=19178440, freeSize=12819533880, maxSize=12838712320, heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, currentSize=19178440,
[jira] [Updated] (HBASE-17060) backport HBASE-16570 to 1.3.1
[ https://issues.apache.org/jira/browse/HBASE-17060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Antonov updated HBASE-17060:
------------------------------------
    Fix Version/s: 1.3.1

> backport HBASE-16570 to 1.3.1
> -----------------------------
>
>                 Key: HBASE-17060
>                 URL: https://issues.apache.org/jira/browse/HBASE-17060
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 1.3.0
>            Reporter: Yu Li
>            Assignee: binlijin
>             Fix For: 1.3.1
>
> Need some backport after 1.3.0 got released
[jira] [Commented] (HBASE-17020) keylen in midkey() dont computed correctly
[ https://issues.apache.org/jira/browse/HBASE-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655708#comment-15655708 ]

Mikhail Antonov commented on HBASE-17020:
-----------------------------------------

Thanks for the ping! The 1.3.1 backport seems good.

> keylen in midkey() dont computed correctly
> ------------------------------------------
>
>                 Key: HBASE-17020
>                 URL: https://issues.apache.org/jira/browse/HBASE-17020
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile
>    Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>            Reporter: Yu Sun
>            Assignee: Yu Sun
>             Fix For: 2.0.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
>         Attachments: HBASE-17020-branch-0.98.patch, HBASE-17020-v1.patch, HBASE-17020-v2.patch, HBASE-17020-v2.patch, HBASE-17020-v3-branch1.1.patch, HBASE-17020.branch-0.98.patch, HBASE-17020.branch-0.98.patch, HBASE-17020.branch-1.1.patch
>
> In CellBasedKeyBlockIndexReader.midkey():
> {code}
>     ByteBuff b = midLeafBlock.getBufferWithoutHeader();
>     int numDataBlocks = b.getIntAfterPosition(0);
>     int keyRelOffset = b.getIntAfterPosition(Bytes.SIZEOF_INT * (midKeyEntry + 1));
>     int keyLen = b.getIntAfterPosition(Bytes.SIZEOF_INT * (midKeyEntry + 2)) - keyRelOffset;
> {code}
> The local variable keyLen computed this way is actually the total of SECONDARY_INDEX_ENTRY_OVERHEAD + firstKey.length, because each secondary-index entry is recorded as:
> {code}
>   void add(byte[] firstKey, long blockOffset, int onDiskDataSize, long curTotalNumSubEntries) {
>     // Record the offset for the secondary index
>     secondaryIndexOffsetMarks.add(curTotalNonRootEntrySize);
>     curTotalNonRootEntrySize += SECONDARY_INDEX_ENTRY_OVERHEAD + firstKey.length;
> {code}
> When the midkey is the last entry of a leaf-level index block, this may throw:
> {quote}
> 2016-10-01 12:27:55,186 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: Cache flusher failed for entry [flush region pora_6_item_feature,0061:,1473838922457.12617bc4ebbfd171018bf96ac9bdd2a7.]
> java.lang.ArrayIndexOutOfBoundsException
> 	at org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:936)
> 	at org.apache.hadoop.hbase.nio.SingleByteBuff.toBytes(SingleByteBuff.java:303)
> 	at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.midkey(HFileBlockIndex.java:419)
> 	at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.midkey(HFileReaderImpl.java:1519)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1520)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:706)
> 	at org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:126)
> 	at org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:1983)
> 	at org.apache.hadoop.hbase.regionserver.ConstantFamilySizeRegionSplitPolicy.getSplitPoint(ConstantFamilySizeRegionSplitPolicy.java:77)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7756)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:513)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
> 	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
> 	at java.lang.Thread.run(Thread.java:756)
> {quote}
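The HBASE-17020 arithmetic can be sketched without HFile internals. Each secondary-index entry occupies SECONDARY_INDEX_ENTRY_OVERHEAD + firstKey.length bytes, so the difference of two consecutive offset marks is the whole entry size, not the key length; the per-entry overhead must be subtracted out. A simplified model of the offset-mark layout follows; the names, the overhead value, and the plain int[] standing in for the serialized block are illustrative, not the HBase implementation:

```java
public class MidkeyLenDemo {
    // Simplified model of the leaf-level secondary index: offsets[i] is the
    // byte offset of entry i relative to the first entry, and each entry
    // occupies (overhead + key length) bytes, mirroring add() above.
    static int[] buildOffsets(int[] keyLengths, int overhead) {
        int[] offsets = new int[keyLengths.length];
        int total = 0;
        for (int i = 0; i < keyLengths.length; i++) {
            offsets[i] = total;
            total += overhead + keyLengths[i];
        }
        return offsets;
    }

    // Mirrors the buggy snippet: the delta of consecutive offset marks is the
    // whole entry size (overhead + key), so reading that many bytes overshoots
    // the key by `overhead` bytes.
    static int buggyKeyLen(int[] offsets, int midKeyEntry) {
        return offsets[midKeyEntry + 1] - offsets[midKeyEntry];
    }

    // Corrected: subtract the per-entry overhead from the delta.
    static int fixedKeyLen(int[] offsets, int overhead, int midKeyEntry) {
        return offsets[midKeyEntry + 1] - offsets[midKeyEntry] - overhead;
    }

    public static void main(String[] args) {
        int overhead = 12; // illustrative value, not the real constant
        int[] offsets = buildOffsets(new int[] {5, 7, 9}, overhead);
        System.out.println(buggyKeyLen(offsets, 0));           // prints "17"
        System.out.println(fixedKeyLen(offsets, overhead, 0)); // prints "5"
    }
}
```

In this model, asking for the last entry (buggyKeyLen(offsets, 2)) also indexes one slot past the end of the offset array, which is the analogue of the ArrayIndexOutOfBoundsException reported when the midkey is the last entry of a leaf-level index block.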