[jira] [Updated] (HBASE-13336) Consistent rules for security meta table protections

2015-06-25 Thread Srikanth Srungarapu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Srungarapu updated HBASE-13336:

Attachment: HBASE-13336_v2.patch

 Consistent rules for security meta table protections
 

 Key: HBASE-13336
 URL: https://issues.apache.org/jira/browse/HBASE-13336
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Srikanth Srungarapu
 Fix For: 2.0.0, 0.98.14, 1.3.0

 Attachments: HBASE-13336.patch, HBASE-13336_v2.patch


 The AccessController and VisibilityController do different things regarding 
 protecting their meta tables. The AC allows schema changes and disable/enable 
 if the user has permission. The VC unconditionally disallows all admin 
 actions. Generally, bad things will happen if these meta tables are damaged, 
 disabled, or dropped. The likely outcome is random frequent (or constant) 
 server side op failures with nasty stack traces. On the other hand some 
 things like column family and table attribute changes can have valid use 
 cases. We should have consistent and sensible rules for protecting security 
 meta tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13336) Consistent rules for security meta table protections

2015-06-25 Thread Srikanth Srungarapu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Srungarapu updated HBASE-13336:

Attachment: (was: HBASE-13336_v2.patch)

 Consistent rules for security meta table protections
 

 Key: HBASE-13336
 URL: https://issues.apache.org/jira/browse/HBASE-13336
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Srikanth Srungarapu
 Fix For: 2.0.0, 0.98.14, 1.3.0

 Attachments: HBASE-13336.patch, HBASE-13336_v2.patch


 The AccessController and VisibilityController do different things regarding 
 protecting their meta tables. The AC allows schema changes and disable/enable 
 if the user has permission. The VC unconditionally disallows all admin 
 actions. Generally, bad things will happen if these meta tables are damaged, 
 disabled, or dropped. The likely outcome is random frequent (or constant) 
 server side op failures with nasty stack traces. On the other hand some 
 things like column family and table attribute changes can have valid use 
 cases. We should have consistent and sensible rules for protecting security 
 meta tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HBASE-13336) Consistent rules for security meta table protections

2015-06-25 Thread Srikanth Srungarapu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-13336 started by Srikanth Srungarapu.
---
 Consistent rules for security meta table protections
 

 Key: HBASE-13336
 URL: https://issues.apache.org/jira/browse/HBASE-13336
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Srikanth Srungarapu
 Fix For: 2.0.0, 0.98.14, 1.3.0

 Attachments: HBASE-13336.patch, HBASE-13336_v2.patch


 The AccessController and VisibilityController do different things regarding 
 protecting their meta tables. The AC allows schema changes and disable/enable 
 if the user has permission. The VC unconditionally disallows all admin 
 actions. Generally, bad things will happen if these meta tables are damaged, 
 disabled, or dropped. The likely outcome is random frequent (or constant) 
 server side op failures with nasty stack traces. On the other hand some 
 things like column family and table attribute changes can have valid use 
 cases. We should have consistent and sensible rules for protecting security 
 meta tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601785#comment-14601785
 ] 

Lars Hofhansl commented on HBASE-13959:
---

Specifically we can set the default maximum to the 1/2 #blockingStoreFiles.
That way we have a good default, and folks can override and (a) decrease if 
they set blockStoreFiles to a large value or (b) increase if they have many 
column families.


 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601867#comment-14601867
 ] 

Hudson commented on HBASE-13835:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #991 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/991/])
HBASE-13835 KeyValueHeap.current might be in heap when exception happens in 
pollRealKV. (zhouyingchao) (anoopsamjohn: rev 
46e9a8ea0a276cf23b33fbcafba8f00611c3c885)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java


 KeyValueHeap.current might be in heap when exception happens in pollRealKV
 --

 Key: HBASE-13835
 URL: https://issues.apache.org/jira/browse/HBASE-13835
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Reporter: zhouyingchao
Assignee: zhouyingchao
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, 
 HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, 
 HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, 
 HBASE-13835_branch-1.patch


 In a 0.94 hbase cluster, we found a NPE with following stack:
 {code}
 Exception in thread regionserver21600.leaseChecker 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191)
 at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
 at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
 at java.util.PriorityQueue.poll(PriorityQueue.java:523)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033)
 at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Before this NPE exception, there is an exception happens in pollRealKV, which 
 we think is the culprit of the NPE.
 {code}
 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
 java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for 
 reader reader=
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815)
 at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583)
 {code}
 Simply put, if there is an exception happens in 

[jira] [Resolved] (HBASE-13972) Hanging test finder should report killed test

2015-06-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-13972.

Resolution: Cannot Reproduce

Subsequent run of findHangingTests.py reported TestProcedureStoreTracker as 
hanging test.

 Hanging test finder should report killed test
 -

 Key: HBASE-13972
 URL: https://issues.apache.org/jira/browse/HBASE-13972
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor

 I was looking at 
 https://builds.apache.org/job/PreCommit-HBASE-Build/14576/console and found 
 that findHangingTests.py didn't report any hanging / failing test.
 {code}
 Running org.apache.hadoop.hbase.procedure2.store.TestProcedureStoreTracker
 Killed
 {code}
 It turns out that findHangingTests.py didn't distinguish the state for tests 
 that were killed.
 Patch coming shortly which allows printing of killed test(s)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601785#comment-14601785
 ] 

Lars Hofhansl edited comment on HBASE-13959 at 6/25/15 7:30 PM:


Specifically we can set the default maximum to the 1/2 #blockingStoreFiles (or 
maybe just #blockingStoreFiles)
That way we have a good default, and folks can override and (a) decrease if 
they set blockStoreFiles to a large value or (b) increase if they have many 
column families.



was (Author: lhofhansl):
Specifically we can set the default maximum to the 1/2 #blockingStoreFiles.
That way we have a good default, and folks can override and (a) decrease if 
they set blockStoreFiles to a large value or (b) increase if they have many 
column families.


 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601842#comment-14601842
 ] 

Hudson commented on HBASE-13835:


FAILURE: Integrated in HBase-1.1 #557 (See 
[https://builds.apache.org/job/HBase-1.1/557/])
HBASE-13835 KeyValueHeap.current might be in heap when exception happens in 
pollRealKV. (zhouyingchao) (anoopsamjohn: rev 
a7f31ce357d8d90922fd7530bfc008839c2fa72d)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java


 KeyValueHeap.current might be in heap when exception happens in pollRealKV
 --

 Key: HBASE-13835
 URL: https://issues.apache.org/jira/browse/HBASE-13835
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Reporter: zhouyingchao
Assignee: zhouyingchao
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, 
 HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, 
 HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, 
 HBASE-13835_branch-1.patch


 In a 0.94 hbase cluster, we found a NPE with following stack:
 {code}
 Exception in thread regionserver21600.leaseChecker 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191)
 at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
 at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
 at java.util.PriorityQueue.poll(PriorityQueue.java:523)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033)
 at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Before this NPE exception, there is an exception happens in pollRealKV, which 
 we think is the culprit of the NPE.
 {code}
 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
 java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for 
 reader reader=
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815)
 at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583)
 {code}
 Simply put, if there is an exception happens in pollRealKV( ), the 
 

[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-06-25 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-13832:
-
Fix Version/s: 1.2.1
   1.3.0
   1.1.2
   2.0.0

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HDFSPipeline.java


 when the data node  3, we got failure in WALProcedureStore#syncLoop() during 
 master start.  The failure prevents master to get started.  
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement some similar logic as FSHLog: if IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of immediate 
 abort, we could try to roll the log and see whether this resolve the issue; 
 if the new log cannot be created or more exception from rolling the log, we 
 then abort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-06-25 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601868#comment-14601868
 ] 

Matteo Bertozzi commented on HBASE-13832:
-

even with the patch the master will not start until the 3rd data node is back.
in theory you should ping pong between the backup masters until something a DN 
is available.
what the patch does is just retry for some time hoping that a 3rd data node is 
online, before giving up.

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HDFSPipeline.java


 when the data node  3, we got failure in WALProcedureStore#syncLoop() during 
 master start.  The failure prevents master to get started.  
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement some similar logic as FSHLog: if IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of immediate 
 abort, we could try to roll the log and see whether this resolve the issue; 
 if the new log cannot be created or more exception from rolling the log, we 
 then abort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601878#comment-14601878
 ] 

Hadoop QA commented on HBASE-13702:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12741902/HBASE-13702-v5.patch
  against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a.
  ATTACHMENT ID: 12741902

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestHBaseFsck
  org.apache.hadoop.hbase.TestRegionRebalancing

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14574//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14574//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14574//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14574//console

This message is automatically generated.

 ImportTsv: Add dry-run functionality and log bad rows
 -

 Key: HBASE-13702
 URL: https://issues.apache.org/jira/browse/HBASE-13702
 Project: HBase
  Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, 
 HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch


 ImportTSV job skips bad records by default (keeps a count though). 
 -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is 
 encountered. 
 To be easily able to determine which rows are corrupted in an input, rather 
 than failing on one row at a time seems like a good feature to have.
 Moreover, there should be 'dry-run' functionality in such kinds of tools, 
 which can essentially does a quick run of tool without making any changes but 
 reporting any errors/warnings and success/failure.
 To identify corrupted rows, simply logging them should be enough. In worst 
 case, all rows will be logged and size of logs will be same as input size, 
 which seems fine. However, user might have to do some work figuring out where 
 the logs. Is there some link we can show to the user when the tool starts 
 which can help them with that?
 For the dry run, we can simply use if-else to skip over writing out KVs, and 
 any other mutations, if present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-13959:
--
Attachment: 13959-suggest.txt

Something like this for example.
(This does leak HStore references into SplitTransactionImpl, though)

I'm just trying to a good default, rather than something else that everyone 
will have to configure.

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: 13959-suggest.txt, HBASE-13959-2.patch, 
 HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, 
 region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602383#comment-14602383
 ] 

Hadoop QA commented on HBASE-13832:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12742004/HBASE-13832-v1.patch
  against master branch at commit 2ed058554c0b6d6da0388497562e254107f13d67.
  ATTACHMENT ID: 12742004

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1902 checkstyle errors (more than the master's current 1901 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14582//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14582//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14582//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14582//console

This message is automatically generated.

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, 
 HDFSPipeline.java


 when the data node  3, we got failure in WALProcedureStore#syncLoop() during 
 master start.  The failure prevents master to get started.  
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement some similar logic as FSHLog: if IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of immediate 
 abort, we could try to roll the log and see whether this resolve the issue; 
 if the new log cannot be created or more exception from rolling the log, we 
 then abort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13750) set up jenkins builds that run branch-1 ITs with java 8

2015-06-25 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602390#comment-14602390
 ] 

Sean Busbey commented on HBASE-13750:
-

But yes, the matrix is part of the jenkins config. Sorry for leaving that out.

 set up jenkins builds that run branch-1 ITs with java 8
 ---

 Key: HBASE-13750
 URL: https://issues.apache.org/jira/browse/HBASE-13750
 Project: HBase
  Issue Type: Sub-task
Reporter: Sean Busbey
Assignee: Sean Busbey
 Fix For: 1.2.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13750) set up jenkins builds that run branch-1 ITs with java 8

2015-06-25 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602389#comment-14602389
 ] 

Sean Busbey commented on HBASE-13750:
-

A given execution is invoke with a line like this:

{noformat}
$ mvn -Dit.test=IntegrationTestBigLinkedList -Dtest=NoUnitTests clean package 
verify
{noformat}

Instead of using an exclusion the matrix just expressly lists each test that 
should be run.

 set up jenkins builds that run branch-1 ITs with java 8
 ---

 Key: HBASE-13750
 URL: https://issues.apache.org/jira/browse/HBASE-13750
 Project: HBase
  Issue Type: Sub-task
Reporter: Sean Busbey
Assignee: Sean Busbey
 Fix For: 1.2.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13939) Make HFileReaderImpl.getFirstKeyInBlock() to return a Cell

2015-06-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602408#comment-14602408
 ] 

stack commented on HBASE-13939:
---

Failures are:

kalashnikov:hbase.git stack$ python ./dev-support/findHangingTests.py 
https://builds.apache.org/job/PreCommit-HBASE-Build/14548/consoleText
Fetching the console output from the URL
Printing hanging tests
Printing Failing tests
Failing test : org.apache.hadoop.hbase.client.TestFastFail
Failing test : org.apache.hadoop.hbase.master.TestDistributedLogSplitting


Look unrelated and common failures currently.

Looked at patch again. Nice. +1 to commit [~ram_krish]

 Make HFileReaderImpl.getFirstKeyInBlock() to return a Cell
 --

 Key: HBASE-13939
 URL: https://issues.apache.org/jira/browse/HBASE-13939
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 2.0.0, 1.1.2

 Attachments: HBASE-13939.patch, HBASE-13939_1.patch, 
 HBASE-13939_2.patch, HBASE-13939_3.patch, HBASE-13939_3.patch, 
 HBASE-13939_branch-1.1.patch


 The getFirstKeyInBlock() in HFileReaderImpl is returning a BB. It is getting 
 used in seekBefore cases.  Because we return a BB we create a KeyOnlyKV once 
 for comparison
 {code}
   if (reader.getComparator()
   .compareKeyIgnoresMvcc(
   new KeyValue.KeyOnlyKeyValue(firstKey.array(), 
 firstKey.arrayOffset(),
   firstKey.limit()), key) = 0) {
 long previousBlockOffset = seekToBlock.getPrevBlockOffset();
 // The key we are interested in
 if (previousBlockOffset == -1) {
   // we have a 'problem', the key we want is the first of the file.
   return false;
 }
 
 {code}
 And if the compare fails we again create another KeyOnlyKv 
 {code}
   Cell firstKeyInCurrentBlock = new 
 KeyValue.KeyOnlyKeyValue(Bytes.getBytes(firstKey));
   loadBlockAndSeekToKey(seekToBlock, firstKeyInCurrentBlock, true, key, 
 true);
 {code}
 So one object will be enough and that can be returned by getFirstKeyInBlock. 
 Also will be useful when we go with Buffered backed server cell to change in 
 one place. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13947) Use MasterServices instead of Server in AssignmentManager

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602423#comment-14602423
 ] 

Hudson commented on HBASE-13947:


FAILURE: Integrated in HBase-1.2 #34 (See 
[https://builds.apache.org/job/HBase-1.2/34/])
HBASE-13947 Use MasterServices instead of Server in AssignmentManager 
(matteo.bertozzi: rev d476b56c4b3a0f203dcfe8e4d0c652795ac9d50d)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


 Use MasterServices instead of Server in AssignmentManager
 -

 Key: HBASE-13947
 URL: https://issues.apache.org/jira/browse/HBASE-13947
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 1.2.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Trivial
 Fix For: 1.2.0

 Attachments: HBASE-13947-v0-branch-1.patch


 Working on a patch for branch-1, the AM is using Server instead of 
 MasterServices and does a cast to MasterServices when needed. We should have 
 MasterServices as arg as we do in master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602422#comment-14602422
 ] 

Hudson commented on HBASE-13969:


FAILURE: Integrated in HBase-1.2 #34 (See 
[https://builds.apache.org/job/HBase-1.2/34/])
HBASE-13969 AuthenticationTokenSecretManager is never stopped in RPCServer 
(Pankaj Kumar) (tedyu: rev 139cb4e979d2b7f19072bfd0873cb9f206a2038e)
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


 AuthenticationTokenSecretManager is never stopped in RPCServer
 --

 Key: HBASE-13969
 URL: https://issues.apache.org/jira/browse/HBASE-13969
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: 13969-V2.patch, HBASE-13969-V2.patch, HBASE-13969.patch


 AuthenticationTokenSecretManager is never stopped in RPCServer.
 {code}
 AuthenticationTokenSecretManager mgr = createSecretManager();
 if (mgr != null) {
   setSecretManager(mgr);
   mgr.start();
 }
 {code}
 It should be stopped during exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602424#comment-14602424
 ] 

Hudson commented on HBASE-13969:


FAILURE: Integrated in HBase-TRUNK #6605 (See 
[https://builds.apache.org/job/HBase-TRUNK/6605/])
HBASE-13969 AuthenticationTokenSecretManager is never stopped in RPCServer 
(Pankaj Kumar) (tedyu: rev 2ed058554c0b6d6da0388497562e254107f13d67)
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


 AuthenticationTokenSecretManager is never stopped in RPCServer
 --

 Key: HBASE-13969
 URL: https://issues.apache.org/jira/browse/HBASE-13969
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: 13969-V2.patch, HBASE-13969-V2.patch, HBASE-13969.patch


 AuthenticationTokenSecretManager is never stopped in RPCServer.
 {code}
 AuthenticationTokenSecretManager mgr = createSecretManager();
 if (mgr != null) {
   setSecretManager(mgr);
   mgr.start();
 }
 {code}
 It should be stopped during exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602438#comment-14602438
 ] 

Hudson commented on HBASE-13969:


SUCCESS: Integrated in HBase-1.2-IT #24 (See 
[https://builds.apache.org/job/HBase-1.2-IT/24/])
HBASE-13969 AuthenticationTokenSecretManager is never stopped in RPCServer 
(Pankaj Kumar) (tedyu: rev 139cb4e979d2b7f19072bfd0873cb9f206a2038e)
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


 AuthenticationTokenSecretManager is never stopped in RPCServer
 --

 Key: HBASE-13969
 URL: https://issues.apache.org/jira/browse/HBASE-13969
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: 13969-V2.patch, HBASE-13969-V2.patch, HBASE-13969.patch


 AuthenticationTokenSecretManager is never stopped in RPCServer.
 {code}
 AuthenticationTokenSecretManager mgr = createSecretManager();
 if (mgr != null) {
   setSecretManager(mgr);
   mgr.start();
 }
 {code}
 It should be stopped during exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602448#comment-14602448
 ] 

Hudson commented on HBASE-13969:


SUCCESS: Integrated in HBase-1.3-IT #9 (See 
[https://builds.apache.org/job/HBase-1.3-IT/9/])
HBASE-13969 AuthenticationTokenSecretManager is never stopped in RPCServer 
(Pankaj Kumar) (tedyu: rev 6e9a30280871987c35dbb67c5d3217915f105d01)
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java


 AuthenticationTokenSecretManager is never stopped in RPCServer
 --

 Key: HBASE-13969
 URL: https://issues.apache.org/jira/browse/HBASE-13969
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: 13969-V2.patch, HBASE-13969-V2.patch, HBASE-13969.patch


 AuthenticationTokenSecretManager is never stopped in RPCServer.
 {code}
 AuthenticationTokenSecretManager mgr = createSecretManager();
 if (mgr != null) {
   setSecretManager(mgr);
   mgr.start();
 }
 {code}
 It should be stopped during exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-13970:
--

 Summary: NPE during compaction in trunk
 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
all data is flushed out to disk. When the first compaction is triggered we get 
an NPE and this is very easy to reproduce
{code}
015-06-25 21:33:46,041 INFO  [main-EventThread] 
procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
event: /hbase/flush-table-proc/acquired
2015-06-25 21:33:46,051 INFO  
[rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
2015-06-25 21:33:46,159 ERROR 
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
regionserver.CompactSplitThread: Compaction failed Request = 
regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
 storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
priority=3, time=7536968291719985
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
at 
org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
at 
org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
at 
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
at 
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
at 
org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
at 
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-06-25 21:33:46,745 INFO  
[rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
hasBloomFilter=true, into tmp file 
hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
2015-06-25 21:33:46,772 INFO  
[rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
regionserver.HStore: Added 
hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
 entries=68116, sequenceid=1534, filesize=68.7 M
2015-06-25 21:33:46,773 INFO  
[rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
currentsize=0 B/0 for region 
TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
 in 723ms, sequenceid=1534, compaction requested=true
2015-06-25 21:33:46,780 INFO  [main-EventThread] 
procedure.ZKProcedureMemberRpcs: Received created 
event:/hbase/flush-table-proc/reached/TestTable
2015-06-25 21:33:46,790 INFO  [main-EventThread] 
procedure.ZKProcedureMemberRpcs: Received created 
event:/hbase/flush-table-proc/abort/TestTable
2015-06-25 21:33:46,791 INFO  [main-EventThread] 
procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
event: /hbase/flush-table-proc/abort
2015-06-25 21:33:46,803 INFO  [main-EventThread] 
procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
event: /hbase/flush-table-proc/acquired
2015-06-25 21:33:46,818 INFO  [main-EventThread] 
procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
event: /hbase/flush-table-proc/abort
{code}
Will check this on what is the reason behind it. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13963) avoid leaking jdk.tools

2015-06-25 Thread Gabor Liptak (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601034#comment-14601034
 ] 

Gabor Liptak commented on HBASE-13963:
--

I will look at HADOOP-9406 to see their changes.
I decided to exclude test-jar dependencies )and wasn't sure about 
hbase-testing). I will continue to review this evening.

 avoid leaking jdk.tools
 ---

 Key: HBASE-13963
 URL: https://issues.apache.org/jira/browse/HBASE-13963
 Project: HBase
  Issue Type: Sub-task
  Components: build, documentation
Reporter: Sean Busbey
Assignee: Gabor Liptak
Priority: Critical
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: HBASE-13963.1.patch


 Right now hbase-annotations uses jdk7 jdk.tools and exposes that to 
 downstream via hbase-client. We need it for building and using our custom 
 doclet, but can improve a couple of things: 
 -1) We should be using a jdk.tools version based on our java version (use jdk 
 activated profiles to set it)-
 2) We should not be including any jdk.tools version in our hbase-client 
 transitive dependencies (or other downstream-facing artifacts). 
 Unfortunately, system dependencies are included in transitive resolution, so 
 we'll need to exclude it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601201#comment-14601201
 ] 

Hudson commented on HBASE-13964:


FAILURE: Integrated in HBase-1.3 #17 (See 
[https://builds.apache.org/job/HBase-1.3/17/])
HBASE-13964 Skip region normalization for tables under namespace quota (tedyu: 
rev ed72fa212875814f7e44eebaf7789710ec670c6a)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600976#comment-14600976
 ] 

ramkrishna.s.vasudevan commented on HBASE-13970:


Note that when I tried to load some more data and when a compaction was 
triggered no NPE happened. 

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0


 Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
 all data is flushed out to disk. When the first compaction is triggered we 
 get an NPE and this is very easy to reproduce
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check this on what is the reason behind it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Hari Krishna Dara (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601035#comment-14601035
 ] 

Hari Krishna Dara commented on HBASE-13959:
---

I just attached region-split-durations-compared.png.

I have done a basic comparison of split times with one thread vs 8 threads on a 
table. The table had no presplits and had a single column family. Starting from 
an empty table, I loaded 400M rows (about 570 bytes/row). The run with 1 thread 
encountered NSRE exceptions a few times that coincides with a long running 
split. The run with 8 threads had no NSRE's. Here are some numbers:

Thread pool size = 1
Number of splits: 27
Average split duration: 8.44s
Min split duration: 3
Max split duration: 16
p99 split duration: 16

Thread pool size = 8
Number of splits: 25
Average split duration: 3.4s
Min split duration: 2
Max split duration: 6
p99 split duration: 5.76


I will attach an histogram showing the durations side by side.

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Hari Krishna Dara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Krishna Dara updated HBASE-13959:
--
Attachment: region-split-durations-compared.png

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600979#comment-14600979
 ] 

Mikhail Antonov commented on HBASE-13964:
-

Yeah, speaking of the savings.. I didn't do any profiling, but I'm fairly 
certain that cost of extra calls here is negligible compared to time of actual 
split/merge for any real-world regions :) so I wouldn't worry about it now.

 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600980#comment-14600980
 ] 

Mikhail Antonov commented on HBASE-13964:
-

Thanks [~te...@apache.org]!

 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-13970:
---
Affects Version/s: 2.0.0
Fix Version/s: 2.0.0

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0


 Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
 all data is flushed out to disk. When the first compaction is triggered we 
 get an NPE and this is very easy to reproduce
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check this on what is the reason behind it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13964:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the reviews, Mikhail and Ashish.

 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13214) Remove deprecated and unused methods from HTable class

2015-06-25 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600993#comment-14600993
 ] 

Ashish Singhi commented on HBASE-13214:
---

Any more comments ? If not can we commit this I am worried about it getting 
rotten.

 Remove deprecated and unused methods from HTable class
 --

 Key: HBASE-13214
 URL: https://issues.apache.org/jira/browse/HBASE-13214
 Project: HBase
  Issue Type: Sub-task
  Components: API
Affects Versions: 2.0.0
Reporter: Mikhail Antonov
Assignee: Ashish Singhi
 Fix For: 2.0.0

 Attachments: HBASE-13214-v1.patch, HBASE-13214-v2-again-v1.patch, 
 HBASE-13214-v2-again.patch, HBASE-13214-v2.patch, HBASE-13214-v3.patch, 
 HBASE-13214-v3.patch, HBASE-13214.patch


 Methods like #getRegionLocation(), #isTableEnabled() etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601010#comment-14601010
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-13970 at 6/25/15 11:11 AM:
--

The reason here is that there are 2 compactions getting triggered from the 
CompactionSplitThread on a region that is newly split.  One may be due to the 
split that happened?
So when both the compaction goes parallely then the 
PressureAwareCompactionController is started by the Compactor thread
{code}
@Override
  public void start(String compactionName) {
activeCompactions.put(compactionName, new ActiveCompaction());
  }
{code}
Now by the time the second compaction is in progress the first compaction 
completes and does a finish
{code}
@Override
  public void finish(String compactionName) {
ActiveCompaction compaction = activeCompactions.remove(compactionName);
long elapsedTime = Math.max(1, EnvironmentEdgeManager.currentTime() - 
compaction.startTime);
.
{code}
The compactionName is going to be common here because 
{code}
String compactionName =
store.getRegionInfo().getRegionNameAsString() + # + 
store.getFamily().getNameAsString();
{code}
When the second compaction comes for completion the compaction has already been 
removed and hence we get an NPE.

Logs
One compaction started with 4 files
{code}
2015-06-25 22:07:49,135 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
regionserver.HRegion: Starting compaction on info in region 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
2015-06-25 22:07:49,135 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
regionserver.HStore: Starting compaction of 4 file(s) in info of 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
 into 
tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp,
 totalSize=285.6 M
2015-06-25 22:07:49,165 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
hfile.CacheConfig: 
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16, 
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
prefetchOnOpen=false
2015-06-25 22:07:49,954 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled 
WAL 
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250268365
 with entries=90, filesize=124.14 MB; new WAL 
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250269933
20
{code}
Another compaction has been started with 3 files
{code}
2015-06-25 22:07:53,405 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] 
regionserver.HRegion: Starting compaction on info in region 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
2015-06-25 22:07:53,406 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] 
regionserver.HStore: Starting compaction of 3 file(s) in info of 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
 into 
tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp,
 totalSize=343.4 M
2015-06-25 22:07:53,411 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] 
hfile.CacheConfig: 
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16, 
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
prefetchOnOpen=false
2015-06-25 22:07:54,211 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Flushing 1/1 column families, memstore=128.23 MB
2015-06-25 22:07:54,639 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled 
WAL 
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250273034
 with entries=90, filesize=
{code}
{code}
2015-06-25 22:08:09,446 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
compactions.PressureAwareCompactionThroughputController: 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.#info
 average throughput is 19.80 MB/sec, slept 30 time(s) and total slept time is 
27694 ms. 0 active compactions remaining, total limit is 10.00 MB/sec
2015-06-25 22:08:09,520 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
regionserver.HStore: Completed compaction of 4 (all) file(s) in info of 

[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601113#comment-14601113
 ] 

Hudson commented on HBASE-13964:


SUCCESS: Integrated in HBase-1.2 #31 (See 
[https://builds.apache.org/job/HBase-1.2/31/])
HBASE-13964 Skip region normalization for tables under namespace quota (tedyu: 
rev c680e2f40927407a0699b0b1ce687867bc2bb398)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java


 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600971#comment-14600971
 ] 

Ashish Singhi commented on HBASE-13964:
---

bq. I thought about this. Even if we do that, the null check must be made. So 
there is not much saving.
Yes null check must be made, suppose if it is not null then we call again 
quotaManager.getNamespaceQuotaManager() which can be save one extra call for 
each enabled table in the cluster. But again as you said there is not much 
saving, so I am ok with as it is also.

 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601087#comment-14601087
 ] 

Hadoop QA commented on HBASE-13959:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12741822/HBASE-13959-4.patch
  against master branch at commit 2df3236a4eee48bf723213a7c4ff3d29c832c8cf.
  ATTACHMENT ID: 12741822

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestHBaseFsck

 {color:red}-1 core zombie tests{color}.  There are 3 zombie test(s): 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14567//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14567//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14567//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14567//console

This message is automatically generated.

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600969#comment-14600969
 ] 

Ted Yu commented on HBASE-13964:


bq. that we are skipping region normalizing

The log I added is consistent with existing log w.r.t. system table.

bq. Can we extract it to a local variable ?

I thought about this. Even if we do that, the null check must be made. So there 
is not much saving.

 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600973#comment-14600973
 ] 

Ted Yu commented on HBASE-13959:


Is it possible to measure performance gain with your patch ?

Thanks

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601010#comment-14601010
 ] 

ramkrishna.s.vasudevan commented on HBASE-13970:


The reason here is that there are 2 compactions getting triggered from the 
CompactionSplitThread on a region that is newly split.  One may be due to the 
split that happened?
So when both the compaction goes parallely then the 
PressureAwareCompactionController is started by the Compactor thread
{code}
@Override
  public void start(String compactionName) {
activeCompactions.put(compactionName, new ActiveCompaction());
  }
{code}
Now by the time the second compaction is in progress the first compaction 
completes and does a finish
{code}
@Override
  public void finish(String compactionName) {
ActiveCompaction compaction = activeCompactions.remove(compactionName);
long elapsedTime = Math.max(1, EnvironmentEdgeManager.currentTime() - 
compaction.startTime);
.
{code}
The compactionName is going to be common here because 
{code}
String compactionName =
store.getRegionInfo().getRegionNameAsString() + # + 
store.getFamily().getNameAsString();
{code}
When the second compaction comes for completion the compaction has already been 
removed and hence we get an NPE.

Logs
One compaction started with 4 files
{code}
2015-06-25 22:07:49,135 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
regionserver.HRegion: Starting compaction on info in region 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
2015-06-25 22:07:49,135 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
regionserver.HStore: Starting compaction of 4 file(s) in info of 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
 into 
tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp,
 totalSize=285.6 M
2015-06-25 22:07:49,165 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
hfile.CacheConfig: 
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16, 
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
prefetchOnOpen=false
2015-06-25 22:07:49,954 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled 
WAL 
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250268365
 with entries=90, filesize=124.14 MB; new WAL 
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250269933
20
{code}
Another compaction has been started with 3 files
{code}
2015-06-25 22:07:53,405 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] 
regionserver.HRegion: Starting compaction on info in region 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
2015-06-25 22:07:53,406 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] 
regionserver.HStore: Starting compaction of 3 file(s) in info of 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
 into 
tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp,
 totalSize=343.4 M
2015-06-25 22:07:53,411 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] 
hfile.CacheConfig: 
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16, 
cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
prefetchOnOpen=false
2015-06-25 22:07:54,211 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Flushing 1/1 column families, memstore=128.23 MB
2015-06-25 22:07:54,639 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled 
WAL 
/hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250273034
 with entries=90, filesize=
{code}
{code}
2015-06-25 22:08:09,446 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
compactions.PressureAwareCompactionThroughputController: 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.#info
 average throughput is 19.80 MB/sec, slept 30 time(s) and total slept time is 
27694 ms. 0 active compactions remaining, total limit is 10.00 MB/sec
2015-06-25 22:08:09,520 INFO  
[regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] 
regionserver.HStore: Completed compaction of 4 (all) file(s) in info of 
TestTable,283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.
 into 

[jira] [Commented] (HBASE-8642) [Snapshot] List and delete snapshot by table

2015-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601096#comment-14601096
 ] 

Hadoop QA commented on HBASE-8642:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12741823/HBASE-8642-v1.patch
  against master branch at commit 2df3236a4eee48bf723213a7c4ff3d29c832c8cf.
  ATTACHMENT ID: 12741823

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+puts \nDelete the above #{count} snapshots (y/n)? \nNOTE: 
Snapshot(s) matching the given regular expressions and taken after the above 
list is displayed will be also deleted. unless count == 0
+puts No snapshots matched the table name regular expression 
#{tableNameregex.to_s} and the snapshot name regular expression 
#{snapshotNameRegex.to_s} if count == 0
+puts #{successfullyDeleted} snapshots successfully deleted. unless 
successfullyDeleted == 0

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.TestRegionRebalancing

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14566//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14566//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14566//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14566//console

This message is automatically generated.

 [Snapshot] List and delete snapshot by table
 

 Key: HBASE-8642
 URL: https://issues.apache.org/jira/browse/HBASE-8642
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2
Reporter: Julian Zhou
Assignee: Ashish Singhi
 Fix For: 2.0.0

 Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 
 8642-trunk-0.95-v2.patch, HBASE-8642-v1.patch, HBASE-8642.patch


 Support list and delete snapshots by table names.
 User scenario:
 A user wants to delete all the snapshots which were taken in January month 
 for a table 't' where snapshot names starts with 'Jan'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601076#comment-14601076
 ] 

Hudson commented on HBASE-13964:


FAILURE: Integrated in HBase-TRUNK #6601 (See 
[https://builds.apache.org/job/HBase-TRUNK/6601/])
HBASE-13964 Skip region normalization for tables under namespace quota (tedyu: 
rev edef3d64bce41fffbc5649ffa19b2cf80ce28d7a)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601212#comment-14601212
 ] 

Hudson commented on HBASE-13964:


SUCCESS: Integrated in HBase-1.2-IT #21 (See 
[https://builds.apache.org/job/HBase-1.2-IT/21/])
HBASE-13964 Skip region normalization for tables under namespace quota (tedyu: 
rev c680e2f40927407a0699b0b1ce687867bc2bb398)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601685#comment-14601685
 ] 

Lars Hofhansl edited comment on HBASE-13959 at 6/25/15 7:25 PM:


Nice find and patch. The 8 seems to come out of nowhere.
Do you have numbers for different numbers of threads?
Maybe default it to 1/2 of blocking store file count...?


was (Author: lhofhansl):
Nice find and patch. The 8 seems to come out of nowhere.
Do you have numbers for different numbers of threads?
Maybe default it to 1/2 of block store file count...?

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13950) Add a NoopProcedureStore for testing

2015-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601801#comment-14601801
 ] 

Hadoop QA commented on HBASE-13950:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12741896/HBASE-13950-v1.patch
  against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a.
  ATTACHMENT ID: 12741896

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14573//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14573//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14573//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14573//console

This message is automatically generated.

 Add a NoopProcedureStore for testing
 

 Key: HBASE-13950
 URL: https://issues.apache.org/jira/browse/HBASE-13950
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Trivial
 Fix For: 2.0.0, 1.2.0

 Attachments: HBASE-13950-v0-branch-1.patch, 
 HBASE-13950-v1-branch-1.patch, HBASE-13950-v1.patch


 Add a NoopProcedureStore and an helper in ProcedureTestingUtil to 
 submitAndWait() a procedure without having to do anything else.
 This is useful to avoid extra code like in case of 
 TestAssignmentManager.processServerShutdownHandler()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13336) Consistent rules for security meta table protections

2015-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601886#comment-14601886
 ] 

Hadoop QA commented on HBASE-13336:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12741807/HBASE-13336_v2.patch
  against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a.
  ATTACHMENT ID: 12741807

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14575//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14575//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14575//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14575//console

This message is automatically generated.

 Consistent rules for security meta table protections
 

 Key: HBASE-13336
 URL: https://issues.apache.org/jira/browse/HBASE-13336
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Srikanth Srungarapu
 Fix For: 2.0.0, 0.98.14, 1.3.0

 Attachments: HBASE-13336.patch, HBASE-13336_v2.patch


 The AccessController and VisibilityController do different things regarding 
 protecting their meta tables. The AC allows schema changes and disable/enable 
 if the user has permission. The VC unconditionally disallows all admin 
 actions. Generally, bad things will happen if these meta tables are damaged, 
 disabled, or dropped. The likely outcome is random frequent (or constant) 
 server side op failures with nasty stack traces. On the other hand some 
 things like column family and table attribute changes can have valid use 
 cases. We should have consistent and sensible rules for protecting security 
 meta tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600955#comment-14600955
 ] 

Ashish Singhi commented on HBASE-13964:
---

+1(non-binding)
Minor nits (Your call whether to not fix or fix it on commit):
bq. Skipping normalizing 
Can we explicitly tell in the log that we are skipping region normalizing ?

bq. quotaManager.getNamespaceQuotaManager() 
Can we extract it to a local variable ?

 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13942) HBase client stalls during region split when client threads hbase.hconnection.threads.max

2015-06-25 Thread Mukund Murrali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Murrali updated HBASE-13942:
---
Component/s: regionserver

 HBase client stalls during region split when client threads  
 hbase.hconnection.threads.max
 ---

 Key: HBASE-13942
 URL: https://issues.apache.org/jira/browse/HBASE-13942
 Project: HBase
  Issue Type: Bug
  Components: Client, regionserver
Reporter: Mukund Murrali

 Performing any operataion using a single hconnection with client threads  
 hbase.hconnection.threads.max causing the client to stall indefinetly during 
 first region split. All the hconnection threads in client side are waiting 
 with the following stack. 
 hconnection-0x648a83fd-shared--pool1-t8 daemon prio=10 
 tid=0x7f447c003800 nid=0x62ff waiting on condition [0x7f44c72f]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x0007d768bdf0 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
 at 
 org.apache.hadoop.hbase.util.BoundedCompletionService.take(BoundedCompletionService.java:74)
 at 
 org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:174)
 at 
 org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56)
 at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
 at 
 org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:145)
 at 
 org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200)
 at 
 org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1109)
 at 
 org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.findAllLocationsOrFail(AsyncProcess.java:916)
 at 
 org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:833)
 at 
 org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1156)
 at 
 org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveMultiAction(AsyncProcess.java:1296)
 at 
 org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:574)
 at 
 org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:716)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer

2015-06-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601830#comment-14601830
 ] 

Ted Yu commented on HBASE-13969:


+1 if tests pass.

 AuthenticationTokenSecretManager is never stopped in RPCServer
 --

 Key: HBASE-13969
 URL: https://issues.apache.org/jira/browse/HBASE-13969
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13969-V2.patch, HBASE-13969.patch


 AuthenticationTokenSecretManager is never stopped in RPCServer.
 {code}
 AuthenticationTokenSecretManager mgr = createSecretManager();
 if (mgr != null) {
   setSecretManager(mgr);
   mgr.start();
 }
 {code}
 It should be stopped during exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-06-25 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-13832:
-
Status: Patch Available  (was: Open)

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 1.1.0, 2.0.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HDFSPipeline.java


 when the data node  3, we got failure in WALProcedureStore#syncLoop() during 
 master start.  The failure prevents master to get started.  
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement some similar logic as FSHLog: if IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of immediate 
 abort, we could try to roll the log and see whether this resolve the issue; 
 if the new log cannot be created or more exception from rolling the log, we 
 then abort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-06-25 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-13832:
-
Priority: Critical  (was: Major)

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HDFSPipeline.java


 when the data node  3, we got failure in WALProcedureStore#syncLoop() during 
 master start.  The failure prevents master to get started.  
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement some similar logic as FSHLog: if IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of immediate 
 abort, we could try to roll the log and see whether this resolve the issue; 
 if the new log cannot be created or more exception from rolling the log, we 
 then abort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601819#comment-14601819
 ] 

Hudson commented on HBASE-13835:


FAILURE: Integrated in HBase-1.0 #975 (See 
[https://builds.apache.org/job/HBase-1.0/975/])
HBASE-13835 KeyValueHeap.current might be in heap when exception happens in 
pollRealKV. (zhouyingchao) (anoopsamjohn: rev 
59357ced27e5ac43c654500479502bd19f1b99ae)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java


 KeyValueHeap.current might be in heap when exception happens in pollRealKV
 --

 Key: HBASE-13835
 URL: https://issues.apache.org/jira/browse/HBASE-13835
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Reporter: zhouyingchao
Assignee: zhouyingchao
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, 
 HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, 
 HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, 
 HBASE-13835_branch-1.patch


 In a 0.94 hbase cluster, we found a NPE with following stack:
 {code}
 Exception in thread regionserver21600.leaseChecker 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191)
 at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
 at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
 at java.util.PriorityQueue.poll(PriorityQueue.java:523)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033)
 at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Before this NPE exception, there is an exception happens in pollRealKV, which 
 we think is the culprit of the NPE.
 {code}
 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
 java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for 
 reader reader=
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815)
 at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583)
 {code}
 Simply put, if there is an exception happens in pollRealKV( ), the 
 

[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601856#comment-14601856
 ] 

Hudson commented on HBASE-13835:


FAILURE: Integrated in HBase-1.3 #18 (See 
[https://builds.apache.org/job/HBase-1.3/18/])
HBASE-13835 KeyValueHeap.current might be in heap when exception happens in 
pollRealKV. (zhouyingchao) (anoopsamjohn: rev 
92b6622d97d21700a92a4061a7b05dfc7cf5a3df)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java


 KeyValueHeap.current might be in heap when exception happens in pollRealKV
 --

 Key: HBASE-13835
 URL: https://issues.apache.org/jira/browse/HBASE-13835
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Reporter: zhouyingchao
Assignee: zhouyingchao
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, 
 HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, 
 HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, 
 HBASE-13835_branch-1.patch


 In a 0.94 hbase cluster, we found a NPE with following stack:
 {code}
 Exception in thread regionserver21600.leaseChecker 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191)
 at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
 at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
 at java.util.PriorityQueue.poll(PriorityQueue.java:523)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033)
 at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Before this NPE exception, there is an exception happens in pollRealKV, which 
 we think is the culprit of the NPE.
 {code}
 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
 java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for 
 reader reader=
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815)
 at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583)
 {code}
 Simply put, if there is an exception happens in pollRealKV( ), the 
 

[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low

2015-06-25 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601855#comment-14601855
 ] 

Nick Dimiduk commented on HBASE-13832:
--

Failure of master to start is a problem. Bumping priority and setting some 
fix-version targets.

[~jinghe] are you able to reproduce? Can you take the attached patch for a spin?

 Procedure V2: master fail to start due to WALProcedureStore sync failures 
 when HDFS data nodes count is low
 ---

 Key: HBASE-13832
 URL: https://issues.apache.org/jira/browse/HBASE-13832
 Project: HBase
  Issue Type: Sub-task
  Components: master, proc-v2
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Stephen Yuan Jiang
Assignee: Matteo Bertozzi
Priority: Critical
 Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1

 Attachments: HBASE-13832-v0.patch, HDFSPipeline.java


 when the data node  3, we got failure in WALProcedureStore#syncLoop() during 
 master start.  The failure prevents master to get started.  
 {noformat}
 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] 
 wal.WALProcedureStore: Sync slot failed, abort.
 java.io.IOException: Failed to replace a bad datanode on the existing 
 pipeline due to no more good datanodes being available to try. (Nodes: 
 current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  
 DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
  
 original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK],
  DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-
 490ece56c772,DISK]]). The current failed datanode replacement policy is 
 DEFAULT, and a client may configure this via 
 'dfs.client.block.write.replace-datanode-on-failure.policy'  in its 
 configuration.
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951)
 {noformat}
 One proposal is to implement some similar logic as FSHLog: if IOException is 
 thrown during syncLoop in WALProcedureStore#start(), instead of immediate 
 abort, we could try to roll the log and see whether this resolve the issue; 
 if the new log cannot be created or more exception from rolling the log, we 
 then abort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer

2015-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601885#comment-14601885
 ] 

Hadoop QA commented on HBASE-13969:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12741921/HBASE-13969-V2.patch
  against master branch at commit d9ba4d5bb513624fef8787f04b18a57ac5eb5203.
  ATTACHMENT ID: 12741921

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14576//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14576//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14576//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14576//console

This message is automatically generated.

 AuthenticationTokenSecretManager is never stopped in RPCServer
 --

 Key: HBASE-13969
 URL: https://issues.apache.org/jira/browse/HBASE-13969
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13969-V2.patch, HBASE-13969.patch


 AuthenticationTokenSecretManager is never stopped in RPCServer.
 {code}
 AuthenticationTokenSecretManager mgr = createSecretManager();
 if (mgr != null) {
   setSecretManager(mgr);
   mgr.start();
 }
 {code}
 It should be stopped during exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13972) Hanging test finder should report killed test

2015-06-25 Thread Ted Yu (JIRA)
Ted Yu created HBASE-13972:
--

 Summary: Hanging test finder should report killed test
 Key: HBASE-13972
 URL: https://issues.apache.org/jira/browse/HBASE-13972
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor


I was looking at 
https://builds.apache.org/job/PreCommit-HBASE-Build/14576/console and found 
that findHangingTests.py didn't report any hanging / failing test.
{code}
Running org.apache.hadoop.hbase.procedure2.store.TestProcedureStoreTracker
Killed
{code}
It turns out that findHangingTests.py didn't distinguish the state for tests 
that were killed.
Patch coming shortly which allows printing of killed test(s)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601938#comment-14601938
 ] 

Hudson commented on HBASE-13835:


SUCCESS: Integrated in HBase-1.2-IT #22 (See 
[https://builds.apache.org/job/HBase-1.2-IT/22/])
HBASE-13835 KeyValueHeap.current might be in heap when exception happens in 
pollRealKV. (zhouyingchao) (anoopsamjohn: rev 
27fd3441f5a69a6bb795e28da37f5039545a41e7)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java


 KeyValueHeap.current might be in heap when exception happens in pollRealKV
 --

 Key: HBASE-13835
 URL: https://issues.apache.org/jira/browse/HBASE-13835
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Reporter: zhouyingchao
Assignee: zhouyingchao
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, 
 HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, 
 HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, 
 HBASE-13835_branch-1.patch


 In a 0.94 hbase cluster, we found a NPE with following stack:
 {code}
 Exception in thread regionserver21600.leaseChecker 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191)
 at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
 at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
 at java.util.PriorityQueue.poll(PriorityQueue.java:523)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033)
 at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Before this NPE exception, there is an exception happens in pollRealKV, which 
 we think is the culprit of the NPE.
 {code}
 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
 java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for 
 reader reader=
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815)
 at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583)
 {code}
 Simply put, if there is an exception happens in pollRealKV( ), the 
 

[jira] [Commented] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601952#comment-14601952
 ] 

Hudson commented on HBASE-13835:


SUCCESS: Integrated in HBase-1.3-IT #7 (See 
[https://builds.apache.org/job/HBase-1.3-IT/7/])
HBASE-13835 KeyValueHeap.current might be in heap when exception happens in 
pollRealKV. (zhouyingchao) (anoopsamjohn: rev 
92b6622d97d21700a92a4061a7b05dfc7cf5a3df)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java


 KeyValueHeap.current might be in heap when exception happens in pollRealKV
 --

 Key: HBASE-13835
 URL: https://issues.apache.org/jira/browse/HBASE-13835
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Reporter: zhouyingchao
Assignee: zhouyingchao
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, 
 HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, 
 HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, 
 HBASE-13835_branch-1.patch


 In a 0.94 hbase cluster, we found a NPE with following stack:
 {code}
 Exception in thread regionserver21600.leaseChecker 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191)
 at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
 at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
 at java.util.PriorityQueue.poll(PriorityQueue.java:523)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033)
 at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Before this NPE exception, there is an exception happens in pollRealKV, which 
 we think is the culprit of the NPE.
 {code}
 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
 java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for 
 reader reader=
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815)
 at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583)
 {code}
 Simply put, if there is an exception happens in pollRealKV( ), the 
 

[jira] [Commented] (HBASE-13670) [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired

2015-06-25 Thread Gururaj Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601259#comment-14601259
 ] 

Gururaj Shetty commented on HBASE-13670:


Hi [~anoop.hbase]
Incorporated you comments and attached the patch.
Thanks

 [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more 
 day after they are expired
 --

 Key: HBASE-13670
 URL: https://issues.apache.org/jira/browse/HBASE-13670
 Project: HBase
  Issue Type: Improvement
  Components: documentation, mob
Affects Versions: hbase-11339
Reporter: Y. SREENIVASULU REDDY
Assignee: Gururaj Shetty
 Fix For: hbase-11339

 Attachments: HBASE-13670.patch, HBASE-13670_01.patch


 Currently the ExpiredMobFileCleaner cleans the expired mob file according to 
 the date in the mob file name. The minimum unit of the date is day. So 
 ExpiredMobFileCleaner only cleans the expired mob files later for one more 
 day after they are expired. We need to document this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601382#comment-14601382
 ] 

Ted Yu commented on HBASE-13959:


Nice results in performance improvement.

Should the new constants be defined in SplitTransactionImpl.java ? They're only 
referenced by SplitTransactionImpl.java

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13967) add jdk profiles for jdk.tools dependency

2015-06-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601386#comment-14601386
 ] 

stack commented on HBASE-13967:
---

+1

 add jdk profiles for jdk.tools dependency
 -

 Key: HBASE-13967
 URL: https://issues.apache.org/jira/browse/HBASE-13967
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 1.2.0
Reporter: Sean Busbey
Assignee: Sean Busbey
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: HBASE-13967.1.patch


 Right now hbase-annotations uses jdk7 jdk.tools and exposes that to 
 downstream via hbase-client. We need it for building and using our custom 
 doclet, but we should be using a jdk.tools version based on our java version 
 (use jdk activated profiles to set it)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-13970:
--
Attachment: HBASE-13970.patch

A simple patch that add an increment AtomicInteger as the suffix of compaction 
name.

[~ram_krish] Could you please test whether this patch works? Thanks.
And also, thanks for the great digging.

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0

 Attachments: HBASE-13970.patch


 Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
 all data is flushed out to disk. When the first compaction is triggered we 
 get an NPE and this is very easy to reproduce
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check this on what is the reason behind it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601376#comment-14601376
 ] 

Anoop Sam John commented on HBASE-13970:


Ya this can work. 

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2

 Attachments: HBASE-13970.patch


 Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
 all data is flushed out to disk. When the first compaction is triggered we 
 get an NPE and this is very easy to reproduce
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check this on what is the reason behind it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13670) [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired

2015-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601281#comment-14601281
 ] 

Hadoop QA commented on HBASE-13670:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12741858/HBASE-13670_01.patch
  against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a.
  ATTACHMENT ID: 12741858

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14569//console

This message is automatically generated.

 [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more 
 day after they are expired
 --

 Key: HBASE-13670
 URL: https://issues.apache.org/jira/browse/HBASE-13670
 Project: HBase
  Issue Type: Improvement
  Components: documentation, mob
Affects Versions: hbase-11339
Reporter: Y. SREENIVASULU REDDY
Assignee: Gururaj Shetty
 Fix For: hbase-11339

 Attachments: HBASE-13670.patch, HBASE-13670_01.patch


 Currently the ExpiredMobFileCleaner cleans the expired mob file according to 
 the date in the mob file name. The minimum unit of the date is day. So 
 ExpiredMobFileCleaner only cleans the expired mob files later for one more 
 day after they are expired. We need to document this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601295#comment-14601295
 ] 

Duo Zhang commented on HBASE-13970:
---

Oh, this should be a bug on all branches which contains HBASE-8329. I used to 
assume that compactions can not be executed parallel on the same store, so the 
compationName only contains regionName and storeName. I think we could add a 
counter in the name to avoid conflict.

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0


 Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
 all data is flushed out to disk. When the first compaction is triggered we 
 get an NPE and this is very easy to reproduce
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check this on what is the reason behind it. 



--
This message was sent by 

[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota

2015-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601239#comment-14601239
 ] 

Hudson commented on HBASE-13964:


SUCCESS: Integrated in HBase-1.3-IT #6 (See 
[https://builds.apache.org/job/HBase-1.3-IT/6/])
HBASE-13964 Skip region normalization for tables under namespace quota (tedyu: 
rev ed72fa212875814f7e44eebaf7789710ec670c6a)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Skip region normalization for tables under namespace quota
 --

 Key: HBASE-13964
 URL: https://issues.apache.org/jira/browse/HBASE-13964
 Project: HBase
  Issue Type: Task
  Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
 Fix For: 2.0.0, 1.2.0, 1.3.0

 Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 
 13964-v1.txt


 As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to 
 normalize regions of tables under namespace control. What was proposed is to 
 disable normalization of such tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13670) [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired

2015-06-25 Thread Gururaj Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gururaj Shetty updated HBASE-13670:
---
Attachment: HBASE-13670_01.patch

 [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more 
 day after they are expired
 --

 Key: HBASE-13670
 URL: https://issues.apache.org/jira/browse/HBASE-13670
 Project: HBase
  Issue Type: Improvement
  Components: documentation, mob
Affects Versions: hbase-11339
Reporter: Y. SREENIVASULU REDDY
Assignee: Gururaj Shetty
 Fix For: hbase-11339

 Attachments: HBASE-13670.patch, HBASE-13670_01.patch


 Currently the ExpiredMobFileCleaner cleans the expired mob file according to 
 the date in the mob file name. The minimum unit of the date is day. So 
 ExpiredMobFileCleaner only cleans the expired mob files later for one more 
 day after they are expired. We need to document this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-13970:
--
Fix Version/s: 1.1.2
   1.2.0
   0.98.14
Affects Version/s: 1.1.1
   1.2.0
   0.98.13
   Status: Patch Available  (was: Open)

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13, 2.0.0, 1.2.0, 1.1.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2

 Attachments: HBASE-13970.patch


 Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
 all data is flushed out to disk. When the first compaction is triggered we 
 get an NPE and this is very easy to reproduce
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check this on what is the reason behind it. 



--
This message was sent by Atlassian JIRA

[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601451#comment-14601451
 ] 

ramkrishna.s.vasudevan commented on HBASE-13970:


I will test this out. But this should work.  Can we create this per counter per 
store?

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2

 Attachments: HBASE-13970.patch


 Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
 all data is flushed out to disk. When the first compaction is triggered we 
 get an NPE and this is very easy to reproduce
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check this on what is the reason behind it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-13897) OOM occurs when Import importing a row that including too much KeyValue

2015-06-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-13897:
--

Assignee: Liu Junhong  (was: Ted Yu)

 OOM occurs when Import  importing a row that including too much KeyValue
 

 Key: HBASE-13897
 URL: https://issues.apache.org/jira/browse/HBASE-13897
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Liu Junhong
Assignee: Liu Junhong
 Fix For: 0.98.14

 Attachments: HBASE-13897-0.98.patch


 When importing a row with too many KeyValues (may have too many columns or 
 versions),KeyValueReducer will incur OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13939) Make HFileReaderImpl.getFirstKeyInBlock() to return a Cell

2015-06-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601562#comment-14601562
 ] 

ramkrishna.s.vasudevan commented on HBASE-13939:


Ping!!!

 Make HFileReaderImpl.getFirstKeyInBlock() to return a Cell
 --

 Key: HBASE-13939
 URL: https://issues.apache.org/jira/browse/HBASE-13939
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 2.0.0, 1.1.2

 Attachments: HBASE-13939.patch, HBASE-13939_1.patch, 
 HBASE-13939_2.patch, HBASE-13939_3.patch, HBASE-13939_3.patch, 
 HBASE-13939_branch-1.1.patch


 The getFirstKeyInBlock() in HFileReaderImpl is returning a BB. It is getting 
 used in seekBefore cases.  Because we return a BB we create a KeyOnlyKV once 
 for comparison
 {code}
   if (reader.getComparator()
   .compareKeyIgnoresMvcc(
   new KeyValue.KeyOnlyKeyValue(firstKey.array(), 
 firstKey.arrayOffset(),
   firstKey.limit()), key) = 0) {
 long previousBlockOffset = seekToBlock.getPrevBlockOffset();
 // The key we are interested in
 if (previousBlockOffset == -1) {
   // we have a 'problem', the key we want is the first of the file.
   return false;
 }
 
 {code}
 And if the compare fails we again create another KeyOnlyKv 
 {code}
   Cell firstKeyInCurrentBlock = new 
 KeyValue.KeyOnlyKeyValue(Bytes.getBytes(firstKey));
   loadBlockAndSeekToKey(seekToBlock, firstKeyInCurrentBlock, true, key, 
 true);
 {code}
 So one object will be enough and that can be returned by getFirstKeyInBlock. 
 Also will be useful when we go with Buffered backed server cell to change in 
 one place. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-13702:

Attachment: HBASE-13702-v5.patch

[~tedyu] you're right. fixed the issues.

 ImportTsv: Add dry-run functionality and log bad rows
 -

 Key: HBASE-13702
 URL: https://issues.apache.org/jira/browse/HBASE-13702
 Project: HBase
  Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
 Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, 
 HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch


 ImportTSV job skips bad records by default (keeps a count though). 
 -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is 
 encountered. 
 To be easily able to determine which rows are corrupted in an input, rather 
 than failing on one row at a time seems like a good feature to have.
 Moreover, there should be 'dry-run' functionality in such kinds of tools, 
 which can essentially does a quick run of tool without making any changes but 
 reporting any errors/warnings and success/failure.
 To identify corrupted rows, simply logging them should be enough. In worst 
 case, all rows will be logged and size of logs will be same as input size, 
 which seems fine. However, user might have to do some work figuring out where 
 the logs. Is there some link we can show to the user when the tool starts 
 which can help them with that?
 For the dry run, we can simply use if-else to skip over writing out KVs, and 
 any other mutations, if present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Apekshit Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apekshit Sharma updated HBASE-13702:

Status: Patch Available  (was: Open)

 ImportTsv: Add dry-run functionality and log bad rows
 -

 Key: HBASE-13702
 URL: https://issues.apache.org/jira/browse/HBASE-13702
 Project: HBase
  Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
 Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, 
 HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch


 ImportTSV job skips bad records by default (keeps a count though). 
 -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is 
 encountered. 
 To be easily able to determine which rows are corrupted in an input, rather 
 than failing on one row at a time seems like a good feature to have.
 Moreover, there should be 'dry-run' functionality in such kinds of tools, 
 which can essentially does a quick run of tool without making any changes but 
 reporting any errors/warnings and success/failure.
 To identify corrupted rows, simply logging them should be enough. In worst 
 case, all rows will be logged and size of logs will be same as input size, 
 which seems fine. However, user might have to do some work figuring out where 
 the logs. Is there some link we can show to the user when the tool starts 
 which can help them with that?
 For the dry run, we can simply use if-else to skip over writing out KVs, and 
 any other mutations, if present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13950) Add a NoopProcedureStore for testing

2015-06-25 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-13950:

Attachment: HBASE-13950-v1.patch

 Add a NoopProcedureStore for testing
 

 Key: HBASE-13950
 URL: https://issues.apache.org/jira/browse/HBASE-13950
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Trivial
 Fix For: 2.0.0, 1.2.0

 Attachments: HBASE-13950-v0-branch-1.patch, 
 HBASE-13950-v1-branch-1.patch, HBASE-13950-v1.patch


 Add a NoopProcedureStore and an helper in ProcedureTestingUtil to 
 submitAndWait() a procedure without having to do anything else.
 This is useful to avoid extra code like in case of 
 TestAssignmentManager.processServerShutdownHandler()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13950) Add a NoopProcedureStore for testing

2015-06-25 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-13950:

Attachment: (was: HBASE-13950-v1.patch)

 Add a NoopProcedureStore for testing
 

 Key: HBASE-13950
 URL: https://issues.apache.org/jira/browse/HBASE-13950
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0, 1.2.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Trivial
 Fix For: 2.0.0, 1.2.0

 Attachments: HBASE-13950-v0-branch-1.patch, 
 HBASE-13950-v1-branch-1.patch, HBASE-13950-v1.patch


 Add a NoopProcedureStore and an helper in ProcedureTestingUtil to 
 submitAndWait() a procedure without having to do anything else.
 This is useful to avoid extra code like in case of 
 TestAssignmentManager.processServerShutdownHandler()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13897) OOM occurs when Import importing a row that including too much KeyValue

2015-06-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601542#comment-14601542
 ] 

Ted Yu commented on HBASE-13897:


Any chance of an update, Junhong ?

 OOM occurs when Import  importing a row that including too much KeyValue
 

 Key: HBASE-13897
 URL: https://issues.apache.org/jira/browse/HBASE-13897
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Liu Junhong
Assignee: Ted Yu
 Fix For: 0.98.14

 Attachments: HBASE-13897-0.98.patch


 When importing a row with too many KeyValues (may have too many columns or 
 versions),KeyValueReducer will incur OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13670) [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more day after they are expired

2015-06-25 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601556#comment-14601556
 ] 

Anoop Sam John commented on HBASE-13670:


Seems u made this patch wrongly.  On top of the older patch u made it..  U have 
to freshly make it.

 [HBase MOB] ExpiredMobFileCleaner tool deletes mob files later for one more 
 day after they are expired
 --

 Key: HBASE-13670
 URL: https://issues.apache.org/jira/browse/HBASE-13670
 Project: HBase
  Issue Type: Improvement
  Components: documentation, mob
Affects Versions: hbase-11339
Reporter: Y. SREENIVASULU REDDY
Assignee: Gururaj Shetty
 Fix For: hbase-11339

 Attachments: HBASE-13670.patch, HBASE-13670_01.patch


 Currently the ExpiredMobFileCleaner cleans the expired mob file according to 
 the date in the mob file name. The minimum unit of the date is day. So 
 ExpiredMobFileCleaner only cleans the expired mob files later for one more 
 day after they are expired. We need to document this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13702:
---
Status: Open  (was: Patch Available)

 ImportTsv: Add dry-run functionality and log bad rows
 -

 Key: HBASE-13702
 URL: https://issues.apache.org/jira/browse/HBASE-13702
 Project: HBase
  Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
 Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, 
 HBASE-13702-v4.patch, HBASE-13702.patch


 ImportTSV job skips bad records by default (keeps a count though). 
 -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is 
 encountered. 
 To be easily able to determine which rows are corrupted in an input, rather 
 than failing on one row at a time seems like a good feature to have.
 Moreover, there should be 'dry-run' functionality in such kinds of tools, 
 which can essentially does a quick run of tool without making any changes but 
 reporting any errors/warnings and success/failure.
 To identify corrupted rows, simply logging them should be enough. In worst 
 case, all rows will be logged and size of logs will be same as input size, 
 which seems fine. However, user might have to do some work figuring out where 
 the logs. Is there some link we can show to the user when the tool starts 
 which can help them with that?
 For the dry run, we can simply use if-else to skip over writing out KVs, and 
 any other mutations, if present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601491#comment-14601491
 ] 

Anoop Sam John commented on HBASE-13970:


Per store counter means we will have to keep it in Map or so which is adding 
overhead. The integer even if it overflows over the run, is fine(?) Will go to 
-ve integers.  We need only unique Strings.  So this should be ok IMO.

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2

 Attachments: HBASE-13970.patch


 Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
 all data is flushed out to disk. When the first compaction is triggered we 
 get an NPE and this is very easy to reproduce
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check this on what is the 

[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601533#comment-14601533
 ] 

ramkrishna.s.vasudevan commented on HBASE-13970:


What I thought was just do store.incrementandGetCounter()? That will return 
that atomic counter. Did not think as a map.  Main concern was that overflow 
part only.  (negative is fine then ok).

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2

 Attachments: HBASE-13970.patch


 Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
 all data is flushed out to disk. When the first compaction is triggered we 
 get an NPE and this is very easy to reproduce
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-06-25 21:33:46,745 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, 
 hasBloomFilter=true, into tmp file 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c
 2015-06-25 21:33:46,772 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HStore: Added 
 hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c,
  entries=68116, sequenceid=1534, filesize=68.7 M
 2015-06-25 21:33:46,773 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, 
 currentsize=0 B/0 for region 
 TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.
  in 723ms, sequenceid=1534, compaction requested=true
 2015-06-25 21:33:46,780 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/reached/TestTable
 2015-06-25 21:33:46,790 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received created 
 event:/hbase/flush-table-proc/abort/TestTable
 2015-06-25 21:33:46,791 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 2015-06-25 21:33:46,803 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,818 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure abort children changed 
 event: /hbase/flush-table-proc/abort
 {code}
 Will check this on what is the reason behind it. 




[jira] [Commented] (HBASE-13970) NPE during compaction in trunk

2015-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601661#comment-14601661
 ] 

Hadoop QA commented on HBASE-13970:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12741873/HBASE-13970.patch
  against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a.
  ATTACHMENT ID: 12741873

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14571//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14571//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14571//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14571//console

This message is automatically generated.

 NPE during compaction in trunk
 --

 Key: HBASE-13970
 URL: https://issues.apache.org/jira/browse/HBASE-13970
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2

 Attachments: HBASE-13970.patch


 Updated the trunk.. Loaded the table with PE tool.  Trigger a flush to ensure 
 all data is flushed out to disk. When the first compaction is triggered we 
 get an NPE and this is very easy to reproduce
 {code}
 015-06-25 21:33:46,041 INFO  [main-EventThread] 
 procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
 event: /hbase/flush-table-proc/acquired
 2015-06-25 21:33:46,051 INFO  
 [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] 
 regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB
 2015-06-25 21:33:46,159 ERROR 
 [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] 
 regionserver.CompactSplitThread: Compaction failed Request = 
 regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4.,
  storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), 
 priority=3, time=7536968291719985
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306)
 at 
 org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106)
 at 
 org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112)
 at 
 org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792)
 at 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

[jira] [Commented] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer

2015-06-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601712#comment-14601712
 ] 

Ted Yu commented on HBASE-13969:


lgtm
Minor comment:
{code}
2238if (authTokenSecretMgr != null) {
2239  authTokenSecretMgr.stop();
2240}
{code}
Set authTokenSecretMgr to null in the above if block after calling stop().

 AuthenticationTokenSecretManager is never stopped in RPCServer
 --

 Key: HBASE-13969
 URL: https://issues.apache.org/jira/browse/HBASE-13969
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13969.patch


 AuthenticationTokenSecretManager is never stopped in RPCServer.
 {code}
 AuthenticationTokenSecretManager mgr = createSecretManager();
 if (mgr != null) {
   setSecretManager(mgr);
   mgr.start();
 }
 {code}
 It should be stopped during exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.

2015-06-25 Thread Abhilash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhilash updated HBASE-13971:
-
Description: 
One region server stuck while flushing(possible deadlock). Its trying to flush 
two regions since last 6 hours (see the screenshot).
Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper 
jobs and 100 back references. ~37 Million writes on each regionserver till now 
but no writes happening on any other regionserver from past 6 hours  and their 
memstore size is zero(I dont know if this is related). But this particular 
regionserver has memstore size of 9GBs from past 6 hours.

Relevant snaps from debug dump:
Tasks:
===
Task: Flushing 
IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd.
Status: RUNNING:Preparing to flush by snapshotting stores in 
8e2d075f94ce7699f416ec4ced9873cd
Running for 22034s

Task: Flushing 
IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390.
Status: RUNNING:Preparing to flush by snapshotting stores in 
9f8d0e01a40405b835bf6e5a22a86390
Running for 22033s

Executors:
===
...
Thread 139 (MemStoreFlusher.1):
  State: WAITING
  Blocked count: 139711
  Waited count: 239212
  Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a
  Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)

java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)

java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)

org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)

org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
java.lang.Thread.run(Thread.java:745)
Thread 137 (MemStoreFlusher.0):
  State: WAITING
  Blocked count: 138931
  Waited count: 237448
  Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76
  Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)

java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)

java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)

org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)

org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
java.lang.Thread.run(Thread.java:745)


  was:
One region server stuck while flushing(possible deadlock). Its trying to flush 
two regions since last 6 hours (see the screenshot).
Caused while running 

[jira] [Updated] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer

2015-06-25 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-13969:
-
Attachment: HBASE-13969-V2.patch

Thanks [~tedyu] for reviewing the patch. Added the modified patch.

 AuthenticationTokenSecretManager is never stopped in RPCServer
 --

 Key: HBASE-13969
 URL: https://issues.apache.org/jira/browse/HBASE-13969
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13969-V2.patch, HBASE-13969.patch


 AuthenticationTokenSecretManager is never stopped in RPCServer.
 {code}
 AuthenticationTokenSecretManager mgr = createSecretManager();
 if (mgr != null) {
   setSecretManager(mgr);
   mgr.start();
 }
 {code}
 It should be stopped during exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.

2015-06-25 Thread Abhilash (JIRA)
Abhilash created HBASE-13971:


 Summary: Flushes stuck since 6 hours on a regionserver.
 Key: HBASE-13971
 URL: https://issues.apache.org/jira/browse/HBASE-13971
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.3.0
 Environment: Caused while running IntegrationTestLoadAndVerify for 20 
M rows on cluster with 32 region servers each with max heap size of 24GBs.
Reporter: Abhilash


One region server stuck while flushing(possible deadlock). Its trying to flush 
two regions since last 6 hours (see the screenshot).
Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper 
jobs and 100 back references. ~37 Million writes on each regionserver till now 
but no writes happening on any other regionserver from past 6 hours  and their 
memstore size is zero(I dont know if this is related). But this particular 
regionserver has memstore size of 9GBs from past 6 hours.

Relevant snaps from debug dump:
Tasks:
===
Task: Flushing 
IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd.
Status: RUNNING:Preparing to flush by snapshotting stores in 
8e2d075f94ce7699f416ec4ced9873cd
Running for 22034s

Task: Flushing 
IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390.
Status: RUNNING:Preparing to flush by snapshotting stores in 
9f8d0e01a40405b835bf6e5a22a86390
Running for 22033s

Thread 139 (MemStoreFlusher.1):
  State: WAITING
  Blocked count: 139711
  Waited count: 239212
  Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a
  Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)

java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)

java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)

org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)

org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
java.lang.Thread.run(Thread.java:745)
Thread 137 (MemStoreFlusher.0):
  State: WAITING
  Blocked count: 138931
  Waited count: 237448
  Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76
  Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)

java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)

java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)

org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)

org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)


[jira] [Commented] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.

2015-06-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601728#comment-14601728
 ] 

Ted Yu commented on HBASE-13971:


Can you attach the complete jstack for the region server ?

Region server log would also be helpful.

 Flushes stuck since 6 hours on a regionserver.
 --

 Key: HBASE-13971
 URL: https://issues.apache.org/jira/browse/HBASE-13971
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.3.0
 Environment: Caused while running IntegrationTestLoadAndVerify for 20 
 M rows on cluster with 32 region servers each with max heap size of 24GBs.
Reporter: Abhilash
Priority: Critical
 Attachments: screenshot-1.png


 One region server stuck while flushing(possible deadlock). Its trying to 
 flush two regions since last 6 hours (see the screenshot).
 Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 
 mapper jobs and 100 back references. ~37 Million writes on each regionserver 
 till now but no writes happening on any regionserver from past 6 hours  and 
 their memstore size is zero(I dont know if this is related). But this 
 particular regionserver has memstore size of 9GBs from past 6 hours.
 Relevant snaps from debug dump:
 Tasks:
 ===
 Task: Flushing 
 IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd.
 Status: RUNNING:Preparing to flush by snapshotting stores in 
 8e2d075f94ce7699f416ec4ced9873cd
 Running for 22034s
 Task: Flushing 
 IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390.
 Status: RUNNING:Preparing to flush by snapshotting stores in 
 9f8d0e01a40405b835bf6e5a22a86390
 Running for 22033s
 Executors:
 ===
 ...
 Thread 139 (MemStoreFlusher.1):
   State: WAITING
   Blocked count: 139711
   Waited count: 239212
   Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
 org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
 
 org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
 org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
 java.lang.Thread.run(Thread.java:745)
 Thread 137 (MemStoreFlusher.0):
   State: WAITING
   Blocked count: 138931
   Waited count: 237448
   Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
 org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
 
 org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
 
 

[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.

2015-06-25 Thread Abhilash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhilash updated HBASE-13971:
-
Description: 
One region server stuck while flushing(possible deadlock). Its trying to flush 
two regions since last 6 hours (see the screenshot).
Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper 
jobs and 100 back references. ~37 Million writes on each regionserver till now 
but no writes happening on any regionserver from past 6 hours  and their 
memstore size is zero(I dont know if this is related). But this particular 
regionserver has memstore size of 9GBs from past 6 hours.

Relevant snaps from debug dump:
Tasks:
===
Task: Flushing 
IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd.
Status: RUNNING:Preparing to flush by snapshotting stores in 
8e2d075f94ce7699f416ec4ced9873cd
Running for 22034s

Task: Flushing 
IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390.
Status: RUNNING:Preparing to flush by snapshotting stores in 
9f8d0e01a40405b835bf6e5a22a86390
Running for 22033s

Executors:
===
...
Thread 139 (MemStoreFlusher.1):
  State: WAITING
  Blocked count: 139711
  Waited count: 239212
  Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a
  Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)

java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)

java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)

org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)

org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
java.lang.Thread.run(Thread.java:745)
Thread 137 (MemStoreFlusher.0):
  State: WAITING
  Blocked count: 138931
  Waited count: 237448
  Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76
  Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)

java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)

java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)

org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)

org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)

org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)

org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
java.lang.Thread.run(Thread.java:745)


  was:
One region server stuck while flushing(possible deadlock). Its trying to flush 
two regions since last 6 hours (see the screenshot).
Caused while running IntegrationTestLoadAndVerify 

[jira] [Updated] (HBASE-13336) Consistent rules for security meta table protections

2015-06-25 Thread Srikanth Srungarapu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Srungarapu updated HBASE-13336:

Status: Patch Available  (was: In Progress)

 Consistent rules for security meta table protections
 

 Key: HBASE-13336
 URL: https://issues.apache.org/jira/browse/HBASE-13336
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Srikanth Srungarapu
 Fix For: 2.0.0, 0.98.14, 1.3.0

 Attachments: HBASE-13336.patch, HBASE-13336_v2.patch


 The AccessController and VisibilityController do different things regarding 
 protecting their meta tables. The AC allows schema changes and disable/enable 
 if the user has permission. The VC unconditionally disallows all admin 
 actions. Generally, bad things will happen if these meta tables are damaged, 
 disabled, or dropped. The likely outcome is random frequent (or constant) 
 server side op failures with nasty stack traces. On the other hand some 
 things like column family and table attribute changes can have valid use 
 cases. We should have consistent and sensible rules for protecting security 
 meta tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601647#comment-14601647
 ] 

Ted Yu commented on HBASE-13702:


+1 if tests pass.

 ImportTsv: Add dry-run functionality and log bad rows
 -

 Key: HBASE-13702
 URL: https://issues.apache.org/jira/browse/HBASE-13702
 Project: HBase
  Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, 
 HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch


 ImportTSV job skips bad records by default (keeps a count though). 
 -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is 
 encountered. 
 To be easily able to determine which rows are corrupted in an input, rather 
 than failing on one row at a time seems like a good feature to have.
 Moreover, there should be 'dry-run' functionality in such kinds of tools, 
 which can essentially does a quick run of tool without making any changes but 
 reporting any errors/warnings and success/failure.
 To identify corrupted rows, simply logging them should be enough. In worst 
 case, all rows will be logged and size of logs will be same as input size, 
 which seems fine. However, user might have to do some work figuring out where 
 the logs. Is there some link we can show to the user when the tool starts 
 which can help them with that?
 For the dry run, we can simply use if-else to skip over writing out KVs, and 
 any other mutations, if present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601685#comment-14601685
 ] 

Lars Hofhansl commented on HBASE-13959:
---

Nice find and patch. The 8 seems to come out of nowhere.
Do you have numbers for different numbers of threads?
Maybe default it to 1/2 of block store file count...?

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.

2015-06-25 Thread Abhilash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhilash updated HBASE-13971:
-
Priority: Critical  (was: Major)

 Flushes stuck since 6 hours on a regionserver.
 --

 Key: HBASE-13971
 URL: https://issues.apache.org/jira/browse/HBASE-13971
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.3.0
 Environment: Caused while running IntegrationTestLoadAndVerify for 20 
 M rows on cluster with 32 region servers each with max heap size of 24GBs.
Reporter: Abhilash
Priority: Critical

 One region server stuck while flushing(possible deadlock). Its trying to 
 flush two regions since last 6 hours (see the screenshot).
 Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 
 mapper jobs and 100 back references. ~37 Million writes on each regionserver 
 till now but no writes happening on any other regionserver from past 6 hours  
 and their memstore size is zero(I dont know if this is related). But this 
 particular regionserver has memstore size of 9GBs from past 6 hours.
 Relevant snaps from debug dump:
 Tasks:
 ===
 Task: Flushing 
 IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd.
 Status: RUNNING:Preparing to flush by snapshotting stores in 
 8e2d075f94ce7699f416ec4ced9873cd
 Running for 22034s
 Task: Flushing 
 IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390.
 Status: RUNNING:Preparing to flush by snapshotting stores in 
 9f8d0e01a40405b835bf6e5a22a86390
 Running for 22033s
 Executors:
 ===
 ...
 Thread 139 (MemStoreFlusher.1):
   State: WAITING
   Blocked count: 139711
   Waited count: 239212
   Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
 org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
 
 org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
 org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
 java.lang.Thread.run(Thread.java:745)
 Thread 137 (MemStoreFlusher.0):
   State: WAITING
   Blocked count: 138931
   Waited count: 237448
   Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
 org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
 
 org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
 

[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.

2015-06-25 Thread Abhilash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhilash updated HBASE-13971:
-
Attachment: rsDebugDump.txt

 Flushes stuck since 6 hours on a regionserver.
 --

 Key: HBASE-13971
 URL: https://issues.apache.org/jira/browse/HBASE-13971
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.3.0
 Environment: Caused while running IntegrationTestLoadAndVerify for 20 
 M rows on cluster with 32 region servers each with max heap size of 24GBs.
Reporter: Abhilash
Priority: Critical
 Attachments: rsDebugDump.txt, screenshot-1.png


 One region server stuck while flushing(possible deadlock). Its trying to 
 flush two regions since last 6 hours (see the screenshot).
 Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 
 mapper jobs and 100 back references. ~37 Million writes on each regionserver 
 till now but no writes happening on any regionserver from past 6 hours  and 
 their memstore size is zero(I dont know if this is related). But this 
 particular regionserver has memstore size of 9GBs from past 6 hours.
 Relevant snaps from debug dump:
 Tasks:
 ===
 Task: Flushing 
 IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd.
 Status: RUNNING:Preparing to flush by snapshotting stores in 
 8e2d075f94ce7699f416ec4ced9873cd
 Running for 22034s
 Task: Flushing 
 IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390.
 Status: RUNNING:Preparing to flush by snapshotting stores in 
 9f8d0e01a40405b835bf6e5a22a86390
 Running for 22033s
 Executors:
 ===
 ...
 Thread 139 (MemStoreFlusher.1):
   State: WAITING
   Blocked count: 139711
   Waited count: 239212
   Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
 org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
 
 org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
 org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
 java.lang.Thread.run(Thread.java:745)
 Thread 137 (MemStoreFlusher.0):
   State: WAITING
   Blocked count: 138931
   Waited count: 237448
   Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
 org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
 
 org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
 

[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13702:
---
Fix Version/s: 1.3.0
   2.0.0

 ImportTsv: Add dry-run functionality and log bad rows
 -

 Key: HBASE-13702
 URL: https://issues.apache.org/jira/browse/HBASE-13702
 Project: HBase
  Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, 
 HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch


 ImportTSV job skips bad records by default (keeps a count though). 
 -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is 
 encountered. 
 To be easily able to determine which rows are corrupted in an input, rather 
 than failing on one row at a time seems like a good feature to have.
 Moreover, there should be 'dry-run' functionality in such kinds of tools, 
 which can essentially does a quick run of tool without making any changes but 
 reporting any errors/warnings and success/failure.
 To identify corrupted rows, simply logging them should be enough. In worst 
 case, all rows will be logged and size of logs will be same as input size, 
 which seems fine. However, user might have to do some work figuring out where 
 the logs. Is there some link we can show to the user when the tool starts 
 which can help them with that?
 For the dry run, we can simply use if-else to skip over writing out KVs, and 
 any other mutations, if present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

2015-06-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601667#comment-14601667
 ] 

Ted Yu commented on HBASE-13702:


There're several hunks in 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java 
which don't apply on branch-1

Mind providing patch for branch-1 ?

Thanks

 ImportTsv: Add dry-run functionality and log bad rows
 -

 Key: HBASE-13702
 URL: https://issues.apache.org/jira/browse/HBASE-13702
 Project: HBase
  Issue Type: New Feature
Reporter: Apekshit Sharma
Assignee: Apekshit Sharma
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, 
 HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch


 ImportTSV job skips bad records by default (keeps a count though). 
 -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is 
 encountered. 
 To be easily able to determine which rows are corrupted in an input, rather 
 than failing on one row at a time seems like a good feature to have.
 Moreover, there should be 'dry-run' functionality in such kinds of tools, 
 which can essentially does a quick run of tool without making any changes but 
 reporting any errors/warnings and success/failure.
 To identify corrupted rows, simply logging them should be enough. In worst 
 case, all rows will be logged and size of logs will be same as input size, 
 which seems fine. However, user might have to do some work figuring out where 
 the logs. Is there some link we can show to the user when the tool starts 
 which can help them with that?
 For the dry run, we can simply use if-else to skip over writing out KVs, and 
 any other mutations, if present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13969) AuthenticationTokenSecretManager is never stopped in RPCServer

2015-06-25 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-13969:
-
Fix Version/s: 1.3.0
   1.1.2
   1.2.0
   1.0.2
   0.98.14
   2.0.0
Affects Version/s: 0.98.13
   Status: Patch Available  (was: Open)

 AuthenticationTokenSecretManager is never stopped in RPCServer
 --

 Key: HBASE-13969
 URL: https://issues.apache.org/jira/browse/HBASE-13969
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.13
Reporter: Pankaj Kumar
Assignee: Pankaj Kumar
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13969.patch


 AuthenticationTokenSecretManager is never stopped in RPCServer.
 {code}
 AuthenticationTokenSecretManager mgr = createSecretManager();
 if (mgr != null) {
   setSecretManager(mgr);
   mgr.start();
 }
 {code}
 It should be stopped during exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13835) KeyValueHeap.current might be in heap when exception happens in pollRealKV

2015-06-25 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-13835:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.3.0
   1.1.2
   1.2.0
   1.0.2
   0.98.14
   2.0.0
   Status: Resolved  (was: Patch Available)

Pushed to 0.98+ branches. Thanks for the patch [~sinago]

 KeyValueHeap.current might be in heap when exception happens in pollRealKV
 --

 Key: HBASE-13835
 URL: https://issues.apache.org/jira/browse/HBASE-13835
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Reporter: zhouyingchao
Assignee: zhouyingchao
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13835-001.patch, HBASE-13835-002.patch, 
 HBASE-13835-branch1-001.patch, HBASE-13835_0.98.patch, 
 HBASE-13835_branch-1.0.patch, HBASE-13835_branch-1.patch, 
 HBASE-13835_branch-1.patch


 In a 0.94 hbase cluster, we found a NPE with following stack:
 {code}
 Exception in thread regionserver21600.leaseChecker 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1530)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:225)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:201)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:191)
 at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
 at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
 at java.util.PriorityQueue.poll(PriorityQueue.java:523)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:241)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:355)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:237)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:4302)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:3033)
 at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:119)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Before this NPE exception, there is an exception happens in pollRealKV, which 
 we think is the culprit of the NPE.
 {code}
 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
 java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for 
 reader reader=
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:371)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:366)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:116)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:455)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:154)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:4124)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4196)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4067)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4057)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(HRegionServer.java:2898)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2833)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2815)
 at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:337)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1583)
 {code}
 Simply put, if there is an exception happens in pollRealKV( ), the 
 KeyValueHeap.current might be in heap. Later on, when KeyValueHeap.close( ) 
 is called, the current would be closed firstly. However, since it might still 
 be in the heap, it would either be closed again or its peek() (which is null 
 after it is closed) is 

[jira] [Issue Comment Deleted] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-13959:
--
Comment: was deleted

(was: Can't there be other regions splits that could go on in parallel? 
[~apurtell] have suggestions on using shared executor pools with larger scope, 
which would scale and perform better than sizing this thread pool proportional 
to some metric related to the current region.)

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-13959:
--
Comment: was deleted

(was: I am able to apply patch (created with `git format-patch`) with -p1, not 
sure what is wrong. I will attach again generated with `git diff`.)

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
 Fix For: 0.98.14

 Attachments: HBASE-13959-2.patch, HBASE-13959-3.patch, 
 HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.

2015-06-25 Thread Abhilash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhilash updated HBASE-13971:
-
Attachment: screenshot-1.png

 Flushes stuck since 6 hours on a regionserver.
 --

 Key: HBASE-13971
 URL: https://issues.apache.org/jira/browse/HBASE-13971
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 1.3.0
 Environment: Caused while running IntegrationTestLoadAndVerify for 20 
 M rows on cluster with 32 region servers each with max heap size of 24GBs.
Reporter: Abhilash
Priority: Critical
 Attachments: screenshot-1.png


 One region server stuck while flushing(possible deadlock). Its trying to 
 flush two regions since last 6 hours (see the screenshot).
 Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 
 mapper jobs and 100 back references. ~37 Million writes on each regionserver 
 till now but no writes happening on any other regionserver from past 6 hours  
 and their memstore size is zero(I dont know if this is related). But this 
 particular regionserver has memstore size of 9GBs from past 6 hours.
 Relevant snaps from debug dump:
 Tasks:
 ===
 Task: Flushing 
 IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd.
 Status: RUNNING:Preparing to flush by snapshotting stores in 
 8e2d075f94ce7699f416ec4ced9873cd
 Running for 22034s
 Task: Flushing 
 IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390.
 Status: RUNNING:Preparing to flush by snapshotting stores in 
 9f8d0e01a40405b835bf6e5a22a86390
 Running for 22033s
 Executors:
 ===
 ...
 Thread 139 (MemStoreFlusher.1):
   State: WAITING
   Blocked count: 139711
   Waited count: 239212
   Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
 org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
 
 org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
 org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
 java.lang.Thread.run(Thread.java:745)
 Thread 137 (MemStoreFlusher.0):
   State: WAITING
   Blocked count: 138931
   Waited count: 237448
   Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
 org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
 
 org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
 

[jira] [Commented] (HBASE-13864) HColumnDescriptor should parse the output from master and from describe for ttl

2015-06-25 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601739#comment-14601739
 ] 

Ashu Pachauri commented on HBASE-13864:
---

The test failures don't seem related to the change.

 HColumnDescriptor should parse the output from master and from describe for 
 ttl
 ---

 Key: HBASE-13864
 URL: https://issues.apache.org/jira/browse/HBASE-13864
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Elliott Clark
Assignee: Ashu Pachauri
 Attachments: HBASE-13864-1.patch, HBASE-13864-2.patch, 
 HBASE-13864-3.patch, HBASE-13864.patch


 The TTL printing on HColumnDescriptor adds a human readable time. When using 
 that string for the create command it throws an error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases

2015-06-25 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-13959:
--
Priority: Critical  (was: Major)

 Region splitting takes too long because it uses a single thread in most 
 common cases
 

 Key: HBASE-13959
 URL: https://issues.apache.org/jira/browse/HBASE-13959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
Priority: Critical
 Fix For: 0.98.14

 Attachments: 13959-suggest.txt, HBASE-13959-2.patch, 
 HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, 
 region-split-durations-compared.png


 When storefiles need to be split as part of a region split, the current logic 
 uses a threadpool with the size set to the size of the number of stores. 
 Since most common table setup involves only a single column family, this 
 translates to having a single store and so the threadpool is run with a 
 single thread. However, in a write heavy workload, there could be several 
 tens of storefiles in a store at the time of splitting, and with a threadpool 
 size of one, these files end up getting split sequentially.
 With a bit of tracing, I noticed that it takes on an average of 350ms to 
 create a single reference file, and splitting each storefile involves 
 creating two of these, so with a storefile count of 20, it takes about 14s 
 just to get through this phase alone (2 reference files for each storefile), 
 pushing the total time the region is offline to 18s or more. For environments 
 that are setup to fail fast, this makes the client exhaust all retries and 
 fail with NotServingRegionException.
 The fix should increase the concurrency of this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >