[jira] [Created] (HBASE-18144) Forward-port the old exclusive row lock; there are scenarios where it performs better

2017-05-31 Thread stack (JIRA)
stack created HBASE-18144:
-

 Summary: Forward-port the old exclusive row lock; there are 
scenarios where it performs better
 Key: HBASE-18144
 URL: https://issues.apache.org/jira/browse/HBASE-18144
 Project: HBase
  Issue Type: Bug
  Components: Increment
Affects Versions: 1.2.5
Reporter: stack
Assignee: stack
 Fix For: 2.0.0, 1.3.2, 1.2.7


Description to follow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (HBASE-18142) Deletion of a cell deletes the previous versions too

2017-05-31 Thread Karthick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthick updated HBASE-18142:
-
Comment: was deleted

(was: 
https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Delete.java

See this file to fix the issue. This method (public Delete addColumns(final 
byte [] family, final byte [] qualifier, final long timestamp)) only deletes 
the current version of the cell. The previous versions are not deleted.)

> Deletion of a cell deletes the previous versions too
> 
>
> Key: HBASE-18142
> URL: https://issues.apache.org/jira/browse/HBASE-18142
> Project: HBase
>  Issue Type: Bug
>  Components: API
>Reporter: Karthick
>
> When I tried to delete a cell using its timestamp in the HBase Shell, the 
> previous versions of the same cell also got deleted. But when I tried the 
> same using the Java API, the previous versions were not deleted and I could 
> retrieve the previous values.
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Delete.java
> See this file to fix the issue. This method (public Delete addColumns(final 
> byte [] family, final byte [] qualifier, final long timestamp)) only deletes 
> the current version of the cell. The previous versions are not deleted.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18142) Deletion of a cell deletes the previous versions too

2017-05-31 Thread Karthick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthick updated HBASE-18142:
-
Component/s: API

https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Delete.java

See this file to fix the issue. This method (public Delete addColumns(final 
byte [] family, final byte [] qualifier, final long timestamp)) only deletes 
the current version of the cell. The previous versions are not deleted.
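
For anyone comparing the two call paths, here is a minimal Java sketch (table 
and column names are made up; per the client Javadoc, addColumn targets one 
exact version while addColumns covers every version up to the timestamp, 
which is consistent with the shell behavior described in this issue):

{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteVersionsDemo {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("ns:tbl"))) {
      byte[] row = Bytes.toBytes("row");
      byte[] family = Bytes.toBytes("family");
      byte[] qualifier = Bytes.toBytes("name");

      // addColumn (singular): deletes only the cell version whose timestamp
      // is exactly 1; older versions stay readable.
      Delete one = new Delete(row);
      one.addColumn(family, qualifier, 1L);
      table.delete(one);

      // addColumns (plural): deletes ALL versions with timestamp <= 1.
      Delete all = new Delete(row);
      all.addColumns(family, qualifier, 1L);
      table.delete(all);
    }
  }
}
{code}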

> Deletion of a cell deletes the previous versions too
> 
>
> Key: HBASE-18142
> URL: https://issues.apache.org/jira/browse/HBASE-18142
> Project: HBase
>  Issue Type: Bug
>  Components: API
>Reporter: Karthick
>
> When I tried to delete a cell using its timestamp in the HBase Shell, the 
> previous versions of the same cell also got deleted. But when I tried the 
> same using the Java API, the previous versions were not deleted and I could 
> retrieve the previous values.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18142) Deletion of a cell deletes the previous versions too

2017-05-31 Thread Karthick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthick updated HBASE-18142:
-
Description: 
When I tried to delete a cell using its timestamp in the HBase Shell, the 
previous versions of the same cell also got deleted. But when I tried the same 
using the Java API, the previous versions were not deleted and I could 
retrieve the previous values.

https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Delete.java

See this file to fix the issue. This method (public Delete addColumns(final 
byte [] family, final byte [] qualifier, final long timestamp)) only deletes 
the current version of the cell. The previous versions are not deleted.

  was:When I tried to delete a cell using its timestamp in the HBase Shell, 
the previous versions of the same cell also got deleted. But when I tried the 
same using the Java API, the previous versions were not deleted and I could 
retrieve the previous values.


> Deletion of a cell deletes the previous versions too
> 
>
> Key: HBASE-18142
> URL: https://issues.apache.org/jira/browse/HBASE-18142
> Project: HBase
>  Issue Type: Bug
>  Components: API
>Reporter: Karthick
>
> When I tried to delete a cell using its timestamp in the HBase Shell, the 
> previous versions of the same cell also got deleted. But when I tried the 
> same using the Java API, the previous versions were not deleted and I could 
> retrieve the previous values.
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Delete.java
> See this file to fix the issue. This method (public Delete addColumns(final 
> byte [] family, final byte [] qualifier, final long timestamp)) only deletes 
> the current version of the cell. The previous versions are not deleted.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18143) [AMv2] Backoff on failed report of region transition quickly goes to astronomical time scale

2017-05-31 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18143:
--
Attachment: HBASE-18143.master.002.patch

> [AMv2] Backoff on failed report of region transition quickly goes to 
> astronomical time scale
> 
>
> Key: HBASE-18143
> URL: https://issues.apache.org/jira/browse/HBASE-18143
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-18143.master.001.patch, 
> HBASE-18143.master.002.patch
>
>
> Testing on a cluster w/ aggressive killing, if Master is killed serially a 
> few times such that it is offline a good while, regionservers that want to 
> report a region transition pause too long between retries.
> Here is the regionserver reporting failures:
> {code}
>   1 2017-05-31 20:50:53,840 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#0) after 1008ms delay (Master is coming online...).
>   2 2017-05-31 20:50:54,853 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#1) after 2026ms delay (Master is coming online...).
>   3 2017-05-31 20:50:56,886 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#2) after 6084ms delay (Master is coming online...).
>   4 2017-05-31 20:51:02,976 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#3) after 30588ms delay (Master is coming online...).
>   5 2017-05-31 20:51:33,570 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#4) after 308422ms delay (Master is coming online...).
>   6 2017-05-31 20:56:41,997 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#5) after 6171203ms delay (Master is coming online...).
> {code}
> See how by the time we get to the 5th retry, we are waiting 100 minutes 
> before we'll retry. That is too long. Make retry happen more frequently. Data 
> is offline until the close is successfully reported.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HBASE-18143) [AMv2] Backoff on failed report of region transition quickly goes to astronomical time scale

2017-05-31 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18143:
--
Priority: Critical  (was: Major)

Marking critical because a region can be offline for a long time.

> [AMv2] Backoff on failed report of region transition quickly goes to 
> astronomical time scale
> 
>
> Key: HBASE-18143
> URL: https://issues.apache.org/jira/browse/HBASE-18143
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-18143.master.001.patch
>
>
> Testing on a cluster w/ aggressive killing, if Master is killed serially a 
> few times such that it is offline a good while, regionservers that want to 
> report a region transition pause too long between retries.
> Here is the regionserver reporting failures:
> {code}
>   1 2017-05-31 20:50:53,840 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#0) after 1008ms delay (Master is coming online...).
>   2 2017-05-31 20:50:54,853 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#1) after 2026ms delay (Master is coming online...).
>   3 2017-05-31 20:50:56,886 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#2) after 6084ms delay (Master is coming online...).
>   4 2017-05-31 20:51:02,976 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#3) after 30588ms delay (Master is coming online...).
>   5 2017-05-31 20:51:33,570 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#4) after 308422ms delay (Master is coming online...).
>   6 2017-05-31 20:56:41,997 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#5) after 6171203ms delay (Master is coming online...).
> {code}
> See how by the time we get to the 5th retry, we are waiting 100 minutes 
> before we'll retry. That is too long. Make retry happen more frequently. Data 
> is offline until the close is successfully reported.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (HBASE-18143) [AMv2] Backoff on failed report of region transition quickly goes to astronomical time scale

2017-05-31 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18143:
--
Fix Version/s: 2.0.0
   Status: Patch Available  (was: Open)

> [AMv2] Backoff on failed report of region transition quickly goes to 
> astronomical time scale
> 
>
> Key: HBASE-18143
> URL: https://issues.apache.org/jira/browse/HBASE-18143
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0
>
> Attachments: HBASE-18143.master.001.patch
>
>
> Testing on a cluster w/ aggressive killing, if Master is killed serially a 
> few times such that it is offline a good while, regionservers that want to 
> report a region transition pause too long between retries.
> Here is the regionserver reporting failures:
> {code}
>   1 2017-05-31 20:50:53,840 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#0) after 1008ms delay (Master is coming online...).
>   2 2017-05-31 20:50:54,853 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#1) after 2026ms delay (Master is coming online...).
>   3 2017-05-31 20:50:56,886 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#2) after 6084ms delay (Master is coming online...).
>   4 2017-05-31 20:51:02,976 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#3) after 30588ms delay (Master is coming online...).
>   5 2017-05-31 20:51:33,570 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#4) after 308422ms delay (Master is coming online...).
>   6 2017-05-31 20:56:41,997 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#5) after 6171203ms delay (Master is coming online...).
> {code}
> See how by the time we get to the 5th retry, we are waiting 100 minutes 
> before we'll retry. That is too long. Make retry happen more frequently. Data 
> is offline until the close is successfully reported.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18143) [AMv2] Backoff on failed report of region transition quickly goes to astronomical time scale

2017-05-31 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18143:
--
Attachment: HBASE-18143.master.001.patch

> [AMv2] Backoff on failed report of region transition quickly goes to 
> astronomical time scale
> 
>
> Key: HBASE-18143
> URL: https://issues.apache.org/jira/browse/HBASE-18143
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0
>
> Attachments: HBASE-18143.master.001.patch
>
>
> Testing on a cluster w/ aggressive killing, if Master is killed serially a 
> few times such that it is offline a good while, regionservers that want to 
> report a region transition pause too long between retries.
> Here is the regionserver reporting failures:
> {code}
>   1 2017-05-31 20:50:53,840 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#0) after 1008ms delay (Master is coming online...).
>   2 2017-05-31 20:50:54,853 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#1) after 2026ms delay (Master is coming online...).
>   3 2017-05-31 20:50:56,886 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#2) after 6084ms delay (Master is coming online...).
>   4 2017-05-31 20:51:02,976 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#3) after 30588ms delay (Master is coming online...).
>   5 2017-05-31 20:51:33,570 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#4) after 308422ms delay (Master is coming online...).
>   6 2017-05-31 20:56:41,997 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
> regionserver.HRegionServer: Failed report of region transition server { 
> host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 
> } transition { transition_code: CLOSED region_info { region_id: 1496284931226 
> table_name { namespace: "default" qualifier: 
> "IntegrationTestBigLinkedList" } start_key: 
> "\337\377\377\377\377\377\377\362" end_key: 
> "\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 
> } }; retry (#5) after 6171203ms delay (Master is coming online...).
> {code}
> See how by the time we get to the 5th retry, we are waiting 100 minutes 
> before we'll retry. That is too long. Make retry happen more frequently. Data 
> is offline until the close is successfully reported.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-18143) [AMv2] Backoff on failed report of region transition quickly goes to astronomical time scale

2017-05-31 Thread stack (JIRA)
stack created HBASE-18143:
-

 Summary: [AMv2] Backoff on failed report of region transition 
quickly goes to astronomical time scale
 Key: HBASE-18143
 URL: https://issues.apache.org/jira/browse/HBASE-18143
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 2.0.0
Reporter: stack
Assignee: stack


Testing on a cluster w/ aggressive killing, if Master is killed serially a few 
times such that it is offline a good while, regionservers that want to report 
a region transition pause too long between retries.

Here is the regionserver reporting failures:
{code}
  1 2017-05-31 20:50:53,840 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
regionserver.HRegionServer: Failed report of region transition server { 
host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } 
transition { transition_code: CLOSED region_info { region_id: 1496284931226 
table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" 
} start_key: "\337\377\377\377\377\377\377\362" end_key: 
"\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } 
}; retry (#0) after 1008ms delay (Master is coming online...).
  2 2017-05-31 20:50:54,853 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
regionserver.HRegionServer: Failed report of region transition server { 
host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } 
transition { transition_code: CLOSED region_info { region_id: 1496284931226 
table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" 
} start_key: "\337\377\377\377\377\377\377\362" end_key: 
"\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } 
}; retry (#1) after 2026ms delay (Master is coming online...).
  3 2017-05-31 20:50:56,886 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
regionserver.HRegionServer: Failed report of region transition server { 
host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } 
transition { transition_code: CLOSED region_info { region_id: 1496284931226 
table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" 
} start_key: "\337\377\377\377\377\377\377\362" end_key: 
"\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } 
}; retry (#2) after 6084ms delay (Master is coming online...).
  4 2017-05-31 20:51:02,976 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
regionserver.HRegionServer: Failed report of region transition server { 
host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } 
transition { transition_code: CLOSED region_info { region_id: 1496284931226 
table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" 
} start_key: "\337\377\377\377\377\377\377\362" end_key: 
"\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } 
}; retry (#3) after 30588ms delay (Master is coming online...).
  5 2017-05-31 20:51:33,570 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
regionserver.HRegionServer: Failed report of region transition server { 
host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } 
transition { transition_code: CLOSED region_info { region_id: 1496284931226 
table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" 
} start_key: "\337\377\377\377\377\377\377\362" end_key: 
"\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } 
}; retry (#4) after 308422ms delay (Master is coming online...).
  6 2017-05-31 20:56:41,997 INFO  [RS_CLOSE_REGION-ve0542:16020-2] 
regionserver.HRegionServer: Failed report of region transition server { 
host_name: "ve0542.halxg.cloudera.com" port: 16020 start_code: 1496279470954 } 
transition { transition_code: CLOSED region_info { region_id: 1496284931226 
table_name { namespace: "default" qualifier: "IntegrationTestBigLinkedList" 
} start_key: "\337\377\377\377\377\377\377\362" end_key: 
"\352\252\252\252\252\252\252\234" offline: false split: false replica_id: 0 } 
}; retry (#5) after 6171203ms delay (Master is coming online...).
{code}

See how by the time we get to the 5th retry, we are waiting 100 minutes before 
we'll retry. That is too long. Make retry happen more frequently. Data is 
offline until the close is successfully reported.
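
One way to keep the delay sane is to cap the exponential backoff. A rough 
sketch of the idea (method and constant names here are illustrative, not the 
actual patch):

{code:java}
// Illustrative only: bound the retry backoff so a region transition report
// is never delayed more than maxSleepMs, instead of growing without limit.
static long backoffMillis(int tries, long initialSleepMs, long maxSleepMs) {
  long exp = initialSleepMs * (1L << Math.min(tries, 30)); // 2^tries, clamped to avoid overflow
  return Math.min(exp, maxSleepMs); // e.g. cap at 60000 ms
}
{code}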



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-18142) Deletion of a cell deletes the previous versions too

2017-05-31 Thread Karthick (JIRA)
Karthick created HBASE-18142:


 Summary: Deletion of a cell deletes the previous versions too
 Key: HBASE-18142
 URL: https://issues.apache.org/jira/browse/HBASE-18142
 Project: HBase
  Issue Type: Bug
Reporter: Karthick


When I tried to delete a cell using its timestamp in the HBase Shell, the 
previous versions of the same cell also got deleted. But when I tried the same 
using the Java API, the previous versions were not deleted and I could 
retrieve the previous values.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17678) FilterList with MUST_PASS_ONE lead to redundancy cells returned

2017-05-31 Thread Zheng Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-17678:
-
Status: Patch Available  (was: Open)

> FilterList with MUST_PASS_ONE lead to redundancy cells returned
> ---
>
> Key: HBASE-17678
> URL: https://issues.apache.org/jira/browse/HBASE-17678
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.2.1, 1.3.0, 2.0.0
> Environment: RedHat 7.x
>Reporter: Jason Tokayer
>Assignee: Zheng Hu
> Attachments: HBASE-17678.v1.patch, HBASE-17678.v1.rough.patch, 
> TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element filterList, 
> MUST_PASS_ONE and MUST_PASS_ALL give different results when there are 
> multiple cells with the same timestamp. This is unexpected since there is 
> only a single filter in the list, and I believe that MUST_PASS_ALL and 
> MUST_PASS_ONE should only affect the behavior of the joined filter, not 
> the behavior of any one of the individual filters. If this is not a bug, 
> it would be nice if the documentation were updated to explain this nuanced 
> behavior.
> I know that there was a decision made in an earlier HBase version to keep 
> multiple cells with the same timestamp. This is generally fine but presents 
> an issue when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1
> put 'ns:tbl','row','family:name','Jane',1
> put 'ns:tbl','row','family:name','Gil',1
> put 'ns:tbl','row','family:name','Jane',1
> {code}
> Then, use a Scala client as:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable._
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> var resultsList = ListBuffer[String]()
> for (offset <- 0 to 20 by limit) {
>   val table = connection.getTable(TableName.valueOf("ns:tbl"))
>   val paginationFilter = new ColumnPaginationFilter(limit,offset)
>   val filterList: FilterList = new FilterList(logicalOp,paginationFilter)
>   println("@ filterList = "+filterList)
>   val results = table.get(new 
> Get(Bytes.toBytes("row")).setFilter(filterList))
>   val cells = results.rawCells()
>   if (cells != null) {
>   for (cell <- cells) {
> val value = new String(CellUtil.cloneValue(cell))
> val qualifier = new String(CellUtil.cloneQualifier(cell))
> val family = new String(CellUtil.cloneFamily(cell))
> val result = "OFFSET = "+offset+":"+family + "," + qualifier 
> + "," + value + "," + cell.getTimestamp()
> resultsList.append(result)
>   }
>   }
> }
> resultsList.foreach(println)
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 1:family,name,Gil,1
> OFFSET = 2:family,name,Jane,1
> OFFSET = 3:family,name,John,1
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 2:family,name,Jane,1
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but 
> MUST_PASS_ONE does not. Furthermore, MUST_PASS_ONE seems to give only a 
> single (non-duplicated) cell within a page, but not across pages.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17678) FilterList with MUST_PASS_ONE lead to redundancy cells returned

2017-05-31 Thread Zheng Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-17678:
-
Attachment: HBASE-17678.v1.patch

> FilterList with MUST_PASS_ONE lead to redundancy cells returned
> ---
>
> Key: HBASE-17678
> URL: https://issues.apache.org/jira/browse/HBASE-17678
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 2.0.0, 1.3.0, 1.2.1
> Environment: RedHat 7.x
>Reporter: Jason Tokayer
>Assignee: Zheng Hu
> Attachments: HBASE-17678.v1.patch, HBASE-17678.v1.rough.patch, 
> TestColumnPaginationFilterDemo.java
>
>
> When combining ColumnPaginationFilter with a single-element filterList, 
> MUST_PASS_ONE and MUST_PASS_ALL give different results when there are 
> multiple cells with the same timestamp. This is unexpected since there is 
> only a single filter in the list, and I believe that MUST_PASS_ALL and 
> MUST_PASS_ONE should only affect the behavior of the joined filter, not 
> the behavior of any one of the individual filters. If this is not a bug, 
> it would be nice if the documentation were updated to explain this nuanced 
> behavior.
> I know that there was a decision made in an earlier HBase version to keep 
> multiple cells with the same timestamp. This is generally fine but presents 
> an issue when using the aforementioned filter combination.
> Steps to reproduce:
> In the shell create a table and insert some data:
> {code:none}
> create 'ns:tbl',{NAME => 'family',VERSIONS => 100}
> put 'ns:tbl','row','family:name','John',1
> put 'ns:tbl','row','family:name','Jane',1
> put 'ns:tbl','row','family:name','Gil',1
> put 'ns:tbl','row','family:name','Jane',1
> {code}
> Then, use a Scala client as:
> {code:none}
> import org.apache.hadoop.hbase.filter._
> import org.apache.hadoop.hbase.util.Bytes
> import org.apache.hadoop.hbase.client._
> import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
> import scala.collection.mutable._
> val config = HBaseConfiguration.create()
> config.set("hbase.zookeeper.quorum", "localhost")
> config.set("hbase.zookeeper.property.clientPort", "2181")
> val connection = ConnectionFactory.createConnection(config)
> val logicalOp = FilterList.Operator.MUST_PASS_ONE
> val limit = 1
> var resultsList = ListBuffer[String]()
> for (offset <- 0 to 20 by limit) {
>   val table = connection.getTable(TableName.valueOf("ns:tbl"))
>   val paginationFilter = new ColumnPaginationFilter(limit,offset)
>   val filterList: FilterList = new FilterList(logicalOp,paginationFilter)
>   println("@ filterList = "+filterList)
>   val results = table.get(new 
> Get(Bytes.toBytes("row")).setFilter(filterList))
>   val cells = results.rawCells()
>   if (cells != null) {
>   for (cell <- cells) {
> val value = new String(CellUtil.cloneValue(cell))
> val qualifier = new String(CellUtil.cloneQualifier(cell))
> val family = new String(CellUtil.cloneFamily(cell))
> val result = "OFFSET = "+offset+":"+family + "," + qualifier 
> + "," + value + "," + cell.getTimestamp()
> resultsList.append(result)
>   }
>   }
> }
> resultsList.foreach(println)
> {code}
> Here are the results for different limit and logicalOp settings:
> {code:none}
> Limit = 1 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 1 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 1:family,name,Gil,1
> OFFSET = 2:family,name,Jane,1
> OFFSET = 3:family,name,John,1
> Limit = 2 & logicalOp = MUST_PASS_ALL:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> Limit = 2 & logicalOp = MUST_PASS_ONE:
> scala> resultsList.foreach(println)
> OFFSET = 0:family,name,Jane,1
> OFFSET = 2:family,name,Jane,1
> {code}
> So, it seems that MUST_PASS_ALL gives the expected behavior, but 
> MUST_PASS_ONE does not. Furthermore, MUST_PASS_ONE seems to give only a 
> single (non-duplicated) cell within a page, but not across pages.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HBASE-18097) Save bandwidth on partial_flag_per_result in ScanResponse proto

2017-05-31 Thread Karan Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta reassigned HBASE-18097:
---

Assignee: Karan Mehta

> Save bandwidth on partial_flag_per_result in ScanResponse proto
> ---
>
> Key: HBASE-18097
> URL: https://issues.apache.org/jira/browse/HBASE-18097
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Karan Mehta
>Assignee: Karan Mehta
>
> Currently the {{ScanResponse}} proto sends out 1 bit per {{Result}} that it 
> embeds inside the {{CellScanner}} to indicate whether it is partial or not. 
> {code}
> // In every RPC response there should be at most a single partial result.
> // Furthermore, if there is a partial result, it is guaranteed to be in
> // the last position of the array.
> {code}
> According to this client contract, only the last result can be partial, so 
> the repeated bool can be collapsed into a single bool, reducing the overhead 
> of serializing and deserializing the array. This would break wire 
> compatibility, therefore it is something to address in upcoming versions.
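
A sketch of what the client-side check reduces to (hypothetical helper; since 
at most the last {{Result}} in a response can be partial, one boolean carries 
the same information as the repeated field):

{code:java}
import java.util.List;

class PartialFlagSketch {
  // The repeated partial_flag_per_result field can only ever have its last
  // entry set, so the whole array collapses to this single boolean.
  static boolean lastResultPartial(List<Boolean> partialFlagPerResult) {
    return !partialFlagPerResult.isEmpty()
        && partialFlagPerResult.get(partialFlagPerResult.size() - 1);
  }
}
{code}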



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032346#comment-16032346
 ] 

Hudson commented on HBASE-14614:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3113 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3113/])
HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi) (stack: 
rev 3975bbd008b9341e8188406f4ef1a3257335fadc)
* (edit) hbase-protocol-shaded/src/main/protobuf/Admin.proto
* (add) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterMetrics.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/AbstractStateMachineRegionProcedure.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestMergeTableRegionsProcedure.java
* (edit) 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSource.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestAssignmentManager.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
* (edit) 
hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSecureAsyncWALReplay.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin2.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/ProcedureInfo.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/TableProcedureInterface.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableGetMultiThreaded.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplit.java
* (edit) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureInMemoryChore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ProcedureSyncWait.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureWalLease.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/NoSuchProcedureException.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterBalanceThrottling.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestWarmupRegion.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckReplicas.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestMobExportSnapshot.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestModifyNamespaceProcedure.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionFileSystem.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManager.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterWalManager.java
* (delete) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MergeTableRegionsProcedure.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestModifyTableProcedure.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/namespace/NamespaceAuditor.java
* (edit) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureEvent.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/DeleteColumnFamilyProcedure.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestEnableTable.java
* (edit) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/GCRegionProcedure.java
* (delete) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SplitTableRegionProcedure.java
* (edit) 

[jira] [Updated] (HBASE-18132) Low replication should be checked in period in case of datanode rolling upgrade

2017-05-31 Thread Allan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-18132:
---
Attachment: HBASE-18132.patch

Added a patch for the master branch. The failed UTs in branch-1 are unrelated 
and pass locally.

> Low replication should be checked in period in case of datanode rolling 
> upgrade
> ---
>
> Key: HBASE-18132
> URL: https://issues.apache.org/jira/browse/HBASE-18132
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.1.10
>Reporter: Allan Yang
>Assignee: Allan Yang
> Attachments: HBASE-18132-branch-1.patch, 
> HBASE-18132-branch-1.v2.patch, HBASE-18132-branch-1.v3.patch, 
> HBASE-18132-branch-1.v4.patch, HBASE-18132.patch
>
>
> For now, we only check low replication of WALs when there is a sync operation 
> (HBASE-2234), rolling the log if the WAL's replica count is less than 
> configured. But if the WAL has very few writes or no writes at all, low 
> replication will not be detected and no log will be rolled.
> That is a problem during a rolling upgrade of datanodes: every replica of a 
> WAL with no writes will be restarted, leaving the WAL file in an abnormal 
> state. Later attempts to open this file will always fail.
> I propose a patch that checks low replication of WALs at a configured period. 
> When rolling-upgrading datanodes, as long as the restart interval between two 
> nodes is bigger than the low-replication check period, the WAL will be closed 
> and rolled normally. A UT in the patch demonstrates this.
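
A rough sketch of the periodic check using a {{ScheduledChore}} (the WAL 
handle and its method names below are hypothetical, not the actual patch):

{code:java}
import org.apache.hadoop.hbase.ScheduledChore;
import org.apache.hadoop.hbase.Stoppable;

// Hypothetical view of the WAL, just for this sketch.
interface WalView {
  int currentReplicaCount();
  int configuredMinReplicas();
  void requestLogRoll();
}

// Periodically check WAL replication even when no sync() is happening, and
// request a log roll if replicas have dropped below the configured minimum
// (e.g. because datanodes were restarted during a rolling upgrade).
class WalLowReplicationChore extends ScheduledChore {
  private final WalView wal;

  WalLowReplicationChore(Stoppable stopper, WalView wal, int periodMs) {
    super("WalLowReplicationChore", stopper, periodMs);
    this.wal = wal;
  }

  @Override
  protected void chore() {
    if (wal.currentReplicaCount() < wal.configuredMinReplicas()) {
      wal.requestLogRoll(); // roll onto a fresh set of datanodes
    }
  }
}
{code}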



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18054) log when we add/remove failed servers in client

2017-05-31 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18054:
---
Status: Open  (was: Patch Available)

TestFailedServers is missing the required ASF license header. Please add it.

> log when we add/remove failed servers in client
> ---
>
> Key: HBASE-18054
> URL: https://issues.apache.org/jira/browse/HBASE-18054
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Operability
>Affects Versions: 1.3.0
>Reporter: Sean Busbey
>Assignee: Ali
> Attachments: HBASE-18054.patch, HBASE-18054.v2.master.patch
>
>
> Currently we log if a server is in the failed server list when we go to 
> connect to it, but we don't log anything about when the server got into the 
> list.
> This means we have to search the log for errors involving the same server 
> name that (hopefully) managed to get into the log within 
> {{FAILED_SERVER_EXPIRY_KEY}} milliseconds earlier (default 2 seconds).
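
The fix can be as small as logging at the add/remove sites. A sketch (the 
real class is {{org.apache.hadoop.hbase.ipc.FailedServers}}; the method body 
here is condensed for illustration):

{code:java}
import java.net.InetSocketAddress;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class FailedServersSketch {
  private static final Log LOG = LogFactory.getLog(FailedServersSketch.class);

  // Log when a server enters the failed-server list, so the later
  // "server is in the failed servers list" message can be correlated
  // with the original failure.
  void addToFailedServers(InetSocketAddress address, Throwable cause) {
    // ... existing bookkeeping elided ...
    LOG.info("Added failed server with address " + address
        + " to list caused by " + cause);
  }
}
{code}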



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18111) Replication stuck when cluster connection is closed

2017-05-31 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18111:
---
Attachment: HBASE-18111-v2.patch

bq. HBaseInterClusterReplicationEndpoint is an HBase client and writes entries 
to the peer cluster. We should handle the connection close case no matter what 
caused it. Right now the replication will be stuck in the while loop.

Sounds reasonable to me. Here's a v2 patch rebased on latest master for another 
round of HadoopQA.

> Replication stuck when cluster connection is closed
> ---
>
> Key: HBASE-18111
> URL: https://issues.apache.org/jira/browse/HBASE-18111
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.10
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-18111.patch, HBASE-18111-v1.patch, 
> HBASE-18111-v2.patch
>
>
> Log:
> {code}
> 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] 
> org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum 
> member failed: javax.security.sasl.SaslException: An error: 
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper 
> Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED 
> state.
> 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: 
> hconnection-0x1148dd9b-0x35b6b4d4ca999c6, 
> quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000,
>  baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 
> received auth failed from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = 
> AuthFailed
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] 
> org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper 
> sessionid=0x35b6b4d4ca999c6
> 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Replicate edites to peer cluster failed.
> java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local 
> exception: java.io.IOException: Connection closed
> {code}
> jstack
> {code}
>  java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
> {code}
> The cluster connection was aborted when the ZooKeeperWatcher received an 
> AuthFailed event. The HBaseInterClusterReplicationEndpoint's replicate() 
> method then gets stuck in a while loop.
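
The direction discussed above, as a sketch (helper names are stand-ins; the 
real loop lives in {{HBaseInterClusterReplicationEndpoint.replicate()}}):

{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.client.Connection;

// Sketch only: the retry loop should observe connection state instead of
// sleeping and retrying forever once the cluster connection is aborted.
abstract class ReplicateLoopSketch<E> {
  abstract void shipEdits(List<E> entries) throws IOException; // stand-in for the real RPC
  abstract void sleepForRetries();                             // backoff, as in the jstack above

  boolean replicate(Connection conn, List<E> entries) throws IOException {
    while (true) {
      if (conn == null || conn.isClosed() || conn.isAborted()) {
        // An aborted connection never recovers; surface the failure
        // instead of looping.
        throw new IOException("Connection to peer cluster closed, bailing out");
      }
      try {
        shipEdits(entries);
        return true;
      } catch (IOException e) {
        sleepForRetries();
      }
    }
  }
}
{code}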



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18126) Increment class

2017-05-31 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18126:
---
Attachment: 18126.v6.txt

Patch for Increment.

Append would be added in another JIRA.

> Increment class
> ---
>
> Key: HBASE-18126
> URL: https://issues.apache.org/jira/browse/HBASE-18126
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 18126.v6.txt
>
>
> These Increment objects are used by the Table implementation to perform 
> increment operations.
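
For reference, a minimal usage sketch of the {{Increment}}/{{Table}} flow in 
the standard Java client (table and column names are made up):

{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementDemo {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("t"))) {
      // Bump the counter cell f:c on row "r" by 1 in a single RPC.
      Increment inc = new Increment(Bytes.toBytes("r"));
      inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c"), 1L);
      Result result = table.increment(inc);
      long value = Bytes.toLong(result.getValue(Bytes.toBytes("f"), Bytes.toBytes("c")));
      System.out.println("counter is now " + value);
    }
  }
}
{code}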



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18054) log when we add/remove failed servers in client

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032287#comment-16032287
 ] 

Hadoop QA commented on HBASE-18054:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
33s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
44s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
46s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
58m 10s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 25s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 12s 
{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 55s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870696/HBASE-18054.v2.master.patch
 |
| JIRA Issue | HBASE-18054 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux de3c46e10452 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 140ce14 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7030/testReport/ |
| asflicense | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7030/artifact/patchprocess/patch-asflicense-problems.txt
 |
| modules | C: hbase-client U: hbase-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7030/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> log when we add/remove failed servers in client
> ---
>
> Key: HBASE-18054
> URL: https://issues.apache.org/jira/browse/HBASE-18054
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Operability
>Affects Versions: 1.3.0
>Reporter: Sean Busbey
>

[jira] [Updated] (HBASE-17959) Canary timeout should be configurable on a per-table basis

2017-05-31 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-17959:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to master and branch-1

> Canary timeout should be configurable on a per-table basis
> --
>
> Key: HBASE-17959
> URL: https://issues.apache.org/jira/browse/HBASE-17959
> Project: HBase
>  Issue Type: Improvement
>  Components: canary
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17959.002.patch, HBASE-17959.003.patch, 
> HBASE-17959.004.patch, HBASE-17959-branch-1.patch, HBASE-17959.patch
>
>
> The Canary read and write timeouts should be configurable on a per-table 
> basis, for cases where different tables have different latency SLAs. 
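As a rough illustration of the kind of per-table override being asked for (a minimal sketch; the property names and helper class below are hypothetical, not the ones used in the attached patches):

{code}
import org.apache.hadoop.conf.Configuration;

// Minimal sketch: resolve a canary read timeout for a table, falling back to
// a global default when no per-table override is configured. Property names
// here are hypothetical.
public class PerTableTimeoutSketch {
  static long readTimeoutFor(Configuration conf, String tableName) {
    long defaultTimeout = conf.getLong("hbase.canary.read.timeout", 10000L);
    return conf.getLong("hbase.canary.read.timeout." + tableName, defaultTimeout);
  }
}
{code}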



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14614:
--
Attachment: (was: HBASE-14614.master.013.patch)

> Procedure v2: Core Assignment Manager
> -
>
> Key: HBASE-14614
> URL: https://issues.apache.org/jira/browse/HBASE-14614
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-14614.master.003.patch, 
> HBASE-14614.master.004.patch, HBASE-14614.master.005.patch, 
> HBASE-14614.master.006.patch, HBASE-14614.master.007.patch, 
> HBASE-14614.master.008.patch, HBASE-14614.master.009.patch, 
> HBASE-14614.master.010.patch, HBASE-14614.master.014.patch, 
> HBASE-14614.master.015.patch, HBASE-14614.master.017.patch, 
> HBASE-14614.master.018.patch, HBASE-14614.master.019.patch, 
> HBASE-14614.master.020.patch, HBASE-14614.master.022.patch, 
> HBASE-14614.master.023.patch, HBASE-14614.master.024.patch, 
> HBASE-14614.master.025.patch, HBASE-14614.master.026.patch, 
> HBASE-14614.master.027.patch, HBASE-14614.master.028.patch, 
> HBASE-14614.master.029.patch, HBASE-14614.master.030.patch, 
> HBASE-14614.master.033.patch, HBASE-14614.master.038.patch, 
> HBASE-14614.master.039.patch, HBASE-14614.master.040.patch, 
> HBASE-14614.master.041.patch, HBASE-14614.master.042.patch, 
> HBASE-14614.master.043.patch, HBASE-14614.master.044.patch, 
> HBASE-14614.master.045.patch, HBASE-14614.master.045.patch, 
> HBASE-14614.master.046.patch, HBASE-14614.master.047.patch, 
> HBASE-14614.master.048.patch, HBASE-14614.master.049.patch, 
> HBASE-14614.master.050.patch, HBASE-14614.master.051.patch
>
>
> New AssignmentManager implemented using proc-v2.
>  - AssignProcedure handles assign operations
>  - UnassignProcedure handles unassign operations
>  - MoveRegionProcedure handles move/balance operations
> Concurrent Assign operations are batched together and sent to the balancer.
> Concurrent Assign and Unassign operations that are ready to be sent to the RS 
> are batched together.
> This patch is an intermediate state where we add the new AM as 
> AssignmentManager2() to the master, to be reached by tests, but the new AM 
> will not be integrated with the rest of the system. Only new AM unit tests 
> will exercise the new assignment manager. The integration with the master 
> code is part of HBASE-14616.
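For readers unfamiliar with proc-v2, the procedures named above are persisted state machines that the master can resume or roll back after a crash. A schematic sketch of the style (illustrative only; this is not the real org.apache.hadoop.hbase.procedure2 API or the actual AssignProcedure):

{code}
// Illustrative skeleton of a proc-v2 style assign: each step mutates a
// persisted state so a restarted master can pick up where it left off.
public class AssignSketch {
  enum State { QUEUE_TO_BALANCER, SEND_OPEN_TO_RS, WAIT_OPENED, DONE }

  private State state = State.QUEUE_TO_BALANCER;

  /** Runs one step; the framework re-invokes until it returns true. */
  boolean executeOneStep() {
    switch (state) {
      case QUEUE_TO_BALANCER: /* pick target server */  state = State.SEND_OPEN_TO_RS; return false;
      case SEND_OPEN_TO_RS:   /* dispatch OpenRegion */ state = State.WAIT_OPENED;     return false;
      case WAIT_OPENED:       /* RS reported opened */  state = State.DONE;            return true;
      default:                                          return true;
    }
  }
}
{code}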



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17959) Canary timeout should be configurable on a per-table basis

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032280#comment-16032280
 ] 

Hadoop QA commented on HBASE-17959:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
43s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
39s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
28m 31s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 115m 38s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
28s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 157m 11s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870667/HBASE-17959.004.patch 
|
| JIRA Issue | HBASE-17959 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 7f0f12ad6406 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 140ce14 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7029/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7029/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Canary timeout should be configurable on a per-table basis
> --
>
> Key: HBASE-17959
> URL: https://issues.apache.org/jira/browse/HBASE-17959
> Project: HBase
>  Issue Type: Improvement
>  Components: canary
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> 

[jira] [Commented] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032279#comment-16032279
 ] 

stack commented on HBASE-14614:
---

Merged to master after DISCUSSION and VOTE. See 
http://apache-hbase.679495.n3.nabble.com/VOTE-Merge-new-Assignment-Manager-AMv2-HBASE-14614-td4088220.html

Thanks all for reviews, discussion, and vote.

Will work on issues that are sure to happen after this merge.

> Procedure v2: Core Assignment Manager
> -
>
> Key: HBASE-14614
> URL: https://issues.apache.org/jira/browse/HBASE-14614
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-14614.master.003.patch, 
> HBASE-14614.master.004.patch, HBASE-14614.master.005.patch, 
> HBASE-14614.master.006.patch, HBASE-14614.master.007.patch, 
> HBASE-14614.master.008.patch, HBASE-14614.master.009.patch, 
> HBASE-14614.master.010.patch, HBASE-14614.master.013.patch, 
> HBASE-14614.master.014.patch, HBASE-14614.master.015.patch, 
> HBASE-14614.master.017.patch, HBASE-14614.master.018.patch, 
> HBASE-14614.master.019.patch, HBASE-14614.master.020.patch, 
> HBASE-14614.master.022.patch, HBASE-14614.master.023.patch, 
> HBASE-14614.master.024.patch, HBASE-14614.master.025.patch, 
> HBASE-14614.master.026.patch, HBASE-14614.master.027.patch, 
> HBASE-14614.master.028.patch, HBASE-14614.master.029.patch, 
> HBASE-14614.master.030.patch, HBASE-14614.master.033.patch, 
> HBASE-14614.master.038.patch, HBASE-14614.master.039.patch, 
> HBASE-14614.master.040.patch, HBASE-14614.master.041.patch, 
> HBASE-14614.master.042.patch, HBASE-14614.master.043.patch, 
> HBASE-14614.master.044.patch, HBASE-14614.master.045.patch, 
> HBASE-14614.master.045.patch, HBASE-14614.master.046.patch, 
> HBASE-14614.master.047.patch, HBASE-14614.master.048.patch, 
> HBASE-14614.master.049.patch, HBASE-14614.master.050.patch, 
> HBASE-14614.master.051.patch
>
>
> New AssignmentManager implemented using proc-v2.
>  - AssignProcedure handles assign operations
>  - UnassignProcedure handles unassign operations
>  - MoveRegionProcedure handles move/balance operations
> Concurrent Assign operations are batched together and sent to the balancer.
> Concurrent Assign and Unassign operations that are ready to be sent to the RS 
> are batched together.
> This patch is an intermediate state where we add the new AM as 
> AssignmentManager2() to the master, to be reached by tests, but the new AM 
> will not be integrated with the rest of the system. Only new AM unit tests 
> will exercise the new assignment manager. The integration with the master 
> code is part of HBASE-14616.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14614:
--
Attachment: (was: HBASE-14614.master.012.patch)

> Procedure v2: Core Assignment Manager
> -
>
> Key: HBASE-14614
> URL: https://issues.apache.org/jira/browse/HBASE-14614
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-14614.master.003.patch, 
> HBASE-14614.master.004.patch, HBASE-14614.master.005.patch, 
> HBASE-14614.master.006.patch, HBASE-14614.master.007.patch, 
> HBASE-14614.master.008.patch, HBASE-14614.master.009.patch, 
> HBASE-14614.master.010.patch, HBASE-14614.master.013.patch, 
> HBASE-14614.master.014.patch, HBASE-14614.master.015.patch, 
> HBASE-14614.master.017.patch, HBASE-14614.master.018.patch, 
> HBASE-14614.master.019.patch, HBASE-14614.master.020.patch, 
> HBASE-14614.master.022.patch, HBASE-14614.master.023.patch, 
> HBASE-14614.master.024.patch, HBASE-14614.master.025.patch, 
> HBASE-14614.master.026.patch, HBASE-14614.master.027.patch, 
> HBASE-14614.master.028.patch, HBASE-14614.master.029.patch, 
> HBASE-14614.master.030.patch, HBASE-14614.master.033.patch, 
> HBASE-14614.master.038.patch, HBASE-14614.master.039.patch, 
> HBASE-14614.master.040.patch, HBASE-14614.master.041.patch, 
> HBASE-14614.master.042.patch, HBASE-14614.master.043.patch, 
> HBASE-14614.master.044.patch, HBASE-14614.master.045.patch, 
> HBASE-14614.master.045.patch, HBASE-14614.master.046.patch, 
> HBASE-14614.master.047.patch, HBASE-14614.master.048.patch, 
> HBASE-14614.master.049.patch, HBASE-14614.master.050.patch, 
> HBASE-14614.master.051.patch
>
>
> New AssignmentManager implemented using proc-v2.
>  - AssignProcedure handles assign operations
>  - UnassignProcedure handles unassign operations
>  - MoveRegionProcedure handles move/balance operations
> Concurrent Assign operations are batched together and sent to the balancer.
> Concurrent Assign and Unassign operations that are ready to be sent to the RS 
> are batched together.
> This patch is an intermediate state where we add the new AM as 
> AssignmentManager2() to the master, to be reached by tests, but the new AM 
> will not be integrated with the rest of the system. Only new AM unit tests 
> will exercise the new assignment manager. The integration with the master 
> code is part of HBASE-14616.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032268#comment-16032268
 ] 

Hadoop QA commented on HBASE-14614:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 108 new or modified 
test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
1s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 
57s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
37s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 2s 
{color} | {color:red} hbase-protocol-shaded in master has 24 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
41s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 655 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 18s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 2m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 3s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 59s 
{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 37s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 108m 8s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 40s 
{color} | {color:green} hbase-rsgroup in the 

[jira] [Created] (HBASE-18141) Regionserver fails to shutdown when abort triggered in RegionScannerImpl during RPC call

2017-05-31 Thread Gary Helmling (JIRA)
Gary Helmling created HBASE-18141:
-

 Summary: Regionserver fails to shutdown when abort triggered in 
RegionScannerImpl during RPC call
 Key: HBASE-18141
 URL: https://issues.apache.org/jira/browse/HBASE-18141
 Project: HBase
  Issue Type: Bug
  Components: regionserver, security
Affects Versions: 1.3.1
Reporter: Gary Helmling
Assignee: Gary Helmling
Priority: Critical
 Fix For: 1.3.2


When an abort is triggered within the RPC call path by 
HRegion.RegionScannerImpl, AccessController incorrectly applies the RPC caller 
identity in the RegionServerObserver.preStopRegionServer() hook. This leaves 
the regionserver in a non-responsive state, where its regions are not 
reassigned and it returns exceptions for all requests.

When an abort is triggered on the server side, we should not allow a 
coprocessor to reject the abort at all.

Here is a sample stack trace:
{noformat}
17/05/25 06:10:29 FATAL regionserver.HRegionServer: RegionServer abort: loaded 
coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, 
org.apache.hadoop.hbase.security.token.TokenProvider]
17/05/25 06:10:29 WARN regionserver.HRegionServer: The region server did not 
stop
org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient 
permissions for user 'rpcuser' (global, action=ADMIN)
at 
org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:548)
at 
org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522)
at 
org.apache.hadoop.hbase.security.access.AccessController.preStopRegionServer(AccessController.java:2501)
at 
org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost$1.call(RegionServerCoprocessorHost.java:86)
at 
org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.execShutdown(RegionServerCoprocessorHost.java:300)
at 
org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.preStop(RegionServerCoprocessorHost.java:82)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1905)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2118)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2125)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.abortRegionServer(HRegion.java:6326)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6319)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5941)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6084)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5858)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2649)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2320)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
{noformat}

I haven't yet evaluated which other release branches this might apply to.

I have a patch currently in progress, which I will post as soon as I complete a 
test case.
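One possible shape of a fix, sketched under the assumption that a server-side abort should never be attributed to the RPC caller (a sketch only, not necessarily the patch mentioned above):

{code}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.hbase.security.User;

// Sketch: run the pre-stop coprocessor hook as the server's own login user,
// so an internally triggered abort is not checked against the RPC caller.
// User.runAsLoginUser is an existing HBase utility; the surrounding method
// is illustrative.
public class AbortSketch {
  void preStopAsServer(final Runnable preStopHook) throws IOException {
    User.runAsLoginUser(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() {
        preStopHook.run();  // executes with the server's identity
        return null;
      }
    });
  }
}
{code}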



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-16261) MultiHFileOutputFormat Enhancement

2017-05-31 Thread Jerry He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry He updated HBASE-16261:
-
Priority: Major  (was: Minor)

>  MultiHFileOutputFormat Enhancement 
> 
>
> Key: HBASE-16261
> URL: https://issues.apache.org/jira/browse/HBASE-16261
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase, mapreduce
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
> Fix For: 2.0.0
>
> Attachments: HBASE-16261-V1.patch, HBASE-16261-V2.patch, 
> HBASE-16261-V3.patch, HBASE-16261-V4.patch, HBASE-16261-V5.patch, 
> HBase-16261-V6.patch, HBase-16261-V7.patch, HBase-16261-V8.patch, 
> HBase-16261-V9.patch
>
>
> Rename MultiHFileOutputFormat to MultiTableHFileOutputFormat, continuing work 
> to enhance MultiTableHFileOutputFormat and make it more usable:
> MultiTableHFileOutputFormat follows HFileOutputFormat2:
> (1) HFileOutputFormat2 can read one table's region split keys and then output 
> multiple hfiles for one family, with each hfile mapping to one region. We can 
> add a partitioner to MultiTableHFileOutputFormat to make it support this 
> feature (see the sketch below).
> (2) HFileOutputFormat2 supports a customized compression algorithm and 
> BloomFilter per column family, as well as customized DataBlockEncoding for 
> the output hfiles. We can make MultiTableHFileOutputFormat support these 
> features too.
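The partitioner idea in point (1) could be sketched as follows (illustrative only; the class and field names are hypothetical, and a real implementation has to handle reducer-count plumbing and serialized split-key distribution):

{code}
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical sketch: route a (table, rowkey) pair to the reducer that owns
// the region containing that row, so each reducer writes hfiles for exactly
// one region of one table.
public class MultiTablePartitionerSketch {
  private final Map<String, List<byte[]>> splitKeysByTable;  // region boundaries per table
  private final Map<String, Integer> reducerOffsetByTable;   // first reducer index per table

  MultiTablePartitionerSketch(Map<String, List<byte[]>> splits,
      Map<String, Integer> offsets) {
    this.splitKeysByTable = splits;
    this.reducerOffsetByTable = offsets;
  }

  int partitionFor(String table, byte[] row) {
    int region = 0;
    for (byte[] split : splitKeysByTable.get(table)) {
      if (Bytes.compareTo(row, split) < 0) break;
      region++;
    }
    return reducerOffsetByTable.get(table) + region;
  }
}
{code}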



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16261) MultiHFileOutputFormat Enhancement

2017-05-31 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032229#comment-16032229
 ] 

Jerry He commented on HBASE-16261:
--

I will commit it soon if there are no more review comments.

>  MultiHFileOutputFormat Enhancement 
> 
>
> Key: HBASE-16261
> URL: https://issues.apache.org/jira/browse/HBASE-16261
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase, mapreduce
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16261-V1.patch, HBASE-16261-V2.patch, 
> HBASE-16261-V3.patch, HBASE-16261-V4.patch, HBASE-16261-V5.patch, 
> HBase-16261-V6.patch, HBase-16261-V7.patch, HBase-16261-V8.patch, 
> HBase-16261-V9.patch
>
>
> Rename MultiHFileOutputFormat to MultiTableHFileOutputFormat, continuing work 
> to enhance MultiTableHFileOutputFormat and make it more usable:
> MultiTableHFileOutputFormat follows HFileOutputFormat2:
> (1) HFileOutputFormat2 can read one table's region split keys and then output 
> multiple hfiles for one family, with each hfile mapping to one region. We can 
> add a partitioner to MultiTableHFileOutputFormat to make it support this 
> feature.
> (2) HFileOutputFormat2 supports a customized compression algorithm and 
> BloomFilter per column family, as well as customized DataBlockEncoding for 
> the output hfiles. We can make MultiTableHFileOutputFormat support these 
> features too.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18054) log when we add/remove failed servers in client

2017-05-31 Thread Ali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ali updated HBASE-18054:

Status: Patch Available  (was: Open)

Unit test added in HBASE-18054.v2.master.patch

> log when we add/remove failed servers in client
> ---
>
> Key: HBASE-18054
> URL: https://issues.apache.org/jira/browse/HBASE-18054
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Operability
>Affects Versions: 1.3.0
>Reporter: Sean Busbey
>Assignee: Ali
> Attachments: HBASE-18054.patch, HBASE-18054.v2.master.patch
>
>
> Currently we log if a server is in the failed server list when we go to 
> connect to it, but we don't log anything about when the server got into the 
> list.
> This means we have to search the log for errors involving the same server 
> name that (hopefully) managed to get into the log within 
> {{FAILED_SERVER_EXPIRY_KEY}} milliseconds earlier (default 2 seconds).
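The kind of logging being asked for might look like this (a sketch only; FailedServers is a real hbase-client class, but this body is illustrative and the actual change is in the attached patches):

{code}
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch: log at add time so a later "server is in the failed server list"
// error can be correlated without searching for an earlier failure against
// the same server name.
public class FailedServersSketch {
  private static final Logger LOG = LoggerFactory.getLogger(FailedServersSketch.class);
  private final Map<String, Long> failedServers = new HashMap<>();
  private final long expiryMillis = 2000L;  // FAILED_SERVER_EXPIRY_KEY default

  public synchronized void addToFailedServers(InetSocketAddress address) {
    long expiry = System.currentTimeMillis() + expiryMillis;
    failedServers.put(address.toString(), expiry);
    LOG.info("Added failed server {} to list, expires at {}", address, expiry);
  }
}
{code}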



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18054) log when we add/remove failed servers in client

2017-05-31 Thread Ali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ali updated HBASE-18054:

Attachment: HBASE-18054.v2.master.patch

> log when we add/remove failed servers in client
> ---
>
> Key: HBASE-18054
> URL: https://issues.apache.org/jira/browse/HBASE-18054
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Operability
>Affects Versions: 1.3.0
>Reporter: Sean Busbey
>Assignee: Ali
> Attachments: HBASE-18054.patch, HBASE-18054.v2.master.patch
>
>
> Currently we log if a server is in the failed server list when we go to 
> connect to it, but we don't log anything about when the server got into the 
> list.
> This means we have to search the log for errors involving the same server 
> name that (hopefully) managed to get into the log within 
> {{FAILED_SERVER_EXPIRY_KEY}} milliseconds earlier (default 2 seconds).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18054) log when we add/remove failed servers in client

2017-05-31 Thread Ali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ali updated HBASE-18054:

Status: Open  (was: Patch Available)

> log when we add/remove failed servers in client
> ---
>
> Key: HBASE-18054
> URL: https://issues.apache.org/jira/browse/HBASE-18054
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Operability
>Affects Versions: 1.3.0
>Reporter: Sean Busbey
>Assignee: Ali
> Attachments: HBASE-18054.patch, HBASE-18054.v2.master.patch
>
>
> Currently we log if a server is in the failed server list when we go to 
> connect to it, but we don't log anything about when the server got into the 
> list.
> This means we have to search the log for errors involving the same server 
> name that (hopefully) managed to get into the log within 
> {{FAILED_SERVER_EXPIRY_KEY}} milliseconds earlier (default 2 seconds).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HBASE-17495) TestHRegionWithInMemoryFlush#testFlushCacheWhileScanning intermittently fails due to assertion error

2017-05-31 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-17495.

Resolution: Cannot Reproduce

> TestHRegionWithInMemoryFlush#testFlushCacheWhileScanning intermittently fails 
> due to assertion error
> 
>
> Key: HBASE-17495
> URL: https://issues.apache.org/jira/browse/HBASE-17495
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Critical
> Attachments: 17495-dbg.txt, 
> 17495-testHRegionWithInMemoryFlush-output-2.0123, 
> testHRegionWithInMemoryFlush-flush-output.0123, 
> TestHRegionWithInMemoryFlush-out.0222.tar.gz, 
> TestHRegionWithInMemoryFlush-out.0301, 
> TestHRegionWithInMemoryFlush-out.613bcb3622ecb1783c030f34ea2975280e1c43c1, 
> testHRegionWithInMemoryFlush-output.0119, test-in-mem.trunk.2597
>
>
> Looping through the test (based on commit 
> 76dc957f64fa38ce88694054db7dbf590f368ae7), I saw the following test failure:
> {code}
> testFlushCacheWhileScanning(org.apache.hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush)
>   Time elapsed: 0.53 sec  <<< FAILURE!
> java.lang.AssertionError: toggle=false i=940 ts=1484852861597 expected:<94> 
> but was:<92>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3533)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> {code}
> See test output for details.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17959) Canary timeout should be configurable on a per-table basis

2017-05-31 Thread Chinmay Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kulkarni updated HBASE-17959:
-
Attachment: HBASE-17959.004.patch

Small change for code style.

> Canary timeout should be configurable on a per-table basis
> --
>
> Key: HBASE-17959
> URL: https://issues.apache.org/jira/browse/HBASE-17959
> Project: HBase
>  Issue Type: Improvement
>  Components: canary
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17959.002.patch, HBASE-17959.003.patch, 
> HBASE-17959.004.patch, HBASE-17959-branch-1.patch, HBASE-17959.patch
>
>
> The Canary read and write timeouts should be configurable on a per-table 
> basis, for cases where different tables have different latency SLAs. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17949) Shorten the execution time of TestBackupMultipleDeletes

2017-05-31 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17949:
---
Labels: backup  (was: )

> Shorten the execution time of TestBackupMultipleDeletes
> ---
>
> Key: HBASE-17949
> URL: https://issues.apache.org/jira/browse/HBASE-17949
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
>  Labels: backup
>
> On my Mac, TestBackupMultipleDeletes took 10 minutes to run.
> The test performs 5 incremental backups in total for 3 tables.
> We can reduce the number of incremental backups so that runtime comes down by 
> a few minutes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18086) Create native client which creates load on selected cluster

2017-05-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032047#comment-16032047
 ] 

Ted Yu commented on HBASE-18086:


You can specify a value for a parameter with the following syntax (prefixed 
with --):

--num_rows=123
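For example (the binary name below is a placeholder; only the --num_rows flag comes from the comment above):

{noformat}
./load-client --num_rows=123
{noformat}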

> Create native client which creates load on selected cluster
> ---
>
> Key: HBASE-18086
> URL: https://issues.apache.org/jira/browse/HBASE-18086
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 18086.v1.txt
>
>
> This task is to create a client which uses multiple threads to conduct Puts 
> followed by Gets against selected cluster.
> Default is to run the tool against local cluster.
> This would give us some idea on the characteristics of native client in terms 
> of handling high load.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-16392) Backup delete fault tolerance

2017-05-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032018#comment-16032018
 ] 

Ted Yu edited comment on HBASE-16392 at 5/31/17 9:44 PM:
-

Please take a look at MasterSyncObserver in 
hbase-server/src/test//java/org/apache/hadoop/hbase/snapshot/TestSnapshotClientRetries.java

You can register an observer for the preSnapshot / preRestoreSnapshot / 
preDeleteSnapshot hooks, which introduces a failure at a certain step in the flow.
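A failure-injecting observer along those lines might look like this (a sketch; the hook signatures are simplified and the class is illustrative, not the MasterSyncObserver referenced above):

{code}
import java.io.IOException;

// Sketch of the failure-injection idea: throw from the pre-delete-snapshot
// hook so the delete path's recovery logic gets exercised. A real observer
// implements MasterObserver; signatures are elided here.
public class FailingSnapshotObserver /* implements MasterObserver */ {
  static volatile boolean failNextDelete = false;

  public void preDeleteSnapshot(/* ObserverContext ctx, SnapshotDescription s */)
      throws IOException {
    if (failNextDelete) {
      failNextDelete = false;
      throw new IOException("Injected failure before snapshot delete");
    }
  }
}
{code}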


was (Author: yuzhih...@gmail.com):
Please take a look at MasterSyncObserver in 
hbase-server/src/test//java/org/apache/hadoop/hbase/snapshot/TestSnapshotClientRetries.java

You can register an observer for the preSnapshot / preRestoreSnapshot hook, 
which introduces a failure at a certain step in the flow.

> Backup delete fault tolerance
> -
>
> Key: HBASE-16392
> URL: https://issues.apache.org/jira/browse/HBASE-16392
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: HBASE-16392-v1.patch, HBASE-16392-v2.patch
>
>
> Backup delete modifies the file system and the backup system table. We have 
> to make sure that the operation is atomic, durable, and isolated.
> Delete operation:
> # Start a backup session (this guarantees that the system will be blocked 
> for all backup commands during the delete operation)
> # Save the list of tables being deleted to the system table
> # Before the delete operation we take a backup system table snapshot
> # During the delete operation we detect any failures and restore the backup 
> system table from the snapshot, then finish the backup session
> # To guarantee consistency of the data, the delete operation MUST be repeated
> # We guarantee that all file delete operations are idempotent and can be 
> repeated multiple times
> # Any backup operations will be blocked until consistency is restored
> # To restore consistency, the repair command must be executed
> # The repair command checks if there is a failed delete op in the backup 
> system table, and repeats the delete operation
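The flow above amounts to a snapshot-guarded delete; schematically (all method names below are illustrative, not the real backup admin API):

{code}
import java.io.IOException;
import java.util.List;

// Schematic of the snapshot-guarded delete described above.
abstract class DeleteFlowSketch {
  abstract void startBackupSession() throws IOException;   // blocks other backup commands
  abstract void finishBackupSession() throws IOException;
  abstract void saveDeleteIntent(List<String> ids) throws IOException;
  abstract void clearDeleteIntent() throws IOException;
  abstract void snapshotSystemTable() throws IOException;  // recovery point
  abstract void restoreSystemTableFromSnapshot() throws IOException;
  abstract void deleteBackupFiles(String id) throws IOException;  // idempotent
  abstract void removeFromSystemTable(String id) throws IOException;

  void deleteBackups(List<String> backupIds) throws IOException {
    startBackupSession();
    saveDeleteIntent(backupIds);        // so the repair command can repeat the delete
    snapshotSystemTable();
    try {
      for (String id : backupIds) {
        deleteBackupFiles(id);
        removeFromSystemTable(id);
      }
      clearDeleteIntent();
    } catch (IOException e) {
      restoreSystemTableFromSnapshot(); // metadata rolls back; file deletes are idempotent
      throw e;
    } finally {
      finishBackupSession();
    }
  }
}
{code}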



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17959) Canary timeout should be configurable on a per-table basis

2017-05-31 Thread Chinmay Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kulkarni updated HBASE-17959:
-
Attachment: HBASE-17959-branch-1.patch

Attaching patch for branch-1.

> Canary timeout should be configurable on a per-table basis
> --
>
> Key: HBASE-17959
> URL: https://issues.apache.org/jira/browse/HBASE-17959
> Project: HBase
>  Issue Type: Improvement
>  Components: canary
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17959.002.patch, HBASE-17959.003.patch, 
> HBASE-17959-branch-1.patch, HBASE-17959.patch
>
>
> The Canary read and write timeouts should be configurable on a per-table 
> basis, for cases where different tables have different latency SLAs. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16392) Backup delete fault tolerance

2017-05-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032018#comment-16032018
 ] 

Ted Yu commented on HBASE-16392:


Please take a look at MasterSyncObserver in 
hbase-server/src/test//java/org/apache/hadoop/hbase/snapshot/TestSnapshotClientRetries.java

You can register an observer for the preSnapshot / preRestoreSnapshot hook, 
which introduces a failure at a certain step in the flow.

> Backup delete fault tolerance
> -
>
> Key: HBASE-16392
> URL: https://issues.apache.org/jira/browse/HBASE-16392
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: HBASE-16392-v1.patch, HBASE-16392-v2.patch
>
>
> Backup delete modifies the file system and the backup system table. We have 
> to make sure that the operation is atomic, durable, and isolated.
> Delete operation:
> # Start a backup session (this guarantees that the system will be blocked 
> for all backup commands during the delete operation)
> # Save the list of tables being deleted to the system table
> # Before the delete operation we take a backup system table snapshot
> # During the delete operation we detect any failures and restore the backup 
> system table from the snapshot, then finish the backup session
> # To guarantee consistency of the data, the delete operation MUST be repeated
> # We guarantee that all file delete operations are idempotent and can be 
> repeated multiple times
> # Any backup operations will be blocked until consistency is restored
> # To restore consistency, the repair command must be executed
> # The repair command checks if there is a failed delete op in the backup 
> system table, and repeats the delete operation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14614:
--
Attachment: HBASE-14614.master.051.patch

> Procedure v2: Core Assignment Manager
> -
>
> Key: HBASE-14614
> URL: https://issues.apache.org/jira/browse/HBASE-14614
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-14614.master.003.patch, 
> HBASE-14614.master.004.patch, HBASE-14614.master.005.patch, 
> HBASE-14614.master.006.patch, HBASE-14614.master.007.patch, 
> HBASE-14614.master.008.patch, HBASE-14614.master.009.patch, 
> HBASE-14614.master.010.patch, HBASE-14614.master.012.patch, 
> HBASE-14614.master.013.patch, HBASE-14614.master.014.patch, 
> HBASE-14614.master.015.patch, HBASE-14614.master.017.patch, 
> HBASE-14614.master.018.patch, HBASE-14614.master.019.patch, 
> HBASE-14614.master.020.patch, HBASE-14614.master.022.patch, 
> HBASE-14614.master.023.patch, HBASE-14614.master.024.patch, 
> HBASE-14614.master.025.patch, HBASE-14614.master.026.patch, 
> HBASE-14614.master.027.patch, HBASE-14614.master.028.patch, 
> HBASE-14614.master.029.patch, HBASE-14614.master.030.patch, 
> HBASE-14614.master.033.patch, HBASE-14614.master.038.patch, 
> HBASE-14614.master.039.patch, HBASE-14614.master.040.patch, 
> HBASE-14614.master.041.patch, HBASE-14614.master.042.patch, 
> HBASE-14614.master.043.patch, HBASE-14614.master.044.patch, 
> HBASE-14614.master.045.patch, HBASE-14614.master.045.patch, 
> HBASE-14614.master.046.patch, HBASE-14614.master.047.patch, 
> HBASE-14614.master.048.patch, HBASE-14614.master.049.patch, 
> HBASE-14614.master.050.patch, HBASE-14614.master.051.patch
>
>
> New AssignmentManager implemented using proc-v2.
>  - AssignProcedure handles assign operations
>  - UnassignProcedure handles unassign operations
>  - MoveRegionProcedure handles move/balance operations
> Concurrent Assign operations are batched together and sent to the balancer.
> Concurrent Assign and Unassign operations that are ready to be sent to the RS 
> are batched together.
> This patch is an intermediate state where we add the new AM as 
> AssignmentManager2() to the master, to be reached by tests, but the new AM 
> will not be integrated with the rest of the system. Only new AM unit tests 
> will exercise the new assignment manager. The integration with the master 
> code is part of HBASE-14616.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031989#comment-16031989
 ] 

stack commented on HBASE-14614:
---

Passes locally and is on the flaky test list. Retry.

> Procedure v2: Core Assignment Manager
> -
>
> Key: HBASE-14614
> URL: https://issues.apache.org/jira/browse/HBASE-14614
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-14614.master.003.patch, 
> HBASE-14614.master.004.patch, HBASE-14614.master.005.patch, 
> HBASE-14614.master.006.patch, HBASE-14614.master.007.patch, 
> HBASE-14614.master.008.patch, HBASE-14614.master.009.patch, 
> HBASE-14614.master.010.patch, HBASE-14614.master.012.patch, 
> HBASE-14614.master.013.patch, HBASE-14614.master.014.patch, 
> HBASE-14614.master.015.patch, HBASE-14614.master.017.patch, 
> HBASE-14614.master.018.patch, HBASE-14614.master.019.patch, 
> HBASE-14614.master.020.patch, HBASE-14614.master.022.patch, 
> HBASE-14614.master.023.patch, HBASE-14614.master.024.patch, 
> HBASE-14614.master.025.patch, HBASE-14614.master.026.patch, 
> HBASE-14614.master.027.patch, HBASE-14614.master.028.patch, 
> HBASE-14614.master.029.patch, HBASE-14614.master.030.patch, 
> HBASE-14614.master.033.patch, HBASE-14614.master.038.patch, 
> HBASE-14614.master.039.patch, HBASE-14614.master.040.patch, 
> HBASE-14614.master.041.patch, HBASE-14614.master.042.patch, 
> HBASE-14614.master.043.patch, HBASE-14614.master.044.patch, 
> HBASE-14614.master.045.patch, HBASE-14614.master.045.patch, 
> HBASE-14614.master.046.patch, HBASE-14614.master.047.patch, 
> HBASE-14614.master.048.patch, HBASE-14614.master.049.patch, 
> HBASE-14614.master.050.patch
>
>
> New AssignmentManager implemented using proc-v2.
>  - AssignProcedure handles assign operations
>  - UnassignProcedure handles unassign operations
>  - MoveRegionProcedure handles move/balance operations
> Concurrent Assign operations are batched together and sent to the balancer.
> Concurrent Assign and Unassign operations that are ready to be sent to the RS 
> are batched together.
> This patch is an intermediate state where we add the new AM as 
> AssignmentManager2() to the master, to be reached by tests, but the new AM 
> will not be integrated with the rest of the system. Only new AM unit tests 
> will exercise the new assignment manager. The integration with the master 
> code is part of HBASE-14616.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-16392) Backup delete fault tolerance

2017-05-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031939#comment-16031939
 ] 

Vladimir Rodionov edited comment on HBASE-16392 at 5/31/17 9:22 PM:


That is not feasible. It would require partially rewriting BackupAdminImpl and 
adding additional methods to this class exclusively for testing.

The existing unit test does the following:

# Full backup of table T
# Snapshot of the backup system table - S
# Delete backup T
# Restore the backup system table from S
# Verify that we have 1 backup in history
# Manually modify the backup system table to emulate a failed delete operation 
by adding a delete row with the list of backup ids
# Run the repair tool
# Verify that we have 0 backup sessions in the history


 


was (Author: vrodionov):
OK
 

> Backup delete fault tolerance
> -
>
> Key: HBASE-16392
> URL: https://issues.apache.org/jira/browse/HBASE-16392
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: HBASE-16392-v1.patch, HBASE-16392-v2.patch
>
>
> Backup delete modifies the file system and the backup system table. We have 
> to make sure that the operation is atomic, durable, and isolated.
> Delete operation:
> # Start a backup session (this guarantees that the system will be blocked 
> for all backup commands during the delete operation)
> # Save the list of tables being deleted to the system table
> # Before the delete operation we take a backup system table snapshot
> # During the delete operation we detect any failures and restore the backup 
> system table from the snapshot, then finish the backup session
> # To guarantee consistency of the data, the delete operation MUST be repeated
> # We guarantee that all file delete operations are idempotent and can be 
> repeated multiple times
> # Any backup operations will be blocked until consistency is restored
> # To restore consistency, the repair command must be executed
> # The repair command checks if there is a failed delete op in the backup 
> system table, and repeats the delete operation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18137) Replication gets stuck for empty WALs

2017-05-31 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-18137:

Priority: Critical  (was: Major)

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
>
> Replication assumes that only the last WAL of a recovered queue can be empty. 
> But intermittent DFS issues may cause empty WALs to be created (without the 
> PWAL magic) and a WAL roll to happen without a regionserver crash. This will 
> cause recovered queues to have empty WALs in the middle, which causes 
> replication to get stuck:
> {code}
> TRACE regionserver.ReplicationSource: Opening log 
> WARN regionserver.ReplicationSource: - Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1829)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1843)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.(SequenceFileLogReader.java:70)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:168)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:177)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:312)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {code}
> The WAL in question was completely empty but there were other WALs in the 
> recovered queue which were newer and non-empty.
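One way to unstick such a queue, sketched here as an idea rather than the eventual fix: when a WAL in a recovered queue is zero-length and newer WALs exist behind it, skip it instead of retrying the open forever.

{code}
import java.io.EOFException;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: decide whether an EOFException on open means "skip this WAL".
// Illustrative only; the real reader logic lives in ReplicationSource.
class EmptyWalSkipSketch {
  static boolean shouldSkip(FileSystem fs, Path wal, int remainingInQueue,
      IOException cause) throws IOException {
    return cause instanceof EOFException
        && remainingInQueue > 0                 // not the last WAL in the queue
        && fs.getFileStatus(wal).getLen() == 0; // truly empty file
  }
}
{code}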



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18137) Replication gets stuck for empty WALs

2017-05-31 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-18137:

Fix Version/s: 1.2.7
   1.1.11
   1.3.2
   1.4.0
   2.0.0

> Replication gets stuck for empty WALs
> -
>
> Key: HBASE-18137
> URL: https://issues.apache.org/jira/browse/HBASE-18137
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.3.1
>Reporter: Ashu Pachauri
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
>
> Replication assumes that only the last WAL of a recovered queue can be empty. 
> But intermittent DFS issues may cause empty WALs to be created (without the 
> PWAL magic) and a WAL roll to happen without a regionserver crash. This will 
> cause recovered queues to have empty WALs in the middle, which causes 
> replication to get stuck:
> {code}
> TRACE regionserver.ReplicationSource: Opening log 
> WARN regionserver.ReplicationSource: - Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1829)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1843)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.(SequenceFileLogReader.java:70)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:168)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:177)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:312)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:276)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:264)
>   at 
> org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:423)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:830)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.run(ReplicationSource.java:572)
> {code}
> The WAL in question was completely empty but there were other WALs in the 
> recovered queue which were newer and non-empty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031964#comment-16031964
 ] 

Hadoop QA commented on HBASE-14614:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 108 new or modified 
test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
31s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 18s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 
16s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
45s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 7s 
{color} | {color:red} hbase-protocol-shaded in master has 24 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
35s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 655 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 31s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 2m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 6s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s 
{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 47s 
{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s 
{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 47s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 149m 22s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s 
{color} | {color:green} hbase-rsgroup in the 

[jira] [Commented] (HBASE-16392) Backup delete fault tolerance

2017-05-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031939#comment-16031939
 ] 

Vladimir Rodionov commented on HBASE-16392:
---

OK
 

> Backup delete fault tolerance
> -
>
> Key: HBASE-16392
> URL: https://issues.apache.org/jira/browse/HBASE-16392
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: HBASE-16392-v1.patch, HBASE-16392-v2.patch
>
>
> Backup delete modifies the file system and the backup system table. We have to 
> make sure that the operation is atomic, durable, and isolated.
> Delete operation:
> # Start a backup session (this guarantees that the system will be blocked for 
> all backup commands during the delete operation)
> # Save the list of tables being deleted to the system table
> # Before the delete operation, we take a snapshot of the backup system table
> # During the delete operation, we detect any failures, restore the backup 
> system table from the snapshot, then finish the backup session
> # To guarantee consistency of the data, a failed delete operation MUST be 
> repeated
> # We guarantee that all file delete operations are idempotent and can be 
> repeated multiple times
> # Any backup operations will be blocked until consistency is restored
> # To restore consistency, the repair command must be executed
> # The repair command checks whether there is a failed delete op in the backup 
> system table, and repeats the delete operation
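
A minimal, self-contained Java sketch of the snapshot-guarded flow above. Every 
name in it (the in-memory stand-ins for the backup system table and its 
snapshot, and the deleteBackupFiles placeholder) is hypothetical and only 
illustrates the numbered steps; it is not the hbase-backup API.

{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BackupDeleteSketch {
  // Hypothetical stand-ins for the backup system table and its snapshot
  // (steps 2-3); the real code uses HBase tables and table snapshots.
  private final Set<String> systemTable = new HashSet<>();
  private Set<String> tableSnapshot;

  public void deleteBackups(String[] backupIds) throws IOException {
    Set<String> pendingDelete = new HashSet<>(Arrays.asList(backupIds));
    tableSnapshot = new HashSet<>(systemTable);  // snapshot before mutating
    try {
      for (String id : pendingDelete) {
        deleteBackupFiles(id);        // idempotent: safe to repeat (step 6)
        systemTable.remove(id);
      }
    } catch (IOException e) {
      // On failure, restore the system table from the snapshot (step 4);
      // a later repair command re-runs the whole delete (steps 8-9).
      systemTable.clear();
      systemTable.addAll(tableSnapshot);
      throw e;
    }
  }

  private void deleteBackupFiles(String backupId) throws IOException {
    // Placeholder: delete the files for backupId; repeating it is a no-op.
  }
}
{code}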



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17825) Backup: further optimizations

2017-05-31 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-17825:
--
Priority: Critical  (was: Major)

> Backup: further optimizations
> -
>
> Key: HBASE-17825
> URL: https://issues.apache.org/jira/browse/HBASE-17825
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> Some phases of backup and restore can be optimized:
> # WALPlayer support for multiple tables
> # Run DistCp once for all tables during backup/restore
> The eventual goal:
> # 2 M/R jobs per backup/restore



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17826) Backup: submitting M/R job to a particular Yarn queue

2017-05-31 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-17826:
--
Priority: Critical  (was: Major)

> Backup: submitting M/R job to a particular Yarn queue
> -
>
> Key: HBASE-17826
> URL: https://issues.apache.org/jira/browse/HBASE-17826
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
>
> We need this to be configurable. Currently, all M/R jobs are submitted to a 
> default queue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HBASE-15993) Regex support in table names

2017-05-31 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-15993.
---
Resolution: Won't Fix

> Regex support in table names
> 
>
> Key: HBASE-15993
> URL: https://issues.apache.org/jira/browse/HBASE-15993
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
>
> Add support for regular expressions in table names in backup/restore/set 
> operations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16392) Backup delete fault tolerance

2017-05-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031933#comment-16031933
 ] 

Ted Yu commented on HBASE-16392:


There are several potential points of failure mentioned during review.
Can you add more tests exercising these points of failure?

Thanks

> Backup delete fault tolerance
> -
>
> Key: HBASE-16392
> URL: https://issues.apache.org/jira/browse/HBASE-16392
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: HBASE-16392-v1.patch, HBASE-16392-v2.patch
>
>
> Backup delete modifies the file system and the backup system table. We have to 
> make sure that the operation is atomic, durable, and isolated.
> Delete operation:
> # Start a backup session (this guarantees that the system will be blocked for 
> all backup commands during the delete operation)
> # Save the list of tables being deleted to the system table
> # Before the delete operation, we take a snapshot of the backup system table
> # During the delete operation, we detect any failures, restore the backup 
> system table from the snapshot, then finish the backup session
> # To guarantee consistency of the data, a failed delete operation MUST be 
> repeated
> # We guarantee that all file delete operations are idempotent and can be 
> repeated multiple times
> # Any backup operations will be blocked until consistency is restored
> # To restore consistency, the repair command must be executed
> # The repair command checks whether there is a failed delete op in the backup 
> system table, and repeats the delete operation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HBASE-16391) Multiple backup/restore sessions support

2017-05-31 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-16391.
---
Resolution: Later

> Multiple backup/restore sessions support
> 
>
> Key: HBASE-16391
> URL: https://issues.apache.org/jira/browse/HBASE-16391
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: HBASE-7912
>Reporter: Vladimir Rodionov
> Fix For: HBASE-7912
>
>
> Support for multiple simultaneous backup/restore sessions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17748) Include HBase Snapshots in Space Quotas

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031931#comment-16031931
 ] 

Hadoop QA commented on HBASE-17748:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 13 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
34s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
41s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
46s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 37s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 36s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 36s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 1m 34s 
{color} | {color:red} The patch causes 20 errors with Hadoop v2.6.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 3m 1s 
{color} | {color:red} The patch causes 20 errors with Hadoop v2.6.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 4m 23s 
{color} | {color:red} The patch causes 20 errors with Hadoop v2.6.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 5m 42s 
{color} | {color:red} The patch causes 20 errors with Hadoop v2.6.4. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 7m 4s 
{color} | {color:red} The patch causes 20 errors with Hadoop v2.6.5. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 8m 32s 
{color} | {color:red} The patch causes 20 errors with Hadoop v2.7.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 9m 53s 
{color} | {color:red} The patch causes 20 errors with Hadoop v2.7.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 11m 12s 
{color} | {color:red} The patch causes 20 errors with Hadoop v2.7.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 12m 33s 
{color} | {color:red} The patch causes 20 errors with Hadoop v3.0.0-alpha2. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 29s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s 
{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 4s 
{color} | {color:green} hbase-client in 

[jira] [Resolved] (HBASE-17824) Add test for multiple RS per host support

2017-05-31 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-17824.
---
Resolution: Won't Fix

> Add test for multiple RS per host support
> -
>
> Key: HBASE-17824
> URL: https://issues.apache.org/jira/browse/HBASE-17824
> Project: HBase
>  Issue Type: Test
>Reporter: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031919#comment-16031919
 ] 

Hadoop QA commented on HBASE-17707:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 24s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 8s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
57m 37s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 216m 8s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
43s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 300m 48s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.balancer.TestStochasticLoadBalancer2 
|
| Timed out junit tests | 
org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer |
|   | org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870588/HBASE-17707-14.patch |
| JIRA Issue | HBASE-17707 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 1e9b0c75cd32 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / dda9ae0 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7024/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/7024/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7024/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7024/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was 

[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031917#comment-16031917
 ] 

Kahlil Oppenheimer edited comment on HBASE-17707 at 5/31/17 8:38 PM:
-

We do not use read replicas in our clusters. That being said, I believe these 
changes should still function properly with read replicas enabled. The only 
issue we encountered formerly was that the table skew cost could actually 
exceed the region replica cost, causing multiple replicas of the same region 
to be hosted on the same server. This, however, is not an issue with the table 
skew changes, but with the fact that region replicas are enforced as a soft 
constraint via the cost function, rather than as a hard constraint in the 
balancer logic. I believe that by adjusting the region replica cost logic to 
scale better to large cluster sizes (as I did in this patch), we mitigate this 
issue.


was (Author: kahliloppenheimer):
We do not use read replicas in our clusters. That being said, I believe these 
changes should still function properly with read replicas enabled. The only 
issue we encountered formerly was that the table skew cost could actually 
exceed the region replica cost, causing multiple replicas of the same region 
to be hosted on the same server. This, however, is not an issue with the table 
skew changes, but with the fact that region replicas are enforced as a soft 
constraint via the cost function, rather than a hard constraint. I believe 
that by adjusting the region replica cost logic to scale better to large 
cluster sizes (as I did in this patch), we mitigate this issue.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0 and 1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.
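
The arithmetic in the summary above is compact enough to sketch. Below is a 
hypothetical, self-contained Java version of the cost computation, assuming 
the per-table move counts and worst-case counts are already known; the real 
TableSkewCostFunction derives them from live cluster state, so this is an 
illustration only.

{code}
import java.util.Arrays;

public class TableSkewCostSketch {
  // movesPerTable[i]: minimal moves to round-robin table i across the cluster.
  // worstCase[i]: moves needed if the entire table sat on one server
  // (assumed > 0 here).
  static double cost(int[] movesPerTable, int[] worstCase,
                     double avgWeight, double maxWeight) {
    double[] normalized = new double[movesPerTable.length];
    for (int i = 0; i < movesPerTable.length; i++) {
      normalized[i] = (double) movesPerTable[i] / worstCase[i]; // score in 0..1
    }
    double avg = Arrays.stream(normalized).average().orElse(0);
    double max = Arrays.stream(normalized).max().orElse(0);
    // Configurable weights trade off one very skewed table against many
    // slightly skewed ones; the square root spreads values across 0..1.
    double weighted =
        (avgWeight * avg + maxWeight * max) / (avgWeight + maxWeight);
    return Math.sqrt(weighted);
  }
}
{code}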



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031917#comment-16031917
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


We do not use read replicas in our clusters. That being said, I believe these 
changes should still function properly with read replicas enabled. The only 
issue we encountered formerly was that the table skew cost could actually 
exceed the region replica cost, causing multiple replicas of the same region 
to be hosted on the same server. This, however, is not an issue with my 
changes, but with the fact that region replicas are enforced as a soft 
constraint via the cost function, rather than a hard constraint. I believe 
that by adjusting the region replica cost logic to scale better to large 
cluster sizes (as I did in this patch), we mitigate this issue.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0 and 1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031917#comment-16031917
 ] 

Kahlil Oppenheimer edited comment on HBASE-17707 at 5/31/17 8:38 PM:
-

We do not use read replicas in our clusters. That being said, I believe these 
changes should still function properly with read replicas enabled. The only 
issue we encountered formerly was that the table skew cost could actually 
exceed the region replica cost, causing multiple replicas of the same region 
to be hosted on the same server. This, however, is not an issue with the table 
skew changes, but with the fact that region replicas are enforced as a soft 
constraint via the cost function, rather than a hard constraint. I believe 
that by adjusting the region replica cost logic to scale better to large 
cluster sizes (as I did in this patch), we mitigate this issue.


was (Author: kahliloppenheimer):
We do not use read replicas in our clusters. That being said, I believe these 
changes should still function properly with read replicas enabled. The only 
issue we encountered formerly was that the table skew cost could actually 
exceed the region replica cost, causing multiple replicas of the same region 
to be hosted on the same server. This, however, is not an issue with my 
changes, but with the fact that region replicas are enforced as a soft 
constraint via the cost function, rather than a hard constraint. I believe 
that by adjusting the region replica cost logic to scale better to large 
cluster sizes (as I did in this patch), we mitigate this issue.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0 and 1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16392) Backup delete fault tolerance

2017-05-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031897#comment-16031897
 ] 

Vladimir Rodionov commented on HBASE-16392:
---

This is because the Repair command does not call the Delete command; it calls 
the BackupAdmin API deleteBackups directly.

You can search the output for:
{code}
REPAIR status: no failed sessions found. Checking failed delete backup 
operation ...
Found failed delete operation for: backup_1496261671372
{code}

> Backup delete fault tolerance
> -
>
> Key: HBASE-16392
> URL: https://issues.apache.org/jira/browse/HBASE-16392
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: HBASE-16392-v1.patch, HBASE-16392-v2.patch
>
>
> Backup delete modifies the file system and the backup system table. We have to 
> make sure that the operation is atomic, durable, and isolated.
> Delete operation:
> # Start a backup session (this guarantees that the system will be blocked for 
> all backup commands during the delete operation)
> # Save the list of tables being deleted to the system table
> # Before the delete operation, we take a snapshot of the backup system table
> # During the delete operation, we detect any failures, restore the backup 
> system table from the snapshot, then finish the backup session
> # To guarantee consistency of the data, a failed delete operation MUST be 
> repeated
> # We guarantee that all file delete operations are idempotent and can be 
> repeated multiple times
> # Any backup operations will be blocked until consistency is restored
> # To restore consistency, the repair command must be executed
> # The repair command checks whether there is a failed delete op in the backup 
> system table, and repeats the delete operation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16261) MultiHFileOutputFormat Enhancement

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031858#comment-16031858
 ] 

Hadoop QA commented on HBASE-16261:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 24s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
41s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
44s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
55s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
28m 23s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 114m 59s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 173m 45s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870600/HBase-16261-V9.patch |
| JIRA Issue | HBASE-16261 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux af5b9782d3bf 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / dda9ae0 |
| Default Java | 1.8.0_131 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7026/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7026/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



>  MultiHFileOutputFormat Enhancement 
> 
>
> Key: HBASE-16261
> URL: https://issues.apache.org/jira/browse/HBASE-16261
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase, mapreduce
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16261-V1.patch, 

[jira] [Updated] (HBASE-17748) Include HBase Snapshots in Space Quotas

2017-05-31 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17748:
---
Fix Version/s: 2.0.0
   Status: Patch Available  (was: Open)

> Include HBase Snapshots in Space Quotas
> ---
>
> Key: HBASE-17748
> URL: https://issues.apache.org/jira/browse/HBASE-17748
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0
>
> Attachments: HBASE-17748.001.patch
>
>
> Umbrella issue for the inclusion of HBase Snapshots in the Space Quota 
> feature (HBASE-16961)
> https://docs.google.com/document/d/1f7utThEBYRXYHvp3e5fOhQBv2K1aeuzGHGEfNNE3Clc/edit#
>  / 
> https://home.apache.org/~elserj/hbase/FileSystemQuotasforApacheHBase-Snapshots.pdf



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17748) Include HBase Snapshots in Space Quotas

2017-05-31 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17748:
---
Attachment: HBASE-17748.001.patch

.001 consolidates all of the smaller work into a single patch for review. Not 
included are documentation and some hbase shell work.

> Include HBase Snapshots in Space Quotas
> ---
>
> Key: HBASE-17748
> URL: https://issues.apache.org/jira/browse/HBASE-17748
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HBASE-17748.001.patch
>
>
> Umbrella issue for the inclusion of HBase Snapshots in the Space Quota 
> feature (HBASE-16961)
> https://docs.google.com/document/d/1f7utThEBYRXYHvp3e5fOhQBv2K1aeuzGHGEfNNE3Clc/edit#
>  / 
> https://home.apache.org/~elserj/hbase/FileSystemQuotasforApacheHBase-Snapshots.pdf



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031824#comment-16031824
 ] 

Ted Yu commented on HBASE-17707:


Do you use read replicas in your cluster?

Thanks

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0 and 1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031812#comment-16031812
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


[~tedyu] Additionally, we have been running this version of the balancer at 
HubSpot on all of our production and QA clusters for a few months now and have 
seen better results with table skew and no issues otherwise. Please let me know 
if there are still issues you'd like me to address.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0 and 1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17748) Include HBase Snapshots in Space Quotas

2017-05-31 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031780#comment-16031780
 ] 

Josh Elser commented on HBASE-17748:


The work listed in HBASE-17749, HBASE-17750, HBASE-17753, and HBASE-17833 is 
all fairly small. I think it would be more work reviewing the partial picture 
in these than just doing it in one commit. I'm rebasing and going to attach a 
patch for the aggregate work here.

> Include HBase Snapshots in Space Quotas
> ---
>
> Key: HBASE-17748
> URL: https://issues.apache.org/jira/browse/HBASE-17748
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Josh Elser
>Assignee: Josh Elser
>
> Umbrella issue for the inclusion of HBase Snapshots in the Space Quota 
> feature (HBASE-16961)
> https://docs.google.com/document/d/1f7utThEBYRXYHvp3e5fOhQBv2K1aeuzGHGEfNNE3Clc/edit#
>  / 
> https://home.apache.org/~elserj/hbase/FileSystemQuotasforApacheHBase-Snapshots.pdf



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17753) Update QuotaObserverChore to include computed snapshot sizes

2017-05-31 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17753:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Rolling up changes into HBASE-17748

> Update QuotaObserverChore to include computed snapshot sizes
> 
>
> Key: HBASE-17753
> URL: https://issues.apache.org/jira/browse/HBASE-17753
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HBASE-17753.001.HBASE-17748.patch
>
>
> Need to update QuotaObserverChore to include the new snapshot size 
> computations that were implemented in HBASE-17749 so that the quota 
> utilizations are accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17833) Flakey TestSpaceQuotasWithSnapshots

2017-05-31 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17833:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Rolling up changes into HBASE-17748

> Flakey TestSpaceQuotasWithSnapshots
> ---
>
> Key: HBASE-17833
> URL: https://issues.apache.org/jira/browse/HBASE-17833
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HBASE-17833.001.patch, HBASE-17833.002.patch
>
>
> I've started seeing some intermittent unit test failures in this class.
> After digging in, it seems like the reported usage size is ~10KB different 
> from what we expect. The tests wait for the region size reports (physical 
> size of the table) to reach a certain range, but it seems like sometimes 
> there is a small change after the test has already decided what the final 
> "size" was.
> Should be able to stabilize the test by waiting for the region size report to 
> be constant over a few iterations.
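
A hedged sketch of the suggested stabilization: poll the reported size and 
accept it only once it has been unchanged for a few iterations. waitUntilStable 
and its parameters are hypothetical names for illustration, not the actual 
test code.

{code}
import java.util.function.LongSupplier;

public final class WaitForStableSize {
  // Poll sizeReport until it returns the same value stableIterations times
  // in a row; then treat that value as the settled region size report.
  static long waitUntilStable(LongSupplier sizeReport, int stableIterations,
                              long pollMillis) throws InterruptedException {
    long last = sizeReport.getAsLong();
    int stable = 0;
    while (stable < stableIterations) {
      Thread.sleep(pollMillis);
      long current = sizeReport.getAsLong();
      stable = (current == last) ? stable + 1 : 0;
      last = current;
    }
    return last;
  }
}
{code}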



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17750) Update RS Chore that computes Region sizes to avoid double-counting on rematerialized tables

2017-05-31 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17750:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Rolling up changes into HBASE-17748

> Update RS Chore that computes Region sizes to avoid double-counting on 
> rematerialized tables
> 
>
> Key: HBASE-17750
> URL: https://issues.apache.org/jira/browse/HBASE-17750
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HBASE-17750.001.patch
>
>
> When a table is restored from a snapshot, it will reference files that are 
> also referenced by the Snapshot (and potentially the source table). We need 
> to make sure that these restored tables do not also "count" the size of those 
> files as it would make the actual FS utilization incorrect.
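
A minimal sketch of the intended de-duplication, under the assumption that the 
chore can see which store files a snapshot already accounts for; regionSize 
and both parameters are hypothetical names, not the actual chore API.

{code}
import java.util.Map;
import java.util.Set;

public class RegionSizeSketch {
  // Sum a restored region's store-file sizes, skipping files that a snapshot
  // already accounts for, so shared files are not counted twice.
  static long regionSize(Map<String, Long> regionFiles,
                         Set<String> snapshotReferencedFiles) {
    long size = 0;
    for (Map.Entry<String, Long> e : regionFiles.entrySet()) {
      if (!snapshotReferencedFiles.contains(e.getKey())) {
        size += e.getValue();  // count only files unique to the live table
      }
    }
    return size;
  }
}
{code}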



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17749) Create Master Chore to compute the size of each Snapshot against a table with a quota

2017-05-31 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17749:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Rolling up changes into HBASE-17748

> Create Master Chore to compute the size of each Snapshot against a table with 
> a quota
> -
>
> Key: HBASE-17749
> URL: https://issues.apache.org/jira/browse/HBASE-17749
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HBASE-17749.001.HBASE-16961.patch
>
>
> See design doc in the umbrella (HBASE-17748) as to how the size of a snapshot 
> is defined.
> For each table that has a quota (either set directly or "inherited" from its 
> namespace), we need to compute the size of that snapshot.
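
One way to read "set directly or inherited" is the following hypothetical 
resolution rule, sketched with map stand-ins rather than the actual quota API: 
the table's own quota wins, otherwise the namespace quota applies.

{code}
import java.util.Map;
import java.util.Optional;

public class SnapshotQuotaSketch {
  // Resolve the effective space quota for a table: its own setting wins,
  // otherwise fall back to the quota on its namespace (if any).
  static Optional<Long> effectiveQuota(String table, String namespace,
                                       Map<String, Long> tableQuotas,
                                       Map<String, Long> namespaceQuotas) {
    Long direct = tableQuotas.get(table);
    if (direct != null) {
      return Optional.of(direct);
    }
    return Optional.ofNullable(namespaceQuotas.get(namespace));
  }
}
{code}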



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18129) truncate_preserve fails when the truncate method doesn't exists on the master

2017-05-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031743#comment-16031743
 ] 

Hudson commented on HBASE-18129:


FAILURE: Integrated in Jenkins build HBase-HBASE-14614 #257 (See 
[https://builds.apache.org/job/HBase-HBASE-14614/257/])
HBASE-18129 truncate_preserve fails when the truncate method doesn't (tedyu: 
rev dda9ae02959a9c27bb805e83749adf4a2d3d38bd)
* (edit) hbase-shell/src/test/ruby/hbase/admin_test.rb
* (edit) hbase-shell/src/main/ruby/hbase/admin.rb


> truncate_preserve fails when the truncate method doesn't exists on the master
> -
>
> Key: HBASE-18129
> URL: https://issues.apache.org/jira/browse/HBASE-18129
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.0.0, 1.2.5
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-18129-branch-1.patch, 
> HBASE-18129-branch-1-v1.patch.patch, HBASE-18129-branch-1-v2.patch, 
> HBASE-18129-branch-1-v3.patch, HBASE-18129-master.patch, 
> HBASE-18129-master-v1.patch
>
>
> Recently, I ran a rolling upgrade from HBase 0.98.x to HBase 1.2.5. While 
> the master had not yet been upgraded, I truncated a table with the 1.2.5 
> truncate_preserve command, but it failed.
> {code}
> hbase(main):001:0> truncate_preserve 'cf_logs'
> Truncating 'cf_logs' table (it may take a while):
>  - Disabling table...
>  - Truncating table...
>  - Dropping table...
>  - Creating table with region boundaries...
> ERROR: no method 'createTable' for arguments 
> (org.apache.hadoop.hbase.HTableDescriptor,org.jruby.java.proxies.ArrayJavaProxy)
>  on Java::OrgApacheHadoopHbaseClient::HBaseAdmin
> {code}
> After checking the code and commit history, I found that HBASE-12833 causes 
> this bug, so we should fix it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031744#comment-16031744
 ] 

Hudson commented on HBASE-14614:


FAILURE: Integrated in Jenkins build HBase-HBASE-14614 #257 (See 
[https://builds.apache.org/job/HBase-HBASE-14614/257/])
HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi) (stack: 
rev e76e8eb8e1a68cad50d36f2ea358b6dc5fad4843)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredStochasticBalancer.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/MockMasterServices.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/namespace/TestNamespaceAuditor.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestMobExportSnapshot.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestTableFavoredNodes.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionState.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestModifyTableProcedure.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/procedure/SimpleMasterProcedureManager.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java
* (edit) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/util/DelayedUtil.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
* (edit) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureScheduler.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestModifyColumnFamilyProcedure.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/MockRegionServerServices.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestCreateTableProcedure.java
* (edit) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/AbstractProcedureScheduler.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureSchedulerPerformanceEvaluation.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestTableDDLProcedureBase.java
* (edit) 
hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon
* (edit) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/WALProcedureStore.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestBlockEvictionFromClient.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureWalLease.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mob/MobFileCache.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestAssignmentManager.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredStochasticLoadBalancer.java
* (edit) 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestTruncateTableProcedure.java
* (edit) 
hbase-protocol-shaded/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/generated/QuotaProtos.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterOperationsForRegionReplicas.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSecureAsyncWALReplay.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java
* (delete) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/DispatchMergingRegionsProcedure.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestDeleteTableProcedure.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/NoSuchProcedureException.java
* (edit) 

[jira] [Commented] (HBASE-18122) Scanner id should include ServerName of region server

2017-05-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031740#comment-16031740
 ] 

Hudson commented on HBASE-18122:


FAILURE: Integrated in Jenkins build HBase-HBASE-14614 #257 (See 
[https://builds.apache.org/job/HBase-HBASE-14614/257/])
HBASE-18122 Scanner id should include ServerName of region server (yangzhe1991: 
rev 9cf1a08c53b4d3416e10c2b725cbc3abda07590c)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScannerIdGenerator.java
HBASE-18122 addendum for fixing async client scanner (yangzhe1991: rev 
c945d2b2d9dc2b9223819842239997efa10f4c0b)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRpcRetryingCallerFactory.java


> Scanner id should include ServerName of region server
> -
>
> Key: HBASE-18122
> URL: https://issues.apache.org/jira/browse/HBASE-18122
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.1.10, 1.2.6
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
> Attachments: addendum.patch, HBASE-18122.v01.patch, 
> HBASE-18122.v02.patch, HBASE-18122.v03.patch, HBASE-18122.v04.patch
>
>
> Now the scanner id is a long number from 1 to max in a region server. Each 
> new scanner gets its own scanner id.
> If a client holds a scanner whose id is x, and the RS restarts and increments 
> its scanner ids back up to x or a little beyond, there will be a scanner id 
> collision.
> So scanner ids should not repeat across RS restarts. We can add the server 
> start timestamp as the highest several bits of the scanner id uint64.
> And because HBASE-18121 is not easy to fix and there are many clients on old 
> versions, we can also encode the server host:port into the scanner id.
> So we can use ServerName.
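
A minimal sketch of the timestamp-in-high-bits idea described above (an 
illustration, not the committed ScannerIdGenerator; the class name and bit 
layout are assumptions):

{code}
import java.util.concurrent.atomic.AtomicLong;

public class ScannerIdSketch {
  // Stable for the life of the RS process; derived from the start timestamp
  // so ids cannot collide across RS restarts.
  private final long highBits;
  private final AtomicLong counter = new AtomicLong();

  public ScannerIdSketch(long serverStartTimeMillis) {
    this.highBits = (serverStartTimeMillis & 0xFFFFFFFFL) << 32;
  }

  public long newScannerId() {
    // low 32 bits: per-process counter (wraps after 2^32 scanners)
    return highBits | (counter.incrementAndGet() & 0xFFFFFFFFL);
  }
}
{code}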



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17988) get-active-master.rb and draining_servers.rb no longer work

2017-05-31 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-17988:

Fix Version/s: 1.2.7
   1.1.11
   1.3.2
   1.4.0

> get-active-master.rb and draining_servers.rb no longer work
> ---
>
> Key: HBASE-17988
> URL: https://issues.apache.org/jira/browse/HBASE-17988
> Project: HBase
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 2.0.0
>Reporter: Mike Drob
>Assignee: Chinmay Kulkarni
>Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.1.11, 1.2.7
>
>
> The scripts {{bin/get-active-master.rb}} and {{bin/draining_servers.rb}} no 
> longer work on current master branch. Here is an example error message:
> {noformat}
> $ bin/hbase-jruby bin/get-active-master.rb 
> NoMethodError: undefined method `masterAddressZNode' for 
> #
>at bin/get-active-master.rb:35
> {noformat}
> My initial probing suggests that this is likely due to movement that happened 
> in HBASE-16690. Perhaps instead of reworking the ruby, there is similar Java 
> functionality already existing somewhere.
> Putting priority at critical since it's impossible to know whether users rely 
> on the scripts.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17988) get-active-master.rb and draining_servers.rb no longer work

2017-05-31 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031623#comment-16031623
 ] 

Sean Busbey commented on HBASE-17988:
-

I suspect this broke in 0.98.

{code}
Busbey-MBA:hbase busbey$ git grep masterAddressZNode
bin/get-active-master.rb:  master_address = ZKUtil.getData(zk, 
zk.masterAddressZNode)
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
  private String masterAddressZNode;
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
masterAddressZNode = ZKUtil.joinZNode(baseZNode,
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
return this.masterAddressZNode;
Busbey-MBA:hbase busbey$ git grep masterAddressZNode origin/branch-1.2
origin/branch-1.2:bin/get-active-master.rb:  master_address = 
ZKUtil.getData(zk, zk.masterAddressZNode)
origin/branch-1.2:hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
  private String masterAddressZNode;
origin/branch-1.2:hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
masterAddressZNode = ZKUtil.joinZNode(baseZNode,
origin/branch-1.2:hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
return this.masterAddressZNode;
Busbey-MBA:hbase busbey$ git grep masterAddressZNode origin/branch-1.0
origin/branch-1.0:bin/get-active-master.rb:  master_address = 
ZKUtil.getData(zk, zk.masterAddressZNode)
origin/branch-1.0:hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
  private String masterAddressZNode;
origin/branch-1.0:hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
masterAddressZNode = ZKUtil.joinZNode(baseZNode,
origin/branch-1.0:hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
return this.masterAddressZNode;
Busbey-MBA:hbase busbey$ git grep masterAddressZNode origin/0.98
origin/0.98:bin/get-active-master.rb:  master_address = ZKUtil.getData(zk, 
zk.masterAddressZNode)
origin/0.98:hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
  private String masterAddressZNode;
origin/0.98:hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
masterAddressZNode = ZKUtil.joinZNode(baseZNode,
origin/0.98:hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
return this.masterAddressZNode;
Busbey-MBA:hbase busbey$ git grep masterAddressZNode origin/0.94
origin/0.94:bin/get-active-master.rb:  master_address = ZKUtil.getData(zk, 
zk.masterAddressZNode)
origin/0.94:src/main/java/org/apache/hadoop/hbase/MasterAddressTracker.java:
super(watcher, watcher.masterAddressZNode, abortable);
origin/0.94:src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java:
if(path.equals(watcher.masterAddressZNode) && !master.isStopped()) {
origin/0.94:src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java:
if(path.equals(watcher.masterAddressZNode) && !master.isStopped()) {
origin/0.94:src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java:
if(ZKUtil.watchAndCheckExists(watcher, watcher.masterAddressZNode)) {
origin/0.94:src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java:
  this.watcher.masterAddressZNode, this.sn.getVersionedBytes())) {
origin/0.94:src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java:
  ZKUtil.getDataAndWatch(this.watcher, this.watcher.masterAddressZNode);
origin/0.94:src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java:
ZKUtil.deleteNode(this.watcher, this.watcher.masterAddressZNode);
origin/0.94:src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java:
  if (ZKUtil.checkExists(watcher, watcher.masterAddressZNode) >= 0) {
origin/0.94:src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java:
ZKUtil.getDataAndWatch(watcher, watcher.masterAddressZNode);
origin/0.94:src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java:
ZKUtil.deleteNode(watcher, watcher.masterAddressZNode);
origin/0.94:src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java:
  byte[] bytes = ZKUtil.getDataAndWatch(this.watcher, 
this.watcher.masterAddressZNode);
origin/0.94:src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java:
  (node.equals(zkw.masterAddressZNode) == true) ||
origin/0.94:src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java:
  ServerName.parseVersionedServerName(getData(zkw, zkw.masterAddressZNode)));
origin/0.94:src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
  public String masterAddressZNode;
origin/0.94:src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java:
masterAddressZNode = 
{code}

[jira] [Resolved] (HBASE-15903) Delete Object

2017-05-31 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-15903.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HBASE-14850

> Delete Object
> -
>
> Key: HBASE-15903
> URL: https://issues.apache.org/jira/browse/HBASE-15903
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sudeep Sunthankar
>Assignee: Ted Yu
> Fix For: HBASE-14850
>
> Attachments: 15903.v2.txt, 15903.v4.txt, 15903.v7.txt, 
> HBASE-15903.HBASE-14850.v1.patch
>
>
> Patch for creating Delete objects. These Delete objects are used by the Table 
> implementation to delete a row key from a table.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down

2017-05-31 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031615#comment-16031615
 ] 

huaxiang sun commented on HBASE-18005:
--

Thanks [~devaraj] for review. I will update the comments for the test case and 
upload a new patch.

> read replica: handle the case that region server hosting both primary replica 
> and meta region is down
> -
>
> Key: HBASE-18005
> URL: https://issues.apache.org/jira/browse/HBASE-18005
> Project: HBase
>  Issue Type: Bug
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Attachments: HBASE-18005-master-001.patch, 
> HBASE-18005-master-002.patch, HBASE-18005-master-003.patch, 
> HBASE-18005-master-004.patch, HBASE-18005-master-005.patch
>
>
> Identified one corner case in testing: when the region server hosting both 
> the primary replica and the meta region is down and the client tries to 
> reload the primary replica location from the meta table, it is supposed to 
> clean up only the cached location for the specific replicaId, but it clears 
> the caches for all replicas. Please see
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813
> Since it takes some time for regions to be reassigned (including the meta 
> region), the following may throw an exception:
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173
> This exception needs to be caught, and the caller needs to fall back to the 
> cached locations (in this case, the primary replica's location is not 
> available). If there are cached locations for other replicas, it can still 
> go ahead and read stale values from the secondary replicas.
> Even with meta replicas, it still helps not to clean up the caches for all 
> replicas, since the info from the primary meta replica is up-to-date.
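
A minimal sketch of the intended cache behavior (illustrative names, not the 
ConnectionImplementation API):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReplicaLocationCacheSketch {
  // replicaId -> cached location (host:port) for one region
  private final Map<Integer, String> locations = new ConcurrentHashMap<>();

  void cache(int replicaId, String location) { locations.put(replicaId, location); }

  // Desired behavior: evict only the failed replica, leaving the other
  // replicas' locations usable for stale reads.
  void evictReplica(int replicaId) { locations.remove(replicaId); }

  // The reported bug: everything is evicted, so no cached location survives
  // even when the secondary replicas were still reachable.
  void evictAll() { locations.clear(); }
}
{code}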



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-18140) revisit test categorization of hbase shell tests

2017-05-31 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-18140:
---

 Summary: revisit test categorization of hbase shell tests
 Key: HBASE-18140
 URL: https://issues.apache.org/jira/browse/HBASE-18140
 Project: HBase
  Issue Type: Task
  Components: shell, test
Reporter: Sean Busbey
Priority: Minor
 Fix For: 2.0.0, 1.4.0


Right now the hbase shell tests are all categorized as large, so they often 
don't run. Many look like they can easily qualify as Medium or Small.

e.g.
{code}
Running org.apache.hadoop.hbase.client.TestShellNoCluster
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.673 sec - in 
org.apache.hadoop.hbase.client.TestShellNoCluster
Running org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.178 sec - in 
org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups
Running org.apache.hadoop.hbase.client.TestReplicationShell
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.396 sec - in 
org.apache.hadoop.hbase.client.TestReplicationShell
{code}
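
A hedged sketch of what re-categorizing one of these tests could look like, 
using the JUnit category convention HBase already has (which categories each 
test should actually get is exactly the question of this issue):

{code}
import org.apache.hadoop.hbase.testclassification.ClientTests;
import org.apache.hadoop.hbase.testclassification.MediumTests;
import org.junit.experimental.categories.Category;

// Moving a test from LargeTests to MediumTests puts it into the bucket that
// runs under the default surefire profile.
@Category({ ClientTests.class, MediumTests.class })
public class TestShellNoCluster {
  // existing test body unchanged; only the category annotation changes
}
{code}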





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-15903) Delete Object

2017-05-31 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031592#comment-16031592
 ] 

Enis Soztutar commented on HBASE-15903:
---

Thanks Ted for the updates. +1. You may need to rebase after HBASE-15602. You 
can go ahead with the commit if changes are trivial and tests are successful. 

> Delete Object
> -
>
> Key: HBASE-15903
> URL: https://issues.apache.org/jira/browse/HBASE-15903
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sudeep Sunthankar
>Assignee: Ted Yu
> Attachments: 15903.v2.txt, 15903.v4.txt, 15903.v7.txt, 
> HBASE-15903.HBASE-14850.v1.patch
>
>
> Patch for creating Delete objects. These Delete objects are used by the Table 
> implementation to delete rowkey from a table.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18134) Re-think if the FileSystemUtilizationChore is still necessary

2017-05-31 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031591#comment-16031591
 ] 

Josh Elser commented on HBASE-18134:


Re-reading my notes, I found an important detail I missed. The thought process 
is that, after a Region is opened by a RegionServer, all other code paths which 
change that Region's size on HDFS would be covered by HBASE-18133.

So, instead of having a chore that periodically reads and queues size reports 
for Regions, we just queue a size report after the Region is opened. I believe 
this completely eliminates the need for the FileSystemUtilizationChore. 
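
A minimal sketch of that flow (types and names are illustrative, not the 
actual RegionServer API):

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SizeReportOnOpenSketch {
  static final class RegionSizeReport {
    final String regionName;
    final long bytesOnHdfs;
    RegionSizeReport(String regionName, long bytesOnHdfs) {
      this.regionName = regionName;
      this.bytesOnHdfs = bytesOnHdfs;
    }
  }

  private final BlockingQueue<RegionSizeReport> pending = new LinkedBlockingQueue<>();

  // Called once after a region finishes opening: one HDFS size computation,
  // queued for the next report to the master. Later size changes are assumed
  // to be covered by the incremental reporting of HBASE-18133.
  void postOpen(String regionName, long storeFileBytes) {
    pending.offer(new RegionSizeReport(regionName, storeFileBytes));
  }
}
{code}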

> Re-think if the FileSystemUtilizationChore is still necessary
> -
>
> Key: HBASE-18134
> URL: https://issues.apache.org/jira/browse/HBASE-18134
> Project: HBase
>  Issue Type: Task
>Reporter: Josh Elser
>Assignee: Josh Elser
>
> On the heels of HBASE-18133, we need to put some thought into whether or not 
> there are cases in which the RegionServer should still report sizes directly 
> from HDFS.
> The cases I have in mind are primarily in the face of RS failure/restart. 
> Ideally, we could get rid of this chore completely.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-15160) Put back HFile's HDFS op latency sampling code and add metrics for monitoring

2017-05-31 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031580#comment-16031580
 ] 

Enis Soztutar commented on HBASE-15160:
---

bq. Let me check YCSB to make sure no performance regression with it (should be 
since the current patch is quite similar to the one we're running online).
Thanks [~carp84]. Appreciate it. 

> Put back HFile's HDFS op latency sampling code and add metrics for monitoring
> -
>
> Key: HBASE-15160
> URL: https://issues.apache.org/jira/browse/HBASE-15160
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Critical
> Attachments: HBASE-15160.patch, HBASE-15160_v2.patch, 
> HBASE-15160_v3.patch, hbase-15160_v4.patch, hbase-15160_v5.patch, 
> hbase-15160_v6.patch
>
>
> In HBASE-11586 all of the HDFS op latency sampling code, including 
> fsReadLatency, fsPreadLatency and fsWriteLatency, was removed. There was some 
> discussion about putting it back in a new JIRA, but that never happened. 
> In our experience, these metrics are useful for judging whether an issue lies 
> in HDFS when slow requests occur, so we propose to put them back in this 
> JIRA, and to add the metrics for monitoring as well.
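
A minimal sketch of the kind of sampling being restored (the metric plumbing 
is illustrative, not the HBASE-11586 original):

{code}
import java.io.IOException;
import java.util.function.LongConsumer;
import org.apache.hadoop.fs.FSDataInputStream;

final class PreadLatencySketch {
  // Wrap an HDFS positional read with a nanosecond timer; the consumer feeds
  // a latency histogram (e.g. an fsPreadLatency metric).
  static int timedPread(FSDataInputStream in, long pos, byte[] buf,
      LongConsumer fsPreadLatencyHisto) throws IOException {
    long start = System.nanoTime();
    try {
      return in.read(pos, buf, 0, buf.length);
    } finally {
      fsPreadLatencyHisto.accept(System.nanoTime() - start);
    }
  }
}
{code}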



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down

2017-05-31 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031577#comment-16031577
 ] 

Devaraj Das edited comment on HBASE-18005 at 5/31/17 5:43 PM:
--

LGTM [~huaxiang]. Nice work.


was (Author: devaraj):
LGTM [~h...@cloudera.com]. Nice work.

> read replica: handle the case that region server hosting both primary replica 
> and meta region is down
> -
>
> Key: HBASE-18005
> URL: https://issues.apache.org/jira/browse/HBASE-18005
> Project: HBase
>  Issue Type: Bug
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Attachments: HBASE-18005-master-001.patch, 
> HBASE-18005-master-002.patch, HBASE-18005-master-003.patch, 
> HBASE-18005-master-004.patch, HBASE-18005-master-005.patch
>
>
> Identified one corner case in testing: when the region server hosting both 
> the primary replica and the meta region is down and the client tries to 
> reload the primary replica location from the meta table, it is supposed to 
> clean up only the cached location for the specific replicaId, but it clears 
> the caches for all replicas. Please see
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813
> Since it takes some time for regions to be reassigned (including the meta 
> region), the following may throw an exception:
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173
> This exception needs to be caught, and the caller needs to fall back to the 
> cached locations (in this case, the primary replica's location is not 
> available). If there are cached locations for other replicas, it can still 
> go ahead and read stale values from the secondary replicas.
> Even with meta replicas, it still helps not to clean up the caches for all 
> replicas, since the info from the primary meta replica is up-to-date.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down

2017-05-31 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031577#comment-16031577
 ] 

Devaraj Das commented on HBASE-18005:
-

LGTM [~h...@cloudera.com]. Nice work.

> read replica: handle the case that region server hosting both primary replica 
> and meta region is down
> -
>
> Key: HBASE-18005
> URL: https://issues.apache.org/jira/browse/HBASE-18005
> Project: HBase
>  Issue Type: Bug
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Attachments: HBASE-18005-master-001.patch, 
> HBASE-18005-master-002.patch, HBASE-18005-master-003.patch, 
> HBASE-18005-master-004.patch, HBASE-18005-master-005.patch
>
>
> Identified one corner case in testing: when the region server hosting both 
> the primary replica and the meta region is down and the client tries to 
> reload the primary replica location from the meta table, it is supposed to 
> clean up only the cached location for the specific replicaId, but it clears 
> the caches for all replicas. Please see
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813
> Since it takes some time for regions to be reassigned (including the meta 
> region), the following may throw an exception:
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173
> This exception needs to be caught, and the caller needs to fall back to the 
> cached locations (in this case, the primary replica's location is not 
> available). If there are cached locations for other replicas, it can still 
> go ahead and read stale values from the secondary replicas.
> Even with meta replicas, it still helps not to clean up the caches for all 
> replicas, since the info from the primary meta replica is up-to-date.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-15995) Separate replication WAL reading from shipping

2017-05-31 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031566#comment-16031566
 ] 

Vincent Poon commented on HBASE-15995:
--

[~zghaobac] Sure, I will try to backport to branch-1, but it will probably take 
some time.

> Separate replication WAL reading from shipping
> --
>
> Key: HBASE-15995
> URL: https://issues.apache.org/jira/browse/HBASE-15995
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Affects Versions: 2.0.0
>Reporter: Vincent Poon
>Assignee: Vincent Poon
> Fix For: 2.0.0
>
> Attachments: HBASE-15995.master.v1.patch, 
> HBASE-15995.master.v2.patch, HBASE-15995.master.v3.patch, 
> HBASE-15995.master.v4.patch, HBASE-15995.master.v6.patch, 
> HBASE-15995.master.v7.patch, replicationV1_100ms_delay.png, 
> replicationV2_100ms_delay.png
>
>
> Currently ReplicationSource reads edits from the WAL and ships them in the 
> same thread.
> By breaking out the reading from the shipping, we can introduce greater 
> parallelism and lay the foundation for further refactoring to a pipelined, 
> streaming model.
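
A minimal sketch of that split (names are illustrative, not the 
ReplicationSource internals): a reader thread pulls edit batches from the WAL 
and a shipper thread sends them, decoupled by a bounded queue.

{code}
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

class ReaderShipperSketch<E> {
  // bounded queue applies back-pressure to the reader when the shipper lags
  private final BlockingQueue<List<E>> batches = new ArrayBlockingQueue<>(16);

  Runnable reader(Supplier<List<E>> readNextWalBatch) {
    return () -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          batches.put(readNextWalBatch.get());
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
  }

  Runnable shipper(Consumer<List<E>> replicateToPeer) {
    return () -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          replicateToPeer.accept(batches.take());
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
  }
}
{code}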



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18097) Save bandwidth on partial_flag_per_result in ScanResponse proto

2017-05-31 Thread Karan Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta updated HBASE-18097:

Affects Version/s: (was: 1.3.2)
   1.4.0
   2.0.0

> Save bandwidth on partial_flag_per_result in ScanResponse proto
> ---
>
> Key: HBASE-18097
> URL: https://issues.apache.org/jira/browse/HBASE-18097
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Karan Mehta
>
> Currently the {{ScanResponse}} proto sends out 1 bit per {{Result}} that it 
> embeds inside the {{CellScanner}} to indicate whether it is partial or not. 
> {code}
> // In every RPC response there should be at most a single partial result. Furthermore, if
> // there is a partial result, it is guaranteed to be in the last position of the array.
> {code}
> According to the client, only the last result can be partial, so this repeated 
> bool can be converted to a single bool, reducing the overhead of serializing 
> and deserializing the array. This will break wire compatibility, therefore 
> this is something to look at in upcoming versions.
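
A minimal sketch of the client-side invariant that makes the change possible 
(the accessor names follow standard protobuf Java codegen for the repeated 
field; treat them as assumptions):

{code}
import org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos;

final class PartialFlagSketch {
  // Because at most the last Result in a ScanResponse can be partial, a single
  // boolean carries the same information as the whole repeated field.
  static boolean isLastResultPartial(ClientProtos.ScanResponse response) {
    int n = response.getPartialFlagPerResultCount();
    return n > 0 && response.getPartialFlagPerResult(n - 1);
  }
}
{code}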



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18097) Save bandwidth on partial_flag_per_result in ScanResponse proto

2017-05-31 Thread Karan Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta updated HBASE-18097:

Description: 
Currently the {{ScanResponse}} proto sends out 1 bit per {{Result}} that it 
embeds inside the {{CellScanner}} to indicate whether it is partial or not. 
{code}
// In every RPC response there should be at most a single partial result. Furthermore, if
// there is a partial result, it is guaranteed to be in the last position of the array.
{code}
According to the client, only the last result can be partial, so this repeated 
bool can be converted to a single bool, reducing the overhead of serializing and 
deserializing the array. This will break wire compatibility, therefore this is 
something to look at in upcoming versions.

  was:
Starting with version 1.3, HBase automatically closes the scanner on the server 
side whenever the results are exhausted, and the corresponding bits are set in 
the {{ScanResponse}} proto returned to the client. We can use that info to 
eliminate the closeScanRequest RPC call, thereby saving 1 RPC per region per 
scan. This can be particularly useful for tables with many regions.

Also, currently the {{ScanResponse}} proto sends out 1 bit per {{Result}} that 
it embeds inside the {{CellScanner}} to indicate whether it is partial or not. 
{code}
// In every RPC response there should be at most a single partial result. Furthermore, if
// there is a partial result, it is guaranteed to be in the last position of the array.
{code}
According to the client, only the last result can be partial, so this repeated 
bool can be converted to a single bool, reducing the overhead of serializing and 
deserializing the array. This will break wire compatibility, therefore this is 
something to look at in upcoming versions.


> Save bandwidth on partial_flag_per_result in ScanResponse proto
> ---
>
> Key: HBASE-18097
> URL: https://issues.apache.org/jira/browse/HBASE-18097
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.2
>Reporter: Karan Mehta
>
> Currently the {{ScanResponse}} proto sends out 1 bit per {{Result}} that it 
> embeds inside the {{CellScanner}} to indicate whether it is partial or not. 
> {code}
> // In every RPC response there should be at most a single partial result. Furthermore, if
> // there is a partial result, it is guaranteed to be in the last position of the array.
> {code}
> According to the client, only the last result can be partial, so this repeated 
> bool can be converted to a single bool, reducing the overhead of serializing 
> and deserializing the array. This will break wire compatibility, therefore 
> this is something to look at in upcoming versions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18097) Save bandwidth on partial_flag_per_result in ScanResponse proto

2017-05-31 Thread Karan Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta updated HBASE-18097:

Summary: Save bandwidth on partial_flag_per_result in ScanResponse proto  
(was: Client can save 1 RPC call for CloseScannerRequest)

> Save bandwidth on partial_flag_per_result in ScanResponse proto
> ---
>
> Key: HBASE-18097
> URL: https://issues.apache.org/jira/browse/HBASE-18097
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.2
>Reporter: Karan Mehta
>
> Starting with version 1.3, HBase automatically closes the scanner on the 
> server side whenever the results are exhausted, and the corresponding bits 
> are set in the {{ScanResponse}} proto returned to the client. We can use that 
> info to eliminate the closeScanRequest RPC call, thereby saving 1 RPC per 
> region per scan. This can be particularly useful for tables with many regions.
> Also, currently the {{ScanResponse}} proto sends out 1 bit per {{Result}} 
> that it embeds inside the {{CellScanner}} to indicate whether it is partial 
> or not. 
> {code}
> // In every RPC response there should be at most a single partial result. Furthermore, if
> // there is a partial result, it is guaranteed to be in the last position of the array.
> {code}
> According to the client, only the last result can be partial, so this repeated 
> bool can be converted to a single bool, reducing the overhead of serializing 
> and deserializing the array. This will break wire compatibility, therefore 
> this is something to look at in upcoming versions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16196) Update jruby to a newer version.

2017-05-31 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031546#comment-16031546
 ] 

Sean Busbey commented on HBASE-16196:
-

Figured it out: the shell tests are all labeled as LargeTests, and the surefire 
configs don't run LargeTests under the default configs, so {{mvn -PrunAllTests 
package}} got things running.

> Update jruby to a newer version.
> 
>
> Key: HBASE-16196
> URL: https://issues.apache.org/jira/browse/HBASE-16196
> Project: HBase
>  Issue Type: Bug
>  Components: dependencies, shell
>Reporter: Elliott Clark
>Assignee: Mike Drob
>Priority: Critical
> Fix For: 2.0.0, 1.5.0
>
> Attachments: 0001-Update-to-JRuby-9.1.2.0-and-JLine-2.12.patch, 
> hbase-16196.branch-1.patch, hbase-16196.v2.branch-1.patch, 
> hbase-16196.v3.branch-1.patch, hbase-16196.v4.branch-1.patch, 
> HBASE-16196.v5.patch, HBASE-16196.v6.patch, HBASE-16196.v7.patch, 
> HBASE-16196.v8.patch, HBASE-16196.v9.patch
>
>
> Ruby 1.8.7 is no longer maintained.
> The TTY library in the old jruby is bad. The newer one is less bad.
> Since this is only a dependency on the hbase-shell module and not on 
> hbase-client or hbase-server this should be a pretty simple thing that 
> doesn't have any backwards compat issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16196) Update jruby to a newer version.

2017-05-31 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031533#comment-16031533
 ] 

Sean Busbey commented on HBASE-16196:
-

I'll probably phrase the "ruby changed a bunch" note more strongly.

Sorry for the delay; I'm trying to confirm the hbase shell tests and having 
trouble getting them to run.

> Update jruby to a newer version.
> 
>
> Key: HBASE-16196
> URL: https://issues.apache.org/jira/browse/HBASE-16196
> Project: HBase
>  Issue Type: Bug
>  Components: dependencies, shell
>Reporter: Elliott Clark
>Assignee: Mike Drob
>Priority: Critical
> Fix For: 2.0.0, 1.5.0
>
> Attachments: 0001-Update-to-JRuby-9.1.2.0-and-JLine-2.12.patch, 
> hbase-16196.branch-1.patch, hbase-16196.v2.branch-1.patch, 
> hbase-16196.v3.branch-1.patch, hbase-16196.v4.branch-1.patch, 
> HBASE-16196.v5.patch, HBASE-16196.v6.patch, HBASE-16196.v7.patch, 
> HBASE-16196.v8.patch, HBASE-16196.v9.patch
>
>
> Ruby 1.8.7 is no longer maintained.
> The TTY library in the old jruby is bad. The newer one is less bad.
> Since this is only a dependency on the hbase-shell module and not on 
> hbase-client or hbase-server this should be a pretty simple thing that 
> doesn't have any backwards compat issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-16261) MultiHFileOutputFormat Enhancement

2017-05-31 Thread Yi Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liang updated HBASE-16261:
-
Status: Patch Available  (was: Open)

>  MultiHFileOutputFormat Enhancement 
> 
>
> Key: HBASE-16261
> URL: https://issues.apache.org/jira/browse/HBASE-16261
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase, mapreduce
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16261-V1.patch, HBASE-16261-V2.patch, 
> HBASE-16261-V3.patch, HBASE-16261-V4.patch, HBASE-16261-V5.patch, 
> HBase-16261-V6.patch, HBase-16261-V7.patch, HBase-16261-V8.patch, 
> HBase-16261-V9.patch
>
>
> Change MultiHFileOutputFormat to MultiTableHFileOutputFormat, continuing work 
> to enhance MultiTableHFileOutputFormat and make it more usable:
> MultiTableHFileOutputFormat should follow HFileOutputFormat2:
> (1) HFileOutputFormat2 can read one table's region split keys and then output 
> multiple hfiles for one family, with each hfile mapping to one region. We can 
> add a partitioner to MultiTableHFileOutputFormat to make it support this 
> feature.
> (2) HFileOutputFormat2 supports a customized compression algorithm and 
> BloomFilter per column family, and also supports customized DataBlockEncoding 
> for the output hfiles. We can make MultiTableHFileOutputFormat support these 
> features as well.
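
A minimal sketch of point (1) (illustrative, not the patch itself): route each 
mapper output row to the reducer that writes the HFile for the region 
containing it, using the table's region start keys.

{code}
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.hbase.util.Bytes;

class RegionAwarePartitionSketch {
  // region start key -> reducer index for one table
  private final TreeMap<byte[], Integer> startKeyToReducer =
      new TreeMap<>(Bytes.BYTES_COMPARATOR);

  void addRegion(byte[] startKey, int reducerIndex) {
    startKeyToReducer.put(startKey, reducerIndex);
  }

  int partitionFor(byte[] rowKey) {
    Map.Entry<byte[], Integer> e = startKeyToReducer.floorEntry(rowKey);
    return e == null ? 0 : e.getValue(); // rows before the first start key -> reducer 0
  }
}
{code}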



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-16261) MultiHFileOutputFormat Enhancement

2017-05-31 Thread Yi Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liang updated HBASE-16261:
-
Attachment: HBase-16261-V9.patch

>  MultiHFileOutputFormat Enhancement 
> 
>
> Key: HBASE-16261
> URL: https://issues.apache.org/jira/browse/HBASE-16261
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase, mapreduce
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16261-V1.patch, HBASE-16261-V2.patch, 
> HBASE-16261-V3.patch, HBASE-16261-V4.patch, HBASE-16261-V5.patch, 
> HBase-16261-V6.patch, HBase-16261-V7.patch, HBase-16261-V8.patch, 
> HBase-16261-V9.patch
>
>
> Change MultiHFileOutputFormat to MultiTableHFileOutputFormat, Continuing work 
> to enhance the MultiTableHFileOutputFormat to make it more usable:
> MultiTableHFileOutputFormat follow HFileOutputFormat2
> (1) HFileOutputFormat2 can read one table's region split keys. and then 
> output multiple hfiles for one family, and each hfile map to one region. We 
> can add partitioner in MultiTableHFileOutputFormat to make it support this 
> feature.
> (2) HFileOutputFormat2 support Customized Compression algorithm for column 
> family and BloomFilter, also support customized DataBlockEncoding for the 
> output hfiles. We can also make MultiTableHFileOutputFormat to support these 
> features.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-16261) MultiHFileOutputFormat Enhancement

2017-05-31 Thread Yi Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liang updated HBASE-16261:
-
Status: Open  (was: Patch Available)

>  MultiHFileOutputFormat Enhancement 
> 
>
> Key: HBASE-16261
> URL: https://issues.apache.org/jira/browse/HBASE-16261
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase, mapreduce
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16261-V1.patch, HBASE-16261-V2.patch, 
> HBASE-16261-V3.patch, HBASE-16261-V4.patch, HBASE-16261-V5.patch, 
> HBase-16261-V6.patch, HBase-16261-V7.patch, HBase-16261-V8.patch
>
>
> Change MultiHFileOutputFormat to MultiTableHFileOutputFormat, Continuing work 
> to enhance the MultiTableHFileOutputFormat to make it more usable:
> MultiTableHFileOutputFormat follow HFileOutputFormat2
> (1) HFileOutputFormat2 can read one table's region split keys. and then 
> output multiple hfiles for one family, and each hfile map to one region. We 
> can add partitioner in MultiTableHFileOutputFormat to make it support this 
> feature.
> (2) HFileOutputFormat2 support Customized Compression algorithm for column 
> family and BloomFilter, also support customized DataBlockEncoding for the 
> output hfiles. We can also make MultiTableHFileOutputFormat to support these 
> features.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-16261) MultiHFileOutputFormat Enhancement

2017-05-31 Thread Yi Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liang updated HBASE-16261:
-
Attachment: (was: HBase-16261-V9.patch)

>  MultiHFileOutputFormat Enhancement 
> 
>
> Key: HBASE-16261
> URL: https://issues.apache.org/jira/browse/HBASE-16261
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase, mapreduce
>Affects Versions: 2.0.0
>Reporter: Yi Liang
>Assignee: Yi Liang
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16261-V1.patch, HBASE-16261-V2.patch, 
> HBASE-16261-V3.patch, HBASE-16261-V4.patch, HBASE-16261-V5.patch, 
> HBase-16261-V6.patch, HBase-16261-V7.patch, HBase-16261-V8.patch
>
>
> Change MultiHFileOutputFormat to MultiTableHFileOutputFormat, Continuing work 
> to enhance the MultiTableHFileOutputFormat to make it more usable:
> MultiTableHFileOutputFormat follow HFileOutputFormat2
> (1) HFileOutputFormat2 can read one table's region split keys. and then 
> output multiple hfiles for one family, and each hfile map to one region. We 
> can add partitioner in MultiTableHFileOutputFormat to make it support this 
> feature.
> (2) HFileOutputFormat2 support Customized Compression algorithm for column 
> family and BloomFilter, also support customized DataBlockEncoding for the 
> output hfiles. We can also make MultiTableHFileOutputFormat to support these 
> features.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17959) Canary timeout should be configurable on a per-table basis

2017-05-31 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031503#comment-16031503
 ] 

Andrew Purtell commented on HBASE-17959:


[~ckulkarni] Interested in providing a patch for application to branch-1 as 
well?

> Canary timeout should be configurable on a per-table basis
> --
>
> Key: HBASE-17959
> URL: https://issues.apache.org/jira/browse/HBASE-17959
> Project: HBase
>  Issue Type: Improvement
>  Components: canary
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17959.002.patch, HBASE-17959.003.patch, 
> HBASE-17959.patch
>
>
> The Canary read and write timeouts should be configurable on a per-table 
> basis, for cases where different tables have different latency SLAs. 
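
A hedged sketch of how a per-table override with a global fallback could be 
resolved (both property names below are hypothetical placeholders, not the 
keys from the patch):

{code}
import org.apache.hadoop.conf.Configuration;

final class CanaryTimeoutSketch {
  static long readTimeoutFor(Configuration conf, String table) {
    long dflt = conf.getLong("hbase.canary.timeout", 600000L);        // hypothetical key
    return conf.getLong("hbase.canary.read.timeout." + table, dflt);  // hypothetical key
  }
}
{code}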



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17959) Canary timeout should be configurable on a per-table basis

2017-05-31 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031501#comment-16031501
 ] 

Andrew Purtell commented on HBASE-17959:


Looks good, committing

> Canary timeout should be configurable on a per-table basis
> --
>
> Key: HBASE-17959
> URL: https://issues.apache.org/jira/browse/HBASE-17959
> Project: HBase
>  Issue Type: Improvement
>  Components: canary
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17959.002.patch, HBASE-17959.003.patch, 
> HBASE-17959.patch
>
>
> The Canary read and write timeouts should be configurable on a per-table 
> basis, for cases where different tables have different latency SLAs. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18138) HBase named read caches

2017-05-31 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031496#comment-16031496
 ] 

Vladimir Rodionov commented on HBASE-18138:
---

Good idea, especially to support multiple tenants.

> HBase named read caches
> ---
>
> Key: HBASE-18138
> URL: https://issues.apache.org/jira/browse/HBASE-18138
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache, BucketCache
>Reporter: Biju Nair
>
> Instead of a single read (block) cache, if HBase supported the creation of 
> named read caches and their use by tables, it would help common scenarios 
> like:
> - Assigning a chunk of the cache to tables whose data is critical to 
> performance, so that it doesn't get evicted by reads of other, less critical 
> table data
> - Guaranteeing a percentage of the cache to each tenant in a multi-tenant 
> environment by assigning a named cache to each tenant
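
A minimal sketch of the idea (purely illustrative; no such API exists in HBase 
today):

{code}
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NamedCacheRegistrySketch {
  // cache name -> independently sized cache (block key -> block bytes)
  private final Map<String, Map<String, byte[]>> caches = new ConcurrentHashMap<>();

  // Create a named LRU cache capped at maxBlocks entries; a table (or tenant)
  // would then be configured to read through its assigned cache.
  void createCache(String name, int maxBlocks) {
    caches.put(name, Collections.synchronizedMap(
        new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
            return size() > maxBlocks;
          }
        }));
  }

  Map<String, byte[]> cacheFor(String name) {
    return caches.get(name);
  }
}
{code}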



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18023) Log multi-* requests for more than threshold number of rows

2017-05-31 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031483#comment-16031483
 ] 

Josh Elser commented on HBASE-18023:


bq. Are you actively working on this?

"no" is probably the most accurate answer :). It's on my back-burner of things 
to get to.

> Log multi-* requests for more than threshold number of rows
> ---
>
> Key: HBASE-18023
> URL: https://issues.apache.org/jira/browse/HBASE-18023
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Clay B.
>Assignee: Josh Elser
>Priority: Minor
>
> Today, if a user happens to do something like a large multi-put, they can get 
> through request throttling (e.g. it is one request) but still crash a region 
> server with a garbage storm. We have seen regionservers hit this issue and it 
> is silent and deadly. The RS will report nothing more than a mysterious 
> garbage collection and exit out.
> Ideally, we could report a large multi-* request before starting it, in case 
> it happens to be deadly. Knowing the client, user and how many rows are 
> affected would be a good start to tracking down painful users.
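
A hedged sketch of the kind of guard being proposed (the threshold key and 
names are hypothetical):

{code}
import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;

final class LargeMultiLoggingSketch {
  // Log identifying info before servicing an oversized multi request.
  static void maybeWarnLargeMulti(Logger log, Configuration conf, int rowCount,
      String user, String clientAddress) {
    int threshold = conf.getInt("hbase.rpc.multi.warn.rows", 10000); // hypothetical key
    if (rowCount > threshold) {
      log.warn("Large multi request: rows={} user={} client={}",
          rowCount, user, clientAddress);
    }
  }
}
{code}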



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14614:
--
Attachment: HBASE-14614.master.050.patch

> Procedure v2: Core Assignment Manager
> -
>
> Key: HBASE-14614
> URL: https://issues.apache.org/jira/browse/HBASE-14614
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-14614.master.003.patch, 
> HBASE-14614.master.004.patch, HBASE-14614.master.005.patch, 
> HBASE-14614.master.006.patch, HBASE-14614.master.007.patch, 
> HBASE-14614.master.008.patch, HBASE-14614.master.009.patch, 
> HBASE-14614.master.010.patch, HBASE-14614.master.012.patch, 
> HBASE-14614.master.013.patch, HBASE-14614.master.014.patch, 
> HBASE-14614.master.015.patch, HBASE-14614.master.017.patch, 
> HBASE-14614.master.018.patch, HBASE-14614.master.019.patch, 
> HBASE-14614.master.020.patch, HBASE-14614.master.022.patch, 
> HBASE-14614.master.023.patch, HBASE-14614.master.024.patch, 
> HBASE-14614.master.025.patch, HBASE-14614.master.026.patch, 
> HBASE-14614.master.027.patch, HBASE-14614.master.028.patch, 
> HBASE-14614.master.029.patch, HBASE-14614.master.030.patch, 
> HBASE-14614.master.033.patch, HBASE-14614.master.038.patch, 
> HBASE-14614.master.039.patch, HBASE-14614.master.040.patch, 
> HBASE-14614.master.041.patch, HBASE-14614.master.042.patch, 
> HBASE-14614.master.043.patch, HBASE-14614.master.044.patch, 
> HBASE-14614.master.045.patch, HBASE-14614.master.045.patch, 
> HBASE-14614.master.046.patch, HBASE-14614.master.047.patch, 
> HBASE-14614.master.048.patch, HBASE-14614.master.049.patch, 
> HBASE-14614.master.050.patch
>
>
> New AssignmentManager implemented using proc-v2.
>  - AssignProcedure handles the assign operation
>  - UnassignProcedure handles the unassign operation
>  - MoveRegionProcedure handles the move/balance operation
> Concurrent Assign operations are batched together and sent to the balancer.
> Concurrent Assign and Unassign operations ready to be sent to the RS are 
> batched together.
> This patch is an intermediate state where we add the new AM as 
> AssignmentManager2() to the master, to be reached by tests, but the new AM 
> will not be integrated with the rest of the system. Only new AM unit tests 
> will exercise the new assignment manager. The integration with the master 
> code is part of HBASE-14616.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-14614) Procedure v2: Core Assignment Manager

2017-05-31 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031461#comment-16031461
 ] 

stack commented on HBASE-14614:
---

Addressed review feedback up in RB.

Addressed test failures.

> Procedure v2: Core Assignment Manager
> -
>
> Key: HBASE-14614
> URL: https://issues.apache.org/jira/browse/HBASE-14614
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Stephen Yuan Jiang
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-14614.master.003.patch, 
> HBASE-14614.master.004.patch, HBASE-14614.master.005.patch, 
> HBASE-14614.master.006.patch, HBASE-14614.master.007.patch, 
> HBASE-14614.master.008.patch, HBASE-14614.master.009.patch, 
> HBASE-14614.master.010.patch, HBASE-14614.master.012.patch, 
> HBASE-14614.master.013.patch, HBASE-14614.master.014.patch, 
> HBASE-14614.master.015.patch, HBASE-14614.master.017.patch, 
> HBASE-14614.master.018.patch, HBASE-14614.master.019.patch, 
> HBASE-14614.master.020.patch, HBASE-14614.master.022.patch, 
> HBASE-14614.master.023.patch, HBASE-14614.master.024.patch, 
> HBASE-14614.master.025.patch, HBASE-14614.master.026.patch, 
> HBASE-14614.master.027.patch, HBASE-14614.master.028.patch, 
> HBASE-14614.master.029.patch, HBASE-14614.master.030.patch, 
> HBASE-14614.master.033.patch, HBASE-14614.master.038.patch, 
> HBASE-14614.master.039.patch, HBASE-14614.master.040.patch, 
> HBASE-14614.master.041.patch, HBASE-14614.master.042.patch, 
> HBASE-14614.master.043.patch, HBASE-14614.master.044.patch, 
> HBASE-14614.master.045.patch, HBASE-14614.master.045.patch, 
> HBASE-14614.master.046.patch, HBASE-14614.master.047.patch, 
> HBASE-14614.master.048.patch, HBASE-14614.master.049.patch
>
>
> New AssignmentManager implemented using proc-v2.
>  - AssignProcedure handles the assign operation
>  - UnassignProcedure handles the unassign operation
>  - MoveRegionProcedure handles the move/balance operation
> Concurrent Assign operations are batched together and sent to the balancer.
> Concurrent Assign and Unassign operations ready to be sent to the RS are 
> batched together.
> This patch is an intermediate state where we add the new AM as 
> AssignmentManager2() to the master, to be reached by tests, but the new AM 
> will not be integrated with the rest of the system. Only new AM unit tests 
> will exercise the new assignment manager. The integration with the master 
> code is part of HBASE-14616.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18023) Log multi-* requests for more than threshold number of rows

2017-05-31 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031443#comment-16031443
 ] 

Andrew Purtell commented on HBASE-18023:


[~elserj] Are you actively working on this?

> Log multi-* requests for more than threshold number of rows
> ---
>
> Key: HBASE-18023
> URL: https://issues.apache.org/jira/browse/HBASE-18023
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Clay B.
>Assignee: Josh Elser
>Priority: Minor
>
> Today, if a user happens to do something like a large multi-put, they can get 
> through request throttling (e.g. it is one request) but still crash a region 
> server with a garbage storm. We have seen regionservers hit this issue and it 
> is silent and deadly. The RS will report nothing more than a mysterious 
> garbage collection and exit out.
> Ideally, we could report a large multi-* request before starting it, in case 
> it happens to be deadly. Knowing the client, user and how many rows are 
> affected would be a good start to tracking down painful users.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18139) maven-remote-resources-plugin fails with IndexOutOfBoundsException in hbase-assembly

2017-05-31 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031396#comment-16031396
 ] 

Sean Busbey commented on HBASE-18139:
-

If you look at the referenced file, LICENSE.vm line 1678 shows:

{code}
## fail the template. If you're looking at the source LICENSE.vm
## file based on a stacktrace or exception message, you need to find
## the generated LICENSE file that has the actual dependency info printed.
#set($empty = [])
${empty[0]}
{code}

What does the file in {{hbase-assembly/target/}} show at the end? You ought to 
be able to find it with {{find hbase-assembly/target -name 'LICENSE*'}}

> maven-remote-resources-plugin fails with IndexOutOfBoundsException in 
> hbase-assembly
> 
>
> Key: HBASE-18139
> URL: https://issues.apache.org/jira/browse/HBASE-18139
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.3.2
>Reporter: Xiang Li
>Priority: Blocker
>
> The same as HBASE-14199.
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process 
> (aggregate-licenses) on project hbase-assembly: Error rendering velocity 
> resource.: Error invoking method 'get(java.lang.Integer)' in 
> java.util.ArrayList at META-INF/LICENSE.vm[line 1678, column 8]: 
> InvocationTargetException: Index: 0, Size: 0 -> [Help 1]
> {code}
> Fail to run mvn install against the latest branch-1 and branch-1.3, with no 
> additional change.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031382#comment-16031382
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


[~tedyu] I added logic to reset values in the HBase config before each test is 
run. One thing I noticed is that some tests would set values in the HBase 
config that would carry over to other tests without being reset.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to be perfectly balanced 
> across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0 and 1 by dividing by the number 
> of moves required in the absolute worst case (i.e. the entire table stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum values across all tables. The weights in 
> this average are configurable, to allow certain users to more strongly 
> penalize situations where one table is badly skewed versus where every table 
> is a little bit skewed. To spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply moves regions until each 
> server has the right number of regions, then it swaps regions around such 
> that each swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters, with 
> 100s of TBs of data and 100s of tables across dozens of servers, and found 
> both to be very performant and accurate.
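
A minimal sketch of the scoring described above (array layout and weight 
handling are assumptions drawn from the description, not the patch itself):

{code}
final class TableSkewCostSketch {
  // movesRequired[i]: minimal region moves to round-robin table i across the cluster
  // worstCase[i]:     moves needed if the entire table sat on one server
  static double cost(double[] movesRequired, double[] worstCase,
      double avgWeight, double maxWeight) {
    double sum = 0, max = 0;
    for (int i = 0; i < movesRequired.length; i++) {
      double normalized = worstCase[i] == 0 ? 0 : movesRequired[i] / worstCase[i]; // 0..1
      sum += normalized;
      max = Math.max(max, normalized);
    }
    double avg = movesRequired.length == 0 ? 0 : sum / movesRequired.length;
    double weighted = (avgWeight * avg + maxWeight * max) / (avgWeight + maxWeight);
    return Math.sqrt(weighted); // sqrt spreads scores more evenly over 0..1
  }
}
{code}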



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Status: Patch Available  (was: Open)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to be perfectly balanced 
> across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0 and 1 by dividing by the number 
> of moves required in the absolute worst case (i.e. the entire table stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum values across all tables. The weights in 
> this average are configurable, to allow certain users to more strongly 
> penalize situations where one table is badly skewed versus where every table 
> is a little bit skewed. To spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply moves regions until each 
> server has the right number of regions, then it swaps regions around such 
> that each swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters, with 
> 100s of TBs of data and 100s of tables across dozens of servers, and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

