[GitHub] [hbase] a516072575 opened a new pull request #367: region_mover.rb should choose same rsgroup servers as target servers

2019-07-08 Thread GitBox
a516072575 opened a new pull request #367: region_mover.rb should choose same 
rsgroup servers as target servers
URL: https://github.com/apache/hbase/pull/367
 
 
   There are many retries when I am using graceful_stop.sh to shut down a 
region server after using rsgroups, because the target server may be in a 
different rsgroup. This makes it slow to gracefully shut down a regionserver. 
So I think that region_mover.rb should only choose servers in the same rsgroup 
as target servers.
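The proposed selection rule can be sketched as plain filtering logic. This is an illustrative sketch only: the server-to-rsgroup map and all names below are hypothetical stand-ins for a lookup through the RSGroup admin API, not the actual region_mover.rb code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RsGroupFilter {

    /**
     * Keep only candidate target servers that belong to the same rsgroup as
     * the server being drained. serverToGroup is a hypothetical stand-in for
     * a group-membership lookup (e.g. via the RSGroup admin API).
     */
    static List<String> sameGroupTargets(String draining,
                                         List<String> candidates,
                                         Map<String, String> serverToGroup) {
        String group = serverToGroup.get(draining);
        return candidates.stream()
                .filter(s -> !s.equals(draining))                          // never move to self
                .filter(s -> group != null
                        && group.equals(serverToGroup.get(s)))             // same rsgroup only
                .collect(Collectors.toList());
    }
}
```

With such a filter in place, graceful_stop.sh would never pick a target outside the drained server's rsgroup, avoiding the move-then-rebalance retries described above.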


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HBASE-22662) Move RSGroupInfoManager to hbase-server

2019-07-08 Thread kevin su (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880952#comment-16880952
 ] 

kevin su commented on HBASE-22662:
--

I also moved RSGroupProtobufUtil to hbase-server, because RSGroupInfoManager 
also uses it.

> Move RSGroupInfoManager to hbase-server
> ---
>
> Key: HBASE-22662
> URL: https://issues.apache.org/jira/browse/HBASE-22662
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: kevin su
>Priority: Major
> Attachments: HBASE-22662.v0.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [hbase] a516072575 commented on issue #366: HBASE-22658 region_mover.rb should choose same rsgroup servers as target servers

2019-07-08 Thread GitBox
a516072575 commented on issue #366: HBASE-22658 region_mover.rb should choose 
same rsgroup servers as target servers
URL: https://github.com/apache/hbase/pull/366#issuecomment-509499859
 
 
   I chose the wrong branch.




[jira] [Assigned] (HBASE-22662) Move RSGroupInfoManager to hbase-server

2019-07-08 Thread kevin su (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kevin su reassigned HBASE-22662:


Assignee: kevin su

> Move RSGroupInfoManager to hbase-server
> ---
>
> Key: HBASE-22662
> URL: https://issues.apache.org/jira/browse/HBASE-22662
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: kevin su
>Priority: Major
> Attachments: HBASE-22662.v0.patch
>
>






[jira] [Updated] (HBASE-22662) Move RSGroupInfoManager to hbase-server

2019-07-08 Thread kevin su (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kevin su updated HBASE-22662:
-
Attachment: HBASE-22662.v0.patch

> Move RSGroupInfoManager to hbase-server
> ---
>
> Key: HBASE-22662
> URL: https://issues.apache.org/jira/browse/HBASE-22662
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HBASE-22662.v0.patch
>
>






[GitHub] [hbase] a516072575 closed pull request #366: HBASE-22658 region_mover.rb should choose same rsgroup servers as target servers

2019-07-08 Thread GitBox
a516072575 closed pull request #366: HBASE-22658 region_mover.rb should choose 
same rsgroup servers as target servers
URL: https://github.com/apache/hbase/pull/366
 
 
   




[GitHub] [hbase] a516072575 opened a new pull request #366: HBASE-22658 region_mover.rb should choose same rsgroup servers as target servers

2019-07-08 Thread GitBox
a516072575 opened a new pull request #366: HBASE-22658 region_mover.rb should 
choose same rsgroup servers as target servers
URL: https://github.com/apache/hbase/pull/366
 
 
   There are many retries when I am using graceful_stop.sh to shut down a 
region server after using rsgroups, because the target server may be in a 
different rsgroup. This makes it slow to gracefully shut down a regionserver. 
So I think that region_mover.rb should only choose servers in the same rsgroup 
as target servers.




[jira] [Commented] (HBASE-22667) [Flush] NPE when region flushs

2019-07-08 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880931#comment-16880931
 ] 

Reid Chan commented on HBASE-22667:
---

It's weird, since the null check for memStoreScanners is done at line 855.
{code}
 852   @Override
 853   public void updateReaders(List<StoreFile> sfs, List<KeyValueScanner> memStoreScanners) throws IOException {
 854     if (CollectionUtils.isEmpty(sfs)
 855         && CollectionUtils.isEmpty(memStoreScanners)) {
 856       return;
 857     }
 858     flushLock.lock();
 859     try {
 860       if (this.closing) {
 861         // Lets close scanners created by caller, since close() won't notice this.
 862         // memStoreScanners is immutable, so lets create a new list.
 863         clearAndClose(new ArrayList<>(memStoreScanners));
 864         return;
 865       }
{code}
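One way line 863 can still NPE despite that check (a sketch of a possible path, not a confirmed diagnosis): the guard joins the two isEmpty checks with &&, so it only returns early when *both* lists are empty. If sfs is non-empty while memStoreScanners is null, control reaches line 863 and new ArrayList<>(null) throws. The stand-alone stand-in below reproduces that with plain collections; all names here are illustrative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class GuardSketch {

    // Stand-in for CollectionUtils.isEmpty: true for null or empty collections.
    static boolean isEmpty(Collection<?> c) {
        return c == null || c.isEmpty();
    }

    // Mirrors the early-return guard at lines 854-855: because the two checks
    // are joined with &&, it short-circuits only when BOTH lists are empty.
    static boolean reachesLine863(List<?> sfs, List<?> memStoreScanners) {
        return !(isEmpty(sfs) && isEmpty(memStoreScanners));
    }

    public static void main(String[] args) {
        List<String> sfs = List.of("storefile-1"); // non-empty flushed files
        List<String> memStoreScanners = null;      // hypothetical null from the caller

        if (reachesLine863(sfs, memStoreScanners)) {
            try {
                // Same construction as line 863: copying a null collection throws.
                new ArrayList<>(memStoreScanners);
            } catch (NullPointerException expected) {
                System.out.println("NPE despite the null check at line 855");
            }
        }
    }
}
```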

> [Flush] NPE when region flushs
> --
>
> Key: HBASE-22667
> URL: https://issues.apache.org/jira/browse/HBASE-22667
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Reid Chan
>Priority: Critical
>
> {code}
> 2019-07-09 08:02:14,262 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> hostname,16020,1562233574704: Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> namespace:table,963,1562296120996.b8e2f19748d374d192b93f106a0f73b3.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2646)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2322)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2284)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2170)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2095)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:508)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:478)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at java.util.ArrayList.<init>(ArrayList.java:177)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:863)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1172)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.access$900(HStore.java:122)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2505)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2600)
> ... 9 more
> {code}





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2019-07-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880922#comment-16880922
 ] 

Hudson commented on HBASE-20952:


Results for branch HBASE-20952
[build #97 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/97/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/97//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/97//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/97//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Commented] (HBASE-22667) [Flush] NPE when region flushs

2019-07-08 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880919#comment-16880919
 ] 

Reid Chan commented on HBASE-22667:
---

I haven't dug out the root cause yet, just filing it first.

> [Flush] NPE when region flushs
> --
>
> Key: HBASE-22667
> URL: https://issues.apache.org/jira/browse/HBASE-22667
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Reid Chan
>Priority: Critical
>
> {code}
> 2019-07-09 08:02:14,262 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> hostname,16020,1562233574704: Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> namespace:table,963,1562296120996.b8e2f19748d374d192b93f106a0f73b3.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2646)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2322)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2284)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2170)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2095)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:508)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:478)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at java.util.ArrayList.<init>(ArrayList.java:177)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:863)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1172)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.access$900(HStore.java:122)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2505)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2600)
> ... 9 more
> {code}





[jira] [Updated] (HBASE-22667) [Flush] NPE when region flushs

2019-07-08 Thread Reid Chan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan updated HBASE-22667:
--
Description: 
{code}
2019-07-09 08:02:14,262 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
hostname,16020,1562233574704: Replay of WAL required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: 
namespace:table,963,1562296120996.b8e2f19748d374d192b93f106a0f73b3.
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2646)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2322)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2284)
at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2170)
at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2095)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:508)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:478)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at java.util.ArrayList.<init>(ArrayList.java:177)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:863)
at 
org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1172)
at 
org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1145)
at 
org.apache.hadoop.hbase.regionserver.HStore.access$900(HStore.java:122)
at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2505)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2600)
... 9 more
{code}

  was:
{code}
2019-07-09 08:02:14,262 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
hadoop290.bx.momo.com,16020,1562233574704: Replay of WAL required. Forcing 
server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: 
online:al_user_session_mapping,963,1562296120996.b8e2f19748d374d192b93f106a0f73b3.
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2646)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2322)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2284)
at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2170)
at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2095)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:508)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:478)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at java.util.ArrayList.<init>(ArrayList.java:177)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:863)
at 
org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1172)
at 
org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1145)
at 
org.apache.hadoop.hbase.regionserver.HStore.access$900(HStore.java:122)
at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2505)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2600)
... 9 more
{code}


> [Flush] NPE when region flushs
> --
>
> Key: HBASE-22667
> URL: https://issues.apache.org/jira/browse/HBASE-22667
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Reid Chan
>Priority: Critical
>
> {code}
> 2019-07-09 08:02:14,262 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> hostname,16020,1562233574704: Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> namespace:table,963,1562296120996.b8e2f19748d374d192b93f106a0f73b3.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2646)
> at 
> 

[jira] [Updated] (HBASE-22667) [Flush] NPE when region flushs

2019-07-08 Thread Reid Chan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan updated HBASE-22667:
--
Affects Version/s: 1.4.6

> [Flush] NPE when region flushs
> --
>
> Key: HBASE-22667
> URL: https://issues.apache.org/jira/browse/HBASE-22667
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.6
>Reporter: Reid Chan
>Priority: Critical
>
> {code}
> 2019-07-09 08:02:14,262 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> hadoop290.bx.momo.com,16020,1562233574704: Replay of WAL required. Forcing 
> server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> online:al_user_session_mapping,963,1562296120996.b8e2f19748d374d192b93f106a0f73b3.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2646)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2322)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2284)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2170)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2095)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:508)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:478)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at java.util.ArrayList.<init>(ArrayList.java:177)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:863)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1172)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1145)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.access$900(HStore.java:122)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2505)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2600)
> ... 9 more
> {code}





[jira] [Created] (HBASE-22667) [Flush] NPE when region flushs

2019-07-08 Thread Reid Chan (JIRA)
Reid Chan created HBASE-22667:
-

 Summary: [Flush] NPE when region flushs
 Key: HBASE-22667
 URL: https://issues.apache.org/jira/browse/HBASE-22667
 Project: HBase
  Issue Type: Bug
Reporter: Reid Chan


{code}
2019-07-09 08:02:14,262 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
hadoop290.bx.momo.com,16020,1562233574704: Replay of WAL required. Forcing 
server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: 
online:al_user_session_mapping,963,1562296120996.b8e2f19748d374d192b93f106a0f73b3.
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2646)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2322)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2284)
at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2170)
at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2095)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:508)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:478)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at java.util.ArrayList.<init>(ArrayList.java:177)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:863)
at 
org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1172)
at 
org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1145)
at 
org.apache.hadoop.hbase.regionserver.HStore.access$900(HStore.java:122)
at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2505)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2600)
... 9 more
{code}





[GitHub] [hbase] openinx opened a new pull request #365: HBASE-22663 The HeapAllocationRatio in WebUI is not accurate because almost all of the heap allocation will happen in another separated allocat

2019-07-08 Thread GitBox
openinx opened a new pull request #365: HBASE-22663 The HeapAllocationRatio in 
WebUI is not accurate because almost all of the heap allocation will happen in 
another separated allocator named HEAP
URL: https://github.com/apache/hbase/pull/365
 
 
   




[jira] [Commented] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880897#comment-16880897
 ] 

Duo Zhang commented on HBASE-22665:
---

{quote}
Duo Zhang, would you think we should clean out AsyncFSWAL.unackedAppends when 
handling sync failed?
{quote}

We haven't done this? IIRC we will move the entries from unackedAppends to 
toWriteAppends in syncFailed...
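The recovery path described above can be illustrated with a toy sketch. Assumption-laden: the two deque names mirror AsyncFSWAL's unackedAppends/toWriteAppends fields, but everything else here is simplified stand-in logic, not the real WAL code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SyncFailedSketch {

    // Simplified stand-ins for AsyncFSWAL's internal queues (entries as txids).
    final Deque<Long> toWriteAppends = new ArrayDeque<>(); // waiting to be written
    final Deque<Long> unackedAppends = new ArrayDeque<>(); // written, not yet acked by sync

    // Sketch of the move described for syncFailed: push the unacked entries back
    // to the FRONT of toWriteAppends (preserving order) so they are retried with
    // a new writer instead of being dropped.
    void onSyncFailed() {
        while (!unackedAppends.isEmpty()) {
            toWriteAppends.addFirst(unackedAppends.pollLast());
        }
    }
}
```

Draining from the tail while prepending keeps the retried entries ahead of newer appends, so WAL ordering is preserved across the failed sync.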

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, 
> image-2019-07-08-16-08-26-777.png, image-2019-07-08-16-14-43-455.png, 
> jstack_20190625, jstack_20190704_1, jstack_20190704_2, rs.log.part1
>
>
> We use HBase 2.1.2. When the RS is under heavy QPS, it aborts with an error like 
> "Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to 
> get sync result after 30 ms for txid=36380334, WAL system stuck?"
>  
> The RegionServer abort then fails because AbstractFSWAL.shutdown hangs.
>  
> jstack info always shows the RegionServer hanging in "AbstractFSWAL.shutdown":
> "regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
> tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7f18a49b2bb8> (a 
> java.util.concurrent.locks.ReentrantLock$FairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
>  {color:#FF}at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285){color}
> {color:#FF} at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815){color}
>  at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
>  at 
> org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
>  at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
>  {color:#FF}at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117){color}
> {color:#FF} at java.lang.Thread.run(Thread.java:745){color}
>  
>  
>  
>  





[jira] [Comment Edited] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Yechao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880403#comment-16880403
 ] 

Yechao Chen edited comment on HBASE-22665 at 7/9/19 3:26 AM:
-

Thanks for the reply [~Apache9] [~wchevreuil]

 
{quote}So the RS process hangs forever and never completes shutdown?
{quote}

Yes, the RS process hangs forever and never finishes shutdown before we kill it.

bq. In this case, shouldn't we had handled that on AsyncFSWAL.syncFailed? I 
guess that would allow waitForSafePoint to finish, and, consequently, 
rollWriter. which would release rollWriterLock for shutdown.
bq. 
bq. Yechao Chen, would you have the full RS log covering the period when this 
was observed?

The log has been uploaded; see around the time "16:11:15.074" (the date part of 
the log timestamps was lost due to a bad log config, just ignore that). 
The log prints the error "HRegionServer - * ABORTING region server... ". 
Before that, there are many WAL "Slow sync" or "sync failed" messages and "Large 
batch operation detected" warnings, GC pauses are long (about 10 seconds, many 
times), and there is a data hotspot in a region on this RS.


was (Author: chenyechao):
Thanks for the reply [~Apache9] [~wchevreuil]

 
{quote}So the RS process hangs forever and never completes shutdown?
{quote}

Yes, the RS process hangs forever and never finishes shutdown before we kill it.

bq. In this case, shouldn't we had handled that on AsyncFSWAL.syncFailed? I 
guess that would allow waitForSafePoint to finish, and, consequently, 
rollWriter. which would release rollWriterLock for shutdown.
bq. 
bq. Yechao Chen, would you have the full RS log covering the period when this 
was observed?

The log has been uploaded; see around the time "16:11:15.074" (the date part of 
the log timestamps was lost due to a bad log config, just ignore that). 
The log prints the error "HRegionServer - * ABORTING region server... ". 
Before that, there are many WAL "Slow sync" or "sync failed" messages and "Large 
batch operation detected" warnings, GC pauses are long (about 10 seconds, many 
times), and there is a data hotspot in a region on this RS.

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, 
> image-2019-07-08-16-08-26-777.png, image-2019-07-08-16-14-43-455.png, 
> jstack_20190625, jstack_20190704_1, jstack_20190704_2, rs.log.part1
>
>
> We use HBase 2.1.2. When the RS is under heavy QPS, it aborts with an error like 
> "Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to 
> get sync result after 30 ms for txid=36380334, WAL system stuck?"
>  
> The RegionServer abort then fails because AbstractFSWAL.shutdown hangs.
>  
> jstack info always shows the RegionServer hanging in "AbstractFSWAL.shutdown":
> "regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
> tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7f18a49b2bb8> (a 
> java.util.concurrent.locks.ReentrantLock$FairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
>  {color:#FF}at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285){color}
> {color:#FF} at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815){color}
>  at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
>  at 
> org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
>  at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
>  {color:#FF}at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117){color}
> {color:#FF} at java.lang.Thread.run(Thread.java:745){color}
>  
>  
>  
>  





[jira] [Commented] (HBASE-22514) Move rsgroup feature into core of HBase

2019-07-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880890#comment-16880890
 ] 

Hudson commented on HBASE-22514:


Results for branch HBASE-22514
[build #1 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22514/1/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22514/1//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22514/1//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22514/1//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Move rsgroup feature into core of HBase
> ---
>
> Key: HBASE-22514
> URL: https://issues.apache.org/jira/browse/HBASE-22514
> Project: HBase
>  Issue Type: Umbrella
>  Components: Admin, Client, rsgroup
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Major
> Attachments: HBASE-22514.master.001.patch, 
> image-2019-05-31-18-25-38-217.png
>
>
> The class RSGroupAdminClient is not public. 
> We need to use the Java API RSGroupAdminClient to manage RSGroups, 
> so RSGroupAdminClient should be public.
>  





[jira] [Updated] (HBASE-22664) Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded

2019-07-08 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-22664:
--
Component/s: Protobufs

> Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded
> -
>
> Key: HBASE-22664
> URL: https://issues.apache.org/jira/browse/HBASE-22664
> Project: HBase
>  Issue Type: Sub-task
>  Components: Protobufs, Region Assignment, rsgroup
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HBASE-22514
>
>






[jira] [Resolved] (HBASE-22664) Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded

2019-07-08 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-22664.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Pushed to branch HBASE-22514.

Thanks [~zghaobac] for reviewing.

> Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded
> -
>
> Key: HBASE-22664
> URL: https://issues.apache.org/jira/browse/HBASE-22664
> Project: HBase
>  Issue Type: Sub-task
>  Components: Protobufs, Region Assignment, rsgroup
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HBASE-22514
>
>






[jira] [Updated] (HBASE-22664) Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded

2019-07-08 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-22664:
--
Component/s: rsgroup
 Region Assignment

> Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded
> -
>
> Key: HBASE-22664
> URL: https://issues.apache.org/jira/browse/HBASE-22664
> Project: HBase
>  Issue Type: Sub-task
>  Components: Region Assignment, rsgroup
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HBASE-22514
>
>






[jira] [Updated] (HBASE-22664) Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded

2019-07-08 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-22664:
--
Fix Version/s: HBASE-22514

> Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded
> -
>
> Key: HBASE-22664
> URL: https://issues.apache.org/jira/browse/HBASE-22664
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HBASE-22514
>
>






[GitHub] [hbase] Apache9 merged pull request #362: HBASE-22664 Move protobuf stuff in hbase-rsgroup to hbase-protocol-sh…

2019-07-08 Thread GitBox
Apache9 merged pull request #362: HBASE-22664 Move protobuf stuff in 
hbase-rsgroup to hbase-protocol-sh…
URL: https://github.com/apache/hbase/pull/362
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HBASE-21751) WAL creation fails during region open may cause region assign forever fail

2019-07-08 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880844#comment-16880844
 ] 

Duo Zhang commented on HBASE-21751:
---

Ping [~allan163] [~luffy123].

> WAL creation fails during region open may cause region assign forever fail
> --
>
> Key: HBASE-21751
> URL: https://issues.apache.org/jira/browse/HBASE-21751
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.2, 2.0.4
>Reporter: Allan Yang
>Assignee: Bing Xiao
>Priority: Major
> Fix For: 2.3.0, 2.2.1, 2.1.6
>
> Attachments: HBASE-21751-branch-2.1-v1.patch, 
> HBASE-21751-branch-2.1-v2.patch, HBASE-21751-branch-2.1-v3.patch, 
> HBASE-21751.patch, HBASE-21751.v2.patch, HBASE-21751.v3.patch, 
> HBASE-21751v2.patch
>
>
> When the first region opens on the RS, WALFactory will create a WAL file, 
> but if the WAL creation fails, in some cases HDFS will leave an empty file in 
> the dir (e.g. the disk is full, or the file is created successfully but block 
> allocation fails). We have a check in AbstractFSWAL that throws an error if a 
> WAL belonging to the same factory already exists. Thus, the region can never 
> be opened on this RS later.
> {code:java}
> 2019-01-17 02:15:53,320 ERROR [RS_OPEN_META-regionserver/server003:16020-0] 
> handler.OpenRegionHandler(301): Failed open of region=hbase:meta,,1.1588230740
> java.io.IOException: Target WAL already exists within directory 
> hdfs://cluster/hbase/WALs/server003.hbase.hostname.com,16020,1545269815888
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.&lt;init&gt;(AbstractFSWAL.java:382)
> at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.&lt;init&gt;(AsyncFSWAL.java:210)
> at 
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:72)
> at 
> org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:47)
> at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
> at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
> at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:264)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2085)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}





[GitHub] [hbase] jatsakthi commented on a change in pull request #354: HBASE-20368 Fix RIT stuck when a rsgroup has no online servers but AM…

2019-07-08 Thread GitBox
jatsakthi commented on a change in pull request #354: HBASE-20368 Fix RIT stuck 
when a rsgroup has no online servers but AM…
URL: https://github.com/apache/hbase/pull/354#discussion_r301347251
 
 

 ##
 File path: 
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsKillRS.java
 ##
 @@ -131,7 +143,84 @@ public boolean evaluate() throws Exception {
 });
 
 ServerName targetServer1 = getServerName(newServers.iterator().next());
-Assert.assertEquals(1, admin.getRegions(targetServer1).size());
-Assert.assertEquals(tableName, 
admin.getRegions(targetServer1).get(0).getTable());
+assertEquals(1, admin.getRegions(targetServer1).size());
+assertEquals(tableName, admin.getRegions(targetServer1).get(0).getTable());
+  }
+
+  @Test
+  public void testKillAllRSInGroup() throws Exception {
+// create a rsgroup and move one regionserver to it
+String groupName = "my_group";
+int groupRSCount = 2;
+addGroup(groupName, groupRSCount);
+
+// create a table, and move it to my_group
+Table t = TEST_UTIL.createMultiRegionTable(tableName, Bytes.toBytes("f"), 
5);
+TEST_UTIL.loadTable(t, Bytes.toBytes("f"));
+Set<TableName> toAddTables = new HashSet<>();
+toAddTables.add(tableName);
+rsGroupAdmin.moveTables(toAddTables, groupName);
+
assertTrue(rsGroupAdmin.getRSGroupInfo(groupName).getTables().contains(tableName));
+TEST_UTIL.waitTableAvailable(tableName, 3);
+
+// check my_group servers and table regions
+Set<Address> servers = rsGroupAdmin.getRSGroupInfo(groupName).getServers();
+assertEquals(2, servers.size());
+LOG.debug("group servers {}", servers);
+for (RegionInfo tr :
+
master.getAssignmentManager().getRegionStates().getRegionsOfTable(tableName)) {
+  assertTrue(servers.contains(
+  
master.getAssignmentManager().getRegionStates().getRegionAssignments()
+  .get(tr).getAddress()));
+}
+
+// move all table regions on one group server to another
+// these codes are aimed to make 'lastHost' in my_group
+// and check if table regions are online
+List<ServerName> gsn = new ArrayList<>();
+for(Address addr : servers){
+  gsn.add(getServerName(addr));
+}
+assertEquals(2, gsn.size());
+for(Map.Entry<RegionInfo, ServerName> entry :
+
master.getAssignmentManager().getRegionStates().getRegionAssignments().entrySet()){
+  if(entry.getKey().getTable().equals(tableName)){
+LOG.debug("move region {}", entry.getKey().getRegionNameAsString());
+TEST_UTIL.moveRegionAndWait(entry.getKey(), gsn.get(1 - 
gsn.indexOf(entry.getValue(;
+  }
+}
+TEST_UTIL.waitTableAvailable(tableName, 3);
+
+// case 1: stop all the regionservers in my_group, and restart a 
regionserver in my_group,
+// and then check if all table regions are online
+for(Address addr : rsGroupAdmin.getRSGroupInfo(groupName).getServers()) {
+  TEST_UTIL.getMiniHBaseCluster().stopRegionServer(getServerName(addr));
+}
+// better wait for a while for region reassign
+sleep(1);
+assertEquals(NUM_SLAVES_BASE - gsn.size(),
+TEST_UTIL.getMiniHBaseCluster().getLiveRegionServerThreads().size());
+TEST_UTIL.getMiniHBaseCluster().startRegionServer(gsn.get(0).getHostname(),
+gsn.get(0).getPort());
+assertEquals(NUM_SLAVES_BASE - gsn.size() + 1,
+TEST_UTIL.getMiniHBaseCluster().getLiveRegionServerThreads().size());
+TEST_UTIL.waitTableAvailable(tableName, 3);
+
+// case 2: stop all the regionservers in my_group, and move another
+// regionserver(in 'default' group) to my_group, and then check if all 
table regions are online
 
 Review comment:
   // regionserver(from the 'default' group)




[GitHub] [hbase] jatsakthi commented on a change in pull request #354: HBASE-20368 Fix RIT stuck when a rsgroup has no online servers but AM…

2019-07-08 Thread GitBox
jatsakthi commented on a change in pull request #354: HBASE-20368 Fix RIT stuck 
when a rsgroup has no online servers but AM…
URL: https://github.com/apache/hbase/pull/354#discussion_r301347171
 
 

 ##
 File path: 
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsKillRS.java
 ##
 @@ -131,7 +143,84 @@ public boolean evaluate() throws Exception {
 });
 
 ServerName targetServer1 = getServerName(newServers.iterator().next());
-Assert.assertEquals(1, admin.getRegions(targetServer1).size());
-Assert.assertEquals(tableName, 
admin.getRegions(targetServer1).get(0).getTable());
+assertEquals(1, admin.getRegions(targetServer1).size());
+assertEquals(tableName, admin.getRegions(targetServer1).get(0).getTable());
+  }
+
+  @Test
+  public void testKillAllRSInGroup() throws Exception {
+// create a rsgroup and move one regionserver to it
+String groupName = "my_group";
+int groupRSCount = 2;
+addGroup(groupName, groupRSCount);
+
+// create a table, and move it to my_group
+Table t = TEST_UTIL.createMultiRegionTable(tableName, Bytes.toBytes("f"), 
5);
+TEST_UTIL.loadTable(t, Bytes.toBytes("f"));
+Set<TableName> toAddTables = new HashSet<>();
+toAddTables.add(tableName);
+rsGroupAdmin.moveTables(toAddTables, groupName);
+
assertTrue(rsGroupAdmin.getRSGroupInfo(groupName).getTables().contains(tableName));
+TEST_UTIL.waitTableAvailable(tableName, 3);
+
+// check my_group servers and table regions
+Set<Address> servers = rsGroupAdmin.getRSGroupInfo(groupName).getServers();
+assertEquals(2, servers.size());
+LOG.debug("group servers {}", servers);
+for (RegionInfo tr :
+
master.getAssignmentManager().getRegionStates().getRegionsOfTable(tableName)) {
+  assertTrue(servers.contains(
+  
master.getAssignmentManager().getRegionStates().getRegionAssignments()
+  .get(tr).getAddress()));
+}
+
+// move all table regions on one group server to another
+// these codes are aimed to make 'lastHost' in my_group
+// and check if table regions are online
+List<ServerName> gsn = new ArrayList<>();
+for(Address addr : servers){
+  gsn.add(getServerName(addr));
+}
+assertEquals(2, gsn.size());
+for(Map.Entry<RegionInfo, ServerName> entry :
+
master.getAssignmentManager().getRegionStates().getRegionAssignments().entrySet()){
+  if(entry.getKey().getTable().equals(tableName)){
+LOG.debug("move region {}", entry.getKey().getRegionNameAsString());
 
 Review comment:
   LOG.debug("move region {} from {} to {}", 
entry.getKey().getRegionNameAsString(), fromServer, toServer);




[GitHub] [hbase] jatsakthi commented on a change in pull request #354: HBASE-20368 Fix RIT stuck when a rsgroup has no online servers but AM…

2019-07-08 Thread GitBox
jatsakthi commented on a change in pull request #354: HBASE-20368 Fix RIT stuck 
when a rsgroup has no online servers but AM…
URL: https://github.com/apache/hbase/pull/354#discussion_r301347039
 
 

 ##
 File path: 
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsKillRS.java
 ##
 @@ -131,7 +143,84 @@ public boolean evaluate() throws Exception {
 });
 
 ServerName targetServer1 = getServerName(newServers.iterator().next());
-Assert.assertEquals(1, admin.getRegions(targetServer1).size());
-Assert.assertEquals(tableName, 
admin.getRegions(targetServer1).get(0).getTable());
+assertEquals(1, admin.getRegions(targetServer1).size());
+assertEquals(tableName, admin.getRegions(targetServer1).get(0).getTable());
+  }
+
+  @Test
+  public void testKillAllRSInGroup() throws Exception {
+// create a rsgroup and move one regionserver to it
+String groupName = "my_group";
+int groupRSCount = 2;
+addGroup(groupName, groupRSCount);
+
+// create a table, and move it to my_group
+Table t = TEST_UTIL.createMultiRegionTable(tableName, Bytes.toBytes("f"), 
5);
+TEST_UTIL.loadTable(t, Bytes.toBytes("f"));
+Set<TableName> toAddTables = new HashSet<>();
+toAddTables.add(tableName);
+rsGroupAdmin.moveTables(toAddTables, groupName);
+
assertTrue(rsGroupAdmin.getRSGroupInfo(groupName).getTables().contains(tableName));
+TEST_UTIL.waitTableAvailable(tableName, 3);
+
+// check my_group servers and table regions
+Set<Address> servers = rsGroupAdmin.getRSGroupInfo(groupName).getServers();
+assertEquals(2, servers.size());
+LOG.debug("group servers {}", servers);
+for (RegionInfo tr :
+
master.getAssignmentManager().getRegionStates().getRegionsOfTable(tableName)) {
+  assertTrue(servers.contains(
+  
master.getAssignmentManager().getRegionStates().getRegionAssignments()
+  .get(tr).getAddress()));
+}
+
+// move all table regions on one group server to another
 
 Review comment:
   //Swap the region locations (i.e. rs1 regions to rs2 & vice versa)
   // these codes are ...




[GitHub] [hbase] jatsakthi commented on a change in pull request #354: HBASE-20368 Fix RIT stuck when a rsgroup has no online servers but AM…

2019-07-08 Thread GitBox
jatsakthi commented on a change in pull request #354: HBASE-20368 Fix RIT stuck 
when a rsgroup has no online servers but AM…
URL: https://github.com/apache/hbase/pull/354#discussion_r301346868
 
 

 ##
 File path: 
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsKillRS.java
 ##
 @@ -131,7 +143,84 @@ public boolean evaluate() throws Exception {
 });
 
 ServerName targetServer1 = getServerName(newServers.iterator().next());
-Assert.assertEquals(1, admin.getRegions(targetServer1).size());
-Assert.assertEquals(tableName, 
admin.getRegions(targetServer1).get(0).getTable());
+assertEquals(1, admin.getRegions(targetServer1).size());
+assertEquals(tableName, admin.getRegions(targetServer1).get(0).getTable());
+  }
+
+  @Test
+  public void testKillAllRSInGroup() throws Exception {
+// create a rsgroup and move one regionserver to it
 
 Review comment:
   *two regionservers ?




[jira] [Comment Edited] (HBASE-22623) Add RegionObserver coprocessor hook for preWALAppend

2019-07-08 Thread Geoffrey Jacoby (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880747#comment-16880747
 ] 

Geoffrey Jacoby edited comment on HBASE-22623 at 7/8/19 10:08 PM:
--

As an alternative to adding yet-another-new-coprocessor hook, or changing the 
behavior of an old one, how about this? Complete HBASE-18127, which adds a 
notion of OperationContext to the ObserverContext for the mutation pipeline. 
Then add a new field to the OperationContext, WALAttributeMap, which would then 
allow for _any_ coprocessor hook (before WAL committal) to annotate the WALKey. 
Then the WAL writing code would just read from the OperationContext's 
WALAttributeMap and annotate the WALKey appropriately. 

No publicly overridable code would ever get anywhere near a protobuf, and the 
only interface that would change would be the HBASE-18127 OperationContext one 
that has never been released. (This would leave as future work the more general 
cases of HBASE-18127, such as passing information from a RegionObserver to a 
WALObserver, which prevents use of thread local storage.) 

[~apurtell] [~anoop.hbase][~abhishek.chouhan][~stack]


was (Author: gjacoby):
As an alternative to adding yet-another-new-coprocessor hook, or changing the 
behavior of an old one, how about this? Complete HBASE-18127, which adds a 
notion of OperationContext to the ObserverContext for the mutation pipeline. 
Then add a new field to the OperationContext, WALAttributeMap, which would then 
allow for _any_ coprocessor hook (before WAL committal) to annotate the WALKey. 
Then the WAL writing code would just read from the OperationContext and 
annotate the WALKey appropriately. 

No publicly overridable code would ever get anywhere near a protobuf, and the 
only interface that would change would be the HBASE-18127 OperationContext one 
that has never been released. (This would leave as future work the more general 
cases of HBASE-18127, such as passing information from a RegionObserver to a 
WALObserver, which prevents use of thread local storage.) 

[~apurtell] [~anoop.hbase][~abhishek.chouhan][~stack]

> Add RegionObserver coprocessor hook for preWALAppend
> 
>
> Key: HBASE-22623
> URL: https://issues.apache.org/jira/browse/HBASE-22623
> Project: HBase
>  Issue Type: New Feature
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>Priority: Major
>
> While many coprocessor hooks expose the WALEdit to implementing coprocs, 
> there aren't any that expose the WALKey before it's created and added to the 
> WALEntry. 
> It's sometimes useful for coprocessors to be able to edit the WALKey, for 
> example to add extended attributes using the fields to be added in 
> HBASE-22622. 





[jira] [Commented] (HBASE-22623) Add RegionObserver coprocessor hook for preWALAppend

2019-07-08 Thread Geoffrey Jacoby (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880747#comment-16880747
 ] 

Geoffrey Jacoby commented on HBASE-22623:
-

As an alternative to adding yet-another-new-coprocessor hook, or changing the 
behavior of an old one, how about this? Complete HBASE-18127, which adds a 
notion of OperationContext to the ObserverContext for the mutation pipeline. 
Then add a new field to the OperationContext, WALAttributeMap, which would then 
allow for _any_ coprocessor hook (before WAL committal) to annotate the WALKey. 
Then the WAL writing code would just read from the OperationContext and 
annotate the WALKey appropriately. 
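The proposal can be sketched in a few lines. This is illustrative only: OperationContext and WALKeyStub are hypothetical stand-ins for the HBASE-18127 context and an attribute-carrying WALKey, not real HBase API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the idea: any pre-commit coprocessor hook writes into a
// per-operation attribute map, and the WAL append path copies that map
// onto the WALKey, so no overridable code ever touches a protobuf.
public class WalAttributeSketch {

  // Stand-in for the HBASE-18127 OperationContext, extended with the
  // proposed WALAttributeMap field (hypothetical name).
  static class OperationContext {
    final Map<String, byte[]> walAttributeMap = new HashMap<>();
  }

  // Stand-in for a WALKey carrying extended attributes (HBASE-22622).
  static class WALKeyStub {
    final Map<String, byte[]> attributes = new HashMap<>();
  }

  // A hook such as prePut annotates the context...
  static void examplePrePutHook(OperationContext ctx) {
    ctx.walAttributeMap.put("tenant-id", "acme".getBytes());
  }

  // ...and the WAL writing code reads the map and annotates the key.
  static WALKeyStub buildWalKey(OperationContext ctx) {
    WALKeyStub key = new WALKeyStub();
    key.attributes.putAll(ctx.walAttributeMap);
    return key;
  }

  public static void main(String[] args) {
    OperationContext ctx = new OperationContext();
    examplePrePutHook(ctx);
    System.out.println(new String(buildWalKey(ctx).attributes.get("tenant-id")));  // acme
  }
}
```

The point of routing everything through the context is that the WALKey annotation logic lives in one place in the write path, rather than in each hook.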

No publicly overridable code would ever get anywhere near a protobuf, and the 
only interface that would change would be the HBASE-18127 OperationContext one 
that has never been released. (This would leave as future work the more general 
cases of HBASE-18127, such as passing information from a RegionObserver to a 
WALObserver, which prevents use of thread local storage.) 

[~apurtell] [~anoop.hbase][~abhishek.chouhan][~stack]

> Add RegionObserver coprocessor hook for preWALAppend
> 
>
> Key: HBASE-22623
> URL: https://issues.apache.org/jira/browse/HBASE-22623
> Project: HBase
>  Issue Type: New Feature
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>Priority: Major
>
> While many coprocessor hooks expose the WALEdit to implementing coprocs, 
> there aren't any that expose the WALKey before it's created and added to the 
> WALEntry. 
> It's sometimes useful for coprocessors to be able to edit the WALKey, for 
> example to add extended attributes using the fields to be added in 
> HBASE-22622. 





[GitHub] [hbase] gjacoby126 commented on issue #352: HBASE-22622 - WALKey Extended Attributes

2019-07-08 Thread GitBox
gjacoby126 commented on issue #352: HBASE-22622 - WALKey Extended Attributes
URL: https://github.com/apache/hbase/pull/352#issuecomment-509390346
 
 
   @apurtell @saintstack @Apache9 - adding extended attributes to WALKey as 
discussed in HBASE-22622. I wasn't able to use a protobuf map class because the 
"public" protobuf version doesn't support it, so it wouldn't be backwards 
compatible; I went back to @apurtell 's original suggestion of a repeated 
key/value attribute. Keys are strings rather than bytes because Java uses 
reference equality for hashmaps with byte[] keys, and it seemed better than 
littering the code with ByteBuffers or ImmutableBytesWritable everywhere.
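The reference-equality point can be demonstrated with plain JDK code, independent of HBase:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// byte[] inherits identity-based hashCode()/equals() from Object, so a
// HashMap keyed by byte[] misses lookups made with an equal-content but
// distinct array — which is why String keys are used instead.
public class ByteArrayKeyDemo {
  public static void main(String[] args) {
    Map<byte[], String> byBytes = new HashMap<>();
    byBytes.put("attr".getBytes(), "value");
    // Distinct array object with identical contents: lookup misses.
    System.out.println(byBytes.get("attr".getBytes()));  // null

    // String keys compare by content, so the same pattern works.
    Map<String, String> byString = new HashMap<>();
    byString.put("attr", "value");
    System.out.println(byString.get("attr"));  // value

    // The contents really are equal; HashMap just never compares them.
    System.out.println(Arrays.equals("attr".getBytes(), "attr".getBytes()));  // true
  }
}
```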
   
   I've fixed the checkstyle issues, and the test failures appear to be 
flapping replication tests (each run gives different results even when only 
trivial formatting changes have been made)
   
   Related coprocessor changes will be in HBASE-22623, unless the community 
feels it would be better to consolidate the two patches. 
   
   Thanks for taking a look!




[GitHub] [hbase] Apache-HBase commented on issue #352: HBASE-22622 - WALKey Extended Attributes

2019-07-08 Thread GitBox
Apache-HBase commented on issue #352: HBASE-22622 - WALKey Extended Attributes
URL: https://github.com/apache/hbase/pull/352#issuecomment-509382880
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 25 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ master Compile Tests _ |
   | 0 | mvndep | 14 | Maven dependency ordering for branch |
   | +1 | mvninstall | 248 | master passed |
   | +1 | compile | 117 | master passed |
   | +1 | checkstyle | 88 | master passed |
   | +1 | shadedjars | 275 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | findbugs | 434 | master passed |
   | +1 | javadoc | 52 | master passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 16 | Maven dependency ordering for patch |
   | +1 | mvninstall | 242 | the patch passed |
   | +1 | compile | 108 | the patch passed |
   | +1 | cc | 108 | the patch passed |
   | +1 | javac | 108 | the patch passed |
   | +1 | checkstyle | 87 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedjars | 265 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 735 | Patch does not cause any errors with Hadoop 2.8.5 
2.9.2 or 3.1.2. |
   | +1 | hbaseprotoc | 107 | the patch passed |
   | +1 | findbugs | 464 | the patch passed |
   | +1 | javadoc | 58 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 36 | hbase-protocol-shaded in the patch passed. |
   | +1 | unit | 25 | hbase-protocol in the patch passed. |
   | -1 | unit | 8844 | hbase-server in the patch failed. |
   | +1 | asflicense | 90 | The patch does not generate ASF License warnings. |
   | | | 12668 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hbase.replication.TestReplicationSyncUpTool |
   |   | hadoop.hbase.replication.TestReplicationSmallTests |
   |   | 
hadoop.hbase.replication.multiwal.TestReplicationEndpointWithMultipleWAL |
   |   | 
hadoop.hbase.replication.multiwal.TestReplicationEndpointWithMultipleAsyncWAL |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-352/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/352 |
   | Optional Tests |  dupname  asflicense  cc  unit  hbaseprotoc  javac  
javadoc  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
   | uname | Linux fed590497168 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | master / 605f8a15bb |
   | maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
   | Default Java | 1.8.0_181 |
   | findbugs | v3.1.11 |
   | unit | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-352/3/artifact/out/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-352/3/testReport/
 |
   | Max. process+thread count | 5004 (vs. ulimit of 1) |
   | modules | C: hbase-protocol-shaded hbase-protocol hbase-server U: . |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-352/3/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[jira] [Commented] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880645#comment-16880645
 ] 

Wellington Chevreuil commented on HBASE-22665:
--

Thanks [~chenyechao]!

My theory here is that the sync failure from "16:06:06" shown in the error 
below led to _AsyncFSWAL.unackedAppends_ being left non-empty indefinitely, as 
[_AsyncFSWAL.syncFailed_|https://github.com/apache/hbase/blob/branch-2.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java#L291]
 never clears it out. Then, when _LogRoller_ triggers the log roll, it reaches 
[_AsyncFSWAL.waitForSafePoint_|https://github.com/apache/hbase/blob/branch-2.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java#L627],
 which starts a new [consumer 
thread|https://github.com/apache/hbase/blob/branch-2.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java#L487]
 that will wake the awaiting condition only if 
_AsyncFSWAL.unackedAppends_ is empty.

[~Apache9], do you think we should clear out _AsyncFSWAL.unackedAppends_ 
when handling a sync failure? Or maybe add an extra check in this 
_[AsyncFSWAL.consume|https://github.com/apache/hbase/blob/branch-2.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java#L487]_
 method?
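The suspected stall can be modeled in a few lines. This is a greatly simplified model of the theory above, not the real AsyncFSWAL code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model: the consumer only signals the safe-point condition once
// unackedAppends is empty, so an entry stranded by an uncleared sync failure
// would leave waitForSafePoint() — and thus the log roll and WAL shutdown —
// blocked forever.
public class UnackedAppendsModel {
  final Deque<Long> unackedAppends = new ArrayDeque<>();

  // Mirrors the gate at the end of consume(): only an empty unacked queue
  // lets the waiting log roller proceed.
  boolean consumerWouldSignalSafePoint() {
    return unackedAppends.isEmpty();
  }

  public static void main(String[] args) {
    UnackedAppendsModel wal = new UnackedAppendsModel();
    wal.unackedAppends.add(36380334L);  // txid stranded by the sync failure
    System.out.println(wal.consumerWouldSignalSafePoint());  // false: roller waits forever

    wal.unackedAppends.clear();  // the cleanup being asked about for syncFailed
    System.out.println(wal.consumerWouldSignalSafePoint());  // true: roller can proceed
  }
}
```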

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, 
> image-2019-07-08-16-08-26-777.png, image-2019-07-08-16-14-43-455.png, 
> jstack_20190625, jstack_20190704_1, jstack_20190704_2, rs.log.part1
>
>
> We use HBase 2.1.2. When the RS is under heavy QPS, it aborts with an error like 
> "Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to 
> get sync result after 30 ms for txid=36380334, WAL system stuck?"
>  
> The RegionServer abort failed because AbstractFSWAL.shutdown hung.
>  
> jstack info always shows the regionserver hanging in "AbstractFSWAL.shutdown"
> "regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
> tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7f18a49b2bb8> (a 
> java.util.concurrent.locks.ReentrantLock$FairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
>  at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>  at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815)
>  at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
>  at 
> org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
>  at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117)
>  at java.lang.Thread.run(Thread.java:745)
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [hbase] Apache-HBase commented on issue #322: HBASE-22586 Javadoc Warnings related to @param tag

2019-07-08 Thread GitBox
Apache-HBase commented on issue #322: HBASE-22586 Javadoc Warnings related to 
@param tag
URL: https://github.com/apache/hbase/pull/322#issuecomment-509343430
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 179 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 7 new or modified test 
files. |
   ||| _ master Compile Tests _ |
   | 0 | mvndep | 24 | Maven dependency ordering for branch |
   | +1 | mvninstall | 260 | master passed |
   | +1 | compile | 114 | master passed |
   | +1 | checkstyle | 117 | master passed |
   | +1 | shadedjars | 271 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | findbugs | 320 | master passed |
   | +1 | javadoc | 82 | master passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 14 | Maven dependency ordering for patch |
   | +1 | mvninstall | 241 | the patch passed |
   | +1 | compile | 113 | the patch passed |
   | +1 | javac | 113 | the patch passed |
   | +1 | checkstyle | 21 | hbase-common: The patch generated 0 new + 63 
unchanged - 8 fixed = 63 total (was 71) |
   | +1 | checkstyle | 10 | The patch passed checkstyle in hbase-hadoop2-compat 
|
   | +1 | checkstyle | 63 | hbase-server: The patch generated 0 new + 84 
unchanged - 7 fixed = 84 total (was 91) |
   | +1 | checkstyle | 16 | The patch passed checkstyle in hbase-mapreduce |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedjars | 261 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 760 | Patch does not cause any errors with Hadoop 2.8.5 
2.9.2 or 3.1.2. |
   | +1 | findbugs | 344 | the patch passed |
   | +1 | javadoc | 79 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 172 | hbase-common in the patch passed. |
   | +1 | unit | 36 | hbase-hadoop2-compat in the patch passed. |
   | -1 | unit | 14981 | hbase-server in the patch failed. |
   | +1 | unit | 1360 | hbase-mapreduce in the patch passed. |
   | +1 | asflicense | 108 | The patch does not generate ASF License warnings. |
   | | | 20304 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-322/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/322 |
   | Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
   | uname | Linux b1781cd27745 4.4.0-137-generic #163-Ubuntu SMP Mon Sep 24 
13:14:43 UTC 2018 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | master / 605f8a15bb |
   | maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
   | Default Java | 1.8.0_181 |
   | findbugs | v3.1.11 |
   | unit | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-322/4/artifact/out/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-322/4/testReport/
 |
   | Max. process+thread count | 5278 (vs. ulimit of 1) |
   | modules | C: hbase-common hbase-hadoop2-compat hbase-server 
hbase-mapreduce U: . |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-322/4/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hbase] virajjasani commented on issue #345: HBASE-22638 : Zookeeper Utility enhancements

2019-07-08 Thread GitBox
virajjasani commented on issue #345: HBASE-22638 : Zookeeper Utility 
enhancements
URL: https://github.com/apache/hbase/pull/345#issuecomment-509337863
 
 
   @HorizonNet Could you please take a final look?




[GitHub] [hbase] virajjasani commented on a change in pull request #345: HBASE-22638 : Zookeeper Utility enhancements

2019-07-08 Thread GitBox
virajjasani commented on a change in pull request #345: HBASE-22638 : Zookeeper 
Utility enhancements
URL: https://github.com/apache/hbase/pull/345#discussion_r300344249
 
 

 ##
 File path: 
hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
 ##
 @@ -1860,32 +1873,35 @@ private static void getReplicationZnodesDump(ZKWatcher 
zkw, StringBuilder sb)
 // do a ls -r on this znode
 sb.append("\n").append(replicationZnode).append(": ");
List<String> children = ZKUtil.listChildrenNoWatch(zkw, replicationZnode);
-Collections.sort(children);
-for (String child : children) {
-  String znode = ZNodePaths.joinZNode(replicationZnode, child);
-  if (znode.equals(zkw.getZNodePaths().peersZNode)) {
-appendPeersZnodes(zkw, znode, sb);
-  } else if (znode.equals(zkw.getZNodePaths().queuesZNode)) {
-appendRSZnodes(zkw, znode, sb);
-  } else if (znode.equals(zkw.getZNodePaths().hfileRefsZNode)) {
-appendHFileRefsZnodes(zkw, znode, sb);
+if (children != null) {
+  Collections.sort(children);
+  for (String child : children) {
+String zNode = ZNodePaths.joinZNode(replicationZnode, child);
+if (zNode.equals(zkw.getZNodePaths().peersZNode)) {
+  appendPeersZnodes(zkw, zNode, sb);
+} else if (zNode.equals(zkw.getZNodePaths().queuesZNode)) {
+  appendRSZnodes(zkw, zNode, sb);
+} else if (zNode.equals(zkw.getZNodePaths().hfileRefsZNode)) {
+  appendHFileRefsZnodes(zkw, zNode, sb);
+}
   }
 }
   }
 
   private static void appendHFileRefsZnodes(ZKWatcher zkw, String 
hfileRefsZnode,
 StringBuilder sb) throws 
KeeperException {
 sb.append("\n").append(hfileRefsZnode).append(": ");
-for (String peerIdZnode : ZKUtil.listChildrenNoWatch(zkw, hfileRefsZnode)) 
{
-  String znodeToProcess = ZNodePaths.joinZNode(hfileRefsZnode, 
peerIdZnode);
-  sb.append("\n").append(znodeToProcess).append(": ");
-  List<String> peerHFileRefsZnodes = ZKUtil.listChildrenNoWatch(zkw, 
znodeToProcess);
-  int size = peerHFileRefsZnodes.size();
-  for (int i = 0; i < size; i++) {
-sb.append(peerHFileRefsZnodes.get(i));
-if (i != size - 1) {
-  sb.append(", ");
-}
+final List<String> hFileRefChildrenNoWatchList =
+ZKUtil.listChildrenNoWatch(zkw, hfileRefsZnode);
+if (hFileRefChildrenNoWatchList == null) {
 
 Review comment:
   Thanks and totally agree @HorizonNet . Updated the PR. Please continue.




[GitHub] [hbase] virajjasani removed a comment on issue #345: HBASE-22638 : Zookeeper Utility enhancements

2019-07-08 Thread GitBox
virajjasani removed a comment on issue #345: HBASE-22638 : Zookeeper Utility 
enhancements
URL: https://github.com/apache/hbase/pull/345#issuecomment-508059880
 
 
   Please review @jatsakthi @wchevreuil 




[GitHub] [hbase] virajjasani removed a comment on issue #345: HBASE-22638 : Zookeeper Utility enhancements

2019-07-08 Thread GitBox
virajjasani removed a comment on issue #345: HBASE-22638 : Zookeeper Utility 
enhancements
URL: https://github.com/apache/hbase/pull/345#issuecomment-508073090
 
 
   @HorizonNet 




[GitHub] [hbase] virajjasani commented on issue #348: HBASE-22643 : Delete region without archiving only if regiondir is pr…

2019-07-08 Thread GitBox
virajjasani commented on issue #348: HBASE-22643 : Delete region without 
archiving only if regiondir is pr…
URL: https://github.com/apache/hbase/pull/348#issuecomment-509336776
 
 
   @busbey Could you please review this patch?




[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails

2019-07-08 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880569#comment-16880569
 ] 

Vladimir Rodionov commented on HBASE-22075:
---

To prevent non-atomic failures we will need ACID transactions, no? As for HBASE-16812, 
I do not think it is the only patch that affected MOB in a bad way - there 
should be others. I say that because my own test failed even with HBASE-16812 
reverted (on HDP-2.6.5). 

> Potential data loss when MOB compaction fails
> -
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 
> 2.1.3
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: compaction, mob
> Fix For: 2.0.6, 2.2.1, 2.1.6
>
> Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, 
> HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch, 
> HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java
>
>
> When MOB compaction fails during last step (bulk load of a newly created 
> reference file) there is a high chance of a data loss due to partially loaded 
> reference file, cells of which refer to (now) non-existent MOB file. The 
> newly created MOB file is deleted automatically in case of a MOB compaction 
> failure, but some cells with the references to this file might be loaded to 
> HBase. 





[jira] [Updated] (HBASE-22623) Add RegionObserver coprocessor hook for preWALAppend

2019-07-08 Thread Geoffrey Jacoby (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geoffrey Jacoby updated HBASE-22623:

Summary: Add RegionObserver coprocessor hook for preWALAppend  (was: Add 
coprocessor hooks for preWALAppend and postWALAppend)

> Add RegionObserver coprocessor hook for preWALAppend
> 
>
> Key: HBASE-22623
> URL: https://issues.apache.org/jira/browse/HBASE-22623
> Project: HBase
>  Issue Type: New Feature
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>Priority: Major
>
> While many coprocessor hooks expose the WALEdit to implementing coprocs, 
> there aren't any that expose the WALKey before it's created and added to the 
> WALEntry. 
> It's sometimes useful for coprocessors to be able to edit the WALKey, for 
> example to add extended attributes using the fields to be added in 
> HBASE-22622. 





[jira] [Commented] (HBASE-22666) Add missing @Test annotation to TestQuotaThrottle

2019-07-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880533#comment-16880533
 ] 

Hudson commented on HBASE-22666:


Results for branch branch-1
[build #939 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/939/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/939//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/939//JDK7_Nightly_Build_Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/939//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Add missing @Test annotation to TestQuotaThrottle
> -
>
> Key: HBASE-22666
> URL: https://issues.apache.org/jira/browse/HBASE-22666
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 1.5.0
>
>
> TestQuotaThrottle#testTableWriteCapacityUnitThrottle does not have @Test 
> annotation; compile step fails on nightly.





[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails

2019-07-08 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880489#comment-16880489
 ] 

Sean Busbey commented on HBASE-22075:
-

My current opinion is that there are a couple of different issues to solve here.

1) I found that every environment where this particular dataloss test shows a 
problem includes HBASE-16812. Before that change there was a lock preventing 
overlaps between compaction and MOB compaction. CDH5's backport of the MOB 
feature does not include this change.

Since that change is too far back to easily revert in master or branches-2, I'm 
going to test this theory by backporting it on top of CDH5 and seeing if the IT 
then shows the dataloss. I will report back.

2) Independent of the problem with races between compaction and mob compaction, 
I think the use of bulk load to commit the updated ref files is subject to 
non-atomic failure. We should either confirm that it isn't or rework how we 
commit the updated mob references. My intuition is that we should be able to do 
this region-by-region using the building blocks that bulk loading is based on 
without needing to completely overhaul mob accounting or mob compaction (e.g. 
we shouldn't need something like the distributed procedure based mob compaction 
from HBASE-15381)

> Potential data loss when MOB compaction fails
> -
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
>  Issue Type: Bug
>  Components: mob
>Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 
> 2.1.3
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Critical
>  Labels: compaction, mob
> Fix For: 2.0.6, 2.2.1, 2.1.6
>
> Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, 
> HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch, 
> HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java
>
>
> When MOB compaction fails during last step (bulk load of a newly created 
> reference file) there is a high chance of a data loss due to partially loaded 
> reference file, cells of which refer to (now) non-existent MOB file. The 
> newly created MOB file is deleted automatically in case of a MOB compaction 
> failure, but some cells with the references to this file might be loaded to 
> HBase. 





[jira] [Assigned] (HBASE-21606) Document use of the meta table load metrics added in HBASE-19722

2019-07-08 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reassigned HBASE-21606:
---

Assignee: Szalay-Beko Mate  (was: Sean Busbey)

That looks like a great write-up! Feel free to take this JIRA over.

> Document use of the meta table load metrics added in HBASE-19722
> 
>
> Key: HBASE-21606
> URL: https://issues.apache.org/jira/browse/HBASE-21606
> Project: HBase
>  Issue Type: Task
>  Components: documentation, meta, metrics, Operability
>Affects Versions: 3.0.0, 1.5.0, 1.4.6, 2.2.0, 2.0.2, 2.1.3
>Reporter: Sean Busbey
>Assignee: Szalay-Beko Mate
>Priority: Critical
>
> HBASE-19722 added a great new tool for figuring out where cluster load is 
> coming from. Needs a section in the ref guide
> * When should I use this?
> * Why shouldn't I use it all the time?
> * What does using it look like?
> * How do I use it?
> I think all the needed info for making something to answer these questions is 
> in the discussion on HBASE-19722





[jira] [Commented] (HBASE-22567) HBCK2 addMissingRegionsToMeta

2019-07-08 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880435#comment-16880435
 ] 

stack commented on HBASE-22567:
---

[~wchevreuil] Sounds good sir.

> HBCK2 addMissingRegionsToMeta
> -
>
> Key: HBASE-22567
> URL: https://issues.apache.org/jira/browse/HBASE-22567
> Project: HBase
>  Issue Type: New Feature
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> Following latest discussion on HBASE-21745, this proposes an hbck2 command 
> that allows for inserting back regions missing in META that still have 
> *regioninfo* available in HDFS. Although this is still an interactive and 
> simpler version than the old _OfflineMetaRepair_, it still relies on hdfs 
> state as the source of truth, and performs META updates mostly independently 
> from Master (apart from requiring Meta table been online).
> For a more detailed explanation on this command behaviour, pasting _command 
> usage_ text:
> {noformat}
> To be used for scenarios where some regions may be missing in META,
> but there's still a valid 'regioninfo' metadata file on HDFS.
> This is a lighter version of 'OfflineMetaRepair' tool commonly used for
> similar issues on 1.x release line.
> This command needs META to be online. For each table name passed as
> parameter, it performs a diff between regions available in META,
> against existing regions dirs on HDFS. Then, for region dirs with
> no matches in META, it reads regioninfo metadata file and
> re-creates given region in META. Regions are re-created in 'CLOSED'
> state at META table only, but not in Masters' cache, and are not
> assigned either. A rolling Masters restart, followed by a
> hbck2 'assigns' command with all re-inserted regions is required.
> This hbck2 'assigns' command is printed for user convenience.
> WARNING: To avoid potential region overlapping problems due to ongoing
> splits, this command disables given tables while re-inserting regions.
> An example adding missing regions for tables 'table_1' and 'table_2':
> $ HBCK2 addMissingRegionsInMeta table_1 table_2
> Returns hbck2 'assigns' command with all re-inserted regions.{noformat}





[jira] [Commented] (HBASE-22417) DeleteTableProcedure.deleteFromMeta method should remove table from Master's table descriptors cache

2019-07-08 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880416#comment-16880416
 ] 

Wellington Chevreuil commented on HBASE-22417:
--

I believe this latest test failure is unrelated to this patch's changes. 
When I executed it against my local branch before rebasing onto the latest master 
state, it passed. After rebasing, I am getting errors on a different test:
{noformat}
[ERROR] 
testShutdownFixupWhenDaughterHasSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster)
  Time elapsed: 15.904 s  <<< FAILURE!
java.lang.AssertionError: Waiting for reference to be compacted
at 
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit(TestSplitTransactionOnCluster.java:398)
{noformat}

If I revert commit *ac4e5288*, the test passes again. Seems like a flaky test? I 
noticed that this test is also failing on the master build for [HBASE-22582 
|https://github.com/apache/hbase/pull/341#issuecomment-508374837].


> DeleteTableProcedure.deleteFromMeta method should remove table from Master's 
> table descriptors cache
> 
>
> Key: HBASE-22417
> URL: https://issues.apache.org/jira/browse/HBASE-22417
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-22417.master.001.patch, 
> HBASE-22417.master.002.patch, HBASE-22417.master.003.patch, 
> HBASE-22417.master.004.patch
>
>
> DeleteTableProcedure defines a static deleteFromMeta method that's currently 
> used both by DeleteTableProcedure itself and TruncateTableProcedure. 
> Sometimes, depending on the table size (and under slower, under performing 
> FileSystems), truncation can take longer to complete 
> *TRUNCATE_TABLE_CLEAR_FS_LAYOUT* stage, but the given table has already been 
> deleted from meta on previous *TRUNCATE_TABLE_REMOVE_FROM_META* stage. In 
> this case, features relying on Master's table descriptor's cache might 
> wrongly try to reference this truncating table. Master Web UI, for example, 
> would try to check this table state and end up showing a 500 error. 





[jira] [Commented] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Yechao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880403#comment-16880403
 ] 

Yechao Chen commented on HBASE-22665:
-

Thanks for reply [~Apache9][~wchevreuil]

 
{quote}So the RS process hangs forever and never completes shutdown?
{quote}

Yes, the RS process hangs forever and never finishes shutdown until it is killed.

bq. In this case, shouldn't we had handled that on AsyncFSWAL.syncFailed? I 
guess that would allow waitForSafePoint to finish, and, consequently, 
rollWriter. which would release rollWriterLock for shutdown.
bq. 
bq. Yechao Chen, would you have the full RS log covering the period when this 
was observed?

The log has been uploaded; see around "16:11:15.074" (the log timestamps were 
lost due to a bad log config, just ignore that). 
The log prints the error "HRegionServer - * ABORTING region server... ".
Before that, there are a lot of WAL "Slow sync" and "sync failed" messages, "Large 
batch operation detected" warnings, and long GC pauses (about 10 seconds, many times); 
there was a data hotspot in a region of this RS.

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, 
> image-2019-07-08-16-08-26-777.png, image-2019-07-08-16-14-43-455.png, 
> jstack_20190625, jstack_20190704_1, jstack_20190704_2, rs.log.part1
>
>
> We use hbase 2.1.2,when the rs with heavy qps and rs abort with error like 
> "Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to 
> get sync result after 30 ms for txid=36380334, WAL system stuck?"
>  
> RegionServer aborted failed when AbstractFSWAL.shutdown hang
>  
> jstack info always show the regionserver hang with "AbstractFSWAL.shutdown"
> "regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
> tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7f18a49b2bb8> (a 
> java.util.concurrent.locks.ReentrantLock$FairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
>  at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>  at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815)
>  at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
>  at 
> org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
>  at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117)
>  at java.lang.Thread.run(Thread.java:745)
>  
>  
>  
>  





[jira] [Updated] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Yechao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yechao Chen updated HBASE-22665:

Attachment: rs.log.part1

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, 
> image-2019-07-08-16-08-26-777.png, image-2019-07-08-16-14-43-455.png, 
> jstack_20190625, jstack_20190704_1, jstack_20190704_2, rs.log.part1
>
>
> We use hbase 2.1.2,when the rs with heavy qps and rs abort with error like 
> "Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to 
> get sync result after 30 ms for txid=36380334, WAL system stuck?"
>  
> RegionServer aborted failed when AbstractFSWAL.shutdown hang
>  
> jstack info always show the regionserver hang with "AbstractFSWAL.shutdown"
> "regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
> tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7f18a49b2bb8> (a 
> java.util.concurrent.locks.ReentrantLock$FairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
>  at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>  at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815)
>  at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
>  at 
> org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
>  at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117)
>  at java.lang.Thread.run(Thread.java:745)
>  
>  
>  
>  





[jira] [Commented] (HBASE-22618) Provide a way to have Heterogeneous deployment

2019-07-08 Thread Pierre Zemb (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880361#comment-16880361
 ] 

Pierre Zemb commented on HBASE-22618:
-

Cool, I will make the implementation based on master then. Thanks for the tips!

> Provide a way to have Heterogeneous deployment
> --
>
> Key: HBASE-22618
> URL: https://issues.apache.org/jira/browse/HBASE-22618
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.6, 1.4.11
>Reporter: Pierre Zemb
>Priority: Major
>
> Hi,
> We would like to open a discussion about bringing the possibility to have 
> regions deployed on heterogeneous hardware, i.e. an HBase 
> cluster running different kinds of hardware.
> h2. Why?
>  * Cloud deployments means that we may not be able to have the same hardware 
> throughout the years
>  * Some tables may need special requirements such as SSD whereas others 
> should be using hard-drives
> * *in our usecase* (single table, dedicated HBase and Hadoop tuned for our 
> usecase, good key distribution), *the number of regions per RS was the real 
> limit for us*.
> h2. Our usecase
> We found out that *in our usecase*(single table, dedicated HBase and Hadoop 
> tuned for our usecase, good key distribution)*, the number of regions per RS 
> was the real limit for us*.
> Over the years, due to historical reasons and also the need to benchmark new 
> machines, we ended up with different groups of hardware: some servers can 
> handle only 180 regions, whereas the biggest can handle more than 900. 
> Because of such a difference, we had to disable load balancing to avoid 
> the {{roundRobinAssignment}}. We developed some internal tooling which is 
> responsible for load balancing regions across RegionServers. That was 1.5 
> years ago.
> h2. Our Proof-of-concept
> We did work on a Proof-of-concept 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  and some early tests 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java],
>  and 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java].
>  We wrote the balancer for our use-case, which means that:
>  * there is one table
>  * there are no region replicas
>  * there is good key dispersion
>  * there are no regions on master
> A rule file is loaded before balancing. It contains one rule per line. A rule 
> is composed of a hostname regexp and a limit. For example, we could have:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> RegionServers with hostnames matching the first rule will have a limit of 
> 200, and the others 50. If there is no match, a default is applied.
> Thanks to the rules, we have two pieces of information: the maximum number of 
> regions for this cluster, and the limit for each server. 
> {{HeterogeneousBalancer}} will try to balance regions according to each 
> server's capacity.
> Let's take an example. Let's say that we have 20 RS:
>  * 10 RS, named {{rs0}} through {{rs9}}, loaded with 60 regions each, where 
> each can handle 200 regions.
>  * 10 RS, named {{rs10}} through {{rs19}}, loaded with 60 regions each, where 
> each can support 50 regions.
> Based on the following rules:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> The second group is overloaded, whereas the first group has plenty of space.
> We know that we can handle a maximum of *2500 regions* (200*10 + 50*10) and 
> we currently have *1200 regions* (60*20). {{HeterogeneousBalancer}} will 
> understand that the cluster is *full at 48.0%* (1200/2500). Based on this 
> information, we will then *try to put all the RegionServers to ~48% of load 
> according to the rules.* In this case, it will move regions from the second 
> group to the first.
> The balancer will:
>  * compute how many regions need to be moved. In our example, by moving 36 
> regions off rs10, we could go from 120.0% to 46.0%
>  * select regions with the lowest data-locality
>  * try to find an appropriate RS for the region. We will take the least 
> loaded available RS.
> h2. Other implementations and ideas
> Clay Baenziger proposed this idea on the dev ML:
> {quote}Could it work to have the stochastic load balancer use 
> [pluggable cost functions instead of this static list of cost 
> 
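The capacity math quoted above can be sketched as follows. This is only an illustrative sketch, not the actual POC code: the `CapacitySketch` class name, the default limit of 100, and the first-matching-rule-wins ordering are all assumptions for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch of the rule-based capacity math described in the
// proposal. Rule order, the default limit and the class itself are
// assumptions, not the actual HeterogeneousBalancer code.
public class CapacitySketch {
    // Rules in file order: hostname regexp -> max regions for matching hosts.
    static final Map<String, Integer> RULES = new LinkedHashMap<>();
    static {
        RULES.put("rs[0-9]", 200);   // rs0..rs9  -> limit 200
        RULES.put("rs1[0-9]", 50);   // rs10..rs19 -> limit 50
    }
    static final int DEFAULT_LIMIT = 100;  // assumed fallback when nothing matches

    static int limitFor(String hostname) {
        // First matching rule wins (assumption about the POC's rule ordering).
        for (Map.Entry<String, Integer> rule : RULES.entrySet()) {
            if (Pattern.matches(rule.getKey(), hostname)) {
                return rule.getValue();
            }
        }
        return DEFAULT_LIMIT;
    }

    /** Cluster fullness = total regions / total capacity under the rules. */
    static double fullness(Map<String, Integer> regionsPerServer) {
        int capacity = 0;
        int regions = 0;
        for (Map.Entry<String, Integer> e : regionsPerServer.entrySet()) {
            capacity += limitFor(e.getKey());
            regions += e.getValue();
        }
        return (double) regions / capacity;
    }

    public static void main(String[] args) {
        // 20 RS (rs0..rs19), each carrying 60 regions, as in the example.
        Map<String, Integer> loads = new LinkedHashMap<>();
        for (int i = 0; i < 20; i++) {
            loads.put("rs" + i, 60);
        }
        System.out.println(fullness(loads));  // 1200 / 2500 = 0.48
    }
}
```

With these rules the balancer would then aim to bring each server toward ~48% of its own limit.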

[jira] [Updated] (HBASE-22666) Add missing @Test annotation to TestQuotaThrottle

2019-07-08 Thread Peter Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi updated HBASE-22666:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~Apache9] for reviewing!

> Add missing @Test annotation to TestQuotaThrottle
> -
>
> Key: HBASE-22666
> URL: https://issues.apache.org/jira/browse/HBASE-22666
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 1.5.0
>
>
> TestQuotaThrottle#testTableWriteCapacityUnitThrottle does not have @Test 
> annotation; compile step fails on nightly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [hbase] petersomogyi merged pull request #364: HBASE-22666 Add missing @Test annotation to TestQuotaThrottle

2019-07-08 Thread GitBox
petersomogyi merged pull request #364: HBASE-22666 Add missing @Test annotation 
to TestQuotaThrottle
URL: https://github.com/apache/hbase/pull/364
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hbase] petersomogyi commented on issue #364: HBASE-22666 Add missing @Test annotation to TestQuotaThrottle

2019-07-08 Thread GitBox
petersomogyi commented on issue #364: HBASE-22666 Add missing @Test annotation 
to TestQuotaThrottle
URL: https://github.com/apache/hbase/pull/364#issuecomment-509225382
 
 
   Failure in TestRegionMergeTransactionOnCluster is unrelated.




[jira] [Commented] (HBASE-22661) list_regions command in hbase shell is broken

2019-07-08 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880345#comment-16880345
 ] 

Toshihiro Suzuki commented on HBASE-22661:
--

I started HBase in local mode. Is it related?

> list_regions command in hbase shell is broken
> -
>
> Key: HBASE-22661
> URL: https://issues.apache.org/jira/browse/HBASE-22661
> Project: HBase
>  Issue Type: Bug
>Reporter: Toshihiro Suzuki
>Priority: Major
>
> I faced the following error in the master branch:
> {code}
> hbase(main):001:0> create "test", "cf"
> 2019-07-07 23:24:15,254 WARN  [main] util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created table test
> Took 6.5678 seconds
> => Hbase::Table - test
> hbase(main):002:0> list_regions "test"
> ERROR: undefined method `getClusterStatus' for 
> #
> Did you mean?  get_cluster_metrics
> For usage try 'help "list_regions"'
> Took 0.1997 seconds
> {code}
> I didn't check if the other branches have the same issue.





[jira] [Commented] (HBASE-22618) Provide a way to have Heterogeneous deployment

2019-07-08 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880342#comment-16880342
 ] 

Wellington Chevreuil commented on HBASE-22618:
--

Oh, sorry for the confusion, thought this was based out of master.
{quote}Master branch corresponds to Hbase 2.X release right?
{quote}
No, master branch is for the next potential major release. 
{quote} I must admit that my target cluster is in 1.4
{quote}
Right, and how difficult would it be to apply this on master first? We could 
then backport it to branch-2 and branch-1 once the master implementation is 
finished.

> Provide a way to have Heterogeneous deployment
> --
>
> Key: HBASE-22618
> URL: https://issues.apache.org/jira/browse/HBASE-22618
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.6, 1.4.11
>Reporter: Pierre Zemb
>Priority: Major
>
> Hi,
> We would like to open the discussion about bringing the possibility to have 
> regions deployed on heterogeneous hardware, i.e. an HBase 
> cluster running different kinds of hardware.
> h2. Why?
>  * Cloud deployments mean that we may not be able to have the same hardware 
> throughout the years
>  * Some tables may need special requirements such as SSD whereas others 
> should be using hard-drives
>  * *in our usecase* (single table, dedicated HBase and Hadoop tuned for our 
> usecase, good key distribution), *the number of regions per RS was the real 
> limit for us*.
> h2. Our usecase
> We found out that *in our usecase* (single table, dedicated HBase and Hadoop 
> tuned for our usecase, good key distribution), *the number of regions per RS 
> was the real limit for us*.
> Over the years, due to historical reasons and also the need to benchmark new 
> machines, we ended up with different groups of hardware: some servers can 
> handle only 180 regions, whereas the biggest can handle more than 900. 
> Because of such a difference, we had to disable the load balancer to avoid 
> the {{roundRobinAssignment}}. We developed some internal tooling which is 
> responsible for load balancing regions across RegionServers. That was 1.5 
> years ago.
> h2. Our Proof-of-concept
> We did work on a Proof-of-concept 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  and some early tests 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java],
>  and 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java].
>  We wrote the balancer for our use-case, which means that:
>  * there is one table
>  * there are no region replicas
>  * there is good key dispersion
>  * there are no regions on master
> A rule file is loaded before balancing. It contains one rule per line. A rule 
> is composed of a hostname regexp and a limit. For example, we could have:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> RegionServers with hostnames matching the first rule will have a limit of 
> 200, and the others 50. If there is no match, a default is applied.
> Thanks to the rules, we have two pieces of information: the maximum number of 
> regions for this cluster, and the limit for each server. 
> {{HeterogeneousBalancer}} will try to balance regions according to each 
> server's capacity.
> Let's take an example. Let's say that we have 20 RS:
>  * 10 RS, named {{rs0}} through {{rs9}}, loaded with 60 regions each, where 
> each can handle 200 regions.
>  * 10 RS, named {{rs10}} through {{rs19}}, loaded with 60 regions each, where 
> each can support 50 regions.
> Based on the following rules:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> The second group is overloaded, whereas the first group has plenty of space.
> We know that we can handle a maximum of *2500 regions* (200*10 + 50*10) and 
> we currently have *1200 regions* (60*20). {{HeterogeneousBalancer}} will 
> understand that the cluster is *full at 48.0%* (1200/2500). Based on this 
> information, we will then *try to put all the RegionServers to ~48% of load 
> according to the rules.* In this case, it will move regions from the second 
> group to the first.
> The balancer will:
>  * compute how many regions need to be moved. In our example, by moving 36 
> regions off rs10, we could go from 120.0% to 46.0%
>  * select regions with the lowest data-locality
>  * try to find an 

[jira] [Commented] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880331#comment-16880331
 ] 

Wellington Chevreuil commented on HBASE-22665:
--

{quote}RegionServer aborted failed when AbstractFSWAL.shutdown hang
{quote}
So the RS process hangs forever and never completes shutdown, [~chenyechao]?

{quote}I think the WAL has already been broken before shutting down... As this 
message{quote}
In this case, shouldn't we have handled that on 
[AsyncFSWAL.syncFailed|https://github.com/apache/hbase/blob/branch-2.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java#L304]?
 I guess that would allow _waitForSafePoint_ to finish and, consequently, 
_rollWriter_, which would release _rollWriterLock_ for _shutdown_. 

[~chenyechao], would you have the full RS log covering the period when this was 
observed?

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, 
> image-2019-07-08-16-08-26-777.png, image-2019-07-08-16-14-43-455.png, 
> jstack_20190625, jstack_20190704_1, jstack_20190704_2
>
>
> We use HBase 2.1.2. Under heavy QPS, the RS aborts with an error like 
> "Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to 
> get sync result after 30 ms for txid=36380334, WAL system stuck?"
>  
> The RegionServer abort then fails because AbstractFSWAL.shutdown hangs.
>  
> The jstack info always shows the RegionServer hanging in 
> "AbstractFSWAL.shutdown":
> "regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
> tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7f18a49b2bb8> (a 
> java.util.concurrent.locks.ReentrantLock$FairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
>  at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>  at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815)
>  at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
>  at 
> org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
>  at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117)
>  at java.lang.Thread.run(Thread.java:745)
>  
>  
>  
>  





[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose

2019-07-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880332#comment-16880332
 ] 

Hudson commented on HBASE-21879:


Results for branch HBASE-21879
[build #171 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/171/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/171//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/171//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/171//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> --
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-21879.v1.patch, HBASE-21879.v1.patch, 
> QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal,  we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
> long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, 
> boolean updateMetrics)
>  throws IOException {
>   // ...
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with 
> BBPool (offheap).
>   byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>   onDiskSizeWithHeader - preReadHeaderSize, true, offset + 
> preReadHeaderSize, pread);
>   if (headerBuf != null) {
> // ...
>   }
>   // ...
>  }
> {code}
> In the read path, we still read the block from the HFile into an on-heap 
> byte[], then copy the on-heap byte[] to the off-heap bucket cache 
> asynchronously. In my 100% get performance test, I also observed frequent 
> young GCs; the largest memory footprint in the young gen should be the 
> on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of 
> a byte[] to reduce young GC pressure. We did not implement this before 
> because the older HDFS client had no ByteBuffer reading interface, but 2.7+ 
> supports this now, so we can fix it. 
> Will provide a patch and some perf comparison for this. 
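A plain-Java analogue of the ByteBuffer read path described above, for illustration only. It uses `java.nio.channels.FileChannel` so the sketch is self-contained; the actual change would go through the HDFS client's ByteBuffer read interface, and the `ByteBufferReadSketch` class, method name and sizes here are made up for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch: read a "block" at a given offset straight into a
// direct ByteBuffer, avoiding the intermediate on-heap byte[] that the
// young GC would otherwise have to collect.
public class ByteBufferReadSketch {
    static ByteBuffer readBlock(Path file, long offset, int length) throws IOException {
        // A direct buffer could come from an off-heap pool in a real system.
        ByteBuffer block = ByteBuffer.allocateDirect(length);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.position(offset);
            while (block.hasRemaining() && ch.read(block) >= 0) {
                // keep reading until the buffer is full or EOF
            }
        }
        block.flip();  // prepare the buffer for consumption
        return block;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("block", ".bin");
        Files.write(tmp, "headerPAYLOAD".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer block = readBlock(tmp, 6, 7);  // skip the 6-byte "header"
        byte[] copy = new byte[block.remaining()];
        block.get(copy);
        System.out.println(new String(copy, StandardCharsets.US_ASCII));  // PAYLOAD
        Files.delete(tmp);
    }
}
```

The same shape applies to the HDFS case: once the input stream supports reading into a ByteBuffer, the on-heap copy step disappears from the hot path.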





[GitHub] [hbase] Apache9 commented on issue #362: HBASE-22664 Move protobuf sutff in hbase-rsgroup to hbase-protocol-sh…

2019-07-08 Thread GitBox
Apache9 commented on issue #362: HBASE-22664 Move protobuf sutff in 
hbase-rsgroup to hbase-protocol-sh…
URL: https://github.com/apache/hbase/pull/362#issuecomment-509218901
 
 
   Any other concerns? @infraio 
   Thanks.




[jira] [Commented] (HBASE-22618) Provide a way to have Heterogeneous deployment

2019-07-08 Thread Pierre Zemb (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880297#comment-16880297
 ] 

Pierre Zemb commented on HBASE-22618:
-

Thank you [~wchevreuil] for your comments!

 

The master branch corresponds to the HBase 2.X release, right? Is it possible 
to backport the patch to 1.4? I must admit that my target cluster is on 1.4 :)

> Provide a way to have Heterogeneous deployment
> --
>
> Key: HBASE-22618
> URL: https://issues.apache.org/jira/browse/HBASE-22618
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.6, 1.4.11
>Reporter: Pierre Zemb
>Priority: Major
>
> Hi,
> We would like to open the discussion about bringing the possibility to have 
> regions deployed on heterogeneous hardware, i.e. an HBase 
> cluster running different kinds of hardware.
> h2. Why?
>  * Cloud deployments mean that we may not be able to have the same hardware 
> throughout the years
>  * Some tables may need special requirements such as SSD whereas others 
> should be using hard-drives
>  * *in our usecase* (single table, dedicated HBase and Hadoop tuned for our 
> usecase, good key distribution), *the number of regions per RS was the real 
> limit for us*.
> h2. Our usecase
> We found out that *in our usecase* (single table, dedicated HBase and Hadoop 
> tuned for our usecase, good key distribution), *the number of regions per RS 
> was the real limit for us*.
> Over the years, due to historical reasons and also the need to benchmark new 
> machines, we ended up with different groups of hardware: some servers can 
> handle only 180 regions, whereas the biggest can handle more than 900. 
> Because of such a difference, we had to disable the load balancer to avoid 
> the {{roundRobinAssignment}}. We developed some internal tooling which is 
> responsible for load balancing regions across RegionServers. That was 1.5 
> years ago.
> h2. Our Proof-of-concept
> We did work on a Proof-of-concept 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  and some early tests 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java],
>  and 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java].
>  We wrote the balancer for our use-case, which means that:
>  * there is one table
>  * there are no region replicas
>  * there is good key dispersion
>  * there are no regions on master
> A rule file is loaded before balancing. It contains one rule per line. A rule 
> is composed of a hostname regexp and a limit. For example, we could have:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> RegionServers with hostnames matching the first rule will have a limit of 
> 200, and the others 50. If there is no match, a default is applied.
> Thanks to the rules, we have two pieces of information: the maximum number of 
> regions for this cluster, and the limit for each server. 
> {{HeterogeneousBalancer}} will try to balance regions according to each 
> server's capacity.
> Let's take an example. Let's say that we have 20 RS:
>  * 10 RS, named {{rs0}} through {{rs9}}, loaded with 60 regions each, where 
> each can handle 200 regions.
>  * 10 RS, named {{rs10}} through {{rs19}}, loaded with 60 regions each, where 
> each can support 50 regions.
> Based on the following rules:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> The second group is overloaded, whereas the first group has plenty of space.
> We know that we can handle a maximum of *2500 regions* (200*10 + 50*10) and 
> we currently have *1200 regions* (60*20). {{HeterogeneousBalancer}} will 
> understand that the cluster is *full at 48.0%* (1200/2500). Based on this 
> information, we will then *try to put all the RegionServers to ~48% of load 
> according to the rules.* In this case, it will move regions from the second 
> group to the first.
> The balancer will:
>  * compute how many regions need to be moved. In our example, by moving 36 
> regions off rs10, we could go from 120.0% to 46.0%
>  * select regions with the lowest data-locality
>  * try to find an appropriate RS for the region. We will take the least 
> loaded available RS.
> h2. Other implementations and ideas
> Clay Baenziger proposed this idea on the dev ML:
> {quote}Could it work to have the stochastic load balancer use 
> 

[GitHub] [hbase] Apache-HBase commented on issue #302: HBASE-22571 Javadoc Warnings related to @return tag

2019-07-08 Thread GitBox
Apache-HBase commented on issue #302: HBASE-22571 Javadoc Warnings related to 
@return tag
URL: https://github.com/apache/hbase/pull/302#issuecomment-509216428
 
 
   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 93 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 14 new or modified test 
files. |
   ||| _ master Compile Tests _ |
   | 0 | mvndep | 18 | Maven dependency ordering for branch |
   | +1 | mvninstall | 229 | master passed |
   | +1 | compile | 111 | master passed |
   | +1 | checkstyle | 126 | master passed |
   | +1 | shadedjars | 253 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | findbugs | 322 | master passed |
   | +1 | javadoc | 77 | master passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 12 | Maven dependency ordering for patch |
   | +1 | mvninstall | 222 | the patch passed |
   | +1 | compile | 111 | the patch passed |
   | +1 | javac | 111 | the patch passed |
   | +1 | checkstyle | 25 | The patch passed checkstyle in hbase-client |
   | +1 | checkstyle | 11 | The patch passed checkstyle in hbase-zookeeper |
   | +1 | checkstyle | 70 | The patch passed checkstyle in hbase-server |
   | +1 | checkstyle | 15 | hbase-mapreduce: The patch generated 0 new + 37 
unchanged - 8 fixed = 37 total (was 45) |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedjars | 252 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 690 | Patch does not cause any errors with Hadoop 2.8.5 
2.9.2 or 3.1.2. |
   | +1 | findbugs | 345 | the patch passed |
   | +1 | javadoc | 78 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 102 | hbase-client in the patch passed. |
   | +1 | unit | 45 | hbase-zookeeper in the patch passed. |
   | +1 | unit | 7958 | hbase-server in the patch passed. |
   | +1 | unit | 830 | hbase-mapreduce in the patch passed. |
   | +1 | asflicense | 92 | The patch does not generate ASF License warnings. |
   | | | 12432 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-302/7/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/302 |
   | Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
   | uname | Linux 167ec46ecc11 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | master / 605f8a15bb |
   | maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
   | Default Java | 1.8.0_181 |
   | findbugs | v3.1.11 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-302/7/testReport/
 |
   | Max. process+thread count | 5062 (vs. ulimit of 1) |
   | modules | C: hbase-client hbase-zookeeper hbase-server hbase-mapreduce U: 
. |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-302/7/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hbase] Apache-HBase commented on issue #364: HBASE-22666 Add missing @Test annotation to TestQuotaThrottle

2019-07-08 Thread GitBox
Apache-HBase commented on issue #364: HBASE-22666 Add missing @Test annotation 
to TestQuotaThrottle
URL: https://github.com/apache/hbase/pull/364#issuecomment-509215239
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 1126 | Docker mode activated. |
   ||| _ Prechecks _ |
   | 0 | findbugs | 0 | Findbugs executables are not available. |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ branch-1 Compile Tests _ |
   | +1 | mvninstall | 123 | branch-1 passed |
   | +1 | compile | 39 | branch-1 passed with JDK v1.8.0_212 |
   | +1 | compile | 41 | branch-1 passed with JDK v1.7.0_222 |
   | +1 | checkstyle | 75 | branch-1 passed |
   | +1 | shadedjars | 158 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | javadoc | 29 | branch-1 passed with JDK v1.8.0_212 |
   | +1 | javadoc | 38 | branch-1 passed with JDK v1.7.0_222 |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 100 | the patch passed |
   | +1 | compile | 36 | the patch passed with JDK v1.8.0_212 |
   | +1 | javac | 36 | the patch passed |
   | +1 | compile | 41 | the patch passed with JDK v1.7.0_222 |
   | +1 | javac | 41 | the patch passed |
   | +1 | checkstyle | 77 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedjars | 165 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 211 | Patch does not cause any errors with Hadoop 2.8.5 
2.9.2. |
   | +1 | javadoc | 28 | the patch passed with JDK v1.8.0_212 |
   | +1 | javadoc | 39 | the patch passed with JDK v1.7.0_222 |
   ||| _ Other Tests _ |
   | -1 | unit | 8844 | hbase-server in the patch failed. |
   | +1 | asflicense | 23 | The patch does not generate ASF License warnings. |
   | | | 11374 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-364/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/364 |
   | Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
   | uname | Linux d62fbded00a1 4.4.0-137-generic #163-Ubuntu SMP Mon Sep 24 
13:14:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | branch-1 / ebbb0e2 |
   | maven | version: Apache Maven 3.0.5 |
   | Default Java | 1.7.0_222 |
   | Multi-JDK versions |  /usr/lib/jvm/java-8-openjdk-amd64:1.8.0_212 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_222 |
   | unit | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-364/1/artifact/out/patch-unit-hbase-server.txt
 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-364/1/testReport/
 |
   | Max. process+thread count | 4244 (vs. ulimit of 1) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-364/1/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hbase] syedmurtazahassan commented on issue #322: HBASE-22586 Javadoc Warnings related to @param tag

2019-07-08 Thread GitBox
syedmurtazahassan commented on issue #322: HBASE-22586 Javadoc Warnings related 
to @param tag
URL: https://github.com/apache/hbase/pull/322#issuecomment-509214999
 
 
   @jatsakthi @HorizonNet Adressed the comments. Kindly have a look when you 
have time. 
   Thanks. 




[jira] [Commented] (HBASE-6519) FSRegionScanner should be in its own file

2019-07-08 Thread kevin su (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880213#comment-16880213
 ] 

kevin su commented on HBASE-6519:
-

*[~rvadali]* link not working

> FSRegionScanner should be in its own file
> -
>
> Key: HBASE-6519
> URL: https://issues.apache.org/jira/browse/HBASE-6519
> Project: HBase
>  Issue Type: Improvement
>  Components: util
> Environment: mac osx, jdk 1.6
>Reporter: Ramkumar Vadali
>Priority: Minor
>
> I found this problem in the 0.89-fb branch.
> I was not able to start the master because of a ClassNotFoundException for 
> FSRegionScanner.
> FSRegionScanner is a top-level class in FSUtils.java. Moving it to a separate 
> file solved the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [hbase] Apache-HBase commented on issue #362: HBASE-22664 Move protobuf sutff in hbase-rsgroup to hbase-protocol-sh…

2019-07-08 Thread GitBox
Apache-HBase commented on issue #362: HBASE-22664 Move protobuf sutff in 
hbase-rsgroup to hbase-protocol-sh…
URL: https://github.com/apache/hbase/pull/362#issuecomment-509182266
 
 
   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 151 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ master Compile Tests _ |
   | 0 | mvndep | 25 | Maven dependency ordering for branch |
   | +1 | mvninstall | 252 | master passed |
   | +1 | compile | 106 | master passed |
   | +1 | checkstyle | 69 | master passed |
   | +1 | shadedjars | 266 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | findbugs | 321 | master passed |
   | +1 | javadoc | 61 | master passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 13 | Maven dependency ordering for patch |
   | +1 | mvninstall | 240 | the patch passed |
   | +1 | compile | 107 | the patch passed |
   | +1 | cc | 107 | the patch passed |
   | +1 | javac | 107 | the patch passed |
   | +1 | checkstyle | 65 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 2 | The patch has no ill-formed XML file. |
   | +1 | shadedjars | 269 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 746 | Patch does not cause any errors with Hadoop 2.8.5 
2.9.2 or 3.1.2. |
   | +1 | hbaseprotoc | 108 | the patch passed |
   | +1 | findbugs | 399 | the patch passed |
   | +1 | javadoc | 57 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 36 | hbase-protocol-shaded in the patch passed. |
   | +1 | unit | 25 | hbase-protocol in the patch passed. |
   | +1 | unit | 111 | hbase-client in the patch passed. |
   | +1 | unit | 463 | hbase-rsgroup in the patch passed. |
   | +1 | asflicense | 46 | The patch does not generate ASF License warnings. |
   | | | 4298 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-362/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/362 |
   | Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  cc  hbaseprotoc  xml  |
   | uname | Linux 6270fc02cad9 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | master / 605f8a15bb |
   | maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
   | Default Java | 1.8.0_181 |
   | findbugs | v3.1.11 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-362/3/testReport/
 |
   | Max. process+thread count | 4410 (vs. ulimit of 1) |
   | modules | C: hbase-protocol-shaded hbase-protocol hbase-client 
hbase-rsgroup U: . |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-362/3/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   
   




[GitHub] [hbase] Apache-HBase commented on issue #363: Add unit tests for org.apache.hadoop.hbase.util.Strings

2019-07-08 Thread GitBox
Apache-HBase commented on issue #363: Add unit tests for 
org.apache.hadoop.hbase.util.Strings
URL: https://github.com/apache/hbase/pull/363#issuecomment-509170491
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 138 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ master Compile Tests _ |
   | +1 | mvninstall | 255 | master passed |
   | +1 | compile | 21 | master passed |
   | +1 | checkstyle | 24 | master passed |
   | +1 | shadedjars | 278 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | findbugs | 42 | master passed |
   | +1 | javadoc | 20 | master passed |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 258 | the patch passed |
   | +1 | compile | 23 | the patch passed |
   | +1 | javac | 23 | the patch passed |
   | -1 | checkstyle | 23 | hbase-common: The patch generated 18 new + 0 
unchanged - 0 fixed = 18 total (was 0) |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedjars | 282 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 736 | Patch does not cause any errors with Hadoop 2.8.5 
2.9.2 or 3.1.2. |
   | +1 | findbugs | 47 | the patch passed |
   | +1 | javadoc | 21 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 181 | hbase-common in the patch passed. |
   | +1 | asflicense | 12 | The patch does not generate ASF License warnings. |
   | | | 2675 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-363/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/363 |
   | Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
   | uname | Linux f78e3a500950 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | master / 605f8a15bb |
   | maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
   | Default Java | 1.8.0_181 |
   | findbugs | v3.1.11 |
   | checkstyle | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-363/1/artifact/out/diff-checkstyle-hbase-common.txt
 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-363/1/testReport/
 |
   | Max. process+thread count | 346 (vs. ulimit of 1) |
   | modules | C: hbase-common U: hbase-common |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-363/1/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   
   




[jira] [Commented] (HBASE-22618) Provide a way to have Heterogeneous deployment

2019-07-08 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880174#comment-16880174
 ] 

Wellington Chevreuil commented on HBASE-22618:
--

Thanks for sharing this, [~PierreZ]. I left some comments on your GitHub fork. 
Once you are done with your next changes, [can you submit a PR to the master 
branch|https://help.github.com/en/articles/creating-a-pull-request-from-a-fork] 
and link it here? 

> Provide a way to have Heterogeneous deployment
> --
>
> Key: HBASE-22618
> URL: https://issues.apache.org/jira/browse/HBASE-22618
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.6, 1.4.11
>Reporter: Pierre Zemb
>Priority: Major
>
> Hi,
> We would like to open a discussion about supporting heterogeneous 
> deployments, i.e. an HBase cluster running on different kinds of hardware.
> h2. Why?
>  * Cloud deployments mean that we may not be able to have the same hardware 
> throughout the years
>  * Some tables may need special requirements such as SSD whereas others 
> should be using hard-drives
>  * *in our usecase* (single table, dedicated HBase and Hadoop tuned for our 
> usecase, good key distribution), *the number of regions per RS was the real 
> limit for us*.
> h2. Our usecase
> We found out that *in our usecase*(single table, dedicated HBase and Hadoop 
> tuned for our usecase, good key distribution)*, the number of regions per RS 
> was the real limit for us*.
> Over the years, due to historical reasons and also the need to benchmark new 
> machines, we ended-up with differents groups of hardware: some servers can 
> handle only 180 regions, whereas the biggest can handle more than 900. 
> Because of such a difference, we had to disable load balancing to avoid 
> {{roundRobinAssignment}}. We developed some internal tooling that is 
> responsible for load balancing regions across RegionServers. That was 1.5 
> years ago.
> h2. Our Proof-of-concept
> We did work on a Proof-of-concept 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  and some early tests 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java],
>  and 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java].
>  We wrote the balancer for our use-case, which means that:
>  * there is one table
>  * there is no region-replica
>  * good key dispersion
>  * there is no regions on master
> A rule file is loaded before balancing. It contains lines of rules. A rule is 
> composed of a regexp for hostname, and a limit. For example, we could have:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> RegionServers with hostname matching the first rules will have a limit of 
> 200, and the others 50. If there's no match, a default is set.
> Thanks to the rule, we have two informations: the max number of regions for 
> this cluster, and the rules for each servers. {{HeterogeneousBalancer}} will 
> try to balance regions according to their capacity.
> Let's take an example. Let's say that we have 20 RS:
>  * 10 RS, named through {{rs0}} to {{rs9}} loaded with 60 regions each, and 
> each can handle 200 regions.
>  * 10 RS, named through {{rs10}} to {{rs19}} loaded with 60 regions each, and 
> each can support 50 regions.
> Based on the following rules:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> The second group is overloaded, whereas the first group has plenty of space.
> We know that we can handle at maximum *2500 regions* (200*10 + 50*10) and we 
> have currently *1200 regions* (60*20). {{HeterogeneousBalancer}} will 
> understand that the cluster is *full at 48.0%* (1200/2500). Based on this 
> information, we will then *try to put all the RegionServers to ~48% of load 
> according to the rules.* In this case, it will move regions from the second 
> group to the first.
> The balancer will:
>  * compute how many regions needs to be moved. In our example, by moving 36 
> regions on rs10, we could go from 120.0% to 46.0%
>  * select regions with lowest data-locality
>  * try to find an appropriate RS for the region. We will take the lowest 
> available RS.
> h2. Other implementations and ideas
> Clay Baenziger proposed this idea on the dev ML:
> 
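
The capacity-rule arithmetic described in the proposal (regexp-based per-host limits, cluster-wide utilization, per-server targets) can be sketched as follows. This is a minimal illustration using the numbers from the example above, not HBase code; the first-match-wins rule lookup, full-string regexp matching, and the default limit of 100 are assumptions for the sketch.

```python
import re

# Hypothetical rule table from the example: (hostname regexp, max regions).
rules = [
    (re.compile(r"rs[0-9]"), 200),   # rs0..rs9 can each hold 200 regions
    (re.compile(r"rs1[0-9]"), 50),   # rs10..rs19 can each hold 50 regions
]
DEFAULT_LIMIT = 100  # fallback when no rule matches (assumed value)

def limit_for(host):
    """First rule whose regexp matches the whole hostname wins."""
    for pattern, limit in rules:
        if pattern.fullmatch(host):
            return limit
    return DEFAULT_LIMIT

# 20 region servers carrying 60 regions each, as in the example.
regions = {f"rs{i}": 60 for i in range(20)}

max_capacity = sum(limit_for(h) for h in regions)   # 200*10 + 50*10 = 2500
total_regions = sum(regions.values())               # 60*20 = 1200
utilization = total_regions / max_capacity          # 1200/2500 = 0.48

# Per-server target: bring every server to ~48% of its own limit,
# then move the surplus off the overloaded servers.
targets = {h: round(limit_for(h) * utilization) for h in regions}
moves = {h: regions[h] - targets[h] for h in regions if regions[h] > targets[h]}

print(utilization)     # 0.48
print(targets["rs0"])  # 96
print(moves["rs10"])   # 36 (from 60 regions down to a target of 24)
```

With these rules, rs10 sits at 120% of its 50-region limit, and moving 36 regions to the first group brings it to the cluster-wide utilization, matching the example's reasoning.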

[GitHub] [hbase] Apache-HBase commented on issue #323: HBASE-22414 Interruption of moving regions in RSGroup will cause regi…

2019-07-08 Thread GitBox
Apache-HBase commented on issue #323: HBASE-22414 Interruption of moving 
regions in RSGroup will cause regi…
URL: https://github.com/apache/hbase/pull/323#issuecomment-509165250
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 163 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ master Compile Tests _ |
   | +1 | mvninstall | 308 | master passed |
   | +1 | compile | 29 | master passed |
   | +1 | checkstyle | 14 | master passed |
   | +1 | shadedjars | 334 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | findbugs | 47 | master passed |
   | +1 | javadoc | 23 | master passed |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 306 | the patch passed |
   | +1 | compile | 30 | the patch passed |
   | +1 | javac | 30 | the patch passed |
   | -1 | checkstyle | 15 | hbase-rsgroup: The patch generated 28 new + 2 
unchanged - 0 fixed = 30 total (was 2) |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedjars | 353 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 1013 | Patch does not cause any errors with Hadoop 
2.8.5 2.9.2 or 3.1.2. |
   | +1 | findbugs | 62 | the patch passed |
   | +1 | javadoc | 23 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 531 | hbase-rsgroup in the patch passed. |
   | +1 | asflicense | 12 | The patch does not generate ASF License warnings. |
   | | | 3681 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-323/10/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/323 |
   | Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
   | uname | Linux 82d5a8ad84e9 4.4.0-143-generic #169-Ubuntu SMP Thu Feb 7 
07:56:38 UTC 2019 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | master / 605f8a15bb |
   | maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
   | Default Java | 1.8.0_181 |
   | findbugs | v3.1.11 |
   | checkstyle | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-323/10/artifact/out/diff-checkstyle-hbase-rsgroup.txt
 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-323/10/testReport/
 |
   | Max. process+thread count | 4423 (vs. ulimit of 1) |
   | modules | C: hbase-rsgroup U: hbase-rsgroup |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-323/10/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   
   




[jira] [Commented] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880170#comment-16880170
 ] 

Duo Zhang commented on HBASE-22665:
---

I think the WAL had already been broken before shutting down, as this message suggests:

{noformat}
Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get 
sync result after 30 ms for txid=36380334, WAL system stuck?
{noformat}

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, 
> image-2019-07-08-16-08-26-777.png, image-2019-07-08-16-14-43-455.png, 
> jstack_20190625, jstack_20190704_1, jstack_20190704_2
>
>
> We use HBase 2.1.2; when the RS is under heavy QPS, the RS aborts with an error like 
> "Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to 
> get sync result after 30 ms for txid=36380334, WAL system stuck?"
>  
> RegionServer aborted failed when AbstractFSWAL.shutdown hang
>  
> jstack info always show the regionserver hang with "AbstractFSWAL.shutdown"
> "regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
> tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7f18a49b2bb8> (a 
> java.util.concurrent.locks.ReentrantLock$FairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
>  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>  at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815)
>  at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
>  at 
> org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
>  at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117)
>  at java.lang.Thread.run(Thread.java:745)
>  
>  
>  
>  





[GitHub] [hbase] Apache-HBase commented on issue #354: HBASE-20368 Fix RIT stuck when a rsgroup has no online servers but AM…

2019-07-08 Thread GitBox
Apache-HBase commented on issue #354: HBASE-20368 Fix RIT stuck when a rsgroup 
has no online servers but AM…
URL: https://github.com/apache/hbase/pull/354#issuecomment-509164869
 
 
   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 23 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ master Compile Tests _ |
   | 0 | mvndep | 13 | Maven dependency ordering for branch |
   | +1 | mvninstall | 233 | master passed |
   | +1 | compile | 72 | master passed |
   | +1 | checkstyle | 81 | master passed |
   | +1 | shadedjars | 264 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | findbugs | 245 | master passed |
   | +1 | javadoc | 51 | master passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 15 | Maven dependency ordering for patch |
   | +1 | mvninstall | 232 | the patch passed |
   | +1 | compile | 73 | the patch passed |
   | +1 | javac | 73 | the patch passed |
   | +1 | checkstyle | 78 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedjars | 264 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 821 | Patch does not cause any errors with Hadoop 2.8.5 
2.9.2 or 3.1.2. |
   | +1 | findbugs | 263 | the patch passed |
   | +1 | javadoc | 52 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 8133 | hbase-server in the patch passed. |
   | +1 | unit | 222 | hbase-rsgroup in the patch passed. |
   | +1 | asflicense | 54 | The patch does not generate ASF License warnings. |
   | | | 11510 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-354/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/354 |
   | Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
   | uname | Linux 1267e668b363 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | master / 605f8a15bb |
   | maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
   | Default Java | 1.8.0_181 |
   | findbugs | v3.1.11 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-354/4/testReport/
 |
   | Max. process+thread count | 4675 (vs. ulimit of 1) |
   | modules | C: hbase-server hbase-rsgroup U: . |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-354/4/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   
   




[jira] [Updated] (HBASE-22666) Add missing @Test annotation to TestQuotaThrottle

2019-07-08 Thread Peter Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi updated HBASE-22666:
--
Status: Patch Available  (was: Open)

> Add missing @Test annotation to TestQuotaThrottle
> -
>
> Key: HBASE-22666
> URL: https://issues.apache.org/jira/browse/HBASE-22666
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 1.5.0
>
>
> TestQuotaThrottle#testTableWriteCapacityUnitThrottle does not have @Test 
> annotation; compile step fails on nightly.





[GitHub] [hbase] petersomogyi opened a new pull request #364: HBASE-22666 Add missing @Test annotation to TestQuotaThrottle

2019-07-08 Thread GitBox
petersomogyi opened a new pull request #364: HBASE-22666 Add missing @Test 
annotation to TestQuotaThrottle
URL: https://github.com/apache/hbase/pull/364
 
 
   




[jira] [Created] (HBASE-22666) Add missing @Test annotation to TestQuotaThrottle

2019-07-08 Thread Peter Somogyi (JIRA)
Peter Somogyi created HBASE-22666:
-

 Summary: Add missing @Test annotation to TestQuotaThrottle
 Key: HBASE-22666
 URL: https://issues.apache.org/jira/browse/HBASE-22666
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
Reporter: Peter Somogyi
Assignee: Peter Somogyi
 Fix For: 1.5.0


TestQuotaThrottle#testTableWriteCapacityUnitThrottle does not have @Test 
annotation; compile step fails on nightly.





[GitHub] [hbase] Apache9 commented on a change in pull request #362: HBASE-22664 Move protobuf sutff in hbase-rsgroup to hbase-protocol-sh…

2019-07-08 Thread GitBox
Apache9 commented on a change in pull request #362: HBASE-22664 Move protobuf 
sutff in hbase-rsgroup to hbase-protocol-sh…
URL: https://github.com/apache/hbase/pull/362#discussion_r301005287
 
 

 ##
 File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
 ##
 @@ -1768,23 +1768,36 @@ public static ServerName toServerName(final byte [] 
data) throws Deserialization
 return ServerName.valueOf(hostname, port, -1L);
   }
 
+  public static HBaseProtos.TimeRange toTimeRange(TimeRange timeRange) {
+if (timeRange == null) {
+  timeRange = TimeRange.allTime();
+}
+return 
HBaseProtos.TimeRange.newBuilder().setFrom(timeRange.getMin()).setTo(timeRange.getMax())
+.build();
+  }
+
   public static RSGroupInfo toGroupInfo(RSGroupProtos.RSGroupInfo proto) {
 RSGroupInfo RSGroupInfo = new RSGroupInfo(proto.getName());
-for(HBaseProtos.ServerName el: proto.getServersList()) {
+for (HBaseProtos.ServerName el : proto.getServersList()) {
   RSGroupInfo.addServer(Address.fromParts(el.getHostName(), el.getPort()));
 }
-for(HBaseProtos.TableName pTableName: proto.getTablesList()) {
+for (HBaseProtos.TableName pTableName : proto.getTablesList()) {
   RSGroupInfo.addTable(ProtobufUtil.toTableName(pTableName));
 }
 return RSGroupInfo;
   }
 
-  public static HBaseProtos.TimeRange toTimeRange(TimeRange timeRange) {
-if (timeRange == null) {
-  timeRange = TimeRange.allTime();
+  public static RSGroupProtos.RSGroupInfo toProtoGroupInfo(RSGroupInfo pojo) {
 
 Review comment:
   At least we still need it for now as it is referenced in lots of places. Not 
sure if we can remove it at the end. FWIW, I think we still need to support the 
old way to change rsgroup, which uses coprocessor endpoint?




[GitHub] [hbase] Braavos96 opened a new pull request #363: Add unit tests for org.apache.hadoop.hbase.util.Strings

2019-07-08 Thread GitBox
Braavos96 opened a new pull request #363: Add unit tests for 
org.apache.hadoop.hbase.util.Strings
URL: https://github.com/apache/hbase/pull/363
 
 
   I've analysed your codebase and noticed that 
`org.apache.hadoop.hbase.util.Strings` is not fully tested.
   I've written some tests for the methods in this class with the help of 
[Diffblue Cover](https://www.diffblue.com/opensource).
   
   Hopefully, these tests will help you detect any regressions caused by future 
code changes. If you would find it useful to have additional tests written for 
this repository, I would be more than happy to look at other classes that you 
consider important in a subsequent PR.




[GitHub] [hbase] Apache-HBase commented on issue #362: HBASE-22664 Move protobuf sutff in hbase-rsgroup to hbase-protocol-sh…

2019-07-08 Thread GitBox
Apache-HBase commented on issue #362: HBASE-22664 Move protobuf sutff in 
hbase-rsgroup to hbase-protocol-sh…
URL: https://github.com/apache/hbase/pull/362#issuecomment-509154536
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 52 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ master Compile Tests _ |
   | 0 | mvndep | 30 | Maven dependency ordering for branch |
   | +1 | mvninstall | 275 | master passed |
   | +1 | compile | 103 | master passed |
   | +1 | checkstyle | 61 | master passed |
   | +1 | shadedjars | 270 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | findbugs | 327 | master passed |
   | +1 | javadoc | 68 | master passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 16 | Maven dependency ordering for patch |
   | +1 | mvninstall | 237 | the patch passed |
   | +1 | compile | 104 | the patch passed |
   | +1 | cc | 104 | the patch passed |
   | +1 | javac | 104 | the patch passed |
   | +1 | checkstyle | 67 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 1 | The patch has no ill-formed XML file. |
   | +1 | shadedjars | 274 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 791 | Patch does not cause any errors with Hadoop 2.8.5 
2.9.2 or 3.1.2. |
   | +1 | hbaseprotoc | 122 | the patch passed |
   | +1 | findbugs | 353 | the patch passed |
   | +1 | javadoc | 60 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 37 | hbase-protocol-shaded in the patch passed. |
   | +1 | unit | 25 | hbase-protocol in the patch passed. |
   | +1 | unit | 108 | hbase-client in the patch passed. |
   | -1 | unit | 574 | hbase-rsgroup in the patch failed. |
   | +1 | asflicense | 43 | The patch does not generate ASF License warnings. |
   | | | 4386 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hbase.rsgroup.TestRSGroupsBalance |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-362/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/362 |
   | Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  cc  hbaseprotoc  xml  |
   | uname | Linux 5bf55cae4981 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | master / 605f8a15bb |
   | maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
   | Default Java | 1.8.0_181 |
   | findbugs | v3.1.11 |
   | unit | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-362/2/artifact/out/patch-unit-hbase-rsgroup.txt
 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-362/2/testReport/
 |
   | Max. process+thread count | 4514 (vs. ulimit of 1) |
   | modules | C: hbase-protocol-shaded hbase-protocol hbase-client 
hbase-rsgroup U: . |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-362/2/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hbase] syedmurtazahassan commented on a change in pull request #302: HBASE-22571 Javadoc Warnings related to @return tag

2019-07-08 Thread GitBox
syedmurtazahassan commented on a change in pull request #302: HBASE-22571 
Javadoc Warnings related to @return tag
URL: https://github.com/apache/hbase/pull/302#discussion_r301003774
 
 

 ##
 File path: 
hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedAction.java
 ##
 @@ -320,7 +320,7 @@ public boolean verifyResultAgainstDataGenerator(Result 
result, boolean verifyVal
* @param verifyCfAndColumnIntegrity verify that cf/column set in the result 
is complete. Note
*   that to use this multiPut should be 
used, or verification
*   has to happen after writes, otherwise 
there can be races.
-   * @return
+   * @return the verified result from get or scan
 
 Review comment:
   @HorizonNet Thanks for your comment. Addressed it. Kindly have a look. 




[GitHub] [hbase] Apache-HBase commented on issue #362: HBASE-22664 Move protobuf sutff in hbase-rsgroup to hbase-protocol-sh…

2019-07-08 Thread GitBox
Apache-HBase commented on issue #362: HBASE-22664 Move protobuf sutff in 
hbase-rsgroup to hbase-protocol-sh…
URL: https://github.com/apache/hbase/pull/362#issuecomment-509149555
 
 
   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 23 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ master Compile Tests _ |
   | 0 | mvndep | 24 | Maven dependency ordering for branch |
   | +1 | mvninstall | 245 | master passed |
   | +1 | compile | 105 | master passed |
   | +1 | checkstyle | 60 | master passed |
   | +1 | shadedjars | 268 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | findbugs | 321 | master passed |
   | +1 | javadoc | 63 | master passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 16 | Maven dependency ordering for patch |
   | +1 | mvninstall | 241 | the patch passed |
   | +1 | compile | 104 | the patch passed |
   | +1 | cc | 104 | the patch passed |
   | +1 | javac | 104 | the patch passed |
   | +1 | checkstyle | 60 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 2 | The patch has no ill-formed XML file. |
   | +1 | shadedjars | 269 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 761 | Patch does not cause any errors with Hadoop 2.8.5 
2.9.2 or 3.1.2. |
   | +1 | hbaseprotoc | 121 | the patch passed |
   | +1 | findbugs | 362 | the patch passed |
   | +1 | javadoc | 63 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 37 | hbase-protocol-shaded in the patch passed. |
   | +1 | unit | 24 | hbase-protocol in the patch passed. |
   | +1 | unit | 109 | hbase-client in the patch passed. |
   | +1 | unit | 221 | hbase-rsgroup in the patch passed. |
   | +1 | asflicense | 46 | The patch does not generate ASF License warnings. |
   | | | 3918 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-362/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/362 |
   | Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  cc  hbaseprotoc  xml  |
   | uname | Linux 3dcf9ec97852 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | master / 605f8a15bb |
   | maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
   | Default Java | 1.8.0_181 |
   | findbugs | v3.1.11 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-362/1/testReport/
 |
   | Max. process+thread count | 4602 (vs. ulimit of 1) |
   | modules | C: hbase-protocol-shaded hbase-protocol hbase-client 
hbase-rsgroup U: . |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-362/1/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hbase] infraio commented on a change in pull request #362: HBASE-22664 Move protobuf sutff in hbase-rsgroup to hbase-protocol-sh…

2019-07-08 Thread GitBox
infraio commented on a change in pull request #362: HBASE-22664 Move protobuf 
sutff in hbase-rsgroup to hbase-protocol-sh…
URL: https://github.com/apache/hbase/pull/362#discussion_r300995309
 
 

 ##
 File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
 ##
 @@ -1768,23 +1768,36 @@ public static ServerName toServerName(final byte [] 
data) throws Deserialization
 return ServerName.valueOf(hostname, port, -1L);
   }
 
+  public static HBaseProtos.TimeRange toTimeRange(TimeRange timeRange) {
+if (timeRange == null) {
+  timeRange = TimeRange.allTime();
+}
+return 
HBaseProtos.TimeRange.newBuilder().setFrom(timeRange.getMin()).setTo(timeRange.getMax())
+.build();
+  }
+
   public static RSGroupInfo toGroupInfo(RSGroupProtos.RSGroupInfo proto) {
 RSGroupInfo RSGroupInfo = new RSGroupInfo(proto.getName());
-for(HBaseProtos.ServerName el: proto.getServersList()) {
+for (HBaseProtos.ServerName el : proto.getServersList()) {
   RSGroupInfo.addServer(Address.fromParts(el.getHostName(), el.getPort()));
 }
-for(HBaseProtos.TableName pTableName: proto.getTablesList()) {
+for (HBaseProtos.TableName pTableName : proto.getTablesList()) {
   RSGroupInfo.addTable(ProtobufUtil.toTableName(pTableName));
 }
 return RSGroupInfo;
   }
 
-  public static HBaseProtos.TimeRange toTimeRange(TimeRange timeRange) {
-if (timeRange == null) {
-  timeRange = TimeRange.allTime();
+  public static RSGroupProtos.RSGroupInfo toProtoGroupInfo(RSGroupInfo pojo) {
 
 Review comment:
   This method should be removed?




[jira] [Commented] (HBASE-22514) Move rsgroup feature into core of HBase

2019-07-08 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880138#comment-16880138
 ] 

Duo Zhang commented on HBASE-22514:
---

Then why not just use zookeeper directly...

> Move rsgroup feature into core of HBase
> ---
>
> Key: HBASE-22514
> URL: https://issues.apache.org/jira/browse/HBASE-22514
> Project: HBase
>  Issue Type: Umbrella
>  Components: Admin, Client, rsgroup
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Major
> Attachments: HBASE-22514.master.001.patch, 
> image-2019-05-31-18-25-38-217.png
>
>
> The class RSGroupAdminClient is not public. 
> We need to use the Java API RSGroupAdminClient to manage RSGs, 
> so RSGroupAdminClient should be public.
>  





[jira] [Commented] (HBASE-19893) restore_snapshot is broken in master branch when region splits

2019-07-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880133#comment-16880133
 ] 

Hudson commented on HBASE-19893:


Results for branch branch-2.1
[build #1342 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1342/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1342//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1342//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1342//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> restore_snapshot is broken in master branch when region splits
> --
>
> Key: HBASE-19893
> URL: https://issues.apache.org/jira/browse/HBASE-19893
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.3.0
>
> Attachments: 19893.master.004.patch, 19893.master.004.patch, 
> 19893.master.004.patch, HBASE-19893.master.001.patch, 
> HBASE-19893.master.002.patch, HBASE-19893.master.003.patch, 
> HBASE-19893.master.003.patch, HBASE-19893.master.004.patch, 
> HBASE-19893.master.005.patch, HBASE-19893.master.005.patch, 
> HBASE-19893.master.005.patch, HBASE-19893.master.006.patch, 
> org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas-output.txt
>
>
> When I was investigating HBASE-19850, I found restore_snapshot didn't work in 
> master branch.
>  
> Steps to reproduce are as follows:
> 1. Create a table
> {code:java}
> create "test", "cf"
> {code}
> 2. Load data (2000 rows) to the table
> {code:java}
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> {code}
> 3. Split the table
> {code:java}
> split "test"
> {code}
> 4. Take a snapshot
> {code:java}
> snapshot "test", "snap"
> {code}
> 5. Load more data (2000 rows) to the table and split the table again
> {code:java}
> (2000...4000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> split "test"
> {code}
> 6. Restore the table from the snapshot 
> {code:java}
> disable "test"
> restore_snapshot "snap"
> enable "test"
> {code}
> 7. Scan the table
> {code:java}
> scan "test"
> {code}
> However, this scan returns only 244 rows (it should return 2000 rows) like 
> the following:
> {code:java}
> hbase(main):038:0> scan "test"
> ROW COLUMN+CELL
>  row78 column=cf:col, timestamp=1517298307049, value=val
> 
>   row999 column=cf:col, timestamp=1517298307608, value=val
> 244 row(s)
> Took 0.1500 seconds
> {code}
>  
> Also, the restored table should have 2 online regions but it has 3 online 
> regions.
>  





[jira] [Commented] (HBASE-22514) Move rsgroup feature into core of HBase

2019-07-08 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880129#comment-16880129
 ] 

Guanghao Zhang commented on HBASE-22514:


{quote}And I think there will be a big challenge is that, how do we deal 
with master startup, as region assign depends on the LoadBalancer, but now 
LoadBalancer will depend on hbase:rsgroup table.
{quote}
If I am not wrong, rsgroup info will be stored in both the hbase:rsgroup table 
and zookeeper. If hbase:rsgroup is not online, the RSGroupBalancer should load 
the rsgroup information from zookeeper. 

> Move rsgroup feature into core of HBase
> ---
>
> Key: HBASE-22514
> URL: https://issues.apache.org/jira/browse/HBASE-22514
> Project: HBase
>  Issue Type: Umbrella
>  Components: Admin, Client, rsgroup
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Major
> Attachments: HBASE-22514.master.001.patch, 
> image-2019-05-31-18-25-38-217.png
>
>
> The class RSGroupAdminClient is not public. 
> We need to use the Java API RSGroupAdminClient to manage RSGs, 
> so RSGroupAdminClient should be public.
>  





[jira] [Commented] (HBASE-19893) restore_snapshot is broken in master branch when region splits

2019-07-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880126#comment-16880126
 ] 

Hudson commented on HBASE-19893:


Results for branch branch-2.0
[build #1735 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1735/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1735//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1735//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1735//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> restore_snapshot is broken in master branch when region splits
> --
>
> Key: HBASE-19893
> URL: https://issues.apache.org/jira/browse/HBASE-19893
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.3.0
>
> Attachments: 19893.master.004.patch, 19893.master.004.patch, 
> 19893.master.004.patch, HBASE-19893.master.001.patch, 
> HBASE-19893.master.002.patch, HBASE-19893.master.003.patch, 
> HBASE-19893.master.003.patch, HBASE-19893.master.004.patch, 
> HBASE-19893.master.005.patch, HBASE-19893.master.005.patch, 
> HBASE-19893.master.005.patch, HBASE-19893.master.006.patch, 
> org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas-output.txt
>
>
> When I was investigating HBASE-19850, I found restore_snapshot didn't work in 
> master branch.
>  
> Steps to reproduce are as follows:
> 1. Create a table
> {code:java}
> create "test", "cf"
> {code}
> 2. Load data (2000 rows) to the table
> {code:java}
> (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> {code}
> 3. Split the table
> {code:java}
> split "test"
> {code}
> 4. Take a snapshot
> {code:java}
> snapshot "test", "snap"
> {code}
> 5. Load more data (2000 rows) to the table and split the table again
> {code:java}
> (2000...4000).each{|i| put "test", "row#{i}", "cf:col", "val"}
> split "test"
> {code}
> 6. Restore the table from the snapshot 
> {code:java}
> disable "test"
> restore_snapshot "snap"
> enable "test"
> {code}
> 7. Scan the table
> {code:java}
> scan "test"
> {code}
> However, this scan returns only 244 rows (it should return 2000 rows) like 
> the following:
> {code:java}
> hbase(main):038:0> scan "test"
> ROW COLUMN+CELL
>  row78 column=cf:col, timestamp=1517298307049, value=val
> 
>   row999 column=cf:col, timestamp=1517298307608, value=val
> 244 row(s)
> Took 0.1500 seconds
> {code}
>  
> Also, the restored table should have 2 online regions but it has 3 online 
> regions.
>  





[jira] [Comment Edited] (HBASE-22567) HBCK2 addMissingRegionsToMeta

2019-07-08 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880125#comment-16880125
 ] 

Wellington Chevreuil edited comment on HBASE-22567 at 7/8/19 8:40 AM:
--

{quote}Maybe we doc the Daisuke Kobayashi finding over on the hbck page? 
Suggest restart of master as way to rebuild meta if issue? Perhaps then we'd 
add the 'reader' part of your patch?{quote}
Sounds all good for me! Given this jira was specific for the new command, and 
the given PR is already a bit large, maybe worth doing this extra doc work on a 
separate jira?


was (Author: wchevreuil):
{quote}Maybe we doc the Daisuke Kobayashi finding over on the hbck page? 
Suggest restart of master as way to rebuild meta if issue? Perhaps then we'd 
add the 'reader' part of your patch?{quote}
Sounds all good for me! Given this jira was specific for the new command, and 
the given PR is already a bit large, maybe worth doing it on a separate jira?

> HBCK2 addMissingRegionsToMeta
> -
>
> Key: HBASE-22567
> URL: https://issues.apache.org/jira/browse/HBASE-22567
> Project: HBase
>  Issue Type: New Feature
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> Following latest discussion on HBASE-21745, this proposes an hbck2 command 
> that allows for inserting back regions missing in META that still have 
> *regioninfo* available in HDFS. Although this is still an interactive and 
> simpler version than the old _OfflineMetaRepair_, it still relies on hdfs 
> state as the source of truth, and performs META updates mostly independently 
> from Master (apart from requiring the Meta table to be online).
> For a more detailed explanation on this command behaviour, pasting _command 
> usage_ text:
> {noformat}
> To be used for scenarios where some regions may be missing in META,
> but there's still a valid 'regioninfo' metadata file on HDFS.
> This is a lighter version of 'OfflineMetaRepair' tool commonly used for
> similar issues on 1.x release line.
> This command needs META to be online. For each table name passed as
> parameter, it performs a diff between regions available in META,
> against existing regions dirs on HDFS. Then, for region dirs with
> no matches in META, it reads regioninfo metadata file and
> re-creates given region in META. Regions are re-created in 'CLOSED'
> state at META table only, but not in Masters' cache, and are not
> assigned either. A rolling Masters restart, followed by a
> hbck2 'assigns' command with all re-inserted regions is required.
> This hbck2 'assigns' command is printed for user convenience.
> WARNING: To avoid potential region overlapping problems due to ongoing
> splits, this command disables given tables while re-inserting regions.
> An example adding missing regions for tables 'table_1' and 'table_2':
> $ HBCK2 addMissingRegionsInMeta table_1 table_2
> Returns hbck2 'assigns' command with all re-inserted regions.{noformat}





[jira] [Commented] (HBASE-22567) HBCK2 addMissingRegionsToMeta

2019-07-08 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880125#comment-16880125
 ] 

Wellington Chevreuil commented on HBASE-22567:
--

{quote}Maybe we doc the Daisuke Kobayashi finding over on the hbck page? 
Suggest restart of master as way to rebuild meta if issue? Perhaps then we'd 
add the 'reader' part of your patch?{quote}
Sounds all good for me! Given this jira was specific for the new command, and 
the given PR is already a bit large, maybe worth doing it on a separate jira?

> HBCK2 addMissingRegionsToMeta
> -
>
> Key: HBASE-22567
> URL: https://issues.apache.org/jira/browse/HBASE-22567
> Project: HBase
>  Issue Type: New Feature
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> Following latest discussion on HBASE-21745, this proposes an hbck2 command 
> that allows for inserting back regions missing in META that still have 
> *regioninfo* available in HDFS. Although this is still an interactive and 
> simpler version than the old _OfflineMetaRepair_, it still relies on hdfs 
> state as the source of truth, and performs META updates mostly independently 
> from Master (apart from requiring the Meta table to be online).
> For a more detailed explanation on this command behaviour, pasting _command 
> usage_ text:
> {noformat}
> To be used for scenarios where some regions may be missing in META,
> but there's still a valid 'regioninfo' metadata file on HDFS.
> This is a lighter version of 'OfflineMetaRepair' tool commonly used for
> similar issues on 1.x release line.
> This command needs META to be online. For each table name passed as
> parameter, it performs a diff between regions available in META,
> against existing regions dirs on HDFS. Then, for region dirs with
> no matches in META, it reads regioninfo metadata file and
> re-creates given region in META. Regions are re-created in 'CLOSED'
> state at META table only, but not in Masters' cache, and are not
> assigned either. A rolling Masters restart, followed by a
> hbck2 'assigns' command with all re-inserted regions is required.
> This hbck2 'assigns' command is printed for user convenience.
> WARNING: To avoid potential region overlapping problems due to ongoing
> splits, this command disables given tables while re-inserting regions.
> An example adding missing regions for tables 'table_1' and 'table_2':
> $ HBCK2 addMissingRegionsInMeta table_1 table_2
> Returns hbck2 'assigns' command with all re-inserted regions.{noformat}





[jira] [Comment Edited] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Yechao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880112#comment-16880112
 ] 

Yechao Chen edited comment on HBASE-22665 at 7/8/19 8:33 AM:
-

Checked the code and the jstack output.

The WAL log roller is stuck, so AbstractFSWAL.shutdown waits for
AbstractFSWAL.rollWriter, which holds rollWriterLock.

It seems to happen like this:

1. AbstractFSWAL.rollWriter calls rollWriterLock.lock();

2. AsyncFSWAL.doReplaceWriter calls waitForSafePoint();

3. waitForSafePoint() never finishes

4. AbstractFSWAL.shutdown calls rollWriterLock.lock(); (waiting)

5. The RS process can't be aborted

 

 

 

 

 

with {color:#ff}at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.waitForSafePoint(AsyncFSWAL.java:628){color}

"regionserver/hbase-slave-216-99:16020.logRoller" #297 daemon prio=5 os_prio=0 
tid=0x7f202a4952c0 nid=0x34c2 waiting on condition [0x7f0fdd19f000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x7f18d60b93a8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
 {color:#ff}at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.waitForSafePoint(AsyncFSWAL.java:628){color}
 at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doReplaceWriter(AsyncFSWAL.java:656)
 at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doReplaceWriter(AsyncFSWAL.java:124)
 at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.replaceWriter(AbstractFSWAL.java:699)
 at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:759)
 at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:184)
 at java.lang.Thread.run(Thread.java:745)

 

"regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x7f18a49b2bb8> (a 
java.util.concurrent.locks.ReentrantLock$FairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
 at 
java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
 {color:#FF}at 
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285){color}
{color:#FF} at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815){color}
 at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
 at 
org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
 at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117)
 at java.lang.Thread.run(Thread.java:745)
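
The hang visible in the two stacks above can be sketched as a minimal,
standalone lock-ordering example (hypothetical illustration, not HBase code:
safePoint.wait() stands in for waitForSafePoint() never completing, and a
tryLock with a timeout stands in for shutdown's unconditional lock() so the
sketch terminates):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class RollWriterHangSketch {
    // Fair lock, mirroring AbstractFSWAL's rollWriterLock.
    static final ReentrantLock rollWriterLock = new ReentrantLock(true);

    // Returns whether a "shutdown" caller can take the lock while a
    // "roller" thread holds it and is parked waiting for a safe point.
    static boolean shutdownCanAcquire() throws InterruptedException {
        final Object safePoint = new Object();
        Thread roller = new Thread(() -> {
            rollWriterLock.lock();            // step 1: rollWriter takes the lock
            try {
                synchronized (safePoint) {
                    try {
                        safePoint.wait();     // steps 2-3: the "safe point" never arrives
                    } catch (InterruptedException ignored) {
                    }
                }
            } finally {
                rollWriterLock.unlock();
            }
        }, "logRoller");
        roller.setDaemon(true);
        roller.start();
        Thread.sleep(200);                    // let the roller grab the lock first

        // step 4: shutdown tries the same lock; a plain lock() here would
        // block forever, which is why the RS cannot abort (step 5).
        return rollWriterLock.tryLock(500, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shutdown acquired rollWriterLock: " + shutdownCanAcquire());
    }
}
```

In this sketch the attempt only returns because tryLock gives up after the
timeout; with an unconditional lock() the shutdown thread would park forever,
matching the jstack above.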

 

 

!image-2019-07-08-16-07-37-664.png!

 

!image-2019-07-08-16-08-26-777.png!

!image-2019-07-08-16-14-43-455.png!

 


was (Author: chenyechao):
check the the code and jstack ,

 

the wal log roller stuck ,so the .AbstractFSWAL.shutdown wait the 
AbstractFSWAL.rollWriter

with the lock rollwriterLock.lock 

 

it seems like that:

1、AbstractFSWAL.rollWriter  called rollWriterLock.lock();

2、AsyncFSWAL.doReplaceWriter called waitForSafePoint();

3、waitForSafePoint() can't finished 

4、AbstractFSWAL.shutdown called rollWriterLock.lock();(waiting)

5、The rs process can't be aborted

 

 

 

 

 

with {color:#FF}at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.waitForSafePoint(AsyncFSWAL.java:628){color}

"regionserver/hbase-slave-216-99:16020.logRoller" #297 daemon prio=5 os_prio=0 
tid=0x7f202a4952c0 nid=0x34c2 waiting on condition [0x7f0fdd19f000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x7f18d60b93a8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
 {color:#FF}at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.waitForSafePoint(AsyncFSWAL.java:628){color}
 at 

[jira] [Comment Edited] (HBASE-22618) Provide a way to have Heterogeneous deployment

2019-07-08 Thread Pierre Zemb (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880118#comment-16880118
 ] 

Pierre Zemb edited comment on HBASE-22618 at 7/8/19 8:26 AM:
-

Hi! Thanks [~apurtell] for your comment. I worked a bit on it last week:

 
 * I added the possibility to load a CostFunction 
[https://github.com/PierreZ/hbase/commit/9ddf356ee12f0b39ee0d33211834a718f0dd6194]
 It can only load a single function for now
 * I reimplemented my balancer as a cost function. Moving from a full balancer 
to a single cost function was a huge benefit for us, as we just need to 
implement 
[https://github.com/PierreZ/hbase/commit/ebe2a1501dda4deb150308a3b380de3bef5961ee#diff-53043f78e2be40cfbf3ff4344bb30bd0R69]

I will now backport my tests, and also add the possibility to load multiple 
cost functions

 


was (Author: pierrez):
Hi! Thanks [~apurtell] for your comment. I worked a bit on it last week:

 
 * I added the possibility to load a CostFunction 
[here|[https://github.com/PierreZ/hbase/commit/9ddf356ee12f0b39ee0d33211834a718f0dd6194]]
 It can only load a single function for now
 * I reimplemented my balancer as a cost function. Moving from a full balancer 
to a single cost function was a huge benefit for us, as we just need to 
implement 
[cost|[https://github.com/PierreZ/hbase/commit/ebe2a1501dda4deb150308a3b380de3bef5961ee#diff-53043f78e2be40cfbf3ff4344bb30bd0R69]]

I will now backport my tests, and also add the possibility to load multiple 
cost functions

 

> Provide a way to have Heterogeneous deployment
> --
>
> Key: HBASE-22618
> URL: https://issues.apache.org/jira/browse/HBASE-22618
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.6, 1.4.11
>Reporter: Pierre Zemb
>Priority: Major
>
> Hi,
> We would like to open the discussion about bringing the possibility to have 
> regions deployed on {color:#22}heterogeneous deployments{color}, i.e. an HBase 
> cluster running different kinds of hardware.
> h2. Why?
>  * Cloud deployments mean that we may not be able to have the same hardware 
> throughout the years
>  * Some tables may need special requirements such as SSDs whereas others 
> should be using hard drives
>  * *In our use case* (single table, dedicated HBase and Hadoop tuned for our 
> use case, good key distribution), *the number of regions per RS was the real 
> limit for us*.
> h2. Our usecase
> We found out that *in our usecase*(single table, dedicated HBase and Hadoop 
> tuned for our usecase, good key distribution)*, the number of regions per RS 
> was the real limit for us*.
> Over the years, due to historical reasons and also the need to benchmark new 
> machines, we ended up with different groups of hardware: some servers can 
> handle only 180 regions, whereas the biggest can handle more than 900. 
> Because of such a difference, we had to disable the LoadBalancing to avoid 
> the {{roundRobinAssigmnent}}. We developed some internal tooling which are 
> responsible for load balancing regions across RegionServers. That was 1.5 
> year ago.
> h2. Our Proof-of-concept
> We did work on a Proof-of-concept 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  and some early tests 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java],
>  and 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java].
>  We wrote the balancer for our use-case, which means that:
>  * there is one table
>  * there is no region-replica
>  * good key dispersion
>  * there is no regions on master
> A rule file is loaded before balancing. It contains lines of rules. A rule is 
> composed of a regexp for hostname, and a limit. For example, we could have:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> RegionServers with hostnames matching the first rule will have a limit of 
> 200, and the others 50. If there's no match, a default is set.
> Thanks to the rules, we have two pieces of information: the max number of 
> regions for this cluster, and the limit for each server. 
> {{HeterogeneousBalancer}} will try to balance regions according to their 
> capacity.
> Let's take an example. Let's say that we have 20 RS:
>  * 10 RS, named {{rs0}} through {{rs9}}, loaded with 60 regions each, and 
> each can 

[jira] [Commented] (HBASE-22618) Provide a way to have Heterogeneous deployment

2019-07-08 Thread Pierre Zemb (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880118#comment-16880118
 ] 

Pierre Zemb commented on HBASE-22618:
-

Hi! Thanks [~apurtell] for your comment. I worked a bit on it last week:

 
 * I added the possibility to load a CostFunction 
[here|https://github.com/PierreZ/hbase/commit/9ddf356ee12f0b39ee0d33211834a718f0dd6194]. 
It can only load a single function for now.
 * I reimplemented my balancer as a cost function. Moving from a full balancer 
to a single cost function was a huge benefit for us, as we just need to 
implement 
[cost|https://github.com/PierreZ/hbase/commit/ebe2a1501dda4deb150308a3b380de3bef5961ee#diff-53043f78e2be40cfbf3ff4344bb30bd0R69].

I will now backport my tests, and also add the possibility to load multiple 
cost functions
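
A heterogeneity-aware cost function of the kind described above could be sketched as follows. This is a hypothetical, self-contained illustration: the class and method names are made up, and it is not the actual HBase CostFunction API. The idea is that cost measures how far each server is from its capacity-scaled fair share.

```java
import java.util.Map;

// Hypothetical sketch: cost is the mean squared relative deviation of each
// server's region count from its capacity-scaled target, clamped to [0, 1].
public class HeterogeneousCostSketch {

    // regionCounts: current regions per server; limits: rule-derived capacity.
    static double cost(Map<String, Integer> regionCounts, Map<String, Integer> limits) {
        int totalRegions = regionCounts.values().stream().mapToInt(Integer::intValue).sum();
        int totalCapacity = limits.values().stream().mapToInt(Integer::intValue).sum();
        double utilization = (double) totalRegions / totalCapacity;
        double sum = 0;
        for (Map.Entry<String, Integer> e : regionCounts.entrySet()) {
            int limit = limits.get(e.getKey());
            double target = limit * utilization;          // this server's fair share
            double dev = (e.getValue() - target) / limit; // relative deviation
            sum += dev * dev;
        }
        return Math.min(1.0, sum / regionCounts.size());
    }

    public static void main(String[] args) {
        Map<String, Integer> limits = Map.of("rs0", 200, "rs10", 50);
        // Balanced per capacity (both at 48% of their limit) -> cost ~0.
        System.out.println(cost(Map.of("rs0", 96, "rs10", 24), limits) < 1e-9);
        // Equal absolute load (60 each): rs10 is far over its share -> positive cost.
        System.out.println(cost(Map.of("rs0", 60, "rs10", 60), limits) > 0.01);
    }
}
```

Since a stochastic balancer already minimizes a weighted sum of cost terms, expressing heterogeneity as one more term is far less code than a full balancer, which matches the benefit described in the comment.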

 

> Provide a way to have Heterogeneous deployment
> --
>
> Key: HBASE-22618
> URL: https://issues.apache.org/jira/browse/HBASE-22618
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.6, 1.4.11
>Reporter: Pierre Zemb
>Priority: Major
>
> Hi,
> We would like to open the discussion about bringing the possibility to have 
> regions deployed in a heterogeneous deployment, i.e. an HBase cluster 
> running different kinds of hardware.
> h2. Why?
>  * Cloud deployments mean that we may not be able to have the same hardware 
> throughout the years
>  * Some tables may have special requirements such as SSDs, whereas others 
> should be using hard drives
>  * *In our usecase* (single table, dedicated HBase and Hadoop tuned for our 
> usecase, good key distribution), *the number of regions per RS was the real 
> limit for us*.
> h2. Our usecase
> We found out that *in our usecase* (single table, dedicated HBase and Hadoop 
> tuned for our usecase, good key distribution), *the number of regions per RS 
> was the real limit for us*.
> Over the years, due to historical reasons and also the need to benchmark new 
> machines, we ended up with different groups of hardware: some servers can 
> handle only 180 regions, whereas the biggest can handle more than 900. 
> Because of such a difference, we had to disable the LoadBalancer to avoid 
> {{roundRobinAssignment}}. We developed some internal tooling which is 
> responsible for load balancing regions across RegionServers. That was 1.5 
> years ago.
> h2. Our Proof-of-concept
> We did work on a Proof-of-concept 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  and some early tests 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java],
>  
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java],
>  and 
> [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java].
>  We wrote the balancer for our use-case, which means that:
>  * there is one table
>  * there is no region-replica
>  * good key dispersion
>  * there are no regions on the master
> A rule file is loaded before balancing. It contains lines of rules. A rule is 
> composed of a regexp for hostname, and a limit. For example, we could have:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> RegionServers with a hostname matching the first rule will have a limit of 
> 200, and the others 50. If there's no match, a default is set.
> Thanks to the rules, we have two pieces of information: the max number of 
> regions for this cluster, and the limit for each server. 
> {{HeterogeneousBalancer}} will try to balance regions according to their 
> capacity.
> Let's take an example. Let's say that we have 20 RS:
>  * 10 RS, named {{rs0}} through {{rs9}}, loaded with 60 regions each, and 
> each can handle 200 regions.
>  * 10 RS, named {{rs10}} through {{rs19}}, loaded with 60 regions each, and 
> each can support 50 regions.
> Based on the following rules:
>  
> {quote}rs[0-9] 200
> rs1[0-9] 50
> {quote}
>  
> The second group is overloaded, whereas the first group has plenty of space.
> We know that we can handle at maximum *2500 regions* (200*10 + 50*10) and we 
> currently have *1200 regions* (60*20). {{HeterogeneousBalancer}} will 
> understand that the cluster is *full at 48.0%* (1200/2500). Based on this 
> information, we will then *try to bring all the RegionServers to ~48% load 
> according to the rules.* In this case, it will move regions from the second 
> group to the first.
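
The rule file and the capacity math in this example can be sketched in Java. This is a hypothetical illustration; the class and constants below are made up, not the actual HeterogeneousBalancer code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the rule-file semantics and utilization math
// described above; not the actual HeterogeneousBalancer implementation.
public class HeterogeneousRulesSketch {
    static final int DEFAULT_LIMIT = 100; // used when no rule matches

    // Rules: a regexp on hostname -> max regions for a matching server.
    static final Map<Pattern, Integer> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("rs[0-9]"), 200);
        RULES.put(Pattern.compile("rs1[0-9]"), 50);
    }

    static int limitFor(String hostname) {
        for (Map.Entry<Pattern, Integer> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(hostname).matches()) {
                return rule.getValue();
            }
        }
        return DEFAULT_LIMIT;
    }

    public static void main(String[] args) {
        // 20 RS (rs0..rs19), each currently holding 60 regions.
        int totalCapacity = 0;
        int totalRegions = 0;
        for (int i = 0; i < 20; i++) {
            totalCapacity += limitFor("rs" + i);
            totalRegions += 60;
        }
        double utilization = (double) totalRegions / totalCapacity;
        System.out.printf(Locale.ROOT, "capacity=%d regions=%d utilization=%.1f%%%n",
            totalCapacity, totalRegions, utilization * 100);
        // Each server's target is its own limit scaled by cluster utilization.
        System.out.println("target for rs0: " + Math.round(limitFor("rs0") * utilization));
        System.out.println("target for rs10: " + Math.round(limitFor("rs10") * utilization));
    }
}
```

With these rules the cluster capacity is 2500 and, at 1200 regions, utilization is 48%, so rs0's target is 96 regions and rs10's is 24 — the per-capacity load the example describes.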

[jira] [Comment Edited] (HBASE-22618) Provide a way to have Heterogeneous deployment

2019-07-08 Thread Pierre Zemb (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880118#comment-16880118
 ] 

Pierre Zemb edited comment on HBASE-22618 at 7/8/19 8:25 AM:
-

Hi! Thanks [~apurtell] for your comment. I worked a bit on it last week:

 
 * I added the possibility to load a CostFunction 
[here|https://github.com/PierreZ/hbase/commit/9ddf356ee12f0b39ee0d33211834a718f0dd6194]. 
It can only load a single function for now.
 * I reimplemented my balancer as a cost function. Moving from a full balancer 
to a single cost function was a huge benefit for us, as we just need to 
implement 
[cost|https://github.com/PierreZ/hbase/commit/ebe2a1501dda4deb150308a3b380de3bef5961ee#diff-53043f78e2be40cfbf3ff4344bb30bd0R69].

I will now backport my tests, and also add the possibility to load multiple 
cost functions

 


was (Author: pierrez):
Hi! Thanks [~apurtell] for your comment. I worked a bit on it last week:

 
 * I added the possibility to load a CostFunction 
[here|[https://github.com/PierreZ/hbase/commit/9ddf356ee12f0b39ee0d33211834a718f0dd6194].]
 It can only load a single function for now
 * I reimplemented my balancer as a cost function. Moving from a full balancer 
to a single cost function was a huge benefit for us, as we just need to 
implement [cost 
title|[https://github.com/PierreZ/hbase/commit/ebe2a1501dda4deb150308a3b380de3bef5961ee#diff-53043f78e2be40cfbf3ff4344bb30bd0R69]]

I will now backport my tests, and also add the possibility to load multiple 
cost functions

 

> Provide a way to have Heterogeneous deployment
> --
>
> Key: HBASE-22618
> URL: https://issues.apache.org/jira/browse/HBASE-22618
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.6, 1.4.11
>Reporter: Pierre Zemb
>Priority: Major
>

[jira] [Updated] (HBASE-22664) Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded

2019-07-08 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-22664:
--
Summary: Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded  
(was: Move protobuf sutff in hbase-rsgroup to hbase-protocol-shaded)

> Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded
> -
>
> Key: HBASE-22664
> URL: https://issues.apache.org/jira/browse/HBASE-22664
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [hbase] Apache-HBase commented on issue #323: HBASE-22414 Interruption of moving regions in RSGroup will cause regi…

2019-07-08 Thread GitBox
Apache-HBase commented on issue #323: HBASE-22414 Interruption of moving 
regions in RSGroup will cause regi…
URL: https://github.com/apache/hbase/pull/323#issuecomment-509127869
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 170 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | hbaseanti | 0 |  Patch does not have any anti-patterns. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 2 new or modified test 
files. |
   ||| _ master Compile Tests _ |
   | +1 | mvninstall | 324 | master passed |
   | +1 | compile | 31 | master passed |
   | +1 | checkstyle | 14 | master passed |
   | +1 | shadedjars | 334 | branch has no errors when building our shaded 
downstream artifacts. |
   | +1 | findbugs | 50 | master passed |
   | +1 | javadoc | 24 | master passed |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 305 | the patch passed |
   | +1 | compile | 30 | the patch passed |
   | +1 | javac | 30 | the patch passed |
   | -1 | checkstyle | 15 | hbase-rsgroup: The patch generated 9 new + 2 
unchanged - 0 fixed = 11 total (was 2) |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedjars | 332 | patch has no errors when building our shaded 
downstream artifacts. |
   | +1 | hadoopcheck | 956 | Patch does not cause any errors with Hadoop 2.8.5 
2.9.2 or 3.1.2. |
   | +1 | findbugs | 59 | the patch passed |
   | +1 | javadoc | 24 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 485 | hbase-rsgroup in the patch passed. |
   | +1 | asflicense | 11 | The patch does not generate ASF License warnings. |
   | | | 3565 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-323/9/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hbase/pull/323 |
   | Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
   | uname | Linux faf329789726 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | /testptch/patchprocess/precommit/personality/provided.sh |
   | git revision | master / 605f8a15bb |
   | maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
   | Default Java | 1.8.0_181 |
   | findbugs | v3.1.11 |
   | checkstyle | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-323/9/artifact/out/diff-checkstyle-hbase-rsgroup.txt
 |
   |  Test Results | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-323/9/testReport/
 |
   | Max. process+thread count | 4618 (vs. ulimit of 1) |
   | modules | C: hbase-rsgroup U: hbase-rsgroup |
   | Console output | 
https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-323/9/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HBASE-22664) Move protobuf sutff in hbase-rsgroup to hbase-protocol-shaded

2019-07-08 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880115#comment-16880115
 ] 

Duo Zhang commented on HBASE-22664:
---

[~zghaobac] PTAL.

Plan to do this on a feature branch, HBASE-22514.

> Move protobuf sutff in hbase-rsgroup to hbase-protocol-shaded
> -
>
> Key: HBASE-22664
> URL: https://issues.apache.org/jira/browse/HBASE-22664
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [hbase] Apache9 opened a new pull request #362: HBASE-22664 Move protobuf sutff in hbase-rsgroup to hbase-protocol-sh…

2019-07-08 Thread GitBox
Apache9 opened a new pull request #362: HBASE-22664 Move protobuf sutff in 
hbase-rsgroup to hbase-protocol-sh…
URL: https://github.com/apache/hbase/pull/362
 
 
   …aded


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (HBASE-22664) Move protobuf sutff in hbase-rsgroup to hbase-protocol-shaded

2019-07-08 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reassigned HBASE-22664:
-

Assignee: Duo Zhang

> Move protobuf sutff in hbase-rsgroup to hbase-protocol-shaded
> -
>
> Key: HBASE-22664
> URL: https://issues.apache.org/jira/browse/HBASE-22664
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Yechao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880112#comment-16880112
 ] 

Yechao Chen commented on HBASE-22665:
-

I checked the code and the jstack output.

The WAL log roller is stuck, so AbstractFSWAL.shutdown waits for
AbstractFSWAL.rollWriter, which holds rollWriterLock.

It seems the sequence is:

1. AbstractFSWAL.rollWriter called rollWriterLock.lock();
2. AsyncFSWAL.doReplaceWriter called waitForSafePoint();
3. waitForSafePoint() never finishes
4. AbstractFSWAL.shutdown called rollWriterLock.lock(); (waiting)
5. The RS process can't be aborted
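
The five steps above amount to a lock hand-off that never completes, which can be reproduced with a minimal, self-contained sketch. The class and field names below are hypothetical stand-ins for rollWriterLock and the safe-point condition, not the actual HBase classes:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reproduction of the hang: the roller holds the roll-writer
// lock while waiting for a safe point that is never reached, so shutdown
// can never acquire the lock and the abort path stalls.
public class WalShutdownHangSketch {
    static final ReentrantLock rollWriterLock = new ReentrantLock(true); // fair, as in AbstractFSWAL
    static final ReentrantLock consumeLock = new ReentrantLock();
    static final Condition safePoint = consumeLock.newCondition();

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch rollerHoldsLock = new CountDownLatch(1);

        // Log-roller thread: step 1 takes rollWriterLock, then steps 2-3
        // block forever in this stand-in for waitForSafePoint().
        Thread roller = new Thread(() -> {
            rollWriterLock.lock();
            rollerHoldsLock.countDown();
            consumeLock.lock();
            try {
                safePoint.awaitUninterruptibly(); // never signalled
            } finally {
                consumeLock.unlock();
            }
        });
        roller.setDaemon(true);
        roller.start();
        rollerHoldsLock.await();

        // Step 4: shutdown tries the same lock; a timed tryLock makes the
        // hang observable instead of parking forever like shutdown() does.
        boolean acquired = rollWriterLock.tryLock(500, TimeUnit.MILLISECONDS);
        System.out.println("shutdown acquired rollWriterLock: " + acquired);
    }
}
```

Because the real shutdown path uses a plain lock() rather than a timed tryLock, the abort can only proceed once the safe point is reached, which explains step 5.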

 

 

 

 

 

The log roller thread is parked at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.waitForSafePoint(AsyncFSWAL.java:628):

"regionserver/hbase-slave-216-99:16020.logRoller" #297 daemon prio=5 os_prio=0 
tid=0x7f202a4952c0 nid=0x34c2 waiting on condition [0x7f0fdd19f000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x7f18d60b93a8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
 at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.waitForSafePoint(AsyncFSWAL.java:628)
 at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doReplaceWriter(AsyncFSWAL.java:656)
 at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doReplaceWriter(AsyncFSWAL.java:124)
 at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.replaceWriter(AbstractFSWAL.java:699)
 at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:759)
 at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:184)
 at java.lang.Thread.run(Thread.java:745)

 

"regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x7f18a49b2bb8> (a 
java.util.concurrent.locks.ReentrantLock$FairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
 at 
java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
 at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
 at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815)
 at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
 at 
org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
 at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117)
 at java.lang.Thread.run(Thread.java:745)

 

 

!image-2019-07-08-16-07-37-664.png!

 

!image-2019-07-08-16-08-26-777.png!

!image-2019-07-08-16-14-43-455.png!

 

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, 
> image-2019-07-08-16-08-26-777.png, image-2019-07-08-16-14-43-455.png, 
> jstack_20190625, jstack_20190704_1, jstack_20190704_2
>
>
> We use HBase 2.1.2. When the RS is under heavy QPS, the RS aborts with an 
> error like "Caused by: 
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
> result after 30 ms for txid=36380334, WAL system stuck?"
>  
> The RegionServer abort failed because AbstractFSWAL.shutdown hangs.
>  
> jstack output always shows the regionserver hanging in 
> "AbstractFSWAL.shutdown":
> "regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
> tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7f18a49b2bb8> (a 
> java.util.concurrent.locks.ReentrantLock$FairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> 

[jira] [Updated] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Yechao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yechao Chen updated HBASE-22665:

Attachment: image-2019-07-08-16-14-43-455.png

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, 
> image-2019-07-08-16-08-26-777.png, image-2019-07-08-16-14-43-455.png, 
> jstack_20190625, jstack_20190704_1, jstack_20190704_2
>
>
> We use HBase 2.1.2. When the RS is under heavy QPS, the RS aborts with an 
> error like "Caused by: 
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
> result after 30 ms for txid=36380334, WAL system stuck?"
>  
> The RegionServer abort failed because AbstractFSWAL.shutdown hangs.
>  
> jstack output always shows the regionserver hanging in 
> "AbstractFSWAL.shutdown":
> "regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
> tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7f18a49b2bb8> (a 
> java.util.concurrent.locks.ReentrantLock$FairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
>  at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>  at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815)
>  at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
>  at 
> org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
>  at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117)
>  at java.lang.Thread.run(Thread.java:745)
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Yechao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yechao Chen updated HBASE-22665:

Attachment: image-2019-07-08-16-08-26-777.png

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, 
> image-2019-07-08-16-08-26-777.png, jstack_20190625, jstack_20190704_1, 
> jstack_20190704_2
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Yechao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yechao Chen updated HBASE-22665:

Attachment: image-2019-07-08-16-07-37-664.png

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: image-2019-07-08-16-07-37-664.png, jstack_20190625, 
> jstack_20190704_1, jstack_20190704_2
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Yechao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yechao Chen updated HBASE-22665:

Attachment: jstack_20190704_1
jstack_20190625
jstack_20190704_2

> RegionServer abort failed when AbstractFSWAL.shutdown hang
> --
>
> Key: HBASE-22665
> URL: https://issues.apache.org/jira/browse/HBASE-22665
> Project: HBase
>  Issue Type: Bug
> Environment: HBase 2.1.2
> Hadoop 3.1.x
> centos 7.4
>Reporter: Yechao Chen
>Priority: Major
> Attachments: jstack_20190625, jstack_20190704_1, jstack_20190704_2
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-22665) RegionServer abort failed when AbstractFSWAL.shutdown hang

2019-07-08 Thread Yechao Chen (JIRA)
Yechao Chen created HBASE-22665:
---

 Summary: RegionServer abort failed when AbstractFSWAL.shutdown hang
 Key: HBASE-22665
 URL: https://issues.apache.org/jira/browse/HBASE-22665
 Project: HBase
  Issue Type: Bug
 Environment: HBase 2.1.2

Hadoop 3.1.x

centos 7.4
Reporter: Yechao Chen


We use HBase 2.1.2. When the RS is under heavy QPS, it aborts after an error like 
"Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to 
get sync result after 30 ms for txid=36380334, WAL system stuck?"

 

RegionServer abort fails when AbstractFSWAL.shutdown hangs.

 

jstack output consistently shows the RegionServer stuck in "AbstractFSWAL.shutdown":

"regionserver/hbase-slave-216-99:16020" #25 daemon prio=5 os_prio=0 
tid=0x7f204282c600 nid=0x34aa waiting on condition [0x7f0fe044d000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x7f18a49b2bb8> (a 
java.util.concurrent.locks.ReentrantLock$FairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
 at 
java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
 {color:#FF}at 
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285){color}
{color:#FF} at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:815){color}
 at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.shutdown(AbstractFSWALProvider.java:168)
 at 
org.apache.hadoop.hbase.wal.RegionGroupingProvider.shutdown(RegionGroupingProvider.java:221)
 at org.apache.hadoop.hbase.wal.WALFactory.shutdown(WALFactory.java:239)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.shutdownWAL(HRegionServer.java:1445)
 {color:#FF}at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1117){color}
{color:#FF} at java.lang.Thread.run(Thread.java:745){color}
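The thread above is parked forever in ReentrantLock.lock() inside AbstractFSWAL.shutdown, so the abort path in HRegionServer.run never completes. A minimal sketch of that blocking shape, plus a possible mitigation using a timed tryLock so a shutting-down thread can give up instead of waiting indefinitely. The class and method names below are hypothetical illustrations, not the actual HBase code or an agreed fix:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TimedShutdown {
    // Fair lock, mirroring the ReentrantLock$FairSync the jstack shows
    // the shutdown thread parked on.
    final ReentrantLock rollWriterLock = new ReentrantLock(true);

    // Blocking shape of the hang: lock() parks the caller until the
    // current holder releases -- forever, if the holder itself is stuck.
    public void shutdownBlocking() {
        rollWriterLock.lock();
        try {
            // close the WAL writer here ...
        } finally {
            rollWriterLock.unlock();
        }
    }

    // Hedged alternative: bound the wait. Returns false if the lock was
    // not acquired within timeoutMs, letting the caller escalate (for
    // example, proceed with a forced abort) instead of hanging.
    public boolean shutdownWithTimeout(long timeoutMs) {
        boolean acquired;
        try {
            acquired = rollWriterLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            // close the WAL writer here ...
            return true;
        } finally {
            rollWriterLock.unlock();
        }
    }
}
```

Whether a timed acquire is safe here depends on what the lock holder is doing; if a sync against the filesystem is wedged, giving up on the lock only helps when the abort path can then skip the clean WAL close.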




[jira] [Created] (HBASE-22664) Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded

2019-07-08 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-22664:
-

 Summary: Move protobuf stuff in hbase-rsgroup to hbase-protocol-shaded
 Key: HBASE-22664
 URL: https://issues.apache.org/jira/browse/HBASE-22664
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang





