[jira] [Commented] (HBASE-21553) schedLock not released in MasterProcedureScheduler

2018-12-10 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716035#comment-16716035
 ] 

Sean Busbey commented on HBASE-21553:
-

It looks like the shared lock check for the namespace was added in 
HBASE-15105, which means branch-1.2 doesn't have the missed lock release.

Cleaning up the unlocks to use try/finally is probably still a good idea, but 
it is probably better done in a separate JIRA so that folks don't think that 
cleanup is fixing the same risk of deadlock.
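
For illustration only, here is a minimal sketch of the try/finally unlock 
pattern mentioned above. It is not the MasterProcedureScheduler code: the 
class, the methods, and the counter are hypothetical, and the lock is assumed 
to be a java.util.concurrent ReentrantLock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a scheduler guarded by one lock; not HBase code.
final class LockedSchedulerSketch {
  private final ReentrantLock schedLock = new ReentrantLock();
  private int sharedLockHolders = 0;

  // Returns true if the shared lock was granted. Every exit path, including
  // the early return, passes through the finally block, so schedLock is
  // always released.
  boolean tryAcquireShared(boolean exclusiveHeld) {
    schedLock.lock();
    try {
      if (exclusiveHeld) {
        return false; // early exit: finally still unlocks
      }
      sharedLockHolders++;
      return true;
    } finally {
      schedLock.unlock();
    }
  }

  // True while any thread holds schedLock.
  boolean isSchedLockHeld() {
    return schedLock.isLocked();
  }

  int sharedLockHolders() {
    return sharedLockHolders;
  }

  public static void main(String[] args) {
    LockedSchedulerSketch scheduler = new LockedSchedulerSketch();
    System.out.println("granted: " + scheduler.tryAcquireShared(true));
    System.out.println("lock still held: " + scheduler.isSchedLockHeld());
  }
}
```

With the unlock in a finally block, adding another early return later cannot 
reintroduce a missed release.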

> schedLock not released in MasterProcedureScheduler
> --
>
> Key: HBASE-21553
> URL: https://issues.apache.org/jira/browse/HBASE-21553
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.10
>
> Attachments: HBASE-21553-branch-1.001.patch, 
> HBASE-21553-branch-1.002.patch
>
>
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749
> As shown above, we didn't unlock schedLock, which can cause deadlock.
> Besides this, other places in this class handle schedLock.unlock in a 
> risky manner. I'd like to move them into finally blocks to improve the 
> robustness of lock handling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21553) schedLock not released in MasterProcedureScheduler

2018-12-10 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21553:

Component/s: proc-v2

> schedLock not released in MasterProcedureScheduler
> --
>
> Key: HBASE-21553
> URL: https://issues.apache.org/jira/browse/HBASE-21553
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.10
>
> Attachments: HBASE-21553-branch-1.001.patch, 
> HBASE-21553-branch-1.002.patch
>
>
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749
> As shown above, we didn't unlock schedLock, which can cause deadlock.
> Besides this, other places in this class handle schedLock.unlock in a 
> risky manner. I'd like to move them into finally blocks to improve the 
> robustness of lock handling.





[jira] [Updated] (HBASE-21553) schedLock not released in MasterProcedureScheduler

2018-12-10 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21553:

Issue Type: Bug  (was: Improvement)

> schedLock not released in MasterProcedureScheduler
> --
>
> Key: HBASE-21553
> URL: https://issues.apache.org/jira/browse/HBASE-21553
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.10
>
> Attachments: HBASE-21553-branch-1.001.patch, 
> HBASE-21553-branch-1.002.patch
>
>
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749
> As shown above, we didn't unlock schedLock, which can cause deadlock.
> Besides this, other places in this class handle schedLock.unlock in a 
> risky manner. I'd like to move them into finally blocks to improve the 
> robustness of lock handling.





[jira] [Updated] (HBASE-21553) schedLock not released in MasterProcedureScheduler

2018-12-10 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21553:

Priority: Critical  (was: Major)

> schedLock not released in MasterProcedureScheduler
> --
>
> Key: HBASE-21553
> URL: https://issues.apache.org/jira/browse/HBASE-21553
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.10
>
> Attachments: HBASE-21553-branch-1.001.patch, 
> HBASE-21553-branch-1.002.patch
>
>
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749
> As shown above, we didn't unlock schedLock, which can cause deadlock.
> Besides this, other places in this class handle schedLock.unlock in a 
> risky manner. I'd like to move them into finally blocks to improve the 
> robustness of lock handling.





[jira] [Resolved] (HBASE-21283) Add new shell command 'rit' for listing regions in transition

2018-12-10 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey resolved HBASE-21283.
-
  Resolution: Fixed
Release Note: 


The HBase `shell` now includes a command to list regions currently in 
transition.

```
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.5.0-SNAPSHOT, r9bb6d2fa8b760f16cd046657240ebd4ad91cb6de, Mon Oct  8 
21:05:50 UTC 2018

hbase(main):001:0> help 'rit'
List all regions in transition.
Examples:
  hbase> rit

hbase(main):002:0> create ...
0 row(s) in 2.5150 seconds
=> Hbase::Table - IntegrationTestBigLinkedList

hbase(main):003:0> rit
0 row(s) in 0.0340 seconds

hbase(main):004:0> unassign '56f0c38c81ae453d19906ce156a2d6a1'
0 row(s) in 0.0540 seconds

hbase(main):005:0> rit 
IntegrationTestBigLinkedList,L\xCC\xCC\xCC\xCC\xCC\xCC\xCB,1539117183224.56f0c38c81ae453d19906ce156a2d6a1.
 state=PENDING_CLOSE, ts=Tue Oct 09 20:33:34 UTC 2018 (0s ago), server=null
1 row(s) in 0.0170 seconds
```

> Add new shell command 'rit' for listing regions in transition
> -
>
> Key: HBASE-21283
> URL: https://issues.apache.org/jira/browse/HBASE-21283
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability, shell
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-21283-branch-1.patch, HBASE-21283-branch-1.patch, 
> HBASE-21283-branch-1.patch, HBASE-21283.patch, HBASE-21283.patch, 
> HBASE-21283.patch
>
>
> The 'status' shell command shows regions in transition but sometimes an 
> operator may want to retrieve a simple list of regions in transition. Here's 
> a patch that adds a new 'rit' command to the TOOLS group that does just that. 
> No test, because it seems hard to mock RITs from the ruby test code, but I 
> have run TestShell and it passes, so the command is verified to meet minimum 
> requirements, like help text, and manually verified with branch-1 (shell in 
> branch-2 and up doesn't return until TransitRegionProcedure has completed so 
> by that time no RIT):
> {noformat}
> HBase Shell
> Use "help" to get list of supported commands.
> Use "exit" to quit this interactive shell.
> Version 1.5.0-SNAPSHOT, r9bb6d2fa8b760f16cd046657240ebd4ad91cb6de, Mon Oct  8 
> 21:05:50 UTC 2018
> hbase(main):001:0> help 'rit'
> List all regions in transition.
> Examples:
>   hbase> rit
> hbase(main):002:0> create ...
> 0 row(s) in 2.5150 seconds
> => Hbase::Table - IntegrationTestBigLinkedList
> hbase(main):003:0> rit
> 0 row(s) in 0.0340 seconds
> hbase(main):004:0> unassign '56f0c38c81ae453d19906ce156a2d6a1'
> 0 row(s) in 0.0540 seconds
> hbase(main):005:0> rit 
> IntegrationTestBigLinkedList,L\xCC\xCC\xCC\xCC\xCC\xCC\xCB,1539117183224.56f0c38c81ae453d19906ce156a2d6a1.
>  state=PENDING_CLOSE, ts=Tue Oct 09 20:33:34 UTC 2018 (0s ago), server=null
> 1 row(s) in 0.0170 seconds
> {noformat}





[jira] [Reopened] (HBASE-21283) Add new shell command 'rit' for listing regions in transition

2018-12-10 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reopened HBASE-21283:
-

> Add new shell command 'rit' for listing regions in transition
> -
>
> Key: HBASE-21283
> URL: https://issues.apache.org/jira/browse/HBASE-21283
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability, shell
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-21283-branch-1.patch, HBASE-21283-branch-1.patch, 
> HBASE-21283-branch-1.patch, HBASE-21283.patch, HBASE-21283.patch, 
> HBASE-21283.patch
>
>
> The 'status' shell command shows regions in transition but sometimes an 
> operator may want to retrieve a simple list of regions in transition. Here's 
> a patch that adds a new 'rit' command to the TOOLS group that does just that. 
> No test, because it seems hard to mock RITs from the ruby test code, but I 
> have run TestShell and it passes, so the command is verified to meet minimum 
> requirements, like help text, and manually verified with branch-1 (shell in 
> branch-2 and up doesn't return until TransitRegionProcedure has completed so 
> by that time no RIT):
> {noformat}
> HBase Shell
> Use "help" to get list of supported commands.
> Use "exit" to quit this interactive shell.
> Version 1.5.0-SNAPSHOT, r9bb6d2fa8b760f16cd046657240ebd4ad91cb6de, Mon Oct  8 
> 21:05:50 UTC 2018
> hbase(main):001:0> help 'rit'
> List all regions in transition.
> Examples:
>   hbase> rit
> hbase(main):002:0> create ...
> 0 row(s) in 2.5150 seconds
> => Hbase::Table - IntegrationTestBigLinkedList
> hbase(main):003:0> rit
> 0 row(s) in 0.0340 seconds
> hbase(main):004:0> unassign '56f0c38c81ae453d19906ce156a2d6a1'
> 0 row(s) in 0.0540 seconds
> hbase(main):005:0> rit 
> IntegrationTestBigLinkedList,L\xCC\xCC\xCC\xCC\xCC\xCC\xCB,1539117183224.56f0c38c81ae453d19906ce156a2d6a1.
>  state=PENDING_CLOSE, ts=Tue Oct 09 20:33:34 UTC 2018 (0s ago), server=null
> 1 row(s) in 0.0170 seconds
> {noformat}





[jira] [Commented] (HBASE-21410) A helper page that help find all problematic regions and procedures

2018-12-10 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715798#comment-16715798
 ] 

Sean Busbey commented on HBASE-21410:
-

Please reopen and then resolve again so you can add a release note calling this 
out.

> A helper page that help find all problematic regions and procedures
> ---
>
> Key: HBASE-21410
> URL: https://issues.apache.org/jira/browse/HBASE-21410
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.2.0, 2.1.1
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.2
>
> Attachments: HBASE-21410.branch-2.1.001.patch, 
> HBASE-21410.branch-2.1.002.patch, HBASE-21410.master.001.patch, 
> HBASE-21410.master.002.patch, HBASE-21410.master.003.patch, 
> HBASE-21410.master.004.patch, Screenshot from 2018-10-30 19-06-21.png, 
> Screenshot from 2018-10-30 19-06-42.png, Screenshot from 2018-10-31 
> 10-11-38.png, Screenshot from 2018-10-31 10-11-56.png, Screenshot from 
> 2018-11-01 17-56-02.png, Screenshot from 2018-11-01 17-56-15.png
>
>
> *This page mainly focuses on finding regions that are stuck in some state 
> and cannot be assigned. My proposal for the page is as follows:*
> !Screenshot from 2018-10-30 19-06-21.png!
> *From this page we can see all regions in the RIT queue and their related 
> procedures. If we determine that these regions' states are abnormal, we 
> can click the link 'Procedures as TXT' to get a full list of procedure IDs to 
> bypass. Then click 'Regions as TXT' to get a full list of encoded region 
> names to assign.*
> !Screenshot from 2018-10-30 19-06-42.png!
> *Some region names are covered by the navigation bar; I'll fix that later.*





[jira] [Updated] (HBASE-21410) A helper page that help find all problematic regions and procedures

2018-12-10 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21410:

Fix Version/s: (was: 2.1.0)
   2.1.2

> A helper page that help find all problematic regions and procedures
> ---
>
> Key: HBASE-21410
> URL: https://issues.apache.org/jira/browse/HBASE-21410
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.2.0, 2.1.1
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.2
>
> Attachments: HBASE-21410.branch-2.1.001.patch, 
> HBASE-21410.branch-2.1.002.patch, HBASE-21410.master.001.patch, 
> HBASE-21410.master.002.patch, HBASE-21410.master.003.patch, 
> HBASE-21410.master.004.patch, Screenshot from 2018-10-30 19-06-21.png, 
> Screenshot from 2018-10-30 19-06-42.png, Screenshot from 2018-10-31 
> 10-11-38.png, Screenshot from 2018-10-31 10-11-56.png, Screenshot from 
> 2018-11-01 17-56-02.png, Screenshot from 2018-11-01 17-56-15.png
>
>
> *This page mainly focuses on finding regions that are stuck in some state 
> and cannot be assigned. My proposal for the page is as follows:*
> !Screenshot from 2018-10-30 19-06-21.png!
> *From this page we can see all regions in the RIT queue and their related 
> procedures. If we determine that these regions' states are abnormal, we 
> can click the link 'Procedures as TXT' to get a full list of procedure IDs to 
> bypass. Then click 'Regions as TXT' to get a full list of encoded region 
> names to assign.*
> !Screenshot from 2018-10-30 19-06-42.png!
> *Some region names are covered by the navigation bar; I'll fix that later.*





[jira] [Resolved] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-06 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey resolved HBASE-21551.
-
  Resolution: Fixed
Release Note: 

### Summary
HBase clusters will experience Region Server failures from out-of-memory 
errors caused by a resource leak under any of the following conditions:

* User initiates Scan operations set to use the STREAM reading type
* User initiates Scan operations set to use the default reading type that read 
more than 4 * the block size of column families involved in the scan (e.g. by 
default 4*64KiB)
* Compactions run

### Root cause

When there are long running scans the Region Server process attempts to 
optimize access by using a different API geared towards sequential access. Due 
to an error in HBASE-20704 for HBase 2.0+ the Region Server fails to release 
related resources when those scans finish. That same optimization path is 
always used for the HBase internal file compaction process.

### Workaround

The impact of this error can be minimized by setting the config value 
"hbase.storescanner.pread.max.bytes" to MAX_INT (2147483647) to avoid the 
optimization for default user scans. Clients should also be checked to ensure 
they do not pass the STREAM read type to the Scan API. Note that this 
workaround will have a severe impact on performance for long scans.

Compactions always use this sequential optimized reading mechanism so 
downstream users will need to periodically restart Region Server roles after 
compactions have happened.
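
To make the threshold above concrete, here is a rough sketch of the arithmetic 
only. The class and both helper methods are hypothetical and are not HBase 
APIs; the "more than" comparison follows the wording of this note rather than 
the Region Server source.

```java
// Hypothetical illustration of the numbers in the release note; not HBase code.
final class PreadThresholdSketch {
  // Default block size quoted in the note: 64 KiB.
  static final long DEFAULT_BLOCK_SIZE_BYTES = 64L * 1024;

  // Default threshold described in the note: 4 * the block size.
  static long defaultPreadMaxBytes(long blockSizeBytes) {
    return 4L * blockSizeBytes;
  }

  // True if a default-type scan that has read bytesRead bytes would cross
  // the threshold and move to the sequential (stream) read path.
  static boolean wouldSwitchToStream(long bytesRead, long preadMaxBytes) {
    return bytesRead > preadMaxBytes;
  }

  public static void main(String[] args) {
    long threshold = defaultPreadMaxBytes(DEFAULT_BLOCK_SIZE_BYTES);
    long oneMiB = 1024L * 1024;
    System.out.println("default threshold bytes: " + threshold);
    System.out.println("1 MiB scan, default: "
        + wouldSwitchToStream(oneMiB, threshold));
    System.out.println("1 MiB scan, MAX_INT: "
        + wouldSwitchToStream(oneMiB, Integer.MAX_VALUE));
  }
}
```

With the default 64 KiB block size the threshold works out to 262144 bytes, 
which is why even modest scans reach the leaking path unless the config value 
is raised.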

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, 
> HBASE-21551.v3.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
>     |---> StoreScanner()
>       |---> StoreFileScanner#getScannersForStoreFiles
>         |---> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but do not remove it from streamReaders until the store file is closed. 
> So if we scan with STREAM many times, the streamReaders hash map keeps 
> growing. The heap dump is shown in the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a 
> cluster (RS heap size is 50g); the RS frequently ran into long full GCs 
> (~110 sec).
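
The leak pattern described above can be sketched in a few lines. This is a 
hypothetical stand-in, not the HStoreFile code and not the committed patch: 
the Reader type and all method names are invented for illustration, and only 
the streamReaders name comes from the description.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry showing why tracking readers without removing them leaks.
final class StreamReaderRegistrySketch {
  // Stand-in for a per-scan stream reader.
  static final class Reader {}

  // Concurrent set backed by a ConcurrentHashMap, mirroring "streamReaders".
  private final Set<Reader> streamReaders = ConcurrentHashMap.newKeySet();

  // Every stream scan registers a new reader.
  Reader openStreamReader() {
    Reader reader = new Reader();
    streamReaders.add(reader);
    return reader;
  }

  // The step the description says was missing: forget the reader when the
  // scan that owns it finishes, instead of waiting for the file to close.
  void closeStreamReader(Reader reader) {
    streamReaders.remove(reader);
  }

  int trackedReaders() {
    return streamReaders.size();
  }

  public static void main(String[] args) {
    StreamReaderRegistrySketch leaky = new StreamReaderRegistrySketch();
    StreamReaderRegistrySketch tidy = new StreamReaderRegistrySketch();
    for (int i = 0; i < 1000; i++) {
      leaky.openStreamReader();                         // never released
      tidy.closeStreamReader(tidy.openStreamReader());  // released per scan
    }
    System.out.println("leaky tracks " + leaky.trackedReaders());
    System.out.println("tidy tracks " + tidy.trackedReaders());
  }
}
```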





[jira] [Reopened] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-06 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reopened HBASE-21551:
-

Reopening so I can add a release note.

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, 
> HBASE-21551.v3.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
>     |---> StoreScanner()
>       |---> StoreFileScanner#getScannersForStoreFiles
>         |---> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but do not remove it from streamReaders until the store file is closed. 
> So if we scan with STREAM many times, the streamReaders hash map keeps 
> growing. The heap dump is shown in the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a 
> cluster (RS heap size is 50g); the RS frequently ran into long full GCs 
> (~110 sec).





[jira] [Updated] (HBASE-21283) Add new shell command 'rit' for listing regions in transition

2018-12-06 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21283:

Component/s: Operability

> Add new shell command 'rit' for listing regions in transition
> -
>
> Key: HBASE-21283
> URL: https://issues.apache.org/jira/browse/HBASE-21283
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability, shell
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-21283-branch-1.patch, HBASE-21283-branch-1.patch, 
> HBASE-21283-branch-1.patch, HBASE-21283.patch, HBASE-21283.patch, 
> HBASE-21283.patch
>
>
> The 'status' shell command shows regions in transition but sometimes an 
> operator may want to retrieve a simple list of regions in transition. Here's 
> a patch that adds a new 'rit' command to the TOOLS group that does just that. 
> No test, because it seems hard to mock RITs from the ruby test code, but I 
> have run TestShell and it passes, so the command is verified to meet minimum 
> requirements, like help text, and manually verified with branch-1 (shell in 
> branch-2 and up doesn't return until TransitRegionProcedure has completed so 
> by that time no RIT):
> {noformat}
> HBase Shell
> Use "help" to get list of supported commands.
> Use "exit" to quit this interactive shell.
> Version 1.5.0-SNAPSHOT, r9bb6d2fa8b760f16cd046657240ebd4ad91cb6de, Mon Oct  8 
> 21:05:50 UTC 2018
> hbase(main):001:0> help 'rit'
> List all regions in transition.
> Examples:
>   hbase> rit
> hbase(main):002:0> create ...
> 0 row(s) in 2.5150 seconds
> => Hbase::Table - IntegrationTestBigLinkedList
> hbase(main):003:0> rit
> 0 row(s) in 0.0340 seconds
> hbase(main):004:0> unassign '56f0c38c81ae453d19906ce156a2d6a1'
> 0 row(s) in 0.0540 seconds
> hbase(main):005:0> rit 
> IntegrationTestBigLinkedList,L\xCC\xCC\xCC\xCC\xCC\xCC\xCB,1539117183224.56f0c38c81ae453d19906ce156a2d6a1.
>  state=PENDING_CLOSE, ts=Tue Oct 09 20:33:34 UTC 2018 (0s ago), server=null
> 1 row(s) in 0.0170 seconds
> {noformat}





[jira] [Updated] (HBASE-21548) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-04 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21548:

Issue Type: Improvement  (was: Bug)

> WAL writer for recovered.edits file in WalSplitting should not require hflush 
> from filesystem
> -
>
> Key: HBASE-21548
> URL: https://issues.apache.org/jira/browse/HBASE-21548
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 2.2.0
>
>
> We ran into a problem running HBase on top of Azure filesystems, as 
> described in HBASE-21544.
> The quick solution was to backport HBASE-20734 to branch-2.0 to solve this 
> issue. However, it is incorrect for HBase to have the recovered.edits writer 
> asserting more stringent requirements than it actually needs (does not need 
> hflush).
> This is to track fixing up the writers such that we are not requiring more 
> than we actually need.





[jira] [Commented] (HBASE-21476) Support for nanosecond timestamps

2018-12-04 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708965#comment-16708965
 ] 

Sean Busbey commented on HBASE-21476:
-

{quote}
bq. What happens if a client that doesn't support nanoseconds attempts to write 
to a table that is configured for nanoseconds?

There's no error of any sort unless 
"hbase.hregion.keyvalue.timestamp.slop.millisecs" is specified. The value will 
be stored at the millisecond timestamp. It's up to the client to be careful 
here.
{quote}

Is there any way we could make this a hard enforcement? Maybe a new optional 
way to flag that a client write has been done using nanoseconds? That way we 
could detect an older client on the server side and send back an error instead 
of trashing the data. Alternatively, is there a point where we have both 
enough of the RPC to know the client version and enough of the request 
deserialized to know the table, so that we could proactively reject operations 
from old clients?

If not, we'll need a big operator warning called out for the feature. Either 
way we'll need something in the upgrade notes for whatever version this lands 
in.

> Support for nanosecond timestamps
> -
>
> Key: HBASE-21476
> URL: https://issues.apache.org/jira/browse/HBASE-21476
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.1.1
>Reporter: Andrey Elenskiy
>Assignee: Andrey Elenskiy
>Priority: Major
>  Labels: features, patch
> Attachments: Apache HBase - Nanosecond Timestamps v1.pdf, 
> nanosecond_timestamps_v1.patch, nanosecond_timestamps_v2.patch
>
>
> Introducing a new table attribute "NANOSECOND_TIMESTAMPS" to tell HBase to 
> handle timestamps with nanosecond precision. This is useful for applications 
> that timestamp updates at the source with nanoseconds and still want features 
> like column family TTL and "hbase.hstore.time.to.purge.deletes" to work.
> The attribute should be specified either on new tables or on existing tables 
> which have timestamps only with nanosecond precision. There's no migration 
> from milliseconds to nanoseconds for already existing tables. We could add 
> this migration as part of compaction if you think that would be useful, but 
> that would obviously make the change more complex.
> I've added a new EnvironmentEdge method "currentTimeNano()" that uses 
> [java.time.Instant|https://docs.oracle.com/javase/8/docs/api/java/time/Instant.html]
>  to get time in nanoseconds which means it will only work with Java 8. The 
> idea is to gradually replace all places where "EnvironmentEdge.currentTime()" 
> is used to have HBase working purely with nanoseconds (which is a 
> prerequisite for HBASE-14070). Also, I've refactored ScanInfo and 
> PartitionedMobCompactor to expect TableDescriptor as an argument which makes 
> code a little cleaner and easier to extend.
> Couple more points:
> - column family TTL (specified in seconds) and 
> "hbase.hstore.time.to.purge.deletes" (specified in milliseconds) options 
> don't need to be changed, those are adjusted automatically.
> - Per cell TTL needs to be scaled by clients accordingly after 
> "NANOSECOND_TIMESTAMPS" table attribute is specified.
> Looking for everyone's feedback to know if that's a worthwhile direction. 
> Will add more comprehensive tests in a later patch.
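
As a rough sketch of what a nanosecond clock built on java.time.Instant could 
look like, assuming nothing about the attached patch: the class and the 
toEpochNanos helper are hypothetical, and only the currentTimeNano name comes 
from the description above.

```java
import java.time.Instant;

// Hypothetical helper; not the EnvironmentEdge implementation from the patch.
final class NanoClockSketch {
  private static final long NANOS_PER_SECOND = 1_000_000_000L;

  // Converts an Instant to nanoseconds since the epoch, failing loudly on
  // overflow instead of wrapping around.
  static long toEpochNanos(Instant instant) {
    long wholeSeconds = Math.multiplyExact(instant.getEpochSecond(), NANOS_PER_SECOND);
    return Math.addExact(wholeSeconds, instant.getNano());
  }

  // Current wall-clock time in nanoseconds since the epoch. The real
  // resolution depends on the JVM and the operating system clock.
  static long currentTimeNano() {
    return toEpochNanos(Instant.now());
  }

  public static void main(String[] args) {
    long millis = 1_543_881_600_000L; // 2018-12-04T00:00:00Z in milliseconds
    long nanos = toEpochNanos(Instant.ofEpochMilli(millis));
    System.out.println("millis: " + millis);
    System.out.println("nanos:  " + nanos);
    System.out.println("now:    " + currentTimeNano());
  }
}
```

The two printed values for the same instant differ by a factor of one million, 
so a millisecond value read as nanoseconds lands within the first hour after 
the epoch. That is presumably the kind of silent damage the comment above 
wants the server to reject.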





[jira] [Commented] (HBASE-21547) Precommit uses master flaky list for other branches

2018-12-04 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708781#comment-16708781
 ] 

Sean Busbey commented on HBASE-21547:
-

This would be a great improvement.

> Precommit uses master flaky list for other branches
> ---
>
> Key: HBASE-21547
> URL: https://issues.apache.org/jira/browse/HBASE-21547
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Peter Somogyi
>Priority: Major
>
> Precommit job downloads the flaky exclude list for master branch when the 
> uploaded patch file is made for different branches.
> As an example check 
> [https://builds.apache.org/job/PreCommit-HBASE-Build/15192] which was against 
> branch-1 but the unit test downloaded master's flaky list.
> {noformat}
> 15:26:05 [Tue Dec  4 14:26:04 UTC 2018 INFO]: Personality: patch unit
> 15:26:05 [Tue Dec  4 14:26:04 UTC 2018 INFO]: 
> EXCLUDE_TESTS_URL=https://builds.apache.org/job/HBase-Find-Flaky-Tests/job/master/lastSuccessfulBuild/artifact/excludes/
> 15:26:05 [Tue Dec  4 14:26:04 UTC 2018 INFO]: INCLUDE_TESTS_URL=
> 15:26:05 --2018-12-04 14:26:04--  
> https://builds.apache.org/job/HBase-Find-Flaky-Tests/job/master/lastSuccessfulBuild/artifact/excludes/
> 15:26:05 Resolving builds.apache.org (builds.apache.org)... 195.201.213.130, 
> 2a01:4f8:c0:2cc9::2
> 15:26:05 Connecting to builds.apache.org 
> (builds.apache.org)|195.201.213.130|:443... connected.
> 15:26:06 HTTP request sent, awaiting response... 200 
> 15:26:06 Length: 866 [application/octet-stream]
> 15:26:06 Saving to: 'excludes'
> 15:26:06 
> 15:26:06  0K   100% 
> 43.0M=0s
> 15:26:06 
> 15:26:06 2018-12-04 14:26:06 (43.0 MB/s) - 'excludes' saved [866/866]
> 15:26:06 
> 15:26:09 cd /testptch/hbase/hbase-thrift
> 15:26:09 mvn --batch-mode 
> -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/yetus-m2/hbase-branch-1-patch-1
>  -DHBasePatchProcess -Dhttps.protocols=TLSv1.2 -PrunAllTests 
> -Dtest.exclude.pattern=**/master.cleaner.TestSnapshotFromMaster.java,**/client.TestRestoreSnapshotFromClientAfterSplittingRegions.java,**/regionserver.TestRegionMergeTransactionOnCluster.java,**/client.TestCloneSnapshotFromClientAfterSplittingRegion.java,**/master.assignment.TestAssignmentManager.java,**/master.assignment.TestAMAssignWithRandExec.java,**/client.TestMobCloneSnapshotFromClientAfterSplittingRegion.java,**/regionserver.TestCompactingToCellFlatMapMemStore.java,**/replication.TestReplicationSmallTestsSync.java,**/TestMultiVersions.java,**/client.TestMobRestoreSnapshotFromClientAfterSplittingRegions.java,**/client.TestRestoreSnapshotFromClientWithRegionReplicas.java,**/regionserver.TestRegionServerAbortTimeout.java,**/replication.TestMasterReplication.java,**/backup.TestIncrementalBackupWithBulkLoad.java,**/master.replication.TestRegisterPeerWorkerWhenRestarting.java
>  clean test -fae > /testptch/patchprocess/patch-unit-hbase-thrift.txt 2>&1
> {noformat}
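
A minimal sketch of choosing the exclude list per branch, under a loudly 
stated assumption: that the flaky-test job publishes artifacts for other 
branches under the same URL layout as the master URL in the log above. That 
layout is not confirmed anywhere in this thread, and the class and method 
names are hypothetical.

```java
// Hypothetical helper; the per-branch URL layout is an assumption.
final class FlakyExcludeUrlSketch {
  // Prefix and suffix taken from the master URL shown in the precommit log.
  private static final String JOB_PREFIX =
      "https://builds.apache.org/job/HBase-Find-Flaky-Tests/job/";
  private static final String ARTIFACT_SUFFIX =
      "/lastSuccessfulBuild/artifact/excludes/";

  // Builds the excludes URL for the branch a patch targets. Branch names
  // containing '/' would need URL encoding, which is not handled here.
  static String excludesUrl(String branch) {
    if (branch == null || branch.isEmpty() || branch.contains("/")) {
      throw new IllegalArgumentException("unsupported branch name: " + branch);
    }
    return JOB_PREFIX + branch + ARTIFACT_SUFFIX;
  }

  public static void main(String[] args) {
    System.out.println(excludesUrl("master"));
    System.out.println(excludesUrl("branch-1"));
  }
}
```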





[jira] [Commented] (HBASE-21545) NEW_VERSION_BEHAVIOR breaks Get/Scan with specified columns

2018-12-04 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708814#comment-16708814
 ] 

Sean Busbey commented on HBASE-21545:
-

Hi [~timoha]! Thanks for writing this up. Could you provide the reproduction as 
a test?

> NEW_VERSION_BEHAVIOR breaks Get/Scan with specified columns
> ---
>
> Key: HBASE-21545
> URL: https://issues.apache.org/jira/browse/HBASE-21545
> Project: HBase
>  Issue Type: Bug
>  Components: API
>Affects Versions: 2.1.1
> Environment: HBase 2.1.1
> Hadoop 2.8.4
> Java 8
>Reporter: Andrey Elenskiy
>Priority: Major
> Attachments: App.java
>
>
> Setting NEW_VERSION_BEHAVIOR => 'true' on a column family causes only one 
> column to be returned when columns are specified in a Scan or Get query. The 
> result is always the first column in sorted order. I've attached a code 
> snippet that reproduces the issue and can be converted into a test.
> I've also validated this with the hbase shell and the gohbase client, so it 
> must be a server-side issue.





[jira] [Comment Edited] (HBASE-21547) Precommit uses master flaky list for other branches

2018-12-04 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708802#comment-16708802
 ] 

Sean Busbey edited comment on HBASE-21547 at 12/4/18 2:51 PM:
--

It was intentional at first, because we only had a flaky list from master. Then 
it was "let's do this later" because I was out of time for working on updating 
the flaky infra.

 

If we have a branch at that point, updating precommit to use the corresponding 
flaky list would be great. I'm not sure if it makes HBASE-19265 more or less 
pressing. Since I'm usually concerned about branches that aren't master, 
probably more pressing for me. :)


was (Author: busbey):
It was intentional at first, because we only had a flaky list from master. Then 
it was "let's do this later" because I was out of time for working on updating 
the flaky infra.

 

If we have a branch at that point, updating it to use the corresponding flaky 
list would be great. I'm not sure if it makes HBASE-19265 more or less 
pressing. Since I'm usually concerned about branches that aren't master, 
probably more pressing for me. :)

> Precommit uses master flaky list for other branches
> ---
>
> Key: HBASE-21547
> URL: https://issues.apache.org/jira/browse/HBASE-21547
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Peter Somogyi
>Priority: Major
>
> Precommit job downloads the flaky exclude list for master branch when the 
> uploaded patch file is made for different branches.
> As an example check 
> [https://builds.apache.org/job/PreCommit-HBASE-Build/15192] which was against 
> branch-1 but the unit test downloaded master's flaky list.
> {noformat}
> 15:26:05 [Tue Dec  4 14:26:04 UTC 2018 INFO]: Personality: patch unit
> 15:26:05 [Tue Dec  4 14:26:04 UTC 2018 INFO]: 
> EXCLUDE_TESTS_URL=https://builds.apache.org/job/HBase-Find-Flaky-Tests/job/master/lastSuccessfulBuild/artifact/excludes/
> 15:26:05 [Tue Dec  4 14:26:04 UTC 2018 INFO]: INCLUDE_TESTS_URL=
> 15:26:05 --2018-12-04 14:26:04--  
> https://builds.apache.org/job/HBase-Find-Flaky-Tests/job/master/lastSuccessfulBuild/artifact/excludes/
> 15:26:05 Resolving builds.apache.org (builds.apache.org)... 195.201.213.130, 
> 2a01:4f8:c0:2cc9::2
> 15:26:05 Connecting to builds.apache.org 
> (builds.apache.org)|195.201.213.130|:443... connected.
> 15:26:06 HTTP request sent, awaiting response... 200 
> 15:26:06 Length: 866 [application/octet-stream]
> 15:26:06 Saving to: 'excludes'
> 15:26:06 
> 15:26:06  0K   100% 
> 43.0M=0s
> 15:26:06 
> 15:26:06 2018-12-04 14:26:06 (43.0 MB/s) - 'excludes' saved [866/866]
> 15:26:06 
> 15:26:09 cd /testptch/hbase/hbase-thrift
> 15:26:09 mvn --batch-mode 
> -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/yetus-m2/hbase-branch-1-patch-1
>  -DHBasePatchProcess -Dhttps.protocols=TLSv1.2 -PrunAllTests 
> -Dtest.exclude.pattern=**/master.cleaner.TestSnapshotFromMaster.java,**/client.TestRestoreSnapshotFromClientAfterSplittingRegions.java,**/regionserver.TestRegionMergeTransactionOnCluster.java,**/client.TestCloneSnapshotFromClientAfterSplittingRegion.java,**/master.assignment.TestAssignmentManager.java,**/master.assignment.TestAMAssignWithRandExec.java,**/client.TestMobCloneSnapshotFromClientAfterSplittingRegion.java,**/regionserver.TestCompactingToCellFlatMapMemStore.java,**/replication.TestReplicationSmallTestsSync.java,**/TestMultiVersions.java,**/client.TestMobRestoreSnapshotFromClientAfterSplittingRegions.java,**/client.TestRestoreSnapshotFromClientWithRegionReplicas.java,**/regionserver.TestRegionServerAbortTimeout.java,**/replication.TestMasterReplication.java,**/backup.TestIncrementalBackupWithBulkLoad.java,**/master.replication.TestRegisterPeerWorkerWhenRestarting.java
>  clean test -fae > /testptch/patchprocess/patch-unit-hbase-thrift.txt 2>&1
> {noformat}
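
One way to picture the fix discussed here, as a minimal sketch only: derive the excludes URL from the branch the patch targets instead of hard-coding master. The base URL and suffix are copied from the log above. The class name FlakyExcludes, the method excludesUrl, and the assumption that every branch has a flaky-test job at the same path layout are hypothetical, and this is not the actual precommit code.

```java
import java.net.URI;

/** Sketch: pick the flaky-test excludes URL that matches the patch's target branch. */
final class FlakyExcludes {
  // Prefix and suffix are taken from the EXCLUDE_TESTS_URL in the log above.
  // That other branches publish under the same layout is an assumption.
  private static final String BASE =
      "https://builds.apache.org/job/HBase-Find-Flaky-Tests/job/";
  private static final String SUFFIX = "/lastSuccessfulBuild/artifact/excludes/";

  /** Returns the excludes URL for the given branch, falling back to master when none is given. */
  static URI excludesUrl(String branch) {
    String effective = (branch == null || branch.isEmpty()) ? "master" : branch;
    // URI.create throws IllegalArgumentException if the branch name contains illegal characters.
    return URI.create(BASE + effective + SUFFIX);
  }
}
```

With this shape, a patch against branch-1 would resolve to the .../job/branch-1/... excludes location, provided such a job exists for that branch.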





[jira] [Commented] (HBASE-21547) Precommit uses master flaky list for other branches

2018-12-04 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708802#comment-16708802
 ] 

Sean Busbey commented on HBASE-21547:
-

It was intentional at first, because we only had a flaky list from master. Then 
it was "let's do this later" because I was out of time for working on updating 
the flaky infra.

 

If we have a branch at that point, updating it to use the corresponding flaky 
list would be great. I'm not sure if it makes HBASE-19265 more or less 
pressing. Since I'm usually concerned about branches that aren't master, 
probably more pressing for me. :)



[jira] [Commented] (HBASE-17914) Create a new reader instead of cloning a new StoreFile when compaction

2018-12-04 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708796#comment-16708796
 ] 

Sean Busbey commented on HBASE-17914:
-

please move this conversation to either dev@hbase or to a new JIRA. threads on 
resolved issues get very few eyes.

> Create a new reader instead of cloning a new StoreFile when compaction
> --
>
> Key: HBASE-17914
> URL: https://issues.apache.org/jira/browse/HBASE-17914
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17914-v1.patch, HBASE-17914-v1.patch, 
> HBASE-17914-v2.patch, HBASE-17914-v3.patch, HBASE-17914.patch
>
>






[jira] [Commented] (HBASE-21546) ConnectException in TestThriftHttpServer

2018-12-04 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708792#comment-16708792
 ] 

Sean Busbey commented on HBASE-21546:
-

+1

> ConnectException in TestThriftHttpServer
> 
>
> Key: HBASE-21546
> URL: https://issues.apache.org/jira/browse/HBASE-21546
> Project: HBase
>  Issue Type: Bug
>  Components: test, Thrift
>Affects Versions: 1.5.0, 1.4.9
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Attachments: HBASE-21546.branch-1.01.patch
>
>
> TestThriftHttpServer is the first on the flaky list for branch-1 and 
> branch-1.4, with an approximately 60% failure rate.
> The Thrift server is not yet accepting requests at the time the test starts:
> java.net.ConnectException: Connection refused (Connection refused)
>   at org.apache.hadoop.hbase.thrift.TestThriftHttpServer.checkHttpMethods(TestThriftHttpServer.java:275)
>   at org.apache.hadoop.hbase.thrift.TestThriftHttpServer.testThriftServerHttpOptionsForbiddenWhenOptionsDisabled(TestThriftHttpServer.java:176)
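
A common remedy for this kind of startup race is to have the test wait until the server port accepts connections before issuing requests. The following is a minimal sketch of that idea using only java.net. The class PortWaiter, the method waitForPort, and the 200 ms / 50 ms intervals are hypothetical choices for illustration, and this is not necessarily what the attached patch does.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Sketch: poll a TCP port until something is listening or a deadline passes. */
final class PortWaiter {
  /**
   * Returns true as soon as a TCP connect to host:port succeeds,
   * or false if timeoutMs elapses without a successful connect.
   */
  static boolean waitForPort(String host, int port, long timeoutMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      try (Socket s = new Socket()) {
        // Per-attempt connect timeout of 200 ms (arbitrary choice).
        s.connect(new InetSocketAddress(host, port), 200);
        return true;
      } catch (IOException e) {
        // Connection refused or timed out: the server is not listening yet.
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(50); // brief pause before the next attempt
    }
  }
}
```

A test would call waitForPort with the server's host and port right after starting the server thread and fail fast if it returns false, rather than hitting ConnectException on the first request.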





[jira] [Commented] (HBASE-21481) [acl] Superuser's permissions should not be granted or revoked by any non-su global admin

2018-12-04 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708790#comment-16708790
 ] 

Sean Busbey commented on HBASE-21481:
-

shoot forgot to post here. I started reviewing this. will try to finish up this 
week.

> [acl] Superuser's permissions should not be granted or revoked by any non-su 
> global admin
> -
>
> Key: HBASE-21481
> URL: https://issues.apache.org/jira/browse/HBASE-21481
> Project: HBase
>  Issue Type: Improvement
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Major
>  Labels: ACL, security-issue
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21481.master.001.patch, 
> HBASE-21481.master.002.patch, HBASE-21481.master.003.patch, 
> HBASE-21481.master.004.patch, HBASE-21481.master.005.patch, 
> HBASE-21481.master.006.patch, HBASE-21481.master.007.patch, 
> HBASE-21481.master.008.patch, HBASE-21481.master.009.patch
>
>
> Superusers are the users listed under {{hbase.superuser}} in the 
> configuration, plus the user who starts the master process; the two may overlap.
> A superuser must be a global admin, but a global admin is not necessarily a 
> superuser; the permission may have been granted afterwards.
> For now, a non-su global admin with the Global.ADMIN permission can grant or 
> revoke any superuser's permission, accidentally or deliberately.
> The purpose of this issue is to ban this action.
>  





[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707696#comment-16707696
 ] 

Sean Busbey commented on HBASE-21544:
-

bq. FSDataOutputStream (assuming that's what you meant by FileSystem.close()) 
doesn't say anything in terms of Javadoc, but the implementation is such that 
close() makes the same guarantees as hflush().

Does it only do that if the underlying FileSystem supports hflush?

{quote}
bq. I thought recovered edits now go to the same FileSystem as the WAL? 
wouldn't that imply that hflush should be present?

Ah, this didn't land on 2.0.x. Yes, that would have precluded the need for such 
a change.

Semantics are that it would be good to make sure that we aren't over-requiring 
from our filesystem, but you are correct in that this is less of a concern in 
newer versions since the durability required of the FS by WALs is more than 
that for recovered.edits 
{quote}

Sure. I just worry about too many configuration knobs. Could we just backport 
the fix for HBASE-20734 to branch-2.0 and call it a day?

[jira] [Commented] (HBASE-21544) WAL writer for recovered.edits file in WalSplitting should not require hflush from filesystem

2018-12-03 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707657#comment-16707657
 ] 

Sean Busbey commented on HBASE-21544:
-

what does the contract for FileSystem.close say about data persistence?

I thought recovered edits now go to the same FileSystem as the WAL? wouldn't 
that imply that hflush should be present?

> WAL writer for recovered.edits file in WalSplitting should not require hflush 
> from filesystem
> -
>
> Key: HBASE-21544
> URL: https://issues.apache.org/jira/browse/HBASE-21544
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from 
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for 
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB, 
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting 
> get into an "infinite" loop where the master keeps resubmitting and the RS 
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem 
> implementations relies on the ability to call hflush for proper operation 
> during component failures, but the current FileSystem does not support doing 
> so. Please check the config value of 'hbase.wal.dir' and ensure it points to 
> a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN  
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: 
> Got while writing log entry to log
> java.io.IOException: cannot get log writer
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
>   at org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
>   at org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
>   at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
>   at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush
>   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
>   at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
>   at org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
>   ... 10 more
> {noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer 
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer 
> class. The WAL writer class thinks it always should have hflush support; 
> however, we don't _actually_ need that for writing out the recovered.edits 
> files. If {{close()}} on the recovered.edits file fails, we trash any 
> intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is overbearing and we should not make 
> the check when the ProtobufLogWriter is being used for the recovered.edits 
> file.
> [~zyork], [~busbey] fyi
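
The proposal in the description can be pictured roughly as follows: enforce the hflush capability only when the writer is opened for a real WAL, and skip it for recovered.edits. This is a self-contained sketch under stated assumptions. CapableStream, Purpose, and checkCapabilities are hypothetical stand-ins invented for illustration; they are not HBase or Hadoop classes, and this is not the patch attached to the issue.

```java
import java.io.IOException;

/** Sketch: only require hflush when the writer backs a real WAL. */
final class WriterInitSketch {
  /** Hypothetical stand-in for an output stream that can report its capabilities. */
  interface CapableStream {
    boolean hasCapability(String name);
  }

  /** What the writer is being opened for. */
  enum Purpose { WAL, RECOVERED_EDITS }

  /** Throws IOException if the stream lacks a capability that the given purpose requires. */
  static void checkCapabilities(CapableStream out, Purpose purpose) throws IOException {
    if (purpose == Purpose.WAL && !out.hasCapability("hflush")) {
      // WAL durability depends on hflush, so refuse to continue.
      throw new IOException("stream lacks capability: hflush");
    }
    // RECOVERED_EDITS: no hflush requirement. Per the description, a failed
    // close() means the intermediate data is discarded and splitting is rerun.
  }
}
```

Whether such a distinction is better expressed as a parameter, a separate writer class, or avoided entirely by backporting HBASE-20734 is exactly the design question raised in the comments.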





[jira] [Resolved] (HBASE-21493) Release 1.2.9

2018-12-02 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey resolved HBASE-21493.
-
Resolution: Fixed

* pushed signed tag to rel/1.2.9
* [sent release announcement to 
user@hbase|https://lists.apache.org/thread.html/09ec808a0737d5adf53fb018ec75224a56d018c986fc0994fcc614fb@%3Cuser.hbase.apache.org%3E]
 and dev@hbase

> Release 1.2.9
> -
>
> Key: HBASE-21493
> URL: https://issues.apache.org/jira/browse/HBASE-21493
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.2.9
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 1.2.9
>
>






[jira] [Commented] (HBASE-21476) Support for nanosecond timestamps

2018-11-30 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705697#comment-16705697
 ] 

Sean Busbey commented on HBASE-21476:
-

I see the WIP patches are starting to address MOB handling, but I don't see it 
mentioned in the scope document at all. Should call out in the scope document 
the impact of having tables configured for MOB both with and without nanosecond 
support enabled.

> Support for nanosecond timestamps
> -
>
> Key: HBASE-21476
> URL: https://issues.apache.org/jira/browse/HBASE-21476
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.1.1
>Reporter: Andrey Elenskiy
>Assignee: Andrey Elenskiy
>Priority: Major
>  Labels: features, patch
> Attachments: Apache HBase - Nanosecond Timestamps v1.pdf, 
> nanosecond_timestamps_v1.patch, nanosecond_timestamps_v2.patch
>
>
> Introducing a new table attribute "NANOSECOND_TIMESTAMPS" to tell HBase to 
> handle timestamps with nanosecond precision. This is useful for applications 
> that timestamp updates at the source with nanoseconds and still want features 
> like column family TTL and "hbase.hstore.time.to.purge.deletes" to work.
> The attribute should be specified either on new tables or on existing tables 
> which have timestamps only with nanosecond precision. There's no migration 
> from milliseconds to nanoseconds for already existing tables. We could add 
> this migration as part of compaction if you think that would be useful, but 
> that would obviously make the change more complex.
> I've added a new EnvironmentEdge method "currentTimeNano()" that uses 
> [java.time.Instant|https://docs.oracle.com/javase/8/docs/api/java/time/Instant.html]
>  to get time in nanoseconds which means it will only work with Java 8. The 
> idea is to gradually replace all places where "EnvironmentEdge.currentTime()" 
> is used to have HBase working purely with nanoseconds (which is a 
> prerequisite for HBASE-14070). Also, I've refactored ScanInfo and 
> PartitionedMobCompactor to expect TableDescriptor as an argument which makes 
> code a little cleaner and easier to extend.
> Couple more points:
> - column family TTL (specified in seconds) and 
> "hbase.hstore.time.to.purge.deletes" (specified in milliseconds) options 
> don't need to be changed, those are adjusted automatically.
> - Per cell TTL needs to be scaled by clients accordingly after 
> "NANOSECOND_TIMESTAMPS" table attribute is specified.
> Looking for everyone's feedback to know if that's a worthwhile direction. 
> Will add more comprehensive tests in a later patch.
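
To make the unit handling in the description concrete, here is a minimal sketch of converting a java.time.Instant to epoch nanoseconds and of scaling a column family TTL (given in seconds) to the table's timestamp unit. The class NanoTimestamps and both method names are hypothetical; only the java.time and java.util.concurrent calls are real JDK APIs, and this does not reproduce the attached patches.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

/** Sketch: unit conversions for tables that store nanosecond timestamps. */
final class NanoTimestamps {
  /** Epoch nanoseconds for an Instant; throws ArithmeticException if the value overflows a long. */
  static long toEpochNanos(Instant t) {
    return Math.addExact(
        Math.multiplyExact(t.getEpochSecond(), 1_000_000_000L), t.getNano());
  }

  /**
   * Column family TTL is configured in seconds; scale it to the unit the
   * table's timestamps use. TimeUnit conversions saturate at Long.MAX_VALUE
   * instead of overflowing.
   */
  static long ttlSecondsToTableUnits(long ttlSeconds, boolean nanosecondTable) {
    return nanosecondTable
        ? TimeUnit.SECONDS.toNanos(ttlSeconds)
        : TimeUnit.SECONDS.toMillis(ttlSeconds);
  }
}
```

A signed 64-bit count of nanoseconds since the epoch covers roughly 292 years, so the range extends to about the year 2262; the exact-arithmetic calls above make an out-of-range Instant fail loudly instead of wrapping.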





[jira] [Commented] (HBASE-21476) Support for nanosecond timestamps

2018-11-30 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705695#comment-16705695
 ] 

Sean Busbey commented on HBASE-21476:
-

Do we need to account for this table attribute when bulk loading?

What about snapshots? Do they retain information on whether their contents use 
nanoseconds? Do tables cloned from a snapshot have to have the same nanosecond 
config as the snapshot?



[jira] [Commented] (HBASE-21476) Support for nanosecond timestamps

2018-11-30 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705694#comment-16705694
 ] 

Sean Busbey commented on HBASE-21476:
-

{quote}
HBase clients that don’t provide timestamps to their requests, don’t need to 
change anything.
The clients looking to write into a table that supports nanoseconds will have 
to provide
nanoseconds via “setTimestamp()” using Java 8’s Instant API. Same goes for per 
cell TTL and
Get/Scan requests with “setTimeRange()”
{quote}

What happens if a client that doesn't support nanoseconds attempts to write to 
a table that is configured for nanoseconds?



[jira] [Updated] (HBASE-21476) Support for nanosecond timestamps

2018-11-30 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21476:

Status: Patch Available  (was: Open)

moving to patch available so QABot can give it a spin. Please use {{git 
format-patch}} to create future patches.



[jira] [Updated] (HBASE-21453) Convert ReadOnlyZKClient to DEBUG instead of INFO

2018-11-30 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21453:

Component/s: Zookeeper
 logging

> Convert ReadOnlyZKClient to DEBUG instead of INFO
> -
>
> Key: HBASE-21453
> URL: https://issues.apache.org/jira/browse/HBASE-21453
> Project: HBase
>  Issue Type: Bug
>  Components: logging, Zookeeper
>Reporter: stack
>Assignee: Sakthi
>Priority: Major
> Attachments: hbase-21453.master.001.patch
>
>
> Running commands in spark-shell, this is what it looks like on each 
> invocation:
> {code}
> scala> val count = rdd.count()
> 2018-11-07 21:01:46,026 INFO  [Executor task launch worker for task 1] 
> zookeeper.ReadOnlyZKClient: Connect 0x18f3d868 to localhost:2181 with session 
> timeout=9ms, retries 30, retry interval 1000ms, keepAlive=6ms
> 2018-11-07 21:01:46,027 INFO  [ReadOnlyZKClient-localhost:2181@0x18f3d868] 
> zookeeper.ZooKeeper: Initiating client connection, 
> connectString=localhost:2181 sessionTimeout=9 
> watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$20/1362339879@743dab9f
> 2018-11-07 21:01:46,030 INFO  
> [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL 
> (unknown error)
> 2018-11-07 21:01:46,031 INFO  
> [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] 
> zookeeper.ClientCnxn: Socket connection established to 
> localhost/127.0.0.1:2181, initiating session
> 2018-11-07 21:01:46,033 INFO  
> [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] 
> zookeeper.ClientCnxn: Session establishment complete on server 
> localhost/127.0.0.1:2181, sessionid = 0x166f1b283080005, negotiated timeout = 
> 4
> 2018-11-07 21:01:46,035 INFO  [Executor task launch worker for task 1] 
> mapreduce.TableInputFormatBase: Input split length: 0 bytes.
> [Stage 1:>  (0 + 1) / 
> 1]2018-11-07 21:01:48,074 INFO  [Executor task launch worker for task 1] 
> zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x18f3d868 to 
> localhost:2181
> 2018-11-07 21:01:48,075 INFO  [ReadOnlyZKClient-localhost:2181@0x18f3d868] 
> zookeeper.ZooKeeper: Session: 0x166f1b283080005 closed
> 2018-11-07 21:01:48,076 INFO  [ReadOnlyZKClient 
> -localhost:2181@0x18f3d868-EventThread] zookeeper.ClientCnxn: EventThread 
> shut down for session: 0x166f1b283080005
> count: Long = 10
> {code}
> Let me shut down the ReadOnlyZKClient log level.





[jira] [Commented] (HBASE-21518) TestMasterFailoverWithProcedures is flaky

2018-11-29 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704343#comment-16704343
 ] 

Sean Busbey commented on HBASE-21518:
-

Yes, any UT that has multiple masters and knocks one over.

> TestMasterFailoverWithProcedures is flaky
> -
>
> Key: HBASE-21518
> URL: https://issues.apache.org/jira/browse/HBASE-21518
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.0.3, 2.1.2
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Attachments: HBASE-21518-v1.patch, output.txt
>
>
> TestMasterFailoverWithProcedures is failing frequently with timeouts. I 
> hit this failure during the 2.0.3RC0 vote, and it also appears on multiple flaky 
> dashboards.
> branch-2: 
> [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Flaky-Tests/job/branch-2/2007/]
> branch-2.1: 
> [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Flaky-Tests/job/branch-2.1/2002/]
> branch-2.0: 
> [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Flaky-Tests/job/branch-2.0/1988/]
>   
> {noformat}
> [INFO] Running 
> org.apache.hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures
> [ERROR] Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 
> 780.648 s <<< FAILURE! - in 
> org.apache.hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures
> [ERROR] 
> org.apache.hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures  
> Time elapsed: 749.024 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 780 
> seconds
>   at 
> org.apache.hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures.tearDown(TestMasterFailoverWithProcedures.java:86)
> [ERROR] 
> org.apache.hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures  
> Time elapsed: 749.051 s  <<< ERROR!
> java.lang.Exception: Appears to be stuck in thread RS-EventLoopGroup-3-2
> {noformat}





[jira] [Commented] (HBASE-21518) TestMasterFailoverWithProcedures is flaky

2018-11-29 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703571#comment-16703571
 ] 

Sean Busbey commented on HBASE-21518:
-

+1






[jira] [Commented] (HBASE-21513) [rm] make_rc.sh doesn't work on mac os x

2018-11-28 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702170#comment-16702170
 ] 

Sean Busbey commented on HBASE-21513:
-

okay I have a reproduced error. progress! Interestingly mine wasn't on a test 
jar, which lines up well with it being a deeper issue than my "maybe it isn't 
packaging test jars" guess from before.

{code}
11:20:50,524 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-assembly-plugin:2.6:single (default-cli) on 
project hbase-assembly: Failed to create assembly: Error adding file 
'org.apache.hbase:hbase-common:jar:2.0.3' to archive: 
/Users/busbey/tmp_projects/hbase/hbase-common/target/classes isn't a file. -> 
[Help 1]
{code}

I'm on Maven 3.5.2 at the moment. bbl with an update.

> [rm] make_rc.sh doesn't work on mac os x
> 
>
> Key: HBASE-21513
> URL: https://issues.apache.org/jira/browse/HBASE-21513
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: make_rc.sh.txt
>
>
> Trying to build an RC on a mac, it always fails for me with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.6:single (default-cli) on 
> project hbase-assembly: Failed to create assembly: Error adding file 
> 'org.apache.hbase:hbase-common:jar:tests:2.0.3' to archive: 
> /Users/stack/checkouts/hbase.git/hbase-common/target/test-classes isn't a 
> file. -> [Help 1]
> [ERROR]
> {code}
> This is the second build that tries to assemble the tgz inside make_rc.sh.
> If I leave out the 'site' target, it works. I tried an earlier version than the 
> current head of branch-2.0 and it had the same issue.
> [~busbey] had a nice suggestion changing the -DskipTests to 
> -Dtest=NO_SUCH_TEST ... but that didn't work for me.
> I went and got a linux vm and it just worked.





[jira] [Commented] (HBASE-21513) [rm] make_rc.sh doesn't work on mac os x

2018-11-28 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702134#comment-16702134
 ] 

Sean Busbey commented on HBASE-21513:
-

yay! past that part. awesome. this setting worth a note in our RM guidance? or 
a comment in make_rc.sh?






[jira] [Commented] (HBASE-21513) [rm] make_rc.sh doesn't work on mac os x

2018-11-28 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702114#comment-16702114
 ] 

Sean Busbey commented on HBASE-21513:
-

yeah this is on OSX. let me try the env thing.






[jira] [Commented] (HBASE-21513) [rm] make_rc.sh doesn't work on mac os x

2018-11-28 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701942#comment-16701942
 ] 

Sean Busbey commented on HBASE-21513:
-

I'm currently trying to reproduce stack's failure on OSX. I don't think I have 
an executable named gpg2.

{code}

Busbey-MBA:hbase busbey$ which gpg2
Busbey-MBA:hbase busbey$ which gpg
/usr/local/bin/gpg
Busbey-MBA:hbase busbey$ gpg --version
gpg (GnuPG) 2.2.4
libgcrypt 1.8.2
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Home: /Users/busbey/.gnupg
Supported algorithms:
Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2
{code}

I mentioned Ubuntu because all of this stuff works for me there and I haven't 
previously even tried to make it work on OSX.






[jira] [Commented] (HBASE-21513) [rm] make_rc.sh doesn't work on mac os x

2018-11-28 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701880#comment-16701880
 ] 

Sean Busbey commented on HBASE-21513:
-

FYI, so far I keep hitting some error with the gpg plugin. (I've only ever made 
RCs on Ubuntu)

{code}
00:05:14,807 [INFO] --- maven-gpg-plugin:1.6:sign (sign-release-artifacts) @ 
hbase ---
Downloading from apache.snapshots: 
http://repository.apache.org/snapshots/org/codehaus/plexus/plexus-utils/3.0.20/plexus-utils-3.0.20.jar
Downloading from central: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.20/plexus-utils-3.0.20.jar
Downloaded from central: 
https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/3.0.20/plexus-utils-3.0.20.jar
 (243 kB at 1.7 MB/s)
gpg: signing failed: Inappropriate ioctl for device
gpg: signing failed: Inappropriate ioctl for device
{code}

I'll figure it out eventually but if someone already knows what's up lmk.






[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-27 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700718#comment-16700718
 ] 

Sean Busbey commented on HBASE-21154:
-

I'd like to see a draft of the docs for Upgrade Consideration before we make a 
call on whether this change works for a 2.y release.

> Remove hbase:namespace table; fold it into hbase:meta
> -
>
> Key: HBASE-21154
> URL: https://issues.apache.org/jira/browse/HBASE-21154
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21154-v1.patch, HBASE-21154-v2.patch, 
> HBASE-21154-v4.patch, HBASE-21154-v5.patch, HBASE-21154-v6.patch, 
> HBASE-21154-v7.patch, HBASE-21154.patch
>
>
> Namespace table is a small system table. Usually it has two rows. It must be 
> assigned before user tables but after hbase:meta goes out. Its presence 
> complicates our startup and is a constant source of grief when for whatever 
> reason, it is not up and available. In fact, master startup is predicated on 
> hbase:namespace being assigned and will not make progress unless it is up.
> Let's just add a new 'ns' column family to hbase:meta for namespace.
> Here is a default ns table content:
> {code}
> hbase(main):023:0* scan 'hbase:namespace'
> ROW   
>COLUMN+CELL
>  default  
>column=info:d, timestamp=1526694059106, 
> value=\x0A\x07default
>  hbase
>column=info:d, timestamp=1526694059461, 
> value=\x0A\x05hbase
> {code}
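
The cell values in the scan output above (`\x0A\x07default`, `\x0A\x05hbase`) look like a single length-delimited field in protobuf wire format: a tag byte, a one-byte length, then the name bytes. That reading is an assumption based on the bytes shown, not something stated in the issue. A minimal sketch of decoding such a value by hand, using a hypothetical helper that is not HBase code:

```java
import java.nio.charset.StandardCharsets;

public class NamespaceValueDecode {
    // Hypothetical helper: decodes one length-delimited field (wire type 2)
    // whose tag and length each fit in a single byte. Not HBase code.
    static String decodeField(byte[] value) {
        int tag = value[0] & 0xFF;
        int fieldNumber = tag >>> 3; // upper bits of the tag byte
        int wireType = tag & 0x07;   // lower three bits of the tag byte
        if (wireType != 2) {
            throw new IllegalArgumentException("not a length-delimited field");
        }
        int length = value[1] & 0xFF;
        if (length > 0x7F) {
            throw new IllegalArgumentException("multi-byte length not handled in this sketch");
        }
        String text = new String(value, 2, length, StandardCharsets.UTF_8);
        return "field " + fieldNumber + " = " + text;
    }

    public static void main(String[] args) {
        // The two byte sequences below mirror the values in the scan output.
        byte[] defaultRow = {0x0A, 0x07, 'd', 'e', 'f', 'a', 'u', 'l', 't'};
        byte[] hbaseRow = {0x0A, 0x05, 'h', 'b', 'a', 's', 'e'};
        System.out.println(decodeField(defaultRow)); // prints "field 1 = default"
        System.out.println(decodeField(hbaseRow));   // prints "field 1 = hbase"
    }
}
```

The tag byte 0x0A splits into field number 1 and wire type 2, and the second byte (0x07 or 0x05) matches the length of the namespace name that follows.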





[jira] [Commented] (HBASE-21493) Release 1.2.9

2018-11-27 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700555#comment-16700555
 ] 

Sean Busbey commented on HBASE-21493:
-

still need to tag the release in git and send out a release announcement once 
the mirrors propagate.

> Release 1.2.9
> -
>
> Key: HBASE-21493
> URL: https://issues.apache.org/jira/browse/HBASE-21493
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.2.9
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 1.2.9
>
>






[jira] [Commented] (HBASE-21493) Release 1.2.9

2018-11-27 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700554#comment-16700554
 ] 

Sean Busbey commented on HBASE-21493:
-

* staged changes to downloads webpage
* queued website build







[jira] [Commented] (HBASE-21493) Release 1.2.9

2018-11-27 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700546#comment-16700546
 ] 

Sean Busbey commented on HBASE-21493:
-

* set release date on reporter.a.o
* marked release date in branch-1.2 CHANGES
* updated branch-1.2 to set version to 1.2.10-SNAPSHOT








[jira] [Commented] (HBASE-21493) Release 1.2.9

2018-11-27 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700535#comment-16700535
 ] 

Sean Busbey commented on HBASE-21493:
-

vote passed:

https://s.apache.org/hbase-1.2.9-vote-results

* moved from dist/dev to dist/release.
* removed 1.2.8 from dist/release
* promoted the nexus repo
* closed the jira version 1.2.9
* set start date for jira version 1.2.10







[jira] [Commented] (HBASE-21513) [rm] make_rc.sh doesn't work on mac os x

2018-11-26 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699746#comment-16699746
 ] 

Sean Busbey commented on HBASE-21513:
-

what OS and version was the linux VM?






[jira] [Commented] (HBASE-21513) [rm] make_rc.sh doesn't work on mac os x

2018-11-26 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699688#comment-16699688
 ] 

Sean Busbey commented on HBASE-21513:
-

I'd really like the chance to figure out the underlying issue rather than 
adding an entire additional build invocation just because the current symptom 
goes away.






[jira] [Commented] (HBASE-21513) [rm] make_rc.sh doesn't work on mac os x

2018-11-26 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699455#comment-16699455
 ] 

Sean Busbey commented on HBASE-21513:
-

Could you either spin up the linux VM again or describe what you did so that I 
can recreate it? this sounds very much like a maven version dependent thing.

[~Apache9] what version of ubuntu and where did you get maven from? (or just 
what version of maven)

> [rm] make_rc.sh doesn't work on mac os x
> 
>
> Key: HBASE-21513
> URL: https://issues.apache.org/jira/browse/HBASE-21513
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Reporter: stack
>Priority: Major
>
> Trying to build an RC on a mac, it always fails for me with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.6:single (default-cli) on 
> project hbase-assembly: Failed to create assembly: Error adding file 
> 'org.apache.hbase:hbase-common:jar:tests:2.0.3' to archive: 
> /Users/stack/checkouts/hbase.git/hbase-common/target/test-classes isn't a 
> file. -> [Help 1]
> [ERROR]
> {code}
> This is the second build that tries to assemble the tgz inside make_rc.sh.
> If I leave out the 'site' target, it works. I tried an earlier version than the 
> current head of branch-2.0 and it had the same issue.
> [~busbey] had a nice suggestion changing the -DskipTests to 
> -Dtest=NO_SUCH_TEST ... but that didn't work for me.
> I went and got a linux vm and it just worked.





[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-26 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699010#comment-16699010
 ] 

Sean Busbey commented on HBASE-21154:
-

bq.  HBASE-21508 has been committed, so let me commit the patch here first. 
Since it is only on master, I think it is OK as there are no recent releases, so 
we can open new issues to address the new problems.

Am I as a reviewer expected to open these new issues? Without looking at 
anything other than the file list I can say that this change doesn't have 
sufficient documentation. The namespace table going away is A Big Deal 
operationally. It should be called out in an upgrade section on moving to 3.0.






[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-26 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699005#comment-16699005
 ] 

Sean Busbey commented on HBASE-21154:
-

bq. Sean Busbey Do you have any cycles to review the patch? I plan to integrate 
this along with HBASE-21508, so I can start another round of ITBLL.

It was a holiday in my locale and I've been offline. Hence my request for time.






[jira] [Commented] (HBASE-21513) [rm] make_rc.sh doesn't work on mac os x

2018-11-25 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698379#comment-16698379
 ] 

Sean Busbey commented on HBASE-21513:
-

What are the maven version(s) in the two different environments?

> [rm] make_rc.sh doesn't work on mac os x
> 
>
> Key: HBASE-21513
> URL: https://issues.apache.org/jira/browse/HBASE-21513
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Reporter: stack
>Priority: Major
>
> Trying to build an RC on a mac, it always fails for me with:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.6:single (default-cli) on 
> project hbase-assembly: Failed to create assembly: Error adding file 
> 'org.apache.hbase:hbase-common:jar:tests:2.0.3' to archive: 
> /Users/stack/checkouts/hbase.git/hbase-common/target/test-classes isn't a 
> file. -> [Help 1]
> [ERROR]
> {code}
> This is the second build that tries to assemble the tgz inside make_rc.sh.
> If I leave out the 'site' target, it works. I tried an earlier version than the 
> current head of branch-2.0 and it had the same issue.
> [~busbey] had a nice suggestion changing the -DskipTests to 
> -Dtest=NO_SUCH_TEST ... but that didn't work for me.
> I went and got a linux vm and it just worked.





[jira] [Commented] (HBASE-21154) Remove hbase:namespace table; fold it into hbase:meta

2018-11-21 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695434#comment-16695434
 ] 

Sean Busbey commented on HBASE-21154:
-

I would like to review, have started but need time. (in case anyone is close to 
committing)

> Remove hbase:namespace table; fold it into hbase:meta
> -
>
> Key: HBASE-21154
> URL: https://issues.apache.org/jira/browse/HBASE-21154
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Reporter: stack
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21154-v1.patch, HBASE-21154-v2.patch, 
> HBASE-21154.patch
>
>
> Namespace table is a small system table. Usually it has two rows. It must be 
> assigned before user tables but after hbase:meta goes out. Its presence 
> complicates our startup and is a constant source of grief when for whatever 
> reason, it is not up and available. In fact, master startup is predicated on 
> hbase:namespace being assigned and will not make progress unless it is up.
> Let's just add a new 'ns' column family to hbase:meta for namespace.
> Here is a default ns table content:
> {code}
> hbase(main):023:0* scan 'hbase:namespace'
> ROW   
>COLUMN+CELL
>  default  
>column=info:d, timestamp=1526694059106, 
> value=\x0A\x07default
>  hbase
>column=info:d, timestamp=1526694059461, 
> value=\x0A\x05hbase
> {code}





[jira] [Commented] (HBASE-21479) TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent fails with IndexOutOfBoundsException

2018-11-21 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695317#comment-16695317
 ] 

Sean Busbey commented on HBASE-21479:
-

We should log which filesystem we're writing to at the start of each test.

I think these tests are relying on the local filesystem instead of the mini DFS 
cluster and then falling prey to the fact that the local filesystem 
implementation from Hadoop doesn't actually flush/sync to disk when you ask it 
to.

> TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent 
> fails with IndexOutOfBoundsException
> --
>
> Key: HBASE-21479
> URL: https://issues.apache.org/jira/browse/HBASE-21479
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
> Attachments: testHRegionReplayEvents-output.txt
>
>
> The test fails in both master branch and branch-2 :
> {code}
> testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents)  Time elapsed: 3.74 sec  <<< ERROR!
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 1
>   at org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(TestHRegionReplayEvents.java:1042)
> {code}





[jira] [Commented] (HBASE-21493) Release 1.2.9

2018-11-17 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690804#comment-16690804
 ] 

Sean Busbey commented on HBASE-21493:
-

RC0 VOTE posted:

https://lists.apache.org/thread.html/832ac5299d6da39f208b93b4b29d113d32a8a3a6e770973692372cc6@%3Cdev.hbase.apache.org%3E

> Release 1.2.9
> -
>
> Key: HBASE-21493
> URL: https://issues.apache.org/jira/browse/HBASE-21493
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.2.9
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 1.2.9
>
>






[jira] [Updated] (HBASE-21445) CopyTable by bulkload will write hfile into yarn's HDFS

2018-11-17 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21445:

Priority: Major  (was: Critical)

> CopyTable by bulkload will write hfile into yarn's HDFS 
> 
>
> Key: HBASE-21445
> URL: https://issues.apache.org/jira/browse/HBASE-21445
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.9
>
> Attachments: HBASE-21445.v1.patch
>
>
> When using CopyTable with bulkload, I found that all HFiles are written to 
> our YARN cluster's HDFS, and loading them into the HBase cluster then fails 
> because the YARN cluster and the HBase cluster use different HDFS instances.





[jira] [Updated] (HBASE-21445) CopyTable by bulkload will write hfile into yarn's HDFS

2018-11-17 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21445:

Priority: Critical  (was: Major)

> CopyTable by bulkload will write hfile into yarn's HDFS 
> 
>
> Key: HBASE-21445
> URL: https://issues.apache.org/jira/browse/HBASE-21445
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.9
>
> Attachments: HBASE-21445.v1.patch
>
>
> When using CopyTable with bulkload, I found that all HFiles are written to 
> our YARN cluster's HDFS, and loading them into the HBase cluster then fails 
> because the YARN cluster and the HBase cluster use different HDFS instances.





[jira] [Commented] (HBASE-21493) Release 1.2.9

2018-11-17 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690439#comment-16690439
 ] 

Sean Busbey commented on HBASE-21493:
-

[nightly build 
#553|https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-1.2/553/]
 covers all the code changes and got a clean bill of health. related [untrusted 
test report 
#98|https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-1.2/98/]
 shows that the exclude list was pretty small and all of those tests have 
gotten nothing but passes in the just-run-untrusted-tests job over the window 
it looks at.

things look good to me. barring objection I'll dig out my signing machine and 
generate an RC off of fd0d55b1e5ef54eb9bf60cce1f0a8e4c1da073ef tomorrow.

> Release 1.2.9
> -
>
> Key: HBASE-21493
> URL: https://issues.apache.org/jira/browse/HBASE-21493
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.2.9
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 1.2.9
>
>






[jira] [Commented] (HBASE-21493) Release 1.2.9

2018-11-16 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690405#comment-16690405
 ] 

Sean Busbey commented on HBASE-21493:
-

* updated ref guide on branch-1.2
* in jira but not in git: none
* in git but not in jira: HBASE-21302 for marking 1.2.8 as done
* updated CHANGES
* update version #


> Release 1.2.9
> -
>
> Key: HBASE-21493
> URL: https://issues.apache.org/jira/browse/HBASE-21493
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.2.9
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 1.2.9
>
>






[jira] [Commented] (HBASE-21493) Release 1.2.9

2018-11-16 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690390#comment-16690390
 ] 

Sean Busbey commented on HBASE-21493:
-

* booted out 1.2.9 things that show no progress
* looked at recent fixes in branches 1 that weren't in branch-1.2. backported 
HBASE-21357

> Release 1.2.9
> -
>
> Key: HBASE-21493
> URL: https://issues.apache.org/jira/browse/HBASE-21493
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.2.9
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 1.2.9
>
>






[jira] [Work started] (HBASE-21493) Release 1.2.9

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-21493 started by Sean Busbey.
---
> Release 1.2.9
> -
>
> Key: HBASE-21493
> URL: https://issues.apache.org/jira/browse/HBASE-21493
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.2.9
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 1.2.9
>
>






[jira] [Created] (HBASE-21493) Release 1.2.9

2018-11-16 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-21493:
---

 Summary: Release 1.2.9
 Key: HBASE-21493
 URL: https://issues.apache.org/jira/browse/HBASE-21493
 Project: HBase
  Issue Type: Task
  Components: community
Affects Versions: 1.2.9
Reporter: Sean Busbey
Assignee: Sean Busbey
 Fix For: 1.2.9








[jira] [Updated] (HBASE-21275) Thrift Server (branch 1 fix) -> Disable TRACE HTTP method for thrift http server (branch 1 only)

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21275:

Fix Version/s: 1.5.0

> Thrift Server (branch 1 fix) -> Disable TRACE HTTP method for thrift http 
> server (branch 1 only)
> 
>
> Key: HBASE-21275
> URL: https://issues.apache.org/jira/browse/HBASE-21275
> Project: HBase
>  Issue Type: Bug
>  Components: Thrift
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Fix For: 1.5.0, 1.4.9
>
> Attachments: HBASE-21275-branch-1.001.patch, 
> HBASE-21275-branch-1.2.001.patch, HBASE-21275-branch-1.2.002.patch, 
> HBASE-21275-branch-1.2.003.patch, HBASE-21275-branch-1.2.003.patch, 
> HBASE-21275-branch-1.4.001.patch
>
>
> A reasonable number of users running the thrift http server on hbase 1.x 
> have had security audit tests flag that the thrift server allows TRACE 
> requests.
> After some searching, I can see HBASE-20406 added restrictions for the 
> TRACE/OPTIONS methods when Thrift is running over http, but it relies on many 
> other commits applied to the thrift http server. That patch was later reverted 
> from master. Later again, HBASE-20004 made TRACE/OPTIONS 
> configurable via the "*hbase.thrift.http.allow.options.method*" property, with 
> both methods disabled by default. This also seems to rely on many 
> changes applied to the thrift http server, so a branch 1 compatible patch does 
> not seem feasible.
> A solution for branch 1 is pretty simple, though: I am proposing a patch that 
> simply uses *WebAppContext*, instead of *Context*, as the context for the 
> *HttpServer* instance. *WebAppContext* already restricts TRACE methods by 
> default.





[jira] [Updated] (HBASE-21357) RS should abort if OOM in Reader thread

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21357:

Fix Version/s: 1.2.9

> RS should abort if OOM in Reader thread
> ---
>
> Key: HBASE-21357
> URL: https://issues.apache.org/jira/browse/HBASE-21357
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.9, 1.2.9
>
> Attachments: HBASE-21357.branch-1.001.patch, 
> HBASE-21357.branch-1.001.patch
>
>
> It is a bit strange: we abort the RS on OOM in the Listener thread, 
> Responder thread and CallRunner thread, but not in the Reader thread. 
> We should abort the RS if OOM happens in the Reader thread, too. If not, the reader 
> thread exits because of the OOM, and the selector closes. Connections later 
> assigned to this reader will be ignored
> {code}
> try {
>   if (key.isValid()) {
>     if (key.isAcceptable())
>       doAccept(key);
>   }
> } catch (IOException ignored) {
>   if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored);
> }
> {code}
> Leaving the client (or Master and other RS)'s call wait until SocketTimeout.
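
The behaviour asked for in this issue can be sketched roughly as follows: keep ignoring per-key IOExceptions as the quoted snippet does, but treat OutOfMemoryError as fatal and hand it to an abort hook rather than letting the reader thread die quietly. This is only an illustration under assumptions, not the attached patch; ReaderLoopSketch, AbortHook and KeyHandler are hypothetical names and the real RPC server code is structured differently.

```java
import java.io.IOException;

// Illustrative only: all names here are hypothetical, not HBase classes.
final class ReaderLoopSketch {

  /** Stand-in for whatever mechanism aborts the region server. */
  interface AbortHook {
    void abort(String why, Throwable cause);
  }

  /** Stand-in for the work done for one selected key. */
  interface KeyHandler {
    void handle() throws IOException;
  }

  /** Returns true if the loop may continue, false if an abort was requested. */
  static boolean processKey(KeyHandler handler, AbortHook hook) {
    try {
      handler.handle();
      return true;
    } catch (IOException ignored) {
      // Same policy as the quoted snippet: an I/O problem on one key is not fatal.
      return true;
    } catch (OutOfMemoryError oom) {
      // Escalate instead of letting the thread exit silently.
      hook.abort("OOM in reader thread", oom);
      return false;
    }
  }

  public static void main(String[] args) {
    AbortHook hook = (why, cause) -> System.out.println("abort requested: " + why);
    System.out.println(processKey(() -> { throw new IOException("transient"); }, hook));
    System.out.println(processKey(() -> { throw new OutOfMemoryError("simulated"); }, hook));
  }
}
```

Returning a boolean keeps the decision to stop the loop with the caller, which makes the policy easy to exercise without a real selector.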





[jira] [Updated] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21373:

Issue Type: Improvement  (was: Bug)

> Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for 
> cluster size, it gives little indication"
> -
>
> Key: HBASE-21373
> URL: https://issues.apache.org/jira/browse/HBASE-21373
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.9
>
> Attachments: HBASE-21373.branch-1.001.patch, 
> HBASE-21373.branch-1.002.patch
>
>
> Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu 
> Cang.





[jira] [Updated] (HBASE-21357) RS should abort if OOM in Reader thread

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21357:

Priority: Critical  (was: Major)

> RS should abort if OOM in Reader thread
> ---
>
> Key: HBASE-21357
> URL: https://issues.apache.org/jira/browse/HBASE-21357
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.9
>
> Attachments: HBASE-21357.branch-1.001.patch, 
> HBASE-21357.branch-1.001.patch
>
>
> It is a bit strange: we abort the RS on OOM in the Listener thread, 
> Responder thread and CallRunner thread, but not in the Reader thread. 
> We should abort the RS if OOM happens in the Reader thread, too. If not, the reader 
> thread exits because of the OOM, and the selector closes. Connections later 
> assigned to this reader will be ignored
> {code}
> try {
>   if (key.isValid()) {
>     if (key.isAcceptable())
>       doAccept(key);
>   }
> } catch (IOException ignored) {
>   if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored);
> }
> {code}
> Leaving the client (or Master and other RS)'s call wait until SocketTimeout.





[jira] [Updated] (HBASE-21357) RS should abort if OOM in Reader thread

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21357:

Component/s: regionserver

> RS should abort if OOM in Reader thread
> ---
>
> Key: HBASE-21357
> URL: https://issues.apache.org/jira/browse/HBASE-21357
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.9
>
> Attachments: HBASE-21357.branch-1.001.patch, 
> HBASE-21357.branch-1.001.patch
>
>
> It is a bit strange: we abort the RS on OOM in the Listener thread, 
> Responder thread and CallRunner thread, but not in the Reader thread. 
> We should abort the RS if OOM happens in the Reader thread, too. If not, the reader 
> thread exits because of the OOM, and the selector closes. Connections later 
> assigned to this reader will be ignored
> {code}
> try {
>   if (key.isValid()) {
>     if (key.isAcceptable())
>       doAccept(key);
>   }
> } catch (IOException ignored) {
>   if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored);
> }
> {code}
> Leaving the client (or Master and other RS)'s call wait until SocketTimeout.





[jira] [Updated] (HBASE-20694) Consolidate warning on SecureBulkLoad directory permissions

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-20694:

Fix Version/s: (was: 1.2.9)
   1.2.10

Still want to do this yourself [~elserj]? Maybe move to unassigned and make it 
a beginner?

> Consolidate warning on SecureBulkLoad directory permissions
> ---
>
> Key: HBASE-20694
> URL: https://issues.apache.org/jira/browse/HBASE-20694
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.9, 1.2.10
>
>
> Follow-on from HBASE-20605:
> HBase 1.x has a check which ignores a directory permission check if you're 
> using a specific filesystem which we think doesn't do security properly.
> HBase 2.x dropped this check.
> Since the security of bulk-loaded data is dependent upon this directory 
> permission (and thus the capabilities of the FileSystem), it would be better 
> to have a consistent warning across branches.
> [~busbey] suggested that we make a WARN message which points admins to our 
> Book (and write such a section if we don't have something sufficient already) 
> and our supported filesystems, coupled with an option to disable that warning 
> message.





[jira] [Updated] (HBASE-21013) Backport "read part" of HBASE-18754 to all active 1.x branches

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21013:

Fix Version/s: (was: 1.2.9)
   1.2.10

Any progress here? I'm inclined to go with the forward port of the 0.98 version.

> Backport "read part" of HBASE-18754 to all active 1.x branches
> --
>
> Key: HBASE-21013
> URL: https://issues.apache.org/jira/browse/HBASE-21013
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Chia-Ping Tsai
>Assignee: Mingdao Yang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.9, 1.2.10
>
>
> The hfiles impacted by HBASE-18754 will contain bytes of proto.TimeRangeTracker. 
> As a result, all 1.x branches fail to read those hfiles, since they 
> can't deserialize the proto.TimeRangeTracker.





[jira] [Updated] (HBASE-17229) Backport of purge ThreadLocals

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-17229:

Fix Version/s: (was: 1.2.9)
   1.2.10

What do you folks think, still worth trying to get this in? Stable pointer 
already has moved to 1.4. Just tell folks hitting the issue that it's time to 
move to 1.4+?

> Backport of purge ThreadLocals
> --
>
> Key: HBASE-17229
> URL: https://issues.apache.org/jira/browse/HBASE-17229
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Critical
> Fix For: 1.2.10
>
>
> Backport HBASE-17072 and HBASE-16146. The former needs to be backported to 
> 1.3 ([~mantonov]) and 1.2 ([~busbey]). The latter is already in 1.3.  Needs 
> to be backported to 1.2.





[jira] [Updated] (HBASE-21492) CellCodec Written To WAL Before It's Verified

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21492:

Priority: Critical  (was: Major)

> CellCodec Written To WAL Before It's Verified
> -
>
> Key: HBASE-21492
> URL: https://issues.apache.org/jira/browse/HBASE-21492
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 1.2.7, 2.0.2
>Reporter: BELUGA BEHR
>Priority: Critical
>
> The cell codec class name is written into the WAL file, but the cell codec 
> class is not actually verified to exist.  Therefore, users can inadvertently 
> configure an invalid class name and it will be recorded into the WAL file.  
> At that point, the WAL file becomes unreadable and blocks processing of all 
> other WAL files.
> {code:java|title=AbstractProtobufLogWriter.java}
>   private WALHeader buildWALHeader0(Configuration conf, WALHeader.Builder builder) {
>     if (!builder.hasWriterClsName()) {
>       builder.setWriterClsName(getWriterClassName());
>     }
>     if (!builder.hasCellCodecClsName()) {
>       builder.setCellCodecClsName(WALCellCodec.getWALCellCodecClass(conf));
>     }
>     return builder.build();
>   }
> {code}
> https://github.com/apache/hbase/blob/025ddce868eb06b4072b5152c5ffae5a01e7ae30/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractProtobufLogWriter.java#L78-L86
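
One way to picture the missing verification is a check that the configured class name can actually be loaded before it is written into the header. The sketch below is an assumption about what such a guard could look like, not the fix adopted for this issue; CodecNameCheckSketch and requireLoadable are hypothetical names, and it only proves the class is on the classpath, not that it is a usable codec.

```java
// Illustrative only: CodecNameCheckSketch is a hypothetical helper, not part of HBase.
final class CodecNameCheckSketch {

  /** Returns the class name unchanged if it can be loaded, otherwise throws. */
  static String requireLoadable(String className) {
    try {
      // initialize=false: only check that the class can be found,
      // without running its static initializers.
      Class.forName(className, false, CodecNameCheckSketch.class.getClassLoader());
      return className;
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("codec class not found: " + className, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(requireLoadable("java.lang.String"));
    try {
      requireLoadable("org.example.NoSuchCodec");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing at write time like this would surface a bad configuration before any WAL file records the invalid name.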





[jira] [Updated] (HBASE-21492) CellCodec Written To WAL Before It's Verified

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21492:

Issue Type: Bug  (was: Improvement)

> CellCodec Written To WAL Before It's Verified
> -
>
> Key: HBASE-21492
> URL: https://issues.apache.org/jira/browse/HBASE-21492
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 1.2.7, 2.0.2
>Reporter: BELUGA BEHR
>Priority: Major
>
> The cell codec class name is written into the WAL file, but the cell codec 
> class is not actually verified to exist.  Therefore, users can inadvertently 
> configure an invalid class name and it will be recorded into the WAL file.  
> At that point, the WAL file becomes unreadable and blocks processing of all 
> other WAL files.
> {code:java|title=AbstractProtobufLogWriter.java}
>   private WALHeader buildWALHeader0(Configuration conf, WALHeader.Builder builder) {
>     if (!builder.hasWriterClsName()) {
>       builder.setWriterClsName(getWriterClassName());
>     }
>     if (!builder.hasCellCodecClsName()) {
>       builder.setCellCodecClsName(WALCellCodec.getWALCellCodecClass(conf));
>     }
>     return builder.build();
>   }
> {code}
> https://github.com/apache/hbase/blob/025ddce868eb06b4072b5152c5ffae5a01e7ae30/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AbstractProtobufLogWriter.java#L78-L86





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-11-16 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689436#comment-16689436
 ] 

Sean Busbey commented on HBASE-20952:
-

I pushed a commit to move this branch to weekly tests.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup. Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. Backup doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Commented] (HBASE-21255) [acl] Refactor TablePermission into three classes (Global, Namespace, Table)

2018-11-16 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689421#comment-16689421
 ] 

Sean Busbey commented on HBASE-21255:
-

heads up on HBASE-21489. please take a look.

> [acl] Refactor TablePermission into three classes (Global, Namespace, Table)
> 
>
> Key: HBASE-21255
> URL: https://issues.apache.org/jira/browse/HBASE-21255
> Project: HBase
>  Issue Type: Improvement
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Major
>  Labels: ACLs, security-issue
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21225.master.001.patch, 
> HBASE-21225.master.002.patch, HBASE-21225.master.007.patch, 
> HBASE-21225.master.008.patch, HBASE-21225.master.009.patch, 
> HBASE-21225.master.009.patch, HBASE-21255.master.003.patch, 
> HBASE-21255.master.004.patch, HBASE-21255.master.005.patch, 
> HBASE-21255.master.006.patch, HBASE-21255.master.006.patch
>
>
> A TODO in {{TablePermission.java}}
> {code:java}
>   //TODO refactor this class
>   //we need to refacting this into three classes (Global, Table, Namespace)
> {code}
> Change Notes:
>  * Divide the original TablePermission into three classes: GlobalPermission, 
> NamespacePermission, TablePermission.
>  * The new UserPermission consists of a user name (a string, not byte[], for 
> convenience) and a permission that is one of [Global, Namespace, Table]Permission.
>  * Rename TableAuthManager to AuthManager (it is IA.P), and rename some 
> methods for readability.
>  * Make PermissionCache thread safe, and change the ListMultiMap to a Set.
>  * Combine the user cache and group cache in AuthManager.
>  * Keep the wire proto, so BC should be guaranteed.
>  * Fix HBASE-21390.
>  * Resolve a small {{TODO}}: the global entry should be handled differently in 
> AccessControlLists.
>  * Add a new API, {{Permission#getAccessScope()}}.





[jira] [Updated] (HBASE-21484) [HBCK2] hbck2 should default to a released hbase version

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21484:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> [HBCK2] hbck2 should default to a released hbase version
> 
>
> Key: HBASE-21484
> URL: https://issues.apache.org/jira/browse/HBASE-21484
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2
>Affects Versions: hbck2-1.0.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: hbck2-1.0.0
>
>
> can't build from clean checkout because 2.1.1-SNAPSHOT isn't a released 
> version





[jira] [Updated] (HBASE-21483) [HBCK2] version string checking should look for exactly the version we know doesn't work

2018-11-16 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21483:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> [HBCK2] version string checking should look for exactly the version we know 
> doesn't work
> 
>
> Key: HBASE-21483
> URL: https://issues.apache.org/jira/browse/HBASE-21483
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2
>Affects Versions: hbck2-1.0.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: hbck2-1.0.0
>
>
> Right now the version check looks for anything that starts with "2.1.0" and 
> declares HBCK2 a lost cause. I presume this was to deal with 
> "2.1.0-SNAPSHOT", but that no longer needs to be a concern.





[jira] [Updated] (HBASE-21484) [HBCK2] hbck2 should default to a released hbase version

2018-11-15 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21484:

Status: Patch Available  (was: Open)

PR started: https://github.com/apache/hbase-operator-tools/pull/1

> [HBCK2] hbck2 should default to a released hbase version
> 
>
> Key: HBASE-21484
> URL: https://issues.apache.org/jira/browse/HBASE-21484
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2
>Affects Versions: hbck2-1.0.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: hbck2-1.0.0
>
>
> can't build from clean checkout because 2.1.1-SNAPSHOT isn't a released 
> version





[jira] [Updated] (HBASE-21483) [HBCK2] version string checking should look for exactly the version we know doesn't work

2018-11-15 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21483:

Status: Patch Available  (was: Open)

PR: https://github.com/apache/hbase-operator-tools/pull/1

> [HBCK2] version string checking should look for exactly the version we know 
> doesn't work
> 
>
> Key: HBASE-21483
> URL: https://issues.apache.org/jira/browse/HBASE-21483
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2
>Affects Versions: hbck2-1.0.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: hbck2-1.0.0
>
>
> Right now the version check looks for anything that starts with "2.1.0" and 
> declares HBCK2 a lost cause. I presume this was to deal with 
> "2.1.0-SNAPSHOT", but that no longer needs to be a concern.





[jira] [Created] (HBASE-21484) [HBCK2] hbck2 should default to a released hbase version

2018-11-15 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-21484:
---

 Summary: [HBCK2] hbck2 should default to a released hbase version
 Key: HBASE-21484
 URL: https://issues.apache.org/jira/browse/HBASE-21484
 Project: HBase
  Issue Type: Bug
  Components: hbck2
Affects Versions: hbck2-1.0.0
Reporter: Sean Busbey
Assignee: Sean Busbey
 Fix For: hbck2-1.0.0








[jira] [Updated] (HBASE-21484) [HBCK2] hbck2 should default to a released hbase version

2018-11-15 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21484:

Description: can't build from clean checkout because 2.1.1-SNAPSHOT isn't a 
released version

> [HBCK2] hbck2 should default to a released hbase version
> 
>
> Key: HBASE-21484
> URL: https://issues.apache.org/jira/browse/HBASE-21484
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2
>Affects Versions: hbck2-1.0.0
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: hbck2-1.0.0
>
>
> can't build from clean checkout because 2.1.1-SNAPSHOT isn't a released 
> version





[jira] [Created] (HBASE-21483) [HBCK2] version string checking should look for exactly the version we know doesn't work

2018-11-15 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-21483:
---

 Summary: [HBCK2] version string checking should look for exactly 
the version we know doesn't work
 Key: HBASE-21483
 URL: https://issues.apache.org/jira/browse/HBASE-21483
 Project: HBase
  Issue Type: Bug
  Components: hbck2
Affects Versions: hbck2-1.0.0
Reporter: Sean Busbey
Assignee: Sean Busbey
 Fix For: hbck2-1.0.0


Right now the version check looks for anything that starts with "2.1.0" and 
declares HBCK2 a lost cause. I presume this was to deal with "2.1.0-SNAPSHOT", 
but that no longer needs to be a concern.
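
The difference between the two checks is easy to see in a small sketch. This is not the HBCK2 code; VersionCheckSketch and its methods are hypothetical names, and "2.1.0" is used only because it is the version named in the description.

```java
// Illustrative only: hypothetical names, not the HBCK2 implementation.
final class VersionCheckSketch {

  static final String KNOWN_BAD = "2.1.0";

  /** Behaviour as described above: anything starting with "2.1.0" is rejected. */
  static boolean rejectedByPrefix(String version) {
    return version.startsWith(KNOWN_BAD);
  }

  /** Proposed behaviour: only exactly "2.1.0" is rejected. */
  static boolean rejectedExactly(String version) {
    return version.equals(KNOWN_BAD);
  }

  public static void main(String[] args) {
    for (String v : new String[] {"2.1.0", "2.1.0-SNAPSHOT", "2.1.1"}) {
      System.out.println(v + " prefix=" + rejectedByPrefix(v) + " exact=" + rejectedExactly(v));
    }
  }
}
```

With the exact comparison, a version string such as "2.1.0-SNAPSHOT" is no longer treated as the known-bad release.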





[jira] [Commented] (HBASE-21476) Support for nanosecond timestamps

2018-11-13 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686071#comment-16686071
 ] 

Sean Busbey commented on HBASE-21476:
-

Please write a scope document. Consider emailing dev@hbase to gather feedback 
once you have one.

Important bits to cover:
* What use cases this enables
* How will this impact upgrades
* How/where we'll document this
* How we'll test this feature

Some example scope documents:
* [HBase Spark 
Integration|https://issues.apache.org/jira/secure/attachment/12878023/Apache%20HBase%20-%20Apache%20Spark%20Integration%20Scope%20-%20update%201.pdf]
 (HBASE-18405)
* [Read Replica 
Clusters|https://issues.apache.org/jira/secure/attachment/12888376/HBase%20Read-Replica%20Clusters%20Scope%20doc_v2.pdf]
 (HBASE-18477)

> Support for nanosecond timestamps
> -
>
> Key: HBASE-21476
> URL: https://issues.apache.org/jira/browse/HBASE-21476
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.1.1
>Reporter: Andrey Elenskiy
>Assignee: Andrey Elenskiy
>Priority: Major
>  Labels: features, patch
> Attachments: nanosecond_timestamps_v1.patch
>
>
> Introducing a new table attribute "NANOSECOND_TIMESTAMPS" to tell HBase to 
> handle timestamps with nanosecond precision. This is useful for applications 
> that timestamp updates at the source with nanoseconds and still want features 
> like column family TTL and "hbase.hstore.time.to.purge.deletes" to work.
> The attribute should be specified either on new tables or on existing tables 
> which have timestamps only with nanosecond precision. There's no migration 
> from milliseconds to nanoseconds for already existing tables. We could add 
> this migration as part of compaction if you think that would be useful, but 
> that would obviously make the change more complex.
> I've added a new EnvironmentEdge method "currentTimeNano()" that uses 
> [java.time.Instant|https://docs.oracle.com/javase/8/docs/api/java/time/Instant.html]
>  to get time in nanoseconds which means it will only work with Java 8. The 
> idea is to gradually replace all places where "EnvironmentEdge.currentTime()" 
> is used to have HBase working purely with nanoseconds (which is a 
> prerequisite for HBASE-14070). Also, I've refactored ScanInfo and 
> PartitionedMobCompactor to expect TableDescriptor as an argument which makes 
> code a little cleaner and easier to extend.
> Couple more points:
> - column family TTL (specified in seconds) and 
> "hbase.hstore.time.to.purge.deletes" (specified in milliseconds) options 
> don't need to be changed, those are adjusted automatically.
> - Per cell TTL needs to be scaled by clients accordingly after 
> "NANOSECOND_TIMESTAMPS" table attribute is specified.
> Looking for everyone's feedback to know if that's a worthwhile direction. 
> Will add more comprehensive tests in a later patch.
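
A rough sketch of how a nanosecond clock and the TTL scaling described above could look, assuming only java.time.Instant as named in the description. The class and helper names are hypothetical and this is not the code in the attached patch.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not the code from nanosecond_timestamps_v1.patch. */
final class NanoTimeSketch {
  /**
   * Nanoseconds since the epoch for the given instant.
   * Throws ArithmeticException once the value no longer fits in a long (around the year 2262).
   */
  static long toEpochNanos(Instant t) {
    return Math.addExact(Math.multiplyExact(t.getEpochSecond(), 1_000_000_000L), t.getNano());
  }

  /** Current time in nanoseconds; the real resolution depends on the JVM's system clock. */
  static long currentTimeNano() {
    return toEpochNanos(Instant.now());
  }

  /** A column family TTL given in seconds, expressed in nanoseconds. */
  static long ttlSecondsToNanos(long ttlSeconds) {
    return TimeUnit.SECONDS.toNanos(ttlSeconds);
  }

  /** A setting given in milliseconds, expressed in nanoseconds. */
  static long millisToNanos(long millis) {
    return TimeUnit.MILLISECONDS.toNanos(millis);
  }

  public static void main(String[] args) {
    System.out.println("now (ns): " + currentTimeNano());
    System.out.println("TTL of 60 s in ns: " + ttlSecondsToNanos(60));
  }
}
```

The overflow check matters because a signed 64-bit nanosecond counter covers only about 292 years from the epoch.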





[jira] [Commented] (HBASE-20586) SyncTable tool: Add support for cross-realm remote clusters

2018-11-13 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685464#comment-16685464
 ] 

Sean Busbey commented on HBASE-20586:
-

A doc jira blocked by this one sounds like a good idea.

> SyncTable tool: Add support for cross-realm remote clusters
> ---
>
> Key: HBASE-20586
> URL: https://issues.apache.org/jira/browse/HBASE-20586
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce, Operability, Replication
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 1.5.0, 2.2.0
>
> Attachments: HBASE-20586.master.001.patch
>
>
> One possible scenario for HashTable/SyncTable is synchronizing different 
> clusters, for instance when replication has been enabled but data already 
> existed, or because of replication issues that may have caused long lags in 
> the replication.
> For secured clusters under different Kerberos realms (with cross-realm 
> properly set), though, the current SyncTable version would fail to authenticate 
> with the remote cluster when trying to read HashTable outputs (when 
> *sourcehashdir* is remote) and also when trying to read table data on the 
> remote cluster (when *sourcezkcluster* is remote).
> The HDFS error would look like this:
> {noformat}
> INFO mapreduce.Job: Task Id : attempt_1524358175778_105392_m_00_0, Status 
> : FAILED
> Error: java.io.IOException: Failed on local exception: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]; Host Details : local host is: "local-host/1.1.1.1"; 
> destination host is: "remote-nn":8020;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1506)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>         at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:256)
> ...
>         at 
> org.apache.hadoop.hbase.mapreduce.HashTable$TableHash.readPropertiesFile(HashTable.java:144)
>         at 
> org.apache.hadoop.hbase.mapreduce.HashTable$TableHash.read(HashTable.java:105)
>         at 
> org.apache.hadoop.hbase.mapreduce.SyncTable$SyncMapper.setup(SyncTable.java:188)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> ...
> Caused by: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]{noformat}
> The above can be resolved if the SyncTable job acquires a delegation token 
> (DT) for the remote NameNode (NN). Once HDFS-related authentication is done, 
> it's also necessary to authenticate against the remote HBase, as the error 
> below would arise:
> {noformat}
> INFO mapreduce.Job: Task Id : attempt_1524358175778_172414_m_00_0, Status 
> : FAILED
> Error: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get 
> the location
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:326)
> ...
> at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:867)
> at 
> org.apache.hadoop.hbase.mapreduce.SyncTable$SyncMapper.syncRange(SyncTable.java:331)
> ...
> Caused by: java.io.IOException: Could not set up IO Streams to 
> remote-rs-host/1.1.1.2:60020
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:786)
> ...
> Caused by: java.lang.RuntimeException: SASL authentication failed. The most 
> likely cause is missing or invalid credentials. Consider 'kinit'.
> ...
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos tgt)
> ...{noformat}
> The above would need additional authentication logic against the remote hbase 
> cluster.





[jira] [Commented] (HBASE-21452) Illegal character in hbase counters group name

2018-11-13 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685454#comment-16685454
 ] 

Sean Busbey commented on HBASE-21452:
-

Okay, so a release note marking it as an incompatible change, calling out that 
folks looking at Hadoop counters (either in MapReduce jobs or Spark) will see a 
change in the names? If that sounds correct, then +1 from me for master and branch-2.
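
The URISyntaxException quoted in the issue description can be reproduced with java.net.URI alone, since the space in the counter group name ends up in the URI path. The sketch below assumes nothing beyond the JDK. The space-free name "HBaseCounters" is a hypothetical stand-in and is not necessarily the name chosen in the patch.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical illustration only; shows why a space in the group name breaks URI parsing. */
final class CounterGroupUriSketch {
  /** Returns the index of the character the parser rejects, or -1 if the string parses as a URI. */
  static int illegalIndex(String candidate) {
    try {
      new URI(candidate);
      return -1;
    } catch (URISyntaxException e) {
      return e.getIndex();
    }
  }

  public static void main(String[] args) {
    String base = "spark://192.168.1.139:61037/classes/";
    // Group name with a space, as in the quoted log line.
    System.out.println(illegalIndex(base + "HBase Counters_en_US.class"));
    // Hypothetical space-free group name.
    System.out.println(illegalIndex(base + "HBaseCounters_en_US.class"));
  }
}
```

Any rename that removes the space avoids the parse failure, but it also changes the group name that existing MapReduce or Spark users see, which is why the release note matters.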

> Illegal character in hbase counters group name
> --
>
> Key: HBASE-21452
> URL: https://issues.apache.org/jira/browse/HBASE-21452
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21452.branch-2.001.patch
>
>
> Messing w/ spark counting RDD rows, spark dumps out following complaint:
> {code}
> 2018-11-07 20:03:29,132 ERROR [Executor task launch worker for task 0] 
> repl.ExecutorClassLoader: Failed to check existence of class HBase 
> Counters_en_US on REPL class server at spark://192.168.1.139:61037/classes
> java.net.URISyntaxException: Illegal character in path at index 41: 
> spark://192.168.1.139:61037/classes/HBase Counters_en_US.class
>   at java.net.URI$Parser.fail(URI.java:2848)
>   at java.net.URI$Parser.checkChars(URI.java:3021)
>   at java.net.URI$Parser.parseHierarchical(URI.java:3105)
>   at java.net.URI$Parser.parse(URI.java:3053)
>   at java.net.URI.<init>(URI.java:588)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:328)
>   at 
> org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:95)
>   at 
> org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:62)
>   at 
> org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:62)
>   at 
> org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:167)
>   at 
> org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:85)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2649)
>   at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1510)
>   at java.util.ResourceBundle.findBundle(ResourceBundle.java:1474)
>   at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1370)
>   at java.util.ResourceBundle.getBundle(ResourceBundle.java:1091)
>   at 
> org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37)
>   at 
> org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
>   at 
> org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterGroupName(ResourceBundles.java:77)
>   at 
> org.apache.hadoop.mapreduce.counters.CounterGroupFactory.newGroup(CounterGroupFactory.java:94)
>   at 
> org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:226)
>   at 
> org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:153)
>   at 
> org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl$DummyReporter.getCounter(TaskAttemptContextImpl.java:110)
>   at 
> org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:298)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:286)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:257)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:133)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:220)
>   at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:214)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1837)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1168)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1168)
>   at 
> 

[jira] [Commented] (HBASE-21470) [hbase-connectors] Build shaded versions of the connectors libs

2018-11-13 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685440#comment-16685440
 ] 

Sean Busbey commented on HBASE-21470:
-

The spark module should already work with the shaded hbase module(s). IIRC it 
requires the mapreduce-specific one. I would expect that Kafka will end up 
being the same. If we can avoid relying on the shaded plugin again within 
hbase-connectors, we should do so.

> [hbase-connectors] Build shaded versions of the connectors libs
> ---
>
> Key: HBASE-21470
> URL: https://issues.apache.org/jira/browse/HBASE-21470
> Project: HBase
>  Issue Type: Task
>  Components: build, hbase-connectors
>Affects Versions: connector-1.0.0
>Reporter: Adrian Muraru
>Priority: Major
>
> For downstream users it would be helpful to generate shaded versions of the 
> connectors libs, e.g. hbase-shaded-spark and hbase-shaded-kafka.
> These would ease integrating these libs in Spark/Hadoop projects where 
> transitive dependencies of the connectors libs conflict with the runtime ones.





[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-13 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685419#comment-16685419
 ] 

Sean Busbey commented on HBASE-20604:
-

My understanding is that a good portion of our issues in this code are caused 
by Hadoop not really defining what to expect when a client has a concurrent 
read open on a file that's still open for write. This is usually where the 
problems in our WAL reading code come up; our replication system is relying on 
assumptions that aren't really documented anywhere.

AFAIK there's no UT because we have never been able to isolate the problem(s) 
that poke up in production around this, and trying to mock out the various 
levels of abstraction went poorly when last I tried.

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.9
>
> Attachments: HBASE-20604.002.patch, HBASE-20604.003.patch, 
> HBASE-20604.004.patch, HBASE-20604.005.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause a 
> premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.
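
One generic defensive pattern against streams that hand back partial data is to keep reading until the requested length is filled or a real end of stream is reached. The sketch below uses only java.io and is not the fix in the attached patches. The class name, the helper names, and the one-byte-at-a-time test stream are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical illustration only; not the HBASE-20604 patch. */
final class PartialReadSketch {
  /**
   * Reads up to len bytes into buf, looping over short reads.
   * Returns the number of bytes read, which is less than len only at a real end of stream.
   */
  static int readFully(InputStream in, byte[] buf, int len) {
    int total = 0;
    try {
      while (total < len) {
        int n = in.read(buf, total, len - total);
        if (n < 0) {
          break; // genuine end of stream, not just a short read
        }
        total += n;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return total;
  }

  /** A stream that returns at most one byte per bulk read, imitating partial reads. */
  static InputStream trickle(byte[] data) {
    return new ByteArrayInputStream(data) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 1));
      }
    };
  }

  public static void main(String[] args) {
    byte[] buf = new byte[4];
    int n = readFully(trickle(new byte[] {10, 20, 30, 40}), buf, buf.length);
    System.out.println("read " + n + " bytes");
  }
}
```

A caller that treats the first short read as end of file would stop after one byte here, which is the kind of premature EOF the description refers to.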





[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-13 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685426#comment-16685426
 ] 

Sean Busbey commented on HBASE-20604:
-

Also also I would like to start the RC process for 1.2.9 this week, so it'd be 
very helpful if this critical issue didn't reopen. ;)






[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-13 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685425#comment-16685425
 ] 

Sean Busbey commented on HBASE-20604:
-

Also note that while we can try to isolate problems into things that Hadoop is 
doing wrong (either in CryptoInputStream or the other implementation classes), 
that project defines "what HDFS did in ~Hadoop 1" as canonical. This can 
include things that some folks might consider incorrect implementation details, 
if the project believes downstream has come to rely on it.

So to some degree we'll be stuck with defensive workarounds in our client usage 
regardless.






[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-11-13 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685385#comment-16685385
 ] 

Sean Busbey commented on HBASE-20952:
-

How about if I move the job to only run weekly? Since my last comment this job 
has run 5 times, which means about 20 hours of testing that has gotten us 
no new information.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup and restore (B&R). Replication has the use-case for "tail"'ing the WAL, 
> which we should provide via our new API. B&R doesn't do anything fancy (IIRC). 
> We should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Commented] (HBASE-21428) Performance issue due to userRegionLock in the ConnectionManager.

2018-11-13 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685285#comment-16685285
 ] 

Sean Busbey commented on HBASE-21428:
-

This should be impacting all branches, correct?

> Performance issue due to userRegionLock in the ConnectionManager.
> -
>
> Key: HBASE-21428
> URL: https://issues.apache.org/jira/browse/HBASE-21428
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.7
>Reporter: koo
>Priority: Major
>
> My service executes a lot of puts using HTableMultiplexer.
> After the version change, most of the requests are rejected.
> It works fine in 1.2.6.1, but there is a problem in 1.2.7.
> This issue is related to HBASE-19260.
> Most of my threads spend a lot of time waiting, as shown below.
>  
> |"Worker-972" #2479 daemon prio=5 os_prio=0 tid=0x7f8cea86b000 nid=0x4c8c 
> waiting on condition [0x7f8b78104000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005dd703b78> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1274)
>  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1186)
>  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1170)
>  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1127)
>  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:962)
>  at 
> org.apache.hadoop.hbase.client.HTableMultiplexer.put(HTableMultiplexer.java:206)
>  at 
> org.apache.hadoop.hbase.client.HTableMultiplexer.put(HTableMultiplexer.java:150)|
>  
> When I looked at the issue (HBASE-19260), I understood the danger of 
> allowing access from multiple threads.
> However, many threads have already been created within the configured limits, 
> and I think it is very inefficient to allow only one thread access.
>  
> | this.metaLookupPool = getThreadPool(
>  conf.getInt("hbase.hconnection.meta.lookup.threads.max", 128),
>  conf.getInt("hbase.hconnection.meta.lookup.threads.core", 10),
>  "-metaLookup-shared-", new LinkedBlockingQueue());|
>  
> I would like to suggest changing it to allow multiple locks (but not one per 
> thread).
> The following is pseudocode.
>  
> |int lockSize = conf.getInt("hbase.hconnection.meta.lookup.threads.max", 128) 
> / 2;
> BlockingQueue<ReentrantLock> userRegionLockQueue = new 
> LinkedBlockingQueue<ReentrantLock>();
>  for (int i=0; i<lockSize; i++) {
>    userRegionLockQueue.put(new ReentrantLock());
>  }|
>  
> thanks.
>  
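
A runnable sketch of the pooled-lock idea from the pseudocode above, using only java.util.concurrent. The class and method names are hypothetical, and nothing here is taken from ConnectionManager.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical illustration only; not the HBase client code. */
final class LockPoolSketch {
  private final BlockingQueue<ReentrantLock> locks = new LinkedBlockingQueue<>();

  LockPoolSketch(int lockSize) {
    for (int i = 0; i < lockSize; i++) {
      locks.add(new ReentrantLock());
    }
  }

  /** Runs body while holding one lock from the pool; blocks while all locks are in use. */
  <T> T withLock(Supplier<T> body) {
    final ReentrantLock lock;
    try {
      lock = locks.take();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      throw new IllegalStateException("interrupted while waiting for a lock", e);
    }
    lock.lock();
    try {
      return body.get();
    } finally {
      lock.unlock();
      locks.add(lock); // hand the lock back to the pool
    }
  }

  /** Number of locks currently not in use. */
  int available() {
    return locks.size();
  }

  public static void main(String[] args) {
    LockPoolSketch pool = new LockPoolSketch(64);
    System.out.println(pool.withLock(() -> "free locks while held: " + pool.available()));
  }
}
```

Because each lock is owned exclusively by whichever thread took it from the queue, this behaves like a java.util.concurrent.Semaphore with lockSize permits. It bounds the number of concurrent meta lookups but does not by itself serialize lookups for the same region.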





[jira] [Commented] (HBASE-21255) [acl] Refactor TablePermission into three classes (Global, Namespace, Table)

2018-11-08 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680442#comment-16680442
 ] 

Sean Busbey commented on HBASE-21255:
-

> I will commit it late this day if no further comments.

I don't see a +1 yet. I know you're anxious to move forward, but please wait for 
a review. Perhaps ping dev@hbase for volunteers.

> [acl] Refactor TablePermission into three classes (Global, Namespace, Table)
> 
>
> Key: HBASE-21255
> URL: https://issues.apache.org/jira/browse/HBASE-21255
> Project: HBase
>  Issue Type: Improvement
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21225.master.001.patch, 
> HBASE-21225.master.002.patch, HBASE-21225.master.007.patch, 
> HBASE-21255.master.003.patch, HBASE-21255.master.004.patch, 
> HBASE-21255.master.005.patch, HBASE-21255.master.006.patch
>
>
> A TODO in {{TablePermission.java}}
> {code:java}
>   //TODO refactor this class
>   //we need to refacting this into three classes (Global, Table, Namespace)
> {code}
> Change Notes:
>  * Divide the original TablePermission into three classes: GlobalPermission, 
> NamespacePermission, TablePermission
>  * New UserPermission consists of a user name and a permission in one of 
> [Global, Namespace, Table]Permission.
>  * Rename TableAuthManager to AuthManager (it is IA.P), and rename some 
> methods for readability.
>  * Make PermissionCache thread safe, and the ListMultiMap is changed to Set.
>  * User cache and group cache in AuthManager are combined.
>  * Wire proto is kept, so BC should be guaranteed.
>  * Fix HBASE-21390.
>  * Resolve a small {{TODO}}: global entry should be handled differently in 
> AccessControlLists





[jira] [Updated] (HBASE-21411) Need to document the snapshot metric data that is shown in HBase Master Web UI

2018-11-08 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21411:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

thanks for the offer [~stack]; I've got things set.

thanks again Roland!

> Need to document the snapshot metric data that is shown in HBase Master Web UI
> --
>
> Key: HBASE-21411
> URL: https://issues.apache.org/jira/browse/HBASE-21411
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Roland Teague
>Assignee: Roland Teague
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 0001-Patch-for-HBASE-21411.patch, 
> HBASE-21411.master.001.patch
>
>
> We need to add documentation into the Reference Guide for the work that was 
> done in HBASE-15415.





[jira] [Commented] (HBASE-21454) Kill zk spew

2018-11-08 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680298#comment-16680298
 ] 

Sean Busbey commented on HBASE-21454:
-

I am enthusiastic about not seeing a bunch of ZK stuff, but server start up 
needs some source of a classpath dump. Can we add one at the same time, emitted 
directly by each of the services, so that it won't show up in clients?
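
A minimal sketch of what such a server-side classpath dump might look like, assuming only the standard "java.class.path" system property. The class and method names are hypothetical, and a real service would use its logger rather than a PrintStream.

```java
import java.io.File;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of any HBase patch. */
final class ClasspathDumpSketch {
  /** Splits a classpath string into its entries, skipping empty ones. */
  static List<String> splitClasspath(String classpath) {
    List<String> entries = new ArrayList<>();
    for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
      if (!entry.isEmpty()) {
        entries.add(entry);
      }
    }
    return entries;
  }

  /** Meant to be called once from a server's startup path, never from client code. */
  static void dumpClasspath(PrintStream out) {
    for (String entry : splitClasspath(System.getProperty("java.class.path", ""))) {
      out.println("classpath entry: " + entry);
    }
  }

  public static void main(String[] args) {
    dumpClasspath(System.out);
  }
}
```

Calling this only from the server entry points keeps the listing out of client logs such as spark-shell output.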

> Kill zk spew
> 
>
> Key: HBASE-21454
> URL: https://issues.apache.org/jira/browse/HBASE-21454
> Project: HBase
>  Issue Type: Bug
>  Components: logging, Zookeeper
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21454.master.001.patch
>
>
> Kill the zk spew. This is radical: it drops the startup listing of CLASSPATH 
> and all properties. Can dial back in what we need after this patch goes in.
> I get spew each time I run a little command in spark-shell. Annoying. Always 
> been annoying in all logs.
> More might be needed.





[jira] [Commented] (HBASE-21452) Illegal character in hbase counters group name

2018-11-08 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680356#comment-16680356
 ] 

Sean Busbey commented on HBASE-21452:
-

this will impact any existing MR users, right?

> Illegal character in hbase counters group name
> --
>
> Key: HBASE-21452
> URL: https://issues.apache.org/jira/browse/HBASE-21452
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21452.branch-2.001.patch
>
>
> Messing w/ spark counting RDD rows, spark dumps out following complaint:
> {code}
> 2018-11-07 20:03:29,132 ERROR [Executor task launch worker for task 0] 
> repl.ExecutorClassLoader: Failed to check existence of class HBase 
> Counters_en_US on REPL class server at spark://192.168.1.139:61037/classes
> java.net.URISyntaxException: Illegal character in path at index 41: 
> spark://192.168.1.139:61037/classes/HBase Counters_en_US.class
>   at java.net.URI$Parser.fail(URI.java:2848)
>   at java.net.URI$Parser.checkChars(URI.java:3021)
>   at java.net.URI$Parser.parseHierarchical(URI.java:3105)
>   at java.net.URI$Parser.parse(URI.java:3053)
>   at java.net.URI.<init>(URI.java:588)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:328)
>   at 
> org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:95)
>   at 
> org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:62)
>   at 
> org.apache.spark.repl.ExecutorClassLoader$$anonfun$1.apply(ExecutorClassLoader.scala:62)
>   at 
> org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:167)
>   at 
> org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:85)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2649)
>   at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1510)
>   at java.util.ResourceBundle.findBundle(ResourceBundle.java:1474)
>   at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1370)
>   at java.util.ResourceBundle.getBundle(ResourceBundle.java:1091)
>   at 
> org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37)
>   at 
> org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
>   at 
> org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterGroupName(ResourceBundles.java:77)
>   at 
> org.apache.hadoop.mapreduce.counters.CounterGroupFactory.newGroup(CounterGroupFactory.java:94)
>   at 
> org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:226)
>   at 
> org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:153)
>   at 
> org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl$DummyReporter.getCounter(TaskAttemptContextImpl.java:110)
>   at 
> org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:298)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:286)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:257)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:133)
>   at 
> org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:220)
>   at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:214)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1837)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1168)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1168)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
>   at 

[jira] [Commented] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-11-08 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680322#comment-16680322
 ] 

Sean Busbey commented on HBASE-21355:
-

awesome, thanks for the follow up.

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 2.0.3, 1.4.9
>
> Attachments: HBASE-21355.addendum.patch, HBASE-21355.addendum.patch, 
> HBASE-21355.branch-1.patch, HBASE-21355.v1.patch
>
>
> When testing branch-2's write performance in our internal cluster, we found 
> that regions were being split inexplicably.
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G, but a region would be split even though its 
> size was well below 40G (only ~6G).
> Checking the code, I found that the following path accumulates the store's 
> storeSize to a very large value, because the path never resets it.
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seem to have forgotten to maintain the read replica's storeSize when 
> refreshing the store files.
> A comment on the patch: I moved the storeSize calculation out of the 
> loadStoreFiles() method, because the secondary read replica's 
> refreshStoreFiles() also uses loadStoreFiles() to refresh its store files and 
> updates the storeSize in completeCompaction(..) at the end (just like 
> compaction), so there is no need to calculate the storeSize twice.
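To make the failure mode described above concrete, here is a minimal Java sketch. It is not the HStore code: the class StoreSizeSketch and both method names are hypothetical, and the only assumption taken from the description is that a size counter was added to on every pass through the "open store files" path without ever being reset. Sizes are plain numbers standing in for GB.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the accumulation problem; not HBase code. */
final class StoreSizeSketch {
    private long storeSize = 0L;

    /** Problematic pattern: every call adds the file sizes on top of the previous total. */
    long openStoreFilesAccumulating(List<Long> fileSizes) {
        for (long size : fileSizes) {
            storeSize += size;
        }
        return storeSize;
    }

    /** Safer pattern: recompute the total from scratch, then publish it. */
    long openStoreFilesRecomputing(List<Long> fileSizes) {
        long total = 0L;
        for (long size : fileSizes) {
            total += size;
        }
        storeSize = total;
        return storeSize;
    }

    public static void main(String[] args) {
        List<Long> files = Arrays.asList(2L, 4L); // two files, 6 GB in total
        StoreSizeSketch accumulating = new StoreSizeSketch();
        StoreSizeSketch recomputing = new StoreSizeSketch();
        long accumulated = 0L;
        long recomputed = 0L;
        // Simulate the same path being hit seven times, e.g. by repeated RPCs.
        for (int call = 0; call < 7; call++) {
            accumulated = accumulating.openStoreFilesAccumulating(files);
            recomputed = recomputing.openStoreFilesRecomputing(files);
        }
        System.out.println("accumulating: " + accumulated + " GB, recomputing: " + recomputed + " GB");
    }
}
```

With these illustrative numbers the accumulating counter reaches 42 after seven calls, which would exceed a 40G limit even though the files total only 6, while the recomputed value stays at 6.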





[jira] [Commented] (HBASE-21458) Error: Could not find or load main class org.apache.hadoop.hbase.util.GetJavaProperty

2018-11-08 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680301#comment-16680301
 ] 

Sean Busbey commented on HBASE-21458:
-

can you post up a log of the message you're seeing? run in debug mode maybe? do 
these extra jars end up in the classpath we hand back to the caller?

> Error: Could not find or load main class 
> org.apache.hadoop.hbase.util.GetJavaProperty
> -
>
> Key: HBASE-21458
> URL: https://issues.apache.org/jira/browse/HBASE-21458
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-21458.branch-2.1.001.patch
>
>
> I get this when I run bin/hbase classpath, whether from a built checkout or 
> an unpacked tarball.





[jira] [Commented] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-11-07 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678681#comment-16678681
 ] 

Sean Busbey commented on HBASE-21355:
-

did this not impact branch-1.2 or was it just overlooked?

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 2.0.3, 1.4.9
>
> Attachments: HBASE-21355.addendum.patch, HBASE-21355.addendum.patch, 
> HBASE-21355.branch-1.patch, HBASE-21355.v1.patch
>
>
> When testing branch-2's write performance in our internal cluster, we found 
> that regions were being split inexplicably.
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G, but a region would be split even though its 
> size was well below 40G (only ~6G).
> Checking the code, I found that the following path accumulates the store's 
> storeSize to a very large value, because the path never resets it.
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seem to have forgotten to maintain the read replica's storeSize when 
> refreshing the store files.
> A comment on the patch: I moved the storeSize calculation out of the 
> loadStoreFiles() method, because the secondary read replica's 
> refreshStoreFiles() also uses loadStoreFiles() to refresh its store files and 
> updates the storeSize in completeCompaction(..) at the end (just like 
> compaction), so there is no need to calculate the storeSize twice.





[jira] [Updated] (HBASE-21411) Need to document the snapshot metric data that is shown in HBase Master Web UI

2018-11-07 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21411:

Status: Patch Available  (was: Open)

> Need to document the snapshot metric data that is shown in HBase Master Web UI
> --
>
> Key: HBASE-21411
> URL: https://issues.apache.org/jira/browse/HBASE-21411
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Roland Teague
>Assignee: Roland Teague
>Priority: Major
> Attachments: HBASE-21411.master.001.patch
>
>
> We need to add documentation into the Reference Guide for the work that was 
> done in HBASE-15415.





[jira] [Commented] (HBASE-21411) Need to document the snapshot metric data that is shown in HBase Master Web UI

2018-11-07 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678596#comment-16678596
 ] 

Sean Busbey commented on HBASE-21411:
-

please use {{git format-patch}} to create your patch so that it will include 
authorship information as you'd like to have it appear.

> Need to document the snapshot metric data that is shown in HBase Master Web UI
> --
>
> Key: HBASE-21411
> URL: https://issues.apache.org/jira/browse/HBASE-21411
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Roland Teague
>Assignee: Roland Teague
>Priority: Major
> Attachments: HBASE-21411.master.001.patch
>
>
> We need to add documentation into the Reference Guide for the work that was 
> done in HBASE-15415.





[jira] [Assigned] (HBASE-21411) Need to document the snapshot metric data that is shown in HBase Master Web UI

2018-11-07 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reassigned HBASE-21411:
---

Assignee: Roland Teague

Thanks for the patch Roland! I've added you to the contributor role in JIRA so 
you ought to be able to assign issues to yourself now (as well as mark them 
"patch available" for qabot checking and review)

> Need to document the snapshot metric data that is shown in HBase Master Web UI
> --
>
> Key: HBASE-21411
> URL: https://issues.apache.org/jira/browse/HBASE-21411
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Roland Teague
>Assignee: Roland Teague
>Priority: Major
> Attachments: HBASE-21411.master.001.patch
>
>
> We need to add documentation into the Reference Guide for the work that was 
> done in HBASE-15415.





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-11-07 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678495#comment-16678495
 ] 

Sean Busbey commented on HBASE-20952:
-

Can we wait to make the branch until there are commits for it? Or wait to run 
the tests until then?

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup. Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. Backup doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in part). We also need to consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.





[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-07 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678398#comment-16678398
 ] 

Sean Busbey commented on HBASE-20604:
-

+1 on v5 pending qabot

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.003.patch, 
> HBASE-20604.004.patch, HBASE-20604.005.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream 
> associated with the {{FSDataInputStream}} from the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data, which can cause 
> a premature EOF that causes {{inputStream.getPos()}} to return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the reads 
> until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.





[jira] [Commented] (HBASE-21443) [hbase-connectors] Purge hbase-* modules from core now they've been moved to hbase-connectors

2018-11-07 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678310#comment-16678310
 ] 

Sean Busbey commented on HBASE-21443:
-

so I'm +1 on either the patch w/o the scalatools plugin or the one that 
includes it (though in the case of the latter I'll probably file a jira to 
remove it afterwards)

> [hbase-connectors] Purge hbase-* modules from core now they've been moved to 
> hbase-connectors
> -
>
> Key: HBASE-21443
> URL: https://issues.apache.org/jira/browse/HBASE-21443
> Project: HBase
>  Issue Type: Sub-task
>  Components: hbase-connectors, spark
>Affects Versions: 3.0.0, 2.2.0
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21443.master.001.patch, 
> HBASE-21443.master.002.patch, HBASE-21443.master.002.patch
>
>
> The parent copied the spark modules over to hbase-connectors. Here we purge 
> them from hbase core repo.





[jira] [Updated] (HBASE-15557) Add guidance on HashTable/SyncTable to the RefGuide

2018-11-07 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-15557:

Summary: Add guidance on HashTable/SyncTable to the RefGuide  (was: 
document SyncTable in ref guide)

> Add guidance on HashTable/SyncTable to the RefGuide
> ---
>
> Key: HBASE-15557
> URL: https://issues.apache.org/jira/browse/HBASE-15557
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.2.0
>Reporter: Sean Busbey
>Assignee: Wellington Chevreuil
>Priority: Critical
> Attachments: HBASE-15557.master.001.patch, 
> HBASE-15557.master.002.patch
>
>
> The docs for SyncTable are insufficient. Brief description from [~davelatham] 
> HBASE-13639 comment:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for 
> bringing it up. I'll try to provide a better explanation. You may have 
> already seen it, but if not, the design doc linked in the description above 
> may also give you some better clues as to how it should be used.
> Briefly, the feature is intended to start with a pair of tables in remote 
> clusters that are already substantially similar and make them identical by 
> comparing hashes of the data and copying only the diffs instead of having to 
> copy the entire table. So it is targeted at a very specific use case (with 
> some work it could generalize to cover things like CopyTable and 
> VerifyReplication but it's not there yet). To use it, you choose one table to 
> be the "source", and the other table is the "target". After the process is 
> complete the target table should end up being identical to the source table.
> In the source table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the 
> source table and an output directory in HDFS. HashTable will scan the source 
> table, break the data up into row key ranges (default of 8kB per range) and 
> produce a hash of the data for each range.
> Make the hashes available to the target cluster - I'd recommend using DistCp 
> to copy it across.
> In the target table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.SyncTable and pass it the directory where 
> you put the hashes, and the names of the source and destination tables. You 
> will likely also need to specify the source table's ZK quorum via the 
> --sourcezkcluster option. SyncTable will then read the hash information, and 
> compute the hashes of the same row ranges for the target table. For any row 
> range where the hash fails to match, it will open a remote scanner to the 
> source table, read the data for that range, and do Puts and Deletes to the 
> target table to update it to match the source.
> I hope that clarifies it a bit. Let me know if you need a hand. If anyone 
> wants to work on getting some documentation into the book, I can try to write 
> some more but would love a hand on turning it into an actual book patch.
> {quote}
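The hash-and-compare flow described in the quote can be sketched in plain Java. This is a simplified, in-memory illustration and not the HashTable/SyncTable implementation: the class HashSyncSketch and its methods are hypothetical, tables are modelled as sorted maps of row key to value, ranges are cut by row count rather than by data size, and MD5 is chosen only for the example. A mismatched range is repaired by clearing and recopying it, which stands in for the Deletes and Puts the real tool issues.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory sketch of hashing row ranges and syncing only the diffs. */
final class HashSyncSketch {

    /** View of the rows with keys in [start, stop); a null stop means "to the end". */
    static NavigableMap<String, String> slice(NavigableMap<String, String> table,
                                              String start, String stop) {
        return stop == null ? table.tailMap(start, true) : table.subMap(start, true, stop, false);
    }

    /** Hash every key and value in the given range of the table. */
    static byte[] hashRange(NavigableMap<String, String> table, String start, String stop) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (Map.Entry<String, String> e : slice(table, start, stop).entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator so "ab"+"c" differs from "a"+"bc"
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** Range start keys derived from the source table; the first range starts at "". */
    static List<String> rangeStarts(NavigableMap<String, String> source, int rowsPerRange) {
        List<String> starts = new ArrayList<>();
        starts.add("");
        int index = 0;
        for (String key : source.keySet()) {
            if (index > 0 && index % rowsPerRange == 0) {
                starts.add(key);
            }
            index++;
        }
        return starts;
    }

    /** Make target identical to source, touching only ranges whose hashes differ. */
    static int sync(NavigableMap<String, String> source, NavigableMap<String, String> target,
                    int rowsPerRange) {
        List<String> starts = rangeStarts(source, rowsPerRange);
        int repairedRanges = 0;
        for (int i = 0; i < starts.size(); i++) {
            String start = starts.get(i);
            String stop = i + 1 < starts.size() ? starts.get(i + 1) : null;
            if (Arrays.equals(hashRange(source, start, stop), hashRange(target, start, stop))) {
                continue; // hashes match, nothing to copy for this range
            }
            NavigableMap<String, String> targetRange = slice(target, start, stop);
            targetRange.clear();                             // stands in for Deletes
            targetRange.putAll(slice(source, start, stop));  // stands in for Puts
            repairedRanges++;
        }
        return repairedRanges;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> source = new TreeMap<>();
        source.put("a", "1"); source.put("b", "2"); source.put("c", "3");
        source.put("d", "4"); source.put("e", "5");
        NavigableMap<String, String> target = new TreeMap<>();
        target.put("a", "1"); target.put("b", "2"); target.put("c", "X");
        target.put("d", "4"); target.put("z", "9");

        int repaired = sync(source, target, 2);
        System.out.println("repaired ranges: " + repaired
            + ", tables identical: " + target.equals(source));
    }
}
```

In this example only two of the three ranges are rewritten: the range holding rows "a" and "b" hashes identically on both sides and is skipped, which is the saving the quote describes compared with copying the whole table.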





[jira] [Commented] (HBASE-20586) SyncTable tool: Add support for cross-realm remote clusters

2018-11-07 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678295#comment-16678295
 ] 

Sean Busbey commented on HBASE-20586:
-

I agree that we don't have the needed infra to have a test for this right now. 
I would like whoever commits it to try running the change as well, especially 
given that it's been ~6 months since it was submitted. I'll try to make time 
next week.

> SyncTable tool: Add support for cross-realm remote clusters
> ---
>
> Key: HBASE-20586
> URL: https://issues.apache.org/jira/browse/HBASE-20586
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce, Operability, Replication
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 1.5.0, 2.2.0
>
> Attachments: HBASE-20586.master.001.patch
>
>
> One possible scenario for HashTable/SyncTable is synchronizing different 
> clusters, for instance, when replication has been enabled but data already 
> existed, or because of replication issues that may have caused long lags in 
> replication.
> For secured clusters under different Kerberos realms (with cross-realm 
> properly set), though, the current SyncTable version would fail to 
> authenticate with the remote cluster when trying to read HashTable outputs 
> (when *sourcehashdir* is remote) and also when trying to read table data on 
> the remote cluster (when *sourcezkcluster* is remote).
> The HDFS error would look like this:
> {noformat}
> INFO mapreduce.Job: Task Id : attempt_1524358175778_105392_m_00_0, Status 
> : FAILED
> Error: java.io.IOException: Failed on local exception: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]; Host Details : local host is: "local-host/1.1.1.1"; 
> destination host is: "remote-nn":8020;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1506)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>         at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:256)
> ...
>         at 
> org.apache.hadoop.hbase.mapreduce.HashTable$TableHash.readPropertiesFile(HashTable.java:144)
>         at 
> org.apache.hadoop.hbase.mapreduce.HashTable$TableHash.read(HashTable.java:105)
>         at 
> org.apache.hadoop.hbase.mapreduce.SyncTable$SyncMapper.setup(SyncTable.java:188)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> ...
> Caused by: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TOKEN, KERBEROS]{noformat}
> The above can be resolved if the SyncTable job acquires a delegation token 
> (DT) for the remote NameNode (NN). Once HDFS-related authentication is done, 
> it's also necessary to authenticate against the remote HBase, as the error 
> below would arise:
> {noformat}
> INFO mapreduce.Job: Task Id : attempt_1524358175778_172414_m_00_0, Status 
> : FAILED
> Error: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get 
> the location
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:326)
> ...
> at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:867)
> at 
> org.apache.hadoop.hbase.mapreduce.SyncTable$SyncMapper.syncRange(SyncTable.java:331)
> ...
> Caused by: java.io.IOException: Could not set up IO Streams to 
> remote-rs-host/1.1.1.2:60020
> at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:786)
> ...
> Caused by: java.lang.RuntimeException: SASL authentication failed. The most 
> likely cause is missing or invalid credentials. Consider 'kinit'.
> ...
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos tgt)
> ...{noformat}
> The above would need additional authentication logic against the remote HBase 
> cluster.





[jira] [Updated] (HBASE-15557) Add guidance on HashTable/SyncTable to the RefGuide

2018-11-07 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-15557:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

merged. Thanks again [~wchevreuil] this is a great doc addition!

maybe for follow-on, this bit sounds like an error condition we should detect?

{code}
+.Set sourcezkcluster to the actual source cluster ZK quorum
+[NOTE]
+
+Although not required, if sourcezkcluster is not set, SyncTable will connect to local HBase cluster for both source and target,
+which does not give any meaningful result.
{code}
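
One way such a check could look, as a rough Java sketch only: the class SourceClusterCheckSketch and the method requireSourceZkCluster are hypothetical and are not part of SyncTable. The sketch assumes the option is passed in the form --sourcezkcluster=<value> and simply rejects an argument list that lacks it, rather than silently comparing the local cluster with itself.

```java
/** Hypothetical argument check; not the SyncTable implementation. */
final class SourceClusterCheckSketch {
    static final String SOURCE_ZK_FLAG = "--sourcezkcluster=";

    /** Return the source ZK quorum from the arguments, or fail if it was not given. */
    static String requireSourceZkCluster(String[] args) {
        for (String arg : args) {
            if (arg.startsWith(SOURCE_ZK_FLAG)) {
                String value = arg.substring(SOURCE_ZK_FLAG.length());
                if (!value.isEmpty()) {
                    return value;
                }
            }
        }
        throw new IllegalArgumentException(
            "sourcezkcluster is not set; source and target would both be the local cluster");
    }

    public static void main(String[] args) {
        String[] withFlag = {"--sourcezkcluster=zk1,zk2,zk3:2181:/hbase", "/hashes", "src", "dst"};
        System.out.println("source quorum: " + requireSourceZkCluster(withFlag));

        String[] withoutFlag = {"/hashes", "src", "dst"};
        try {
            requireSourceZkCluster(withoutFlag);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether a missing option should be a hard error or only a warning is a design choice for the follow-on; the sketch takes the stricter option purely for illustration.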


> Add guidance on HashTable/SyncTable to the RefGuide
> ---
>
> Key: HBASE-15557
> URL: https://issues.apache.org/jira/browse/HBASE-15557
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.2.0
>Reporter: Sean Busbey
>Assignee: Wellington Chevreuil
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-15557.master.001.patch, 
> HBASE-15557.master.002.patch
>
>
> The docs for SyncTable are insufficient. Brief description from [~davelatham] 
> HBASE-13639 comment:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for 
> bringing it up. I'll try to provide a better explanation. You may have 
> already seen it, but if not, the design doc linked in the description above 
> may also give you some better clues as to how it should be used.
> Briefly, the feature is intended to start with a pair of tables in remote 
> clusters that are already substantially similar and make them identical by 
> comparing hashes of the data and copying only the diffs instead of having to 
> copy the entire table. So it is targeted at a very specific use case (with 
> some work it could generalize to cover things like CopyTable and 
> VerifyReplication but it's not there yet). To use it, you choose one table to 
> be the "source", and the other table is the "target". After the process is 
> complete the target table should end up being identical to the source table.
> In the source table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the 
> source table and an output directory in HDFS. HashTable will scan the source 
> table, break the data up into row key ranges (default of 8kB per range) and 
> produce a hash of the data for each range.
> Make the hashes available to the target cluster - I'd recommend using DistCp 
> to copy it across.
> In the target table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.SyncTable and pass it the directory where 
> you put the hashes, and the names of the source and destination tables. You 
> will likely also need to specify the source table's ZK quorum via the 
> --sourcezkcluster option. SyncTable will then read the hash information, and 
> compute the hashes of the same row ranges for the target table. For any row 
> range where the hash fails to match, it will open a remote scanner to the 
> source table, read the data for that range, and do Puts and Deletes to the 
> target table to update it to match the source.
> I hope that clarifies it a bit. Let me know if you need a hand. If anyone 
> wants to work on getting some documentation into the book, I can try to write 
> some more but would love a hand on turning it into an actual book patch.
> {quote}




