[jira] [Resolved] (HBASE-24618) Backport HBASE-21204 (NPE when scan raw DELETE_FAMILY_VERSION and codec is not set) to branch-1

2020-06-24 Thread Abhishek Singh Chouhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan resolved HBASE-24618.

Resolution: Fixed

> Backport HBASE-21204 (NPE when scan raw DELETE_FAMILY_VERSION and codec is 
> not set) to branch-1
> ---
>
> Key: HBASE-24618
> URL: https://issues.apache.org/jira/browse/HBASE-24618
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24618) Backport HBASE-21204 (NPE when scan raw DELETE_FAMILY_VERSION and codec is not set) to branch-1

2020-06-24 Thread Abhishek Singh Chouhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-24618:
---
 Hadoop Flags: Reviewed
Affects Version/s: 1.6.0
   Issue Type: Bug  (was: Improvement)

> Backport HBASE-21204 (NPE when scan raw DELETE_FAMILY_VERSION and codec is 
> not set) to branch-1
> ---
>
> Key: HBASE-24618
> URL: https://issues.apache.org/jira/browse/HBASE-24618
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.7.0
>
>






[jira] [Updated] (HBASE-24618) Backport HBASE-21204 to branch-1

2020-06-22 Thread Abhishek Singh Chouhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-24618:
---
Fix Version/s: 1.7.0

> Backport HBASE-21204 to branch-1
> 
>
> Key: HBASE-24618
> URL: https://issues.apache.org/jira/browse/HBASE-24618
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.7.0
>
>






[jira] [Created] (HBASE-24618) Backport HBASE-21204 to branch-1

2020-06-22 Thread Abhishek Singh Chouhan (Jira)
Abhishek Singh Chouhan created HBASE-24618:
--

 Summary: Backport HBASE-21204 to branch-1
 Key: HBASE-24618
 URL: https://issues.apache.org/jira/browse/HBASE-24618
 Project: HBase
  Issue Type: Improvement
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan








[jira] [Commented] (HBASE-24018) Access check for getTableDescriptors is too restrictive

2020-03-18 Thread Abhishek Singh Chouhan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062122#comment-17062122
 ] 

Abhishek Singh Chouhan commented on HBASE-24018:


[~apurtell] [~larsh] Thoughts?

> Access check for getTableDescriptors is too restrictive
> ---
>
> Key: HBASE-24018
> URL: https://issues.apache.org/jira/browse/HBASE-24018
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Priority: Major
>
> Currently getTableDescriptor requires a user to have Admin or Create 
> permissions. A client might need to get table descriptors to act accordingly, 
> e.g. based on an attribute set or a CP loaded. It should not be necessary for 
> the client to have Create or Admin privileges just to read the descriptor; 
> should Execute and/or Read permission be sufficient?





[jira] [Created] (HBASE-24018) Access check for getTableDescriptors is too restrictive

2020-03-18 Thread Abhishek Singh Chouhan (Jira)
Abhishek Singh Chouhan created HBASE-24018:
--

 Summary: Access check for getTableDescriptors is too restrictive
 Key: HBASE-24018
 URL: https://issues.apache.org/jira/browse/HBASE-24018
 Project: HBase
  Issue Type: Improvement
Reporter: Abhishek Singh Chouhan


Currently getTableDescriptor requires a user to have Admin or Create 
permissions. A client might need to get table descriptors to act accordingly, 
e.g. based on an attribute set or a CP loaded. It should not be necessary for 
the client to have Create or Admin privileges just to read the descriptor; 
should Execute and/or Read permission be sufficient?





[jira] [Resolved] (HBASE-23825) Increment proto conversion is broken

2020-02-11 Thread Abhishek Singh Chouhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan resolved HBASE-23825.

Hadoop Flags: Reviewed
  Resolution: Fixed

> Increment proto conversion is broken
> 
>
> Key: HBASE-23825
> URL: https://issues.apache.org/jira/browse/HBASE-23825
> Project: HBase
>  Issue Type: Bug
>  Components: Increment
>Affects Versions: 1.4.0, 1.2.6, 1.3.2, 1.4.1, 1.5.0, 1.1.11, 1.3.3, 1.4.2, 
> 1.4.3, 1.4.4, 1.4.5, 1.3.2.1, 1.4.6, 1.4.8, 1.4.7, 1.4.9, 1.4.10, 1.3.4, 
> 1.3.5, 1.3.6, 1.4.11, 1.4.12
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.6.0, 1.3.7, 1.4.13
>
>
> While converting the request back to an Increment using 
> ProtobufUtil.toIncrement, we incorrectly apply the optimization that avoids 
> copying the byte array (HBaseZeroCopyByteString#zeroCopyGetBytes) to a 
> BoundedByteString. The optimization was only meant for LiteralByteString, 
> where it is safe to use the backing byte array; however, it ends up being 
> applied to BoundedByteString, which is a subclass of LiteralByteString. This 
> essentially breaks increments, since we end up creating wrong cells on the 
> server side.





[jira] [Commented] (HBASE-23825) Increment proto conversion is broken

2020-02-11 Thread Abhishek Singh Chouhan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034960#comment-17034960
 ] 

Abhishek Singh Chouhan commented on HBASE-23825:


I've pushed to branch-1, 1.4, 1.3. Thanks for the reviews [~apurtell] 
[~anoop.hbase] [~sakthi]

> Increment proto conversion is broken
> 
>
> Key: HBASE-23825
> URL: https://issues.apache.org/jira/browse/HBASE-23825
> Project: HBase
>  Issue Type: Bug
>  Components: Increment
>Affects Versions: 1.4.0, 1.2.6, 1.3.2, 1.4.1, 1.5.0, 1.1.11, 1.3.3, 1.4.2, 
> 1.4.3, 1.4.4, 1.4.5, 1.3.2.1, 1.4.6, 1.4.8, 1.4.7, 1.4.9, 1.4.10, 1.3.4, 
> 1.3.5, 1.3.6, 1.4.11, 1.4.12
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Blocker
> Fix For: 1.6.0, 1.3.7, 1.4.13
>
>





[jira] [Comment Edited] (HBASE-23825) Increment proto conversion is broken

2020-02-10 Thread Abhishek Singh Chouhan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033988#comment-17033988
 ] 

Abhishek Singh Chouhan edited comment on HBASE-23825 at 2/10/20 10:16 PM:
--

This is not a problem in master and 2.x since we reverted HBASE-18026 from 
those branches. FYI [~andrew.purt...@gmail.com]


was (Author: abhishek.chouhan):
This is not a problem in master and 2.x since we reverted HBASE-18026. FYI 
[~andrew.purt...@gmail.com]

> Increment proto conversion is broken
> 
>
> Key: HBASE-23825
> URL: https://issues.apache.org/jira/browse/HBASE-23825
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.3.6, 1.4.12
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
>





[jira] [Commented] (HBASE-23825) Increment proto conversion is broken

2020-02-10 Thread Abhishek Singh Chouhan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033988#comment-17033988
 ] 

Abhishek Singh Chouhan commented on HBASE-23825:


This is not a problem in master and 2.x since we reverted HBASE-18026. FYI 
[~andrew.purt...@gmail.com]

> Increment proto conversion is broken
> 
>
> Key: HBASE-23825
> URL: https://issues.apache.org/jira/browse/HBASE-23825
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.3.6, 1.4.12
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
>





[jira] [Created] (HBASE-23825) Increment proto conversion is broken

2020-02-10 Thread Abhishek Singh Chouhan (Jira)
Abhishek Singh Chouhan created HBASE-23825:
--

 Summary: Increment proto conversion is broken
 Key: HBASE-23825
 URL: https://issues.apache.org/jira/browse/HBASE-23825
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.4.12, 1.3.6, 1.5.0
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan


While converting the request back to an Increment using 
ProtobufUtil.toIncrement, we incorrectly apply the optimization that avoids 
copying the byte array (HBaseZeroCopyByteString#zeroCopyGetBytes) to a 
BoundedByteString. The optimization was only meant for LiteralByteString, 
where it is safe to use the backing byte array; however, it ends up being 
applied to BoundedByteString, which is a subclass of LiteralByteString. This 
essentially breaks increments, since we end up creating wrong cells on the 
server side.
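
To make the failure mode concrete, here is a minimal sketch. It does not use the real protobuf classes: LiteralBytes and BoundedBytes are hypothetical, simplified stand-ins for LiteralByteString and BoundedByteString, and guardedZeroCopy shows only one possible guard, not necessarily the fix that was committed.

```java
import java.util.Arrays;

public class ZeroCopySketch {
    // Hypothetical stand-in for a byte string that owns its whole backing array.
    static class LiteralBytes {
        protected final byte[] bytes;
        LiteralBytes(byte[] bytes) { this.bytes = bytes; }
        int offset() { return 0; }
        int size() { return bytes.length; }
        // Safe path: copies exactly the bytes this instance represents.
        byte[] toByteArray() {
            return Arrays.copyOfRange(bytes, offset(), offset() + size());
        }
    }

    // Hypothetical stand-in for a bounded view over a slice of a larger array.
    static class BoundedBytes extends LiteralBytes {
        private final int offset;
        private final int length;
        BoundedBytes(byte[] bytes, int offset, int length) {
            super(bytes);
            this.offset = offset;
            this.length = length;
        }
        @Override int offset() { return offset; }
        @Override int size() { return length; }
    }

    // Unsafe: returns the backing array for any LiteralBytes, subclasses included,
    // so a bounded view leaks bytes outside its offset/length window.
    static byte[] unsafeZeroCopy(LiteralBytes b) {
        return b.bytes;
    }

    // Guarded: zero-copy only for an exact LiteralBytes, otherwise fall back to a copy.
    static byte[] guardedZeroCopy(LiteralBytes b) {
        return b.getClass() == LiteralBytes.class ? b.bytes : b.toByteArray();
    }

    public static void main(String[] args) {
        byte[] buffer = {10, 20, 30, 40, 50};
        LiteralBytes slice = new BoundedBytes(buffer, 1, 2); // represents {20, 30}
        System.out.println(Arrays.toString(unsafeZeroCopy(slice)));  // [10, 20, 30, 40, 50]
        System.out.println(Arrays.toString(guardedZeroCopy(slice))); // [20, 30]
    }
}
```

The unsafe variant returns all five bytes for a two-byte slice, which is the kind of mismatch that would produce wrong cells when the result is used as a qualifier or value.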





[jira] [Commented] (HBASE-18127) Enable state to be passed between the region observer coprocessor hook calls

2019-07-02 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877362#comment-16877362
 ] 

Abhishek Singh Chouhan commented on HBASE-18127:


[~gjacoby] The place I left this at was that, although it is possible to come 
up with an Executor implementation that takes care of transferring the RpcCall 
between the caller thread and the one that will take up further execution, 
going that route would mean changing all existing such cases and replacing the 
executor with our own. Places that simply create another thread and call 
further hooks from there would need another kind of handling, or would have to 
be replaced with a custom executor-based mechanism. We also need to consider 
that further development on the code base would have to use only these custom 
Executors, which transfer the necessary thread-locals between threads, at least 
when interacting with CP hooks. I'm not sure how feasible and error-free this 
would be (missing this would result in the state vanishing between some 
hooks :)).
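
For illustration, the kind of Executor wrapper discussed above could look roughly like the sketch below. This is not code from any of the attached patches: ContextPropagatingExecutor is a hypothetical name, and CURRENT_CALL is a plain ThreadLocal<String> standing in for the RpcCall.

```java
import java.util.concurrent.Executor;

public class ContextPropagatingExecutor implements Executor {
    // Hypothetical per-call state; stands in for the RpcCall thread-local.
    static final ThreadLocal<String> CURRENT_CALL = new ThreadLocal<>();

    private final Executor delegate;

    public ContextPropagatingExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        // Capture the submitting thread's state at submission time.
        final String captured = CURRENT_CALL.get();
        delegate.execute(() -> {
            String previous = CURRENT_CALL.get();
            CURRENT_CALL.set(captured);
            try {
                task.run();
            } finally {
                // Restore whatever the worker thread had before the task ran.
                if (previous == null) {
                    CURRENT_CALL.remove();
                } else {
                    CURRENT_CALL.set(previous);
                }
            }
        });
    }

    public static void main(String[] args) {
        // Delegate that runs each task on a fresh thread and waits for it.
        Executor threadPerTask = r -> {
            Thread t = new Thread(r);
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        CURRENT_CALL.set("call-42");
        threadPerTask.execute(() -> System.out.println("plain:   " + CURRENT_CALL.get()));
        new ContextPropagatingExecutor(threadPerTask)
            .execute(() -> System.out.println("wrapped: " + CURRENT_CALL.get()));
    }
}
```

The plain executor sees no state on the worker thread, which is the "state vanishing" case; any code path that creates its own Thread or uses an unwrapped executor bypasses the propagation entirely.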

> Enable state to be passed between the region observer coprocessor hook calls
> 
>
> Key: HBASE-18127
> URL: https://issues.apache.org/jira/browse/HBASE-18127
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Attachments: HBASE-18127.master.001.patch, 
> HBASE-18127.master.002.patch, HBASE-18127.master.002.patch, 
> HBASE-18127.master.003.patch, HBASE-18127.master.004.patch, 
> HBASE-18127.master.005.patch, HBASE-18127.master.005.patch, 
> HBASE-18127.master.006.patch
>
>
> Allow regionobserver to optionally skip postPut/postDelete when 
> postBatchMutate was called.
> Right now a RegionObserver can only statically implement one or the other. In 
> scenarios where we need to work sometimes on the single postPut and 
> postDelete hooks and sometimes on the batchMutate hooks, there is currently 
> no place to convey this information to the single hooks. I.e. the work has 
> been done in the batch, skip the single hooks.
> There are various solutions:
> 1. Allow some state to be passed _per operation_.
> 2. Remove the single hooks and always only call batch hooks (with a default 
> wrapper for the single hooks).
> 3. more?
> [~apurtell], what we had discussed a few days back.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22617) Recovered WAL directories not getting cleaned up

2019-06-24 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871580#comment-16871580
 ] 

Abhishek Singh Chouhan commented on HBASE-22617:


Was out for the weekend. Thanks for taking this up [~Apache9] [~apurtell]. 

> Recovered WAL directories not getting cleaned up
> 
>
> Key: HBASE-22617
> URL: https://issues.apache.org/jira/browse/HBASE-22617
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 1.5.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 3.0.0, 1.5.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6, 1.4.11
>
>
> While colocating the recovered edits directory with hbase.wal.dir, 
> BASE_NAMESPACE_DIR got missed. This results in recovered edits being put in a 
> separate directory rather than the default region directory, even if 
> hbase.wal.dir is not overridden. E.g., if data is stored in 
> /hbase/data/namespace/table1, recovered edits are put in 
> /hbase/namespace/table1. This also breaks the regular cleaner chores, which 
> never operate on this new directory, so these directories are never deleted, 
> even for split parents or dropped tables. We should change the default back 
> to include the base namespace directory in the path.





[jira] [Created] (HBASE-22617) Recovered WAL directories not getting cleaned up

2019-06-21 Thread Abhishek Singh Chouhan (JIRA)
Abhishek Singh Chouhan created HBASE-22617:
--

 Summary: Recovered WAL directories not getting cleaned up
 Key: HBASE-22617
 URL: https://issues.apache.org/jira/browse/HBASE-22617
 Project: HBase
  Issue Type: Task
Affects Versions: 1.5.0
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan


While colocating the recovered edits directory with hbase.wal.dir, 
BASE_NAMESPACE_DIR got missed. This results in recovered edits being put in a 
separate directory rather than the default region directory, even if 
hbase.wal.dir is not overridden. E.g., if data is stored in 
/hbase/data/namespace/table1, recovered edits are put in 
/hbase/namespace/table1. This also breaks the regular cleaner chores, which 
never operate on this new directory, so these directories are never deleted, 
even for split parents or dropped tables. We should change the default back 
to include the base namespace directory in the path.
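
The path difference described above can be sketched with plain string joins. This is an illustration only and does not reproduce the actual HBase path-building code; the class and method names are hypothetical, and the "data" value is taken from the example paths in the description.

```java
public class RecoveredEditsPathSketch {
    // The path component that got missed; "data" matches the example above.
    static final String BASE_NAMESPACE_DIR = "data";

    // Path as described in the bug: the base namespace component is skipped.
    static String withoutBaseNamespaceDir(String walRoot, String namespace, String table) {
        return String.join("/", walRoot, namespace, table);
    }

    // Path with the base namespace component restored.
    static String withBaseNamespaceDir(String walRoot, String namespace, String table) {
        return String.join("/", walRoot, BASE_NAMESPACE_DIR, namespace, table);
    }

    public static void main(String[] args) {
        System.out.println(withoutBaseNamespaceDir("/hbase", "namespace", "table1")); // /hbase/namespace/table1
        System.out.println(withBaseNamespaceDir("/hbase", "namespace", "table1"));    // /hbase/data/namespace/table1
    }
}
```

A cleaner that only walks the second form never visits directories created under the first, which is why they accumulate.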





[jira] [Updated] (HBASE-22617) Recovered WAL directories not getting cleaned up

2019-06-21 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22617:
---
Issue Type: Bug  (was: Task)

> Recovered WAL directories not getting cleaned up
> 
>
> Key: HBASE-22617
> URL: https://issues.apache.org/jira/browse/HBASE-22617
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
>





[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-09 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22330:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0, 1.3.5, 1.4.11
>
> Attachments: HBASE-22330-addendum.branch-1.patch, 
> HBASE-22330.branch-1.001.patch, HBASE-22330.branch-1.002.patch, 
> HBASE-22330.branch-1.3.001.patch
>
>
> There appears to be a race condition between close and split which, when 
> combined with a side effect of HBASE-20704, leads to the parent region's store 
> files getting archived and cleared while daughter regions still have 
> references to those parent region store files.
> Here is the timeline of events observed for an affected region:
>  # RS1 faces a ZooKeeper connectivity issue for the master node and starts 
> shutting itself down. As part of this, it starts to close the store and clean 
> up the compacted files (File A).
>  # Master starts bulk assigning regions and assigns the parent region to RS2.
>  # The region opens on RS2 and ends up opening compacted store file(s) (we 
> suspect this is due to HBASE-20724).
>  # Now a split happens, and the daughter regions open on RS2 and try to run a 
> compaction as part of post open.
>  # The split request at this point is complete. However, archiving now 
> proceeds on RS1 and ends up archiving the store file that is referenced by the 
> daughter. Compaction fails due to FileNotFoundException, and all subsequent 
> attempts to open the region will fail until manual resolution.
> We think having HBASE-20724 would help in such situations since we won't end 
> up loading compacted store files in the first place.





[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-09 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836764#comment-16836764
 ] 

Abhishek Singh Chouhan commented on HBASE-22330:


Pushed to relevant branches. Thanks [~xucang] [~apurtell]

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0, 1.3.5, 1.4.11
>
> Attachments: HBASE-22330-addendum.branch-1.patch, 
> HBASE-22330.branch-1.001.patch, HBASE-22330.branch-1.002.patch, 
> HBASE-22330.branch-1.3.001.patch
>
>





[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-09 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22330:
---
Attachment: HBASE-22330-addendum.branch-1.patch

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0, 1.3.5, 1.4.11
>
> Attachments: HBASE-22330-addendum.branch-1.patch, 
> HBASE-22330.branch-1.001.patch, HBASE-22330.branch-1.002.patch, 
> HBASE-22330.branch-1.3.001.patch
>
>





[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-09 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22330:
---
Hadoop Flags:   (was: Reviewed)
  Status: Patch Available  (was: Reopened)

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.3.4, 1.4.9, 1.5.0
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0, 1.3.5, 1.4.11
>
> Attachments: HBASE-22330-addendum.branch-1.patch, 
> HBASE-22330.branch-1.001.patch, HBASE-22330.branch-1.002.patch, 
> HBASE-22330.branch-1.3.001.patch
>
>





[jira] [Reopened] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-09 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan reopened HBASE-22330:


Hardening the test

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0, 1.3.5, 1.4.11
>
> Attachments: HBASE-22330.branch-1.001.patch, 
> HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-09 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836633#comment-16836633
 ] 

Abhishek Singh Chouhan commented on HBASE-22330:


1 in 10 runs fails for me locally. This looks to be a test-only issue due to 
differences in what testUtil.waitTableAvailable(..) does between master and 
branch-1. Let me put up an addendum for the test that removes the flakiness.

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0, 1.3.5, 1.4.11
>
> Attachments: HBASE-22330.branch-1.001.patch, 
> HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-08 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835966#comment-16835966
 ] 

Abhishek Singh Chouhan commented on HBASE-22330:


Pushed to branch-1.3/1.4/1. Thanks a ton for reviewing [~apurtell]

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0, 1.3.5, 1.4.11
>
> Attachments: HBASE-22330.branch-1.001.patch, 
> HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch
>
>


[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-08 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22330:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0, 1.3.5, 1.4.11
>
> Attachments: HBASE-22330.branch-1.001.patch, 
> HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch
>
>


[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-08 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835836#comment-16835836
 ] 

Abhishek Singh Chouhan commented on HBASE-22330:


Tests came up fine for 1.3. Planning to commit later today if there are no objections. 
The 1.3 patch differs only in the test class due to API differences. [~apurtell]

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0, 1.3.5, 1.4.11
>
> Attachments: HBASE-22330.branch-1.001.patch, 
> HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch
>
>


[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-07 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835149#comment-16835149
 ] 

Abhishek Singh Chouhan commented on HBASE-22330:


The patch for branch-1 also applies to branch-1.4. However, 1.3 required a slight 
modification in the test file due to WAL API changes, so I have added a separate patch for 
branch-1.3.

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-22330.branch-1.001.patch, 
> HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch
>
>


[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-07 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22330:
---
Attachment: HBASE-22330.branch-1.3.001.patch

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-22330.branch-1.001.patch, 
> HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch
>
>


[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-06 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834296#comment-16834296
 ] 

Abhishek Singh Chouhan commented on HBASE-22330:


Thanks for having a look [~apurtell]. Attached v2, which fixes the checkstyle issues. 
Tests passed locally for me; I am running them again locally to be doubly sure.

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-22330.branch-1.001.patch, 
> HBASE-22330.branch-1.002.patch
>
>


[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-06 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22330:
---
Attachment: HBASE-22330.branch-1.002.patch

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-22330.branch-1.001.patch, 
> HBASE-22330.branch-1.002.patch
>
>


[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-06 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22330:
---
Status: Patch Available  (was: Open)

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.3.4, 1.4.9, 1.5.0
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-22330.branch-1.001.patch
>
>


[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-06 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22330:
---
Attachment: HBASE-22330.branch-1.001.patch

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-22330.branch-1.001.patch
>
>


[jira] [Assigned] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1

2019-05-06 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan reassigned HBASE-22330:
--

Assignee: Abhishek Singh Chouhan

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -
>
> Key: HBASE-22330
> URL: https://issues.apache.org/jira/browse/HBASE-22330
> Project: HBase
>  Issue Type: Sub-task
>  Components: Compaction, regionserver
>Affects Versions: 1.5.0, 1.4.9, 1.3.4
>Reporter: Andrew Purtell
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 1.5.0
>
>


[jira] [Commented] (HBASE-22274) Cell size limit check on append should consider cell's previous size.

2019-04-24 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825420#comment-16825420
 ] 

Abhishek Singh Chouhan commented on HBASE-22274:


Lgtm +1. Thanks [~xucang]

> Cell size limit check on append should consider cell's previous size.
> -
>
> Key: HBASE-22274
> URL: https://issues.apache.org/jira/browse/HBASE-22274
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.0.0, 1.3.5
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Minor
> Attachments: HBASE-22274-branch-1.001.patch, 
> HBASE-22274-branch-1.002.patch, HBASE-22274-master.001.patch, 
> HBASE-22274-master.002.patch, HBASE-22274-master.002.patch, 
> HBASE-22274-master.003.patch
>
>
> A cell size limit check already exists, based on the parameter 
> *hbase.server.keyvalue.maxsize*. 
> One case was missed: on an append, only the size of the append operation's cell 
> is checked against this limit. We should instead check the potential 
> final cell size after the append.
> This is easy to reproduce:
>  
> Apply this diff:
>  
> {code:java}
> diff --git a/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java b/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
> index 5a285ef6ba..8633177ebe 100644
> --- a/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
> +++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
> @@ -6455,7 +6455,7
> - t.append(new Append(ROW).addColumn(FAMILY, QUALIFIER, new byte[10 * 1024]));
> + t.append(new Append(ROW).addColumn(FAMILY, QUALIFIER, new byte[2 * 1024]));
> {code}
>  
> The fix is to add this check in #reckonDeltas in the HRegion class, where we 
> already have the appended cell's size. 
> A DoNotRetryIOException is thrown if the check fails.
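
The check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the HRegion code: the class AppendSizeCheck and its method names are hypothetical, sizes are passed in as plain byte counts, a non-positive limit is assumed to disable the check, and a plain IOException stands in for DoNotRetryIOException so that the example has no HBase dependency.

```java
import java.io.IOException;

/** Hypothetical helper; not the actual HRegion#reckonDeltas code. */
final class AppendSizeCheck {

  /**
   * Returns true when the cell that would exist after the append is larger than the limit.
   * A limit of zero or less is treated as "no limit" (an assumption of this sketch).
   */
  static boolean exceedsLimit(long existingSize, long appendedSize, long maxCellSize) {
    long finalSize = existingSize + appendedSize;
    return maxCellSize > 0 && finalSize > maxCellSize;
  }

  /** Rejects the append when the final cell size, not just the delta, is over the limit. */
  static void checkAppend(long existingSize, long appendedSize, long maxCellSize)
      throws IOException {
    if (exceedsLimit(existingSize, appendedSize, maxCellSize)) {
      // The issue says DoNotRetryIOException; IOException is used here as a stand-in.
      throw new IOException("Cell of size " + (existingSize + appendedSize)
          + " exceeds limit of " + maxCellSize + " bytes");
    }
  }
}
```

The point of the sketch is the sum: an append of 2 KB onto an existing 9 KB cell is within a 10 KB limit when only the delta is checked, but exceeds it once the existing size is included.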





[jira] [Commented] (HBASE-22067) Fix log line in StochasticLoadBalancer when balancer is an ill-fit for cluster size

2019-03-19 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796808#comment-16796808
 ] 

Abhishek Singh Chouhan commented on HBASE-22067:


+1

> Fix log line in StochasticLoadBalancer when balancer is an ill-fit for 
> cluster size
> ---
>
> Key: HBASE-22067
> URL: https://issues.apache.org/jira/browse/HBASE-22067
> Project: HBase
>  Issue Type: Bug
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-22067.master.001.patch
>
>
> HBASE-21338 added log lines for load balancer warnings. One of those log 
> lines uses the wrong parameter:
> 'maxRunningTime' is used where it should be 'maxSteps'.





[jira] [Resolved] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-18 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan resolved HBASE-22045.

  Resolution: Fixed
Hadoop Flags: Reviewed

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1.5.0, 2.0.0, 1.4.9, 2.1.3, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0, 2.1.5, 2.2.1
>
> Attachments: HBASE-22045.master.001.patch
>
>
> During the snapshot, MutableRangeHistogram calculates the outliers (e.g. 
> mutate_TimeRange_60-inf) and adds the counter with an incorrect value: 
> it uses the overall count of events rather than the number of events in the 
> snapshot.
> {code:java}
> long val = histogram.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>       Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] + "-inf", desc),
>       val - cumNum);
> }
> {code}
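
To make the arithmetic concrete, here is a small sketch of per-range counting where the outlier bucket is derived from the number of events in the snapshot. It is not the MutableRangeHistogram source: the class RangeCounts, its method, and the choice of inclusive upper bounds are assumptions made for illustration.

```java
/** Hypothetical stand-in for the range/outlier arithmetic; not HBase code. */
final class RangeCounts {

  /**
   * Counts snapshot values per range. counts[i] holds the values that fall at or below
   * bounds[i] and above the previous bound; the last slot holds the outliers.
   * The bounds are assumed to be sorted in ascending order.
   */
  static long[] count(long[] snapshotValues, long[] bounds) {
    long[] counts = new long[bounds.length + 1];
    long cumNum = 0; // events attributed to a bounded range so far
    for (long v : snapshotValues) {
      for (int i = 0; i < bounds.length; i++) {
        if (v <= bounds[i]) {
          counts[i]++;
          cumNum++;
          break;
        }
      }
    }
    // Outliers come from the snapshot's own event count, not a lifetime total.
    counts[bounds.length] = snapshotValues.length - cumNum;
    return counts;
  }
}
```

If the histogram had seen 1000 events over its lifetime but the snapshot held only 5, of which 3 fell into bounded ranges, subtracting from the lifetime total would report 997 outliers, while the snapshot-based count reports 2.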





[jira] [Updated] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-18 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22045:
---
Affects Version/s: (was: 1.3.3)
Fix Version/s: (was: 1.3.4)

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1.5.0, 2.0.0, 1.4.9, 2.1.3, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0, 2.1.5, 2.2.1
>
> Attachments: HBASE-22045.master.001.patch
>
>


[jira] [Commented] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-18 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795277#comment-16795277
 ] 

Abhishek Singh Chouhan commented on HBASE-22045:


Do we also want this in branch-2.0 or is that EOLing?

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.4.10, 1.3.4, 2.3.0, 2.1.5, 2.2.1
>
> Attachments: HBASE-22045.master.001.patch
>
>


[jira] [Comment Edited] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-18 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795242#comment-16795242
 ] 

Abhishek Singh Chouhan edited comment on HBASE-22045 at 3/18/19 5:33 PM:
-

Sorry for breaking the build [~apurtell]. Looks like we don't need this in 
branch-1.3 since it does not have HBASE-18060/HBASE-9774 which caused the bug. 
Got a bit mixed up with our light forks.


was (Author: abhishek.chouhan):
Sorry for breaking the build [~apurtell]. Looks like we don't need this in 
branch-1.3 since it does not have HBASE-18060/HBASE-9774 which caused the bug. 

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.4.10, 1.3.4, 2.3.0, 2.1.5, 2.2.1
>
> Attachments: HBASE-22045.master.001.patch
>
>


[jira] [Commented] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-18 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795242#comment-16795242
 ] 

Abhishek Singh Chouhan commented on HBASE-22045:


Sorry for breaking the build [~apurtell]. Looks like we don't need this in 
branch-1.3 since it does not have HBASE-18060/HBASE-9774 which caused the bug. 

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.4.10, 1.3.4, 2.3.0, 2.1.5, 2.2.1
>
> Attachments: HBASE-22045.master.001.patch
>
>
> During a snapshot, MutableRangeHistogram calculates the outliers (e.g.
> mutate_TimeRange_60-inf) and adds the counter using an incorrect value: it
> uses the overall event count rather than the number of events in the
> snapshot.
> {code:java}
> // getCount() is the overall event count, not the count for this snapshot
> long val = histogram.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>       Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] + "-inf", desc),
>       val - cumNum);
> }
> {code}





[jira] [Updated] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-12 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22045:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 3.0.0, 1.4.10, 1.3.4, 2.3.0, 1.5.1, 2.1.5, 2.2.1
>
> Attachments: HBASE-22045.master.001.patch
>
>
> During a snapshot, MutableRangeHistogram calculates the outliers (e.g.
> mutate_TimeRange_60-inf) and adds the counter using an incorrect value: it
> uses the overall event count rather than the number of events in the
> snapshot.
> {code:java}
> // getCount() is the overall event count, not the count for this snapshot
> long val = histogram.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>       Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] + "-inf", desc),
>       val - cumNum);
> }
> {code}





[jira] [Updated] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-12 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22045:
---
Fix Version/s: 2.2.1
   2.1.5
   1.5.1
   2.3.0
   1.3.4
   1.4.10
   3.0.0

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 3.0.0, 1.4.10, 1.3.4, 2.3.0, 1.5.1, 2.1.5, 2.2.1
>
> Attachments: HBASE-22045.master.001.patch
>
>
> During a snapshot, MutableRangeHistogram calculates the outliers (e.g.
> mutate_TimeRange_60-inf) and adds the counter using an incorrect value: it
> uses the overall event count rather than the number of events in the
> snapshot.
> {code:java}
> // getCount() is the overall event count, not the count for this snapshot
> long val = histogram.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>       Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] + "-inf", desc),
>       val - cumNum);
> }
> {code}





[jira] [Commented] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-12 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791176#comment-16791176
 ] 

Abhishek Singh Chouhan commented on HBASE-22045:


Thanks for the review [~apurtell]! Pushed to 1.3+

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Fix For: 3.0.0, 1.4.10, 1.3.4, 2.3.0, 1.5.1, 2.1.5, 2.2.1
>
> Attachments: HBASE-22045.master.001.patch
>
>
> During a snapshot, MutableRangeHistogram calculates the outliers (e.g.
> mutate_TimeRange_60-inf) and adds the counter using an incorrect value: it
> uses the overall event count rather than the number of events in the
> snapshot.
> {code:java}
> // getCount() is the overall event count, not the count for this snapshot
> long val = histogram.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>       Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] + "-inf", desc),
>       val - cumNum);
> }
> {code}





[jira] [Created] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-12 Thread Abhishek Singh Chouhan (JIRA)
Abhishek Singh Chouhan created HBASE-22045:
--

 Summary: Mutable range histogram reports incorrect outliers
 Key: HBASE-22045
 URL: https://issues.apache.org/jira/browse/HBASE-22045
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.1.3, 1.4.9, 2.0.0, 1.3.3, 1.5.0, 3.0.0, 2.2.1
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan


During a snapshot, MutableRangeHistogram calculates the outliers (e.g.
mutate_TimeRange_60-inf) and adds the counter using an incorrect value: it
uses the overall event count rather than the number of events in the snapshot.
{code:java}
// getCount() is the overall event count, not the count for this snapshot
long val = histogram.getCount();
if (val - cumNum > 0) {
  metricsRecordBuilder.addCounter(
      Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] + "-inf", desc),
      val - cumNum);
}
{code}
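
To make the reported miscalculation concrete, here is a minimal Java sketch of the arithmetic only. It is not the HBase code or the attached patch: the class RangeOutlierSketch, its methods, the range bounds, and the sample numbers are all hypothetical, and it assumes that cumNum is the number of snapshot events that fell into the bounded ranges.

```java
/** Hypothetical illustration only; not HBase's MutableRangeHistogram. */
class RangeOutlierSketch {
  // Assumed upper bounds of the bounded ranges; values above the last one are outliers.
  static final long[] RANGES = {1, 3, 10, 30, 60};

  /** Assumption: cumNum is the number of snapshot values at or below the last bound. */
  static long countInRanges(long[] snapshotValues) {
    long last = RANGES[RANGES.length - 1];
    long cumNum = 0;
    for (long v : snapshotValues) {
      if (v <= last) {
        cumNum++;
      }
    }
    return cumNum;
  }

  /** Outliers are whatever remains of the given event count after the bounded ranges. */
  static long outliers(long eventCount, long cumNum) {
    return Math.max(0, eventCount - cumNum);
  }

  public static void main(String[] args) {
    long overallCount = 1000;           // events recorded since the histogram was created
    long[] snapshot = {2, 5, 70, 120};  // events in the current snapshot only
    long cumNum = countInRanges(snapshot);

    // Pattern from the description: overall count minus the snapshot's bounded ranges.
    System.out.println("overall-based outliers:  " + outliers(overallCount, cumNum));
    // Using the snapshot's own event count instead.
    System.out.println("snapshot-based outliers: " + outliers(snapshot.length, cumNum));
  }
}
```

With these made-up numbers the overall-based figure is 998, although only 2 of the 4 snapshot events exceed the last bound.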





[jira] [Updated] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-12 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22045:
---
Attachment: HBASE-22045.master.001.patch

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Attachments: HBASE-22045.master.001.patch
>
>
> During a snapshot, MutableRangeHistogram calculates the outliers (e.g.
> mutate_TimeRange_60-inf) and adds the counter using an incorrect value: it
> uses the overall event count rather than the number of events in the
> snapshot.
> {code:java}
> // getCount() is the overall event count, not the count for this snapshot
> long val = histogram.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>       Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] + "-inf", desc),
>       val - cumNum);
> }
> {code}





[jira] [Updated] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-12 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-22045:
---
Status: Patch Available  (was: Open)

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.3, 1.4.9, 2.0.0, 1.3.3, 1.5.0, 3.0.0, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Attachments: HBASE-22045.master.001.patch
>
>
> During a snapshot, MutableRangeHistogram calculates the outliers (e.g.
> mutate_TimeRange_60-inf) and adds the counter using an incorrect value: it
> uses the overall event count rather than the number of events in the
> snapshot.
> {code:java}
> // getCount() is the overall event count, not the count for this snapshot
> long val = histogram.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>       Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] + "-inf", desc),
>       val - cumNum);
> }
> {code}





[jira] [Commented] (HBASE-22045) Mutable range histogram reports incorrect outliers

2019-03-12 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790850#comment-16790850
 ] 

Abhishek Singh Chouhan commented on HBASE-22045:


[~apurtell] [~lhofhansl]

> Mutable range histogram reports incorrect outliers
> --
>
> Key: HBASE-22045
> URL: https://issues.apache.org/jira/browse/HBASE-22045
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Major
> Attachments: HBASE-22045.master.001.patch
>
>
> During a snapshot, MutableRangeHistogram calculates the outliers (e.g.
> mutate_TimeRange_60-inf) and adds the counter using an incorrect value: it
> uses the overall event count rather than the number of events in the
> snapshot.
> {code:java}
> // getCount() is the overall event count, not the count for this snapshot
> long val = histogram.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>       Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] + "-inf", desc),
>       val - cumNum);
> }
> {code}





[jira] [Updated] (HBASE-12133) Add FastLongHistogram for metric computation

2019-03-11 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-12133:
---
Description: FastLongHistogram is a thread-safe class that estimates the
distribution of data and computes the quantiles. It's useful for computing
aggregated metrics like P99/P95.  (was: _emphasized text_FastLongHistogram is a
thread-safe class that estimates the distribution of data and computes the
quantiles. It's useful for computing aggregated metrics like P99/P95.
)

> Add FastLongHistogram for metric computation
> 
>
> Key: HBASE-12133
> URL: https://issues.apache.org/jira/browse/HBASE-12133
> Project: HBase
>  Issue Type: New Feature
>  Components: metrics
>Affects Versions: 0.98.8
>Reporter: Yi Deng
>Assignee: Yi Deng
>Priority: Minor
>  Labels: histogram, metrics
> Fix For: 0.99.1, 1.3.0
>
> Attachments: 
> 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, 
> 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, 
> 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, 
> 12133.addendum.txt
>
>
> FastLongHistogram is a thread-safe class that estimates the distribution of
> data and computes the quantiles. It's useful for computing aggregated metrics
> like P99/P95.
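
As a rough idea of what a thread-safe, bucket-based quantile estimate can look like, here is a minimal Java sketch. It is not FastLongHistogram itself: the class BucketedHistogramSketch, the fixed value range, the equal-width buckets, and the upper-bound estimate are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical fixed-bucket histogram; not HBase's FastLongHistogram. */
class BucketedHistogramSketch {
  private final long min;
  private final long max;
  private final AtomicLongArray counts;

  BucketedHistogramSketch(long min, long max, int numBuckets) {
    if (max <= min || numBuckets <= 0) {
      throw new IllegalArgumentException("need max > min and numBuckets > 0");
    }
    this.min = min;
    this.max = max;
    this.counts = new AtomicLongArray(numBuckets);
  }

  /** Thread-safe: each bucket is incremented atomically. */
  void add(long value) {
    counts.incrementAndGet(bucketIndex(value));
  }

  // Values outside [min, max) are clamped into the first or last bucket.
  private int bucketIndex(long value) {
    int n = counts.length();
    if (value <= min) {
      return 0;
    }
    if (value >= max) {
      return n - 1;
    }
    return (int) ((value - min) * n / (max - min));
  }

  /**
   * Estimates the q-quantile as the upper bound of the bucket in which the
   * running count reaches q * total. Under concurrent writes the bucket reads
   * are not one atomic snapshot, so the result is only an estimate.
   */
  long quantile(double q) {
    int n = counts.length();
    long[] snapshot = new long[n];
    long total = 0;
    for (int i = 0; i < n; i++) {
      snapshot[i] = counts.get(i);
      total += snapshot[i];
    }
    if (total == 0) {
      return min;
    }
    long target = (long) Math.ceil(q * total);
    long cumulative = 0;
    for (int i = 0; i < n; i++) {
      cumulative += snapshot[i];
      if (cumulative >= target) {
        return min + (i + 1) * (max - min) / n;
      }
    }
    return max;
  }

  public static void main(String[] args) {
    BucketedHistogramSketch h = new BucketedHistogramSketch(0, 100, 10);
    for (long v = 1; v <= 100; v++) {
      h.add(v);
    }
    System.out.println("P50 estimate: " + h.quantile(0.50));
    System.out.println("P99 estimate: " + h.quantile(0.99));
  }
}
```

The estimate is only as fine as the bucket width, which is the usual trade-off for keeping updates cheap.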





[jira] [Updated] (HBASE-12133) Add FastLongHistogram for metric computation

2019-03-11 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-12133:
---
Description: 
_emphasized text_FastLongHistogram is a thread-safe class that estimates the
distribution of data and computes the quantiles. It's useful for computing
aggregated metrics like P99/P95.


  was:
FastLongHistogram is a thread-safe class that estimates the distribution of
data and computes the quantiles. It's useful for computing aggregated metrics
like P99/P95.



> Add FastLongHistogram for metric computation
> 
>
> Key: HBASE-12133
> URL: https://issues.apache.org/jira/browse/HBASE-12133
> Project: HBase
>  Issue Type: New Feature
>  Components: metrics
>Affects Versions: 0.98.8
>Reporter: Yi Deng
>Assignee: Yi Deng
>Priority: Minor
>  Labels: histogram, metrics
> Fix For: 0.99.1, 1.3.0
>
> Attachments: 
> 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, 
> 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, 
> 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, 
> 12133.addendum.txt
>
>
> _emphasized text_FastLongHistogram is a thread-safe class that estimates the
> distribution of data and computes the quantiles. It's useful for computing
> aggregated metrics like P99/P95.





[jira] [Commented] (HBASE-21680) Port HBASE-20194 (Basic Replication WebUI - Master) and HBASE-20193 (Basic Replication Web UI - Regionserver) to branch-1

2019-01-21 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748299#comment-16748299
 ] 

Abhishek Singh Chouhan commented on HBASE-21680:


Sounds good. Thanks!

> Port HBASE-20194 (Basic Replication WebUI - Master) and HBASE-20193 (Basic 
> Replication Web UI - Regionserver) to branch-1
> -
>
> Key: HBASE-21680
> URL: https://issues.apache.org/jira/browse/HBASE-21680
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-21680-branch-1.patch, HBASE-21680-branch-1.patch, 
> HBASE-21680-branch-1.patch, HBASE-21680-branch-1.patch, Screen Shot 
> 2019-01-16 at 3.20.00 PM.png, Screen Shot 2019-01-16 at 3.20.50 PM.png, 
> Screen Shot 2019-01-16 at 3.21.17 PM.png, Screen Shot 2019-01-17 at 5.25.21 
> PM.png
>
>






[jira] [Commented] (HBASE-21680) Port HBASE-20194 (Basic Replication WebUI - Master) and HBASE-20193 (Basic Replication Web UI - Regionserver) to branch-1

2019-01-21 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748224#comment-16748224
 ] 

Abhishek Singh Chouhan commented on HBASE-21680:


Latest patch LGTM, +1. [~apurtell] Do we also want to fix the NPE issue over at 
HBASE-21749 with the patch here? (looks like we may hit the same thing here 
also)

> Port HBASE-20194 (Basic Replication WebUI - Master) and HBASE-20193 (Basic 
> Replication Web UI - Regionserver) to branch-1
> -
>
> Key: HBASE-21680
> URL: https://issues.apache.org/jira/browse/HBASE-21680
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-21680-branch-1.patch, HBASE-21680-branch-1.patch, 
> HBASE-21680-branch-1.patch, HBASE-21680-branch-1.patch, Screen Shot 
> 2019-01-16 at 3.20.00 PM.png, Screen Shot 2019-01-16 at 3.20.50 PM.png, 
> Screen Shot 2019-01-16 at 3.21.17 PM.png, Screen Shot 2019-01-17 at 5.25.21 
> PM.png
>
>






[jira] [Commented] (HBASE-21616) Port HBASE-21034 (Add new throttle type: read/write capacity unit) to branch-1

2019-01-16 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744502#comment-16744502
 ] 

Abhishek Singh Chouhan commented on HBASE-21616:


LGTM, +1. The checkstyle warnings from the previous QA run look related; it
might be good to have a look.

> Port HBASE-21034 (Add new throttle type: read/write capacity unit) to branch-1
> --
>
> Key: HBASE-21616
> URL: https://issues.apache.org/jira/browse/HBASE-21616
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-21616-branch-1.patch, HBASE-21616-branch-1.patch
>
>
> Port HBASE-21034 (Add new throttle type: read/write capacity unit) to branch-1





[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions

2018-07-10 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538268#comment-16538268
 ] 

Abhishek Singh Chouhan commented on HBASE-20806:


Thanks [~apurtell] for originally suggesting the improvement, review and 
commit!! :)

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Fix For: 3.0.0, 2.1.0, 1.5.0, 1.2.7, 1.3.3, 1.4.6, 2.0.2
>
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, 
> HBASE-20806.branch-2.001.patch, HBASE-20806.master.001.patch, 
> HBASE-20806.master.002.patch, HBASE-20806.master.003.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.
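
As a rough illustration of the kind of journal described here, the following Java sketch records a timestamp per stage and derives the time spent in each. Everything in it (the class StageJournalSketch, the stage names, the millisecond inputs) is hypothetical and is not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stage journal; illustrative only, not HBase API. */
class StageJournalSketch {
  private static final class Entry {
    final String stage;
    final long timeMillis;

    Entry(String stage, long timeMillis) {
      this.stage = stage;
      this.timeMillis = timeMillis;
    }
  }

  private final List<Entry> entries = new ArrayList<>();

  /** Records that a stage started at the given time. Not thread-safe. */
  void record(String stage, long timeMillis) {
    entries.add(new Entry(stage, timeMillis));
  }

  /** Time spent in each stage, measured up to the start of the next recorded stage. */
  List<String> durations() {
    List<String> out = new ArrayList<>();
    for (int i = 0; i + 1 < entries.size(); i++) {
      Entry current = entries.get(i);
      Entry next = entries.get(i + 1);
      out.add(current.stage + ": " + (next.timeMillis - current.timeMillis) + " ms");
    }
    return out;
  }

  public static void main(String[] args) {
    StageJournalSketch journal = new StageJournalSketch();
    // Stage names and timestamps are made up for the example.
    journal.record("PREPARE", 1_000);
    journal.record("FLUSH_TO_DISK", 1_040);
    journal.record("COMMIT", 1_900);
    journal.record("DONE", 1_950);
    for (String line : journal.durations()) {
      System.out.println(line);
    }
  }
}
```

Comparing such per-stage durations across releases is one way the journal could help spot the regressions the description mentions.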





[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions

2018-07-09 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536958#comment-16536958
 ] 

Abhishek Singh Chouhan commented on HBASE-20806:


I have attached a patch for branch-2, which can also be applied to the
branch-2.x release branches if needed. It is the same as the master patch,
which did not apply cleanly to branch-2 because of a semicolon present here
[https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredTask.java#L33]
that is missing in master.

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, 
> HBASE-20806.branch-2.001.patch, HBASE-20806.master.001.patch, 
> HBASE-20806.master.002.patch, HBASE-20806.master.003.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.





[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions

2018-07-09 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20806:
---
Attachment: HBASE-20806.branch-2.001.patch

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, 
> HBASE-20806.branch-2.001.patch, HBASE-20806.master.001.patch, 
> HBASE-20806.master.002.patch, HBASE-20806.master.003.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.





[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions

2018-07-09 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536911#comment-16536911
 ] 

Abhishek Singh Chouhan commented on HBASE-20806:


Have pushed to master, branch-2, and branch-1. [~apurtell], let me know which
release branches I should push this into.

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, 
> HBASE-20806.branch-2.001.patch, HBASE-20806.master.001.patch, 
> HBASE-20806.master.002.patch, HBASE-20806.master.003.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.





[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions

2018-07-09 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536603#comment-16536603
 ] 

Abhishek Singh Chouhan commented on HBASE-20806:


Will commit later today unless there are objections.

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, 
> HBASE-20806.master.001.patch, HBASE-20806.master.002.patch, 
> HBASE-20806.master.003.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions

2018-07-06 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20806:
---
Attachment: HBASE-20806.master.003.patch

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, 
> HBASE-20806.master.001.patch, HBASE-20806.master.002.patch, 
> HBASE-20806.master.003.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions

2018-07-06 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20806:
---
Attachment: HBASE-20806.branch-1.003.patch

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, 
> HBASE-20806.master.001.patch, HBASE-20806.master.002.patch, 
> HBASE-20806.master.003.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions

2018-07-05 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533315#comment-16533315
 ] 

Abhishek Singh Chouhan commented on HBASE-20806:


Reattaching branch-1 patch.

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.master.001.patch, 
> HBASE-20806.master.002.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.





[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions

2018-07-05 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20806:
---
Attachment: (was: HBASE-20806.branch-1.002.patch)

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.master.001.patch, 
> HBASE-20806.master.002.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.





[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions

2018-07-05 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20806:
---
Attachment: HBASE-20806.branch-1.002.patch

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.master.001.patch, 
> HBASE-20806.master.002.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.





[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions

2018-07-04 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20806:
---
Attachment: HBASE-20806.branch-1.002.patch

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.branch-1.002.patch, HBASE-20806.master.001.patch, 
> HBASE-20806.master.002.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.





[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions

2018-07-04 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532891#comment-16532891
 ] 

Abhishek Singh Chouhan commented on HBASE-20806:


Fixed the checkstyle warning and added the missing shutdown in the added test.

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.master.001.patch, HBASE-20806.master.002.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.





[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions

2018-07-04 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20806:
---
Attachment: HBASE-20806.master.002.patch

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.master.001.patch, HBASE-20806.master.002.patch
>
>
> In 1.x we have a split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have something similar for
> flushes and compactions to give insight into the time spent in each stage,
> which we can use to identify regressions that might creep in.





[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions

2018-07-04 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20806:
---
Status: Patch Available  (was: Open)

Was afk for a few days. Here's a patch for master.

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.master.001.patch
>
>
> In 1.x we have split transaction journal that gives a clear picture of when 
> various stages of splits took place. We should have a similar thing for 
> flushes and compactions so as to have insights into time spent in various 
> stages, which we can use to identify regressions that might creep up.





[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions

2018-07-04 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20806:
---
Attachment: HBASE-20806.master.001.patch

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch, 
> HBASE-20806.master.001.patch
>
>
> In 1.x we have split transaction journal that gives a clear picture of when 
> various stages of splits took place. We should have a similar thing for 
> flushes and compactions so as to have insights into time spent in various 
> stages, which we can use to identify regressions that might creep up.





[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions

2018-06-29 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527627#comment-16527627
 ] 

Abhishek Singh Chouhan commented on HBASE-20806:


Added a simple patch that adds journaling (much like the earlier split 
journal) to MonitoredTask. It is disabled by default and enabled only for 
flushes and compactions, since monitored tasks are also used in other places 
such as RPCs. Will add a patch for master too.

Looks something like this:

2018-06-28 21:42:00,959 DEBUG [main] regionserver.HRegion(2129): Flush status 
journal:

Acquiring readlock on region at 1530202320737

Obtaining lock to block concurrent updates at 1530202320738

Preparing to flush by snapshotting stores in 
bd201548dcb5ac5a951e54af54618b97 at 1530202320738

Finished memstore snapshotting 
testCompactionFailure,,1530202319214.bd201548dcb5ac5a951e54af54618b97., syncing 
WAL and waiting on mvcc, flushsize=2952768 at 1530202320747

Flushing stores of 
testCompactionFailure,,1530202319214.bd201548dcb5ac5a951e54af54618b97. at 
1530202320749

Flushing colfamily11: creating writer at 1530202320755

Flushing colfamily11: appending metadata at 1530202320908
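
For illustration, the journaling idea can be sketched roughly as below. This is 
a minimal stand-in and not the attached patch: JournalingTaskSketch, Entry and 
their methods are hypothetical names, and the real MonitoredTask interface is 
not reproduced here. The sketch only shows the shape of the idea: each status 
change is recorded with a timestamp when journaling is switched on, and the 
whole journal is printed once the operation completes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a monitored task with an opt-in status journal.
final class JournalingTaskSketch {

    // One journal line: the status text plus the time it was set.
    static final class Entry {
        final String status;
        final long timeMillis;

        Entry(String status, long timeMillis) {
            this.status = status;
            this.timeMillis = timeMillis;
        }

        @Override
        public String toString() {
            return status + " at " + timeMillis;
        }
    }

    private final boolean journalEnabled;
    private final List<Entry> journal = new ArrayList<>();
    private String status = "";

    // Journaling is off unless the caller (e.g. a flush or compaction) asks for it.
    JournalingTaskSketch(boolean journalEnabled) {
        this.journalEnabled = journalEnabled;
    }

    void setStatus(String newStatus) {
        setStatus(newStatus, System.currentTimeMillis());
    }

    // Overload with an explicit clock value so the behaviour is easy to test.
    void setStatus(String newStatus, long nowMillis) {
        this.status = newStatus;
        if (journalEnabled) {
            journal.add(new Entry(newStatus, nowMillis));
        }
    }

    String getStatus() {
        return status;
    }

    List<Entry> getJournal() {
        return Collections.unmodifiableList(journal);
    }

    // Renders the journal as one entry per line, for a single log statement.
    String prettyPrintJournal() {
        StringBuilder sb = new StringBuilder();
        for (Entry e : journal) {
            sb.append(System.lineSeparator()).append(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        JournalingTaskSketch task = new JournalingTaskSketch(true);
        task.setStatus("Acquiring readlock on region");
        task.setStatus("Obtaining lock to block concurrent updates");
        System.out.println("Flush status journal:" + task.prettyPrintJournal());
    }
}
```

Keeping the journal behind a constructor flag mirrors the "disabled by default" 
choice above, so other users of monitored tasks pay nothing for it.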

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch
>
>
> In 1.x we have split transaction journal that gives a clear picture of when 
> various stages of splits took place. We should have a similar thing for 
> flushes and compactions so as to have insights into time spent in various 
> stages, which we can use to identify regressions that might creep up.





[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions

2018-06-29 Thread Abhishek Singh Chouhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20806:
---
Attachment: HBASE-20806.branch-1.001.patch

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-20806.branch-1.001.patch
>
>
> In 1.x we have split transaction journal that gives a clear picture of when 
> various stages of splits took place. We should have a similar thing for 
> flushes and compactions so as to have insights into time spent in various 
> stages, which we can use to identify regressions that might creep up.





[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions

2018-06-28 Thread Abhishek Singh Chouhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526282#comment-16526282
 ] 

Abhishek Singh Chouhan commented on HBASE-20806:


Both. Thinking of modifying TaskMonitor so that we maintain a journal of the 
various statuses (which we already set in flushes/compactions etc.) and then 
log it when the flush/compaction completes.

> Split style journal for flushes and compactions
> ---
>
> Key: HBASE-20806
> URL: https://issues.apache.org/jira/browse/HBASE-20806
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
>
> In 1.x we have split transaction journal that gives a clear picture of when 
> various stages of splits took place. We should have a similar thing for 
> flushes and compactions so as to have insights into time spent in various 
> stages, which we can use to identify regressions that might creep up.





[jira] [Created] (HBASE-20806) Split style journal for flushes and compactions

2018-06-28 Thread Abhishek Singh Chouhan (JIRA)
Abhishek Singh Chouhan created HBASE-20806:
--

 Summary: Split style journal for flushes and compactions
 Key: HBASE-20806
 URL: https://issues.apache.org/jira/browse/HBASE-20806
 Project: HBase
  Issue Type: Improvement
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan


In 1.x we have a split transaction journal that gives a clear picture of when 
the various stages of a split took place. We should have something similar for 
flushes and compactions to gain insight into the time spent in each stage, 
which we can use to identify regressions that might creep in.





[jira] [Updated] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception

2018-03-06 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20139:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> NPE in RSRpcServices.get() when getRegion throws an exception
> -
>
> Key: HBASE-20139
> URL: https://issues.apache.org/jira/browse/HBASE-20139
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Fix For: 1.3.2, 1.5.0, 1.4.3
>
> Attachments: HBASE-20139.branch-1.001.patch, 
> HBASE-20139.branch-1.3.001.patch, HBASE-20139.branch-1.3.001.patch
>
>
> We can get a NPE in RsRpcServices at 
> {code:java}
> } finally {
> if (regionServer.metricsRegionServer != null) {
> regionServer.metricsRegionServer.updateGet(
> -> region.getTableDesc().getTableName(), EnvironmentEdgeManager.currentTime() 
> - before);
> }
> if (quota != null) {
> quota.close();
> }{code}
> when region itself is null which might happen when getRegion throws an 
> exception, this is then sent back to the client which is not able to handle 
> this/make sense of it.
> {code:java}
> 2018-03-06 08:31:25,100 DEBUG [0,queue=4,port=60020] ipc.RpcServer - 
> RpcServer.FifoWFPBQ.default.handler=30,queue=4,port=60020: callId: 5605567 
> service: ClientService methodName: Get size: 79 connection: xyz:58736 
> deadline: 9223372036854775807
> java.io.IOException
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2431)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2246)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35068)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2373)
>         ... 3 more{code}
> This has been fixed by [~stack] over at HBASE-18946 for master, backporting 
> the same to branch-1, 1.3 and 1.4





[jira] [Commented] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception

2018-03-06 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389131#comment-16389131
 ] 

Abhishek Singh Chouhan commented on HBASE-20139:


Pushed to branch-1, branch-1.4, branch-1.3. Thanks [~stack] [~uagashe]!!! :)

> NPE in RSRpcServices.get() when getRegion throws an exception
> -
>
> Key: HBASE-20139
> URL: https://issues.apache.org/jira/browse/HBASE-20139
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Fix For: 1.3.2, 1.5.0, 1.4.3
>
> Attachments: HBASE-20139.branch-1.001.patch, 
> HBASE-20139.branch-1.3.001.patch, HBASE-20139.branch-1.3.001.patch
>
>
> We can get a NPE in RsRpcServices at 
> {code:java}
> } finally {
> if (regionServer.metricsRegionServer != null) {
> regionServer.metricsRegionServer.updateGet(
> -> region.getTableDesc().getTableName(), EnvironmentEdgeManager.currentTime() 
> - before);
> }
> if (quota != null) {
> quota.close();
> }{code}
> when region itself is null which might happen when getRegion throws an 
> exception, this is then sent back to the client which is not able to handle 
> this/make sense of it.
> {code:java}
> 2018-03-06 08:31:25,100 DEBUG [0,queue=4,port=60020] ipc.RpcServer - 
> RpcServer.FifoWFPBQ.default.handler=30,queue=4,port=60020: callId: 5605567 
> service: ClientService methodName: Get size: 79 connection: xyz:58736 
> deadline: 9223372036854775807
> java.io.IOException
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2431)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2246)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35068)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2373)
>         ... 3 more{code}
> This has been fixed by [~stack] over at HBASE-18946 for master, backporting 
> the same to branch-1, 1.3 and 1.4





[jira] [Commented] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception

2018-03-06 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389108#comment-16389108
 ] 

Abhishek Singh Chouhan commented on HBASE-20139:


Test failure is unrelated. Committing shortly.

> NPE in RSRpcServices.get() when getRegion throws an exception
> -
>
> Key: HBASE-20139
> URL: https://issues.apache.org/jira/browse/HBASE-20139
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Fix For: 1.3.2, 1.5.0, 1.4.3
>
> Attachments: HBASE-20139.branch-1.001.patch, 
> HBASE-20139.branch-1.3.001.patch, HBASE-20139.branch-1.3.001.patch
>
>
> We can get a NPE in RsRpcServices at 
> {code:java}
> } finally {
> if (regionServer.metricsRegionServer != null) {
> regionServer.metricsRegionServer.updateGet(
> -> region.getTableDesc().getTableName(), EnvironmentEdgeManager.currentTime() 
> - before);
> }
> if (quota != null) {
> quota.close();
> }{code}
> when region itself is null which might happen when getRegion throws an 
> exception, this is then sent back to the client which is not able to handle 
> this/make sense of it.
> {code:java}
> 2018-03-06 08:31:25,100 DEBUG [0,queue=4,port=60020] ipc.RpcServer - 
> RpcServer.FifoWFPBQ.default.handler=30,queue=4,port=60020: callId: 5605567 
> service: ClientService methodName: Get size: 79 connection: xyz:58736 
> deadline: 9223372036854775807
> java.io.IOException
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2431)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2246)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35068)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2373)
>         ... 3 more{code}
> This has been fixed by [~stack] over at HBASE-18946 for master, backporting 
> the same to branch-1, 1.3 and 1.4





[jira] [Updated] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception

2018-03-06 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20139:
---
Attachment: HBASE-20139.branch-1.3.001.patch

> NPE in RSRpcServices.get() when getRegion throws an exception
> -
>
> Key: HBASE-20139
> URL: https://issues.apache.org/jira/browse/HBASE-20139
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Fix For: 1.3.2, 1.5.0, 1.4.3
>
> Attachments: HBASE-20139.branch-1.001.patch, 
> HBASE-20139.branch-1.3.001.patch
>
>
> We can get a NPE in RsRpcServices at 
> {code:java}
> } finally {
> if (regionServer.metricsRegionServer != null) {
> regionServer.metricsRegionServer.updateGet(
> -> region.getTableDesc().getTableName(), EnvironmentEdgeManager.currentTime() 
> - before);
> }
> if (quota != null) {
> quota.close();
> }{code}
> when region itself is null which might happen when getRegion throws an 
> exception, this is then sent back to the client which is not able to handle 
> this/make sense of it.
> {code:java}
> 2018-03-06 08:31:25,100 DEBUG [0,queue=4,port=60020] ipc.RpcServer - 
> RpcServer.FifoWFPBQ.default.handler=30,queue=4,port=60020: callId: 5605567 
> service: ClientService methodName: Get size: 79 connection: xyz:58736 
> deadline: 9223372036854775807
> java.io.IOException
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2431)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2246)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35068)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2373)
>         ... 3 more{code}
> This has been fixed by [~stack] over at HBASE-18946 for master, backporting 
> the same to branch-1, 1.3 and 1.4





[jira] [Updated] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception

2018-03-06 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20139:
---
Status: Patch Available  (was: Open)

branch-1 patch applies to branch-1.4 too.

> NPE in RSRpcServices.get() when getRegion throws an exception
> -
>
> Key: HBASE-20139
> URL: https://issues.apache.org/jira/browse/HBASE-20139
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Fix For: 1.3.2, 1.5.0, 1.4.3
>
> Attachments: HBASE-20139.branch-1.001.patch, 
> HBASE-20139.branch-1.3.001.patch
>
>
> We can get a NPE in RsRpcServices at 
> {code:java}
> } finally {
> if (regionServer.metricsRegionServer != null) {
> regionServer.metricsRegionServer.updateGet(
> -> region.getTableDesc().getTableName(), EnvironmentEdgeManager.currentTime() 
> - before);
> }
> if (quota != null) {
> quota.close();
> }{code}
> when region itself is null which might happen when getRegion throws an 
> exception, this is then sent back to the client which is not able to handle 
> this/make sense of it.
> {code:java}
> 2018-03-06 08:31:25,100 DEBUG [0,queue=4,port=60020] ipc.RpcServer - 
> RpcServer.FifoWFPBQ.default.handler=30,queue=4,port=60020: callId: 5605567 
> service: ClientService methodName: Get size: 79 connection: xyz:58736 
> deadline: 9223372036854775807
> java.io.IOException
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2431)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2246)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35068)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2373)
>         ... 3 more{code}
> This has been fixed by [~stack] over at HBASE-18946 for master, backporting 
> the same to branch-1, 1.3 and 1.4





[jira] [Updated] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception

2018-03-06 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-20139:
---
Attachment: HBASE-20139.branch-1.001.patch

> NPE in RSRpcServices.get() when getRegion throws an exception
> -
>
> Key: HBASE-20139
> URL: https://issues.apache.org/jira/browse/HBASE-20139
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Fix For: 1.3.2, 1.5.0, 1.4.3
>
> Attachments: HBASE-20139.branch-1.001.patch
>
>
> We can get a NPE in RsRpcServices at 
> {code:java}
> } finally {
> if (regionServer.metricsRegionServer != null) {
> regionServer.metricsRegionServer.updateGet(
> -> region.getTableDesc().getTableName(), EnvironmentEdgeManager.currentTime() 
> - before);
> }
> if (quota != null) {
> quota.close();
> }{code}
> when region itself is null which might happen when getRegion throws an 
> exception, this is then sent back to the client which is not able to handle 
> this/make sense of it.
> {code:java}
> 2018-03-06 08:31:25,100 DEBUG [0,queue=4,port=60020] ipc.RpcServer - 
> RpcServer.FifoWFPBQ.default.handler=30,queue=4,port=60020: callId: 5605567 
> service: ClientService methodName: Get size: 79 connection: xyz:58736 
> deadline: 9223372036854775807
> java.io.IOException
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2431)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2246)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35068)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2373)
>         ... 3 more{code}
> This has been fixed by [~stack] over at HBASE-18946 for master, backporting 
> the same to branch-1, 1.3 and 1.4





[jira] [Created] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception

2018-03-06 Thread Abhishek Singh Chouhan (JIRA)
Abhishek Singh Chouhan created HBASE-20139:
--

 Summary: NPE in RSRpcServices.get() when getRegion throws an 
exception
 Key: HBASE-20139
 URL: https://issues.apache.org/jira/browse/HBASE-20139
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan
 Fix For: 1.3.2, 1.5.0, 1.4.3


We can get an NPE in RSRpcServices at 
{code:java}
} finally {
  if (regionServer.metricsRegionServer != null) {
    regionServer.metricsRegionServer.updateGet(
        // -> NPE on the next line when region is null
        region.getTableDesc().getTableName(),
        EnvironmentEdgeManager.currentTime() - before);
  }
  if (quota != null) {
    quota.close();
  }
}{code}
when region itself is null, which might happen when getRegion throws an 
exception. The NPE is then sent back to the client, which is not able to 
handle it or make sense of it.
{code:java}
2018-03-06 08:31:25,100 DEBUG [0,queue=4,port=60020] ipc.RpcServer - 
RpcServer.FifoWFPBQ.default.handler=30,queue=4,port=60020: callId: 5605567 
service: ClientService methodName: Get size: 79 connection: xyz:58736 deadline: 
9223372036854775807
java.io.IOException
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2431)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
Caused by: java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2246)
        at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35068)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2373)
        ... 3 more{code}
This has been fixed by [~stack] over at HBASE-18946 for master; backporting the 
same to branch-1, branch-1.3 and branch-1.4.
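
A rough, self-contained sketch of the failure mode and one possible guard 
follows. It is not the HBASE-18946 patch: NullSafeGetSketch, Region, Metrics 
and getRegion here are hypothetical stand-ins, assumed only for illustration. 
The point is that the finally block runs even when the region lookup throws, 
so it must not dereference region unconditionally.

```java
import java.io.IOException;

// Hypothetical stand-ins; none of these types are the real HBase classes.
final class NullSafeGetSketch {

    interface Region {
        String tableName();
    }

    static final class Metrics {
        int updates;
        String lastTable;

        void updateGet(String table, long elapsedMillis) {
            updates++;
            lastTable = table;
        }
    }

    // Simulates a lookup that fails before a region is ever assigned.
    static Region getRegion(String regionName) throws IOException {
        if (regionName == null) {
            throw new IOException("region not found");
        }
        return () -> "t1";
    }

    static String get(String regionName, Metrics metrics) throws IOException {
        long before = System.currentTimeMillis();
        Region region = null;
        try {
            region = getRegion(regionName);
            return "row from " + region.tableName();
        } finally {
            // Guard on region as well as metrics: if getRegion threw, region is
            // still null here, and dereferencing it would replace the original
            // IOException with a NullPointerException.
            if (metrics != null && region != null) {
                metrics.updateGet(region.tableName(),
                    System.currentTimeMillis() - before);
            }
        }
    }

    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        try {
            get(null, metrics);
        } catch (IOException e) {
            System.out.println("client sees: " + e.getMessage());
        }
    }
}
```

With the guard in place the caller sees the original IOException from the 
lookup rather than an NPE raised from the metrics update.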





[jira] [Commented] (HBASE-19858) Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1

2018-02-01 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348860#comment-16348860
 ] 

Abhishek Singh Chouhan commented on HBASE-19858:


In StoreFile.java we might want to add a default value too; otherwise we might 
end up passing null to setStoragePolicy:
{noformat}
if (null == policyName) {
- policyName = this.conf.get(HStore.BLOCK_STORAGE_POLICY_KEY);

+ policyName = this.conf.get(HStore.BLOCK_STORAGE_POLICY_KEY, 
+     HStore.DEFAULT_BLOCK_STORAGE_POLICY);
}{noformat}
 

Rest LGTM.
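
To make the suggestion concrete, here is a small standalone sketch of the 
lookup-with-default pattern. It uses java.util.Properties as a stand-in for the 
Hadoop Configuration object, and the class name, the resolvePolicy helper and 
the "HOT" default value are assumptions for illustration, not taken from the 
patch.

```java
import java.util.Properties;

// Hypothetical sketch; Properties stands in for the real configuration class.
final class StoragePolicyDefaultSketch {

    static final String BLOCK_STORAGE_POLICY_KEY =
        "hbase.hstore.block.storage.policy";
    // Assumed placeholder default, used here only so the lookup never yields null.
    static final String DEFAULT_BLOCK_STORAGE_POLICY = "HOT";

    // Prefers the per-family value; otherwise reads the config key with a default.
    static String resolvePolicy(String perFamilyPolicy, Properties conf) {
        String policyName = perFamilyPolicy;
        if (null == policyName) {
            // The two-argument lookup returns the default when the key is absent,
            // so callers never receive null.
            policyName = conf.getProperty(BLOCK_STORAGE_POLICY_KEY,
                DEFAULT_BLOCK_STORAGE_POLICY);
        }
        return policyName;
    }

    public static void main(String[] args) {
        System.out.println(resolvePolicy(null, new Properties()));
    }
}
```

The single-argument lookup would return null for an unset key, which is the 
case the review comment is worried about.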

 

> Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1
> --
>
> Key: HBASE-19858
> URL: https://issues.apache.org/jira/browse/HBASE-19858
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-19858-branch-1.patch
>
>
> Backport the following commits to branch-1:
>  * HBASE-14061 Support CF-level Storage Policy
>  * HBASE-14061 Support CF-level Storage Policy (addendum)
>  * HBASE-14061 Support CF-level Storage Policy (addendum2)
>  * HBASE-15172 Support setting storage policy in bulkload
>  * HBASE-17538 HDFS.setStoragePolicy() logs errors on local fs
>  * HBASE-18015 Storage class aware block placement for procedure v2 WALs
>  * HBASE-18017 Reduce frequency of setStoragePolicy failure warnings
>  * HBASE-19016 Coordinate storage policy property name for table schema and 
> bulkload
>  
> Fix
>  * Default storage policy if not configured cannot be "NONE"





[jira] [Comment Edited] (HBASE-19858) Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1

2018-02-01 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348766#comment-16348766
 ] 

Abhishek Singh Chouhan edited comment on HBASE-19858 at 2/1/18 3:40 PM:


While going through the patch I realized that we check both 
hbase.hstore.block.storage.policy. and 
hbase.hstore.block.storage.policy for the bulk load case. 
hbase.hstore.block.storage.policy. gives the impression of setting 
the property in general for any cf with the name cf (which is not the case, 
since in hstore we only check the column descriptor or 
hbase.hstore.block.storage.policy). We can probably file a Jira to name it 
something like hbase.hstore.block.storage.policy.bulkload.cf_name.


was (Author: abhishek.chouhan):
While going through the patch realized that we check both 
hbase.hstore.block.storage.policy. and 
hbase.hstore.block.storage.policy for the bulk load case, 
hbase.hstore.block.storage.policy. gives the impression of setting 
the property in general for any cf with the name cf (which is not the case 
since in hstore we only check table descriptor or 
hbase.hstore.block.storage.policy). Can probably file a Jira to name it 
something like hbase.hstore.block.storage.policy.bulkload.cf_name.

> Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1
> --
>
> Key: HBASE-19858
> URL: https://issues.apache.org/jira/browse/HBASE-19858
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-19858-branch-1.patch
>
>
> Backport the following commits to branch-1:
>  * HBASE-14061 Support CF-level Storage Policy
>  * HBASE-14061 Support CF-level Storage Policy (addendum)
>  * HBASE-14061 Support CF-level Storage Policy (addendum2)
>  * HBASE-15172 Support setting storage policy in bulkload
>  * HBASE-17538 HDFS.setStoragePolicy() logs errors on local fs
>  * HBASE-18015 Storage class aware block placement for procedure v2 WALs
>  * HBASE-18017 Reduce frequency of setStoragePolicy failure warnings
>  * HBASE-19016 Coordinate storage policy property name for table schema and 
> bulkload
>  
> Fix
>  * Default storage policy if not configured cannot be "NONE"





[jira] [Commented] (HBASE-19858) Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1

2018-02-01 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348766#comment-16348766
 ] 

Abhishek Singh Chouhan commented on HBASE-19858:


While going through the patch realized that we check both 
hbase.hstore.block.storage.policy. and 
hbase.hstore.block.storage.policy for the bulk load case, 
hbase.hstore.block.storage.policy. gives the impression of setting 
the property in general for any cf with the name cf (which is not the case 
since in hstore we only check table descriptor or 
hbase.hstore.block.storage.policy). Can probably file a Jira to name it 
something like hbase.hstore.block.storage.policy.bulkload.cf_name.

> Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1
> --
>
> Key: HBASE-19858
> URL: https://issues.apache.org/jira/browse/HBASE-19858
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-19858-branch-1.patch
>
>
> Backport the following commits to branch-1:
>  * HBASE-14061 Support CF-level Storage Policy
>  * HBASE-14061 Support CF-level Storage Policy (addendum)
>  * HBASE-14061 Support CF-level Storage Policy (addendum2)
>  * HBASE-15172 Support setting storage policy in bulkload
>  * HBASE-17538 HDFS.setStoragePolicy() logs errors on local fs
>  * HBASE-18015 Storage class aware block placement for procedure v2 WALs
>  * HBASE-18017 Reduce frequency of setStoragePolicy failure warnings
>  * HBASE-19016 Coordinate storage policy property name for table schema and 
> bulkload
>  
> Fix
>  * Default storage policy if not configured cannot be "NONE"





[jira] [Commented] (HBASE-19440) Not able to enable balancer with RSGroups once disabled

2017-12-06 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281373#comment-16281373
 ] 

Abhishek Singh Chouhan commented on HBASE-19440:


Thanks [~apurtell] [~tedyu] :)

> Not able to enable balancer with RSGroups once disabled
> ---
>
> Key: HBASE-19440
> URL: https://issues.apache.org/jira/browse/HBASE-19440
> Project: HBase
>  Issue Type: Bug
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 1.4.0
>
> Attachments: HBASE-19440.branch-1.001.patch
>
>
> Once the balancer is disabled, trying to switch it back on doesn't work since 
> the prebalanceswitch coprocessor hook is incorrectly always returning false.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-19440) Not able to enable balancer with RSGroups once disabled

2017-12-06 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-19440:
---
Status: Patch Available  (was: Open)

Getting a QA.

> Not able to enable balancer with RSGroups once disabled
> ---
>
> Key: HBASE-19440
> URL: https://issues.apache.org/jira/browse/HBASE-19440
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 1.3.2, 1.4.1
>
> Attachments: HBASE-19440.branch-1.001.patch
>
>
> Once the balancer is disabled, trying to switch it back on doesn't work since 
> the prebalanceswitch coprocessor hook is incorrectly always returning false.





[jira] [Updated] (HBASE-19440) Not able to enable balancer with RSGroups once disabled

2017-12-06 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-19440:
---
Attachment: HBASE-19440.branch-1.001.patch

> Not able to enable balancer with RSGroups once disabled
> ---
>
> Key: HBASE-19440
> URL: https://issues.apache.org/jira/browse/HBASE-19440
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 1.3.2, 1.4.1
>
> Attachments: HBASE-19440.branch-1.001.patch
>
>
> Once the balancer is disabled, trying to switch it back on doesn't work since 
> the prebalanceswitch coprocessor hook is incorrectly always returning false.





[jira] [Created] (HBASE-19440) Not able to enable balancer with RSGroups once disabled

2017-12-06 Thread Abhishek Singh Chouhan (JIRA)
Abhishek Singh Chouhan created HBASE-19440:
--

 Summary: Not able to enable balancer with RSGroups once disabled
 Key: HBASE-19440
 URL: https://issues.apache.org/jira/browse/HBASE-19440
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan
 Fix For: 1.3.2, 1.4.1


Once the balancer is disabled, trying to switch it back on doesn't work, since 
the preBalanceSwitch coprocessor hook incorrectly always returns false.
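
To make the failure mode concrete, here is a minimal sketch. It assumes the 
master applies whatever value the pre-hook returns; SwitchHook and applySwitch 
are hypothetical names for illustration, not the HBase MasterObserver API.

```java
// Toy model only: SwitchHook and applySwitch are hypothetical stand-ins,
// not HBase classes. Assumption: the caller applies the hook's return value.
final class BalanceSwitchHookDemo {
  interface SwitchHook {
    boolean preBalanceSwitch(boolean requested);
  }

  // Buggy shape: ignores the request, so the balancer can never be re-enabled.
  static final SwitchHook ALWAYS_FALSE = requested -> false;

  // Expected shape: hands the requested value back unchanged.
  static final SwitchHook PASS_THROUGH = requested -> requested;

  // The value the hook returns is the value that takes effect.
  static boolean applySwitch(SwitchHook hook, boolean requested) {
    return hook.preBalanceSwitch(requested);
  }

  public static void main(String[] args) {
    System.out.println("buggy hook, enable requested -> " + applySwitch(ALWAYS_FALSE, true));
    System.out.println("fixed hook, enable requested -> " + applySwitch(PASS_THROUGH, true));
  }
}
```

With the always-false hook, a request to enable the balancer is silently 
turned into a request to keep it disabled.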





[jira] [Commented] (HBASE-18127) Enable state to be passed between the region observer coprocessor hook calls

2017-11-23 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264980#comment-16264980
 ] 

Abhishek Singh Chouhan commented on HBASE-18127:


The cases that I saw had all the coproc hooks related to an operation being 
executed by a single thread. Do we have cases where some coproc hooks are 
executed by one thread and others by a different thread, so that we need to 
pass state between them? (Need to check more here.) For the general case where 
a thread from some pool executes the operation involving coproc hooks (all 
hooks related to that operation), I was thinking maybe we could have a util 
class that subclasses ThreadPoolExecutor, sets the thread local in 
beforeExecute and removes it in afterExecute, and then we use this thread pool 
wherever we're using thread pools for operations that involve coproc hooks. 
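
A rough sketch of that idea, using only java.util.concurrent. The class name 
HookStateExecutor, the current() accessor and the "batchDone" key are made up 
for illustration; nothing here is existing HBase code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical utility; no class with this name exists in HBase.
final class HookStateExecutor extends ThreadPoolExecutor {
  // Per-thread scratch state visible to every hook run by the same task.
  private static final ThreadLocal<Map<String, Object>> STATE = new ThreadLocal<>();

  HookStateExecutor(int threads) {
    super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
  }

  @Override
  protected void beforeExecute(Thread t, Runnable r) {
    super.beforeExecute(t, r);
    // Runs on the worker thread, so each task starts with fresh state.
    STATE.set(new HashMap<String, Object>());
  }

  @Override
  protected void afterExecute(Runnable r, Throwable t) {
    // Also runs on the worker thread; clears state so it cannot leak to the next task.
    STATE.remove();
    super.afterExecute(r, t);
  }

  static Map<String, Object> current() {
    return STATE.get();
  }

  // True if state is shared within one task and not carried over to the next.
  static boolean demo() {
    HookStateExecutor pool = new HookStateExecutor(1);
    try {
      Callable<Boolean> operation = () -> {
        current().put("batchDone", Boolean.TRUE);               // e.g. a batch hook
        return Boolean.TRUE.equals(current().get("batchDone")); // e.g. a later single hook
      };
      Callable<Boolean> nextOperation = () -> current().containsKey("batchDone");
      Future<Boolean> shared = pool.submit(operation);
      Future<Boolean> leaked = pool.submit(nextOperation);
      return shared.get() && !leaked.get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("state shared within a task and cleared between tasks: " + demo());
  }
}
```

This only works if every hook of an operation runs on the same pool thread, 
which is the open question raised above.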

> Enable state to be passed between the region observer coprocessor hook calls
> 
>
> Key: HBASE-18127
> URL: https://issues.apache.org/jira/browse/HBASE-18127
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Abhishek Singh Chouhan
> Attachments: HBASE-18127.master.001.patch, 
> HBASE-18127.master.002.patch, HBASE-18127.master.002.patch, 
> HBASE-18127.master.003.patch, HBASE-18127.master.004.patch, 
> HBASE-18127.master.005.patch, HBASE-18127.master.005.patch, 
> HBASE-18127.master.006.patch
>
>
> Allow regionobserver to optionally skip postPut/postDelete when 
> postBatchMutate was called.
> Right now a RegionObserver can only statically implement one or the other. In 
> scenarios where we need to work sometimes on the single postPut and 
> postDelete hooks and sometimes on the batchMutate hooks, there is currently 
> no place to convey this information to the single hooks. I.e. the work has 
> been done in the batch, skip the single hooks.
> There are various solutions:
> 1. Allow some state to be passed _per operation_.
> 2. Remove the single hooks and always only call batch hooks (with a default 
> wrapper for the single hooks).
> 3. more?
> [~apurtell], what we had discussed a few days back.





[jira] [Commented] (HBASE-18127) Enable state to be passed between the region observer coprocessor hook calls

2017-11-23 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264932#comment-16264932
 ] 

Abhishek Singh Chouhan commented on HBASE-18127:


As Andrew pointed out, we'd need to set this outside of the RPC call context, 
since coproc hooks are also called from other places that do not necessarily 
originate from an RPC call. E.g. a flush table operation results in the 
creation of a sub-procedure on the RS, in which case the call context would be 
null. We might need to have a ThreadLocal in CoprocessorHost or Environment; 
however, we'd need to set it and remove it across any thread pool doing any 
operations that involve coprocessor hooks (doing this for RPC calls and 
procedures in general would cover most of the use cases, but there might be 
more). [~anoop.hbase] [~ram_krish] [~apurtell]

> Enable state to be passed between the region observer coprocessor hook calls
> 
>
> Key: HBASE-18127
> URL: https://issues.apache.org/jira/browse/HBASE-18127
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Abhishek Singh Chouhan
> Attachments: HBASE-18127.master.001.patch, 
> HBASE-18127.master.002.patch, HBASE-18127.master.002.patch, 
> HBASE-18127.master.003.patch, HBASE-18127.master.004.patch, 
> HBASE-18127.master.005.patch, HBASE-18127.master.005.patch, 
> HBASE-18127.master.006.patch
>
>
> Allow regionobserver to optionally skip postPut/postDelete when 
> postBatchMutate was called.
> Right now a RegionObserver can only statically implement one or the other. In 
> scenarios where we need to work sometimes on the single postPut and 
> postDelete hooks and sometimes on the batchMutate hooks, there is currently 
> no place to convey this information to the single hooks. I.e. the work has 
> been done in the batch, skip the single hooks.
> There are various solutions:
> 1. Allow some state to be passed _per operation_.
> 2. Remove the single hooks and always only call batch hooks (with a default 
> wrapper for the single hooks).
> 3. more?
> [~apurtell], what we had discussed a few days back.





[jira] [Commented] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server

2017-11-14 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251067#comment-16251067
 ] 

Abhishek Singh Chouhan commented on HBASE-19215:


Thanks for reviewing and committing [~apurtell]!!

> Incorrect exception handling on the client causes incorrect call timeouts and 
> byte buffer allocations on the server
> ---
>
> Key: HBASE-19215
> URL: https://issues.apache.org/jira/browse/HBASE-19215
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-19215-branch-1.3.patch, 
> HBASE-19215.branch-1.001.patch
>
>
> Ran into the situation of oome on the client : java.lang.OutOfMemoryError: 
> Direct buffer memory.
> When we encounter an unhandled exception during channel write at RpcClientImpl
> {noformat}
> checkIsOpen(); // Now we're checking that it didn't became idle in between.
> try {
>   call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header, 
> call.param,
>   cellBlock));
> } catch (IOException e) {
> {noformat}
> we end up leaving the connection open. This becomes especially problematic 
> when we get an unhandled exception between writing the length of our request 
> on the channel and subsequently writing the params and cellblocks
> {noformat}
>*dos.write(Bytes.toBytes(totalSize));*
> // This allocates a buffer that is the size of the message internally.
> header.writeDelimitedTo(dos);
> if (param != null) param.writeDelimitedTo(dos);
> if (cellBlock != null) dos.write(cellBlock.array(), 0, 
> cellBlock.remaining());
> dos.flush();
> return totalSize;
> {noformat}
> After reading the length rs allocates a bb and expects data to be filled. 
> However when we encounter an exception during param write we release the 
> writelock in rpcclientimpl and do not close the connection, the exception is 
> handled at AbstractRpcClient.callBlockingMethod and retried. Now the next 
> client request to the same rs writes to the channel however the server 
> interprets this as part of the previous request and errors out during proto 
> conversion when processing the request since its considered malformed(in the 
> worst case this might be misinterpreted as wrong data?). Now the remaining 
> data of the current request is read(the current request's size > prev 
> request's allocated partially filled bytebuffer) and is misinterpreted as the 
> size of new request, in my case this was in gbs. All the client requests time 
> out since this bytebuffer is never completely filled. We should close the 
> connection for any Throwable and not just ioexception.





[jira] [Updated] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server

2017-11-13 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-19215:
---
Status: Patch Available  (was: Open)

> Incorrect exception handling on the client causes incorrect call timeouts and 
> byte buffer allocations on the server
> ---
>
> Key: HBASE-19215
> URL: https://issues.apache.org/jira/browse/HBASE-19215
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-19215.branch-1.001.patch
>
>
> Ran into the situation of oome on the client : java.lang.OutOfMemoryError: 
> Direct buffer memory.
> When we encounter an unhandled exception during channel write at RpcClientImpl
> {noformat}
> checkIsOpen(); // Now we're checking that it didn't became idle in between.
> try {
>   call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header, 
> call.param,
>   cellBlock));
> } catch (IOException e) {
> {noformat}
> we end up leaving the connection open. This becomes especially problematic 
> when we get an unhandled exception between writing the length of our request 
> on the channel and subsequently writing the params and cellblocks
> {noformat}
>*dos.write(Bytes.toBytes(totalSize));*
> // This allocates a buffer that is the size of the message internally.
> header.writeDelimitedTo(dos);
> if (param != null) param.writeDelimitedTo(dos);
> if (cellBlock != null) dos.write(cellBlock.array(), 0, 
> cellBlock.remaining());
> dos.flush();
> return totalSize;
> {noformat}
> After reading the length rs allocates a bb and expects data to be filled. 
> However when we encounter an exception during param write we release the 
> writelock in rpcclientimpl and do not close the connection, the exception is 
> handled at AbstractRpcClient.callBlockingMethod and retried. Now the next 
> client request to the same rs writes to the channel however the server 
> interprets this as part of the previous request and errors out during proto 
> conversion when processing the request since its considered malformed(in the 
> worst case this might be misinterpreted as wrong data?). Now the remaining 
> data of the current request is read(the current request's size > prev 
> request's allocated partially filled bytebuffer) and is misinterpreted as the 
> size of new request, in my case this was in gbs. All the client requests time 
> out since this bytebuffer is never completely filled. We should close the 
> connection for any Throwable and not just ioexception.





[jira] [Commented] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server

2017-11-13 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249449#comment-16249449
 ] 

Abhishek Singh Chouhan commented on HBASE-19215:


Patch applies to master as well; 1.3 will need a different one, though.

> Incorrect exception handling on the client causes incorrect call timeouts and 
> byte buffer allocations on the server
> ---
>
> Key: HBASE-19215
> URL: https://issues.apache.org/jira/browse/HBASE-19215
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-19215.branch-1.001.patch
>
>
> Ran into the situation of oome on the client : java.lang.OutOfMemoryError: 
> Direct buffer memory.
> When we encounter an unhandled exception during channel write at RpcClientImpl
> {noformat}
> checkIsOpen(); // Now we're checking that it didn't became idle in between.
> try {
>   call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header, 
> call.param,
>   cellBlock));
> } catch (IOException e) {
> {noformat}
> we end up leaving the connection open. This becomes especially problematic 
> when we get an unhandled exception between writing the length of our request 
> on the channel and subsequently writing the params and cellblocks
> {noformat}
>*dos.write(Bytes.toBytes(totalSize));*
> // This allocates a buffer that is the size of the message internally.
> header.writeDelimitedTo(dos);
> if (param != null) param.writeDelimitedTo(dos);
> if (cellBlock != null) dos.write(cellBlock.array(), 0, 
> cellBlock.remaining());
> dos.flush();
> return totalSize;
> {noformat}
> After reading the length rs allocates a bb and expects data to be filled. 
> However when we encounter an exception during param write we release the 
> writelock in rpcclientimpl and do not close the connection, the exception is 
> handled at AbstractRpcClient.callBlockingMethod and retried. Now the next 
> client request to the same rs writes to the channel however the server 
> interprets this as part of the previous request and errors out during proto 
> conversion when processing the request since its considered malformed(in the 
> worst case this might be misinterpreted as wrong data?). Now the remaining 
> data of the current request is read(the current request's size > prev 
> request's allocated partially filled bytebuffer) and is misinterpreted as the 
> size of new request, in my case this was in gbs. All the client requests time 
> out since this bytebuffer is never completely filled. We should close the 
> connection for any Throwable and not just ioexception.





[jira] [Updated] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server

2017-11-13 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-19215:
---
Attachment: HBASE-19215.branch-1.001.patch

Simple patch for branch-1 that catches Throwable, not just IOException, and 
closes the connection cleanly. [~apurtell] [~anoop.hbase] [~lhofhansl]
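
The general shape of such a change, as a standalone sketch rather than the 
attached patch: CloseOnAnyFailure, Write and writeOrClose are hypothetical 
names, and the real RpcClientImpl code is not reproduced here.

```java
import java.io.Closeable;
import java.io.IOException;

// Illustrative only; this is not the code from the attached patch.
final class CloseOnAnyFailure {
  /** Hypothetical stand-in for the step that writes a request to the channel. */
  interface Write {
    void run() throws IOException;
  }

  /**
   * Runs the write. On any Throwable (not just IOException) the connection is
   * closed before rethrowing, so a half-written request is never followed by
   * another request on the same channel.
   */
  static void writeOrClose(Closeable connection, Write write) throws IOException {
    try {
      write.run();
    } catch (Throwable t) {
      try {
        connection.close();
      } catch (IOException closeFailure) {
        t.addSuppressed(closeFailure);
      }
      if (t instanceof IOException) throw (IOException) t;
      if (t instanceof RuntimeException) throw (RuntimeException) t;
      if (t instanceof Error) throw (Error) t;
      throw new IOException(t);
    }
  }

  // True if a RuntimeException from the write still closes the connection.
  static boolean closedAfterRuntimeFailure() {
    final boolean[] closed = {false};
    try {
      writeOrClose(() -> closed[0] = true, () -> {
        throw new IllegalStateException("simulated failure while writing params");
      });
    } catch (IllegalStateException expected) {
      return closed[0];
    } catch (IOException unexpected) {
      return false;
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println("closed after unchecked failure: " + closedAfterRuntimeFailure());
  }
}
```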

> Incorrect exception handling on the client causes incorrect call timeouts and 
> byte buffer allocations on the server
> ---
>
> Key: HBASE-19215
> URL: https://issues.apache.org/jira/browse/HBASE-19215
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-19215.branch-1.001.patch
>
>
> Ran into the situation of oome on the client : java.lang.OutOfMemoryError: 
> Direct buffer memory.
> When we encounter an unhandled exception during channel write at RpcClientImpl
> {noformat}
> checkIsOpen(); // Now we're checking that it didn't became idle in between.
> try {
>   call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header, 
> call.param,
>   cellBlock));
> } catch (IOException e) {
> {noformat}
> we end up leaving the connection open. This becomes especially problematic 
> when we get an unhandled exception between writing the length of our request 
> on the channel and subsequently writing the params and cellblocks
> {noformat}
>*dos.write(Bytes.toBytes(totalSize));*
> // This allocates a buffer that is the size of the message internally.
> header.writeDelimitedTo(dos);
> if (param != null) param.writeDelimitedTo(dos);
> if (cellBlock != null) dos.write(cellBlock.array(), 0, 
> cellBlock.remaining());
> dos.flush();
> return totalSize;
> {noformat}
> After reading the length rs allocates a bb and expects data to be filled. 
> However when we encounter an exception during param write we release the 
> writelock in rpcclientimpl and do not close the connection, the exception is 
> handled at AbstractRpcClient.callBlockingMethod and retried. Now the next 
> client request to the same rs writes to the channel however the server 
> interprets this as part of the previous request and errors out during proto 
> conversion when processing the request since its considered malformed(in the 
> worst case this might be misinterpreted as wrong data?). Now the remaining 
> data of the current request is read(the current request's size > prev 
> request's allocated partially filled bytebuffer) and is misinterpreted as the 
> size of new request, in my case this was in gbs. All the client requests time 
> out since this bytebuffer is never completely filled. We should close the 
> connection for any Throwable and not just ioexception.





[jira] [Comment Edited] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server

2017-11-10 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248339#comment-16248339
 ] 

Abhishek Singh Chouhan edited comment on HBASE-19215 at 11/11/17 5:28 AM:
--

Going to put up a patch on Monday [~apurtell]; won't be able to see this 
through over the weekend, so feel free to pick this up if you need it fixed 
before then.


was (Author: abhishek.chouhan):
Going to put up a patch on monday [~apurtell]

> Incorrect exception handling on the client causes incorrect call timeouts and 
> byte buffer allocations on the server
> ---
>
> Key: HBASE-19215
> URL: https://issues.apache.org/jira/browse/HBASE-19215
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
>
> Ran into the situation of oome on the client : java.lang.OutOfMemoryError: 
> Direct buffer memory.
> When we encounter an unhandled exception during channel write at RpcClientImpl
> {noformat}
> checkIsOpen(); // Now we're checking that it didn't became idle in between.
> try {
>   call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header, 
> call.param,
>   cellBlock));
> } catch (IOException e) {
> {noformat}
> we end up leaving the connection open. This becomes especially problematic 
> when we get an unhandled exception between writing the length of our request 
> on the channel and subsequently writing the params and cellblocks
> {noformat}
>*dos.write(Bytes.toBytes(totalSize));*
> // This allocates a buffer that is the size of the message internally.
> header.writeDelimitedTo(dos);
> if (param != null) param.writeDelimitedTo(dos);
> if (cellBlock != null) dos.write(cellBlock.array(), 0, 
> cellBlock.remaining());
> dos.flush();
> return totalSize;
> {noformat}
> After reading the length rs allocates a bb and expects data to be filled. 
> However when we encounter an exception during param write we release the 
> writelock in rpcclientimpl and do not close the connection, the exception is 
> handled at AbstractRpcClient.callBlockingMethod and retried. Now the next 
> client request to the same rs writes to the channel however the server 
> interprets this as part of the previous request and errors out during proto 
> conversion when processing the request since its considered malformed(in the 
> worst case this might be misinterpreted as wrong data?). Now the remaining 
> data of the current request is read(the current request's size > prev 
> request's allocated partially filled bytebuffer) and is misinterpreted as the 
> size of new request, in my case this was in gbs. All the client requests time 
> out since this bytebuffer is never completely filled. We should close the 
> connection for any Throwable and not just ioexception.





[jira] [Commented] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server

2017-11-10 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248339#comment-16248339
 ] 

Abhishek Singh Chouhan commented on HBASE-19215:


Going to put up a patch on Monday [~apurtell]

> Incorrect exception handling on the client causes incorrect call timeouts and 
> byte buffer allocations on the server
> ---
>
> Key: HBASE-19215
> URL: https://issues.apache.org/jira/browse/HBASE-19215
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.1
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
> Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
>
> Ran into the situation of oome on the client : java.lang.OutOfMemoryError: 
> Direct buffer memory.
> When we encounter an unhandled exception during channel write at RpcClientImpl
> {noformat}
> checkIsOpen(); // Now we're checking that it didn't became idle in between.
> try {
>   call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header, 
> call.param,
>   cellBlock));
> } catch (IOException e) {
> {noformat}
> we end up leaving the connection open. This becomes especially problematic 
> when we get an unhandled exception between writing the length of our request 
> on the channel and subsequently writing the params and cellblocks
> {noformat}
>*dos.write(Bytes.toBytes(totalSize));*
> // This allocates a buffer that is the size of the message internally.
> header.writeDelimitedTo(dos);
> if (param != null) param.writeDelimitedTo(dos);
> if (cellBlock != null) dos.write(cellBlock.array(), 0, 
> cellBlock.remaining());
> dos.flush();
> return totalSize;
> {noformat}
> After reading the length rs allocates a bb and expects data to be filled. 
> However when we encounter an exception during param write we release the 
> writelock in rpcclientimpl and do not close the connection, the exception is 
> handled at AbstractRpcClient.callBlockingMethod and retried. Now the next 
> client request to the same rs writes to the channel however the server 
> interprets this as part of the previous request and errors out during proto 
> conversion when processing the request since its considered malformed(in the 
> worst case this might be misinterpreted as wrong data?). Now the remaining 
> data of the current request is read(the current request's size > prev 
> request's allocated partially filled bytebuffer) and is misinterpreted as the 
> size of new request, in my case this was in gbs. All the client requests time 
> out since this bytebuffer is never completely filled. We should close the 
> connection for any Throwable and not just ioexception.





[jira] [Created] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server

2017-11-08 Thread Abhishek Singh Chouhan (JIRA)
Abhishek Singh Chouhan created HBASE-19215:
--

 Summary: Incorrect exception handling on the client causes 
incorrect call timeouts and byte buffer allocations on the server
 Key: HBASE-19215
 URL: https://issues.apache.org/jira/browse/HBASE-19215
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan


Ran into an OOME on the client: java.lang.OutOfMemoryError: Direct buffer 
memory.
When we encounter an unhandled exception during the channel write in 
RpcClientImpl

{noformat}
checkIsOpen(); // Now we're checking that it didn't become idle in between.

try {
  call.callStats.setRequestSizeBytes(
      IPCUtil.write(this.out, header, call.param, cellBlock));
} catch (IOException e) {
{noformat}

we end up leaving the connection open. This becomes especially problematic when 
we get an unhandled exception between writing the length of our request on the 
channel and subsequently writing the params and cellblocks

{noformat}
dos.write(Bytes.toBytes(totalSize)); // <-- the length is written first
// This allocates a buffer that is the size of the message internally.
header.writeDelimitedTo(dos);
if (param != null) param.writeDelimitedTo(dos);
if (cellBlock != null) dos.write(cellBlock.array(), 0, cellBlock.remaining());
dos.flush();
return totalSize;
{noformat}

After reading the length, the RS allocates a ByteBuffer and expects it to be 
filled with data. However, when we encounter an exception during the param 
write, we release the write lock in RpcClientImpl and do not close the 
connection; the exception is handled at AbstractRpcClient.callBlockingMethod 
and the call is retried. The next client request to the same RS then writes to 
the channel, but the server interprets it as part of the previous request and 
errors out during proto conversion when processing the request, since it's 
considered malformed (in the worst case this might be misinterpreted as wrong 
data?). The remaining data of the current request is then read (the current 
request's size > the previous request's allocated, partially filled 
ByteBuffer) and is misinterpreted as the size of a new request; in my case this 
was in GBs. All the client requests time out since this ByteBuffer is never 
completely filled. We should close the connection for any Throwable and not 
just IOException.
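
The desync can be reproduced in miniature with plain java.io streams. This is 
a toy model under stated assumptions: a 4-byte length prefix followed by the 
body, with arbitrary sizes and filler bytes. FramingDesyncDemo is a made-up 
name and none of this is HBase RPC code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Toy model of length-prefixed framing; sizes and filler bytes are arbitrary.
final class FramingDesyncDemo {
  /** Returns the bogus "length" the reader sees after a half-written request. */
  static int misreadLength() {
    try {
      ByteArrayOutputStream channel = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(channel);

      // Request 1: the length prefix goes out, then the writer fails before
      // any body bytes are written. The connection is left open.
      out.writeInt(8);

      // Request 2: the retry writes a complete frame on the same connection.
      byte[] body = new byte[16];
      Arrays.fill(body, (byte) 0x7f);
      out.writeInt(body.length);
      out.write(body);

      DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(channel.toByteArray()));

      // The reader trusts request 1's prefix, so its "body" swallows request
      // 2's length prefix plus the first four bytes of request 2's body.
      byte[] first = new byte[in.readInt()];
      in.readFully(first);

      // The next four body bytes are now read as if they were a length.
      return in.readInt();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("reader now expects a request of " + misreadLength() + " bytes");
  }
}
```

In this model the four 0x7f filler bytes are read as a length of roughly 2 GB, 
the same kind of oversized allocation described above.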





[jira] [Updated] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup

2017-10-27 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-19094:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.0-alpha-4
   1.5.0
   1.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

> NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
> -
>
> Key: HBASE-19094
> URL: https://issues.apache.org/jira/browse/HBASE-19094
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Fix For: 3.0.0, 1.4.0, 1.5.0, 2.0.0-alpha-4
>
> Attachments: HBASE-19094.branch-1.001.patch, 
> HBASE-19094.master.001.patch, HBASE-19094.master.001.patch
>
>
> {noformat}
> rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker - Caught exception while verifying group region
> java.lang.NullPointerException
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1638)
>     at org.apache.hadoop.hbase.client.ConnectionUtils$2.getClient(ConnectionUtils.java:167)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker$1.visit(RSGroupInfoManagerImpl.java:646)
>     at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:638)
>     at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:159)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:661)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:582)
> {noformat}





[jira] [Commented] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup

2017-10-27 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1674#comment-1674
 ] 

Abhishek Singh Chouhan commented on HBASE-19094:


Pushed to branch-1.4+. Thanks [~vik.karma] [~yuzhih...@gmail.com]!!

> NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
> -
>
> Key: HBASE-19094
> URL: https://issues.apache.org/jira/browse/HBASE-19094
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-19094.branch-1.001.patch, 
> HBASE-19094.master.001.patch, HBASE-19094.master.001.patch
>
>
> {noformat}
> rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker - Caught exception while verifying group region
> java.lang.NullPointerException
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1638)
>     at org.apache.hadoop.hbase.client.ConnectionUtils$2.getClient(ConnectionUtils.java:167)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker$1.visit(RSGroupInfoManagerImpl.java:646)
>     at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:638)
>     at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:159)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:661)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:582)
> {noformat}





[jira] [Updated] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup

2017-10-27 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-19094:
---
Attachment: HBASE-19094.master.001.patch

Hadoop QA didn't pick up the master patch; let me try again.

> NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
> -
>
> Key: HBASE-19094
> URL: https://issues.apache.org/jira/browse/HBASE-19094
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-19094.branch-1.001.patch, 
> HBASE-19094.master.001.patch, HBASE-19094.master.001.patch
>
>
> {noformat}
> rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker - Caught exception while verifying group region
> java.lang.NullPointerException
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1638)
>     at org.apache.hadoop.hbase.client.ConnectionUtils$2.getClient(ConnectionUtils.java:167)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker$1.visit(RSGroupInfoManagerImpl.java:646)
>     at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:638)
>     at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:159)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:661)
>     at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:582)
> {noformat}





[jira] [Commented] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup

2017-10-26 Thread Abhishek Singh Chouhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220615#comment-16220615
 ] 

Abhishek Singh Chouhan commented on HBASE-19094:


[~tedyu] Observed this on a cluster in the master logs, not in a UT. It's coming 
from a meta scan that uses a custom result visitor for RSGroups; the 
exception is logged and swallowed, and the scan is ultimately retried, which succeeded.
Since it's very specific code that parses the server name and then uses it to 
get a BlockingInterface, I don't think it'd be worth adding a UT for this 
catch. Looks to be a missed null check.
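
To illustrate the kind of null check described above, here is a minimal, self-contained sketch. It is not the attached patch and does not use the HBase API: the class NullSafeVisitorSketch, the HostPort type, and the "host,port" cell format are all hypothetical stand-ins, chosen only to show a visitor that skips rows whose server name cannot be parsed instead of dereferencing null.

```java
import java.util.ArrayList;
import java.util.List;

public class NullSafeVisitorSketch {

    /** Hypothetical stand-in for a parsed server name (not HBase's ServerName). */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /**
     * Parses an assumed "host,port" cell value. Returns null when the cell is
     * missing or malformed, e.g. a row with no server recorded yet.
     */
    static HostPort parseServer(String cell) {
        if (cell == null || cell.isEmpty()) {
            return null;
        }
        String[] parts = cell.split(",");
        if (parts.length != 2) {
            return null;
        }
        try {
            return new HostPort(parts[0], Integer.parseInt(parts[1]));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Visits each row and collects the servers that could be resolved. */
    static List<String> visit(List<String> serverCells) {
        List<String> resolved = new ArrayList<>();
        for (String cell : serverCells) {
            HostPort sn = parseServer(cell);
            if (sn == null) {
                // The guard: without it, sn.host below would throw a
                // NullPointerException for rows that carry no server.
                continue;
            }
            resolved.add(sn.host + ":" + sn.port);
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> cells = new ArrayList<>();
        cells.add("rs1.example.com,16020");
        cells.add(null); // row without a server
        cells.add("rs2.example.com,16020");
        System.out.println(visit(cells));
    }
}
```

With the guard in place, a row without a server is skipped rather than surfacing as an exception that the caller has to log, swallow, and retry.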

> NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
> -
>
> Key: HBASE-19094
> URL: https://issues.apache.org/jira/browse/HBASE-19094
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-19094.branch-1.001.patch, 
> HBASE-19094.master.001.patch
>
>
> {noformat}
> rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker - Caught exception while 
> verifying group region
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1638)
> at 
> org.apache.hadoop.hbase.client.ConnectionUtils$2.getClient(ConnectionUtils.java:167)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker$1.visit(RSGroupInfoManagerImpl.java:646)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:638)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:159)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:661)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:582)
> {noformat}





[jira] [Updated] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup

2017-10-26 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-19094:
---
Attachment: HBASE-19094.branch-1.001.patch

> NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
> -
>
> Key: HBASE-19094
> URL: https://issues.apache.org/jira/browse/HBASE-19094
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-19094.branch-1.001.patch, 
> HBASE-19094.master.001.patch
>
>
> {noformat}
> rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker - Caught exception while 
> verifying group region
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1638)
> at 
> org.apache.hadoop.hbase.client.ConnectionUtils$2.getClient(ConnectionUtils.java:167)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker$1.visit(RSGroupInfoManagerImpl.java:646)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:638)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:159)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:661)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:582)
> {noformat}





[jira] [Updated] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup

2017-10-26 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-19094:
---
Status: Patch Available  (was: Open)

> NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
> -
>
> Key: HBASE-19094
> URL: https://issues.apache.org/jira/browse/HBASE-19094
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-19094.master.001.patch
>
>
> {noformat}
> rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker - Caught exception while 
> verifying group region
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1638)
> at 
> org.apache.hadoop.hbase.client.ConnectionUtils$2.getClient(ConnectionUtils.java:167)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker$1.visit(RSGroupInfoManagerImpl.java:646)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:638)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:159)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:661)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:582)
> {noformat}





[jira] [Updated] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup

2017-10-26 Thread Abhishek Singh Chouhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan updated HBASE-19094:
---
Attachment: HBASE-19094.master.001.patch

> NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
> -
>
> Key: HBASE-19094
> URL: https://issues.apache.org/jira/browse/HBASE-19094
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Abhishek Singh Chouhan
>Assignee: Abhishek Singh Chouhan
>Priority: Minor
> Attachments: HBASE-19094.master.001.patch
>
>
> {noformat}
> rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker - Caught exception while 
> verifying group region
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1638)
> at 
> org.apache.hadoop.hbase.client.ConnectionUtils$2.getClient(ConnectionUtils.java:167)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker$1.visit(RSGroupInfoManagerImpl.java:646)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:638)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:159)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:661)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:582)
> {noformat}





[jira] [Created] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup

2017-10-26 Thread Abhishek Singh Chouhan (JIRA)
Abhishek Singh Chouhan created HBASE-19094:
--

 Summary: NPE in RSGroupStartupWorker.waitForGroupTableOnline 
during master startup
 Key: HBASE-19094
 URL: https://issues.apache.org/jira/browse/HBASE-19094
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan
Priority: Minor


{noformat}
rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker - Caught exception while 
verifying group region
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1638)
at 
org.apache.hadoop.hbase.client.ConnectionUtils$2.getClient(ConnectionUtils.java:167)
at 
org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker$1.visit(RSGroupInfoManagerImpl.java:646)
at 
org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:638)
at 
org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:159)
at 
org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:661)
at 
org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:582)
{noformat}




