[jira] [Resolved] (HBASE-24618) Backport HBASE-21204 (NPE when scan raw DELETE_FAMILY_VERSION and codec is not set) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-24618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan resolved HBASE-24618. Resolution: Fixed > Backport HBASE-21204 (NPE when scan raw DELETE_FAMILY_VERSION and codec is > not set) to branch-1 > --- > > Key: HBASE-24618 > URL: https://issues.apache.org/jira/browse/HBASE-24618 > Project: HBase > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.7.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24618) Backport HBASE-21204 (NPE when scan raw DELETE_FAMILY_VERSION and codec is not set) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-24618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-24618: --- Hadoop Flags: Reviewed Affects Version/s: 1.6.0 Issue Type: Bug (was: Improvement) > Backport HBASE-21204 (NPE when scan raw DELETE_FAMILY_VERSION and codec is > not set) to branch-1 > --- > > Key: HBASE-24618 > URL: https://issues.apache.org/jira/browse/HBASE-24618 > Project: HBase > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.7.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24618) Backport HBASE-21204 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-24618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-24618: --- Fix Version/s: 1.7.0 > Backport HBASE-21204 to branch-1 > > > Key: HBASE-24618 > URL: https://issues.apache.org/jira/browse/HBASE-24618 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.7.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24618) Backport HBASE-21204 to branch-1
Abhishek Singh Chouhan created HBASE-24618: -- Summary: Backport HBASE-21204 to branch-1 Key: HBASE-24618 URL: https://issues.apache.org/jira/browse/HBASE-24618 Project: HBase Issue Type: Improvement Reporter: Abhishek Singh Chouhan Assignee: Abhishek Singh Chouhan -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24018) Access check for getTableDescriptors is too restrictive
[ https://issues.apache.org/jira/browse/HBASE-24018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062122#comment-17062122 ] Abhishek Singh Chouhan commented on HBASE-24018: [~apurtell] [~larsh] Thoughts? > Access check for getTableDescriptors is too restrictive > --- > > Key: HBASE-24018 > URL: https://issues.apache.org/jira/browse/HBASE-24018 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Priority: Major > > Currently getTableDescriptor requires a user to have Admin or Create > permissions. A client might need to get table descriptors to act accordingly, > e.g. based on an attribute that is set or a coprocessor (CP) that is loaded. It should > not be necessary for the client to have Create or Admin privileges just to read > the descriptor; shouldn't Execute and/or Read permission be sufficient? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24018) Access check for getTableDescriptors is too restrictive
Abhishek Singh Chouhan created HBASE-24018: -- Summary: Access check for getTableDescriptors is too restrictive Key: HBASE-24018 URL: https://issues.apache.org/jira/browse/HBASE-24018 Project: HBase Issue Type: Improvement Reporter: Abhishek Singh Chouhan Currently getTableDescriptor requires a user to have Admin or Create permissions. A client might need to get table descriptors to act accordingly, e.g. based on an attribute that is set or a coprocessor (CP) that is loaded. It should not be necessary for the client to have Create or Admin privileges just to read the descriptor; shouldn't Execute and/or Read permission be sufficient? -- This message was sent by Atlassian Jira (v8.3.4#803005)
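The access rule questioned in HBASE-24018 above can be illustrated with a small sketch. This is not HBase's AccessController code: the `Action` enum, the two rule sets, and the `allowed` helper are hypothetical names introduced only to contrast the check as described (Admin or Create) with the relaxed check the report asks about (also accepting Read or Execute).

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** Illustrative only: contrasts the described check with the proposed, relaxed one. */
final class DescriptorAccessSketch {
  /** Hypothetical permission actions, named after those mentioned in the report. */
  enum Action { READ, WRITE, EXEC, CREATE, ADMIN }

  /** Rule as described in the report: Admin or Create is required. */
  static final Set<Action> CURRENT_RULE = EnumSet.of(Action.ADMIN, Action.CREATE);

  /** Rule the report asks about: Read or Execute would also be enough. */
  static final Set<Action> PROPOSED_RULE =
      EnumSet.of(Action.ADMIN, Action.CREATE, Action.READ, Action.EXEC);

  /** Returns true if the user holds at least one of the accepted actions. */
  static boolean allowed(Set<Action> granted, Set<Action> accepted) {
    return !Collections.disjoint(granted, accepted);
  }

  public static void main(String[] args) {
    Set<Action> readOnlyClient = EnumSet.of(Action.READ);
    System.out.println("current rule:  " + allowed(readOnlyClient, CURRENT_RULE));
    System.out.println("proposed rule: " + allowed(readOnlyClient, PROPOSED_RULE));
  }
}
```

Under these assumptions a read-only client is rejected by the first rule set and accepted by the second, which is the behaviour change the issue raises for discussion.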
[jira] [Resolved] (HBASE-23825) Increment proto conversion is broken
[ https://issues.apache.org/jira/browse/HBASE-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan resolved HBASE-23825. Hadoop Flags: Reviewed Resolution: Fixed > Increment proto conversion is broken > > > Key: HBASE-23825 > URL: https://issues.apache.org/jira/browse/HBASE-23825 > Project: HBase > Issue Type: Bug > Components: Increment >Affects Versions: 1.4.0, 1.2.6, 1.3.2, 1.4.1, 1.5.0, 1.1.11, 1.3.3, 1.4.2, > 1.4.3, 1.4.4, 1.4.5, 1.3.2.1, 1.4.6, 1.4.8, 1.4.7, 1.4.9, 1.4.10, 1.3.4, > 1.3.5, 1.3.6, 1.4.11, 1.4.12 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Blocker > Fix For: 1.6.0, 1.3.7, 1.4.13 > > > While converting the request back to an Increment using ProtobufUtil.toIncrement, > we incorrectly apply the optimization that avoids copying the byte > array (HBaseZeroCopyByteString#zeroCopyGetBytes) to a BoundedByteString. The > optimization was only meant for LiteralByteString, where it is safe to use the > backing byte array; however, it also ends up being applied to BoundedByteString, > which is a subclass of LiteralByteString. This essentially breaks increments, since > we end up creating wrong cells on the server side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23825) Increment proto conversion is broken
[ https://issues.apache.org/jira/browse/HBASE-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034960#comment-17034960 ] Abhishek Singh Chouhan commented on HBASE-23825: I've pushed to branch-1, 1.4, 1.3. Thanks for the reviews [~apurtell] [~anoop.hbase] [~sakthi] > Increment proto conversion is broken > > > Key: HBASE-23825 > URL: https://issues.apache.org/jira/browse/HBASE-23825 > Project: HBase > Issue Type: Bug > Components: Increment >Affects Versions: 1.4.0, 1.2.6, 1.3.2, 1.4.1, 1.5.0, 1.1.11, 1.3.3, 1.4.2, > 1.4.3, 1.4.4, 1.4.5, 1.3.2.1, 1.4.6, 1.4.8, 1.4.7, 1.4.9, 1.4.10, 1.3.4, > 1.3.5, 1.3.6, 1.4.11, 1.4.12 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Blocker > Fix For: 1.6.0, 1.3.7, 1.4.13 > > > While converting the request back to an Increment using ProtobufUtil.toIncrement, > we incorrectly apply the optimization that avoids copying the byte > array (HBaseZeroCopyByteString#zeroCopyGetBytes) to a BoundedByteString. The > optimization was only meant for LiteralByteString, where it is safe to use the > backing byte array; however, it also ends up being applied to BoundedByteString, > which is a subclass of LiteralByteString. This essentially breaks increments, since > we end up creating wrong cells on the server side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-23825) Increment proto conversion is broken
[ https://issues.apache.org/jira/browse/HBASE-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033988#comment-17033988 ] Abhishek Singh Chouhan edited comment on HBASE-23825 at 2/10/20 10:16 PM: -- This is not a problem in master and 2.x since we reverted HBASE-18026 from those branches. FYI [~andrew.purt...@gmail.com] was (Author: abhishek.chouhan): This is not a problem in master and 2.x since we reverted HBASE-18026. FYI [~andrew.purt...@gmail.com] > Increment proto conversion is broken > > > Key: HBASE-23825 > URL: https://issues.apache.org/jira/browse/HBASE-23825 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0, 1.3.6, 1.4.12 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > > While converting the request back to an Increment using ProtobufUtil.toIncrement, > we incorrectly apply the optimization that avoids copying the byte > array (HBaseZeroCopyByteString#zeroCopyGetBytes) to a BoundedByteString. The > optimization was only meant for LiteralByteString, where it is safe to use the > backing byte array; however, it also ends up being applied to BoundedByteString, > which is a subclass of LiteralByteString. This essentially breaks increments, since > we end up creating wrong cells on the server side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23825) Increment proto conversion is broken
[ https://issues.apache.org/jira/browse/HBASE-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033988#comment-17033988 ] Abhishek Singh Chouhan commented on HBASE-23825: This is not a problem in master and 2.x since we reverted HBASE-18026. FYI [~andrew.purt...@gmail.com] > Increment proto conversion is broken > > > Key: HBASE-23825 > URL: https://issues.apache.org/jira/browse/HBASE-23825 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0, 1.3.6, 1.4.12 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > > While converting the request back to an Increment using ProtobufUtil.toIncrement, > we incorrectly apply the optimization that avoids copying the byte > array (HBaseZeroCopyByteString#zeroCopyGetBytes) to a BoundedByteString. The > optimization was only meant for LiteralByteString, where it is safe to use the > backing byte array; however, it also ends up being applied to BoundedByteString, > which is a subclass of LiteralByteString. This essentially breaks increments, since > we end up creating wrong cells on the server side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23825) Increment proto conversion is broken
Abhishek Singh Chouhan created HBASE-23825: -- Summary: Increment proto conversion is broken Key: HBASE-23825 URL: https://issues.apache.org/jira/browse/HBASE-23825 Project: HBase Issue Type: Bug Affects Versions: 1.4.12, 1.3.6, 1.5.0 Reporter: Abhishek Singh Chouhan Assignee: Abhishek Singh Chouhan While converting the request back to an Increment using ProtobufUtil.toIncrement, we incorrectly apply the optimization that avoids copying the byte array (HBaseZeroCopyByteString#zeroCopyGetBytes) to a BoundedByteString. The optimization was only meant for LiteralByteString, where it is safe to use the backing byte array; however, it also ends up being applied to BoundedByteString, which is a subclass of LiteralByteString. This essentially breaks increments, since we end up creating wrong cells on the server side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
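The failure mode described in HBASE-23825 above can be sketched without any protobuf classes. The `Slice` type below is a hypothetical stand-in for a byte string that covers only part of a larger backing array; it is not BoundedByteString, and `zeroCopyBytes` is not HBaseZeroCopyByteString#zeroCopyGetBytes. The sketch only assumes that a "bounded" string is an offset/length view over a shared buffer, which is why handing out the backing array directly yields the wrong bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: why a zero-copy shortcut is unsafe on an offset/length view. */
final class BoundedSliceDemo {
  /** Hypothetical view over backing[offset, offset + length). */
  static final class Slice {
    final byte[] backing;
    final int offset;
    final int length;

    Slice(byte[] backing, int offset, int length) {
      this.backing = backing;
      this.offset = offset;
      this.length = length;
    }

    /** Unsafe shortcut: returns the whole backing array and ignores offset/length. */
    byte[] zeroCopyBytes() {
      return backing;
    }

    /** Safe path: copies exactly the bytes this view covers. */
    byte[] toByteArray() {
      return Arrays.copyOfRange(backing, offset, offset + length);
    }
  }

  public static void main(String[] args) {
    // One shared request buffer holding several fields back to back.
    byte[] request = "rowkey|family|qualifier".getBytes(StandardCharsets.UTF_8);
    Slice family = new Slice(request, 7, 6); // view over the "family" field only

    System.out.println(new String(family.zeroCopyBytes(), StandardCharsets.UTF_8));
    System.out.println(new String(family.toByteArray(), StandardCharsets.UTF_8));
  }
}
```

With the shortcut, a consumer that expects just the family bytes receives the entire request buffer, which is analogous to the "wrong cells on the server side" the report mentions.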
[jira] [Commented] (HBASE-18127) Enable state to be passed between the region observer coprocessor hook calls
[ https://issues.apache.org/jira/browse/HBASE-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877362#comment-16877362 ] Abhishek Singh Chouhan commented on HBASE-18127: [~gjacoby] Where I left this: although it is possible to come up with an Executor implementation that takes care of transferring the RpcCall between the caller thread and the thread that takes up further execution, going that route would mean changing all such existing cases and replacing the executor with our own. Places that simply create another thread and call further hooks from there would need a different kind of handling, or would have to be replaced with a custom executor-based mechanism. We also need to consider that further development on the code base would then require that only these custom Executors, which transfer the necessary thread-locals between threads, are used, at least when interacting with CP hooks. I'm not sure how feasible and error-free this would be (missing it would result in the state vanishing between some hooks :)). > Enable state to be passed between the region observer coprocessor hook calls > > > Key: HBASE-18127 > URL: https://issues.apache.org/jira/browse/HBASE-18127 > Project: HBase > Issue Type: New Feature >Reporter: Lars Hofhansl >Assignee: Abhishek Singh Chouhan >Priority: Major > Attachments: HBASE-18127.master.001.patch, > HBASE-18127.master.002.patch, HBASE-18127.master.002.patch, > HBASE-18127.master.003.patch, HBASE-18127.master.004.patch, > HBASE-18127.master.005.patch, HBASE-18127.master.005.patch, > HBASE-18127.master.006.patch > > > Allow regionobserver to optionally skip postPut/postDelete when > postBatchMutate was called. > Right now a RegionObserver can only statically implement one or the other. 
In > scenarios where we need to work sometimes on the single postPut and > postDelete hooks and sometimes on the batchMutate hooks, there is currently > no place to convey this information to the single hooks. I.e. the work has > been done in the batch, skip the single hooks. > There are various solutions: > 1. Allow some state to be passed _per operation_. > 2. Remove the single hooks and always only call batch hooks (with a default > wrapper for the single hooks). > 3. more? > [~apurtell], what we had discussed a few days back. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
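The executor idea discussed in the HBASE-18127 comment above can be sketched as follows. This is not the attached patch: `ContextPropagatingExecutor`, the `CURRENT_CALL` thread-local (a plain string standing in for the RpcCall), and `observeOnWorker` are hypothetical names. The sketch captures the thread-local on the submitting thread and installs it on the worker thread for the duration of the task, and it also shows the risk raised in the comment: a plain executor silently loses the state.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative only: an Executor wrapper that carries a thread-local across threads. */
final class ContextPropagatingExecutor implements Executor {
  /** Hypothetical per-call state; stands in for the RpcCall mentioned in the comment. */
  static final ThreadLocal<String> CURRENT_CALL = new ThreadLocal<>();

  private final Executor delegate;

  ContextPropagatingExecutor(Executor delegate) {
    this.delegate = delegate;
  }

  @Override
  public void execute(Runnable task) {
    final String captured = CURRENT_CALL.get(); // read on the submitting thread
    delegate.execute(() -> {
      String previous = CURRENT_CALL.get();
      CURRENT_CALL.set(captured); // install on the executing thread
      try {
        task.run();
      } finally {
        // Restore whatever the executing thread had before, so state does not leak.
        if (previous == null) {
          CURRENT_CALL.remove();
        } else {
          CURRENT_CALL.set(previous);
        }
      }
    });
  }

  /** Runs a task on a worker thread and returns the call state that task observed. */
  static String observeOnWorker(boolean propagate) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Executor executor = propagate ? new ContextPropagatingExecutor(pool) : pool;
      AtomicReference<String> seen = new AtomicReference<>();
      CountDownLatch done = new CountDownLatch(1);
      CURRENT_CALL.set("call-42");
      executor.execute(() -> {
        seen.set(CURRENT_CALL.get());
        done.countDown();
      });
      done.await();
      return seen.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      CURRENT_CALL.remove();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("with propagation:    " + observeOnWorker(true));
    System.out.println("without propagation: " + observeOnWorker(false));
  }
}
```

The second call illustrates the maintenance concern from the comment: any code path that uses an ordinary executor or a raw thread would see no state at all, with no error to signal it.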
[jira] [Commented] (HBASE-22617) Recovered WAL directories not getting cleaned up
[ https://issues.apache.org/jira/browse/HBASE-22617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871580#comment-16871580 ] Abhishek Singh Chouhan commented on HBASE-22617: Was out for the weekend. Thanks for taking this up [~Apache9] [~apurtell]. > Recovered WAL directories not getting cleaned up > > > Key: HBASE-22617 > URL: https://issues.apache.org/jira/browse/HBASE-22617 > Project: HBase > Issue Type: Bug > Components: wal >Affects Versions: 1.5.0 >Reporter: Abhishek Singh Chouhan >Assignee: Duo Zhang >Priority: Blocker > Fix For: 3.0.0, 1.5.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6, 1.4.11 > > > When the recovered edits directory was colocated with hbase.wal.dir, > BASE_NAMESPACE_DIR was left out of the path. As a result, recovered edits are put in a > separate directory rather than the default region directory, even if > hbase.wal.dir is not overridden. E.g. if data is stored in > /hbase/data/namespace/table1, recovered edits are put in > /hbase/namespace/table1. This also breaks the regular cleaner chores, which > never operate on this new directory, so these directories are never > deleted, even for split parents or dropped tables. We should change the > default back to include the base namespace directory in the path. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22617) Recovered WAL directories not getting cleaned up
Abhishek Singh Chouhan created HBASE-22617: -- Summary: Recovered WAL directories not getting cleaned up Key: HBASE-22617 URL: https://issues.apache.org/jira/browse/HBASE-22617 Project: HBase Issue Type: Task Affects Versions: 1.5.0 Reporter: Abhishek Singh Chouhan Assignee: Abhishek Singh Chouhan When the recovered edits directory was colocated with hbase.wal.dir, BASE_NAMESPACE_DIR was left out of the path. As a result, recovered edits are put in a separate directory rather than the default region directory, even if hbase.wal.dir is not overridden. E.g. if data is stored in /hbase/data/namespace/table1, recovered edits are put in /hbase/namespace/table1. This also breaks the regular cleaner chores, which never operate on this new directory, so these directories are never deleted, even for split parents or dropped tables. We should change the default back to include the base namespace directory in the path. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22617) Recovered WAL directories not getting cleaned up
[ https://issues.apache.org/jira/browse/HBASE-22617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22617: --- Issue Type: Bug (was: Task) > Recovered WAL directories not getting cleaned up > > > Key: HBASE-22617 > URL: https://issues.apache.org/jira/browse/HBASE-22617 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > > When the recovered edits directory was colocated with hbase.wal.dir, > BASE_NAMESPACE_DIR was left out of the path. As a result, recovered edits are put in a > separate directory rather than the default region directory, even if > hbase.wal.dir is not overridden. E.g. if data is stored in > /hbase/data/namespace/table1, recovered edits are put in > /hbase/namespace/table1. This also breaks the regular cleaner chores, which > never operate on this new directory, so these directories are never > deleted, even for split parents or dropped tables. We should change the > default back to include the base namespace directory in the path. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
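The path mismatch reported in HBASE-22617 above comes down to one missing path component. The sketch below is not the HBase path-resolution code; the class and method names are hypothetical, and the value "data" for the base namespace directory is taken from the example paths in the report. It only shows how omitting that component moves recovered edits out of the tree that the cleaner chores are said to operate on.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: effect of dropping the base namespace directory from a path. */
final class RecoveredEditsPathDemo {
  /** Assumed value, matching the "/hbase/data/..." example in the report. */
  static final String BASE_NAMESPACE_DIR = "data";

  /** Expected layout: <root>/data/<namespace>/<table>. */
  static Path withBaseNamespaceDir(Path root, String namespace, String table) {
    return root.resolve(BASE_NAMESPACE_DIR).resolve(namespace).resolve(table);
  }

  /** Layout described in the bug: <root>/<namespace>/<table>. */
  static Path withoutBaseNamespaceDir(Path root, String namespace, String table) {
    return root.resolve(namespace).resolve(table);
  }

  public static void main(String[] args) {
    Path root = Paths.get("/hbase");
    Path expected = withBaseNamespaceDir(root, "namespace", "table1");
    Path actual = withoutBaseNamespaceDir(root, "namespace", "table1");

    System.out.println("region data dir:       " + expected);
    System.out.println("recovered edits (bug): " + actual);
    // A chore that only walks <root>/data never visits the second path.
    System.out.println("under data dir? " + actual.startsWith(root.resolve(BASE_NAMESPACE_DIR)));
  }
}
```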
[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22330: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0, 1.3.5, 1.4.11 > > Attachments: HBASE-22330-addendum.branch-1.patch, > HBASE-22330.branch-1.001.patch, HBASE-22330.branch-1.002.patch, > HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. 
> We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836764#comment-16836764 ] Abhishek Singh Chouhan commented on HBASE-22330: Pushed to relevant branches. Thanks [~xucang] [~apurtell] > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0, 1.3.5, 1.4.11 > > Attachments: HBASE-22330-addendum.branch-1.patch, > HBASE-22330.branch-1.001.patch, HBASE-22330.branch-1.002.patch, > HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. 
> We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22330: --- Attachment: HBASE-22330-addendum.branch-1.patch > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0, 1.3.5, 1.4.11 > > Attachments: HBASE-22330-addendum.branch-1.patch, > HBASE-22330.branch-1.001.patch, HBASE-22330.branch-1.002.patch, > HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. 
> We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22330: --- Hadoop Flags: (was: Reviewed) Status: Patch Available (was: Reopened) > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.3.4, 1.4.9, 1.5.0 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0, 1.3.5, 1.4.11 > > Attachments: HBASE-22330-addendum.branch-1.patch, > HBASE-22330.branch-1.001.patch, HBASE-22330.branch-1.002.patch, > HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. 
> We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan reopened HBASE-22330: Hardening the test > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0, 1.3.5, 1.4.11 > > Attachments: HBASE-22330.branch-1.001.patch, > HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. > We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836633#comment-16836633 ] Abhishek Singh Chouhan commented on HBASE-22330: 1 in 10 runs fails for me locally. This looks to be a test-only issue caused by differences in what testUtil.waitTableAvailable(..) does between master and branch-1. Let me put up an addendum for the test that removes the flakiness. > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0, 1.3.5, 1.4.11 > > Attachments: HBASE-22330.branch-1.001.patch, > HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. 
> Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. > We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835966#comment-16835966 ] Abhishek Singh Chouhan commented on HBASE-22330: Pushed to branch-1.3/1.4/1. Thanks a ton for reviewing [~apurtell] > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0, 1.3.5, 1.4.11 > > Attachments: HBASE-22330.branch-1.001.patch, > HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. 
> We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22330: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0, 1.3.5, 1.4.11 > > Attachments: HBASE-22330.branch-1.001.patch, > HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. 
> We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place.
[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835836#comment-16835836 ] Abhishek Singh Chouhan commented on HBASE-22330: Tests came up fine for 1.3. Planning to commit later today if no objections. 1.3 patch differs only in the test class due to api differences. [~apurtell] > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0, 1.3.5, 1.4.11 > > Attachments: HBASE-22330.branch-1.001.patch, > HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. 
> We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place.
[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835149#comment-16835149 ] Abhishek Singh Chouhan commented on HBASE-22330: Patch for branch-1 applies to branch-1.4. However 1.3 required slight modification in the test file due to wal api changes. Have added a patch for branch-1.3. > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-22330.branch-1.001.patch, > HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. 
> We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place.
[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22330: --- Attachment: HBASE-22330.branch-1.3.001.patch > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-22330.branch-1.001.patch, > HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. > We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. 
[jira] [Commented] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834296#comment-16834296 ] Abhishek Singh Chouhan commented on HBASE-22330: Thanks for having a look [~apurtell]. Attached v2 that fixes checkstyle issues. Tests passed locally for me, still running them again locally to be doubly sure. > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-22330.branch-1.001.patch, > HBASE-22330.branch-1.002.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. 
> We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place.
[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22330: --- Attachment: HBASE-22330.branch-1.002.patch > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-22330.branch-1.001.patch, > HBASE-22330.branch-1.002.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. > We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22330: --- Status: Patch Available (was: Open) > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.3.4, 1.4.9, 1.5.0 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-22330.branch-1.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. > We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22330: --- Attachment: HBASE-22330.branch-1.001.patch > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-22330.branch-1.001.patch > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. > We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-22330) Backport HBASE-20724 (Sometimes some compacted storefiles are still opened after region failover) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan reassigned HBASE-22330: -- Assignee: Abhishek Singh Chouhan > Backport HBASE-20724 (Sometimes some compacted storefiles are still opened > after region failover) to branch-1 > - > > Key: HBASE-22330 > URL: https://issues.apache.org/jira/browse/HBASE-22330 > Project: HBase > Issue Type: Sub-task > Components: Compaction, regionserver >Affects Versions: 1.5.0, 1.4.9, 1.3.4 >Reporter: Andrew Purtell >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 1.5.0 > > > There appears to be a race condition between close and split which when > combined with a side effect of HBASE-20704, leads to the parent region store > files getting archived and cleared while daughter regions still have > references to those parent region store files. > Here is the timeline of events observed for an affected region: > # RS1 faces ZooKeeper connectivity issue for master node and starts shutting > itself down. As part of this it starts to close the store and clean up the > compacted files (File A) > # Master starts bulk assigning regions and assign parent region to RS2 > # Region opens on RS2 and ends up opening compacted store file(s) (suspect > this is due to HBASE-20724) > # Now split happens and daughter regions open on RS2 and try to run a > compaction as part of post open > # Split request at this point is complete. However now archiving proceeds on > RS1 and ends up archiving the store file that is referenced by the daughter. > Compaction fails due to FileNotFoundException and all subsequent attempts to > open the region will fail until manual resolution. > We think having HBASE-20724 would help in such situations since we won't end > up loading compacted store files in the first place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22274) Cell size limit check on append should consider cell's previous size.
[ https://issues.apache.org/jira/browse/HBASE-22274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825420#comment-16825420 ]

Abhishek Singh Chouhan commented on HBASE-22274:

Lgtm +1. Thanks [~xucang]

> Cell size limit check on append should consider cell's previous size.
> ---------------------------------------------------------------------
>
> Key: HBASE-22274
> URL: https://issues.apache.org/jira/browse/HBASE-22274
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.0.0, 1.3.5
> Reporter: Xu Cang
> Assignee: Xu Cang
> Priority: Minor
> Attachments: HBASE-22274-branch-1.001.patch,
> HBASE-22274-branch-1.002.patch, HBASE-22274-master.001.patch,
> HBASE-22274-master.002.patch, HBASE-22274-master.002.patch,
> HBASE-22274-master.003.patch
>
>
> Now we have a cell size limit check based on this parameter:
> *hbase.server.keyvalue.maxsize*
> One case was missing: appending to a cell only takes the append op's cell size
> into account for this limit check. We should check against the potential
> final cell size after the append.
> It's easy to reproduce this:
>
> Apply this diff
>
> {code:java}
> diff --git a/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java b/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
> index 5a285ef6ba..8633177ebe 100644
> --- a/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
> +++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
> @@ -6455,7 +6455,7
> - t.append(new Append(ROW).addColumn(FAMILY, QUALIFIER, new byte[10 * 1024]));
> + t.append(new Append(ROW).addColumn(FAMILY, QUALIFIER, new byte[2 * 1024]));
> {code}
>
> The fix is to add this check in #reckonDeltas in the HRegion class, where we
> have already got the appended cell's size.
> It will throw DoNotRetryIOException if the check fails.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
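The check described in HBASE-22274 can be illustrated roughly as follows. This is only a minimal sketch, not the actual HRegion#reckonDeltas code: the class name, method names, and the 10 KB limit are hypothetical, and a plain IOException stands in for the DoNotRetryIOException mentioned in the issue. The point is that the limit is compared against the existing cell length plus the appended length, not the appended length alone.

```java
import java.io.IOException;

public class AppendSizeCheckSketch {

    /**
     * Rejects an append if the resulting cell would exceed the limit.
     * All names here are hypothetical; the real check lives in HBase's HRegion.
     */
    static void checkFinalCellSize(int existingLen, int appendLen, int maxCellSize)
            throws IOException {
        // Size the cell would have after the append, widened to long to avoid overflow.
        long finalLen = (long) existingLen + appendLen;
        // Assumption: a non-positive limit means the check is disabled.
        if (maxCellSize > 0 && finalLen > maxCellSize) {
            throw new IOException("Cell of size " + finalLen
                + " would exceed the limit of " + maxCellSize + " bytes");
        }
    }

    /** Convenience wrapper that reports acceptance as a boolean. */
    static boolean accepts(int existingLen, int appendLen, int maxCellSize) {
        try {
            checkFinalCellSize(existingLen, appendLen, maxCellSize);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int limit = 10 * 1024; // illustrative limit, not a claim about HBase defaults
        // A 2 KB append to an empty cell fits under the limit.
        System.out.println(accepts(0, 2 * 1024, limit));
        // The same 2 KB append to a 9 KB cell yields 11 KB and is rejected.
        System.out.println(accepts(9 * 1024, 2 * 1024, limit));
    }
}
```

Checking only `appendLen` against the limit would accept both calls in `main`, which is the gap the issue describes.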
[jira] [Commented] (HBASE-22067) Fix log line in StochasticLoadBalancer when balancer is an ill-fit for cluster size
[ https://issues.apache.org/jira/browse/HBASE-22067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796808#comment-16796808 ] Abhishek Singh Chouhan commented on HBASE-22067: +1 > Fix log line in StochasticLoadBalancer when balancer is an ill-fit for > cluster size > --- > > Key: HBASE-22067 > URL: https://issues.apache.org/jira/browse/HBASE-22067 > Project: HBase > Issue Type: Bug >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Major > Attachments: HBASE-22067.master.001.patch > > > HBASE-21338 Added log lines regarding load balancer warnings. There is a bug > in log that uses wrong parameter. > 'maxRunningTime' is used , should be maxSteps. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan resolved HBASE-22045. Resolution: Fixed Hadoop Flags: Reviewed > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.5.0, 2.0.0, 1.4.9, 2.1.3, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0, 2.1.5, 2.2.1 > > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22045: --- Affects Version/s: (was: 1.3.3) Fix Version/s: (was: 1.3.4) > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.5.0, 2.0.0, 1.4.9, 2.1.3, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0, 2.1.5, 2.2.1 > > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795277#comment-16795277 ] Abhishek Singh Chouhan commented on HBASE-22045: Do we also want this in branch-2.0 or is that EOLing? > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.4.10, 1.3.4, 2.3.0, 2.1.5, 2.2.1 > > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795242#comment-16795242 ] Abhishek Singh Chouhan edited comment on HBASE-22045 at 3/18/19 5:33 PM: - Sorry for breaking the build [~apurtell]. Looks like we don't need this in branch-1.3 since it does not have HBASE-18060/HBASE-9774 which caused the bug. Got a bit mixed up with our light forks. was (Author: abhishek.chouhan): Sorry for breaking the build [~apurtell]. Looks like we don't need this in branch-1.3 since it does not have HBASE-18060/HBASE-9774 which caused the bug. > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.4.10, 1.3.4, 2.3.0, 2.1.5, 2.2.1 > > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795242#comment-16795242 ] Abhishek Singh Chouhan commented on HBASE-22045: Sorry for breaking the build [~apurtell]. Looks like we don't need this in branch-1.3 since it does not have HBASE-18060/HBASE-9774 which caused the bug. > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.4.10, 1.3.4, 2.3.0, 2.1.5, 2.2.1 > > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22045: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 3.0.0, 1.4.10, 1.3.4, 2.3.0, 1.5.1, 2.1.5, 2.2.1 > > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22045: --- Fix Version/s: 2.2.1 2.1.5 1.5.1 2.3.0 1.3.4 1.4.10 3.0.0 > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 3.0.0, 1.4.10, 1.3.4, 2.3.0, 1.5.1, 2.1.5, 2.2.1 > > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791176#comment-16791176 ] Abhishek Singh Chouhan commented on HBASE-22045: Thanks for the review [~apurtell]! Pushed to 1.3+ > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Fix For: 3.0.0, 1.4.10, 1.3.4, 2.3.0, 1.5.1, 2.1.5, 2.2.1 > > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22045) Mutable range histogram reports incorrect outliers
Abhishek Singh Chouhan created HBASE-22045:
----------------------------------------------

             Summary: Mutable range histogram reports incorrect outliers
                 Key: HBASE-22045
                 URL: https://issues.apache.org/jira/browse/HBASE-22045
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.1.3, 1.4.9, 2.0.0, 1.3.3, 1.5.0, 3.0.0, 2.2.1
            Reporter: Abhishek Singh Chouhan
            Assignee: Abhishek Singh Chouhan

During the snapshot, MutableRangeHistogram calculates the outliers (e.g. mutate_TimeRange_60-inf) and adds the counter with an incorrect calculation: it uses the overall count of events rather than the number of events in the snapshot.

{code:java}
long val = histogram.getCount();
if (val - cumNum > 0) {
  metricsRecordBuilder.addCounter(
      Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] + "-inf", desc),
      val - cumNum);
}
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
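To illustrate the miscount described in HBASE-22045, here is a minimal, self-contained sketch. It is not the HBase implementation: the class `RangeOutlierSketch`, the method `rangeCounters`, and all sample numbers are hypothetical and chosen only to show why subtracting the in-range total from the overall count inflates the `-inf` bucket, while subtracting it from the snapshot's own count does not.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the range-counter logic; not the HBase class. */
final class RangeOutlierSketch {
  private RangeOutlierSketch() {}

  /**
   * Builds one counter per range plus an "-inf" counter for outliers.
   *
   * @param prefix         metric name prefix (assumed naming, for illustration)
   * @param ranges         ascending upper bounds of the ranges
   * @param countsPerRange number of snapshot events that fell into each range
   * @param totalCount     the total that the in-range sum is subtracted from
   */
  static Map<String, Long> rangeCounters(
      String prefix, long[] ranges, long[] countsPerRange, long totalCount) {
    Map<String, Long> counters = new LinkedHashMap<>();
    long cumNum = 0;
    long lower = 0;
    for (int i = 0; i < ranges.length; i++) {
      if (countsPerRange[i] > 0) {
        counters.put(prefix + "_" + lower + "-" + ranges[i], countsPerRange[i]);
      }
      cumNum += countsPerRange[i];
      lower = ranges[i];
    }
    // Outliers are whatever is left after all in-range events are accounted for.
    long outliers = totalCount - cumNum;
    if (outliers > 0) {
      counters.put(prefix + "_" + ranges[ranges.length - 1] + "-inf", outliers);
    }
    return counters;
  }

  public static void main(String[] args) {
    long[] ranges = {1, 3, 10, 30, 60};
    long[] inSnapshot = {4, 2, 1, 1, 0}; // 8 snapshot events fall inside the ranges
    long snapshotCount = 10;             // the snapshot holds 10 events, so 2 are outliers
    long overallCount = 1000;            // count since startup (the value the bug used)

    // Using the snapshot count reports 2 outliers.
    System.out.println(rangeCounters("mutate_TimeRange", ranges, inSnapshot, snapshotCount));
    // Using the overall count reports 992 outliers, which is the reported symptom.
    System.out.println(rangeCounters("mutate_TimeRange", ranges, inSnapshot, overallCount));
  }
}
```

The only difference between the two calls is the total passed in, which mirrors the one-line nature of the problem in the quoted snippet (`histogram.getCount()` versus a snapshot-scoped count).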
[jira] [Updated] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22045: --- Attachment: HBASE-22045.master.001.patch > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-22045: --- Status: Patch Available (was: Open) > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.3, 1.4.9, 2.0.0, 1.3.3, 1.5.0, 3.0.0, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22045) Mutable range histogram reports incorrect outliers
[ https://issues.apache.org/jira/browse/HBASE-22045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790850#comment-16790850 ] Abhishek Singh Chouhan commented on HBASE-22045: [~apurtell] [~lhofhansl] > Mutable range histogram reports incorrect outliers > -- > > Key: HBASE-22045 > URL: https://issues.apache.org/jira/browse/HBASE-22045 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 1.5.0, 1.3.3, 2.0.0, 1.4.9, 2.1.3, 2.2.1 >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Major > Attachments: HBASE-22045.master.001.patch > > > MutableRangeHistogram during the snapshot calculates the outliers (eg. > mutate_TimeRange_60-inf) and adds the counter with incorrect calculation > by using the overall count of event and not the number of events in the > snapshot. > {code:java} > long val = histogram.getCount(); > if (val - cumNum > 0) { > metricsRecordBuilder.addCounter( > Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - > 1] + "-inf", desc), > val - cumNum); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-12133) Add FastLongHistogram for metric computation
[ https://issues.apache.org/jira/browse/HBASE-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-12133: --- Description: FastLongHistogram is a thread-safe class that estimate distribution of data and computes the quantiles. It's useful for computing aggregated metrics like P99/P95. (was: _emphasized text_FastLongHistogram is a thread-safe class that estimate distribution of data and computes the quantiles. It's useful for computing aggregated metrics like P99/P95. ) > Add FastLongHistogram for metric computation > > > Key: HBASE-12133 > URL: https://issues.apache.org/jira/browse/HBASE-12133 > Project: HBase > Issue Type: New Feature > Components: metrics >Affects Versions: 0.98.8 >Reporter: Yi Deng >Assignee: Yi Deng >Priority: Minor > Labels: histogram, metrics > Fix For: 0.99.1, 1.3.0 > > Attachments: > 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, > 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, > 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, > 12133.addendum.txt > > > FastLongHistogram is a thread-safe class that estimate distribution of data > and computes the quantiles. It's useful for computing aggregated metrics like > P99/P95. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-12133) Add FastLongHistogram for metric computation
[ https://issues.apache.org/jira/browse/HBASE-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-12133: --- Description: _emphasized text_FastLongHistogram is a thread-safe class that estimate distribution of data and computes the quantiles. It's useful for computing aggregated metrics like P99/P95. was: FastLongHistogram is a thread-safe class that estimate distribution of data and computes the quantiles. It's useful for computing aggregated metrics like P99/P95. > Add FastLongHistogram for metric computation > > > Key: HBASE-12133 > URL: https://issues.apache.org/jira/browse/HBASE-12133 > Project: HBase > Issue Type: New Feature > Components: metrics >Affects Versions: 0.98.8 >Reporter: Yi Deng >Assignee: Yi Deng >Priority: Minor > Labels: histogram, metrics > Fix For: 0.99.1, 1.3.0 > > Attachments: > 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, > 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, > 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, > 12133.addendum.txt > > > _emphasized text_FastLongHistogram is a thread-safe class that estimate > distribution of data and computes the quantiles. It's useful for computing > aggregated metrics like P99/P95. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21680) Port HBASE-20194 (Basic Replication WebUI - Master) and HBASE-20193 (Basic Replication Web UI - Regionserver) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748299#comment-16748299 ] Abhishek Singh Chouhan commented on HBASE-21680: Sounds good. Thanks! > Port HBASE-20194 (Basic Replication WebUI - Master) and HBASE-20193 (Basic > Replication Web UI - Regionserver) to branch-1 > - > > Key: HBASE-21680 > URL: https://issues.apache.org/jira/browse/HBASE-21680 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-21680-branch-1.patch, HBASE-21680-branch-1.patch, > HBASE-21680-branch-1.patch, HBASE-21680-branch-1.patch, Screen Shot > 2019-01-16 at 3.20.00 PM.png, Screen Shot 2019-01-16 at 3.20.50 PM.png, > Screen Shot 2019-01-16 at 3.21.17 PM.png, Screen Shot 2019-01-17 at 5.25.21 > PM.png > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21680) Port HBASE-20194 (Basic Replication WebUI - Master) and HBASE-20193 (Basic Replication Web UI - Regionserver) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748224#comment-16748224 ] Abhishek Singh Chouhan commented on HBASE-21680: Latest patch LGTM, +1. [~apurtell] Do we also want to fix the NPE issue over at HBASE-21749 with the patch here? (looks like we may hit the same thing here also) > Port HBASE-20194 (Basic Replication WebUI - Master) and HBASE-20193 (Basic > Replication Web UI - Regionserver) to branch-1 > - > > Key: HBASE-21680 > URL: https://issues.apache.org/jira/browse/HBASE-21680 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-21680-branch-1.patch, HBASE-21680-branch-1.patch, > HBASE-21680-branch-1.patch, HBASE-21680-branch-1.patch, Screen Shot > 2019-01-16 at 3.20.00 PM.png, Screen Shot 2019-01-16 at 3.20.50 PM.png, > Screen Shot 2019-01-16 at 3.21.17 PM.png, Screen Shot 2019-01-17 at 5.25.21 > PM.png > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21616) Port HBASE-21034 (Add new throttle type: read/write capacity unit) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744502#comment-16744502 ] Abhishek Singh Chouhan commented on HBASE-21616: LGTM +1. Checkstyle warnings of the previous QA look related. Might be good to have a look. > Port HBASE-21034 (Add new throttle type: read/write capacity unit) to branch-1 > -- > > Key: HBASE-21616 > URL: https://issues.apache.org/jira/browse/HBASE-21616 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-21616-branch-1.patch, HBASE-21616-branch-1.patch > > > Port HBASE-21034 (Add new throttle type: read/write capacity unit) to branch-1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538268#comment-16538268 ] Abhishek Singh Chouhan commented on HBASE-20806: Thanks [~apurtell] for originally suggesting the improvement, review and commit!! :) > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.2.7, 1.3.3, 1.4.6, 2.0.2 > > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, > HBASE-20806.branch-2.001.patch, HBASE-20806.master.001.patch, > HBASE-20806.master.002.patch, HBASE-20806.master.003.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536958#comment-16536958 ] Abhishek Singh Chouhan commented on HBASE-20806: Have attached the patch for branch-2 which can be applied to release branches branch-2.x if needed (same as the patch for master but was not being applied cleanly due to a semicolon present here [https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredTask.java#L33] which is missing in master). > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, > HBASE-20806.branch-2.001.patch, HBASE-20806.master.001.patch, > HBASE-20806.master.002.patch, HBASE-20806.master.003.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-20806: --- Attachment: HBASE-20806.branch-2.001.patch > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, > HBASE-20806.branch-2.001.patch, HBASE-20806.master.001.patch, > HBASE-20806.master.002.patch, HBASE-20806.master.003.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536911#comment-16536911 ] Abhishek Singh Chouhan commented on HBASE-20806: Have pushed to master, branch-2, branch-1. [~apurtell] let me know the release branches i should push this into > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, > HBASE-20806.branch-2.001.patch, HBASE-20806.master.001.patch, > HBASE-20806.master.002.patch, HBASE-20806.master.003.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536603#comment-16536603 ] Abhishek Singh Chouhan commented on HBASE-20806: Will commit later today unless objections. > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, > HBASE-20806.master.001.patch, HBASE-20806.master.002.patch, > HBASE-20806.master.003.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-20806: --- Attachment: HBASE-20806.master.003.patch > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, > HBASE-20806.master.001.patch, HBASE-20806.master.002.patch, > HBASE-20806.master.003.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-20806: --- Attachment: HBASE-20806.branch-1.003.patch > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.branch-1.003.patch, > HBASE-20806.master.001.patch, HBASE-20806.master.002.patch, > HBASE-20806.master.003.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533315#comment-16533315 ] Abhishek Singh Chouhan commented on HBASE-20806: Reattaching branch-1 patch. > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.master.001.patch, > HBASE-20806.master.002.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-20806: --- Attachment: (was: HBASE-20806.branch-1.002.patch) > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.master.001.patch, > HBASE-20806.master.002.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-20806: --- Attachment: HBASE-20806.branch-1.002.patch > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.master.001.patch, > HBASE-20806.master.002.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-20806: --- Attachment: HBASE-20806.branch-1.002.patch > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.branch-1.002.patch, HBASE-20806.master.001.patch, > HBASE-20806.master.002.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532891#comment-16532891 ] Abhishek Singh Chouhan commented on HBASE-20806: Fixing checkstyle warning and added missing shutdown in the added test. > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.master.001.patch, HBASE-20806.master.002.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-20806: --- Attachment: HBASE-20806.master.002.patch > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.master.001.patch, HBASE-20806.master.002.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-20806: --- Status: Patch Available (was: Open) Was afk for a few days. Here's a patch for master. > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.master.001.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-20806: --- Attachment: HBASE-20806.master.001.patch > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch, > HBASE-20806.master.001.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527627#comment-16527627 ]

Abhishek Singh Chouhan commented on HBASE-20806:
------------------------------------------------

Added a simple patch that adds journaling functionality (much like the earlier split journal) to MonitoredTask. It is disabled by default and enabled only for flushes and compactions, since monitored tasks are also used in other places such as RPCs. Will add a patch for master too. The output looks something like this:

2018-06-28 21:42:00,959 DEBUG [main] regionserver.HRegion(2129): Flush status journal:
  Acquiring readlock on region at 1530202320737
  Obtaining lock to block concurrent updates at 1530202320738
  Preparing to flush by snapshotting stores in bd201548dcb5ac5a951e54af54618b97 at 1530202320738
  Finished memstore snapshotting testCompactionFailure,,1530202319214.bd201548dcb5ac5a951e54af54618b97., syncing WAL and waiting on mvcc, flushsize=2952768 at 1530202320747
  Flushing stores of testCompactionFailure,,1530202319214.bd201548dcb5ac5a951e54af54618b97. at 1530202320749
  Flushing colfamily11: creating writer at 1530202320755
  Flushing colfamily11: appending metadata at 1530202320908

> Split style journal for flushes and compactions
> -----------------------------------------------
>
>                 Key: HBASE-20806
>                 URL: https://issues.apache.org/jira/browse/HBASE-20806
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Abhishek Singh Chouhan
>            Assignee: Abhishek Singh Chouhan
>            Priority: Minor
>         Attachments: HBASE-20806.branch-1.001.patch
>
>
> In 1.x we have split transaction journal that gives a clear picture of when
> various stages of splits took place. We should have a similar thing for
> flushes and compactions so as to have insights into time spent in various
> stages, which we can use to identify regressions that might creep up.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
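The status journal described in the comment above can be sketched roughly as follows. This is an illustrative sketch only, not the attached patch: the class `StatusJournalSketch` and its method names are hypothetical, and timestamps are passed in explicitly so the example stays deterministic. It records each status change with its time when journaling is enabled and renders the entries in the "<status> at <millis>" form seen in the sample log.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a task that optionally journals its status changes. */
final class StatusJournalSketch {
  private final boolean journalEnabled;
  private final List<String> journal = new ArrayList<>();
  private String status = "";

  /** Journaling is opt-in, mirroring "disabled by default" in the comment. */
  StatusJournalSketch(boolean journalEnabled) {
    this.journalEnabled = journalEnabled;
  }

  /** Updates the current status and, if enabled, appends a journal entry. */
  void setStatus(String newStatus, long nowMs) {
    status = newStatus;
    if (journalEnabled) {
      journal.add(newStatus + " at " + nowMs);
    }
  }

  String getStatus() {
    return status;
  }

  /** Returns one journal entry per line, oldest first; empty when disabled. */
  String prettyPrintJournal() {
    return String.join("\n", journal);
  }

  public static void main(String[] args) {
    StatusJournalSketch flush = new StatusJournalSketch(true);
    flush.setStatus("Acquiring readlock on region", 1530202320737L);
    flush.setStatus("Obtaining lock to block concurrent updates", 1530202320738L);
    System.out.println("Flush status journal:\n" + flush.prettyPrintJournal());
  }
}
```

Keeping the journal behind a flag means callers that only care about the latest status (the comment mentions RPCs) do not accumulate entries; the difference between consecutive timestamps gives the time spent in each stage.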
[jira] [Updated] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-20806: --- Attachment: HBASE-20806.branch-1.001.patch > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > Attachments: HBASE-20806.branch-1.001.patch > > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20806) Split style journal for flushes and compactions
[ https://issues.apache.org/jira/browse/HBASE-20806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526282#comment-16526282 ] Abhishek Singh Chouhan commented on HBASE-20806: Both. Thinking of modifying taskmonitor such that we maintain a journal of various status (that we already set in flushes/compactions etc.) and then finally logging it when we complete flush/compaction. > Split style journal for flushes and compactions > --- > > Key: HBASE-20806 > URL: https://issues.apache.org/jira/browse/HBASE-20806 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Assignee: Abhishek Singh Chouhan >Priority: Minor > > In 1.x we have split transaction journal that gives a clear picture of when > various stages of splits took place. We should have a similar thing for > flushes and compactions so as to have insights into time spent in various > stages, which we can use to identify regressions that might creep up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20806) Split style journal for flushes and compactions
Abhishek Singh Chouhan created HBASE-20806:

    Summary: Split style journal for flushes and compactions
        Key: HBASE-20806
        URL: https://issues.apache.org/jira/browse/HBASE-20806
    Project: HBase
 Issue Type: Improvement
   Reporter: Abhishek Singh Chouhan
   Assignee: Abhishek Singh Chouhan

In 1.x we have a split transaction journal that gives a clear picture of when the various stages of a split took place. We should have something similar for flushes and compactions, to gain insight into the time spent in each stage, which we can use to identify regressions that might creep in.
[jira] [Updated] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception
[ https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Singh Chouhan updated HBASE-20139:
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved (was: Patch Available)

> NPE in RSRpcServices.get() when getRegion throws an exception
>
>               Key: HBASE-20139
>               URL: https://issues.apache.org/jira/browse/HBASE-20139
>           Project: HBase
>        Issue Type: Bug
>  Affects Versions: 1.3.1
>          Reporter: Abhishek Singh Chouhan
>          Assignee: Abhishek Singh Chouhan
>          Priority: Minor
>           Fix For: 1.3.2, 1.5.0, 1.4.3
>       Attachments: HBASE-20139.branch-1.001.patch, HBASE-20139.branch-1.3.001.patch, HBASE-20139.branch-1.3.001.patch
[jira] [Commented] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception
[ https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389131#comment-16389131 ]

Abhishek Singh Chouhan commented on HBASE-20139:

Pushed to branch-1, branch-1.4, branch-1.3. Thanks [~stack] [~uagashe]!!! :)
[jira] [Commented] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception
[ https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389108#comment-16389108 ]

Abhishek Singh Chouhan commented on HBASE-20139:

Test failure is unrelated. Committing shortly.
[jira] [Updated] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception
[ https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Singh Chouhan updated HBASE-20139:
    Attachment: HBASE-20139.branch-1.3.001.patch
[jira] [Updated] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception
[ https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Singh Chouhan updated HBASE-20139:
    Status: Patch Available (was: Open)

branch-1 patch applies to branch-1.4 too.
[jira] [Updated] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception
[ https://issues.apache.org/jira/browse/HBASE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Singh Chouhan updated HBASE-20139:
    Attachment: HBASE-20139.branch-1.001.patch
[jira] [Created] (HBASE-20139) NPE in RSRpcServices.get() when getRegion throws an exception
Abhishek Singh Chouhan created HBASE-20139:

          Summary: NPE in RSRpcServices.get() when getRegion throws an exception
              Key: HBASE-20139
              URL: https://issues.apache.org/jira/browse/HBASE-20139
          Project: HBase
       Issue Type: Bug
 Affects Versions: 1.3.1
         Reporter: Abhishek Singh Chouhan
         Assignee: Abhishek Singh Chouhan
          Fix For: 1.3.2, 1.5.0, 1.4.3

We can get an NPE in RSRpcServices at the line marked with ->
{code:java}
} finally {
  if (regionServer.metricsRegionServer != null) {
    regionServer.metricsRegionServer.updateGet(
->      region.getTableDesc().getTableName(), EnvironmentEdgeManager.currentTime() - before);
  }
  if (quota != null) {
    quota.close();
  }
{code}
when region itself is null, which can happen when getRegion throws an exception. The NPE is then sent back to the client, which is not able to handle it or make sense of it.
{code:java}
2018-03-06 08:31:25,100 DEBUG [0,queue=4,port=60020] ipc.RpcServer - RpcServer.FifoWFPBQ.default.handler=30,queue=4,port=60020: callId: 5605567 service: ClientService methodName: Get size: 79 connection: xyz:58736 deadline: 9223372036854775807
java.io.IOException
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2431)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2246)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35068)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2373)
    ... 3 more
{code}
This has been fixed by [~stack] over at HBASE-18946 for master; this issue backports the same fix to branch-1, branch-1.3 and branch-1.4.
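The failure mode described above can be reproduced in miniature, together with one possible guard. This is a sketch under stated assumptions and not necessarily the change made in HBASE-18946: `GetMetricsGuard`, its nested `Region` type and the `recorded` list are hypothetical stand-ins for the region server classes. The point is that a `finally` block which dereferences `region` must tolerate `region` still being null when the lookup itself threw.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins; these are not the HBase region server classes. */
final class GetMetricsGuard {
  /** Minimal region placeholder; only the table name matters here. */
  static final class Region {
    final String tableName;

    Region(String tableName) {
      this.tableName = tableName;
    }
  }

  /** Table names for which a get was recorded; stands in for real metrics. */
  static final List<String> recorded = new ArrayList<>();

  /** Hypothetical lookup that can fail, as getRegion can. */
  static Region getRegion(String name) throws IOException {
    if (name == null) {
      throw new IOException("region not online");
    }
    return new Region(name);
  }

  static String get(String regionName) throws IOException {
    Region region = null;
    try {
      region = getRegion(regionName);
      return "result from " + region.tableName;
    } finally {
      // region is still null if getRegion threw. Skipping the metrics update
      // keeps the original IOException instead of replacing it with an NPE.
      if (region != null) {
        recorded.add(region.tableName);
      }
    }
  }
}
```

With the guard in place the caller sees the `IOException` from the lookup, which a client can reason about, rather than a `NullPointerException` raised inside `finally`.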
[jira] [Commented] (HBASE-19858) Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-19858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348860#comment-16348860 ]

Abhishek Singh Chouhan commented on HBASE-19858:

In StoreFile.java we might want to pass a default value too, otherwise we might end up passing null to setStoragePolicy:
{noformat}
 if (null == policyName) {
-  policyName = this.conf.get(HStore.BLOCK_STORAGE_POLICY_KEY);
+  policyName = this.conf.get(HStore.BLOCK_STORAGE_POLICY_KEY, HStore.DEFAULT_BLOCK_STORAGE_POLICY);
 }
{noformat}
The rest LGTM.

> Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1
>
>          Key: HBASE-19858
>          URL: https://issues.apache.org/jira/browse/HBASE-19858
>      Project: HBase
>   Issue Type: Task
>     Reporter: Andrew Purtell
>     Assignee: Andrew Purtell
>     Priority: Major
>      Fix For: 1.5.0
>  Attachments: HBASE-19858-branch-1.patch
>
> Backport the following commits to branch-1:
> * HBASE-14061 Support CF-level Storage Policy
> * HBASE-14061 Support CF-level Storage Policy (addendum)
> * HBASE-14061 Support CF-level Storage Policy (addendum2)
> * HBASE-15172 Support setting storage policy in bulkload
> * HBASE-17538 HDFS.setStoragePolicy() logs errors on local fs
> * HBASE-18015 Storage class aware block placement for procedure v2 WALs
> * HBASE-18017 Reduce frequency of setStoragePolicy failure warnings
> * HBASE-19016 Coordinate storage policy property name for table schema and bulkload
>
> Fix
> * Default storage policy if not configured cannot be "NONE"
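The effect of the suggested default can be shown with a small stand-alone sketch. It uses `java.util.Properties` in place of the Hadoop `Configuration` object so that it runs without HBase on the classpath; `StoragePolicyLookup` and `resolve` are hypothetical names, and the default value `"HOT"` is an assumption made only for this illustration (the real constant lives in HStore).

```java
import java.util.Properties;

/** Hypothetical helper; Properties stands in for the Hadoop Configuration. */
final class StoragePolicyLookup {
  static final String BLOCK_STORAGE_POLICY_KEY = "hbase.hstore.block.storage.policy";
  // Assumed default, for illustration only.
  static final String DEFAULT_BLOCK_STORAGE_POLICY = "HOT";

  /** Returns the CF-level policy if set, else the configured value, else the default. */
  static String resolve(String cfPolicy, Properties conf) {
    String policyName = cfPolicy;
    if (policyName == null) {
      // The two-argument lookup never returns null when a default is supplied.
      policyName = conf.getProperty(BLOCK_STORAGE_POLICY_KEY, DEFAULT_BLOCK_STORAGE_POLICY);
    }
    return policyName;
  }
}
```

With the one-argument lookup, an unset key would yield null and that null would flow on to the code that applies the storage policy; supplying the default keeps the result non-null in every branch.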
[jira] [Comment Edited] (HBASE-19858) Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-19858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348766#comment-16348766 ]

Abhishek Singh Chouhan edited comment on HBASE-19858 at 2/1/18 3:40 PM:

While going through the patch I realized that we check both hbase.hstore.block.storage.policy. and hbase.hstore.block.storage.policy for the bulk load case. The name hbase.hstore.block.storage.policy. gives the impression of setting the property in general for any cf with the name cf, which is not the case, since in HStore we only check the column descriptor or hbase.hstore.block.storage.policy. We can probably file a Jira to rename it to something like hbase.hstore.block.storage.policy.bulkload.cf_name.

was (Author: abhishek.chouhan):
While going through the patch I realized that we check both hbase.hstore.block.storage.policy. and hbase.hstore.block.storage.policy for the bulk load case. The name hbase.hstore.block.storage.policy. gives the impression of setting the property in general for any cf with the name cf, which is not the case, since in HStore we only check the table descriptor or hbase.hstore.block.storage.policy. We can probably file a Jira to rename it to something like hbase.hstore.block.storage.policy.bulkload.cf_name.
[jira] [Commented] (HBASE-19858) Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-19858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348766#comment-16348766 ]

Abhishek Singh Chouhan commented on HBASE-19858:

While going through the patch I realized that we check both hbase.hstore.block.storage.policy. and hbase.hstore.block.storage.policy for the bulk load case. The name hbase.hstore.block.storage.policy. gives the impression of setting the property in general for any cf with the name cf, which is not the case, since in HStore we only check the table descriptor or hbase.hstore.block.storage.policy. We can probably file a Jira to rename it to something like hbase.hstore.block.storage.policy.bulkload.cf_name.
[jira] [Commented] (HBASE-19440) Not able to enable balancer with RSGroups once disabled
[ https://issues.apache.org/jira/browse/HBASE-19440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281373#comment-16281373 ]

Abhishek Singh Chouhan commented on HBASE-19440:

Thanks [~apurtell] [~tedyu] :)

> Not able to enable balancer with RSGroups once disabled
>
>          Key: HBASE-19440
>          URL: https://issues.apache.org/jira/browse/HBASE-19440
>      Project: HBase
>   Issue Type: Bug
>     Reporter: Abhishek Singh Chouhan
>     Assignee: Abhishek Singh Chouhan
>      Fix For: 1.4.0
>  Attachments: HBASE-19440.branch-1.001.patch
>
> Once the balancer is disabled, trying to switch it back on doesn't work, since
> the preBalanceSwitch coprocessor hook incorrectly always returns false.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-19440) Not able to enable balancer with RSGroups once disabled
[ https://issues.apache.org/jira/browse/HBASE-19440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Singh Chouhan updated HBASE-19440:
    Status: Patch Available (was: Open)

Getting a QA.
[jira] [Updated] (HBASE-19440) Not able to enable balancer with RSGroups once disabled
[ https://issues.apache.org/jira/browse/HBASE-19440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Singh Chouhan updated HBASE-19440:
    Attachment: HBASE-19440.branch-1.001.patch
[jira] [Created] (HBASE-19440) Not able to enable balancer with RSGroups once disabled
Abhishek Singh Chouhan created HBASE-19440:

          Summary: Not able to enable balancer with RSGroups once disabled
              Key: HBASE-19440
              URL: https://issues.apache.org/jira/browse/HBASE-19440
          Project: HBase
       Issue Type: Bug
 Affects Versions: 1.3.1
         Reporter: Abhishek Singh Chouhan
         Assignee: Abhishek Singh Chouhan
          Fix For: 1.3.2, 1.4.1

Once the balancer is disabled, trying to switch it back on doesn't work, since the preBalanceSwitch coprocessor hook incorrectly always returns false.
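The behaviour described above can be modelled with a toy switch. The sketch rests on an assumption that should be checked against the branch in question: that the value returned by the pre-hook is the value the master then applies to the balancer. `BalanceSwitchDemo` and `PreBalanceSwitchHook` are hypothetical names, not HBase classes.

```java
/** Toy model of a balancer switch guarded by a pre-hook; not HBase code. */
final class BalanceSwitchDemo {
  /** Hypothetical hook: returns the value the balancer switch should take. */
  interface PreBalanceSwitchHook {
    boolean preBalanceSwitch(boolean newValue);
  }

  /** Hook with the reported bug: ignores the request and always answers false. */
  static final PreBalanceSwitchHook ALWAYS_FALSE = newValue -> false;

  /** Pass-through hook: leaves the requested value unchanged. */
  static final PreBalanceSwitchHook PASS_THROUGH = newValue -> newValue;

  private boolean balancerOn = true;

  /** Applies the hook's answer and returns the previous state. */
  boolean balanceSwitch(boolean requested, PreBalanceSwitchHook hook) {
    boolean old = balancerOn;
    balancerOn = hook.preBalanceSwitch(requested);
    return old;
  }

  boolean isBalancerOn() {
    return balancerOn;
  }
}
```

Under that assumption, a hook that always answers false lets the balancer be turned off but never back on, which matches the symptom in the report, while a pass-through hook restores the expected behaviour.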
[jira] [Commented] (HBASE-18127) Enable state to be passed between the region observer coprocessor hook calls
[ https://issues.apache.org/jira/browse/HBASE-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264980#comment-16264980 ]

Abhishek Singh Chouhan commented on HBASE-18127:

In the cases that I saw, all the coproc hooks related to an operation were executed by a single thread. Do we have cases where some coproc hooks are executed by one thread and others by a different thread, so that we need to pass state between them? (I need to check more here.) For the general case, where a thread from some pool executes the operation involving coproc hooks (all hooks related to that operation), I was thinking we could have a util class that subclasses ThreadPoolExecutor, sets the thread local in beforeExecute and removes it in afterExecute. We would then use this thread pool wherever we use thread pools for operations that involve coproc hooks.

> Enable state to be passed between the region observer coprocessor hook calls
>
>          Key: HBASE-18127
>          URL: https://issues.apache.org/jira/browse/HBASE-18127
>      Project: HBase
>   Issue Type: New Feature
>     Reporter: Lars Hofhansl
>     Assignee: Abhishek Singh Chouhan
>  Attachments: HBASE-18127.master.001.patch, HBASE-18127.master.002.patch, HBASE-18127.master.002.patch, HBASE-18127.master.003.patch, HBASE-18127.master.004.patch, HBASE-18127.master.005.patch, HBASE-18127.master.005.patch, HBASE-18127.master.006.patch
>
> Allow regionobserver to optionally skip postPut/postDelete when
> postBatchMutate was called.
> Right now a RegionObserver can only statically implement one or the other. In
> scenarios where we need to work sometimes on the single postPut and
> postDelete hooks and sometimes on the batchMutate hooks, there is currently
> no place to convey this information to the single hooks. I.e. the work has
> been done in the batch, skip the single hooks.
> There are various solutions:
> 1. Allow some state to be passed _per operation_.
> 2. Remove the single hooks and always only call batch hooks (with a default
>    wrapper for the single hooks).
> 3. more?
> [~apurtell], what we had discussed a few days back.
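The ThreadPoolExecutor idea from the comment above might be sketched as follows. `HookStateExecutor` and its `STATE` field are hypothetical names chosen for this illustration, and the state is a plain map; the only library behaviour relied on is that `beforeExecute` and `afterExecute` are invoked on the worker thread around each task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical executor that scopes a thread-local state map to each task. */
final class HookStateExecutor extends ThreadPoolExecutor {
  /** Per-thread scratch state shared by the hooks of one operation; null outside a task. */
  static final ThreadLocal<Map<String, Object>> STATE = new ThreadLocal<>();

  HookStateExecutor(int threads) {
    super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
  }

  @Override
  protected void beforeExecute(Thread t, Runnable r) {
    super.beforeExecute(t, r);
    // Runs on the worker thread: every task starts with fresh, empty state.
    STATE.set(new HashMap<String, Object>());
  }

  @Override
  protected void afterExecute(Runnable r, Throwable t) {
    try {
      // Runs on the same worker thread: never leak state into the next task.
      STATE.remove();
    } finally {
      super.afterExecute(r, t);
    }
  }
}
```

A batch hook could then record something like `STATE.get().put("batchDone", true)` and the single hooks running later in the same task could consult it. This only helps when all hooks of an operation run on one pool thread, which is the open question raised in the comment.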
[jira] [Commented] (HBASE-18127) Enable state to be passed between the region observer coprocessor hook calls
[ https://issues.apache.org/jira/browse/HBASE-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264932#comment-16264932 ]

Abhishek Singh Chouhan commented on HBASE-18127:

As Andrew pointed out, we'd need to set this outside of the RPC call context, since coproc hooks are also called from places that do not necessarily originate from an RPC call. For example, a flush table operation results in the creation of a sub-procedure on the RS, in which case the call context would be null. We might need a ThreadLocal in CoprocessorHost or Environment; however, we'd need to set and remove it across any thread pool doing operations that involve coprocessor hooks. Doing this for RPC calls and procedures in general would cover most of the use cases, but there might be more. [~anoop.hbase] [~ram_krish] [~apurtell]
[jira] [Commented] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server
[ https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251067#comment-16251067 ]

Abhishek Singh Chouhan commented on HBASE-19215:

Thanks for reviewing and committing [~apurtell]!!

> Incorrect exception handling on the client causes incorrect call timeouts and
> byte buffer allocations on the server
>
>               Key: HBASE-19215
>               URL: https://issues.apache.org/jira/browse/HBASE-19215
>           Project: HBase
>        Issue Type: Bug
>  Affects Versions: 1.3.1
>          Reporter: Abhishek Singh Chouhan
>          Assignee: Abhishek Singh Chouhan
>           Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>       Attachments: HBASE-19215-branch-1.3.patch, HBASE-19215.branch-1.001.patch
>
> Ran into an OOME on the client: java.lang.OutOfMemoryError: Direct buffer memory.
> When we encounter an unhandled exception during channel write at RpcClientImpl
> {noformat}
> checkIsOpen(); // Now we're checking that it didn't became idle in between.
> try {
>   call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header, call.param,
>       cellBlock));
> } catch (IOException e) {
> {noformat}
> we end up leaving the connection open. This becomes especially problematic
> when we get an unhandled exception between writing the length of our request
> on the channel and subsequently writing the params and cellblocks
> {noformat}
> *dos.write(Bytes.toBytes(totalSize));*
> // This allocates a buffer that is the size of the message internally.
> header.writeDelimitedTo(dos);
> if (param != null) param.writeDelimitedTo(dos);
> if (cellBlock != null) dos.write(cellBlock.array(), 0, cellBlock.remaining());
> dos.flush();
> return totalSize;
> {noformat}
> After reading the length, the RS allocates a byte buffer and expects data to
> fill it. However, when we encounter an exception during the param write, we
> release the write lock in RpcClientImpl and do not close the connection; the
> exception is handled at AbstractRpcClient.callBlockingMethod and retried.
> The next client request to the same RS then writes to the channel, but the
> server interprets it as part of the previous request and errors out during
> proto conversion when processing the request, since it is considered
> malformed (in the worst case this might be misinterpreted as wrong data?).
> The remaining data of the current request is then read (the current request's
> size > the previous request's allocated, partially filled byte buffer) and is
> misinterpreted as the size of a new request; in my case this was in GBs. All
> the client requests time out since this byte buffer is never completely
> filled. We should close the connection for any Throwable and not just
> IOException.
[jira] [Updated] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server
[ https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-19215: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server
[ https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249449#comment-16249449 ] Abhishek Singh Chouhan commented on HBASE-19215: Patch applies to master as well, 1.3 will need a different one though.
[jira] [Updated] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server
[ https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-19215: --- Attachment: HBASE-19215.branch-1.001.patch Simple patch for branch-1 that catches Throwable, not just IOException, and closes the connection cleanly. [~apurtell] [~anoop.hbase] [~lhofhansl]
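The approach named in this patch comment (catch Throwable rather than only IOException, then close the connection) can be sketched roughly as below. This is a minimal illustration, not the attached patch: `FakeConnection`, `RequestWriter`, and `writeRequest` are hypothetical names invented for the example, and wrapping non-IO failures in an IOException is an assumption about how the failure might be surfaced to the retrying caller.

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseOnAnyFailure {

  /** Hypothetical stand-in for an RPC connection; not the HBase class. */
  static final class FakeConnection implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Hypothetical stand-in for the code that serializes one request onto the channel. */
  interface RequestWriter {
    void write() throws IOException;
  }

  /**
   * Runs the writer and closes the connection on ANY failure, because a
   * partial frame may already be on the wire and the stream can no longer
   * be trusted for the next request.
   */
  static void writeRequest(FakeConnection conn, RequestWriter writer) throws IOException {
    try {
      writer.write();
    } catch (Throwable t) {
      conn.close();
      if (t instanceof IOException) {
        throw (IOException) t;
      }
      // Assumption: surface unexpected failures as IOException for the caller.
      throw new IOException("Unexpected failure while writing request", t);
    }
  }

  public static void main(String[] args) {
    FakeConnection conn = new FakeConnection();
    try {
      writeRequest(conn, () -> {
        throw new IllegalStateException("simulated failure during param write");
      });
    } catch (IOException e) {
      System.out.println("write failed, connection closed = " + conn.closed);
    }
  }
}
```

With a catch limited to IOException, the IllegalStateException in `main` would bypass the cleanup and leave the connection open for the retry.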
[jira] [Comment Edited] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server
[ https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248339#comment-16248339 ] Abhishek Singh Chouhan edited comment on HBASE-19215 at 11/11/17 5:28 AM: -- Going to put up a patch on Monday [~apurtell]; I won't be able to see this through on the weekend, so feel free to pick this up if you need it fixed before then. was (Author: abhishek.chouhan): Going to put up a patch on Monday [~apurtell]
[jira] [Commented] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server
[ https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248339#comment-16248339 ] Abhishek Singh Chouhan commented on HBASE-19215: Going to put up a patch on Monday [~apurtell]
[jira] [Created] (HBASE-19215) Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server
Abhishek Singh Chouhan created HBASE-19215:
--

Summary: Incorrect exception handling on the client causes incorrect call timeouts and byte buffer allocations on the server
Key: HBASE-19215
URL: https://issues.apache.org/jira/browse/HBASE-19215
Project: HBase
Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan

Ran into an OOME on the client: java.lang.OutOfMemoryError: Direct buffer memory.
When we encounter an unhandled exception during channel write at RpcClientImpl
{noformat}
checkIsOpen(); // Now we're checking that it didn't became idle in between.
try {
  call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header, call.param,
      cellBlock));
} catch (IOException e) {
{noformat}
we end up leaving the connection open. This becomes especially problematic when we get an unhandled exception between writing the length of our request on the channel and subsequently writing the params and cellblocks
{noformat}
*dos.write(Bytes.toBytes(totalSize));*
// This allocates a buffer that is the size of the message internally.
header.writeDelimitedTo(dos);
if (param != null) param.writeDelimitedTo(dos);
if (cellBlock != null) dos.write(cellBlock.array(), 0, cellBlock.remaining());
dos.flush();
return totalSize;
{noformat}
After reading the length, the RS allocates a ByteBuffer and expects data to fill it. However, when we encounter an exception during the param write, we release the write lock in RpcClientImpl and do not close the connection; the exception is handled at AbstractRpcClient.callBlockingMethod and retried. Now the next client request to the same RS writes to the channel; however, the server interprets this as part of the previous request and errors out during proto conversion when processing the request, since it's considered malformed (in the worst case this might be misinterpreted as wrong data?). Now the remaining data of the current request is read (the current request's size > prev request's allocated partially filled ByteBuffer) and is misinterpreted as the size of a new request; in my case this was in GBs. All the client requests time out since this ByteBuffer is never completely filled. We should close the connection for any Throwable and not just IOException.
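The desynchronization described in this report can be reproduced in miniature with plain length-prefixed frames. The sketch below is an illustration only and does not use HBase code: `FramingDesyncDemo`, `writeFrame`, `readFrameLengths`, and `brokenWire` are hypothetical names, and the 4-byte big-endian length prefix is an assumption standing in for the real wire format. The first request announces 16 bytes but only 4 are written; the retry is a complete frame on the same stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FramingDesyncDemo {

  /** Writes a 4-byte big-endian length followed by the payload. */
  static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
    out.writeInt(payload.length);
    out.write(payload);
  }

  /**
   * Returns the length prefixes a reader would see on this byte stream,
   * stopping at the first frame whose payload never fully arrives.
   */
  static List<Integer> readFrameLengths(byte[] wire) throws IOException {
    List<Integer> lengths = new ArrayList<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
    while (in.available() >= 4) {
      int len = in.readInt();
      lengths.add(len);
      if (len < 0 || in.available() < len) {
        break; // a real server would now wait for bytes that never come
      }
      in.skipBytes(len);
    }
    return lengths;
  }

  /** Builds a stream holding one truncated request followed by a complete retry. */
  static byte[] brokenWire() throws IOException {
    ByteArrayOutputStream wire = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(wire);
    // Request 1: the prefix promises 16 bytes, but the write fails after 4
    // and the connection is left open.
    out.writeInt(16);
    out.write(new byte[] {1, 2, 3, 4});
    // Request 2: the retry, a complete 20-byte frame on the same connection.
    byte[] retry = new byte[20];
    Arrays.fill(retry, (byte) 0x7f);
    writeFrame(out, retry);
    return wire.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readFrameLengths(brokenWire()));
  }
}
```

The reader consumes the retry's length prefix and part of its payload to complete the first 16-byte frame, then reads the next four payload bytes (0x7f7f7f7f) as a length of 2139062143, roughly 2 GB, which is the kind of oversized allocation the report describes.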
[jira] [Updated] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
[ https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Singh Chouhan updated HBASE-19094:
---
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 2.0.0-alpha-4
               1.5.0
               1.4.0
               3.0.0
Status: Resolved (was: Patch Available)

> NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
> -
>
> Key: HBASE-19094
> URL: https://issues.apache.org/jira/browse/HBASE-19094
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.4.0
> Reporter: Abhishek Singh Chouhan
> Assignee: Abhishek Singh Chouhan
> Priority: Minor
> Fix For: 3.0.0, 1.4.0, 1.5.0, 2.0.0-alpha-4
>
> Attachments: HBASE-19094.branch-1.001.patch, HBASE-19094.master.001.patch, HBASE-19094.master.001.patch
>
>
> {noformat}
> rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker - Caught exception while verifying group region
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1638)
>   at org.apache.hadoop.hbase.client.ConnectionUtils$2.getClient(ConnectionUtils.java:167)
>   at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker$1.visit(RSGroupInfoManagerImpl.java:646)
>   at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:638)
>   at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:159)
>   at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:661)
>   at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:582)
> {noformat}
[jira] [Commented] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
[ https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1674#comment-1674 ] Abhishek Singh Chouhan commented on HBASE-19094: Pushed to branch-1.4+. Thanks [~vik.karma] [~yuzhih...@gmail.com]!!
[jira] [Updated] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
[ https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-19094: --- Attachment: HBASE-19094.master.001.patch Hadoop QA didn't pick up the master patch, let me try again.
[jira] [Commented] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
[ https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220615#comment-16220615 ] Abhishek Singh Chouhan commented on HBASE-19094: [~tedyu] Observed this on a cluster in the master logs and not in a UT. It's coming from a meta scan that uses a custom result visitor for RSGroups; the exception is logged and swallowed, and the scan is ultimately retried, which succeeded. Since it's very specific code that parses the server name and then uses it to get a BlockingInterface, I don't think it'd be worth adding a UT for this catch. Looks to be a missed null check.
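The kind of guard this comment alludes to (check the parsed server name for null before using it to look up a client) might look like the following. It is a hedged sketch and not the committed patch: `NullGuardSketch`, `parseServerName`, `isRegionOnline`, and the map-based "row" are all hypothetical stand-ins, and treating a missing server name as "not online yet" is an assumption chosen so the caller's existing retry loop can handle it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class NullGuardSketch {

  /** Hypothetical parser: returns the server name stored in a row, or null if absent. */
  static String parseServerName(Map<String, String> row) {
    return row.get("server");
  }

  /**
   * Reports whether the row's region can be considered online. A missing
   * server name is treated as "not online yet" instead of being passed on
   * to a client lookup that would dereference null.
   */
  static boolean isRegionOnline(Map<String, String> row, Set<String> liveServers) {
    String server = parseServerName(row);
    if (server == null) {
      return false; // assumption: the caller retries later
    }
    return liveServers.contains(server);
  }

  public static void main(String[] args) {
    Set<String> live = new HashSet<>();
    live.add("rs1.example.com,16020,1");

    Map<String, String> assigned = new HashMap<>();
    assigned.put("server", "rs1.example.com,16020,1");
    Map<String, String> noServer = new HashMap<>();

    System.out.println(isRegionOnline(assigned, live));
    System.out.println(isRegionOnline(noServer, live));
  }
}
```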
[jira] [Updated] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
[ https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-19094: --- Attachment: HBASE-19094.branch-1.001.patch
[jira] [Updated] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
[ https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-19094: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
[ https://issues.apache.org/jira/browse/HBASE-19094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh Chouhan updated HBASE-19094: --- Attachment: HBASE-19094.master.001.patch
[jira] [Created] (HBASE-19094) NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
Abhishek Singh Chouhan created HBASE-19094:
--

Summary: NPE in RSGroupStartupWorker.waitForGroupTableOnline during master startup
Key: HBASE-19094
URL: https://issues.apache.org/jira/browse/HBASE-19094
Project: HBase
Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan
Priority: Minor

{noformat}
rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker - Caught exception while verifying group region
java.lang.NullPointerException
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(ConnectionManager.java:1638)
  at org.apache.hadoop.hbase.client.ConnectionUtils$2.getClient(ConnectionUtils.java:167)
  at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker$1.visit(RSGroupInfoManagerImpl.java:646)
  at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:638)
  at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:159)
  at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.waitForGroupTableOnline(RSGroupInfoManagerImpl.java:661)
  at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$RSGroupStartupWorker.run(RSGroupInfoManagerImpl.java:582)
{noformat}