[jira] [Updated] (HBASE-13937) Partially revert HBASE-13172
[ https://issues.apache.org/jira/browse/HBASE-13937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13937: Fix Version/s: (was: 1.3.0) Partially revert HBASE-13172 - Key: HBASE-13937 URL: https://issues.apache.org/jira/browse/HBASE-13937 Project: HBase Issue Type: Sub-task Components: Region Assignment Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.14, 1.0.2, 1.2.0, 1.1.1 Attachments: hbase-13937_v1.patch, hbase-13937_v2.patch, hbase-13937_v3-branch-1.1.patch, hbase-13937_v3.patch, hbase-13937_v3.patch HBASE-13172 is supposed to fix a UT issue, but causes other problems that the parent jira (HBASE-13605) is attempting to fix. However, HBASE-13605 patch v4 uncovers at least 2 different issues which are, to put it mildly, major design flaws in AM / RS. Regardless of 13605, the issue with 13172 is that we catch {{ServerNotRunningYetException}} from {{isServerReachable()}} and return false, which then puts the server on the {{RegionStates.deadServers}} list. Once it is in that list, we can still assign and unassign regions to the RS after it has started (because regular assignment does not check whether the server is in {{RegionStates.deadServers}}). However, after the first assign and unassign, we cannot assign the region again since then the check for the lastServer will think that the server is dead. It turns out that a proper patch for 13605 is very hard without fixing the rest of the broken AM assumptions (see HBASE-13605, HBASE-13877 and HBASE-13895 for a colorful history). For 1.1.1, I think we should just revert parts of HBASE-13172 for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
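Below is a rough, hypothetical sketch of the failure mode described above, using stand-in names rather than the actual ServerManager/RegionStates code: a ServerNotRunningYetException means the RS is starting, not dead, so treating it as a retryable condition is one plausible shape of the behaviour the partial revert moves back toward.
{code}
// Hedged sketch with stand-in names; not the actual ServerManager / AssignmentManager source.
import java.io.IOException;

class ReachabilityCheckSketch {
  /** Stand-in for org.apache.hadoop.hbase.ipc.ServerNotRunningYetException. */
  static class ServerNotRunningYetException extends IOException {}

  interface RegionServerPing {
    void ping(String serverName) throws IOException;
  }

  private final RegionServerPing rpc;

  ReachabilityCheckSketch(RegionServerPing rpc) {
    this.rpc = rpc;
  }

  /**
   * HBASE-13172 returned false on ServerNotRunningYetException, which lands a
   * merely-starting server in RegionStates.deadServers. Treating it as a
   * retryable condition keeps a starting RS out of the dead-servers list.
   */
  boolean isServerReachable(String serverName) {
    for (int attempt = 0; attempt < 10; attempt++) {
      try {
        rpc.ping(serverName);
        return true;
      } catch (ServerNotRunningYetException e) {
        // RS process is up but still initializing: retry instead of declaring it dead
        try {
          Thread.sleep(100);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          return false;
        }
      } catch (IOException e) {
        return false; // genuinely unreachable
      }
    }
    return false;
  }
}
{code}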
[jira] [Commented] (HBASE-14021) Quota table has a wrong description on the UI
[ https://issues.apache.org/jira/browse/HBASE-14021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613225#comment-14613225 ] Ashish Singhi commented on HBASE-14021: --- Thanks Ted for the quick review. Quota table has a wrong description on the UI - Key: HBASE-14021 URL: https://issues.apache.org/jira/browse/HBASE-14021 Project: HBase Issue Type: Bug Components: UI Affects Versions: 1.1.0 Reporter: Ashish Singhi Assignee: Ashish Singhi Priority: Minor Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-14021.patch, error.png, fix.png !error.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13984) Add option to allow caller to know the heartbeat and scanner position when scanner timeout
[ https://issues.apache.org/jira/browse/HBASE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613193#comment-14613193 ] He Liangliang commented on HBASE-13984: --- [~jonathan.lawlor] Hold on before I make the update Add option to allow caller to know the heartbeat and scanner position when scanner timeout -- Key: HBASE-13984 URL: https://issues.apache.org/jira/browse/HBASE-13984 Project: HBase Issue Type: Improvement Components: Scanners Reporter: He Liangliang Assignee: He Liangliang Attachments: HBASE-13984-V1.diff HBASE-13090 introduced scanner heartbeats. However, there are still some limitations (see HBASE-13215). In some applications, for example, an operation accesses HBase to scan table data, and there is a strict limit that the call must return within a fixed interval. At the same time, the call is stateless, so it must return the next position from which to continue the scan. This is a typical use case for online applications. Based on this requirement, some improvements are proposed: 1. Allow the client to set a flag controlling whether the heartbeat (a fake row) is passed to the caller (via ResultScanner next) 2. Allow the client to pass a timeout to the server, which can override the server-side default value 3. When requested by the client, have the server peek the next cell and return it to the client in the heartbeat message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
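A hedged illustration of how a caller might use the proposed options. The commented-out calls use invented names (setAllowHeartbeatResults, setScannerTimeLimitMs, isHeartbeat) that are NOT part of the HBase client API; they only mirror proposals 1-3. The uncommented part compiles against the ordinary client classes.
{code}
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

final class StatelessScanSketch {
  /** Scans one slice; returns the row to resume from, or null when the slice finished. */
  static byte[] scanSlice(Table table, byte[] startRow, byte[] stopRow) throws Exception {
    Scan scan = new Scan(startRow, stopRow);
    // scan.setAllowHeartbeatResults(true); // proposal 1: surface heartbeats to the caller
    // scan.setScannerTimeLimitMs(200);     // proposal 2: client-chosen time limit
    try (ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        // proposal 3: a heartbeat "fake row" would carry the next position peeked by the server
        // if (r.isHeartbeat()) { return r.getRow(); }
        process(r);
      }
    }
    return null; // finished within the time limit; no resume position needed
  }

  private static void process(Result r) {
    // application logic goes here
  }
}
{code}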
[jira] [Commented] (HBASE-13984) Add option to allow caller to know the heartbeat and scanner position when scanner timeout
[ https://issues.apache.org/jira/browse/HBASE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613207#comment-14613207 ] He Liangliang commented on HBASE-13984: --- [~stack] What is the expected behavior if the user wants to peek the next cell and also accepts partial results? Since the KVs in the store heap and the joint heap are not in sorted order, what should the next position be if the time limit is reached between scanning the two heaps? Looks like the next cell is applicable only to non-partial results. Add option to allow caller to know the heartbeat and scanner position when scanner timeout -- Key: HBASE-13984 URL: https://issues.apache.org/jira/browse/HBASE-13984 Project: HBase Issue Type: Improvement Components: Scanners Reporter: He Liangliang Assignee: He Liangliang Attachments: HBASE-13984-V1.diff HBASE-13090 introduced scanner heartbeats. However, there are still some limitations (see HBASE-13215). In some applications, for example, an operation accesses HBase to scan table data, and there is a strict limit that the call must return within a fixed interval. At the same time, the call is stateless, so it must return the next position from which to continue the scan. This is a typical use case for online applications. Based on this requirement, some improvements are proposed: 1. Allow the client to set a flag controlling whether the heartbeat (a fake row) is passed to the caller (via ResultScanner next) 2. Allow the client to pass a timeout to the server, which can override the server-side default value 3. When requested by the client, have the server peek the next cell and return it to the client in the heartbeat message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613224#comment-14613224 ] Ashish Singhi commented on HBASE-8642: -- bq. some way for a user to know which set of snapshots got successfully deleted or failed. This is how it will be displayed on the console.
{noformat}
hbase(main):002:0> delete_table_snapshots 't', '.*'
SNAPSHOT   TABLE + CREATION TIME
 a         t (Fri Jul 03 19:40:17 +0530 2015)
 s         t (Fri Jul 03 19:40:19 +0530 2015)
 s1        t (Fri Jul 03 19:40:21 +0530 2015)

Delete the above 3 snapshots (y/n)?
y
Successfully deleted snapshot: a
Failed to delete snapshot: s, due to below exception,
org.apache.hadoop.hbase.snapshot.SnapshotDoesNotExistException: org.apache.hadoop.hbase.snapshot.SnapshotDoesNotExistException: Snapshot 's' doesn't exist on the filesystem
    at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:282)
    at org.apache.hadoop.hbase.master.MasterRpcServices.deleteSnapshot(MasterRpcServices.java:464)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:49860)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2132)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)
Successfully deleted snapshot: s1
0 row(s) in 0.0460 seconds
{noformat}
[Snapshot] List and delete snapshot by table Key: HBASE-8642 URL: https://issues.apache.org/jira/browse/HBASE-8642 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2 Reporter: Julian Zhou Assignee: Ashish Singhi Fix For: 2.0.0 Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch, HBASE-8642-v1.patch, HBASE-8642-v2.patch, HBASE-8642-v3.patch, HBASE-8642-v4.patch, HBASE-8642.patch Support list and delete snapshots by table names. User scenario: A user wants to delete all the snapshots which were taken in January month for a table 't' where snapshot names starts with 'Jan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13394) Failed to recreate a table when quota is enabled
[ https://issues.apache.org/jira/browse/HBASE-13394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13394: Fix Version/s: (was: 1.2.0) Failed to recreate a table when quota is enabled Key: HBASE-13394 URL: https://issues.apache.org/jira/browse/HBASE-13394 Project: HBase Issue Type: Bug Components: security Affects Versions: 2.0.0, 1.1.0 Reporter: Y. SREENIVASULU REDDY Assignee: Ashish Singhi Labels: quota Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13394-branch-1.1.patch, HBASE-13394-v1.patch, HBASE-13394-v2.patch, HBASE-13394-v3.patch, HBASE-13394-v4.patch, HBASE-13394.patch Steps to reproduce. Enable quota by setting {{hbase.quota.enabled}} to true. Create a table, say with name 't1', and make sure the creation fails after this table's entry has been added into the namespace quota cache. Now correct the failure and recreate the table 't1'. It fails with the below exception.
{noformat}
2015-04-02 14:23:53,729 | ERROR | FifoRpcScheduler.handler1-thread-23 | Unexpected throwable object | org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2154)
java.lang.IllegalStateException: Table already in the cache t1
    at org.apache.hadoop.hbase.namespace.NamespaceTableAndRegionInfo.addTable(NamespaceTableAndRegionInfo.java:97)
    at org.apache.hadoop.hbase.namespace.NamespaceStateManager.addTable(NamespaceStateManager.java:171)
    at org.apache.hadoop.hbase.namespace.NamespaceStateManager.checkAndUpdateNamespaceTableCount(NamespaceStateManager.java:147)
    at org.apache.hadoop.hbase.namespace.NamespaceAuditor.checkQuotaToCreateTable(NamespaceAuditor.java:76)
    at org.apache.hadoop.hbase.quotas.MasterQuotaManager.checkNamespaceTableAndRegionQuota(MasterQuotaManager.java:344)
    at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1781)
    at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1818)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:42273)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2116)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
    at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
P.S: Line numbers may not be in sync. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
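A small illustrative sketch (not the committed HBASE-13394 patch) of making the namespace-quota cache tolerant of an entry left behind by an earlier failed create; the class and method names are stand-ins for NamespaceStateManager's bookkeeping.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NamespaceQuotaCacheSketch {
  private final Map<String, Set<String>> tablesByNamespace = new HashMap<>();

  synchronized void checkAndUpdateTableCount(String namespace, String table, int maxTables) {
    Set<String> tables = tablesByNamespace.computeIfAbsent(namespace, ns -> new HashSet<>());
    if (tables.contains(table)) {
      // Stale entry from a previously failed create: do not throw
      // IllegalStateException("Table already in the cache"), just reuse it.
      return;
    }
    if (tables.size() + 1 > maxTables) {
      throw new IllegalStateException("Namespace " + namespace + " table quota exceeded");
    }
    tables.add(table);
  }

  /** Called when create-table fails, so the cache does not keep a phantom table. */
  synchronized void removeTable(String namespace, String table) {
    Set<String> tables = tablesByNamespace.get(namespace);
    if (tables != null) {
      tables.remove(table);
    }
  }
}
{code}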
[jira] [Updated] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13090: Fix Version/s: (was: 1.2.0) Progress heartbeats for long running scanners - Key: HBASE-13090 URL: https://issues.apache.org/jira/browse/HBASE-13090 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Assignee: Jonathan Lawlor Fix For: 2.0.0, 1.1.0 Attachments: 13090-branch-1.addendum, HBASE-13090-v1.patch, HBASE-13090-v2.patch, HBASE-13090-v3.patch, HBASE-13090-v3.patch, HBASE-13090-v4.patch, HBASE-13090-v6.patch, HBASE-13090-v7.patch It can be necessary to set very long timeouts for clients that issue scans over large regions when all data in the region might be filtered out depending on scan criteria. This is a usability concern because it can be hard to identify what worst case timeout to use until scans are occasionally/intermittently failing in production, depending on variable scan criteria. It would be better if the client-server scan protocol can send back periodic progress heartbeats to clients as long as server scanners are alive and making progress. This is related but orthogonal to streaming scan (HBASE-13071). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13149) HBase MR is broken on Hadoop 2.5+ Yarn
[ https://issues.apache.org/jira/browse/HBASE-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13149: Fix Version/s: (was: 1.2.0) HBase MR is broken on Hadoop 2.5+ Yarn -- Key: HBASE-13149 URL: https://issues.apache.org/jira/browse/HBASE-13149 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 2.0.0, 0.98.10.1 Reporter: Jerry He Assignee: Jerry He Priority: Blocker Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13149-0.98.patch, HBASE-13149-master.patch, jackson-core-asl-compat_report.html, jackson-jaxrs-compat_report.html, jackson-mapper-asl-compat_report.html, jackson-xc-compat_report.html, jackson_1.8_to_1.9_compat_report.html Running the server MR tools is not working on Yarn version 2.5+. Running org.apache.hadoop.hbase.mapreduce.Export:
{noformat}
Exception in thread "main" java.lang.NoSuchMethodError: org.codehaus.jackson.map.ObjectMapper.setSerializationInclusion(Lorg/codehaus/jackson/map/annotate/JsonSerialize$Inclusion;)Lorg/codehaus/jackson/map/ObjectMapper;
    at org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:59)
    at org.apache.hadoop.yarn.util.timeline.TimelineUtils.<clinit>(TimelineUtils.java:47)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:166)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:102)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapred.ResourceMgrDelegate.<init>(ResourceMgrDelegate.java:96)
    at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:112)
    at org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
    at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1266)
    at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1262)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:1261)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
    at org.apache.hadoop.hbase.mapreduce.Export.main(Export.java:189)
{noformat}
The problem seems to be the jackson jar version. HADOOP-10104 updated the jackson version to 1.9.13. YARN-2092 reported a problem as well. HBase is using jackson 1.8.8. This version of the jar in the classpath seems to cause the problem. Should we upgrade to jackson 1.9.13? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613222#comment-14613222 ] Ashish Singhi commented on HBASE-8642: -- Patch addressing Matteo comment from RB. Also ensured that there is some way for a user to know which set of snapshots got successfully deleted or failed. Please review. [Snapshot] List and delete snapshot by table Key: HBASE-8642 URL: https://issues.apache.org/jira/browse/HBASE-8642 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2 Reporter: Julian Zhou Assignee: Ashish Singhi Fix For: 2.0.0 Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch, HBASE-8642-v1.patch, HBASE-8642-v2.patch, HBASE-8642-v3.patch, HBASE-8642-v4.patch, HBASE-8642.patch Support list and delete snapshots by table names. User scenario: A user wants to delete all the snapshots which were taken in January month for a table 't' where snapshot names starts with 'Jan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Singhi updated HBASE-8642: - Attachment: HBASE-8642-v4.patch [Snapshot] List and delete snapshot by table Key: HBASE-8642 URL: https://issues.apache.org/jira/browse/HBASE-8642 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2 Reporter: Julian Zhou Assignee: Ashish Singhi Fix For: 2.0.0 Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch, HBASE-8642-v1.patch, HBASE-8642-v2.patch, HBASE-8642-v3.patch, HBASE-8642-v4.patch, HBASE-8642.patch Support list and delete snapshots by table names. User scenario: A user wants to delete all the snapshots which were taken in January month for a table 't' where snapshot names starts with 'Jan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13255) Bad grammar in RegionServer status page
[ https://issues.apache.org/jira/browse/HBASE-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13255: Fix Version/s: (was: 1.2.0) Bad grammar in RegionServer status page --- Key: HBASE-13255 URL: https://issues.apache.org/jira/browse/HBASE-13255 Project: HBase Issue Type: Improvement Components: monitoring Reporter: Josh Elser Assignee: Josh Elser Priority: Trivial Fix For: 2.0.0, 1.1.0 Attachments: 0001-HBASE-13255-Fix-grammar-in-Regions-description-parag.patch, HBASE-13255.patch Noticed on the rs-status page, the blurb under the Regions section could use some grammatical improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang reassigned HBASE-14017: -- Assignee: Stephen Yuan Jiang (was: Matteo Bertozzi) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Stephen Yuan Jiang Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock == false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Assignee: Matteo Bertozzi (was: Stephen Yuan Jiang) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock == false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Status: Patch Available (was: In Progress) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 1.1.0.1, 2.0.0, 1.2.0 Reporter: Matteo Bertozzi Assignee: Stephen Yuan Jiang Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock == false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613299#comment-14613299 ] Matteo Bertozzi commented on HBASE-14017: - you are looking at the specific table implementation; try to look more at the RunQueue object alone. that acquireDelete() is a poor man's ref-count. it has nothing to do with insert/update/delete. if we implement a refcount for that object you'll have an unref() == 0 instead of acquireDelete(). the fact that we have a read/write lock in the table is because we have read/write operation support, and since we don't have a refcount in the base RunQueue object we can just implement acquireDelete() as a tryExclusiveLock(). but acquireDelete() has no knowledge of the delete operation in terms of table deletion. it is equivalent to a refcounted unref() == 0; how it is implemented is just a shortcut that uses what we already have. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock == false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
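To make the two viewpoints concrete, here is a hedged sketch (not the actual MasterProcedureQueue code) showing "acquireDelete" expressed once as tryExclusiveLock() on an empty queue and once as a ref-count reaching zero; the invariant is the same either way, and it is exactly the invariant the race above violated.
{code}
import java.util.concurrent.atomic.AtomicInteger;

class RunQueueSketch {
  private boolean exclusiveLock = false;
  private final AtomicInteger refCount = new AtomicInteger();

  synchronized boolean tryExclusiveLock() {
    if (exclusiveLock) {
      return false;
    }
    exclusiveLock = true;
    return true;
  }

  synchronized void releaseExclusiveLock() {
    exclusiveLock = false;
  }

  /** "acquireDelete" as a shortcut over the existing exclusive lock. */
  synchronized boolean acquireDeleteAsLock(boolean queueIsEmpty) {
    return queueIsEmpty && tryExclusiveLock();
  }

  /** The same invariant expressed as a ref-count: deletable only when unref() == 0. */
  boolean acquireDeleteAsRefCount() {
    return refCount.decrementAndGet() == 0;
  }
}
{code}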
[jira] [Commented] (HBASE-13387) Add ByteBufferedCell an extension to Cell
[ https://issues.apache.org/jira/browse/HBASE-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613330#comment-14613330 ] Anoop Sam John commented on HBASE-13387: Worked on a patch for this. Mainly in the CellComparator and CellUtil matching APIs, added the ByteBufferedCell instance check. When the cell is an instance of ByteBufferedCell, we will use the getXXXByteBuffer() API rather than getXXXArray(). (Pls note that HBASE-12345 added an Unsafe based compare in ByteBufferUtil to compare BBs). ByteBufferedCell is created as an interface extending Cell. Doing a perf test with PE (range scan with a 10K range and all cells filtered out at the server) I was seeing a 7% perf drop.
{code}
public int compareRows(final Cell left, final Cell right) {
  if (left instanceof ByteBufferedCell && right instanceof ByteBufferedCell) {
    return ByteBufferUtils.compareTo(((ByteBufferedCell) left).getRowByteBuffer(),
        ((ByteBufferedCell) left).getRowPositionInByteBuffer(), left.getRowLength(),
        ((ByteBufferedCell) right).getRowByteBuffer(),
        ((ByteBufferedCell) right).getRowPositionInByteBuffer(), right.getRowLength());
  }
  if (left instanceof ByteBufferedCell) {
    return ByteBufferUtils.compareTo(((ByteBufferedCell) left).getRowByteBuffer(),
        ((ByteBufferedCell) left).getRowPositionInByteBuffer(), left.getRowLength(),
        right.getRowArray(), right.getRowOffset(), right.getRowLength());
  }
  if (right instanceof ByteBufferedCell) {
    return -(ByteBufferUtils.compareTo(((ByteBufferedCell) right).getRowByteBuffer(),
        ((ByteBufferedCell) right).getRowPositionInByteBuffer(), right.getRowLength(),
        left.getRowArray(), left.getRowOffset(), left.getRowLength()));
  }
  return Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(),
      right.getRowArray(), right.getRowOffset(), right.getRowLength());
}
{code}
Basically the code is like this and we are still not making cells of type ByteBufferedCell. Then I tested by changing ByteBufferedCell into an abstract class instead of an interface. The difference is quite visible: there is no perf degradation with this. Calling a non-overridden method via an interface type seems to perform worse. Should be related to Java runtime optimization and inlining. Add ByteBufferedCell an extension to Cell - Key: HBASE-13387 URL: https://issues.apache.org/jira/browse/HBASE-13387 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Attachments: ByteBufferedCell.docx, WIP_HBASE-13387_V2.patch, WIP_ServerCell.patch This came in btw the discussion abt the parent Jira and recently Stack added as a comment on the E2E patch on the parent Jira. The idea is to add a new Interface 'ByteBufferedCell' in which we can add new buffer based getter APIs and getters for position in components in BB. We will keep this interface @InterfaceAudience.Private. When the Cell is backed by a DBB, we can create an Object implementing this new interface. The Comparators has to be aware abt this new Cell extension and has to use the BB based APIs rather than getXXXArray(). Also give util APIs in CellUtil to abstract the checks for new Cell type. (Like matchingXXX APIs, getValueAstype APIs etc) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12988) [Replication]Parallel apply edits across regions
[ https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613364#comment-14613364 ] Andrew Purtell commented on HBASE-12988: bq. If not don't worry I'll do it as part of checking the 0.98.14 RC (this change will be in it for sure). I'll check it with the new setting at 1 (default for 0.98) and 10... [Replication]Parallel apply edits across regions Key: HBASE-12988 URL: https://issues.apache.org/jira/browse/HBASE-12988 Project: HBase Issue Type: Improvement Components: Replication Reporter: hongyu bi Assignee: Lars Hofhansl Attachments: 12988-v2.txt, 12988-v3.txt, 12988-v4.txt, 12988.txt, HBASE-12988-0.98.patch, ParallelReplication-v2.txt we can apply edits to slave cluster in parallel on table-level to speed up replication . update : per conversation blow , it's better to apply edits on row-level in parallel -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14002) Add --noReplicationSetup option to IntegrationTestReplication
[ https://issues.apache.org/jira/browse/HBASE-14002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613371#comment-14613371 ] Andrew Purtell commented on HBASE-14002: Ok, let me commit this and pick it back through to 0.98 Add --noReplicationSetup option to IntegrationTestReplication - Key: HBASE-14002 URL: https://issues.apache.org/jira/browse/HBASE-14002 Project: HBase Issue Type: Improvement Components: integration tests Reporter: Dima Spivak Assignee: Dima Spivak Attachments: HBASE-14002_master.patch IntegrationTestReplication has been flaky for me on pre-1.1 versions of HBase because of not-actually-synchronous operations in HBaseAdmin/Admin, which hamper its setupTablesAndReplication method. To get around this, I'd like to add a \-nrs/--noReplicationSetup option to the test to allow it to be run on clusters in which the necessary tables and replication have already been setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13925) Use zookeeper multi to clear znodes in ZKProcedureUtil
[ https://issues.apache.org/jira/browse/HBASE-13925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613372#comment-14613372 ] Andrew Purtell commented on HBASE-13925: Ok, I will pick back HBASE-7847 and then commit this back through to 0.98. Use zookeeper multi to clear znodes in ZKProcedureUtil -- Key: HBASE-13925 URL: https://issues.apache.org/jira/browse/HBASE-13925 Project: HBase Issue Type: Improvement Affects Versions: 0.98.13 Reporter: Ashish Singhi Assignee: Ashish Singhi Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: HBASE-13925-v1-again.patch, HBASE-13925-v1.patch, HBASE-13925.patch Address the TODO in the ZKProcedureUtil clearChildZNodes() and clearZNodes() methods -- This message was sent by Atlassian JIRA (v6.3.4#6332)
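For reference, this is roughly what clearing child znodes with a single multi looks like against the plain ZooKeeper client API; the actual patch goes through HBase's ZKProcedureUtil/ZKUtil wrappers, so paths and error handling differ.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

final class ZkMultiDeleteSketch {
  static void clearChildZNodes(ZooKeeper zk, String parent)
      throws KeeperException, InterruptedException {
    List<Op> ops = new ArrayList<>();
    for (String child : zk.getChildren(parent, false)) {
      // version -1 means "any version"; one round trip instead of one delete per child
      ops.add(Op.delete(parent + "/" + child, -1));
    }
    if (!ops.isEmpty()) {
      zk.multi(ops); // all child deletes succeed or fail atomically
    }
  }
}
{code}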
[jira] [Updated] (HBASE-14007) Writing to table through MR should fail upfront if table does not exist/disabled
[ https://issues.apache.org/jira/browse/HBASE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14007: --- Status: Open (was: Patch Available) Writing to table through MR should fail upfront if table does not exist/disabled Key: HBASE-14007 URL: https://issues.apache.org/jira/browse/HBASE-14007 Project: HBase Issue Type: Improvement Components: mapreduce Affects Versions: 1.1.1 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Priority: Minor Labels: mapreduce Fix For: 2.0.0, 1.3.0 Attachments: HBASE-14007.patch TableOutputFormat.checkOutputSpecs() needs to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12988) [Replication]Parallel apply edits across regions
[ https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613410#comment-14613410 ] Andrew Purtell commented on HBASE-12988: bq. Not sure I follow the part with the DEBUG message. Oh I just mean making what's going on with the new logic available in cluster logs at DEBUG level. No need if you don't think it worth it. [Replication]Parallel apply edits across regions Key: HBASE-12988 URL: https://issues.apache.org/jira/browse/HBASE-12988 Project: HBase Issue Type: Improvement Components: Replication Reporter: hongyu bi Assignee: Lars Hofhansl Attachments: 12988-v2.txt, 12988-v3.txt, 12988-v4.txt, 12988.txt, HBASE-12988-0.98.patch, ParallelReplication-v2.txt we can apply edits to slave cluster in parallel on table-level to speed up replication . update : per conversation blow , it's better to apply edits on row-level in parallel -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Status: In Progress (was: Patch Available) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 1.1.0.1, 2.0.0, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock == false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Attachment: HBASE-14017.v1-branch1.1.patch Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock == false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14020) Unsafe based optimized write in ByteBufferOutputStream
[ https://issues.apache.org/jira/browse/HBASE-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-14020: --- Attachment: HBASE-14020.patch Tested the patch with the PE tool:
./hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --addColumns=false --rows=100 scanRange1 20
I can see 4.2% better performance. Did not test with a JMH micro benchmark, which would show much more IMO. Unsafe based optimized write in ByteBufferOutputStream -- Key: HBASE-14020 URL: https://issues.apache.org/jira/browse/HBASE-14020 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-14020.patch We use this class to build the cellblock at RPC layer. The write operation is doing puts to java ByteBuffer which is having lot of overhead. Instead we can do Unsafe based copy to buffer operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
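A minimal sketch of the idea, not the HBASE-14020 patch itself: replace per-put ByteBuffer writes with one bulk Unsafe.copyMemory into the stream's backing array.
{code}
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeWriteSketch {
  private static final Unsafe UNSAFE;
  private static final long BYTE_ARRAY_BASE;

  static {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      UNSAFE = (Unsafe) f.get(null);
      BYTE_ARRAY_BASE = UNSAFE.arrayBaseOffset(byte[].class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private byte[] buf = new byte[64 * 1024];
  private int pos = 0;

  void write(byte[] src, int off, int len) {
    if (pos + len > buf.length) {
      byte[] bigger = new byte[Math.max(buf.length * 2, pos + len)];
      System.arraycopy(buf, 0, bigger, 0, pos);
      buf = bigger;
    }
    // single bulk copy rather than a per-put ByteBuffer path
    UNSAFE.copyMemory(src, BYTE_ARRAY_BASE + off, buf, BYTE_ARRAY_BASE + pos, len);
    pos += len;
  }
}
{code}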
[jira] [Updated] (HBASE-14020) Unsafe based optimized write in ByteBufferOutputStream
[ https://issues.apache.org/jira/browse/HBASE-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-14020: --- Status: Patch Available (was: Open) Unsafe based optimized write in ByteBufferOutputStream -- Key: HBASE-14020 URL: https://issues.apache.org/jira/browse/HBASE-14020 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-14020.patch We use this class to build the cellblock at RPC layer. The write operation is doing puts to java ByteBuffer which is having lot of overhead. Instead we can do Unsafe based copy to buffer operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13387) Add ByteBufferedCell an extension to Cell
[ https://issues.apache.org/jira/browse/HBASE-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613337#comment-14613337 ] stack commented on HBASE-13387: --- A little birdy (smile) told me that you did your perf testing using both JMH and PE, is that true [~anoop.hbase]? Nice work and interesting finding boss. Add ByteBufferedCell an extension to Cell - Key: HBASE-13387 URL: https://issues.apache.org/jira/browse/HBASE-13387 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Attachments: ByteBufferedCell.docx, WIP_HBASE-13387_V2.patch, WIP_ServerCell.patch This came in btw the discussion abt the parent Jira and recently Stack added as a comment on the E2E patch on the parent Jira. The idea is to add a new Interface 'ByteBufferedCell' in which we can add new buffer based getter APIs and getters for position in components in BB. We will keep this interface @InterfaceAudience.Private. When the Cell is backed by a DBB, we can create an Object implementing this new interface. The Comparators has to be aware abt this new Cell extension and has to use the BB based APIs rather than getXXXArray(). Also give util APIs in CellUtil to abstract the checks for new Cell type. (Like matchingXXX APIs, getValueAstype APIs etc) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613334#comment-14613334 ] ramkrishna.s.vasudevan commented on HBASE-13977: Thanks for the reviews. The git server seems to be down. Will commit it once it is up. Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch, HBASE-13977_4.patch, HBASE-13977_5.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13977: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.0.0 Status: Resolved (was: Patch Available) The server just came back. Pushed to master. Thanks for the detailed reviews Anoop and Stack. Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch, HBASE-13977_4.patch, HBASE-13977_5.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12988) [Replication]Parallel apply edits across regions
[ https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613401#comment-14613401 ] Lars Hofhansl commented on HBASE-12988: --- Thanks [~apurtell]. Absolutely yes on the constant and hbase-defaults.xml. Not sure I follow the part with the DEBUG message. Future.get() will rethrow any error it encounters and then the outer catch will process those in the same way we do now. So in effect the error handling is unchanged. The only difference is when multiple tasks encounter an exception we only remember the last one... Is that part what you meant? I wanted to give the other tasks a chance to run and then only reschedule those that actually did encounter an error. [Replication]Parallel apply edits across regions Key: HBASE-12988 URL: https://issues.apache.org/jira/browse/HBASE-12988 Project: HBase Issue Type: Improvement Components: Replication Reporter: hongyu bi Assignee: Lars Hofhansl Attachments: 12988-v2.txt, 12988-v3.txt, 12988-v4.txt, 12988.txt, HBASE-12988-0.98.patch, ParallelReplication-v2.txt we can apply edits to slave cluster in parallel on table-level to speed up replication . update : per conversation blow , it's better to apply edits on row-level in parallel -- This message was sent by Atlassian JIRA (v6.3.4#6332)
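A hedged sketch of the error-handling shape described here, with invented helper names rather than the actual ReplicationSink code: Future.get() rethrows task failures, only the last exception is remembered, and only the batches that actually failed are kept for rescheduling.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class ParallelApplySketch {
  interface Batch {
    void apply() throws Exception;
  }

  /** Applies all batches once in parallel; returns the batches that failed. */
  static List<Batch> applyOnce(ExecutorService pool, List<Batch> batches)
      throws InterruptedException {
    List<Future<?>> futures = new ArrayList<>();
    for (Batch b : batches) {
      futures.add(pool.submit(() -> { b.apply(); return null; }));
    }
    List<Batch> failed = new ArrayList<>();
    Exception lastError = null;
    for (int i = 0; i < futures.size(); i++) {
      try {
        futures.get(i).get(); // rethrows whatever the task threw
      } catch (ExecutionException e) {
        lastError = e;              // only the last error is remembered
        failed.add(batches.get(i)); // this batch will be rescheduled
      }
    }
    if (lastError != null) {
      lastError.printStackTrace();  // stand-in for the existing error handling
    }
    return failed;
  }
}
{code}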
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Affects Version/s: (was: 1.1.0.1) 1.3.0 1.1.1 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock == false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613291#comment-14613291 ] Ted Yu commented on HBASE-14017: Stephen: Jenkins machines are being upgraded. Hopefully the upgrade will be completed soon. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock == false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14020) Unsafe based optimized write in ByteBufferOutputStream
[ https://issues.apache.org/jira/browse/HBASE-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613339#comment-14613339 ] Ted Yu commented on HBASE-14020: Thanks for the fast response. +1 Unsafe based optimized write in ByteBufferOutputStream -- Key: HBASE-14020 URL: https://issues.apache.org/jira/browse/HBASE-14020 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-14020.patch, HBASE-14020_v2.patch We use this class to build the cellblock at RPC layer. The write operation is doing puts to java ByteBuffer which is having lot of overhead. Instead we can do Unsafe based copy to buffer operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-13500) Deprecate KVComparator and move to CellComparator
[ https://issues.apache.org/jira/browse/HBASE-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan resolved HBASE-13500. Resolution: Fixed Hadoop Flags: Reviewed All sub-JIRAs are closed. If anything further is needed, we can raise new JIRAs. For now, closing this parent task. Deprecate KVComparator and move to CellComparator - Key: HBASE-13500 URL: https://issues.apache.org/jira/browse/HBASE-13500 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613400#comment-14613400 ] Ted Yu commented on HBASE-13702: TestImportTsv passed with patch v3. Nice job, Apekshit. ImportTsv: Add dry-run functionality and log bad rows - Key: HBASE-13702 URL: https://issues.apache.org/jira/browse/HBASE-13702 Project: HBase Issue Type: New Feature Reporter: Apekshit Sharma Assignee: Apekshit Sharma Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13702-branch-1-v2.patch, HBASE-13702-branch-1-v3.patch, HBASE-13702-branch-1.patch, HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch The ImportTSV job skips bad records by default (keeps a count though). -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered. Being able to easily determine which rows in an input are corrupted, rather than failing on one row at a time, seems like a good feature to have. Moreover, there should be 'dry-run' functionality in such tools, which essentially does a quick run of the tool without making any changes but reports any errors/warnings and success/failure. To identify corrupted rows, simply logging them should be enough. In the worst case, all rows will be logged and the size of the logs will be the same as the input size, which seems fine. However, the user might have to do some work figuring out where the logs are. Is there some link we can show to the user when the tool starts which can help them with that? For the dry run, we can simply use if-else to skip over writing out KVs, and any other mutations, if present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
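A toy sketch of the dry-run idea from the description, with illustrative names rather than the committed ImportTsv code: bad lines are logged either way, and the mutation writes are gated behind the dry-run flag.
{code}
final class DryRunImportSketch {
  interface Emitter {
    void emit(String rowKey, String value);
  }

  private final boolean dryRun;
  private long badLineCount = 0;

  DryRunImportSketch(boolean dryRun) {
    this.dryRun = dryRun;
  }

  void processLine(String line, Emitter out) {
    String[] parts = line.split("\t");
    if (parts.length < 2 || parts[0].isEmpty()) {
      badLineCount++;
      // log the offending row so corrupted input can be located in one pass
      System.err.println("Bad line: " + line);
      return;
    }
    if (!dryRun) {
      out.emit(parts[0], parts[1]); // in the real tool this would write a Put/KeyValue
    }
  }

  long getBadLineCount() {
    return badLineCount;
  }
}
{code}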
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613282#comment-14613282 ] Stephen Yuan Jiang commented on HBASE-14017: I am fine with your approach, it's not a big deal (though I disagree - it is a little bit of over-engineering - a Read (shared) / Write (exclusive) lock is the standard practice; we don't want to expand to Insert/Update/Delete locks, so that we keep the flexibility to implement a different approach in the future - all of them are just the Write lock). Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock == false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14015) Allow setting a richer state value when toString a pv2
[ https://issues.apache.org/jira/browse/HBASE-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14015: -- Attachment: 14015.addendum.to.fix.compile.issue.on.branch-1.branch-1.2.txt Addendum I applied to fix the compile issue found by [~ashish singhi] (Thanks Ashish). Applied to branch-1 and branch-1.2. Allow setting a richer state value when toString a pv2 -- Key: HBASE-14015 URL: https://issues.apache.org/jira/browse/HBASE-14015 Project: HBase Issue Type: Improvement Components: proc-v2 Reporter: stack Assignee: stack Priority: Minor Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 0001-HBASE-14015-Allow-setting-a-richer-state-value-when-.patch, 14015.addendum.to.fix.compile.issue.on.branch-1.branch-1.2.txt Debugging, my procedure after a crash was loaded out of the store and its state was RUNNING. It would help if I knew which of the StateMachineProcedure states it was going to start RUNNING at. Chatting w/ Matteo, he suggested allowing Procedures to customize the String. Here is a patch that makes it so StateMachineProcedure will now print out the base state -- RUNNING, FINISHED -- followed by a ':' and then the StateMachineProcedure state, e.g. SimpleStateMachineProcedure state=RUNNABLE:SERVER_CRASH_ASSIGN -- This message was sent by Atlassian JIRA (v6.3.4#6332)
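A simplified, hedged sketch of the resulting behaviour (not the real Procedure/StateMachineProcedure classes): the subclass appends its state-machine step to the framework state, yielding strings like "RUNNABLE:SERVER_CRASH_ASSIGN".
{code}
abstract class ProcedureSketch {
  enum BaseState { RUNNABLE, WAITING, FINISHED }

  private BaseState state = BaseState.RUNNABLE;

  /** Subclasses may append a richer state, e.g. the state-machine step. */
  protected String toStringState() {
    return state.name();
  }

  @Override
  public String toString() {
    return getClass().getSimpleName() + " state=" + toStringState();
  }
}

class ServerCrashProcedureSketch extends ProcedureSketch {
  enum Step { SERVER_CRASH_START, SERVER_CRASH_ASSIGN, SERVER_CRASH_FINISH }

  private Step step = Step.SERVER_CRASH_ASSIGN;

  @Override
  protected String toStringState() {
    return super.toStringState() + ":" + step;
  }
}
{code}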
[jira] [Updated] (HBASE-14020) Unsafe based optimized write in ByteBufferOutputStream
[ https://issues.apache.org/jira/browse/HBASE-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-14020: --- Attachment: HBASE-14020_v2.patch Addressing Ted's comments. Thanks Ted. Unsafe based optimized write in ByteBufferOutputStream -- Key: HBASE-14020 URL: https://issues.apache.org/jira/browse/HBASE-14020 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-14020.patch, HBASE-14020_v2.patch We use this class to build the cellblock at RPC layer. The write operation is doing puts to java ByteBuffer which is having lot of overhead. Instead we can do Unsafe based copy to buffer operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-13639) SyncTable - rsync for HBase tables
[ https://issues.apache.org/jira/browse/HBASE-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-13639. Resolution: Fixed Re-resolving after commit of Hadoop 1 build fix addendum. SyncTable - rsync for HBase tables -- Key: HBASE-13639 URL: https://issues.apache.org/jira/browse/HBASE-13639 Project: HBase Issue Type: New Feature Reporter: Dave Latham Assignee: Dave Latham Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13639-0.98-addendum-hadoop-1.patch, HBASE-13639-0.98.patch, HBASE-13639-v1.patch, HBASE-13639-v2.patch, HBASE-13639-v3-0.98.patch, HBASE-13639-v3.patch, HBASE-13639.patch Given HBase tables in remote clusters with similar but not identical data, efficiently update a target table such that the data in question is identical to a source table. Efficiency in this context means using far less network traffic than would be required to ship all the data from one cluster to the other. Takes inspiration from rsync. Design doc: https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14020) Unsafe based optimized write in ByteBufferOutputStream
[ https://issues.apache.org/jira/browse/HBASE-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613349#comment-14613349 ] Anoop Sam John commented on HBASE-14020: [~stack] are you fine with this patch? Seems build machines are not yet back.. Will submit for QA run once it is back. Unsafe based optimized write in ByteBufferOutputStream -- Key: HBASE-14020 URL: https://issues.apache.org/jira/browse/HBASE-14020 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-14020.patch, HBASE-14020_v2.patch We use this class to build the cellblock at RPC layer. The write operation is doing puts to java ByteBuffer which is having lot of overhead. Instead we can do Unsafe based copy to buffer operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14020) Unsafe based optimized write in ByteBufferOutputStream
[ https://issues.apache.org/jira/browse/HBASE-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613328#comment-14613328 ] Ted Yu commented on HBASE-14020: lgtm
{code}
359   * for {@code in}. The position and limit of the {@code in} buffer to be set properly by callee.
{code}
'to be set properly by callee' -> 'should be set properly by caller'
{code}
656   * @param offset offset in the ByteBuffer
{code}
I don't see offset in putInt(ByteBuffer buffer, int val)
{code}
661       UnsafeAccess.putInt(buffer, buffer.position(), val);
662       buffer.position(buffer.position() + Bytes.SIZEOF_INT);
{code}
Looks like you can utilize the return value of UnsafeAccess.putInt() in the buffer.position() call. Unsafe based optimized write in ByteBufferOutputStream -- Key: HBASE-14020 URL: https://issues.apache.org/jira/browse/HBASE-14020 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-14020.patch We use this class to build the cellblock at RPC layer. The write operation is doing puts to java ByteBuffer which is having lot of overhead. Instead we can do Unsafe based copy to buffer operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
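A tiny sketch of that suggestion, with an illustrative stand-in for UnsafeAccess.putInt: have the put return the advanced offset and pass it straight to position().
{code}
import java.nio.ByteBuffer;

final class PutIntSketch {
  /** Stand-in for UnsafeAccess.putInt: absolute write that returns the new offset. */
  static int putInt(ByteBuffer buffer, int offset, int val) {
    buffer.putInt(offset, val);
    return offset + Integer.BYTES;
  }

  static void writeInt(ByteBuffer buffer, int val) {
    buffer.position(putInt(buffer, buffer.position(), val));
  }
}
{code}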
[jira] [Commented] (HBASE-13387) Add ByteBufferedCell an extension to Cell
[ https://issues.apache.org/jira/browse/HBASE-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613343#comment-14613343 ] Anoop Sam John commented on HBASE-13387: Yes boss.. I have done the perf test with JMH as well as PE.. The JMH test suite I will attach to this Jira ... Add ByteBufferedCell an extension to Cell - Key: HBASE-13387 URL: https://issues.apache.org/jira/browse/HBASE-13387 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Attachments: ByteBufferedCell.docx, WIP_HBASE-13387_V2.patch, WIP_ServerCell.patch This came in btw the discussion abt the parent Jira and recently Stack added as a comment on the E2E patch on the parent Jira. The idea is to add a new Interface 'ByteBufferedCell' in which we can add new buffer based getter APIs and getters for position in components in BB. We will keep this interface @InterfaceAudience.Private. When the Cell is backed by a DBB, we can create an Object implementing this new interface. The Comparators has to be aware abt this new Cell extension and has to use the BB based APIs rather than getXXXArray(). Also give util APIs in CellUtil to abstract the checks for new Cell type. (Like matchingXXX APIs, getValueAstype APIs etc) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613288#comment-14613288 ] Stephen Yuan Jiang commented on HBASE-14017: I am not sure why Jenkins is not running on the patch in master. Attaching the patch for branch-1.1 (I tested with the newly added UT; without the patch, it reproduces consistently; with the patch, the problem disappears) and trying to re-trigger the Jenkins job. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock == false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13925) Use zookeeper multi to clear znodes in ZKProcedureUtil
[ https://issues.apache.org/jira/browse/HBASE-13925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613375#comment-14613375 ] Ashish Singhi commented on HBASE-13925: --- Thanks Andrew. Just let me know if I can be any helpful here. Use zookeeper multi to clear znodes in ZKProcedureUtil -- Key: HBASE-13925 URL: https://issues.apache.org/jira/browse/HBASE-13925 Project: HBase Issue Type: Improvement Affects Versions: 0.98.13 Reporter: Ashish Singhi Assignee: Ashish Singhi Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: HBASE-13925-v1-again.patch, HBASE-13925-v1.patch, HBASE-13925.patch Address the TODO in ZKProcedureUtil clearChildZNodes() and clearZNodes methods -- This message was sent by Atlassian JIRA (v6.3.4#6332)
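For reference, the kind of change being discussed, sketched against the plain ZooKeeper client API (the real code goes through HBase's ZK utilities, and this assumes the child znodes have no further descendants):
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

public class ZkMultiDeleteSketch {
  // Delete all children of a znode in one atomic multi() call instead of one
  // delete round-trip per child.
  public static void clearChildZNodes(ZooKeeper zk, String parent)
      throws KeeperException, InterruptedException {
    List<Op> ops = new ArrayList<Op>();
    for (String child : zk.getChildren(parent, false)) {
      ops.add(Op.delete(parent + "/" + child, -1)); // -1 means any version
    }
    if (!ops.isEmpty()) {
      zk.multi(ops); // all-or-nothing on the server side
    }
  }
}
{code}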
[jira] [Comment Edited] (HBASE-13387) Add ByteBufferedCell an extension to Cell
[ https://issues.apache.org/jira/browse/HBASE-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613330#comment-14613330 ] Anoop Sam John edited comment on HBASE-13387 at 7/3/15 5:42 PM: Worked on a patch for this. Mainly in CellComparator and CellUtil matching APIs, added the ByteBufferedCell instance check. When the cell is an instance of ByteBufferedCell, we will use the getXXXByteBuffer() API rather than getXXXArray(). (Pls note that HBASE-12345 added Unsafe based compare in ByteBufferUtil to compare BBs). ByteBufferedCell is created as an interface extending Cell. Doing perf test with PE (range scan with 10K range and all cells filtered out at server) I was seeing a 7% perf down. (This is with this patch alone, which adds the extra burden of the below kind of type check in our compare methods) {code} public int compareRows(final Cell left, final Cell right) { if (left instanceof ByteBufferedCell && right instanceof ByteBufferedCell) { return ByteBufferUtils.compareTo(((ByteBufferedCell) left).getRowByteBuffer(), ((ByteBufferedCell) left).getRowPositionInByteBuffer(), left.getRowLength(), ((ByteBufferedCell) right).getRowByteBuffer(), ((ByteBufferedCell) right).getRowPositionInByteBuffer(), right.getRowLength()); } if (left instanceof ByteBufferedCell) { return ByteBufferUtils.compareTo(((ByteBufferedCell) left).getRowByteBuffer(), ((ByteBufferedCell) left).getRowPositionInByteBuffer(), left.getRowLength(), right.getRowArray(), right.getRowOffset(), right.getRowLength()); } if (right instanceof ByteBufferedCell) { return -(ByteBufferUtils.compareTo(((ByteBufferedCell) right).getRowByteBuffer(), ((ByteBufferedCell) right).getRowPositionInByteBuffer(), right.getRowLength(), left.getRowArray(), left.getRowOffset(), left.getRowLength())); } return Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(), right.getRowArray(), right.getRowOffset(), right.getRowLength()); } {code} Basically the code is like this and we are still not making the cell of type ByteBufferedCell. Then tested by changing ByteBufferedCell into an abstract class instead of an interface. The diff is quite visible. There is no perf degradation with this. Calling a non-overridden method via an interface type seems to perform worse. Should be related to java run time optimization and inlining. was (Author: anoop.hbase): Worked on a patch for this. Mainly in CellComparator and CellUtil matching APIs, added the ByteBufferedCell instance check. When the cell is an instance of ByteBufferedCell, we will use the getXXXByteBuffer() API rather than getXXXArray(). (Pls note that HBASE-12345 added Unsafe based compare in ByteBufferUtil to compare BBs). ByteBufferedCell is created as an interface extending Cell. Doing perf test with PE (range scan with 10K range and all cells filtered out at server) I was seeing a 7% perf down. 
{code} public int compareRows(final Cell left, final Cell right) { if (left instanceof ByteBufferedCell && right instanceof ByteBufferedCell) { return ByteBufferUtils.compareTo(((ByteBufferedCell) left).getRowByteBuffer(), ((ByteBufferedCell) left).getRowPositionInByteBuffer(), left.getRowLength(), ((ByteBufferedCell) right).getRowByteBuffer(), ((ByteBufferedCell) right).getRowPositionInByteBuffer(), right.getRowLength()); } if (left instanceof ByteBufferedCell) { return ByteBufferUtils.compareTo(((ByteBufferedCell) left).getRowByteBuffer(), ((ByteBufferedCell) left).getRowPositionInByteBuffer(), left.getRowLength(), right.getRowArray(), right.getRowOffset(), right.getRowLength()); } if (right instanceof ByteBufferedCell) { return -(ByteBufferUtils.compareTo(((ByteBufferedCell) right).getRowByteBuffer(), ((ByteBufferedCell) right).getRowPositionInByteBuffer(), right.getRowLength(), left.getRowArray(), left.getRowOffset(), left.getRowLength())); } return Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(), right.getRowArray(), right.getRowOffset(), right.getRowLength()); } {code} Basically the code is like this and we are still not making the cell of type ByteBufferedCell. Then tested by changing ByteBufferedCell into an abstract class instead of an interface. The diff is quite visible. There is no perf degradation with this. Calling a non-overridden method via an interface type seems to perform worse. Should be related to java run time optimization and inlining. Add ByteBufferedCell an extension to Cell - Key: HBASE-13387 URL: https://issues.apache.org/jira/browse/HBASE-13387 Project: HBase Issue Type: Sub-task
[jira] [Resolved] (HBASE-14018) RegionServer is aborted when flushing memstore.
[ https://issues.apache.org/jira/browse/HBASE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-14018. Resolution: Invalid Please mail u...@hbase.apache.org for troubleshooting advice and help. This is the project dev tracker. Thanks! RegionServer is aborted when flushing memstore. --- Key: HBASE-14018 URL: https://issues.apache.org/jira/browse/HBASE-14018 Project: HBase Issue Type: Bug Components: hadoop2, hbase Affects Versions: 1.0.1.1 Environment: CentOS x64 Server Reporter: Dinh Duong Mai Attachments: hbase-hadoop-master-node1.vmcluster.log, hbase-hadoop-regionserver-node1.vmcluster.log, hbase-hadoop-zookeeper-node1.vmcluster.log + Pseudo-distributed Hadoop (2.6.0), ZK_HBASE_MANAGE = true (1 master, 1 regionserver). + Put data to OpenTSDB, 1000 records / s, for 2000 seconds. + RegionServer is aborted. === RegionServer logs === 2015-07-03 16:37:37,332 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1623, hits=172, hitRatio=10.60%, , cachingAccesses=177, cachingHits=151, cachingHitsRatio=85.31%, evictions=1139, evicted=21, evictedPerRun=0.018437225371599197 2015-07-03 16:37:37,898 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2744, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 19207814 2015-07-03 16:42:37,331 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1624, hits=173, hitRatio=10.65%, , cachingAccesses=178, cachingHits=152, cachingHitsRatio=85.39%, evictions=1169, evicted=21, evictedPerRun=0.01796407252550125 2015-07-03 16:42:37,899 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 3049, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 33026416 2015-07-03 16:43:27,217 INFO [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d., current region memstore size 128.05 MB 2015-07-03 16:43:27,899 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: ABORTING region server node1.vmcluster,16040,1435897652505: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) ... 7 more 2015-07-03 16:43:27,901 FATAL [MemStoreFlusher.1]
[jira] [Commented] (HBASE-13500) Deprecate KVComparator and move to CellComparator
[ https://issues.apache.org/jira/browse/HBASE-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613345#comment-14613345 ] Anoop Sam John commented on HBASE-13500: Great.. This was a nice clean up.. We changed all the other places to Cell, yet this area still had KV style handling.. Thanks.. Deprecate KVComparator and move to CellComparator - Key: HBASE-13500 URL: https://issues.apache.org/jira/browse/HBASE-13500 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13861) BucketCacheTmpl.jamon has wrong bucket free and used labels
[ https://issues.apache.org/jira/browse/HBASE-13861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613344#comment-14613344 ] Andrew Purtell commented on HBASE-13861: bq. Isn't this basically the /jmx endpoint? I would hope so. BucketCacheTmpl.jamon has wrong bucket free and used labels --- Key: HBASE-13861 URL: https://issues.apache.org/jira/browse/HBASE-13861 Project: HBase Issue Type: Bug Components: regionserver, UI Affects Versions: 1.1.0 Reporter: Lars George Assignee: Matt Warhaftig Labels: beginner Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2 Attachments: hbase-13861-v1.patch See this from the template, and note the label and actual values for the last two columns. {noformat} <table class="table table-striped"> <tr> <th>Bucket Offset</th> <th>Allocation Size</th> <th>Free Count</th> <th>Used Count</th> </tr> <%for Bucket bucket: buckets %> <tr> <td><% bucket.getBaseOffset() %></td> <td><% bucket.getItemAllocationSize() %></td> <td><% bucket.getFreeBytes() %></td> <td><% bucket.getUsedBytes() %></td> </tr> {noformat} They are labeled counts but are bytes, duh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12988) [Replication]Parallel apply edits across regions
[ https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613361#comment-14613361 ] Andrew Purtell commented on HBASE-12988: bq. In 1.1+ I plan to default the number of threads to 10. Before that (1.0 and 0.98) to 1, so that the behavior and performance characteristics do not change. Sounds reasonable. Can you pull this out into a constant and javadoc it? Should go into hbase-defaults.xml too I think. {quote} {code} replication.source.maxthreads {code} {quote} What do you think about DEBUG level logging where we are submitting futures and then handling errors here? {quote} {code} +List<Future<Integer>> futures = new ArrayList<Future<Integer>>(entryLists.size()); +for (int i=0; i<entryLists.size(); i++) { + if (!entryLists.get(i).isEmpty()) { +futures.add(exec.submit(new Replicator(entryLists.get(i), i))); + } +} +IOException iox = null; +for (Future<Integer> f : futures) { + try { +// wait for all futures, remove successful parts +entryLists.remove(f.get()); + } catch (InterruptedException ie) { +iox = new IOException(ie); + } catch (ExecutionException ee) { +// cause must be an IOException +iox = (IOException)ee.getCause(); + } +} +if (iox != null) { + // if we had any exception, try again + throw iox; +} {code} {quote} Have you by chance had a chance to run ITR with the latest changes applied? It's a bit of a PITA to set up, you'll need to have two single node clusters running on the same node as minimum. If not don't worry I'll do it as part of checking the 0.98.14 RC (this change will be in it for sure). [Replication]Parallel apply edits across regions Key: HBASE-12988 URL: https://issues.apache.org/jira/browse/HBASE-12988 Project: HBase Issue Type: Improvement Components: Replication Reporter: hongyu bi Assignee: Lars Hofhansl Attachments: 12988-v2.txt, 12988-v3.txt, 12988-v4.txt, 12988.txt, HBASE-12988-0.98.patch, ParallelReplication-v2.txt we can apply edits to slave cluster in parallel on table-level to speed up replication . update : per conversation blow , it's better to apply edits on row-level in parallel -- This message was sent by Atlassian JIRA (v6.3.4#6332)
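A hedged sketch of the "pull this out into a constant" suggestion; the constant name, placement, and default chosen in the actual patch may differ from what is shown here:
{code}
import org.apache.hadoop.conf.Configuration;

public class ReplicationSourceConfigSketch {
  public static final String REPLICATION_SOURCE_MAXTHREADS_KEY =
      "replication.source.maxthreads";
  // 10 matches the default discussed above for 1.1+; 0.98/1.0 would keep 1
  // so behavior does not change on those branches.
  public static final int REPLICATION_SOURCE_MAXTHREADS_DEFAULT = 10;

  public static int getMaxReplicationThreads(Configuration conf) {
    return conf.getInt(REPLICATION_SOURCE_MAXTHREADS_KEY,
        REPLICATION_SOURCE_MAXTHREADS_DEFAULT);
  }
}
{code}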
[jira] [Commented] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613367#comment-14613367 ] Ted Yu commented on HBASE-12213: For MultiByteBufferInputStream : {code} +if (len <= 0) { + return 0; +} {code} The above check can be lifted to beginning of method. {code} + public long skip(long n) { +long k = Math.min(n, available()); +if (k < 0) { + k = 0; +} {code} When k is 0, we can directly return from method, right ? HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch, HBASE-12213_2.patch In L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock need single BB, we will end up in recreation of bigger BB and copying. Instead we can make HFileBlock to serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
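A self-contained toy version of the two review points above, over a single ByteBuffer rather than the multi-buffer stream in the patch: hoist the len <= 0 check to the top of read(), and return from skip() as soon as there is nothing to skip.
{code}
import java.nio.ByteBuffer;

class ByteBufferInputStreamSketch {
  private final ByteBuffer buf;

  ByteBufferInputStreamSketch(ByteBuffer buf) {
    this.buf = buf;
  }

  int available() {
    return buf.remaining();
  }

  int read(byte[] b, int off, int len) {
    if (len <= 0) {
      return 0;                      // checked before any other work
    }
    int avail = available();
    if (avail <= 0) {
      return -1;
    }
    int toRead = Math.min(len, avail);
    buf.get(b, off, toRead);
    return toRead;
  }

  long skip(long n) {
    long k = Math.min(n, available());
    if (k <= 0) {
      return 0;                      // nothing to skip, return directly
    }
    buf.position(buf.position() + (int) k);
    return k;
  }
}
{code}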
[jira] [Updated] (HBASE-13978) Variable never assigned in SimpleTotalOrderPartitioner.getPartition()
[ https://issues.apache.org/jira/browse/HBASE-13978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-13978: --- Fix Version/s: 0.98.14 Picked to 0.98 Variable never assigned in SimpleTotalOrderPartitioner.getPartition() -- Key: HBASE-13978 URL: https://issues.apache.org/jira/browse/HBASE-13978 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 1.1.0.1 Reporter: Lars George Assignee: Bhupendra Kumar Jain Labels: beginner Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: 0001-HBASE-13978-Variable-never-assigned-in-SimpleTotalOr.patch See https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/SimpleTotalOrderPartitioner.java#L104, which has an {{if}} statement that tries to limit the code to run only once, but since it does not assign {{this.lastReduces}} it will always trigger and recompute the splits (and log them). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
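A hedged sketch of the fix shape described above (the committed change is in the attached patch): cache the reduce count the splits were computed for, so the recompute-and-log branch really runs only when the number of reducers changes.
{code}
import org.apache.hadoop.hbase.util.Bytes;

class SplitCacheSketch {
  private byte[][] splits;
  private int lastReduces = -1;

  byte[][] splitsFor(byte[] startkey, byte[] endkey, int reduces) {
    if (lastReduces != reduces) {
      splits = Bytes.split(startkey, endkey, reduces - 1);
      lastReduces = reduces;  // the assignment that was missing in getPartition()
    }
    return splits;
  }
}
{code}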
[jira] [Updated] (HBASE-13498) Add more docs and a basic check for storage policy handling
[ https://issues.apache.org/jira/browse/HBASE-13498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13498: Fix Version/s: (was: 1.2.0) Add more docs and a basic check for storage policy handling --- Key: HBASE-13498 URL: https://issues.apache.org/jira/browse/HBASE-13498 Project: HBase Issue Type: Sub-task Components: wal Reporter: Sean Busbey Assignee: Sean Busbey Priority: Minor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13498.1.patch.txt, HBASE-13498.2.patch.txt some minor clean up: * make sure our javadocs contain enough info for someone unfamiliar with HDFS storage policies to get started. * add a basic test that verifies things happily continue on non-supported versions, or with non-supported policies * clarify some log messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13490) foreground daemon start re-executes ulimit output
[ https://issues.apache.org/jira/browse/HBASE-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13490: Fix Version/s: (was: 1.2.0) foreground daemon start re-executes ulimit output - Key: HBASE-13490 URL: https://issues.apache.org/jira/browse/HBASE-13490 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 2.0.0, 1.1.0, 0.98.11 Reporter: Y. SREENIVASULU REDDY Assignee: Y. SREENIVASULU REDDY Priority: Minor Fix For: 2.0.0, 1.1.0, 0.98.13 Attachments: HBASE-13490.patch Command execution fails while starting HBase processes (HMaster or RegionServer) using the hbase-daemon.sh file: the ulimit command output is executed as a command. {code} echo "`date` Starting $command on `hostname`" >> ${HBASE_LOGLOG} `ulimit -a` >> "$HBASE_LOGLOG" 2>&1 {code} The log message is as follows. {noformat} Thu Apr 16 19:24:25 IST 2015 Starting regionserver on HOST-10 /opt/hdfsdata/HA/install/hbase/regionserver/bin/hbase-daemon.sh: line 207: core: command not found {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13499) AsyncRpcClient test cases failure in powerpc
[ https://issues.apache.org/jira/browse/HBASE-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13499: Fix Version/s: (was: 1.2.0) AsyncRpcClient test cases failure in powerpc Key: HBASE-13499 URL: https://issues.apache.org/jira/browse/HBASE-13499 Project: HBase Issue Type: Bug Components: IPC/RPC Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: sangamesh Assignee: Duo Zhang Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13499.patch The new AsyncRpcClient feature added through the jira defect HBASE-12684 causing some test cases failures in powerpc64 environment. I am testing it in master branch. Looks like the version of netty (4.0.23) doesn't provide a support for non amd64 platforms and suggested to use pure java netty Here is the discussion on that https://github.com/aphyr/riemann/pull/508 So new Async test cases will fail in ppc64 and other non amd64 platforms too. Here is the output of the error. Running org.apache.hadoop.hbase.ipc.TestAsyncIPC Tests run: 24, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 2.802 sec FAILURE! - in org.apache.hadoop.hbase.ipc.TestAsyncIPC testRTEDuringAsyncConnectionSetup[3](org.apache.hadoop.hbase.ipc.TestAsyncIPC) Time elapsed: 0.048 sec ERROR! java.lang.UnsatisfiedLinkError: /tmp/libnetty-transport-native-epoll4286512618055650929.so: /tmp/libnetty-transport-native-epoll4286512618055650929.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a Power PC 64-bit platform) at java.lang.ClassLoader$NativeLibrary.load(Native Method) at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1965) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13491) Issue in FuzzyRowFilter#getNextForFuzzyRule
[ https://issues.apache.org/jira/browse/HBASE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13491: Fix Version/s: (was: 1.2.0) Issue in FuzzyRowFilter#getNextForFuzzyRule --- Key: HBASE-13491 URL: https://issues.apache.org/jira/browse/HBASE-13491 Project: HBase Issue Type: Bug Components: Filters Affects Versions: 1.0.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.13 Attachments: HBASE-13491-branch-1.1.patch, HBASE-13491-branch-1.1.patch, HBASE-13491.patch {code} for (int i = 0; i < result.length; i++) { if (i >= fuzzyKeyMeta.length || fuzzyKeyMeta[i] == 1) { result[i] = row[offset + i]; if (!order.isMax(row[i])) { // this is non-fixed position and is not at max value, hence we can increase it toInc = i; } } {code} See we take the row bytes without considering the row offset. The test cases are passing as we pass row bytes with a 0 offset. A change in the test will reveal the bug. Came across this when I was working on HBASE-11425 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
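A hedged sketch of the offset-aware loop the description calls for, reusing the variable names from the quoted snippet; Order here is a stand-in interface for FuzzyRowFilter's internal row-order helper, and the real fix is in the attached patches.
{code}
class FuzzyRowSketch {
  interface Order {
    boolean isMax(byte b);
  }

  static int nextForFuzzyRule(byte[] row, int offset, byte[] fuzzyKeyMeta,
      byte[] result, Order order) {
    int toInc = -1;
    for (int i = 0; i < result.length; i++) {
      if (i >= fuzzyKeyMeta.length || fuzzyKeyMeta[i] == 1) {
        result[i] = row[offset + i];
        if (!order.isMax(row[offset + i])) {  // was row[i]: the offset was dropped
          toInc = i;                          // non-fixed position, can still be increased
        }
      }
    }
    return toInc;
  }
}
{code}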
[jira] [Updated] (HBASE-13515) Handle FileNotFoundException in region replica replay for flush/compaction events
[ https://issues.apache.org/jira/browse/HBASE-13515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13515: Fix Version/s: (was: 1.2.0) Handle FileNotFoundException in region replica replay for flush/compaction events - Key: HBASE-13515 URL: https://issues.apache.org/jira/browse/HBASE-13515 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 1.1.0 Attachments: hbase-13515_v1.patch, hbase-13515_v1.patch I had this patch laying around that somehow dropped from my plate. We should skip replaying compaction / flush and region open event markers if the files (from flush or compaction) can no longer be found from the secondary. If we do not skip, the replay will be retried forever, effectively blocking the replication further. Bulk load already does this, we just need to do it for flush / compaction and region open events as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13516) Increase PermSize to 128MB
[ https://issues.apache.org/jira/browse/HBASE-13516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13516: Fix Version/s: (was: 1.2.0) Increase PermSize to 128MB -- Key: HBASE-13516 URL: https://issues.apache.org/jira/browse/HBASE-13516 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 1.1.0 Attachments: hbase-13516_v1.patch, hbase-13516_v2.patch HBase uses ~40MB, and with Phoenix we use ~56MB of Perm space out of 64MB by default. Every Filter and Coprocessor increases that. Running out of perm space triggers a stop the world full GC of the entire heap. We have seen this in misconfigured cluster. Should we default to {{-XX:PermSize=128m -XX:MaxPermSize=128m}} out of the box as a convenience for users? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13514) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout
[ https://issues.apache.org/jira/browse/HBASE-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13514: Fix Version/s: (was: 1.2.0) Fix test failures in TestScannerHeartbeatMessages caused by incorrect setting of hbase.rpc.timeout -- Key: HBASE-13514 URL: https://issues.apache.org/jira/browse/HBASE-13514 Project: HBase Issue Type: Sub-task Affects Versions: 1.1.0, 1.2.0 Reporter: Jonathan Lawlor Assignee: Jonathan Lawlor Priority: Minor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13514-branch-1.1.patch, HBASE-13514-branch-1.patch, HBASE-13514.patch The test inside TestScannerHeartbeatMessages is failing because the configured value of hbase.rpc.timeout cannot be less than 2 seconds in branch-1 and branch-1.1 but the test expects that it can be set to 0.5 seconds. This is because of the field MIN_RPC_TIMEOUT in {{RpcRetryingCaller}} which exists in branch-1 and branch-1.1 but is no longer in master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13518) Typo in hbase.hconnection.meta.lookup.threads.core parameter
[ https://issues.apache.org/jira/browse/HBASE-13518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13518: Fix Version/s: (was: 1.2.0) Typo in hbase.hconnection.meta.lookup.threads.core parameter Key: HBASE-13518 URL: https://issues.apache.org/jira/browse/HBASE-13518 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Devaraj Das Fix For: 2.0.0, 1.1.0 Attachments: 13518-1.branch-1.patch, 13518-1.txt A possible typo coming from patch in HBASE-13036. I think we want {{hbase.hconnection.meta.lookup.threads.core}}, not {{hbase.hconnection.meta.lookup.threads.max.core}} to be in line with the regular thread pool configuration. {code} //To start with, threads.max.core threads can hit the meta (including replicas). //After that, requests will get queued up in the passed queue, and only after //the queue is full, a new thread will be started this.metaLookupPool = getThreadPool( conf.getInt("hbase.hconnection.meta.lookup.threads.max", 128), conf.getInt("hbase.hconnection.meta.lookup.threads.max.core", 10), {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
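A hedged sketch of the intended key pair from the comment above: the second lookup should use "...threads.core" rather than "...threads.max.core". The surrounding names are illustrative only.
{code}
import org.apache.hadoop.conf.Configuration;

class MetaLookupPoolConfigSketch {
  static final String META_LOOKUP_THREADS_MAX =
      "hbase.hconnection.meta.lookup.threads.max";
  static final String META_LOOKUP_THREADS_CORE =
      "hbase.hconnection.meta.lookup.threads.core";   // no ".max.core"

  static int[] poolSizes(Configuration conf) {
    int maxThreads = conf.getInt(META_LOOKUP_THREADS_MAX, 128);
    int coreThreads = conf.getInt(META_LOOKUP_THREADS_CORE, 10);
    return new int[] { maxThreads, coreThreads };
  }
}
{code}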
[jira] [Updated] (HBASE-13517) Publish a client artifact with shaded dependencies
[ https://issues.apache.org/jira/browse/HBASE-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13517: Fix Version/s: (was: 1.2.0) Publish a client artifact with shaded dependencies -- Key: HBASE-13517 URL: https://issues.apache.org/jira/browse/HBASE-13517 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 1.1.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13517-v1.patch, HBASE-13517-v2.patch, HBASE-13517-v3.patch, HBASE-13517.patch Guava's moved on. Hadoop has not. Jackson moves whenever it feels like it. Protobuf moves with breaking point changes. While shading all of the time would break people that require the transitive dependencies for MR or other things. Lets provide an artifact with our dependencies shaded. Then users can have the choice to use the shaded version or the non-shaded version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613517#comment-14613517 ] Lars Hofhansl commented on HBASE-7912: -- [~eclark] had mentioned something he did with incremental backups using snapshots once. HBase Backup/Restore Based on HBase Snapshot Key: HBASE-7912 URL: https://issues.apache.org/jira/browse/HBASE-7912 Project: HBase Issue Type: Sub-task Reporter: Richard Ding Assignee: Vladimir Rodionov Labels: backup Fix For: 2.0.0 Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf Finally, we completed the implementation of our backup/restore solution, and would like to share with community through this jira. We are leveraging existing hbase snapshot feature, and provide a general solution to common users. Our full backup is using snapshot to capture metadata locally and using exportsnapshot to move data to another cluster; the incremental backup is using offline-WALplayer to backup HLogs; we also leverage global distribution rolllog and flush to improve performance; other added-on values such as convert, merge, progress report, and CLI commands. So that a common user can backup hbase data without in-depth knowledge of hbase. Our solution also contains some usability features for enterprise users. The detail design document and CLI command will be attached in this jira. We plan to use 10~12 subtasks to share each of the following features, and document the detail implement in the subtasks: * *Full Backup* : provide local and remote back/restore for a list of tables * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental backup) * *distributed* Logroll and distributed flush * Backup *Manifest* and history * *Incremental* backup: to build on top of full backup as daily/weekly backup * *Convert* incremental backup WAL files into hfiles * *Merge* several backup images into one(like merge weekly into monthly) * *add and remove* table to and from Backup image * *Cancel* a backup process * backup progress *status* * full backup based on *existing snapshot* *-* *Below is the original description, to keep here as the history for the design and discussion back in 2013* There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there are many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency. A common practice of backup and restore in database is to first take full baseline backup, and then periodically take incremental backup that capture the changes since the full baseline backup. HBase cluster can store massive amount data. Combination of full backups with incremental backups has tremendous benefit for HBase as well. The following is a typical scenario for full and incremental backup. # The user takes a full backup of a table or a set of tables in HBase. # The user schedules periodical incremental backups to capture the changes from the full backup, or from last incremental backup. # The user needs to restore table data to a past point of time. # The full backup is restored to the table(s) or to different table name(s). 
Then the incremental backups that are up to the desired point in time are applied on top of the full backup. We would support the following key features and capabilities. * Full backup uses HBase snapshot to capture HFiles. * Use HBase WALs to capture incremental changes, but we use bulk load of HFiles for fast incremental restore. * Support single table or a set of tables, and column family level backup and restore. * Restore to different table names. * Support adding additional tables or CF to backup set without interruption of incremental backup schedule. * Support rollup/combining of incremental backups into longer period and bigger incremental backups. * Unified command line interface for all the above. The solution will support HBase backup to FileSystem, either on the same cluster or across clusters. It has the flexibility to support backup to other devices and servers in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13446) Add docs warning about missing data for downstream on versions prior to HBASE-13262
[ https://issues.apache.org/jira/browse/HBASE-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-13446: -- Assignee: (was: Lars Hofhansl) Add docs warning about missing data for downstream on versions prior to HBASE-13262 --- Key: HBASE-13446 URL: https://issues.apache.org/jira/browse/HBASE-13446 Project: HBase Issue Type: Task Components: documentation Affects Versions: 0.98.0, 1.0.0 Reporter: Sean Busbey Priority: Critical Fix For: 2.0.0, 0.98.14, 1.0.2 From conversation at the end of HBASE-13262: [~davelatham] {quote} Should we put a warning somewhere (mailing list? book?) about this? Something like: IF (client OR server is >= 0.98.11/1.0.0) AND server has a smaller value for hbase.client.scanner.max.result.size than client does, THEN scan requests that reach the server's hbase.client.scanner.max.result.size are likely to miss data. In particular, 0.98.11 defaults hbase.client.scanner.max.result.size to 2MB but other versions default to larger values, so be very careful using 0.98.11 servers with any other client version. {quote} [~busbey] {quote} How about we add a note in the ref guide for upgrades and for troubleshooting? {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13352) Add hbase.import.version to Import usage.
[ https://issues.apache.org/jira/browse/HBASE-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613520#comment-14613520 ] Lars Hofhansl commented on HBASE-13352: --- +1 Add hbase.import.version to Import usage. - Key: HBASE-13352 URL: https://issues.apache.org/jira/browse/HBASE-13352 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 2.0.0, 0.98.14, 1.0.2, 1.1.2, 1.3.0, 1.2.1 Attachments: 13352-v2.txt, 13352.txt, hbase-13352_v3.patch We just tried to export some (small amount of) data out of a 0.94 cluster to a 0.98 cluster. We used Export/Import for that. By default we found that the import M/R job correctly reports the number of records seen, but _silently_ does not import anything. After looking at the 0.98 code it's obvious there's an hbase.import.version option (-Dhbase.import.version=0.94) to make this work. Two issues: # -Dhbase.import.version=0.94 should be shown with the Import usage # If not given, it should not just silently import nothing In this issue I'll just trivially add this option to the Import tool's usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613541#comment-14613541 ] Matteo Bertozzi commented on HBASE-7912: [~lhofhansl] snapshots are already incremental in case you don't compact. you just export the new files. I don't think Elliott did more than that. but this is different, here we are incremental because we copy the WAL. In case of no compaction the snapshot-incremental is probably better but as soon you start having compactions you have to copy the compacted files with the data you already have exported, since snapshot are only file based (so we may copy more data compared to the set of wals we are copying). HBase Backup/Restore Based on HBase Snapshot Key: HBASE-7912 URL: https://issues.apache.org/jira/browse/HBASE-7912 Project: HBase Issue Type: Sub-task Reporter: Richard Ding Assignee: Vladimir Rodionov Labels: backup Fix For: 2.0.0 Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf Finally, we completed the implementation of our backup/restore solution, and would like to share with community through this jira. We are leveraging existing hbase snapshot feature, and provide a general solution to common users. Our full backup is using snapshot to capture metadata locally and using exportsnapshot to move data to another cluster; the incremental backup is using offline-WALplayer to backup HLogs; we also leverage global distribution rolllog and flush to improve performance; other added-on values such as convert, merge, progress report, and CLI commands. So that a common user can backup hbase data without in-depth knowledge of hbase. Our solution also contains some usability features for enterprise users. The detail design document and CLI command will be attached in this jira. We plan to use 10~12 subtasks to share each of the following features, and document the detail implement in the subtasks: * *Full Backup* : provide local and remote back/restore for a list of tables * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental backup) * *distributed* Logroll and distributed flush * Backup *Manifest* and history * *Incremental* backup: to build on top of full backup as daily/weekly backup * *Convert* incremental backup WAL files into hfiles * *Merge* several backup images into one(like merge weekly into monthly) * *add and remove* table to and from Backup image * *Cancel* a backup process * backup progress *status* * full backup based on *existing snapshot* *-* *Below is the original description, to keep here as the history for the design and discussion back in 2013* There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there are many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency. A common practice of backup and restore in database is to first take full baseline backup, and then periodically take incremental backup that capture the changes since the full baseline backup. HBase cluster can store massive amount data. Combination of full backups with incremental backups has tremendous benefit for HBase as well. 
The following is a typical scenario for full and incremental backup. # The user takes a full backup of a table or a set of tables in HBase. # The user schedules periodical incremental backups to capture the changes from the full backup, or from last incremental backup. # The user needs to restore table data to a past point of time. # The full backup is restored to the table(s) or to different table name(s). Then the incremental backups that are up to the desired point in time are applied on top of the full backup. We would support the following key features and capabilities. * Full backup uses HBase snapshot to capture HFiles. * Use HBase WALs to capture incremental changes, but we use bulk load of HFiles for fast incremental restore. * Support single table or a set of tables, and column family level backup and restore. * Restore to different table names. * Support adding additional tables or CF to backup set without interruption of incremental backup schedule. * Support rollup/combining of incremental backups into longer period
[jira] [Updated] (HBASE-13417) batchCoprocessorService() does not handle NULL keys
[ https://issues.apache.org/jira/browse/HBASE-13417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13417: Fix Version/s: (was: 1.2.0) batchCoprocessorService() does not handle NULL keys --- Key: HBASE-13417 URL: https://issues.apache.org/jira/browse/HBASE-13417 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 1.0.0 Reporter: Lars George Assignee: Abhishek Singh Chouhan Priority: Minor Labels: beginner Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13417.patch The JavaDoc for {{batchCoprocessorService()}} reads: {noformat} * @param startKey * start region selection with region containing this row. If {@code null}, the * selection will start with the first table region. * @param endKey * select regions up to and including the region containing this row. If {@code null}, * selection will continue through the last table region. {noformat} Setting the call to {{null}} keys like so {code} Map<byte[], CountResponse> results = table.batchCoprocessorService( RowCountService.getDescriptor().findMethodByName("getRowCount"), request, null, null, CountResponse.getDefaultInstance()); {code} yields an exception: {noformat} java.lang.NullPointerException at org.apache.hadoop.hbase.util.Bytes.compareTo(Bytes.java:1187) at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:744) at org.apache.hadoop.hbase.client.HTable.getKeysAndRegionsInRange(HTable.java:723) at org.apache.hadoop.hbase.client.HTable.batchCoprocessorService(HTable.java:1801) at org.apache.hadoop.hbase.client.HTable.batchCoprocessorService(HTable.java:1778) at coprocessor.EndpointBatchExample.main(EndpointBatchExample.java:60) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) {noformat} This is caused by the call shipping off the keys to {{getStartKeysInRange()}} as-is and that method uses {{Bytes.compareTo()}} on the {{null}} keys and fails. One can work around using {{HConstants.EMPTY_START_ROW, HConstants.EMPTY_END_ROW}} instead, but that is not documented, nor standard behavior. Need to handle {{null}} keys as advertised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
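A hedged sketch of the advertised null handling: normalize null boundaries to the empty start/end rows before they ever reach Bytes.compareTo(). The actual change is in the attached patch.
{code}
import org.apache.hadoop.hbase.HConstants;

class KeyRangeSketch {
  static byte[] normalizeStartKey(byte[] startKey) {
    return startKey == null ? HConstants.EMPTY_START_ROW : startKey;
  }

  static byte[] normalizeEndKey(byte[] endKey) {
    return endKey == null ? HConstants.EMPTY_END_ROW : endKey;
  }
}
{code}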
[jira] [Commented] (HBASE-12091) Optionally ignore edits for dropped tables for replication.
[ https://issues.apache.org/jira/browse/HBASE-12091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613529#comment-14613529 ] Lars Hofhansl commented on HBASE-12091: --- Another option to do this: Wait until we actually get a TableNotFound exception from the sink. When that happens, check whether that table exists on the source and ignore all edits for this table on the retry. The disadvantage is that we're trying at least twice in this case (but the case is rare). Optionally ignore edits for dropped tables for replication. --- Key: HBASE-12091 URL: https://issues.apache.org/jira/browse/HBASE-12091 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl We just ran into a scenario where we dropped a table from both the source and the sink, but the source still has outstanding edits that it now cannot get rid of. Now all replication is backed up behind these unreplicatable edits. We should have an option to ignore edits for tables dropped at the source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14022) TestMultiTableSnapshotInputFormatImpl uses a class only available in JRE 1.7+
Andrew Purtell created HBASE-14022: -- Summary: TestMultiTableSnapshotInputFormatImpl uses a class only available in JRE 1.7+ Key: HBASE-14022 URL: https://issues.apache.org/jira/browse/HBASE-14022 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 0.98.14 Only applicable to 0.98. Another instance where minimum supported versions of the JRE/JDK and Hadoop lag far behind current committer dev tooling. Fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613446#comment-14613446 ] Andrew Purtell commented on HBASE-13991: bq. I think we agree that it may make sense to switchover entirely to the new layout instead of making it optional. +1 Agree there's broad agreement on that. Looks like also we should be able to split the need to haves from the nice to haves to scope work into increments and make it possible for some changes to go further back than master. Hierarchical Layout for Humongous Tables Key: HBASE-13991 URL: https://issues.apache.org/jira/browse/HBASE-13991 Project: HBase Issue Type: Sub-task Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf Add support for humongous tables via a hierarchical layout for regions on filesystem. Credit for most of this code goes to Huaiyu Zhu. Latest version of the patch is available on the review board: https://reviews.apache.org/r/36029/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13482) Phoenix is failing to scan tables on secure environments.
[ https://issues.apache.org/jira/browse/HBASE-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13482: Fix Version/s: (was: 1.2.0) Phoenix is failing to scan tables on secure environments. -- Key: HBASE-13482 URL: https://issues.apache.org/jira/browse/HBASE-13482 Project: HBase Issue Type: Bug Reporter: Alicia Ying Shu Assignee: Alicia Ying Shu Fix For: 2.0.0, 1.1.0, 0.98.13 Attachments: Hbase-13482-v1.patch, Hbase-13482.patch When executed on secure environments, phoenix query is getting the following exception message: java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.security.AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: User 'null' is not the scanner owner! org.apache.hadoop.hbase.security.access.AccessController.requireScannerOwner(AccessController.java:2048) org.apache.hadoop.hbase.security.access.AccessController.preScannerNext(AccessController.java:2022) org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$53.call(RegionCoprocessorHost.java:1336) org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1671) org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746) org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1720) org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preScannerNext(RegionCoprocessorHost.java:1331) org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2227) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13496) Make Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo inlineable
[ https://issues.apache.org/jira/browse/HBASE-13496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13496: Fix Version/s: (was: 1.2.0) Make Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo inlineable - Key: HBASE-13496 URL: https://issues.apache.org/jira/browse/HBASE-13496 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0, 1.1.0, 1.0.2 Attachments: ByteBufferUtils.java, HBASE-13496.patch, HBASE-13496.patch, OffheapVsOnHeapCompareTest.java, onheapoffheapcompare.tgz While testing with some other perf comparisons I have noticed that the above method (which is very hot in read path) is not getting inline bq.@ 16 org.apache.hadoop.hbase.util.Bytes$LexicographicalComparerHolder$UnsafeComparer::compareTo (364 bytes) hot method too big We can do minor refactoring to make it inlineable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
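An illustration, not the attached patch, of the kind of minor refactoring meant by "make it inlineable": keep the hot entry point small so it fits the JIT's frequent-inline budget (roughly 325 bytecodes by default on HotSpot) and move the bulky comparison body into its own method.
{code}
class CompareSketch {
  static int compareTo(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    if (a == b && aOff == bOff && aLen == bLen) {
      return 0;                                  // tiny fast path, easily inlined
    }
    return compareToUnequal(a, aOff, aLen, b, bOff, bLen);
  }

  private static int compareToUnequal(byte[] a, int aOff, int aLen,
      byte[] b, int bOff, int bLen) {
    int min = Math.min(aLen, bLen);
    for (int i = 0; i < min; i++) {              // stand-in for the Unsafe word-at-a-time loop
      int diff = (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return aLen - bLen;
  }
}
{code}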
[jira] [Updated] (HBASE-13431) Allow to skip store file range check based on column family while creating reference files in HRegionFileSystem#splitStoreFile
[ https://issues.apache.org/jira/browse/HBASE-13431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13431: Fix Version/s: (was: 1.2.0) Allow to skip store file range check based on column family while creating reference files in HRegionFileSystem#splitStoreFile -- Key: HBASE-13431 URL: https://issues.apache.org/jira/browse/HBASE-13431 Project: HBase Issue Type: Improvement Reporter: Rajeshbabu Chintaguntla Assignee: Rajeshbabu Chintaguntla Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13431.patch, HBASE-13431_branch-1.0.patch, HBASE-13431_branch-1.1.patch, HBASE-13431_branch-1.patch, HBASE-13431_v2.patch, HBASE-13431_v3.patch, HBASE-13431_v4.patch APPROACH #3 at PHOENIX-1734 helps to implement local indexing without much changes in HBase. For split we need one kernel change to allow creating both top and bottom reference files for index column family store files even when the split key not in the storefile key range. The changes helps in this case are 1) pass boolean to HRegionFileSystem#splitStoreFile to allow to skip the storefile key range check. 2) move the splitStoreFile with extra boolean parameter to the new interface introduced at HBASE-12975. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13437) ThriftServer leaks ZooKeeper connections
[ https://issues.apache.org/jira/browse/HBASE-13437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13437: Fix Version/s: (was: 1.2.0) ThriftServer leaks ZooKeeper connections Key: HBASE-13437 URL: https://issues.apache.org/jira/browse/HBASE-13437 Project: HBase Issue Type: Bug Components: Thrift Affects Versions: 0.98.8 Reporter: Winger Pun Assignee: Winger Pun Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2 Attachments: HBASE-13437_1.patch, HBASE-13437_1.patch, hbase-13437-fix.patch HBase ThriftServer caches ZooKeeper connections in memory using org.apache.hadoop.hbase.util.ConnectionCache. This class has a chore mechanism to clean up connections that have been idle for too long (default is 10 min). But the timedOut method, which tests whether a connection has been idle for more than maxIdleTime, always returns false, so the ZooKeeper connection is never released. If we keep sending requests to ThriftServer, it will soon hold thousands of ZooKeeper connections. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
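A hedged sketch of what a working idle check looks like; in the reported bug the equivalent of timedOut() always returned false, so connections were never cleaned up. Field and method names here are illustrative, not the real ConnectionCache internals.
{code}
import java.util.concurrent.atomic.AtomicLong;

class IdleConnectionSketch {
  private final AtomicLong lastAccessTime = new AtomicLong(System.currentTimeMillis());

  void markAccess() {
    lastAccessTime.set(System.currentTimeMillis());
  }

  boolean timedOut(long maxIdleTimeMs) {
    return System.currentTimeMillis() - lastAccessTime.get() > maxIdleTimeMs;
  }
}
{code}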
[jira] [Updated] (HBASE-13466) Document deprecations in 1.x - Part 1
[ https://issues.apache.org/jira/browse/HBASE-13466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13466: Fix Version/s: (was: 1.2.0) Document deprecations in 1.x - Part 1 - Key: HBASE-13466 URL: https://issues.apache.org/jira/browse/HBASE-13466 Project: HBase Issue Type: Sub-task Reporter: Lars Francke Assignee: Lars Francke Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13466-v1-branch-1.patch, HBASE-13466-v2-branch-1.patch, HBASE-13466-v3-branch-1.patch, HBASE-13466-v4-branch-1.patch, HBASE-13466.1.patch This documents deprecations for the following issues: * HBASE-6038 * HBASE-1502 * HBASE-5453 * HBASE-5357 * HBASE-9870 * HBASE-10870 * HBASE-12363 * HBASE-9508 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-12311) Version stats in HFiles?
[ https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-12311. --- Resolution: Invalid This is no longer needed. I added much better heuristics now to decide when we should SEEK and when we should SKIP. Version stats in HFiles? Key: HBASE-12311 URL: https://issues.apache.org/jira/browse/HBASE-12311 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Attachments: 12311-indexed-0.98-v2.txt, 12311-indexed-0.98.txt, 12311-v2.txt, 12311-v3.txt, 12311.txt, CellStatTracker.java In HBASE-9778 I basically punted the decision on whether doing repeated scanner.next() called instead of the issueing (re)seeks to the user. I think we can do better. One way do that is maintain simple stats of what the maximum number of versions we've seen for any row/col combination and store these in the HFile's metadata (just like the timerange, oldest Put, etc). Then we estimate fairly accurately whether we have to expect lots of versions (i.e. seek between columns is better) or not (in which case we'd issue repeated next()'s). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-12765) SplitTransaction creates too many threads (potentially)
[ https://issues.apache.org/jira/browse/HBASE-12765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-12765. --- Resolution: Invalid SplitTransaction creates too many threads (potentially) --- Key: HBASE-12765 URL: https://issues.apache.org/jira/browse/HBASE-12765 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Attachments: 12765.txt In splitStoreFiles(...) we create a new thread pool with as many threads as there are files to split. We should be able to do better. During times of very heavy write loads there might be a lot of files to split and multiple splits might be going on at the same time on the same region server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14022) TestMultiTableSnapshotInputFormatImpl uses a class only available in JRE 1.7+
[ https://issues.apache.org/jira/browse/HBASE-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-14022: --- Status: Patch Available (was: Open) TestMultiTableSnapshotInputFormatImpl uses a class only available in JRE 1.7+ - Key: HBASE-14022 URL: https://issues.apache.org/jira/browse/HBASE-14022 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 0.98.14 Attachments: HBASE-14022-0.98.patch Only applicable to 0.98. Another instance where minimum supported versions of the JRE/JDK and Hadoop lag far behind current committer dev tooling. Fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14022) TestMultiTableSnapshotInputFormatImpl uses a class only available in JRE 1.7+
[ https://issues.apache.org/jira/browse/HBASE-14022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-14022: --- Attachment: HBASE-14022-0.98.patch TestMultiTableSnapshotInputFormatImpl uses a class only available in JRE 1.7+ - Key: HBASE-14022 URL: https://issues.apache.org/jira/browse/HBASE-14022 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 0.98.14 Attachments: HBASE-14022-0.98.patch Only applicable to 0.98. Another instance where minimum supported versions of the JRE/JDK and Hadoop lag far behind current committer dev tooling. Fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-7847) Use zookeeper multi to clear znodes
[ https://issues.apache.org/jira/browse/HBASE-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7847: -- Fix Version/s: 1.0.2 0.98.14 Picked back to 0.98 and branch-1.0 Use zookeeper multi to clear znodes --- Key: HBASE-7847 URL: https://issues.apache.org/jira/browse/HBASE-7847 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Assignee: Rakesh R Fix For: 2.0.0, 1.1.0, 0.98.14, 1.0.2 Attachments: 7847-v1.txt, 7847_v6.patch, 7847_v6.patch, HBASE-7847.patch, HBASE-7847.patch, HBASE-7847.patch, HBASE-7847_v4.patch, HBASE-7847_v5.patch, HBASE-7847_v6.patch, HBASE-7847_v7.1.patch, HBASE-7847_v7.patch, HBASE-7847_v9.patch In ZKProcedureUtil, clearChildZNodes() and clearZNodes(String procedureName) should utilize zookeeper multi so that they're atomic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13925) Use zookeeper multi to clear znodes in ZKProcedureUtil
[ https://issues.apache.org/jira/browse/HBASE-13925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-13925: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 1.1.2 1.2.0 1.0.2 Status: Resolved (was: Patch Available) Pushed to 0.98 and up Use zookeeper multi to clear znodes in ZKProcedureUtil -- Key: HBASE-13925 URL: https://issues.apache.org/jira/browse/HBASE-13925 Project: HBase Issue Type: Improvement Affects Versions: 0.98.13 Reporter: Ashish Singhi Assignee: Ashish Singhi Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13925-v1-again.patch, HBASE-13925-v1.patch, HBASE-13925.patch Address the TODO in ZKProcedureUtil clearChildZNodes() and clearZNodes methods -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12988) [Replication]Parallel apply edits across regions
[ https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-12988: -- Attachment: 12988-v5.txt v5 * constants in HConstants * default in hbase-default.xml * improved comments * trace messages when enqueueing tasks [Replication]Parallel apply edits across regions Key: HBASE-12988 URL: https://issues.apache.org/jira/browse/HBASE-12988 Project: HBase Issue Type: Improvement Components: Replication Reporter: hongyu bi Assignee: Lars Hofhansl Attachments: 12988-v2.txt, 12988-v3.txt, 12988-v4.txt, 12988-v5.txt, 12988.txt, HBASE-12988-0.98.patch, ParallelReplication-v2.txt we can apply edits to slave cluster in parallel on table-level to speed up replication . update : per conversation blow , it's better to apply edits on row-level in parallel -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12988) [Replication]Parallel apply edits across regions
[ https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613419#comment-14613419 ] Andrew Purtell commented on HBASE-12988: +1 [Replication]Parallel apply edits across regions Key: HBASE-12988 URL: https://issues.apache.org/jira/browse/HBASE-12988 Project: HBase Issue Type: Improvement Components: Replication Reporter: hongyu bi Assignee: Lars Hofhansl Attachments: 12988-v2.txt, 12988-v3.txt, 12988-v4.txt, 12988-v5.txt, 12988.txt, HBASE-12988-0.98.patch, ParallelReplication-v2.txt We can apply edits to the slave cluster in parallel at table level to speed up replication. Update: per the conversation below, it's better to apply edits at row level in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14002) Add --noReplicationSetup option to IntegrationTestReplication
[ https://issues.apache.org/jira/browse/HBASE-14002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-14002: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 1.3.0 1.1.2 1.2.0 0.98.14 2.0.0 Status: Resolved (was: Patch Available) Pushed to 0.98 and up, except branch-1.0 where we are missing IntegrationTestReplication. Shall I rectify that [~enis]? Add --noReplicationSetup option to IntegrationTestReplication - Key: HBASE-14002 URL: https://issues.apache.org/jira/browse/HBASE-14002 Project: HBase Issue Type: Improvement Components: integration tests Reporter: Dima Spivak Assignee: Dima Spivak Fix For: 2.0.0, 0.98.14, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-14002_master.patch IntegrationTestReplication has been flaky for me on pre-1.1 versions of HBase because of not-actually-synchronous operations in HBaseAdmin/Admin, which hamper its setupTablesAndReplication method. To get around this, I'd like to add a -nrs/--noReplicationSetup option to the test to allow it to be run on clusters in which the necessary tables and replication have already been set up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
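A minimal sketch of how such a switch might be wired into a test driver (illustrative only; the class, flag handling, and setup method shown here are not the actual IntegrationTestReplication code):
{code}
/** Illustrative only; not the actual IntegrationTestReplication code. */
public class NoReplicationSetupFlagExample {
  static boolean noReplicationSetup = false;

  public static void main(String[] args) {
    for (String arg : args) {
      // Accept either the short or the long form of the switch.
      if ("-nrs".equals(arg) || "--noReplicationSetup".equals(arg)) {
        noReplicationSetup = true;
      }
    }
    if (!noReplicationSetup) {
      setupTablesAndReplication(); // only when tables/replication are not pre-created
    }
  }

  static void setupTablesAndReplication() {
    System.out.println("creating tables and replication peers...");
  }
}
{code}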
[jira] [Commented] (HBASE-12988) [Replication]Parallel apply edits across regions
[ https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613424#comment-14613424 ] Lars Hofhansl commented on HBASE-12988: --- Just remembering now... [~abhishek.chouhan], you had added a bunch of log messages locally to track where time is spent during replication. Should we fold those in here? Or a separate jira? [Replication]Parallel apply edits across regions Key: HBASE-12988 URL: https://issues.apache.org/jira/browse/HBASE-12988 Project: HBase Issue Type: Improvement Components: Replication Reporter: hongyu bi Assignee: Lars Hofhansl Attachments: 12988-v2.txt, 12988-v3.txt, 12988-v4.txt, 12988-v5.txt, 12988.txt, HBASE-12988-0.98.patch, ParallelReplication-v2.txt We can apply edits to the slave cluster in parallel at table level to speed up replication. Update: per the conversation below, it's better to apply edits at row level in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613440#comment-14613440 ] Jerry He commented on HBASE-7912: - I went through the v5 doc. Looks good. There are some changes in there that I see as good. One of them is to use a new system table hbase:backup instead of ZooKeeper. I definitely see the benefit of using a system table. There is also a section about security support that is consistent with the existing HBase security. Regarding the section First incremental after full backup restore: yes, there could be data duplicated in the two backups (the full and the incremental). It is better to fix it during the backup. [~mbertozzi]'s filesystem layout question is mainly a concern for upgrade and migration, I think. We've faced such problems. We will need to make sure the current backup images (data and metadata) can be used to restore to future HBase releases. Will continue to read, and may comment later. HBase Backup/Restore Based on HBase Snapshot Key: HBASE-7912 URL: https://issues.apache.org/jira/browse/HBASE-7912 Project: HBase Issue Type: Sub-task Reporter: Richard Ding Assignee: Vladimir Rodionov Labels: backup Fix For: 2.0.0 Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf Finally, we completed the implementation of our backup/restore solution, and would like to share it with the community through this jira. We are leveraging the existing HBase snapshot feature and providing a general solution for common users. Our full backup uses snapshots to capture metadata locally and ExportSnapshot to move data to another cluster; the incremental backup uses an offline WALPlayer to back up HLogs; we also leverage distributed log roll and distributed flush to improve performance; other add-on features include convert, merge, progress report, and CLI commands, so that a common user can back up HBase data without in-depth knowledge of HBase. Our solution also contains some usability features for enterprise users. The detailed design document and CLI commands will be attached to this jira. We plan to use 10~12 subtasks to share each of the following features, and document the detailed implementation in the subtasks: * *Full Backup* : provide local and remote backup/restore for a list of tables * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental backup) * *distributed* log roll and distributed flush * Backup *Manifest* and history * *Incremental* backup: to build on top of full backup as daily/weekly backup * *Convert* incremental backup WAL files into HFiles * *Merge* several backup images into one (like merging weekly into monthly) * *add and remove* a table to and from a backup image * *Cancel* a backup process * backup progress *status* * full backup based on an *existing snapshot* *-* *Below is the original description, kept here as the history of the design and discussion back in 2013* There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there have been many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency. 
A common practice of backup and restore in databases is to first take a full baseline backup, and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store a massive amount of data. Combining full backups with incremental backups has tremendous benefit for HBase as well. The following is a typical scenario for full and incremental backup. # The user takes a full backup of a table or a set of tables in HBase. # The user schedules periodic incremental backups to capture the changes from the full backup, or from the last incremental backup. # The user needs to restore table data to a past point in time. # The full backup is restored to the table(s) or to different table name(s). Then the incremental backups that are up to the desired point in time are applied on top of the full backup. We would support the following key features and capabilities. * Full backup uses HBase snapshots to capture HFiles. * Use HBase WALs to capture incremental changes, but we use bulk load of HFiles for fast incremental restore. * Support single table or a set of
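For context, the two existing primitives the full-backup design builds on (snapshot plus ExportSnapshot) can be driven roughly as below. This is an illustrative sketch, not the proposed backup tool's API; the snapshot name, table name, and destination URL are placeholders.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.snapshot.ExportSnapshot;
import org.apache.hadoop.util.ToolRunner;

/** Illustrative only: snapshot + export, the two primitives behind the full backup. */
public class FullBackupSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // 1. Capture table metadata and HFiles locally via a snapshot.
      admin.snapshot("backup-t1-20150703", TableName.valueOf("t1"));
    }
    // 2. Ship the snapshot data to the backup cluster/filesystem.
    int rc = ToolRunner.run(conf, new ExportSnapshot(),
        new String[] { "-snapshot", "backup-t1-20150703",
                       "-copy-to", "hdfs://backup-cluster:9000/hbase" });
    System.exit(rc);
  }
}
{code}
The incremental piece described above would then roll and collect WALs between such full backups, which is where the offline WALPlayer and distributed log roll features come in.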
[jira] [Updated] (HBASE-14018) RegionServer is aborted when flushing memstore.
[ https://issues.apache.org/jira/browse/HBASE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinh Duong Mai updated HBASE-14018: --- Description: + Pseudo-distributed Hadoop, ZK_HBASE_MANAGE = true (1 master, 1 regionserver). + Put data to OpenTSDB, 1000 records / s, for 2000 seconds. + RegionServer is aborted. === RegionServer logs === 2015-07-03 16:37:37,332 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1623, hits=172, hitRatio=10.60%, , cachingAccesses=177, cachingHits=151, cachingHitsRatio=85.31%, evictions=1139, evicted=21, evictedPerRun=0.018437225371599197 2015-07-03 16:37:37,898 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2744, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 19207814 2015-07-03 16:42:37,331 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1624, hits=173, hitRatio=10.65%, , cachingAccesses=178, cachingHits=152, cachingHitsRatio=85.39%, evictions=1169, evicted=21, evictedPerRun=0.01796407252550125 2015-07-03 16:42:37,899 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 3049, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 33026416 2015-07-03 16:43:27,217 INFO [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d., current region memstore size 128.05 MB 2015-07-03 16:43:27,899 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: ABORTING region server node1.vmcluster,16040,1435897652505: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) ... 7 more 2015-07-03 16:43:27,901 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2015-07-03 16:43:27,942 INFO [MemStoreFlusher.1] regionserver.HRegionServer: Dump of metrics as JSON on abort: { beans : [ { name : java.lang:type=Memory, modelerType : sun.management.MemoryImpl, HeapMemoryUsage : { committed : 170471424, init : 31457280, max : 476512256, used : 151515136 }, Verbose : false, ObjectPendingFinalizationCount : 0, NonHeapMemoryUsage : { committed : 72335360, init : 2555904, max : -1, used : 70646296 }, ObjectName : java.lang:type=Memory } ], beans : [ { name :
[jira] [Updated] (HBASE-14019) Hbase table import throws RetriesExhaustedException
[ https://issues.apache.org/jira/browse/HBASE-14019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wesley Connor updated HBASE-14019: -- Attachment: error.txt Hbase table import throws RetriesExhaustedException --- Key: HBASE-14019 URL: https://issues.apache.org/jira/browse/HBASE-14019 Project: HBase Issue Type: Bug Components: hadoop2, hbase Affects Versions: 0.98.9 Environment: hbase-0.98.9-hadoop2 hadoop-2.6 Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2+deb7u1 x86_64 GNU/Linux Oracle jdk1.8.0_45/ Reporter: Wesley Connor Attachments: error.txt hbase-0.98.9-hadoop2/bin/hbase org.apache.hadoop.hbase.mapreduce.Import item_restore /data/item_backup fails with numerous RetriesExhaustedException. The export process, e.g. hbase-0.98.9-hadoop2/bin/hbase org.apache.hadoop.hbase.mapreduce.Export item /data/item_backup, works flawlessly and the file item_backup is created. Import of the same file to a table of a different name fails. Please see the attached job log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14019) Hbase table import throws RetriesExhaustedException
Wesley Connor created HBASE-14019: - Summary: Hbase table import throws RetriesExhaustedException Key: HBASE-14019 URL: https://issues.apache.org/jira/browse/HBASE-14019 Project: HBase Issue Type: Bug Components: hadoop2, hbase Affects Versions: 0.98.9 Environment: hbase-0.98.9-hadoop2 hadoop-2.6 Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2+deb7u1 x86_64 GNU/Linux Oracle jdk1.8.0_45/ Reporter: Wesley Connor Attachments: error.txt hbase-0.98.9-hadoop2/bin/hbase org.apache.hadoop.hbase.mapreduce.Import item_restore /data/item_backup fails with numerous RetriesExhaustedException. The export process, e.g. hbase-0.98.9-hadoop2/bin/hbase org.apache.hadoop.hbase.mapreduce.Export item /data/item_backup, works flawlessly and the file item_backup is created. Import of the same file to a table of a different name fails. Please see the attached job log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12016) Reduce number of versions in Meta table. Make it configurable
[ https://issues.apache.org/jira/browse/HBASE-12016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-12016: Fix Version/s: 1.0.0 Reduce number of versions in Meta table. Make it configurable - Key: HBASE-12016 URL: https://issues.apache.org/jira/browse/HBASE-12016 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Andrey Stepachev Assignee: Andrey Stepachev Priority: Minor Fix For: 1.0.0, 2.0.0 Attachments: HBASE-12016.patch, HBASE-12016.patch, HBASE-12016.patch, HBASE-12016.patch, HBASE-12016.patch, HBASE-12016.patch, HBASE-12016.patch Currently meta keeps up to 10 versions of each KV. For big metas this leads to substantial memory overhead and scan slowdowns (see https://issues.apache.org/jira/browse/HBASE-11165 ). We need to keep a reasonable number of versions (suggested value is 3). The number of versions is configurable via the parameter hbase.meta.versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
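Assuming the knob lands under the property name given in the description (hbase.meta.versions; the final name could differ in the committed patch), setting it would be an ordinary Configuration entry, for example:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MetaVersionsConfigExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Keep only 3 versions of each meta KV instead of the historical default of 10.
    // Property name taken from the issue description; the exact key may differ.
    conf.setInt("hbase.meta.versions", 3);
    System.out.println(conf.getInt("hbase.meta.versions", 10));
  }
}
{code}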
[jira] [Updated] (HBASE-14018) RegionServer is aborted when flushing memstore.
[ https://issues.apache.org/jira/browse/HBASE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinh Duong Mai updated HBASE-14018: --- Description: + Pseudo-distributed Hadoop (2.6.0), ZK_HBASE_MANAGE = true (1 master, 1 regionserver). + Put data to OpenTSDB, 1000 records / s, for 2000 seconds. + RegionServer is aborted. === RegionServer logs === 2015-07-03 16:37:37,332 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1623, hits=172, hitRatio=10.60%, , cachingAccesses=177, cachingHits=151, cachingHitsRatio=85.31%, evictions=1139, evicted=21, evictedPerRun=0.018437225371599197 2015-07-03 16:37:37,898 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2744, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 19207814 2015-07-03 16:42:37,331 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1624, hits=173, hitRatio=10.65%, , cachingAccesses=178, cachingHits=152, cachingHitsRatio=85.39%, evictions=1169, evicted=21, evictedPerRun=0.01796407252550125 2015-07-03 16:42:37,899 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 3049, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 33026416 2015-07-03 16:43:27,217 INFO [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d., current region memstore size 128.05 MB 2015-07-03 16:43:27,899 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: ABORTING region server node1.vmcluster,16040,1435897652505: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) ... 7 more 2015-07-03 16:43:27,901 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] === HMaster logs === 2015-07-03 13:29:20,671 INFO [RegionOpenAndInitThread-tsdb-meta-1] regionserver.HRegion: creating HRegion tsdb-meta HTD == 'tsdb-meta', {NAME = 'name', BLOOMFILTER = 'ROW', VERSIONS = '1', IN_MEMORY = 'false', KEEP_DELETED_CELLS = 'FALSE', DATA_BLOCK_ENCODING = 'NONE', TTL = 'FOREVER', COMPRESSION = 'NONE', MIN_VERSIONS = '0', BLOCKCACHE = 'true', BLOCKSIZE = '65536', REPLICATION_SCOPE = '1'} RootDir = hdfs://node1.vmcluster:9000/hbase/.tmp Table name == tsdb-meta 2015-07-03 13:29:20,696 INFO [RegionOpenAndInitThread-tsdb-meta-1]
[jira] [Created] (HBASE-14018) RegionServer is aborted when flushing memstore.
Dinh Duong Mai created HBASE-14018: -- Summary: RegionServer is aborted when flushing memstore. Key: HBASE-14018 URL: https://issues.apache.org/jira/browse/HBASE-14018 Project: HBase Issue Type: Bug Affects Versions: 1.0.1.1 Environment: CentOS x64 Server Reporter: Dinh Duong Mai + Pseudo-distributed Hadoop, ZK_HBASE_MANAGE = true (1 master, 1 regionserver). + Put data to OpenTSDB, 1000 records / s, for 2000 seconds. + RegionServer is aborted. RegionServer logs: 2015-07-03 16:43:27,320 INFO [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for tsdb,,1435894216453.35f5c254751fef111cdd3788f8465324., current region memstore size 128.12 MB 2015-07-03 16:43:27,955 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: ABORTING region server node2.vmcluster,16040,1435897661557: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435894216453.35f5c254751fef111cdd3788f8465324. at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) ... 7 more 2015-07-03 16:43:27,956 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14018) RegionServer is aborted when flushing memstore.
[ https://issues.apache.org/jira/browse/HBASE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinh Duong Mai updated HBASE-14018: --- Attachment: hbase-hadoop-zookeeper-node1.vmcluster.log hbase-hadoop-regionserver-node1.vmcluster.log hbase-hadoop-master-node1.vmcluster.log check at 2015-07-03 16:43:27 RegionServer is aborted when flushing memstore. --- Key: HBASE-14018 URL: https://issues.apache.org/jira/browse/HBASE-14018 Project: HBase Issue Type: Bug Affects Versions: 1.0.1.1 Environment: CentOS x64 Server Reporter: Dinh Duong Mai Attachments: hbase-hadoop-master-node1.vmcluster.log, hbase-hadoop-regionserver-node1.vmcluster.log, hbase-hadoop-zookeeper-node1.vmcluster.log + Pseudo-distributed Hadoop, ZK_HBASE_MANAGE = true (1 master, 1 regionserver). + Put data to OpenTSDB, 1000 records / s, for 2000 seconds. + RegionServer is aborted. === RegionServer logs === 2015-07-03 16:37:37,332 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1623, hits=172, hitRatio=10.60%, , cachingAccesses=177, cachingHits=151, cachingHitsRatio=85.31%, evictions=1139, evicted=21, evictedPerRun=0.018437225371599197 2015-07-03 16:37:37,898 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2744, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 19207814 2015-07-03 16:42:37,331 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1624, hits=173, hitRatio=10.65%, , cachingAccesses=178, cachingHits=152, cachingHitsRatio=85.39%, evictions=1169, evicted=21, evictedPerRun=0.01796407252550125 2015-07-03 16:42:37,899 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 3049, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 33026416 2015-07-03 16:43:27,217 INFO [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d., current region memstore size 128.05 MB 2015-07-03 16:43:27,899 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: ABORTING region server node1.vmcluster,16040,1435897652505: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) ... 7 more 2015-07-03 16:43:27,901 FATAL
[jira] [Updated] (HBASE-14018) RegionServer is aborted when flushing memstore.
[ https://issues.apache.org/jira/browse/HBASE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinh Duong Mai updated HBASE-14018: --- Description: + Pseudo-distributed Hadoop, ZK_HBASE_MANAGE = true (1 master, 1 regionserver). + Put data to OpenTSDB, 1000 records / s, for 2000 seconds. + RegionServer is aborted. === RegionServer logs === 2015-07-03 16:37:37,332 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1623, hits=172, hitRatio=10.60%, , cachingAccesses=177, cachingHits=151, cachingHitsRatio=85.31%, evictions=1139, evicted=21, evictedPerRun=0.018437225371599197 2015-07-03 16:37:37,898 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2744, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 19207814 2015-07-03 16:42:37,331 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1624, hits=173, hitRatio=10.65%, , cachingAccesses=178, cachingHits=152, cachingHitsRatio=85.39%, evictions=1169, evicted=21, evictedPerRun=0.01796407252550125 2015-07-03 16:42:37,899 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 3049, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 33026416 2015-07-03 16:43:27,217 INFO [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d., current region memstore size 128.05 MB 2015-07-03 16:43:27,899 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: ABORTING region server node1.vmcluster,16040,1435897652505: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) ... 7 more 2015-07-03 16:43:27,901 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] === HMaster logs === 2015-07-03 13:29:20,671 INFO [RegionOpenAndInitThread-tsdb-meta-1] regionserver.HRegion: creating HRegion tsdb-meta HTD == 'tsdb-meta', {NAME = 'name', BLOOMFILTER = 'ROW', VERSIONS = '1', IN_MEMORY = 'false', KEEP_DELETED_CELLS = 'FALSE', DATA_BLOCK_ENCODING = 'NONE', TTL = 'FOREVER', COMPRESSION = 'NONE', MIN_VERSIONS = '0', BLOCKCACHE = 'true', BLOCKSIZE = '65536', REPLICATION_SCOPE = '1'} RootDir = hdfs://node1.vmcluster:9000/hbase/.tmp Table name == tsdb-meta 2015-07-03 13:29:20,696 INFO [RegionOpenAndInitThread-tsdb-meta-1] regionserver.HRegion:
[jira] [Commented] (HBASE-12596) bulkload needs to follow locality
[ https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613007#comment-14613007 ] Victor Xu commented on HBASE-12596: --- Thanks for your suggestion. The default value should be true, and I need to revise some unit tests before that. New patches will be updated soon. bulkload needs to follow locality - Key: HBASE-12596 URL: https://issues.apache.org/jira/browse/HBASE-12596 Project: HBase Issue Type: Improvement Components: HFile, regionserver Affects Versions: 0.98.8 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7 Reporter: Victor Xu Assignee: Victor Xu Fix For: 0.98.14 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-0.98-v2.patch, HBASE-12596-master-v1.patch, HBASE-12596-master-v2.patch, HBASE-12596.patch Normally, we have 2 steps to perform a bulkload: 1. use a job to write the HFiles to be loaded; 2. move these HFiles to the right HDFS directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). This feature is disabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled=true' to enable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
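A simplified sketch of the mechanism the description outlines: resolve the region location for the writer's key range and pass it to StoreFile.WriterBuilder.withFavoredNodes, so the HFile's blocks land on the DataNode co-located with the serving region server. This is illustrative only; it uses the 1.x client API and omits writer options the real getNewWriter() also sets (bloom type, block size, compression, and so on).
{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
import org.apache.hadoop.hbase.regionserver.StoreFile;

/** Illustrative sketch only; the real change lives in HFileOutputFormat's getNewWriter(). */
public class LocalityAwareWriterSketch {

  public static StoreFile.Writer newWriter(Configuration conf, FileSystem fs, Path familyDir,
      Connection conn, TableName table, byte[] firstRowOfRegion) throws IOException {
    // Resolve which region server will serve this key range, and use its host
    // as a favored node so the HFile blocks are written to the co-located DataNode.
    InetSocketAddress[] favoredNodes;
    try (RegionLocator locator = conn.getRegionLocator(table)) {
      HRegionLocation loc = locator.getRegionLocation(firstRowOfRegion);
      favoredNodes = new InetSocketAddress[] {
          new InetSocketAddress(loc.getHostname(), loc.getPort()) };
    }
    return new StoreFile.WriterBuilder(conf, new CacheConfig(conf), fs)
        .withOutputDir(familyDir)
        .withComparator(KeyValue.COMPARATOR)
        .withFileContext(new HFileContextBuilder().build())
        .withFavoredNodes(favoredNodes)
        .build();
  }
}
{code}
Favored nodes are only a hint to HDFS block placement, so an unresolved or wrong address should degrade to default placement rather than fail the write.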
[jira] [Updated] (HBASE-14018) RegionServer is aborted when flushing memstore.
[ https://issues.apache.org/jira/browse/HBASE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinh Duong Mai updated HBASE-14018: --- Description: + Pseudo-distributed Hadoop, ZK_HBASE_MANAGE = true (1 master, 1 regionserver). + Put data to OpenTSDB, 1000 records / s, for 2000 seconds. + RegionServer is aborted. === RegionServer logs === 2015-07-03 16:37:37,332 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1623, hits=172, hitRatio=10.60%, , cachingAccesses=177, cachingHits=151, cachingHitsRatio=85.31%, evictions=1139, evicted=21, evictedPerRun=0.018437225371599197 2015-07-03 16:37:37,898 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2744, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 19207814 2015-07-03 16:42:37,331 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1624, hits=173, hitRatio=10.65%, , cachingAccesses=178, cachingHits=152, cachingHitsRatio=85.39%, evictions=1169, evicted=21, evictedPerRun=0.01796407252550125 2015-07-03 16:42:37,899 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 3049, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 33026416 2015-07-03 16:43:27,217 INFO [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d., current region memstore size 128.05 MB 2015-07-03 16:43:27,899 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: ABORTING region server node1.vmcluster,16040,1435897652505: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) ... 7 more 2015-07-03 16:43:27,901 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] === HMaster logs === 2015-07-03 13:29:20,671 INFO [RegionOpenAndInitThread-tsdb-meta-1] regionserver.HRegion: creating HRegion tsdb-meta HTD == 'tsdb-meta', {NAME = 'name', BLOOMFILTER = 'ROW', VERSIONS = '1', IN_MEMORY = 'false', KEEP_DELETED_CELLS = 'FALSE', DATA_BLOCK_ENCODING = 'NONE', TTL = 'FOREVER', COMPRESSION = 'NONE', MIN_VERSIONS = '0', BLOCKCACHE = 'true', BLOCKSIZE = '65536', REPLICATION_SCOPE = '1'} RootDir = hdfs://node1.vmcluster:9000/hbase/.tmp Table name == tsdb-meta 2015-07-03 13:29:20,696 INFO [RegionOpenAndInitThread-tsdb-meta-1] regionserver.HRegion:
[jira] [Commented] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612998#comment-14612998 ] Anoop Sam John commented on HBASE-12213: bq.We get around 9 to 10 % improvement. That is great Ram.. Thanks for doing the perf test for every sub task.. Yes, with the MBB and its usage we tried to save every op as much as possible. When we avoid the temp on-heap buffer creation and copy, and the MBB actually wraps only the underlying offheap BBs, we will be able to see much more gain. Random read performance is the main thing we are expecting there too. HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch, HBASE-12213_2.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we will end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
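To make the idea concrete, here is a toy reader over an array of ByteBuffers (purely illustrative, not HBase's actual multi-buffer class): absolute gets walk the chunk list instead of first copying everything into one big buffer. It assumes each chunk's data runs from position 0 to its limit.
{code}
import java.nio.ByteBuffer;

/** Toy multi-buffer reader; illustrates the idea only, not HBase's implementation. */
public class MultiBufferReader {
  private final ByteBuffer[] buffers;

  public MultiBufferReader(ByteBuffer[] buffers) {
    this.buffers = buffers;
  }

  /** Absolute get across buffer boundaries, without copying into a single BB. */
  public byte get(int globalOffset) {
    int remaining = globalOffset;
    for (ByteBuffer bb : buffers) {
      if (remaining < bb.limit()) {
        return bb.get(remaining); // positional read, does not move the buffer's cursor
      }
      remaining -= bb.limit();
    }
    throw new IndexOutOfBoundsException("offset " + globalOffset + " beyond total length");
  }

  public int totalLength() {
    int len = 0;
    for (ByteBuffer bb : buffers) {
      len += bb.limit();
    }
    return len;
  }
}
{code}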
[jira] [Updated] (HBASE-14018) RegionServer is aborted when flushing memstore.
[ https://issues.apache.org/jira/browse/HBASE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinh Duong Mai updated HBASE-14018: --- Component/s: hbase hadoop2 RegionServer is aborted when flushing memstore. --- Key: HBASE-14018 URL: https://issues.apache.org/jira/browse/HBASE-14018 Project: HBase Issue Type: Bug Components: hadoop2, hbase Affects Versions: 1.0.1.1 Environment: CentOS x64 Server Reporter: Dinh Duong Mai Attachments: hbase-hadoop-master-node1.vmcluster.log, hbase-hadoop-regionserver-node1.vmcluster.log, hbase-hadoop-zookeeper-node1.vmcluster.log + Pseudo-distributed Hadoop, ZK_HBASE_MANAGE = true (1 master, 1 regionserver). + Put data to OpenTSDB, 1000 records / s, for 2000 seconds. + RegionServer is aborted. === RegionServer logs === 2015-07-03 16:37:37,332 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1623, hits=172, hitRatio=10.60%, , cachingAccesses=177, cachingHits=151, cachingHitsRatio=85.31%, evictions=1139, evicted=21, evictedPerRun=0.018437225371599197 2015-07-03 16:37:37,898 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2744, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 19207814 2015-07-03 16:42:37,331 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=371.27 KB, freeSize=181.41 MB, max=181.78 MB, blockCount=5, accesses=1624, hits=173, hitRatio=10.65%, , cachingAccesses=178, cachingHits=152, cachingHitsRatio=85.39%, evictions=1169, evicted=21, evictedPerRun=0.01796407252550125 2015-07-03 16:42:37,899 INFO [node1:16040Replication Statistics #0] regionserver.Replication: Normal source for cluster 1: Total replicated edits: 3049, currently replicating from: hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16040,1435897652505/node1.vmcluster%2C16040%2C1435897652505.default.1435908458590 at position: 33026416 2015-07-03 16:43:27,217 INFO [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d., current region memstore size 128.05 MB 2015-07-03 16:43:27,899 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: ABORTING region server node1.vmcluster,16040,1435897652505: Replay of WAL required. Forcing server shutdown org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435897759785.2d49cd81fb6513f51af58bd0394c4e0d. 
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743 at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478) at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) ... 7 more 2015-07-03 16:43:27,901 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
[jira] [Commented] (HBASE-14015) Allow setting a richer state value when toString a pv2
[ https://issues.apache.org/jira/browse/HBASE-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612890#comment-14612890 ] Ashish Singhi commented on HBASE-14015: --- [~saint@gmail.com], this seems to be introducing a compilation error in at least branch-1. {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hbase-procedure: Compilation failure: Compilation failure: [ERROR] /home/root1/Code/H-b-1/hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureToString.java:[28,49] error: cannot find symbol [ERROR] symbol: class MasterTests [ERROR] location: package org.apache.hadoop.hbase.testclassification [ERROR] /home/root1/Code/H-b-1/hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureToString.java:[33,11] error: cannot find symbol {noformat} Allow setting a richer state value when toString a pv2 -- Key: HBASE-14015 URL: https://issues.apache.org/jira/browse/HBASE-14015 Project: HBase Issue Type: Improvement Components: proc-v2 Reporter: stack Assignee: stack Priority: Minor Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 0001-HBASE-14015-Allow-setting-a-richer-state-value-when-.patch Debugging, my procedure after a crash was loaded out of the store and its state was RUNNING. It would help if I knew which of the StateMachineProcedure states it was going to start RUNNING at. Chatting w/ Matteo, he suggested allowing Procedures to customize the String. Here is a patch that makes it so StateMachineProcedure will now print out the base state -- RUNNING, FINISHED -- followed by a ':' and then the StateMachineProcedure state: e.g. SimpleStateMachineProcedure state=RUNNABLE:SERVER_CRASH_ASSIGN -- This message was sent by Atlassian JIRA (v6.3.4#6332)
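The shape of the change described above, as a hedged sketch: it assumes a protected hook (called toStringState here; the actual method name in the patch may differ) that subclasses override to append their state-machine step after the base state.
{code}
/**
 * Illustrative only; not the actual pv2 Procedure class. Running main prints
 * something like "ServerCrashProcedureSketch state=RUNNABLE:SERVER_CRASH_ASSIGN".
 */
abstract class ProcedureSketch {
  enum BaseState { RUNNABLE, FINISHED }
  BaseState state = BaseState.RUNNABLE;

  /** Subclasses append a richer state, e.g. "RUNNABLE:SERVER_CRASH_ASSIGN". */
  protected void toStringState(StringBuilder builder) {
    builder.append(state);
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder(getClass().getSimpleName()).append(" state=");
    toStringState(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(new ServerCrashProcedureSketch());
  }
}

class ServerCrashProcedureSketch extends ProcedureSketch {
  String machineState = "SERVER_CRASH_ASSIGN";

  @Override
  protected void toStringState(StringBuilder builder) {
    super.toStringState(builder);              // base state: RUNNABLE / FINISHED
    builder.append(':').append(machineState);  // state-machine step
  }
}
{code}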
[jira] [Commented] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612900#comment-14612900 ] ramkrishna.s.vasudevan commented on HBASE-12213: Did some perf testing using the Performance evaluation tool. For range scan there is no difference between with and without patch ScanRange1 rows=1 with 25 threads, filterAll = true without patch {code} 2015-07-03 17:06:35,332 INFO [main] hbase.PerformanceEvaluation: [RandomScanWithRange1Test] Summary of timings (ms): [573388, 574365, 573071, 571351, 572586, 572020, 572732, 573370, 573627, 573146, 573245, 574017, 573049, 574283, 571470, 574312, 574539, 574515, 571540, 574108, 573530, 574177, 574143, 572927, 574541] 2015-07-03 17:06:35,333 INFO [main] hbase.PerformanceEvaluation: [RandomScanWithRange1Test]Min: 571351ms Max: 574541ms Avg: 573362ms {code} with patch {code} 2015-07-03 16:49:57,711 INFO [main] hbase.PerformanceEvaluation: [RandomScanWithRange1Test] Summary of timings (ms): [572276, 569970, 572610, 571517, 572027, 570168, 570932, 572951, 570492, 572410, 572037, 571495, 572238, 572055, 572114, 571142, 572500, 569813, 572383, 572223, 571191, 572197, 571473, 571235, 569209] 2015-07-03 16:49:57,715 INFO [main] hbase.PerformanceEvaluation: [RandomScanWithRange1Test]Min: 569209ms Max: 572951ms Avg: 571546ms {code} But when we do gets which involves lot of blockSeek code randomReads = 10 threads, multiGets = 100, filterAll = true withoutpatch {code} 2015-07-03 17:13:37,172 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Summary of timings (ms): [158292, 158191, 157503, 158375, 158182, 158901, 157680, 158583, 158388, 158279] 2015-07-03 17:13:37,173 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Min: 157503ms Max: 158901ms Avg: 158237ms 2015-07-03 17:33:36,854 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Summary of timings (ms): [163757, 163478, 163175, 163876, 163490, 163187, 163813, 163937, 162839, 163398] 2015-07-03 17:33:36,855 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Min: 162839ms Max: 163937ms Avg: 163495ms 2015-07-03 17:36:44,255 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Summary of timings (ms): [161414, 162715, 162266, 162409, 161884, 162161, 162235, 162077, 162852, 161575] 2015-07-03 17:36:44,256 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Min: 161414ms Max: 162852ms Avg: 162158ms {code} with patch {code} 2015-07-03 17:26:02,496 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Summary of timings (ms): [150311, 150721, 150350, 149602, 150339, 150593, 151117, 149046, 150403, 150087] 2015-07-03 17:26:02,497 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Min: 149046ms Max: 151117ms Avg: 150256ms 2015-07-03 17:43:52,968 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Summary of timings (ms): [148812, 148077, 148482, 148160, 148479, 147835, 148461, 147683, 148499, 149299] 2015-07-03 17:43:52,970 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Min: 147683ms Max: 149299ms Avg: 148378ms 2015-07-03 17:46:30,247 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Summary of timings (ms): [149254, 148839, 148658, 148982, 148443, 147960, 148590, 148508, 148855, 148767] 2015-07-03 17:46:30,248 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Min: 147960ms Max: 149254ms Avg: 148685ms {code} We see significant gain. Currently note that we are copying the offheap buckets to the onheap single BB. 
The blockSeek positional reads and some optimization in the KeyOnlyKV are the reasons for this performance gain. Even without the MBB we should be able to see the same gain, because right now the MBB just acts as a wrapper over the single BB that was formed from the underlying offheap buckets. HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we will end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612902#comment-14612902 ] ramkrishna.s.vasudevan commented on HBASE-12213: We get around a 9 to 10% improvement. HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we will end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12213: --- Status: Open (was: Patch Available) HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we will end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12596) bulkload needs to follow locality
[ https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612875#comment-14612875 ] Ashish Singhi commented on HBASE-12596: --- Skimmed the patch. Can we add some message here? {code} LOG.warn(, e); {code} Can we add isDebugEnabled and isTraceEnabled checks for debug and trace level log messages? Can we add a log inside this if check saying that locality is enabled? {code} if (conf.getBoolean(LOCALITY_SENSITIVE_CONF_KEY, DEFAULT_LOCALITY_SENSITIVE)) {code} bulkload needs to follow locality - Key: HBASE-12596 URL: https://issues.apache.org/jira/browse/HBASE-12596 Project: HBase Issue Type: Improvement Components: HFile, regionserver Affects Versions: 0.98.8 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7 Reporter: Victor Xu Assignee: Victor Xu Fix For: 0.98.14 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-0.98-v2.patch, HBASE-12596-master-v1.patch, HBASE-12596-master-v2.patch, HBASE-12596.patch Normally, we have 2 steps to perform a bulkload: 1. use a job to write the HFiles to be loaded; 2. move these HFiles to the right HDFS directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). This feature is disabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled=true' to enable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
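What the review asks for, in generic form (illustrative class and messages; HBase 1.x uses commons-logging): give the warning a message and guard debug-level output behind isDebugEnabled().
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/** Illustrative only: the logging style the review comment asks for. */
public class GuardedLoggingExample {
  private static final Log LOG = LogFactory.getLog(GuardedLoggingExample.class);

  void example(boolean localitySensitive, Exception e) {
    // Give the warning a message, not just the exception.
    LOG.warn("Failed to resolve favored nodes; falling back to default block placement", e);

    if (localitySensitive) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("bulkload locality-sensitive mode is enabled");
      }
    }
  }
}
{code}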