[jira] [Created] (HBASE-10077) Per family WAL encryption
Andrew Purtell created HBASE-10077: -- Summary: Per family WAL encryption Key: HBASE-10077 URL: https://issues.apache.org/jira/browse/HBASE-10077 Project: HBase Issue Type: Improvement Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.1 HBASE-7544 introduces WAL encryption to prevent the leakage of protected data to disk by way of WAL files. However it is currently enabled globally for the regionserver. Encryption of WAL entries should depend on whether or not an entry in the WAL is to be stored within an encrypted column family. -- This message was sent by Atlassian JIRA (v6.1#6144)
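To make the idea concrete, here is a minimal sketch of the kind of per-entry check the request implies, assuming a hypothetical helper on the WAL append path (walEntryNeedsEncryption is made up and is not the HBASE-7544/HBASE-10077 code; HColumnDescriptor.getEncryptionType() is the 0.98 per-family encryption attribute): encrypt a WAL entry only if at least one family it touches has encryption configured.
{code:title=WalEncryptionSketch.java|borderStyle=solid}
// Hedged sketch: decide per WAL entry whether encryption is needed, based on
// the column families the edit touches. The helper name is hypothetical.
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

public class WalEncryptionSketch {
  /** Returns true if any KeyValue in the edit belongs to an encrypted family. */
  static boolean walEntryNeedsEncryption(HTableDescriptor htd, WALEdit edit) {
    for (KeyValue kv : edit.getKeyValues()) {
      HColumnDescriptor hcd = htd.getFamily(kv.getFamily());
      if (hcd != null && hcd.getEncryptionType() != null) {
        return true; // at least one touched family is encrypted
      }
    }
    return false; // a plaintext WAL entry is fine
  }
}
{code}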
[jira] [Commented] (HBASE-9883) Support Tags in TColumnValue in Thrift
[ https://issues.apache.org/jira/browse/HBASE-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838711#comment-13838711 ] Anoop Sam John commented on HBASE-9883: --- Is this fixed already as part of HBASE-9884 Ram? Support Tags in TColumnValue in Thrift -- Key: HBASE-9883 URL: https://issues.apache.org/jira/browse/HBASE-9883 Project: HBase Issue Type: Improvement Components: Thrift Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.1 Suggested by Anoop; this JIRA was raised to handle it separately. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9883) Support Tags in TColumnValue in Thrift
[ https://issues.apache.org/jira/browse/HBASE-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838730#comment-13838730 ] ramkrishna.s.vasudevan commented on HBASE-9883: --- No. It has to be fixed as part of this JIRA. Support Tags in TColumnValue in Thrift -- Key: HBASE-9883 URL: https://issues.apache.org/jira/browse/HBASE-9883 Project: HBase Issue Type: Improvement Components: Thrift Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.98.1 Suggested by Anoop; this JIRA was raised to handle it separately. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10031) Add a section on the transparent CF encryption feature to the manual
[ https://issues.apache.org/jira/browse/HBASE-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10031: --- Attachment: 10031.patch Add a section on the transparent CF encryption feature to the manual Key: HBASE-10031 URL: https://issues.apache.org/jira/browse/HBASE-10031 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Blocker Fix For: 0.98.0 Attachments: 10031.patch Document HBASE-7544 in detail in the manual. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HBASE-10031) Add a section on the transparent CF encryption feature to the manual
[ https://issues.apache.org/jira/browse/HBASE-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10031. Resolution: Fixed Attached what I committed to trunk. Documentation updates have been committed using both RTC and CTR. Opting for CTR for expediency. Add a section on the transparent CF encryption feature to the manual Key: HBASE-10031 URL: https://issues.apache.org/jira/browse/HBASE-10031 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Blocker Fix For: 0.98.0 Attachments: 10031.patch Document HBASE-7544 in detail in the manual. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (HBASE-10031) Add a section on the transparent CF encryption feature to the manual
[ https://issues.apache.org/jira/browse/HBASE-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838759#comment-13838759 ] Andrew Purtell edited comment on HBASE-10031 at 12/4/13 9:17 AM: - Attached what I committed to trunk. Documentation updates have been committed using both RTC and CTR. Opting for CTR for expediency. Edit: I ran mvn pre-site and eyeballed the resulting HTML output. was (Author: apurtell): Attached what I committed to trunk. Documentation updates have been committed using both RTC and CTR. Opting for CTR for expediency. Add a section on the transparent CF encryption feature to the manual Key: HBASE-10031 URL: https://issues.apache.org/jira/browse/HBASE-10031 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Blocker Fix For: 0.98.0 Attachments: 10031.patch Document HBASE-7544 in detail in the manual. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9682) Bulk loading fails after ClassLoader is updated on OSGi client
[ https://issues.apache.org/jira/browse/HBASE-9682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Sela updated HBASE-9682: - Affects Version/s: 0.94.12 Bulk loading fails after ClassLoader is updated on OSGi client -- Key: HBASE-9682 URL: https://issues.apache.org/jira/browse/HBASE-9682 Project: HBase Issue Type: Bug Components: Client, HFile, io Affects Versions: 0.94.2, 0.94.12 Reporter: Amit Sela In an OSGi (Felix) client environment, running with a bundled HBase (built with the bnd tool), the ClassLoader can be updated - for instance when the client bundle is updated without updating the HBase bundle. The Algorithm class in HBase is a static enum, so its Configuration member still holds on to the old CL. This causes an NPE when trying to bulk load using LoadIncrementalHFiles. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9682) Bulk loading fails after ClassLoader is updated on OSGi client
[ https://issues.apache.org/jira/browse/HBASE-9682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Sela updated HBASE-9682: - Attachment: HBASE-9682.patch Patch fixing the issue for HBase 0.94.12. I removed the Configuration member and added a createConfiguration method that creates a new Configuration (with the hadoop.native.lib boolean true) on demand. This way, there is no cached Configuration with a stale CL after a bundle update (in an OSGi client environment). Bulk loading fails after ClassLoader is updated on OSGi client -- Key: HBASE-9682 URL: https://issues.apache.org/jira/browse/HBASE-9682 Project: HBase Issue Type: Bug Components: Client, HFile, io Affects Versions: 0.94.2, 0.94.12 Reporter: Amit Sela Attachments: HBASE-9682.patch In an OSGi (Felix) client environment, running with a bundled HBase (built with the bnd tool), the ClassLoader can be updated - for instance when the client bundle is updated without updating the HBase bundle. The Algorithm class in HBase is a static enum, so its Configuration member still holds on to the old CL. This causes an NPE when trying to bulk load using LoadIncrementalHFiles. -- This message was sent by Atlassian JIRA (v6.1#6144)
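A minimal sketch of the approach described in the comment above, with a hypothetical class name (this is not the attached patch): drop the cached Configuration from the static enum and build a fresh one per call, so class resolution always goes through the current ClassLoader.
{code:title=CompressionConfSketch.java|borderStyle=solid}
// Hedged sketch of the fix described above, not the exact HBASE-9682.patch.
import org.apache.hadoop.conf.Configuration;

final class CompressionConfSketch {
  // Before (problematic): a Configuration cached in the static Algorithm enum
  // pins whatever ClassLoader was current when the enum was first initialized.
  // private static final Configuration conf = new Configuration();

  /** After: create a Configuration on demand, as the patch comment describes. */
  static Configuration createConfiguration() {
    Configuration conf = new Configuration();
    // matches the "hadoop.native.lib boolean true" mentioned in the comment
    conf.setBoolean("hadoop.native.lib", true);
    return conf;
  }
}
{code}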
[jira] [Commented] (HBASE-5273) Provide a coprocessor template for fast development and testing
[ https://issues.apache.org/jira/browse/HBASE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838799#comment-13838799 ] takeshi.miao commented on HBASE-5273: - I think this ticket could be closed because the related example code is already in trunk: {code} find -name example ./hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example ./hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example ./hbase-server/src/main/java/org/apache/hadoop/hbase/backup/example ./hbase-server/src/test/java/org/apache/hadoop/hbase/backup/example {code} Provide a coprocessor template for fast development and testing --- Key: HBASE-5273 URL: https://issues.apache.org/jira/browse/HBASE-5273 Project: HBase Issue Type: Improvement Components: Coprocessors Affects Versions: 0.92.0 Reporter: Mingjie Lai Priority: Minor While reworking the coprocessor blog, I started to realize that we should have a coprocessor template that helps users quickly start developing and testing a customized coprocessor. Currently there are some built-in coprocessors, but they are scattered all over the code base, and a user has to search around the code to see how to develop a new one. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-7057) Store Server Load in a Table
[ https://issues.apache.org/jira/browse/HBASE-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838803#comment-13838803 ] Elliott Clark commented on HBASE-7057: -- Nope sounds great thanks [~apurtell] Store Server Load in a Table Key: HBASE-7057 URL: https://issues.apache.org/jira/browse/HBASE-7057 Project: HBase Issue Type: Improvement Components: metrics, UI Affects Versions: 0.95.2 Reporter: Elliott Clark Priority: Critical Labels: noob Currently the server heartbeat sends over server load and region loads. This is used to display and store metrics about a region server. It is also used to remember the sequence id of flushes. This should be moved into an HBase table. * Allow the last sequence id to persist over a master restart. * That would allow the balancer to have a more complete picture of what's happened in the past. * Allow tools to be created to monitor HBase using HBase. * Simplify/remove the heartbeat. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9986) Incorporate HTTPS support for HBase (0.94 port)
[ https://issues.apache.org/jira/browse/HBASE-9986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838840#comment-13838840 ] Hudson commented on HBASE-9986: --- SUCCESS: Integrated in HBase-0.94-security #351 (See [https://builds.apache.org/job/HBase-0.94-security/351/]) HBASE-9986 Incorporate HTTPS support for HBase (0.94 port) (Aditya Kishore) (larsh: rev 1547706) * /hbase/branches/0.94/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/BackupMasterStatusTmpl.jamon * /hbase/branches/0.94/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon * /hbase/branches/0.94/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon * /hbase/branches/0.94/src/main/resources/hbase-webapps/master/table.jsp Incorporate HTTPS support for HBase (0.94 port) --- Key: HBASE-9986 URL: https://issues.apache.org/jira/browse/HBASE-9986 Project: HBase Issue Type: Task Affects Versions: 0.94.13 Reporter: Aditya Kishore Assignee: Aditya Kishore Fix For: 0.94.15 Attachments: HBASE-9954_0.94.patch In various classes, "http://" is hard-coded. This JIRA adds support for using the HBase web UI via HTTPS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9261) Add cp hooks after {start|close}RegionOperation in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838843#comment-13838843 ] ramkrishna.s.vasudevan commented on HBASE-9261: --- In postCompleteBatchMutate() can we pass a flag to indicate whether it is getting called out of success or failure in the finally block? Add cp hooks after {start|close}RegionOperation in batchMutate -- Key: HBASE-9261 URL: https://issues.apache.org/jira/browse/HBASE-9261 Project: HBase Issue Type: Sub-task Reporter: rajeshbabu Assignee: rajeshbabu Attachments: HBASE-9261.patch, HBASE-9261_v2.patch, HBASE-9261_v3.patch, HBASE-9261_v4.patch These hooks helps for checking Resources(blocking memstore size) and necessary locking on index region while performing batch of mutations. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10078) Dynamic Filter - Not using DynamicClassLoader whe using FilterList
Federico Gaule created HBASE-10078: -- Summary: Dynamic Filter - Not using DynamicClassLoader whe using FilterList Key: HBASE-10078 URL: https://issues.apache.org/jira/browse/HBASE-10078 Project: HBase Issue Type: Bug Components: Filters Affects Versions: 0.94.13 Reporter: Federico Gaule Priority: Minor I've tried to use dynamic jar load (https://issues.apache.org/jira/browse/HBASE-1936) but seems to have an issue with FilterList. Here is some log from my app where i send a Get with a FilterList containing AFilter and other with BFilter. 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not found - using dynamical class loader 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class: d.p.AFilter 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Loading new jar files, if any 2013-12-02 13:55:42,677 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class again: d.p.AFilter 2013-12-02 13:55:43,004 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Can't find class d.p.BFilter java.lang.ClassNotFoundException: d.p.BFilter at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820) at org.apache.hadoop.hbase.io.HbaseObjectWritable.getClassByName(HbaseObjectWritable.java:792) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:679) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.filter.FilterList.readFields(FilterList.java:324) at org.apache.hadoop.hbase.client.Get.readFields(Get.java:405) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.client.Action.readFields(Action.java:101) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.client.MultiAction.readFields(MultiAction.java:116) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:126) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1311) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1226) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) AFilter is not found so it tries with DynamicClassLoader, but when it tries to load AFilter, it uses URLClassLoader and 
fails without checking out for dynamic jars. I think the issue is related to FilterList#readFields: public void readFields(final DataInput in) throws IOException { byte opByte = in.readByte(); operator = Operator.values()[opByte]; int size = in.readInt(); if (size > 0) { filters = new ArrayList<Filter>(size); for (int i = 0; i < size; i++) { Filter filter = (Filter)HbaseObjectWritable.readObject(in, conf); filters.add(filter); } } } HbaseObjectWritable#readObject uses a conf (created by calling HBaseConfiguration.create()) which I suppose doesn't include a DynamicClassLoader instance. -- This message was sent by Atlassian JIRA (v6.1#6144)
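One possible direction, sketched here purely as an assumption (it is not the committed fix and targets the 0.94 Writable code path): inject the deserialization-time Configuration, whose class lookups could be made to go through the DynamicClassLoader, instead of building one with HBaseConfiguration.create(). The Configurable wiring and class name below are illustrative.
{code:title=DynamicAwareFilterListSketch.java|borderStyle=solid}
// Hedged sketch: let the RPC layer hand the filter a Configuration before
// readFields() runs, so nested filters resolve through that conf's ClassLoader
// (assumed to be the DynamicClassLoader) rather than a freshly created conf.
import java.io.DataInput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.io.HbaseObjectWritable;

public class DynamicAwareFilterListSketch implements Configurable {
  private Configuration conf;                       // injected, not created here
  private List<Filter> filters = new ArrayList<Filter>();

  @Override
  public void setConf(Configuration conf) { this.conf = conf; }

  @Override
  public Configuration getConf() { return conf; }

  /** Deserialize nested filters through the injected conf's class loading. */
  public void readFields(final DataInput in) throws IOException {
    int size = in.readInt();
    for (int i = 0; i < size; i++) {
      filters.add((Filter) HbaseObjectWritable.readObject(in, conf));
    }
  }
}
{code}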
[jira] [Updated] (HBASE-10078) Dynamic Filter - Not using DynamicClassLoader whe using FilterList
[ https://issues.apache.org/jira/browse/HBASE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Federico Gaule updated HBASE-10078: --- Description: I've tried to use dynamic jar load (https://issues.apache.org/jira/browse/HBASE-1936) but seems to have an issue with FilterList. Here is some log from my app where i send a Get with a FilterList containing AFilter and other with BFilter. 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not found - using dynamical class loader 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class: d.p.AFilter 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Loading new jar files, if any 2013-12-02 13:55:42,677 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class again: d.p.AFilter 2013-12-02 13:55:43,004 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Can't find class d.p.BFilter java.lang.ClassNotFoundException: d.p.BFilter at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820) at org.apache.hadoop.hbase.io.HbaseObjectWritable.getClassByName(HbaseObjectWritable.java:792) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:679) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.filter.FilterList.readFields(FilterList.java:324) at org.apache.hadoop.hbase.client.Get.readFields(Get.java:405) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.client.Action.readFields(Action.java:101) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.client.MultiAction.readFields(MultiAction.java:116) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:126) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1311) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1226) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) AFilter is not found so it tries with DynamicClassLoader, but when it tries to load AFilter, it uses URLClassLoader and fails without checking out for dynamic jars. 
I think the issue is related to FilterList#readFields {code:title=FilterList.java|borderStyle=solid} public void readFields(final DataInput in) throws IOException { byte opByte = in.readByte(); operator = Operator.values()[opByte]; int size = in.readInt(); if (size > 0) { filters = new ArrayList<Filter>(size); for (int i = 0; i < size; i++) { Filter filter = (Filter)HbaseObjectWritable.readObject(in, conf); filters.add(filter); } } } {code} HbaseObjectWritable#readObject uses a conf (created by calling HBaseConfiguration.create()) which I suppose doesn't include a DynamicClassLoader instance. was: I've tried to use dynamic jar load (https://issues.apache.org/jira/browse/HBASE-1936) but seems to have an issue with FilterList. Here is some log from my app where i send a Get with a FilterList containing AFilter and other with BFilter. 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not found - using dynamical class loader 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class: d.p.AFilter 2013-12-02 13:55:42,564 DEBUG
[jira] [Updated] (HBASE-10078) Dynamic Filter - Not using DynamicClassLoader whe using FilterList
[ https://issues.apache.org/jira/browse/HBASE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Federico Gaule updated HBASE-10078: --- Description: I've tried to use dynamic jar load (https://issues.apache.org/jira/browse/HBASE-1936) but seems to have an issue with FilterList. Here is some log from my app where i send a Get with a FilterList containing AFilter and other with BFilter. {noformat} 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not found - using dynamical class loader 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class: d.p.AFilter 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Loading new jar files, if any 2013-12-02 13:55:42,677 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class again: d.p.AFilter 2013-12-02 13:55:43,004 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Can't find class d.p.BFilter java.lang.ClassNotFoundException: d.p.BFilter at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820) at org.apache.hadoop.hbase.io.HbaseObjectWritable.getClassByName(HbaseObjectWritable.java:792) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:679) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.filter.FilterList.readFields(FilterList.java:324) at org.apache.hadoop.hbase.client.Get.readFields(Get.java:405) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.client.Action.readFields(Action.java:101) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.client.MultiAction.readFields(MultiAction.java:116) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:126) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1311) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1226) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {noformat} AFilter is not found so it tries with DynamicClassLoader, but when it tries to load AFilter, it uses URLClassLoader and fails without checking out for dynamic jars. 
I think the issue is related to FilterList#readFields {code:title=FilterList.java|borderStyle=solid} public void readFields(final DataInput in) throws IOException { byte opByte = in.readByte(); operator = Operator.values()[opByte]; int size = in.readInt(); if (size > 0) { filters = new ArrayList<Filter>(size); for (int i = 0; i < size; i++) { Filter filter = (Filter)HbaseObjectWritable.readObject(in, conf); filters.add(filter); } } } {code} HbaseObjectWritable#readObject uses a conf (created by calling HBaseConfiguration.create()) which I suppose doesn't include a DynamicClassLoader instance. was: I've tried to use dynamic jar load (https://issues.apache.org/jira/browse/HBASE-1936) but seems to have an issue with FilterList. Here is some log from my app where i send a Get with a FilterList containing AFilter and other with BFilter. 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not found - using dynamical class loader 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class: d.p.AFilter 2013-12-02 13:55:42,564 DEBUG
[jira] [Updated] (HBASE-10078) Dynamic Filter - Not using DynamicClassLoader when using FilterList
[ https://issues.apache.org/jira/browse/HBASE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Federico Gaule updated HBASE-10078: --- Summary: Dynamic Filter - Not using DynamicClassLoader when using FilterList (was: Dynamic Filter - Not using DynamicClassLoader whe using FilterList) Dynamic Filter - Not using DynamicClassLoader when using FilterList --- Key: HBASE-10078 URL: https://issues.apache.org/jira/browse/HBASE-10078 Project: HBase Issue Type: Bug Components: Filters Affects Versions: 0.94.13 Reporter: Federico Gaule Priority: Minor I've tried to use dynamic jar load (https://issues.apache.org/jira/browse/HBASE-1936) but seems to have an issue with FilterList. Here is some log from my app where i send a Get with a FilterList containing AFilter and other with BFilter. {noformat} 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not found - using dynamical class loader 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class: d.p.AFilter 2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Loading new jar files, if any 2013-12-02 13:55:42,677 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class again: d.p.AFilter 2013-12-02 13:55:43,004 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Can't find class d.p.BFilter java.lang.ClassNotFoundException: d.p.BFilter at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820) at org.apache.hadoop.hbase.io.HbaseObjectWritable.getClassByName(HbaseObjectWritable.java:792) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:679) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.filter.FilterList.readFields(FilterList.java:324) at org.apache.hadoop.hbase.client.Get.readFields(Get.java:405) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.client.Action.readFields(Action.java:101) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594) at org.apache.hadoop.hbase.client.MultiAction.readFields(MultiAction.java:116) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:126) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1311) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1226) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514) at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {noformat} AFilter is not found so it tries with DynamicClassLoader, but when it tries to load AFilter, it uses URLClassLoader and fails without checking out for dynamic jars. I think the issue is related to FilterList#readFields {code:title=FilterList.java|borderStyle=solid} public void readFields(final DataInput in) throws IOException { byte opByte = in.readByte(); operator = Operator.values()[opByte]; int size = in.readInt(); if (size > 0) { filters = new ArrayList<Filter>(size); for (int i = 0; i < size; i++) { Filter filter = (Filter)HbaseObjectWritable.readObject(in, conf); filters.add(filter); } } } {code} HbaseObjectWritable#readObject uses a conf (created by calling HBaseConfiguration.create()) which I suppose doesn't include a DynamicClassLoader instance. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9986) Incorporate HTTPS support for HBase (0.94 port)
[ https://issues.apache.org/jira/browse/HBASE-9986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838851#comment-13838851 ] Hudson commented on HBASE-9986: --- SUCCESS: Integrated in HBase-0.94 #1217 (See [https://builds.apache.org/job/HBase-0.94/1217/]) HBASE-9986 Incorporate HTTPS support for HBase (0.94 port) (Aditya Kishore) (larsh: rev 1547706) * /hbase/branches/0.94/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/BackupMasterStatusTmpl.jamon * /hbase/branches/0.94/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon * /hbase/branches/0.94/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon * /hbase/branches/0.94/src/main/resources/hbase-webapps/master/table.jsp Incorporate HTTPS support for HBase (0.94 port) --- Key: HBASE-9986 URL: https://issues.apache.org/jira/browse/HBASE-9986 Project: HBase Issue Type: Task Affects Versions: 0.94.13 Reporter: Aditya Kishore Assignee: Aditya Kishore Fix For: 0.94.15 Attachments: HBASE-9954_0.94.patch In various classes, "http://" is hard-coded. This JIRA adds support for using the HBase web UI via HTTPS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HBASE-5273) Provide a coprocessor template for fast development and testing
[ https://issues.apache.org/jira/browse/HBASE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-5273. -- Resolution: Won't Fix Resolving as won't fix as per suggestion above by [~takeshi.miao] Provide a coprocessor template for fast development and testing --- Key: HBASE-5273 URL: https://issues.apache.org/jira/browse/HBASE-5273 Project: HBase Issue Type: Improvement Components: Coprocessors Affects Versions: 0.92.0 Reporter: Mingjie Lai Priority: Minor While reworking the coprocessor blog, I started to realize that we should have a coprocessor template that helps users quickly start developing and testing a customized coprocessor. Currently there are some built-in coprocessors, but they are scattered all over the code base, and a user has to search around the code to see how to develop a new one. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10031) Add a section on the transparent CF encryption feature to the manual
[ https://issues.apache.org/jira/browse/HBASE-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838889#comment-13838889 ] stack commented on HBASE-10031: --- The doc is great. Add a section on the transparent CF encryption feature to the manual Key: HBASE-10031 URL: https://issues.apache.org/jira/browse/HBASE-10031 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Blocker Fix For: 0.98.0 Attachments: 10031.patch Document HBASE-7544 in detail in the manual. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9485) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
[ https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838927#comment-13838927 ] Ted Yu commented on HBASE-9485: --- Integrated to 0.96, 0.98 and trunk. Thanks for the reviews, Devaraj, Nick and Vinod. TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart -- Key: HBASE-9485 URL: https://issues.apache.org/jira/browse/HBASE-9485 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.96.2 Attachments: 9485-v2.txt HBase extends OutputCommitter which turns recovery off. Meaning all completed maps are lost on RM restart and job starts from scratch. FileOutputCommitter implements recovery so we should look at that to see what is potentially needed for recovery. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9485) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
[ https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9485: -- Fix Version/s: 0.96.2 0.98.0 TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart -- Key: HBASE-9485 URL: https://issues.apache.org/jira/browse/HBASE-9485 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.96.2 Attachments: 9485-v2.txt HBase extends OutputCommitter which turns recovery off. Meaning all completed maps are lost on RM restart and job starts from scratch. FileOutputCommitter implements recovery so we should look at that to see what is potentially needed for recovery. -- This message was sent by Atlassian JIRA (v6.1#6144)
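A hedged sketch of what "implement recovery" could look like for a committer that writes straight to an HBase table (this is not necessarily the attached 9485-v2.txt patch): since commitTask() is already a no-op for table output, it may be enough to declare recovery supported so the MR framework keeps completed maps across a ResourceManager restart.
{code:title=RecoverableTableOutputCommitterSketch.java|borderStyle=solid}
// Sketch only: an OutputCommitter that opts in to recovery. Writes go directly
// to the table, so recoverTask() has nothing to replay.
import java.io.IOException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class RecoverableTableOutputCommitterSketch extends OutputCommitter {
  @Override public void setupJob(JobContext c) { }
  @Override public void setupTask(TaskAttemptContext c) { }
  @Override public boolean needsTaskCommit(TaskAttemptContext c) { return false; }
  @Override public void commitTask(TaskAttemptContext c) { }
  @Override public void abortTask(TaskAttemptContext c) { }

  // Declaring recovery supported is what lets the job resume instead of
  // restarting from scratch after an RM restart.
  @Override public boolean isRecoverySupported() { return true; }
  @Override public void recoverTask(TaskAttemptContext c) throws IOException { }
}
{code}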
[jira] [Updated] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9892: - Attachment: HBASE-9892-v5.txt First cut at a trunk patch (I just saw Liu Shaohui sleeping at his desk; obviously a man who is working too hard -- smile). Liu wants this patch so he can run two HBases on a single node. Add info port to ServerName to support multi instances in a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, HBASE-9892-v5.txt The full GC time of a regionserver with a big heap (> 30G) usually cannot be kept under 30s. At the same time, servers with 64G of memory are now normal. So we try to deploy multiple RS instances (2-3) on a single node, with a heap of about 20G ~ 24G for each RS. Most things work fine, except the HBase web UI. The master gets the RS info port from the conf, which is not suitable for the situation of multiple RS instances on a node. So we add the info port to ServerName. a. At startup, the RS reports its info port to HMaster. b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node. c. For meta regions, the RS writes the servername with info port to the root region. d. For user regions, the RS writes the servername with info port to the meta regions. So HMaster and clients can get the info port from the servername. To test this feature, I changed the RS num from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (HBase on YARN) will encounter the same problem. Does anyone know how Hoya handles this problem? PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
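As an illustration only (the attached patches may encode this differently), the description amounts to carrying an extra info-port field alongside the usual host, port and startcode when a regionserver registers itself. A toy encoding might look like the following, with the fourth field purely hypothetical.
{code:title=ServerNameWithInfoPortSketch.java|borderStyle=solid}
// Hedged sketch: append the info port to the registered server name so the
// master and clients can link to the right web UI when several regionservers
// share one host. The on-wire format here is made up for illustration.
public class ServerNameWithInfoPortSketch {
  /** e.g. "host1.example.com,60020,1386144000000,60030" (last field hypothetical). */
  static String toRegistrationString(String host, int rpcPort, long startcode, int infoPort) {
    return host + "," + rpcPort + "," + startcode + "," + infoPort;
  }

  static int parseInfoPort(String registration) {
    String[] parts = registration.split(",");
    // fall back when an old-format name without the extra field is read
    return parts.length >= 4 ? Integer.parseInt(parts[3]) : -1;
  }
}
{code}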
[jira] [Created] (HBASE-10079) Increments lost after flush
Jonathan Hsieh created HBASE-10079: -- Summary: Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839042#comment-13839042 ] Jonathan Hsieh commented on HBASE-10079: Here's a link to the test programs I used to pull out this bug. It needs to be polished and turned in to an IT test as well as a perf test probably in a separate issue. https://github.com/jmhsieh/hbase/tree/increval Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
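The linked test programs are not reproduced here; a minimal sketch of what the verification step amounts to, with made-up table, row and column names and using the 0.96-era client API, would be:
{code:title=IncrementVerifierSketch.java|borderStyle=solid}
// Hedged sketch of the verify step only; the linked IncrementBlaster and
// IncrementVerifier are the real rig. Table, row and column names are made up.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementVerifierSketch {
  public static void main(String[] args) throws Exception {
    long expected = Long.parseLong(args[0]); // total increments the blaster was configured to send
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "incr_test");               // hypothetical table name
    try {
      Get get = new Get(Bytes.toBytes("the-row"));              // hypothetical row
      get.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c"));    // hypothetical family/qualifier
      Result r = table.get(get);
      long count = Bytes.toLong(r.getValue(Bytes.toBytes("f"), Bytes.toBytes("c")));
      System.out.println(count == expected
          ? "count OK: " + count
          : "increments lost: " + count + " != " + expected);
    } finally {
      table.close();
    }
  }
}
{code}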
[jira] [Updated] (HBASE-10080) Unnecessary call to locateRegion when creating an HTable instance
[ https://issues.apache.org/jira/browse/HBASE-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10080: Attachment: 10080.v1.patch Unnecessary call to locateRegion when creating an HTable instance - Key: HBASE-10080 URL: https://issues.apache.org/jira/browse/HBASE-10080 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.98.0, 0.96.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Trivial Fix For: 0.96.2, 0.98.1 Attachments: 10080.v1.patch It's more or less in contradiction with the objective of having lightweight HTable objects and the data may be stale when we will use it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10080) Unnecessary call to locateRegion when creating an HTable instance
Nicolas Liochon created HBASE-10080: --- Summary: Unnecessary call to locateRegion when creating an HTable instance Key: HBASE-10080 URL: https://issues.apache.org/jira/browse/HBASE-10080 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.98.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Trivial Fix For: 0.96.2, 0.98.1 Attachments: 10080.v1.patch It's more or less in contradiction with the objective of having lightweight HTable objects and the data may be stale when we will use it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10080) Unnecessary call to locateRegion when creating an HTable instance
[ https://issues.apache.org/jira/browse/HBASE-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10080: Status: Patch Available (was: Open) Unnecessary call to locateRegion when creating an HTable instance - Key: HBASE-10080 URL: https://issues.apache.org/jira/browse/HBASE-10080 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.98.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Trivial Fix For: 0.96.2, 0.98.1 Attachments: 10080.v1.patch It's more or less in contradiction with the objective of having lightweight HTable objects and the data may be stale when we will use it -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10080) Unnecessary call to locateRegion when creating an HTable instance
[ https://issues.apache.org/jira/browse/HBASE-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839088#comment-13839088 ] Lars Hofhansl commented on HBASE-10080: --- This is verifying essentially that the table exists. It is light weight if: # the location is already cached, or # you end up accessing the first region of the table later anyway Unnecessary call to locateRegion when creating an HTable instance - Key: HBASE-10080 URL: https://issues.apache.org/jira/browse/HBASE-10080 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.98.0, 0.96.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Trivial Fix For: 0.96.2, 0.98.1 Attachments: 10080.v1.patch It's more or less in contradiction with the objective of having lightweight HTable objects and the data may be stale when we will use it -- This message was sent by Atlassian JIRA (v6.1#6144)
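For callers that relied on the constructor-time lookup as an implicit existence check, an explicit check keeps that behaviour. The sketch below is illustrative (the table name is made up) and uses the long-standing HBaseAdmin.tableExists call rather than anything from the attached patch.
{code:title=ExplicitExistenceCheckSketch.java|borderStyle=solid}
// Hedged sketch: verify the table exists up front, then construct the HTable.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class ExplicitExistenceCheckSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      if (!admin.tableExists("my_table")) {   // hypothetical table name
        throw new IllegalStateException("table is missing");
      }
    } finally {
      admin.close();
    }
    // without an eager region lookup, constructing the HTable stays lightweight
    HTable table = new HTable(conf, "my_table");
    table.close();
  }
}
{code}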
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839085#comment-13839085 ] Jonathan Hsieh commented on HBASE-10079: In 0.96.0: * flush: Not able to reproduce data loss * with kill: Not able to reproduce data loss. had an overcount of 1. * with kill -9: Not able to reproduce data loss. had an overcount of 1. The overcount of 1 is likely a different bug that I think that I'll let slide. Likely the client thought it failed and retried, but it actually made it to the log and increments not being idempotent. So the bug is somewhere between 0.96.0 and 0.96.1rc1. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839099#comment-13839099 ] Sergey Shelukhin commented on HBASE-10079: -- does the writer check for exceptions? can you try disabling nonces, to see if there could be collisions (although I would expect the client to receive exceptions in such cases) Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839102#comment-13839102 ] Sergey Shelukhin commented on HBASE-10079: -- hbase.regionserver.nonces.enabled is the server config setting. Although, during replay, the updates are never blocked if nonces collide. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839103#comment-13839103 ] Jonathan Hsieh commented on HBASE-10079: Do you the increval rig with the github link in the first comment? No, that a was a quick and dirty program to duplicate a customer issue. I'm in the process of adding flushes to the TestAtomicOperation unit tests that [~lhofhansl] mentioned in the mailing list. I'll be able to bisect find the bug that way. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839114#comment-13839114 ] Jonathan Hsieh commented on HBASE-10079: This was the issue that fixed the problem in 0.94 and 0.95 branches (at the time). It added a test to TestHRegion called testParallelIncrementWithMemStoreFlush that tests the situation. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839103#comment-13839103 ] Jonathan Hsieh edited comment on HBASE-10079 at 12/4/13 5:49 PM: - Does the increval rig with the github link in the first comment check for exceptions? No, it was a quick and dirty program to duplicate a customer issue. I'm in the process of adding flushes to the TestAtomicOperation unit tests that [~lhofhansl] mentioned in the mailing list. I'll be able to bisect find the bug that way. was (Author: jmhsieh): Do you the increval rig with the github link in the first comment? No, that a was a quick and dirty program to duplicate a customer issue. I'm in the process of adding flushes to the TestAtomicOperation unit tests that [~lhofhansl] mentioned in the mailing list. I'll be able to bisect find the bug that way. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-7091) support custom GC options in hbase-env.sh
[ https://issues.apache.org/jira/browse/HBASE-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839122#comment-13839122 ] Nicolas Liochon commented on HBASE-7091: I understand. Jira created :-). support custom GC options in hbase-env.sh - Key: HBASE-7091 URL: https://issues.apache.org/jira/browse/HBASE-7091 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.94.4 Reporter: Jesse Yates Assignee: Jesse Yates Labels: newbie Fix For: 0.94.4, 0.95.0 Attachments: hbase-7091-v1.patch When running things like bin/start-hbase and bin/hbase-daemon.sh start [master|regionserver|etc] we end up setting the HBASE_OPTS property a couple of times via calling hbase-env.sh. This is generally not a problem for most cases, but when you want to set your own GC log properties, one would think you should set HBASE_GC_OPTS, which gets added to HBASE_OPTS. NOPE! That would make too much sense. Running bin/hbase-daemons.sh will run bin/hbase-daemon.sh with the daemons it needs to start. Each time through hbase-daemon.sh we also call bin/hbase. This isn't a big deal except that for each call to hbase-daemon.sh, we also source hbase-env.sh twice (once in the script and once in bin/hbase). This is important for my next point. Note that to turn on GC logging, you uncomment: {code} # export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps $HBASE_GC_OPTS" {code} and then to log to a gc file for each server, you then uncomment: {code} # export HBASE_USE_GC_LOGFILE=true {code} in hbase-env.sh. On the first pass through hbase-daemon.sh, HBASE_GC_OPTS isn't set, so HBASE_OPTS doesn't get anything funky, but we set HBASE_USE_GC_LOGFILE, which then sets HBASE_GC_OPTS to the log file (-Xloggc:...). Then in bin/hbase we again run hbase-env.sh, which now has HBASE_GC_OPTS set, adding the GC file. This isn't a general problem because HBASE_OPTS is set without prefixing the existing HBASE_OPTS (e.g. HBASE_OPTS=$HBASE_OPTS ...), allowing easy updating. However, GC OPTS don't work the same and this is really odd behavior when you want to set your own GC opts, which can include turning on GC log rolling (yes, yes, they really are jvm opts, but they ought to support their own param, to help minimize clutter). The simple version of this patch will just add an idempotent GC option to hbase-env.sh and some comments that uncommenting {code} # export HBASE_USE_GC_LOGFILE=true {code} will lead to a custom gc log file per server (along with an example name), so you don't need to set -Xloggc. The more complex solution does the above and also solves the multiple calls to hbase-env.sh so we can be sane about how all this works. Note that to fix this, hbase-daemon.sh just needs to read in HBASE_USE_GC_LOGFILE after sourcing hbase-env.sh and then update HBASE_OPTS. Oh and also not source hbase-env.sh in bin/hbase. Even further, we might want to consider adding options just for cases where we don't need gc logging - i.e. the shell, the config reading tool, hbck, etc. This is the hardest version to handle since the first couple will willy-nilly apply the gc options. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10081) Since HBASE-7091, HBASE_OPTS cannot be set on the command line
Nicolas Liochon created HBASE-10081: --- Summary: Since HBASE-7091, HBASE_OPTS cannot be set on the command line Key: HBASE-10081 URL: https://issues.apache.org/jira/browse/HBASE-10081 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.0, 0.98.0 Reporter: Nicolas Liochon Priority: Minor Discussed in HBASE-7091. It's not critical, but a little bit surprising, as the comments in bin/hbase doesn't say anything about this. If you create your own hbase-env then it's not an issue... -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10074) consolidate and improve capacity/sizing documentation
[ https://issues.apache.org/jira/browse/HBASE-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839128#comment-13839128 ] Sergey Shelukhin commented on HBASE-10074: -- [~ndimiduk] bq. Mind adding some JIRA references here? Actually, do you have particular JIRAs in mind? [~stack] thanks! consolidate and improve capacity/sizing documentation - Key: HBASE-10074 URL: https://issues.apache.org/jira/browse/HBASE-10074 Project: HBase Issue Type: Improvement Components: documentation Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10074.patch Region count description is in config section; region size description is in architecture sections; both of these have a lot of good technical details, but imho we could do better in terms of admin-centric advice. Currently, there's a nearly-empty capacity section; I'd like to rewrite it to consolidate capacity planning/sizing/region sizing information, and some basic configuration pertaining to it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839114#comment-13839114 ] Jonathan Hsieh edited comment on HBASE-10079 at 12/4/13 5:47 PM: - HBASE-6195 was the issue that fixed the problem in 0.94 and 0.95 branches (at the time). It added a test to TestHRegion called testParallelIncrementWithMemStoreFlush that tests the situation. was (Author: jmhsieh): This was the issue that fixed the problem in 0.94 and 0.95 branches (at the time). It added a test to TestHRegion called testParallelIncrementWithMemStoreFlush that tests the situation. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment a single col. We flush or do kills/kill-9 and data is lost. Flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row and single col with various numbers of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25 on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 != 25. Correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878 != 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839153#comment-13839153 ] Jonathan Hsieh commented on HBASE-10079: TestHRegion#testParallelIncrementWithMemStoreFlush passes on the 0.96 tip. The test actually waits for all the increments to be done before flushing (instead of while other increments are happening), so my bet is that it doesn't actually test the race condition. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment a single col. We flush or do kills/kill-9 and data is lost. Flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row and single col with various numbers of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25 on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 != 25. Correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878 != 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
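For reference, a minimal client-side sketch of the kind of rig described above, where flushes are interleaved with increments that are still in flight rather than issued after they finish. The table, family, and qualifier names and the thread/iteration counts are illustrative assumptions, not the actual IncrementBlaster/IncrementVerifier code:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ConcurrentIncrementFlushCheck {

  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final byte[] table = Bytes.toBytes("incr_test");  // assumed, pre-created table
    final byte[] row = Bytes.toBytes("r1");
    final byte[] fam = Bytes.toBytes("f");            // assumed family name
    final byte[] qual = Bytes.toBytes("q");
    final int nThreads = 4;
    final int perThread = 1000;

    Thread[] workers = new Thread[nThreads];
    for (int i = 0; i < nThreads; i++) {
      workers[i] = new Thread(new Runnable() {
        public void run() {
          try {
            // HTable is not thread safe, so each worker gets its own instance.
            HTable t = new HTable(conf, table);
            for (int j = 0; j < perThread; j++) {
              t.incrementColumnValue(row, fam, qual, 1L);
            }
            t.close();
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });
      workers[i].start();
    }

    // Flush repeatedly while increments are still in flight: this is the
    // interleaving the unit test above does not exercise.
    HBaseAdmin admin = new HBaseAdmin(conf);
    while (anyAlive(workers)) {
      admin.flush(table);
      Thread.sleep(200);
    }
    admin.close();

    // Verify the final counter value against the number of increments issued.
    HTable verify = new HTable(conf, table);
    Result r = verify.get(new Get(row).addColumn(fam, qual));
    long count = Bytes.toLong(r.getValue(fam, qual));
    System.out.println("expected " + (nThreads * perThread) + ", got " + count);
    verify.close();
  }

  private static boolean anyAlive(Thread[] threads) {
    for (Thread t : threads) {
      if (t.isAlive()) return true;
    }
    return false;
  }
}
{code}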
[jira] [Commented] (HBASE-9524) Multi row get does not return any results even if any one of the rows specified in the query is missing and improve exception handling
[ https://issues.apache.org/jira/browse/HBASE-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839186#comment-13839186 ] Vandana Ayyalasomayajula commented on HBASE-9524: - Both the issues found in the Hadoop QA run seem to be unrelated. https://builds.apache.org/job/PreCommit-HBASE-Build/8053//artifact/trunk/patchprocess/patchSiteOutput.txt https://builds.apache.org/job/PreCommit-HBASE-Build/8053//artifact/trunk/patchprocess/patchJavadocWarnings.txt [~ndimiduk] Can you please take a look at the latest patch when you have time? Thanks! Multi row get does not return any results even if any one of the rows specified in the query is missing and improve exception handling -- Key: HBASE-9524 URL: https://issues.apache.org/jira/browse/HBASE-9524 Project: HBase Issue Type: Improvement Components: REST Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Priority: Minor Attachments: HBASE-9524_trunk.01.patch, hbase-9524_trunk.00.patch When a client tries to retrieve multiple rows using the REST API, even if one of the specified rows does not exist, 404 is returned. The correct way should be to return the result for the found rows and ignore the non-existent ones. Also, in the current code base, only some exceptions are handled; if some exception like access denied or no column found is thrown by the APIs, 500 (internal server error) is returned to the user. This leaves the end user wondering what caused the REST command to fail. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10082) Describe 'table' output is all on one line, could use better formatting
Maxime C Dumas created HBASE-10082: -- Summary: Describe 'table' output is all on one line, could use better formatting Key: HBASE-10082 URL: https://issues.apache.org/jira/browse/HBASE-10082 Project: HBase Issue Type: Improvement Environment: 0.94.2-cdh4.2.1 Reporter: Maxime C Dumas If you describe 'table' from the HBase shell, you get an output like this for a very simple table: hbase(main):023:0 describe 'movie' DESCRIPTION ENABLED {NAME = 'movie', FAMILIES = [{NAME = 'info', DATA_BLOCK_ENCODING = 'NONE', B true LOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', VERSIONS = '3', COMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true', BLOCKCACH E = 'true'}, {NAME = 'media', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'N ONE', REPLICATION_SCOPE = '0', VERSIONS = '1', COMPRESSION = 'NONE', MIN_VERS IONS = '0', TTL = '2147483647', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '6 5536', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true', BLOCKCACHE = 'true'}]} 1 row(s) in 0.0250 seconds Not only everything is on one row, but also it seems to be limited in width (82 chars). I suggest we do a line return on each column family, or format it into a JSON (lint) format, or anything more readable! Thanks! -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10082) Describe 'table' output is all on one line, could use better formatting
[ https://issues.apache.org/jira/browse/HBASE-10082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxime C Dumas updated HBASE-10082: --- Priority: Minor (was: Major) Describe 'table' output is all on one line, could use better formatting --- Key: HBASE-10082 URL: https://issues.apache.org/jira/browse/HBASE-10082 Project: HBase Issue Type: Improvement Environment: 0.94.2-cdh4.2.1 Reporter: Maxime C Dumas Priority: Minor If you describe 'table' from the HBase shell, you get an output like this for a very simple table: hbase(main):023:0 describe 'movie' DESCRIPTION ENABLED {NAME = 'movie', FAMILIES = [{NAME = 'info', DATA_BLOCK_ENCODING = 'NONE', B true LOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', VERSIONS = '3', COMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true', BLOCKCACH E = 'true'}, {NAME = 'media', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'N ONE', REPLICATION_SCOPE = '0', VERSIONS = '1', COMPRESSION = 'NONE', MIN_VERS IONS = '0', TTL = '2147483647', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '6 5536', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true', BLOCKCACHE = 'true'}]} 1 row(s) in 0.0250 seconds Not only everything is on one row, but also it seems to be limited in width (82 chars). I suggest we do a line return on each column family, or format it into a JSON (lint) format, or anything more readable! Thanks! -- This message was sent by Atlassian JIRA (v6.1#6144)
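One way to get the per-family line breaks suggested above, without waiting for a shell change, is to print the table descriptor from the Java client. This is only a rough sketch, assuming a table named 'movie':
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class DescribePerFamily {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("movie"));
      System.out.println("TABLE: " + desc.getNameAsString());
      // Print each column family on its own line instead of one wrapped blob.
      for (HColumnDescriptor family : desc.getFamilies()) {
        System.out.println("  " + family.toString());
      }
    } finally {
      admin.close();
    }
  }
}
{code}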
[jira] [Commented] (HBASE-10080) Unnecessary call to locateRegion when creating an HTable instance
[ https://issues.apache.org/jira/browse/HBASE-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839194#comment-13839194 ] Hadoop QA commented on HBASE-10080: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617015/10080.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8055//console This message is automatically generated. 
Unnecessary call to locateRegion when creating an HTable instance - Key: HBASE-10080 URL: https://issues.apache.org/jira/browse/HBASE-10080 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.98.0, 0.96.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Trivial Fix For: 0.96.2, 0.98.1 Attachments: 10080.v1.patch It's more or less in contradiction with the objective of having lightweight HTable objects, and the data may be stale by the time we use it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10074) consolidate and improve capacity/sizing documentation
[ https://issues.apache.org/jira/browse/HBASE-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-10074: - Attachment: HBASE-10074.01.patch incorporated feedback, some spelling fixes and rephrases. I'd assume +1 stands, will commit in the afternoon consolidate and improve capacity/sizing documentation - Key: HBASE-10074 URL: https://issues.apache.org/jira/browse/HBASE-10074 Project: HBase Issue Type: Improvement Components: documentation Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10074.01.patch, HBASE-10074.patch Region count description is in config section; region size description is in architecture sections; both of these have a lot of good technical details, but imho we could do better in terms of admin-centric advice. Currently, there's a nearly-empty capacity section; I'd like to rewrite it to consolidate capacity planning/sizing/region sizing information, and some basic configuration pertaining to it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9648) collection one expired storefile causes it to be replaced by another expired storefile
[ https://issues.apache.org/jira/browse/HBASE-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839201#comment-13839201 ] Sergey Shelukhin commented on HBASE-9648: - stumbled upon this jira (not bug :)) again... do you want to go with either patch collection one expired storefile causes it to be replaced by another expired storefile -- Key: HBASE-9648 URL: https://issues.apache.org/jira/browse/HBASE-9648 Project: HBase Issue Type: Bug Components: Compaction Reporter: Sergey Shelukhin Assignee: Jean-Marc Spaggiari Attachments: HBASE-9648-v0-0.94.patch, HBASE-9648-v0-trunk.patch, HBASE-9648-v1-trunk.patch, HBASE-9648.patch There's a shortcut in compaction selection that causes the selection of expired store files to quickly delete. However, there's also the code that ensures we write at least one file to preserve seqnum. This new empty file is expired, because it has no data, presumably. So it's collected again, etc. This affects 94, probably also 96. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10074) consolidate and improve capacity/sizing documentation
[ https://issues.apache.org/jira/browse/HBASE-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839204#comment-13839204 ] Sergey Shelukhin commented on HBASE-10074: -- [~stack] ok for 96? consolidate and improve capacity/sizing documentation - Key: HBASE-10074 URL: https://issues.apache.org/jira/browse/HBASE-10074 Project: HBase Issue Type: Improvement Components: documentation Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10074.01.patch, HBASE-10074.patch Region count description is in config section; region size description is in architecture sections; both of these have a lot of good technical details, but imho we could do better in terms of admin-centric advice. Currently, there's a nearly-empty capacity section; I'd like to rewrite it to consolidate capacity planning/sizing/region sizing information, and some basic configuration pertaining to it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HBASE-8929) IntegrationTestBigLinkedList reuses old data in some cases
[ https://issues.apache.org/jira/browse/HBASE-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HBASE-8929. - Resolution: Not A Problem IntegrationTestBigLinkedList reuses old data in some cases -- Key: HBASE-8929 URL: https://issues.apache.org/jira/browse/HBASE-8929 Project: HBase Issue Type: Bug Components: test Reporter: Sergey Shelukhin Priority: Minor When running the test repeatedly on the same cluster one can sometimes see an unexpected reference count, where the number found is (in the observed case) a multiple of the number expected, so instead of 2.5m nodes it finds 12.5m, for example. It looks like it's reading the data from the old run. Setup should delete that (not cleanup, as the data may be used for debugging after the test). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HBASE-8777) HBase client should determine the destination server after retry time
[ https://issues.apache.org/jira/browse/HBASE-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HBASE-8777. - Resolution: Won't Fix probably too late for 94 HBase client should determine the destination server after retry time - Key: HBASE-8777 URL: https://issues.apache.org/jira/browse/HBASE-8777 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 0.94.9 Reporter: Sergey Shelukhin HBase currently determines which server to go to, then creates delayed callable with pre-determined server and goes there. For later 16-32-... second retries this approach is suboptimal, the cluster could have seen massive changes in the meantime, so retry might be completely useless. We should re-locate regions after the delay, at least for longer retries. Given how grouping is currently done it would be a bit of a refactoring. The effect of this is alleviated (to a degree) on trunk by server-based retries (if we fail going to the pre-delay server after delay and then determine the server has changed, we will go to the new server immediately, so we only lose the failed round-trip time); on 94, if the region is opened on some other server during the delay, we'd go to the old one, fail, then find out it's on different server, wait a bunch more time because it's a late-stage retry and THEN go to the new one, as far as I see. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839233#comment-13839233 ] Jonathan Hsieh commented on HBASE-10079: I tweaked the test and wasn't able to duplicate it at the unit test level. I'm looking into reverting a few patches touching the memstore/flush area and testing on the cluster (HBASE-9963 and HBASE-10014 seem like candidates) to see if they caused the problem. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment a single col. We flush or do kills/kill-9 and data is lost. Flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row and single col with various numbers of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25 on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 != 25. Correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878 != 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10040) Fix Potential Resource Leak in HRegion
[ https://issues.apache.org/jira/browse/HBASE-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839243#comment-13839243 ] Sergey Shelukhin commented on HBASE-10040: -- There is no good reason... probably added like that to allow user scanners to throw. It could be removed. Fix Potential Resource Leak in HRegion -- Key: HBASE-10040 URL: https://issues.apache.org/jira/browse/HBASE-10040 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0, 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10040) Fix Potential Resource Leak in HRegion
[ https://issues.apache.org/jira/browse/HBASE-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839244#comment-13839244 ] Sergey Shelukhin commented on HBASE-10040: -- Although technically that would be an API change. Fix Potential Resource Leak in HRegion -- Key: HBASE-10040 URL: https://issues.apache.org/jira/browse/HBASE-10040 Project: HBase Issue Type: Sub-task Affects Versions: 0.98.0, 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839271#comment-13839271 ] Hudson commented on HBASE-4811: --- SUCCESS: Integrated in HBase-TRUNK #4711 (See [https://builds.apache.org/job/HBase-TRUNK/4711/]) HBASE-10072. Regenerate ClientProtos after HBASE-4811 (apurtell: rev 1547720) * /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java Support reverse Scan Key: HBASE-4811 URL: https://issues.apache.org/jira/browse/HBASE-4811 Project: HBase Issue Type: New Feature Components: Client Affects Versions: 0.20.6, 0.94.7 Reporter: John Carrino Assignee: chunhui shen Fix For: 0.98.0 Attachments: 4811-0.94-v22.txt, 4811-0.94-v23.txt, 4811-0.94-v3.txt, 4811-trunk-v10.txt, 4811-trunk-v29.patch, 4811-trunk-v5.patch, HBase-4811-0.94-v2.txt, HBase-4811-0.94.3modified.txt, hbase-4811-0.94 v21.patch, hbase-4811-0.94-v24.patch, hbase-4811-trunkv1.patch, hbase-4811-trunkv11.patch, hbase-4811-trunkv12.patch, hbase-4811-trunkv13.patch, hbase-4811-trunkv14.patch, hbase-4811-trunkv15.patch, hbase-4811-trunkv16.patch, hbase-4811-trunkv17.patch, hbase-4811-trunkv18.patch, hbase-4811-trunkv19.patch, hbase-4811-trunkv20.patch, hbase-4811-trunkv21.patch, hbase-4811-trunkv24.patch, hbase-4811-trunkv24.patch, hbase-4811-trunkv25.patch, hbase-4811-trunkv26.patch, hbase-4811-trunkv27.patch, hbase-4811-trunkv28.patch, hbase-4811-trunkv4.patch, hbase-4811-trunkv6.patch, hbase-4811-trunkv7.patch, hbase-4811-trunkv8.patch, hbase-4811-trunkv9.patch Reversed scan means scan the rows backward. And StartRow bigger than StopRow in a reversed scan. For example, for the following rows: aaa/c1:q1/value1 aaa/c1:q2/value2 bbb/c1:q1/value1 bbb/c1:q2/value2 ccc/c1:q1/value1 ccc/c1:q2/value2 ddd/c1:q1/value1 ddd/c1:q2/value2 eee/c1:q1/value1 eee/c1:q2/value2 you could do a reversed scan from 'ddd' to 'bbb'(exclude) like this: Scan scan = new Scan(); scan.setStartRow('ddd'); scan.setStopRow('bbb'); scan.setReversed(true); for(Result result:htable.getScanner(scan)){ System.out.println(result); } Aslo you could do the reversed scan with shell like this: hbase scan 'table',{REVERSED = true,STARTROW='ddd', STOPROW='bbb'} And the output is: ddd/c1:q1/value1 ddd/c1:q2/value2 ccc/c1:q1/value1 ccc/c1:q2/value2 All the documentation I find about HBase says that if you want forward and reverse scans you should just build 2 tables and one be ascending and one descending. Is there a fundamental reason that HBase only supports forward Scan? It seems like a lot of extra space overhead and coding overhead (to keep them in sync) to support 2 tables. I am assuming this has been discussed before, but I can't find the discussions anywhere about it or why it would be infeasible. -- This message was sent by Atlassian JIRA (v6.1#6144)
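For convenience, here is the Java snippet from the issue description above in compilable form. It is only a sketch: it assumes a table named 'table' with family 'c1' populated with the rows listed in the description, running on a release that includes the reversed-scan support added by this issue:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ReverseScanExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable htable = new HTable(conf, "table");        // assumed table name
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("ddd"));           // reversed: start row > stop row
    scan.setStopRow(Bytes.toBytes("bbb"));            // stop row is exclusive
    scan.setReversed(true);
    ResultScanner scanner = htable.getScanner(scan);
    try {
      for (Result result : scanner) {
        System.out.println(result);                   // prints ddd then ccc rows
      }
    } finally {
      scanner.close();
      htable.close();
    }
  }
}
{code}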
[jira] [Commented] (HBASE-9485) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
[ https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839274#comment-13839274 ] Hudson commented on HBASE-9485: --- SUCCESS: Integrated in HBase-TRUNK #4711 (See [https://builds.apache.org/job/HBase-TRUNK/4711/]) HBASE-9485 TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart (tedyu: rev 1547803) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputCommitter.java TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart -- Key: HBASE-9485 URL: https://issues.apache.org/jira/browse/HBASE-9485 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.96.2 Attachments: 9485-v2.txt HBase extends OutputCommitter which turns recovery off. Meaning all completed maps are lost on RM restart and job starts from scratch. FileOutputCommitter implements recovery so we should look at that to see what is potentially needed for recovery. -- This message was sent by Atlassian JIRA (v6.1#6144)
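For readers unfamiliar with the MapReduce side of this: recovery hinges on the Hadoop 2 OutputCommitter hooks shown below. This is only an illustrative sketch of a committer with no side output that can safely claim recoverability, not the patch that was committed here:
{code}
import java.io.IOException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Sketch only: a committer with no side output, so it can claim that its
// completed tasks survive an RM/AM restart without any replay work.
public class RecoverableNoOpOutputCommitter extends OutputCommitter {
  @Override public void setupJob(JobContext jobContext) throws IOException { }
  @Override public void setupTask(TaskAttemptContext taskContext) throws IOException { }
  @Override public boolean needsTaskCommit(TaskAttemptContext taskContext) throws IOException {
    return false;  // nothing to commit; writes already went straight to the table
  }
  @Override public void commitTask(TaskAttemptContext taskContext) throws IOException { }
  @Override public void abortTask(TaskAttemptContext taskContext) throws IOException { }

  // Hadoop 2 recovery hooks: the default isRecoverySupported() returns false,
  // which is why completed maps are re-run from scratch after an RM restart.
  @Override public boolean isRecoverySupported() {
    return true;
  }
  @Override public void recoverTask(TaskAttemptContext taskContext) throws IOException {
    // No per-task state to restore.
  }
}
{code}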
[jira] [Commented] (HBASE-10072) Regenerate ClientProtos after HBASE-4811
[ https://issues.apache.org/jira/browse/HBASE-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839273#comment-13839273 ] Hudson commented on HBASE-10072: SUCCESS: Integrated in HBase-TRUNK #4711 (See [https://builds.apache.org/job/HBase-TRUNK/4711/]) HBASE-10072. Regenerate ClientProtos after HBASE-4811 (apurtell: rev 1547720) * /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java Regenerate ClientProtos after HBASE-4811 Key: HBASE-10072 URL: https://issues.apache.org/jira/browse/HBASE-10072 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 10072.patch While running 'mvn compile -Pcompile-protobuf' I noticed generated/ClientProtos.java changed. Looks like the message descriptor for Scan has changed, and its FieldAccessorTable. Attaching the diff. Difference in protoc version maybe? I'm using protoc 2.5.0. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10031) Add a section on the transparent CF encryption feature to the manual
[ https://issues.apache.org/jira/browse/HBASE-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839272#comment-13839272 ] Hudson commented on HBASE-10031: SUCCESS: Integrated in HBase-TRUNK #4711 (See [https://builds.apache.org/job/HBase-TRUNK/4711/]) HBASE-10031. Add a section on the transparent CF encryption feature to the manual (apurtell: rev 1547739) * /hbase/trunk/src/main/docbkx/security.xml Add a section on the transparent CF encryption feature to the manual Key: HBASE-10031 URL: https://issues.apache.org/jira/browse/HBASE-10031 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Blocker Fix For: 0.98.0 Attachments: 10031.patch Document HBASE-7544 in detail in the manual. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839291#comment-13839291 ] Jonathan Hsieh commented on HBASE-10079: Seems like reverting either HBASE-9963 or HBASE-10014 gets rid of the jagged losses due to flushes. However, when testing on the tip of 0.96 with the reverts I seem to be losing some threads as they initialize because of some sort of race. I'm going to try from the exact point where 0.96.1rc1 was cut to see if it is in a happy place and will investigate the HTable initialization problem afterwards. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment a single col. We flush or do kills/kill-9 and data is lost. Flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row and single col with various numbers of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25 on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 != 25. Correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878 != 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9485) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
[ https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839304#comment-13839304 ] Hudson commented on HBASE-9485: --- SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #863 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/863/]) HBASE-9485 TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart (tedyu: rev 1547803) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputCommitter.java TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart -- Key: HBASE-9485 URL: https://issues.apache.org/jira/browse/HBASE-9485 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.96.2 Attachments: 9485-v2.txt HBase extends OutputCommitter which turns recovery off. Meaning all completed maps are lost on RM restart and job starts from scratch. FileOutputCommitter implements recovery so we should look at that to see what is potentially needed for recovery. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10031) Add a section on the transparent CF encryption feature to the manual
[ https://issues.apache.org/jira/browse/HBASE-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839302#comment-13839302 ] Hudson commented on HBASE-10031: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #863 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/863/]) HBASE-10031. Add a section on the transparent CF encryption feature to the manual (apurtell: rev 1547739) * /hbase/trunk/src/main/docbkx/security.xml Add a section on the transparent CF encryption feature to the manual Key: HBASE-10031 URL: https://issues.apache.org/jira/browse/HBASE-10031 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Blocker Fix For: 0.98.0 Attachments: 10031.patch Document HBASE-7544 in detail in the manual. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839301#comment-13839301 ] Hudson commented on HBASE-4811: --- SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #863 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/863/]) HBASE-10072. Regenerate ClientProtos after HBASE-4811 (apurtell: rev 1547720) * /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java Support reverse Scan Key: HBASE-4811 URL: https://issues.apache.org/jira/browse/HBASE-4811 Project: HBase Issue Type: New Feature Components: Client Affects Versions: 0.20.6, 0.94.7 Reporter: John Carrino Assignee: chunhui shen Fix For: 0.98.0 Attachments: 4811-0.94-v22.txt, 4811-0.94-v23.txt, 4811-0.94-v3.txt, 4811-trunk-v10.txt, 4811-trunk-v29.patch, 4811-trunk-v5.patch, HBase-4811-0.94-v2.txt, HBase-4811-0.94.3modified.txt, hbase-4811-0.94 v21.patch, hbase-4811-0.94-v24.patch, hbase-4811-trunkv1.patch, hbase-4811-trunkv11.patch, hbase-4811-trunkv12.patch, hbase-4811-trunkv13.patch, hbase-4811-trunkv14.patch, hbase-4811-trunkv15.patch, hbase-4811-trunkv16.patch, hbase-4811-trunkv17.patch, hbase-4811-trunkv18.patch, hbase-4811-trunkv19.patch, hbase-4811-trunkv20.patch, hbase-4811-trunkv21.patch, hbase-4811-trunkv24.patch, hbase-4811-trunkv24.patch, hbase-4811-trunkv25.patch, hbase-4811-trunkv26.patch, hbase-4811-trunkv27.patch, hbase-4811-trunkv28.patch, hbase-4811-trunkv4.patch, hbase-4811-trunkv6.patch, hbase-4811-trunkv7.patch, hbase-4811-trunkv8.patch, hbase-4811-trunkv9.patch Reversed scan means scan the rows backward. And StartRow bigger than StopRow in a reversed scan. For example, for the following rows: aaa/c1:q1/value1 aaa/c1:q2/value2 bbb/c1:q1/value1 bbb/c1:q2/value2 ccc/c1:q1/value1 ccc/c1:q2/value2 ddd/c1:q1/value1 ddd/c1:q2/value2 eee/c1:q1/value1 eee/c1:q2/value2 you could do a reversed scan from 'ddd' to 'bbb'(exclude) like this: Scan scan = new Scan(); scan.setStartRow('ddd'); scan.setStopRow('bbb'); scan.setReversed(true); for(Result result:htable.getScanner(scan)){ System.out.println(result); } Aslo you could do the reversed scan with shell like this: hbase scan 'table',{REVERSED = true,STARTROW='ddd', STOPROW='bbb'} And the output is: ddd/c1:q1/value1 ddd/c1:q2/value2 ccc/c1:q1/value1 ccc/c1:q2/value2 All the documentation I find about HBase says that if you want forward and reverse scans you should just build 2 tables and one be ascending and one descending. Is there a fundamental reason that HBase only supports forward Scan? It seems like a lot of extra space overhead and coding overhead (to keep them in sync) to support 2 tables. I am assuming this has been discussed before, but I can't find the discussions anywhere about it or why it would be infeasible. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10072) Regenerate ClientProtos after HBASE-4811
[ https://issues.apache.org/jira/browse/HBASE-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839303#comment-13839303 ] Hudson commented on HBASE-10072: SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #863 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/863/]) HBASE-10072. Regenerate ClientProtos after HBASE-4811 (apurtell: rev 1547720) * /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java Regenerate ClientProtos after HBASE-4811 Key: HBASE-10072 URL: https://issues.apache.org/jira/browse/HBASE-10072 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0 Attachments: 10072.patch While running 'mvn compile -Pcompile-protobuf' I noticed generated/ClientProtos.java changed. Looks like the message descriptor for Scan has changed, and its FieldAccessorTable. Attaching the diff. Difference in protoc version maybe? I'm using protoc 2.5.0. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9485) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
[ https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839313#comment-13839313 ] Lars Hofhansl commented on HBASE-9485: -- We can add this to 0.94 as well, no? If it is built against Hadoop 2.x it should just work. TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart -- Key: HBASE-9485 URL: https://issues.apache.org/jira/browse/HBASE-9485 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.96.2 Attachments: 9485-v2.txt HBase extends OutputCommitter which turns recovery off. Meaning all completed maps are lost on RM restart and job starts from scratch. FileOutputCommitter implements recovery so we should look at that to see what is potentially needed for recovery. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10074) consolidate and improve capacity/sizing documentation
[ https://issues.apache.org/jira/browse/HBASE-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839315#comment-13839315 ] Hadoop QA commented on HBASE-10074: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617043/HBASE-10074.01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8056//console This message is automatically generated. consolidate and improve capacity/sizing documentation - Key: HBASE-10074 URL: https://issues.apache.org/jira/browse/HBASE-10074 Project: HBase Issue Type: Improvement Components: documentation Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10074.01.patch, HBASE-10074.patch Region count description is in config section; region size description is in architecture sections; both of these have a lot of good technical details, but imho we could do better in terms of admin-centric advice. 
Currently, there's a nearly-empty capacity section; I'd like to rewrite it to consolidate capacity planning/sizing/region sizing information, and some basic configuration pertaining to it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4811: - Attachment: 4811-0.94-v25.txt Here's the rebased 0.94 patch. Support reverse Scan Key: HBASE-4811 URL: https://issues.apache.org/jira/browse/HBASE-4811 Project: HBase Issue Type: New Feature Components: Client Affects Versions: 0.20.6, 0.94.7 Reporter: John Carrino Assignee: chunhui shen Fix For: 0.98.0 Attachments: 4811-0.94-v22.txt, 4811-0.94-v23.txt, 4811-0.94-v25.txt, 4811-0.94-v3.txt, 4811-trunk-v10.txt, 4811-trunk-v29.patch, 4811-trunk-v5.patch, HBase-4811-0.94-v2.txt, HBase-4811-0.94.3modified.txt, hbase-4811-0.94 v21.patch, hbase-4811-0.94-v24.patch, hbase-4811-trunkv1.patch, hbase-4811-trunkv11.patch, hbase-4811-trunkv12.patch, hbase-4811-trunkv13.patch, hbase-4811-trunkv14.patch, hbase-4811-trunkv15.patch, hbase-4811-trunkv16.patch, hbase-4811-trunkv17.patch, hbase-4811-trunkv18.patch, hbase-4811-trunkv19.patch, hbase-4811-trunkv20.patch, hbase-4811-trunkv21.patch, hbase-4811-trunkv24.patch, hbase-4811-trunkv24.patch, hbase-4811-trunkv25.patch, hbase-4811-trunkv26.patch, hbase-4811-trunkv27.patch, hbase-4811-trunkv28.patch, hbase-4811-trunkv4.patch, hbase-4811-trunkv6.patch, hbase-4811-trunkv7.patch, hbase-4811-trunkv8.patch, hbase-4811-trunkv9.patch Reversed scan means scan the rows backward. And StartRow bigger than StopRow in a reversed scan. For example, for the following rows: aaa/c1:q1/value1 aaa/c1:q2/value2 bbb/c1:q1/value1 bbb/c1:q2/value2 ccc/c1:q1/value1 ccc/c1:q2/value2 ddd/c1:q1/value1 ddd/c1:q2/value2 eee/c1:q1/value1 eee/c1:q2/value2 you could do a reversed scan from 'ddd' to 'bbb'(exclude) like this: Scan scan = new Scan(); scan.setStartRow('ddd'); scan.setStopRow('bbb'); scan.setReversed(true); for(Result result:htable.getScanner(scan)){ System.out.println(result); } Aslo you could do the reversed scan with shell like this: hbase scan 'table',{REVERSED = true,STARTROW='ddd', STOPROW='bbb'} And the output is: ddd/c1:q1/value1 ddd/c1:q2/value2 ccc/c1:q1/value1 ccc/c1:q2/value2 All the documentation I find about HBase says that if you want forward and reverse scans you should just build 2 tables and one be ascending and one descending. Is there a fundamental reason that HBase only supports forward Scan? It seems like a lot of extra space overhead and coding overhead (to keep them in sync) to support 2 tables. I am assuming this has been discussed before, but I can't find the discussions anywhere about it or why it would be infeasible. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839333#comment-13839333 ] Nicolas Liochon commented on HBASE-10079: - I guess the error is in HBASE-9963. It seems there is an issue in HStore#StoreFlusherImpl#prepare: there is no lock there. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
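To illustrate the kind of race being pointed at (a toy model only, not HBase code): if the flush prepare step swaps the active memstore out without excluding in-flight writers, an increment can be applied to a map that is already being snapshotted and flushed, and its update is then dropped or double counted. Holding a region-level write lock across the swap, as sketched below, closes that window:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the writer-vs-flush interaction only -- not HBase code.
public class ToyMemStore {
  private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
  private volatile Map<String, Long> active = new HashMap<String, Long>();

  // Writers hold the read lock so many of them can proceed together, but none
  // can overlap with the snapshot swap in prepareSnapshot(). Same-row writers
  // are serialized here with 'synchronized'; HBase uses per-row locks for that.
  public synchronized void increment(String row, long delta) {
    updatesLock.readLock().lock();
    try {
      Map<String, Long> map = active;
      Long old = map.get(row);
      map.put(row, (old == null ? 0L : old) + delta);
    } finally {
      updatesLock.readLock().unlock();
    }
  }

  // Flush "prepare" step: the write lock excludes writers while the active map
  // is swapped for an empty one. Without it, an increment could land in the map
  // that is about to be flushed and discarded, losing the update.
  public Map<String, Long> prepareSnapshot() {
    updatesLock.writeLock().lock();
    try {
      Map<String, Long> snapshot = active;
      active = new HashMap<String, Long>();
      return snapshot;
    } finally {
      updatesLock.writeLock().unlock();
    }
  }
}
{code}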
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839334#comment-13839334 ] Jonathan Hsieh commented on HBASE-10079: Actually, the current tip of 0.96 (HBASE-9485) doesn't seem to have the flush problem, but does have the HTable initialization problem. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Testing 0.96.1rc1. With one process incrementing a row in a table, we increment a single col. We flush or do kills/kill-9 and data is lost. Flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row and single col with various numbers of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25 on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 != 25. Correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878 != 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10079: Attachment: 10079.v1.patch Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839351#comment-13839351 ] Nicolas Liochon commented on HBASE-10079: - That's strange. We should lock, and we don't do it in trunk or 0.96 head... Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9485) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
[ https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9485: -- Fix Version/s: 0.94.15 Status: Open (was: Patch Available) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart -- Key: HBASE-9485 URL: https://issues.apache.org/jira/browse/HBASE-9485 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.94.15, 0.96.2 Attachments: 9485-v2.txt HBase extends OutputCommitter which turns recovery off. Meaning all completed maps are lost on RM restart and job starts from scratch. FileOutputCommitter implements recovery so we should look at that to see what is potentially needed for recovery. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10079: Status: Patch Available (was: Open) Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9485) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
[ https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-9485: -- Attachment: 9485-0.94.txt TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart -- Key: HBASE-9485 URL: https://issues.apache.org/jira/browse/HBASE-9485 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.94.15, 0.96.2 Attachments: 9485-0.94.txt, 9485-v2.txt HBase extends OutputCommitter which turns recovery off. Meaning all completed maps are lost on RM restart and job starts from scratch. FileOutputCommitter implements recovery so we should look at that to see what is potentially needed for recovery. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9485) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
[ https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839362#comment-13839362 ] Ted Yu commented on HBASE-9485: --- Integrated to 0.94 as well. TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart -- Key: HBASE-9485 URL: https://issues.apache.org/jira/browse/HBASE-9485 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.94.15, 0.96.2 Attachments: 9485-0.94.txt, 9485-v2.txt HBase extends OutputCommitter which turns recovery off. Meaning all completed maps are lost on RM restart and job starts from scratch. FileOutputCommitter implements recovery so we should look at that to see what is potentially needed for recovery. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839431#comment-13839431 ] Hadoop QA commented on HBASE-10079: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617058/10079.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8057//console This message is automatically generated. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. 
flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-4163) Create Split Strategy for YCSB Benchmark
[ https://issues.apache.org/jira/browse/HBASE-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839489#comment-13839489 ] Luke Lu commented on HBASE-4163: Tried to figure this out for somebody today, here is an hbase shell one-liner to save some more people's time before the feature is implemented: {code} create 'usertable', 'family', {SPLITS => (1..200).map {|i| "user#{1000+i*(-1000)/200}"}, MAX_FILESIZE => 4*1024**3} {code} Create Split Strategy for YCSB Benchmark Key: HBASE-4163 URL: https://issues.apache.org/jira/browse/HBASE-4163 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.90.3, 0.92.0 Reporter: Nicolas Spiegelberg Assignee: Lars George Priority: Minor Labels: benchmark Talked with Lars about how we can make it easier for users to run the YCSB benchmarks against HBase and get realistic results. Currently, HBase is optimized for the random/uniform read/write case, which is the YCSB load. The initial reason why we perform badly when users test against us is because they do not presplit regions and have the split ratio really low. We need a one-line way for a user to create a table that is pre-split to 200 regions (or some decent number) and by default disable splitting. Realistically, this is how a uniform load cluster should scale, so it's not a hack. This will also give us a good use case to point to for how users should pre-split regions. -- This message was sent by Atlassian JIRA (v6.1#6144)
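Editor's note: the same pre-split can be done from the Java client API. A minimal sketch follows, assuming a table named 'usertable' with one family 'family' and 200 regions; the key layout (user-prefixed keys with a fixed step) is only an illustration of YCSB-style row keys, not a project-provided helper.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitUsertable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("usertable");
    desc.addFamily(new HColumnDescriptor("family"));
    // 199 split keys give 200 regions; the numeric step is illustrative only.
    byte[][] splits = new byte[199][];
    for (int i = 1; i < 200; i++) {
      splits[i - 1] = Bytes.toBytes(String.format("user%04d", 1000 + i * 45));
    }
    admin.createTable(desc, splits);
    admin.close();
  }
}
{code}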
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839491#comment-13839491 ] stack commented on HBASE-10079: --- Patch is good. Nice work Jon. Makes sense this missing lock was exposed by hbase-9963. Pity we didn't catch it in tests previous. Any chance of a test? Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
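Editor's note: for readers following the "missing lock" discussion, here is a rough sketch of the kind of guard involved: increments apply their edits under a read lock while a flush takes the write side, so a flush cannot snapshot the memstore while an increment is half applied. The names below (updatesLock, applyIncrementToMemstore, snapshotMemstore) are hypothetical, not the actual HRegion code or the attached patch.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only; the real logic lives in HRegion.
class IncrementGuardSketch {
  private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

  long increment(byte[] row, byte[] qualifier, long amount) {
    updatesLock.readLock().lock();   // many increments may proceed concurrently
    try {
      return applyIncrementToMemstore(row, qualifier, amount);
    } finally {
      updatesLock.readLock().unlock();
    }
  }

  void flush() {
    updatesLock.writeLock().lock();  // waits for in-flight increments to drain
    try {
      snapshotMemstore();
    } finally {
      updatesLock.writeLock().unlock();
    }
  }

  private long applyIncrementToMemstore(byte[] r, byte[] q, long amt) { return amt; }
  private void snapshotMemstore() { }
}
{code}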
[jira] [Commented] (HBASE-9931) Optional setBatch for CopyTable to copy large rows in batches
[ https://issues.apache.org/jira/browse/HBASE-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839493#comment-13839493 ] stack commented on HBASE-9931: -- +1 Thanks for adding to 0.96. Optional setBatch for CopyTable to copy large rows in batches - Key: HBASE-9931 URL: https://issues.apache.org/jira/browse/HBASE-9931 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Dave Latham Assignee: Nick Dimiduk Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: HBASE-9931.00.patch, HBASE-9931.01.patch We've had CopyTable jobs fail because a small number of rows are wide enough to not fit into memory. If we could specify the batch size for CopyTable scans that shoud be able to break those large rows up into multiple iterations to save the heap. -- This message was sent by Atlassian JIRA (v6.1#6144)
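Editor's note: the batching described above corresponds to Scan.setBatch() on the scan that CopyTable drives. A minimal client-side sketch of the same idea is below; the table and family names are made up for illustration.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedWideRowScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "wide_table");   // hypothetical table name
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("f"));
    // Return at most 1000 columns of a row per Result, so a very wide row
    // arrives as several partial Results instead of one heap-busting one.
    scan.setBatch(1000);
    ResultScanner scanner = table.getScanner(scan);
    for (Result partial : scanner) {
      // each Result may be only a slice of a row when the row is wider than the batch
      System.out.println(partial.size() + " cells");
    }
    scanner.close();
    table.close();
  }
}
{code}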
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839495#comment-13839495 ] Sergey Shelukhin commented on HBASE-10079: -- +1 Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9485) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
[ https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839500#comment-13839500 ] Hudson commented on HBASE-9485: --- SUCCESS: Integrated in hbase-0.96 #213 (See [https://builds.apache.org/job/hbase-0.96/213/]) HBASE-9485 TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart (tedyu: rev 1547802) * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputCommitter.java TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart -- Key: HBASE-9485 URL: https://issues.apache.org/jira/browse/HBASE-9485 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.0, 0.94.15, 0.96.2 Attachments: 9485-0.94.txt, 9485-v2.txt HBase extends OutputCommitter which turns recovery off. Meaning all completed maps are lost on RM restart and job starts from scratch. FileOutputCommitter implements recovery so we should look at that to see what is potentially needed for recovery. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839529#comment-13839529 ] stack commented on HBASE-8763: -- bq. 1) Memstore insert using long.max as the initial write number What will we do if two edits arrive with the same coordinates? How will we distinguish them if both have long.max during the time it takes to sync and convert long.max to a legit seqid? bq. Currently, we maintain an internal queue which might defer the read point bump up if transactions complete order is different than that of MVCC internal write queue. A reason to unify MVCC and WAL seqid (smile). bq. By doing above, it's possible to remove the logics maintaining writeQueue ... We need the writeQueue for performance reasons, right? We need to add edits in bulk under a lock and this lock is expensive to obtain (maybe I am missing something?) bq. ...so it means we can remove two locking and one queue loop in write code path. What are the two locks J? Otherwise, sounds great. Will look at patches... [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10083) [89-fb] Better error handling for the compound bloom filter
Liyin Tang created HBASE-10083: -- Summary: [89-fb] Better error handling for the compound bloom filter Key: HBASE-10083 URL: https://issues.apache.org/jira/browse/HBASE-10083 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Liyin Tang Assignee: Liyin Tang When a RegionServer failed to load a bloom block from HDFS due to a timeout or other reasons, it threw out the exception and disabled the entire bloom filter for this HFile. This behavior does not make too much sense, especially for the compound bloom filter. Instead of disabling the bloom filter for the entire file, it could just return a potentially false positive result (true) and keep the bloom filter available. -- This message was sent by Atlassian JIRA (v6.1#6144)
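Editor's note: a rough sketch of the behavior being proposed, kept deliberately generic since the actual compound bloom filter classes are not shown here: on a failed bloom-block load, fall back to "maybe present" for that single query instead of marking the whole filter broken. All names are hypothetical.
{code}
import java.io.IOException;

// Hypothetical wrapper; real HBase bloom filter classes and method names differ.
class ForgivingBloomCheck {
  interface BloomBlockLoader {
    boolean mightContain(byte[] key) throws IOException;
  }

  private final BloomBlockLoader loader;

  ForgivingBloomCheck(BloomBlockLoader loader) {
    this.loader = loader;
  }

  boolean contains(byte[] key) {
    try {
      return loader.mightContain(key);
    } catch (IOException e) {
      // A transient HDFS error should not disable the filter for the whole HFile.
      // Answer "maybe present"; a false positive is always safe for a bloom filter.
      return true;
    }
  }
}
{code}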
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839554#comment-13839554 ] Jonathan Hsieh commented on HBASE-10079: Here's the dropped threads stack dump -- each one of these corresponds to a thread that didn't run. {code} Exception in thread Thread-58 java.lang.IllegalStateException: test was supposed to be in the cache at org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:337) at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:385) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:165) at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:39) at org.apache.hadoop.hbase.client.HTablePool.createHTable(HTablePool.java:271) at org.apache.hadoop.hbase.client.HTablePool.findOrCreateTable(HTablePool.java:201) at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:180) at IncrementBlaster$1.run(IncrementBlaster.java:131) {code} Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10084) [WINDOWS] bin\hbase.cmd should allow whitespaces in java.library.path and classpath
Enis Soztutar created HBASE-10084: - Summary: [WINDOWS] bin\hbase.cmd should allow whitespaces in java.library.path and classpath Key: HBASE-10084 URL: https://issues.apache.org/jira/browse/HBASE-10084 Project: HBase Issue Type: Bug Components: scripts Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.96.2 In case CLASSPATH or java.library.path from hadoop or HBASE_HOME contains directories with names containing whitespaces, the bin script spits out errors. We can fix the ws handling hopefully once and for all (or not) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839634#comment-13839634 ] Jonathan Hsieh commented on HBASE-10079: I'm having a hard time recreating the jagged counts. I tried reverting patches, and before and after the patch nkeywal provided. I think the flush problem was a red herring where I was biased by the customer problem I was recently working on. When I changed my tests to do 10 increments the pattern I saw really jumped out. Looking at the original numbers from this morning I see the same pattern present with the 25 increments. 80 threads, 25 increments == 3125 increments / thread. count = 246875 != 25 (flush) // one thread failed to start. count = 243750 != 25 (kill) // two threads failed to start. count = 246878 != 25 (kill -9) // one thread failed to start and we had 3 threads that sent increments that succeeded and retried but didn't get an ack because of kill -9). The last one threw me off because it wasn't regular but I think the explanation I have makes sense. I'm looking into seeing if my test code is bad (is there TableName documentation I ignored that says that the race in the stacktrace is my fault) or if we need to add some synchronization to this createTableNameIfNecessary method. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839634#comment-13839634 ] Jonathan Hsieh edited comment on HBASE-10079 at 12/5/13 1:05 AM: - I'm having a hard time recreating the jagged counts. I tried reverting patches, and before and after the patch nkeywal provided. I think the flush problem was a red herring where I was biased by the customer problem I was recently working on. When I changed my tests to do 10 increments the pattern I saw really jumped out. Looking at the original numbers from this morning I see the same pattern present with the 25 increments. 80 threads, 25 increments == 3125 increments / thread. count = 246875 != 25 (flush) // one thread failed to start. count = 243750 != 25 (kill) // two threads failed to start. count = 246878 != 25 (kill -9) // one thread failed to start and we had 3 threads that sent increments that succeeded and retried but didn't get an ack because of kill -9). The last one through me off because it wasn't regular but I think the explanation I have makes sense. I'm looking into seeing if my test code is bad (is there TableName documentation I ignoredthat says that the race in the stacktrace is my fault) or if we need to add some synchronization to this createTableNameIfNecessary method. was (Author: jmhsieh): I'm having a hard time recreating the jagged counts. I tried reverting patches, and before and after the patch nkeywal provided. I think the flush problem was a red herring where I was biased by the customer problem I was recently working on. When I changed my tests to do 10 increments the pattern I saw really jumped out. Looking at the original numbers from this morning I see the same pattern present with the 25 increments. 80 threads, 25 increments == 3125 increments / thread. count = 246875 != 25 (flush) // one thread failed to start. count = 243750 != 25 (kill) // two threads failed to start. count = 246878 != 25 (kill -9) // one thread failed to start and we had 3 threads that sent increments that succeeded and retried but didn't get an ack because of kill -9). The last one through we off because it wasn't regular but I think the explanation I have makes sense. I'm looking into seeing if my test code is bad (is there TableName documentation I ignoredthat says that the race in the stacktrace is my fault) or if we need to add some synchronization to this createTableNameIfNecessary method. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. 
Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839643#comment-13839643 ] Jonathan Hsieh commented on HBASE-10079: Hm.. HBASE-6580 deprecates HTablePool and happened when I wasn't paying attention. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10085) Some regions aren't re-assigned after a mater restarts
Jeffrey Zhong created HBASE-10085: - Summary: Some regions aren't re-assigned after a mater restarts Key: HBASE-10085 URL: https://issues.apache.org/jira/browse/HBASE-10085 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.96.0 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.96.1 We see this issue happened in a cluster restart: 1) when shutdown a cluster, some regions are in offline state because no Region servers are available(stop RS and then Master) 2) When the cluster restarts, the offlined regions are forced to be offline again and SSH skip re-assigning them by function AM.processServerShutdown as shown below. {code} 2013-12-03 10:41:56,686 INFO [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE 2013-12-03 10:41:56,686 DEBUG [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on deadserver; forcing offline ... 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} ... 2013-12-03 10:41:57,223 WARN [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10084) [WINDOWS] bin\hbase.cmd should allow whitespaces in java.library.path and classpath
[ https://issues.apache.org/jira/browse/HBASE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839645#comment-13839645 ] Jean-Marc Spaggiari commented on HBASE-10084: - I don't think this is specific to window$. Even under Linux I think we should allow whitespaces in the different paths. [WINDOWS] bin\hbase.cmd should allow whitespaces in java.library.path and classpath --- Key: HBASE-10084 URL: https://issues.apache.org/jira/browse/HBASE-10084 Project: HBase Issue Type: Bug Components: scripts Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.96.2 In case CLASSPATH or java.library.path from hadoop or HBASE_HOME contains directories with names containing whitespaces, the bin script splits out errors. We can fix the ws handling hopefully once and for all (or not) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839656#comment-13839656 ] Jimmy Xiang commented on HBASE-10085: - Do you see this issue in 0.96.0 or 0.96.1? Some regions aren't re-assigned after a mater restarts -- Key: HBASE-10085 URL: https://issues.apache.org/jira/browse/HBASE-10085 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.96.0 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.96.1 We see this issue happened in a cluster restart: 1) when shutdown a cluster, some regions are in offline state because no Region servers are available(stop RS and then Master) 2) When the cluster restarts, the offlined regions are forced to be offline again and SSH skip re-assigning them by function AM.processServerShutdown as shown below. {code} 2013-12-03 10:41:56,686 INFO [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE 2013-12-03 10:41:56,686 DEBUG [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on deadserver; forcing offline ... 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} ... 2013-12-03 10:41:57,223 WARN [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839662#comment-13839662 ] stack commented on HBASE-8755: -- Here is more review on the patch. Make the changes suggested below and I'll +1 it. (Discussion off-line w/ Feng on this issue helped me better understand this patch and put to rest any notion that there is an easier 'fix' than the one proposed here. That said. There is much room for improvement but this can be done in a follow-on) Remove these asserts rather than comment them out given they depended on a facility this patch removes. Leaving them in will only make the next reader of the code -- very likely lacking the context you have -- feel uneasy thinking someone removed asserts just to get tests to pass. 8 -assertTrue(Should have an outstanding WAL edit, ((FSHLog) log).hasDeferredEntries()); 9 +//assertTrue(Should have an outstanding WAL edit, ((FSHLog) log).hasDeferredEntries()); On the below... +import java.util.Random; ... using a Random for choosing an arbitrary thread for a list of 4 is heavyweight. Can you not take last digit of timestamp or nano timestamp or some attribute of the edit instead? Something more lightweight? Please remove all mentions of AsyncFlush since it no longer exists: // all writes pending on AsyncWrite/AsyncFlush thread with Leaving it in will confuse readers when they can't find any such thread class. Is this comment right? // txid = failedTxid will fail by throwing asyncIOE Should it be = failedTxid? This should be volatile since it is set by AsyncSync and then used by the main FSHLog thread (you have an assert to check it not null -- maybe you ran into an issue here already?): + private IOException asyncIOE = null; bq. + private final Object bufferLock = new Object(); 'bufferLock' if a very generic name. Could it be more descriptive? It is a lock held for a short while while AsyncWriter moves queued edits off the globally seen queue to a local queue just before we send the edits to the WAL. You add a method named getPendingWrites that requires this lock be held. Could we tie the method and the lock together better? Name it pendingWritesLock? (The name of the list to hold the pending writes is pendingWrites). bq. ...because the HDFS write-method is pretty heavyweight as far as locking is concerned. I think the heavyweight referred to in the above is hbase locking, not hdfs locking as the comment would imply. If you agree (you know this code better than I), please adjust the comment. Comments on what these threads do will help the next code reader. AsyncWriter does adding of edits to HDFS. AsyncSyncer needs a comment because it is oxymoronic (though it makes sense in this context). In particular, a comment would draw out why we need so many instances of a syncer thread because everyone's first thought here is going to be why do we need this? Ditto on the AsyncNotifier. In the reviews above, folks have asked why we need this thread at all and a code reader will likely think similar on a first pass. Bottom-line, your patch raised questions from reviewers; it would be cool if the questions were answered in code comments where possible so the questions do not come up again. 4 + private final AsyncWriter asyncWriter; 5 + private final AsyncSyncer[] asyncSyncers = new AsyncSyncer[5]; 6 + private final AsyncNotifier asyncNotifier; You remove the LogSyncer facility in this patch. That is good (need to note this in release notes). 
Your patch should remove the optional flush config from hbase-default.xml too since it no longer is relevant. 3 -this.optionalFlushInterval = 4 - conf.getLong(hbase.regionserver.optionallogflushinterval, 1 * 1000); I see it here... hbase-common/src/main/resources/hbase-default.xml: namehbase.regionserver.optionallogflushinterval/name A small nit is you might look at other threads in hbase and see how they are named... 3 +asyncWriter = new AsyncWriter(AsyncHLogWriter); Ditto here: + asyncSyncers[i] = new AsyncSyncer(AsyncHLogSyncer + i); Probably make the number of asyncsyncers a configuration (you don't have to put the option out in hbase-default.xml.. just make it so that if someone is reading the code and trips over this issue, they can change it by adding to hbase-site.xml w/o having to change code -- lets not reproduce the hard-coded '80' that is in the head of dfsclient we discussed yesterday -- smile). ... and here: asyncNotifier = new AsyncNotifier(AsyncHLogNotifier); Not important but check out how other threads are named in hbase. It might be good if these better align. Maybe make a method for shutting down all these thread or use the Threads#shutdown method in Threads.java? bq. LOG.error(Exception while waiting for AsyncNotifier threads to die, e); Do LOG.error(Exception
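Editor's note: a small sketch tying together three of the review points above — pick a syncer thread from the txid instead of a java.util.Random, keep the shared IOException field volatile, and size the syncer pool from configuration. The names here (asyncSyncers, asyncIOE, the config key) are illustrative, not the patch itself.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

class AsyncSyncerPoolSketch {
  private final Thread[] asyncSyncers;
  private volatile IOException asyncIOE;   // written by syncer threads, read by the log thread

  AsyncSyncerPoolSketch(Configuration conf) {
    // Hypothetical config key: tunable from hbase-site.xml without a code change.
    int n = conf.getInt("hbase.regionserver.hlog.asyncsyncer.count", 5);
    asyncSyncers = new Thread[n];
    for (int i = 0; i < n; i++) {
      asyncSyncers[i] = new Thread(new Runnable() {
        public void run() { /* drain pending syncs */ }
      }, "AsyncHLogSyncer-" + i);
      asyncSyncers[i].setDaemon(true);
      asyncSyncers[i].start();
    }
  }

  // Cheaper than a Random: derive the syncer index from the txid itself.
  Thread syncerFor(long txid) {
    return asyncSyncers[(int) (txid % asyncSyncers.length)];
  }
}
{code}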
[jira] [Updated] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-10085: -- Affects Version/s: (was: 0.96.0) 0.96.1 Some regions aren't re-assigned after a mater restarts -- Key: HBASE-10085 URL: https://issues.apache.org/jira/browse/HBASE-10085 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.96.1 We see this issue happened in a cluster restart: 1) when shutdown a cluster, some regions are in offline state because no Region servers are available(stop RS and then Master) 2) When the cluster restarts, the offlined regions are forced to be offline again and SSH skip re-assigning them by function AM.processServerShutdown as shown below. {code} 2013-12-03 10:41:56,686 INFO [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE 2013-12-03 10:41:56,686 DEBUG [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on deadserver; forcing offline ... 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} ... 2013-12-03 10:41:57,223 WARN [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839667#comment-13839667 ] Sergey Shelukhin commented on HBASE-10085: -- 0.96.1, as of last Tuesday Some regions aren't re-assigned after a mater restarts -- Key: HBASE-10085 URL: https://issues.apache.org/jira/browse/HBASE-10085 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.96.1 We see this issue happened in a cluster restart: 1) when shutdown a cluster, some regions are in offline state because no Region servers are available(stop RS and then Master) 2) When the cluster restarts, the offlined regions are forced to be offline again and SSH skip re-assigning them by function AM.processServerShutdown as shown below. {code} 2013-12-03 10:41:56,686 INFO [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE 2013-12-03 10:41:56,686 DEBUG [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on deadserver; forcing offline ... 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} ... 2013-12-03 10:41:57,223 WARN [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839677#comment-13839677 ] Jonathan Hsieh commented on HBASE-10079: Removed HTablePool code and still got a race. {code} Exception in thread Thread-1 java.lang.IllegalStateException: test was supposed to be in the cache at org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:337) at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:412) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:150) at IncrementBlaster$1.run(IncrementBlaster.java:130) {code} This table cache is the root cause of the race. The testing program has n threads which waits until a rendezvous point before creating independent HTable instances with the same name. It is unreasonable for separate HTable constructors that just so happen to try to open the same table to race like this. Fix should be in the TableName cache. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
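Editor's note: a sketch of the kind of synchronization being suggested for the cache race, using putIfAbsent so concurrent constructors resolve to the same cached instance instead of one of them failing the "was supposed to be in the cache" check. This is illustrative, not the actual TableName implementation, which carries more validation.
{code}
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the TableName cache.
final class TableNameCacheSketch {
  private static final ConcurrentHashMap<String, TableNameCacheSketch> CACHE =
      new ConcurrentHashMap<String, TableNameCacheSketch>();

  private final String name;

  private TableNameCacheSketch(String name) {
    this.name = name;
  }

  static TableNameCacheSketch valueOf(String name) {
    TableNameCacheSketch existing = CACHE.get(name);
    if (existing != null) {
      return existing;
    }
    // Two threads may both reach here; putIfAbsent guarantees both get the same
    // winner rather than one of them observing a half-populated cache.
    TableNameCacheSketch created = new TableNameCacheSketch(name);
    TableNameCacheSketch raced = CACHE.putIfAbsent(name, created);
    return raced != null ? raced : created;
  }

  public String toString() { return name; }
}
{code}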
[jira] [Commented] (HBASE-10017) HRegionPartitioner, rows directed to last partition are wrongly mapped.
[ https://issues.apache.org/jira/browse/HBASE-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839681#comment-13839681 ] Enis Soztutar commented on HBASE-10017: --- bq. I have reproduced data loss during bulk load. This happens under the same conditions as initial bug. 16 regions per table, I think it's not the only case. Again, partitioner wrongly maps last region data and resulting region HFile contains keys that shall not appear there. This partitioner is not intended to be used by bulk load. It is already there in the javadoc. TotalOrderPartitioner should be used instead. If there are changes to regions, LoadIncrementalFiles checks the boundaries (although not sure whether it handles multiple splits to the same range or merges). Other than that, the changes seem ok. However, I think we should get the region boundaries at the start, and treat the range as immutable for the lifetime of the partitioner. Although the table regions might undergo changes, we can at least guarantee a consistent mapping for key ranges. We can do a table.getStartKeys() and do a binary search for the key range considering the special region boundaries (empty start and stop rows). HRegionPartitioner, rows directed to last partition are wrongly mapped. --- Key: HBASE-10017 URL: https://issues.apache.org/jira/browse/HBASE-10017 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.94.6 Reporter: Roman Nikitchenko Priority: Critical Attachments: HBASE-10017-r1544633.patch, HBASE-10017-r1544633.patch, patchSiteOutput.txt Inside HRegionPartitioner class there is getPartition() method which should map first numPartitions regions to appropriate partitions 1:1. But based on condition last region is hashed which could lead to last reducer not having any data. This is considered serious issue. I reproduced this only starting from 16 regions per table. Original defect was found in 0.94.6 but at least today's trunk and 0.91 branch head have the same HRegionPartitioner code in this part which means the same issue. -- This message was sent by Atlassian JIRA (v6.1#6144)
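Editor's note: a minimal sketch of the approach described in the comment above — take the start keys once, treat that snapshot as immutable, and binary-search each row key against it. Class and method names here are illustrative and this is not the attached HRegionPartitioner patch.
{code}
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

class StartKeyPartitionerSketch {
  private final byte[][] startKeys;   // snapshot taken once; treated as immutable afterwards

  StartKeyPartitionerSketch(HTable table) throws IOException {
    this.startKeys = table.getStartKeys();
  }

  int getPartition(byte[] rowKey, int numPartitions) {
    // binarySearch returns (-(insertionPoint) - 1) when there is no exact match;
    // the owning region is the one whose start key immediately precedes the row.
    int idx = Arrays.binarySearch(startKeys, rowKey, Bytes.BYTES_COMPARATOR);
    int region = idx >= 0 ? idx : -(idx + 1) - 1;
    if (region < 0) {
      region = 0;   // row sorts before the first start key (the empty start row)
    }
    // Map regions onto reducers 1:1 up to numPartitions, spill the rest round-robin.
    return region < numPartitions ? region : region % numPartitions;
  }
}
{code}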
[jira] [Updated] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-10085: -- Status: Patch Available (was: Open) Some regions aren't re-assigned after a mater restarts -- Key: HBASE-10085 URL: https://issues.apache.org/jira/browse/HBASE-10085 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.96.1 Attachments: hbase-10085.patch We see this issue happened in a cluster restart: 1) when shutdown a cluster, some regions are in offline state because no Region servers are available(stop RS and then Master) 2) When the cluster restarts, the offlined regions are forced to be offline again and SSH skip re-assigning them by function AM.processServerShutdown as shown below. {code} 2013-12-03 10:41:56,686 INFO [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE 2013-12-03 10:41:56,686 DEBUG [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on deadserver; forcing offline ... 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} ... 2013-12-03 10:41:57,223 WARN [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-10085: -- Attachment: hbase-10085.patch Though we see this issue in the latest 0.96 code, it seems should happen in 0.96.0 code base from the code. Some regions aren't re-assigned after a mater restarts -- Key: HBASE-10085 URL: https://issues.apache.org/jira/browse/HBASE-10085 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.96.1 Attachments: hbase-10085.patch We see this issue happened in a cluster restart: 1) when shutdown a cluster, some regions are in offline state because no Region servers are available(stop RS and then Master) 2) When the cluster restarts, the offlined regions are forced to be offline again and SSH skip re-assigning them by function AM.processServerShutdown as shown below. {code} 2013-12-03 10:41:56,686 INFO [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE 2013-12-03 10:41:56,686 DEBUG [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on deadserver; forcing offline ... 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} ... 2013-12-03 10:41:57,223 WARN [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10079) Increments lost after flush
[ https://issues.apache.org/jira/browse/HBASE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839689#comment-13839689 ] Jonathan Hsieh commented on HBASE-10079: [~nkeywal] HBASE-9976 introduces the TableName cache which is the root cause. Increments lost after flush Key: HBASE-10079 URL: https://issues.apache.org/jira/browse/HBASE-10079 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.1 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.96.1 Attachments: 10079.v1.patch Testing 0.96.1rc1. With one process incrementing a row in a table, we increment single col. We flush or do kills/kill-9 and data is lost. flush and kill are likely the same problem (kill would flush), kill -9 may or may not have the same root cause. 5 nodes hadoop 2.1.0 (a pre cdh5b1 hdfs). hbase 0.96.1 rc1 Test: 25 increments on a single row an single col with various number of client threads (IncrementBlaster). Verify we have a count of 25 after the run (IncrementVerifier). Run 1: No fault injection. 5 runs. count = 25. on multiple runs. Correctness verified. 1638 inc/s throughput. Run 2: flushes table with incrementing row. count = 246875 !=25. correctness failed. 1517 inc/s throughput. Run 3: kill of rs hosting incremented row. count = 243750 != 25. Correctness failed. 1451 inc/s throughput. Run 4: one kill -9 of rs hosting incremented row. 246878.!= 25. Correctness failed. 1395 inc/s (including recovery) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839700#comment-13839700 ] Jimmy Xiang commented on HBASE-10085: - In step 2, do you mean the whole cluster restarts (both master + rs)? Is it easy to add a unit test? Some regions aren't re-assigned after a mater restarts -- Key: HBASE-10085 URL: https://issues.apache.org/jira/browse/HBASE-10085 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.96.1 Attachments: hbase-10085.patch We see this issue happened in a cluster restart: 1) when shutdown a cluster, some regions are in offline state because no Region servers are available(stop RS and then Master) 2) When the cluster restarts, the offlined regions are forced to be offline again and SSH skip re-assigning them by function AM.processServerShutdown as shown below. {code} 2013-12-03 10:41:56,686 INFO [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE 2013-12-03 10:41:56,686 DEBUG [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on deadserver; forcing offline ... 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} ... 2013-12-03 10:41:57,223 WARN [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10017) HRegionPartitioner, rows directed to last partition are wrongly mapped.
[ https://issues.apache.org/jira/browse/HBASE-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839703#comment-13839703 ] Nick Dimiduk commented on HBASE-10017: -- Multiple splits are handled through retrying. Splits are made and the halves rewritten as independent HFiles with each pass, so this should be okay. [~rn] I'm very concerned about the bulkload data loss issue, but I cannot reproduce it using our existing unit tests (TestHRegionServerBulkLoad). Are you able to demonstrate the loss in a test? As [~enis] said, TOP should be used for generating HFiles files. Bulkload itself isn't performed inside a mapreduce job, so I'm confused about how the HRegionPartitioner comes into play in this scenario. HRegionPartitioner, rows directed to last partition are wrongly mapped. --- Key: HBASE-10017 URL: https://issues.apache.org/jira/browse/HBASE-10017 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.94.6 Reporter: Roman Nikitchenko Priority: Critical Attachments: HBASE-10017-r1544633.patch, HBASE-10017-r1544633.patch, patchSiteOutput.txt Inside HRegionPartitioner class there is getPartition() method which should map first numPartitions regions to appropriate partitions 1:1. But based on condition last region is hashed which could lead to last reducer not having any data. This is considered serious issue. I reproduced this only starting from 16 regions per table. Original defect was found in 0.94.6 but at least today's trunk and 0.91 branch head have the same HRegionPartitioner code in this part which means the same issue. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10017) HRegionPartitioner, rows directed to last partition are wrongly mapped.
[ https://issues.apache.org/jira/browse/HBASE-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839702#comment-13839702 ] Enis Soztutar commented on HBASE-10017: --- bq. although not sure whether it handles multiple splits to the same range or merges Nick pointed out that we are actually splitting those files by re-writing those files. I thought that we were creating actual reference files. HRegionPartitioner, rows directed to last partition are wrongly mapped. --- Key: HBASE-10017 URL: https://issues.apache.org/jira/browse/HBASE-10017 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.94.6 Reporter: Roman Nikitchenko Priority: Critical Attachments: HBASE-10017-r1544633.patch, HBASE-10017-r1544633.patch, patchSiteOutput.txt Inside HRegionPartitioner class there is getPartition() method which should map first numPartitions regions to appropriate partitions 1:1. But based on condition last region is hashed which could lead to last reducer not having any data. This is considered serious issue. I reproduced this only starting from 16 regions per table. Original defect was found in 0.94.6 but at least today's trunk and 0.91 branch head have the same HRegionPartitioner code in this part which means the same issue. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839707#comment-13839707 ] Jeffrey Zhong commented on HBASE-10085: --- I mean whole cluster(master + rs). I'll add a unit test to cover this. Thanks. Some regions aren't re-assigned after a mater restarts -- Key: HBASE-10085 URL: https://issues.apache.org/jira/browse/HBASE-10085 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.96.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.96.1 Attachments: hbase-10085.patch We see this issue happened in a cluster restart: 1) when shutdown a cluster, some regions are in offline state because no Region servers are available(stop RS and then Master) 2) When the cluster restarts, the offlined regions are forced to be offline again and SSH skip re-assigning them by function AM.processServerShutdown as shown below. {code} 2013-12-03 10:41:56,686 INFO [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE 2013-12-03 10:41:56,686 DEBUG [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on deadserver; forcing offline ... 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} ... 2013-12-03 10:41:57,223 WARN [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} {code} -- This message was sent by Atlassian JIRA (v6.1#6144)