[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043715#comment-15043715 ] Hadoop QA commented on HBASE-14822: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12775962/14822-v5.txt against master branch at commit 8bf70144e40650ef972f005e2465bd0e2a087c40. ATTACHMENT ID: 12775962 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not generate new checkstyle errors. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 zombies{color}. No zombie tests found running at the end of the build. 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16778//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16778//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16778//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16778//console This message is automatically generated. > Renewing leases of scanners doesn't work > > > Key: HBASE-14822 > URL: https://issues.apache.org/jira/browse/HBASE-14822 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14 >Reporter: Samarth Jain >Assignee: Lars Hofhansl > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, > 14822-v3-0.98.txt, 14822-v4-0.98.txt, 14822-v4.txt, 14822-v5.txt, 14822.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043721#comment-15043721 ] Yu Li commented on HBASE-14004: --- Nice discussion [~yangzhe1991] and [~Apache9]. FWIW, two questions about Phil's proposal: 1. What would the logic be if the durability is set to ASYNC in the table descriptor? Is the following case possible? 1) the entry is written into the memstore 2) the region is reassigned to another RS, so the content in the memstore gets flushed into an hfile 3) the WAL sync/write fails. In this case we might run into another kind of inconsistency, where the master cluster has the data but the slave doesn't. 2. About {{WAL logging idempotent}}, maybe we also need to consider the cross-RS case, when region assignment happens before the WAL sync is acked? > [Replication] Inconsistency between Memstore and WAL may result in data in > remote cluster that is not in the origin > --- > > Key: HBASE-14004 > URL: https://issues.apache.org/jira/browse/HBASE-14004 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: He Liangliang >Priority: Critical > Labels: replication, wal > > Looks like the current write path can cause inconsistency between > memstore/hfile and WAL, which causes the slave cluster to have more data than the > master cluster. > The simplified write path looks like: > 1. insert record into Memstore > 2. write record to WAL > 3. sync WAL > 4. rollback Memstore if 3 fails > It's possible that the HDFS sync RPC call fails, but the data is already > (perhaps partially) transported to the DNs, where it finally gets persisted. As a > result, the handler will roll back the Memstore and the later flushed HFile > will also skip this record. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
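The simplified write path quoted above can be illustrated with a toy simulation. This is only a sketch, not HBase code: `ToyRegion`, its `memstore` map, and the `durableWal` list (standing in for WAL data that reached the DataNodes) are hypothetical names, and the rollback is reduced to removing the key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the simplified write path from the issue description.
// None of these names exist in HBase; they are placeholders for illustration.
class ToyRegion {
    final TreeMap<String, String> memstore = new TreeMap<>();
    // Stands in for WAL data that reached the DataNodes and was persisted.
    final List<String> durableWal = new ArrayList<>();

    /**
     * @param syncRpcFails simulates the HDFS sync RPC reporting failure even
     *                     though the data was already transported to the DNs
     * @return true if the write would be acknowledged to the client
     */
    boolean put(String row, String value, boolean syncRpcFails) {
        memstore.put(row, value);           // 1. insert record into memstore
        durableWal.add(row + "=" + value);  // 2. write record to WAL
        if (syncRpcFails) {                 // 3. sync WAL reports failure
            memstore.remove(row);           // 4. roll back memstore (simplified)
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.put("row1", "a", false);
        region.put("row2", "b", true);
        // The memstore (and any HFile flushed from it) lacks row2, but the
        // WAL, which replication reads, still carries it.
        System.out.println("memstore rows: " + region.memstore.keySet());
        System.out.println("WAL entries:   " + region.durableWal);
    }
}
```

Under these assumptions the second write ends up in the WAL only, which is the situation where the remote cluster could receive a record the origin never serves.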
[jira] [Commented] (HBASE-14930) check_compatibility.sh needs smarter exit codes
[ https://issues.apache.org/jira/browse/HBASE-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043289#comment-15043289 ] Hudson commented on HBASE-14930: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1141 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1141/]) HBASE-14930 check_compatibility.sh needs smarter exit codes (apurtell: rev 87b6d5b2bb67e11b586888ff608a513a52ee43c8) * dev-support/check_compatibility.sh > check_compatibility.sh needs smarter exit codes > --- > > Key: HBASE-14930 > URL: https://issues.apache.org/jira/browse/HBASE-14930 > Project: HBase > Issue Type: Bug >Reporter: Dima Spivak >Assignee: Dima Spivak > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14930_master_v1.patch > > > The check_compatibility.sh tool in dev-support uses the Java API Compliance > Checker to do static analysis of source/binary incompatibilities between two > HBase branches. One problem, though, is that the script has a few instances > where it may return an exit code of 1 (e.g. if Maven steps fail), but this is > the same exit code that the Java ACC tool itself uses to denote that the tool > succeeded but found incompatibilities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
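The exit-code ambiguity described above can be sketched as follows. The real tool is a shell script; this Java enum only illustrates the idea of giving each outcome its own code. Apart from 1 meaning "succeeded but found incompatibilities" (stated in the issue), every name and numeric value here is an assumption, including treating 0 as the checker's "compatible" result.

```java
// Hypothetical exit-code scheme for a compatibility-check wrapper.
// Only INCOMPATIBILITIES_FOUND = 1 comes from the issue text; the other
// values are illustrative assumptions, not what the patch actually uses.
enum CompatExit {
    SUCCESS(0),
    INCOMPATIBILITIES_FOUND(1),
    BUILD_FAILED(2),
    CHECKER_ERROR(3);

    final int code;

    CompatExit(int code) {
        this.code = code;
    }

    // Map the outcome of the build step and the checker's own exit code to a
    // single unambiguous result, so a caller can tell a failed Maven step
    // apart from a successful run that found incompatibilities.
    static CompatExit classify(boolean mavenSucceeded, int accExitCode) {
        if (!mavenSucceeded) {
            return BUILD_FAILED;
        }
        if (accExitCode == 0) {
            return SUCCESS;
        }
        if (accExitCode == 1) {
            return INCOMPATIBILITIES_FOUND;
        }
        return CHECKER_ERROR;
    }
}
```

With a scheme like this, a CI job can treat a build failure as an infrastructure problem and reserve code 1 for a genuine compatibility report.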
[jira] [Commented] (HBASE-14928) Start row should be set for query through HBase REST gateway involving globbing option
[ https://issues.apache.org/jira/browse/HBASE-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043290#comment-15043290 ] Hudson commented on HBASE-14928: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1141 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1141/]) HBASE-14928 Start row should be set for query through HBase REST gateway (tedyu: rev 5ab7ac15180ad4d5acb255715e6389565afd3c4e) * hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/TableResource.java > Start row should be set for query through HBase REST gateway involving > globbing option > -- > > Key: HBASE-14928 > URL: https://issues.apache.org/jira/browse/HBASE-14928 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17 > > Attachments: 14928-v1.txt > > > As Ben Sutton reported in the thread "Slow response on HBase REST api using > globbing option", a query through the REST API with a globbing option, i.e. > http://:/table/key\*, executes extremely slowly. > Jerry He pointed out that PrefixFilter is used for queries involving the globbing > option. > This issue is to fix this bug by setting the start row for such queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
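Why setting a start row matters for a prefix query can be shown with a small simulation. This is a sketch under stated assumptions, not the REST gateway code: a sorted `TreeMap` stands in for the table, `PrefixScanSketch` and its methods are hypothetical, and the filter is assumed to stop once rows sort past the prefix range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy comparison of a filter-only prefix scan and one that also sets a start row.
class PrefixScanSketch {
    static int visited; // rows examined by the most recent scan

    // Filter only: the scan begins at the first row of the table.
    static List<String> scanWithFilterOnly(NavigableMap<String, String> table, String prefix) {
        return scan(table.navigableKeySet(), prefix);
    }

    // Filter plus start row: the scan seeks directly to the prefix.
    static List<String> scanWithStartRow(NavigableMap<String, String> table, String prefix) {
        return scan(table.tailMap(prefix, true).navigableKeySet(), prefix);
    }

    private static List<String> scan(Iterable<String> rows, String prefix) {
        visited = 0;
        List<String> hits = new ArrayList<>();
        for (String row : rows) {
            visited++;
            if (row.startsWith(prefix)) {
                hits.add(row);
            } else if (row.compareTo(prefix) > 0) {
                break; // sorted past the prefix range, nothing more can match
            }
        }
        return hits;
    }

    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("a%03d", i), "v");
        }
        table.put("key1", "v");
        table.put("key2", "v");
        table.put("zzz", "v");
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        System.out.println(scanWithFilterOnly(table, "key") + " visited=" + visited);
        System.out.println(scanWithStartRow(table, "key") + " visited=" + visited);
    }
}
```

Both scans return the same rows; the difference is that the filter-only variant walks every row that sorts before the prefix first, which is consistent with the slowness reported in the thread.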
[jira] [Commented] (HBASE-14923) VerifyReplication should not mask the exception during result comparison
[ https://issues.apache.org/jira/browse/HBASE-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043306#comment-15043306 ] Hudson commented on HBASE-14923: FAILURE: Integrated in HBase-0.98-matrix #268 (See [https://builds.apache.org/job/HBase-0.98-matrix/268/]) HBASE-14923 VerifyReplication should not mask the exception during (apurtell: rev 6309959ea2da3d60d62d6d53106b4efaeb5e530f) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > VerifyReplication should not mask the exception during result comparison > - > > Key: HBASE-14923 > URL: https://issues.apache.org/jira/browse/HBASE-14923 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14923_v1.patch > > > hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > Line:154 > } catch (Exception e) { > logFailRowAndIncreaseCounter(context, > Counters.CONTENT_DIFFERENT_ROWS, value); > } > A LOG.error call just needs to be added to give more information about the failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
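The change requested above amounts to logging the caught exception before counting the row as different. A minimal sketch of that pattern follows; it uses `java.util.logging` so it runs standalone, whereas the real class uses its own `LOG` object, and `CompareSketch`, `compareRow`, and the counter field are hypothetical stand-ins for the mapper code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of "do not mask the exception": log it, then count the row.
// Names are illustrative; this is not the VerifyReplication source.
class CompareSketch {
    private static final Logger LOG = Logger.getLogger(CompareSketch.class.getName());

    static int contentDifferentRows = 0;

    static void compareRow(String rowKey, Runnable comparison) {
        try {
            comparison.run();
        } catch (Exception e) {
            // Without this line the cause of the mismatch is lost and only
            // the counter shows that something went wrong.
            LOG.log(Level.SEVERE, "Exception while comparing row: " + rowKey, e);
            contentDifferentRows++;
        }
    }

    public static void main(String[] args) {
        compareRow("row-1", () -> {
            throw new IllegalStateException("cell count differs");
        });
        System.out.println("CONTENT_DIFFERENT_ROWS=" + contentDifferentRows);
    }
}
```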
[jira] [Commented] (HBASE-14516) categorize hadoop-compat tests
[ https://issues.apache.org/jira/browse/HBASE-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043307#comment-15043307 ] Hudson commented on HBASE-14516: FAILURE: Integrated in HBase-0.98-matrix #268 (See [https://builds.apache.org/job/HBase-0.98-matrix/268/]) HBASE-14516 categorize hadoop-compat tests (busbey: rev 6a5d3f70101ae7bd7f94df2ef75ec60bd511fbff) * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestMetricsReplicationSourceFactory.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/thrift/TestMetricsThriftServerSourceFactory.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/metrics/TestBaseSourceImpl.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionServerSourceImpl.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationMetricsSourceImpl.java * hbase-hadoop-compat/pom.xml * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestMetricsWALSourceImpl.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/metrics/TestBaseSourceImpl.java * hbase-hadoop1-compat/pom.xml * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestMetricsReplicationSourceImpl.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionServerSourceImpl.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/master/TestMetricsMasterSourceImpl.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestMetricsWALSourceImpl.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestMetricsHLogSource.java * 
hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestMetricsReplicationSourceFactoryImpl.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestMetricsReplicationSourceImpl.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/thrift/TestMetricsThriftServerSourceFactoryImpl.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop2-compat/pom.xml * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerMetricsSourceFactoryImpl.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/TestCompatibilitySingletonFactory.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/master/TestMetricsMasterSourceImpl.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/rest/TestRESTMetricsSourceImpl.java * hbase-annotations/src/test/java/org/apache/hadoop/hbase/testclassification/MetricsTests.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionServerSourceFactory.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/rest/TestMetricsRESTSourceImpl.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/master/TestMetricsMasterSourceFactory.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/rest/TestMetricsRESTSource.java > categorize hadoop-compat tests > -- > > Key: HBASE-14516 > URL: https://issues.apache.org/jira/browse/HBASE-14516 > Project: HBase > Issue Type: Task > Components: build, hadoop2, test >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14516.0.98.v2.patch, HBASE-14516.1.patch > > > the hadoop-compat and hadoop2-compat modules do not rely on the hbase > annotations test-jar and their tests aren't categorized. 
> this causes things to fail if you attempt to specify one of our test > categories to run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14926) Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
[ https://issues.apache.org/jira/browse/HBASE-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043305#comment-15043305 ] Hudson commented on HBASE-14926: FAILURE: Integrated in HBase-0.98-matrix #268 (See [https://builds.apache.org/job/HBase-0.98-matrix/268/]) HBASE-14926 Hung ThriftServer; no timeout on read from client; if client (stack: rev e47f396d6a27a48bae6ad2c23208978b6c2439e5) * hbase-examples/README.txt * hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java * hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java * hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/ThriftServer.java > Hung ThriftServer; no timeout on read from client; if client crashes, worker > thread gets stuck reading > -- > > Key: HBASE-14926 > URL: https://issues.apache.org/jira/browse/HBASE-14926 > Project: HBase > Issue Type: Bug > Components: Thrift >Affects Versions: 2.0.0, 1.2.0, 1.1.2, 1.3.0, 1.0.3, 0.98.16 >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.17 > > Attachments: 14926.patch, 14926v2.txt > > > Thrift server is hung. 
All worker threads are doing this: > {code} > "thrift-worker-0" daemon prio=10 tid=0x7f0bb95c2800 nid=0xf6a7 runnable > [0x7f0b956e] >java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:152) > at java.net.SocketInputStream.read(SocketInputStream.java:122) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > - locked <0x00066d859490> (a java.io.BufferedInputStream) > at > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) > at > org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601) > at > org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27) > at > org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289) > at > org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > They never recover. > I don't have client side logs. > We've been here before: HBASE-4967 "connected client thrift sockets should > have a server side read timeout" but this patch only got applied to fb branch > (and thrift has changed since then). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
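The stack trace above shows a worker blocked in `socketRead0` with no timeout. The general remedy, a server-side read timeout, can be sketched with plain `java.net` sockets. This is not the Thrift API and not the committed patch; `ReadTimeoutSketch` and `readTimesOut` are hypothetical names, and the silent client simulates a peer that crashed without closing the connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Demonstrates a server-side read timeout on an accepted connection.
class ReadTimeoutSketch {

    // Returns true if the server-side read gave up with a timeout instead of
    // blocking for as long as the client stays silent.
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket silentClient = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Without this call, read() below would block indefinitely.
            accepted.setSoTimeout(timeoutMillis);
            InputStream in = accepted.getInputStream();
            try {
                in.read(); // the client never sends anything
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the worker thread is free to clean up and move on
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

A timeout value has to be long enough for legitimately idle clients, so in practice it would be a configurable setting rather than a constant.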
[jira] [Commented] (HBASE-14923) VerifyReplication should not mask the exception during result comparison
[ https://issues.apache.org/jira/browse/HBASE-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043287#comment-15043287 ] Hudson commented on HBASE-14923: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1141 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1141/]) HBASE-14923 VerifyReplication should not mask the exception during (apurtell: rev 6309959ea2da3d60d62d6d53106b4efaeb5e530f) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > VerifyReplication should not mask the exception during result comparison > - > > Key: HBASE-14923 > URL: https://issues.apache.org/jira/browse/HBASE-14923 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14923_v1.patch > > > hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > Line:154 > } catch (Exception e) { > logFailRowAndIncreaseCounter(context, > Counters.CONTENT_DIFFERENT_ROWS, value); > } > A LOG.error call just needs to be added to give more information about the failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14516) categorize hadoop-compat tests
[ https://issues.apache.org/jira/browse/HBASE-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043288#comment-15043288 ] Hudson commented on HBASE-14516: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1141 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1141/]) HBASE-14516 categorize hadoop-compat tests (busbey: rev 6a5d3f70101ae7bd7f94df2ef75ec60bd511fbff) * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestMetricsWALSourceImpl.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/master/TestMetricsMasterSourceFactory.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerMetricsSourceFactoryImpl.java * hbase-hadoop2-compat/pom.xml * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/master/TestMetricsMasterSourceImpl.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/thrift/TestMetricsThriftServerSourceFactoryImpl.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestMetricsReplicationSourceFactory.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/rest/TestRESTMetricsSourceImpl.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/master/TestMetricsMasterSourceImpl.java * hbase-hadoop1-compat/pom.xml * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/rest/TestMetricsRESTSourceImpl.java * hbase-annotations/src/test/java/org/apache/hadoop/hbase/testclassification/MetricsTests.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionSourceImpl.java * hbase-hadoop-compat/pom.xml * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationMetricsSourceImpl.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/metrics/TestBaseSourceImpl.java * 
hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionServerSourceFactory.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionServerSourceImpl.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/metrics/TestBaseSourceImpl.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestMetricsWALSourceImpl.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/thrift/TestMetricsThriftServerSourceFactory.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/rest/TestMetricsRESTSource.java * hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestMetricsReplicationSourceImpl.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/TestCompatibilitySingletonFactory.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestMetricsReplicationSourceFactoryImpl.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestMetricsReplicationSourceImpl.java * hbase-hadoop-compat/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestMetricsHLogSource.java * hbase-hadoop2-compat/src/test/java/org/apache/hadoop/hbase/regionserver/TestMetricsRegionServerSourceImpl.java > categorize hadoop-compat tests > -- > > Key: HBASE-14516 > URL: https://issues.apache.org/jira/browse/HBASE-14516 > Project: HBase > Issue Type: Task > Components: build, hadoop2, test >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14516.0.98.v2.patch, HBASE-14516.1.patch > > > the hadoop-compat and hadoop2-compat modules do not rely on the hbase > annotations test-jar and their tests aren't categorized. > this causes things to fail if you attempt to specify one of our test > categories to run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14926) Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
[ https://issues.apache.org/jira/browse/HBASE-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043286#comment-15043286 ] Hudson commented on HBASE-14926: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1141 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1141/]) HBASE-14926 Hung ThriftServer; no timeout on read from client; if client (stack: rev e47f396d6a27a48bae6ad2c23208978b6c2439e5) * hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java * hbase-examples/README.txt * hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/ThriftServer.java * hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java > Hung ThriftServer; no timeout on read from client; if client crashes, worker > thread gets stuck reading > -- > > Key: HBASE-14926 > URL: https://issues.apache.org/jira/browse/HBASE-14926 > Project: HBase > Issue Type: Bug > Components: Thrift >Affects Versions: 2.0.0, 1.2.0, 1.1.2, 1.3.0, 1.0.3, 0.98.16 >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.17 > > Attachments: 14926.patch, 14926v2.txt > > > Thrift server is hung. 
All worker threads are doing this: > {code} > "thrift-worker-0" daemon prio=10 tid=0x7f0bb95c2800 nid=0xf6a7 runnable > [0x7f0b956e] >java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:152) > at java.net.SocketInputStream.read(SocketInputStream.java:122) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > - locked <0x00066d859490> (a java.io.BufferedInputStream) > at > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) > at > org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601) > at > org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27) > at > org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289) > at > org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > They never recover. > I don't have client side logs. > We've been here before: HBASE-4967 "connected client thrift sockets should > have a server side read timeout" but this patch only got applied to fb branch > (and thrift has changed since then). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14928) Start row should be set for query through HBase REST gateway involving globbing option
[ https://issues.apache.org/jira/browse/HBASE-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043309#comment-15043309 ] Hudson commented on HBASE-14928: FAILURE: Integrated in HBase-0.98-matrix #268 (See [https://builds.apache.org/job/HBase-0.98-matrix/268/]) HBASE-14928 Start row should be set for query through HBase REST gateway (tedyu: rev 5ab7ac15180ad4d5acb255715e6389565afd3c4e) * hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/TableResource.java > Start row should be set for query through HBase REST gateway involving > globbing option > -- > > Key: HBASE-14928 > URL: https://issues.apache.org/jira/browse/HBASE-14928 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17 > > Attachments: 14928-v1.txt > > > As Ben Sutton reported in the thread "Slow response on HBase REST api using > globbing option", a query through the REST API with a globbing option, i.e. > http://:/table/key\*, executes extremely slowly. > Jerry He pointed out that PrefixFilter is used for queries involving the globbing > option. > This issue is to fix this bug by setting the start row for such queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14930) check_compatibility.sh needs smarter exit codes
[ https://issues.apache.org/jira/browse/HBASE-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043308#comment-15043308 ] Hudson commented on HBASE-14930: FAILURE: Integrated in HBase-0.98-matrix #268 (See [https://builds.apache.org/job/HBase-0.98-matrix/268/]) HBASE-14930 check_compatibility.sh needs smarter exit codes (apurtell: rev 87b6d5b2bb67e11b586888ff608a513a52ee43c8) * dev-support/check_compatibility.sh > check_compatibility.sh needs smarter exit codes > --- > > Key: HBASE-14930 > URL: https://issues.apache.org/jira/browse/HBASE-14930 > Project: HBase > Issue Type: Bug >Reporter: Dima Spivak >Assignee: Dima Spivak > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14930_master_v1.patch > > > The check_compatibility.sh tool in dev-support uses the Java API Compliance > Checker to do static analysis of source/binary incompatibilities between two > HBase branches. One problem, though, is that the script has a few instances > where it may return an exit code of 1 (e.g. if Maven steps fail), but this is > the same exit code that the Java ACC tool itself uses to denote that the tool > succeeded but found incompatibilities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-1422) Refactor to Server Manager
[ https://issues.apache.org/jira/browse/HBASE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043690#comment-15043690 ] Hudson commented on HBASE-1422: --- FAILURE: Integrated in HBase-Trunk_matrix #535 (See [https://builds.apache.org/job/HBase-Trunk_matrix/535/]) Revert "HBASE-1422 Delayed flush doesn't work causing flush storms; (stack: rev 9fb53d07c418002c8a03be1e7e664e094f304ba5) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Refactor to Server Manager > -- > > Key: HBASE-1422 > URL: https://issues.apache.org/jira/browse/HBASE-1422 > Project: HBase > Issue Type: Sub-task >Affects Versions: 0.19.2 >Reporter: Evgeny Ryabitskiy >Assignee: Evgeny Ryabitskiy > Fix For: 0.90.0 > > Attachments: HBASE-1422.patch, HBASE-1422_v2.patch, > HBASE-1422_v3.patch > > > This is a refactoring of the Server Manager class, split out from HBASE-1017 > I separated it for these reasons: > * It's better to have several small patches and apply them iteratively than one > great patch > * I'm fu..** tired of synchronising w/ SVN (this class changes > frequently); you can see 10 patches in HBASE-1017 > > We need this refactoring for these reasons: > * Server Manager looks like shi**.. bad thing... > * it is harder every time to make any changes > * it is becoming more ugly every time > What changes are done: > ServerManager has mappings: > * serverName 2 serverInfo, > * serverAddr 2 serverInfo, > * serverName 2 load, > * load 2 serverName > 1) serverName 2 load - not necessary if you have serverName 2 serverInfo > 2) All mappings are encapsulated in ServersInfo class (inner class of > ServerManager) > 3) ServersInfo has operations for adding, updating and removing information > of HRS > + some code in RegionServer is put in a synchronised block... because it is > working with a synchronised map... > Note: this task is to make the code much, much clearer.. and it's not going to > change logic, so not much of a problem is going to appear -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14922) Delayed flush doesn't work causing flush storms.
[ https://issues.apache.org/jira/browse/HBASE-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043689#comment-15043689 ] Hudson commented on HBASE-14922: FAILURE: Integrated in HBase-Trunk_matrix #535 (See [https://builds.apache.org/job/HBase-Trunk_matrix/535/]) Revert "Revert "HBASE-14922 Delayed flush doesn't work causing flush (stack: rev 8bf70144e40650ef972f005e2465bd0e2a087c40) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Delayed flush doesn't work causing flush storms. > > > Key: HBASE-14922 > URL: https://issues.apache.org/jira/browse/HBASE-14922 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14922-fix.txt, HBASE-14922-v1.patch, > HBASE-14922-v2.patch, HBASE-14922.patch > > > Starting all regionservers at the same time will mean that most > PeriodicMemstoreFlushers will be running at the same time. So all of these > threads will queue flushes at about the same time. > This was supposed to be mitigated by Delayed. However, that isn't nearly > enough. This results in the flush queues filling up immediately and then draining > every hour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
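The general idea of spreading periodic work with jitter, so that servers started together do not all queue flushes at the same instant, can be sketched as below. This illustrates the technique only; it is not the code in JitterScheduledThreadPoolExecutorImpl, and `JitterSketch`, `jitteredDelay`, and the `spread` parameter are hypothetical.

```java
import java.util.Random;

// Sketch of jittering a periodic delay; names and the spread value are assumptions.
class JitterSketch {

    // Spread a nominal delay uniformly over roughly
    // [delay * (1 - spread), delay * (1 + spread)].
    static long jitteredDelay(long delayMillis, double spread, Random rng) {
        double factor = 1.0 + spread * (2.0 * rng.nextDouble() - 1.0);
        return Math.max(0L, Math.round(delayMillis * factor));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        long hourly = 3_600_000L; // the description mentions an hourly cycle
        for (int server = 0; server < 5; server++) {
            System.out.println("server " + server + " next flush check in "
                + jitteredDelay(hourly, 0.2, rng) + " ms");
        }
    }
}
```

Each server draws its own delay, so even with identical start times the periodic checks drift apart instead of hitting the flush queue together.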
[jira] [Updated] (HBASE-14799) Commons-collections object deserialization remote command execution vulnerability
[ https://issues.apache.org/jira/browse/HBASE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14799: --- Attachment: 14799-0.98.addendum 0.98 builds were failing due to missing test category in TestHbaseObjectWritableFor96Migration.java Committed the addendum to unblock the build > Commons-collections object deserialization remote command execution > vulnerability > -- > > Key: HBASE-14799 > URL: https://issues.apache.org/jira/browse/HBASE-14799 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Critical > Fix For: 2.0.0, 0.94.28, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14799-0.98.addendum, HBASE-14799-0.94.patch, > HBASE-14799-0.94.patch, HBASE-14799-0.94.patch, HBASE-14799-0.94.patch, > HBASE-14799-0.94.patch, HBASE-14799-0.98.patch, HBASE-14799-0.98.patch, > HBASE-14799-0.98.patch, HBASE-14799.patch, HBASE-14799.patch > > > Read: > http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/ > TL;DR: If you have commons-collections on your classpath and accept and > process Java object serialization data, then you probably have an exploitable > remote command execution vulnerability. > 0.94 and earlier HBase releases are vulnerable because we might read in and > rehydrate serialized Java objects out of RPC packet data in > HbaseObjectWritable using ObjectInputStream#readObject (see > https://hbase.apache.org/0.94/xref/org/apache/hadoop/hbase/io/HbaseObjectWritable.html#714) > and we have commons-collections on the classpath on the server. > 0.98 also carries some limited exposure to this problem through inclusion of > backwards compatible deserialization code in > HbaseObjectWritableFor96Migration. This is used by the 0.94-to-0.98 migration > utility, and by the AccessController when reading permissions from the ACL > table serialized in legacy format by 0.94. 
Unprivileged users cannot run the > tool nor access the ACL table. > Unprivileged users can however attack a 0.94 installation. An attacker might > be able to use the method discussed on that blog post to capture valid HBase > RPC payloads for 0.94 and prior versions, rewrite them to embed an exploit, > and replay them to trigger a remote command execution with the privileges of > the account under which the HBase RegionServer daemon is running. > We need to make a patch release of 0.94 that changes HbaseObjectWritable to > disallow processing of random Java object serializations. This will be a > compatibility break that might affect old style coprocessors, which quite > possibly may rely on this catch-all in HbaseObjectWritable for custom object > (de)serialization. We can introduce a new configuration setting, > "hbase.allow.legacy.object.serialization", defaulting to false. > To be thorough, we can also use the new configuration setting > "hbase.allow.legacy.object.serialization" (defaulting to false) in 0.98 to > prevent the AccessController from falling back to the vulnerable legacy code. > This turns out to not affect the ability to migrate permissions because > TablePermission implements Writable, which is safe, not Serializable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
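The configuration gate proposed in the description can be sketched roughly as follows. This is a minimal illustration, not the HbaseObjectWritable change from the attached patches: the class `LegacySerializationGuard` and its methods are hypothetical, and `java.util.Properties` stands in for an HBase configuration object. Only the setting name and its default of false come from the issue text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Properties;

/** Illustrative only: not the HbaseObjectWritable code from the patch. */
final class LegacySerializationGuard {
  // Setting name taken from the issue description.
  static final String ALLOW_LEGACY_KEY = "hbase.allow.legacy.object.serialization";

  private final boolean allowLegacy;

  // java.util.Properties stands in for an HBase configuration here (assumption).
  LegacySerializationGuard(Properties conf) {
    // Default to false, as the issue proposes.
    this.allowLegacy = Boolean.parseBoolean(conf.getProperty(ALLOW_LEGACY_KEY, "false"));
  }

  /** Refuses to call readObject unless the operator has opted in. */
  Object readLegacyObject(byte[] data) throws IOException, ClassNotFoundException {
    if (!allowLegacy) {
      throw new IOException("Java object deserialization is disabled; set "
          + ALLOW_LEGACY_KEY + "=true to allow it");
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  /** Helper for trying the guard out: plain Java serialization of a value. */
  static byte[] serialize(Object value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    return bytes.toByteArray();
  }
}
```

Note that turning the setting on restores the vulnerable behavior; the guard only makes the exposure an explicit choice rather than the default.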
[jira] [Commented] (HBASE-14916) Add checkstyle_report.py to other branches
[ https://issues.apache.org/jira/browse/HBASE-14916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043401#comment-15043401 ] Appy commented on HBASE-14916: -- When I applied v3 locally, I used `git am`, which created the file with mode 755. But the QA run on v3 then failed with the same permission issue. Looking at the console output, I see that the script uses `patch -p1` instead. Trying that locally, it does indeed create the new file with mode 644. `man patch` has a caveat about permissions, so I don't know how to get around that. We definitely don't want to change `patch -p1` to `git am`. I believe file permissions are one-off cases. One simple approach would be to use `git am` and run test-patch.sh locally to test the checkstyle part manually (since the QA output for the other tests is fine), and then verify the permissions manually before finally pushing to the repo. > Add checkstyle_report.py to other branches > -- > > Key: HBASE-14916 > URL: https://issues.apache.org/jira/browse/HBASE-14916 > Project: HBase > Issue Type: Bug >Reporter: Appy >Assignee: Appy > Attachments: HBASE-14916-branch-1-v2.patch, > HBASE-14916-branch-1-v3.patch, HBASE-14916-branch-1.patch > > > Given test-patch.sh is always run from master, and that it now uses > checkstyle_report.py, we should pull back the script to other branches too. > Otherwise we see error like: > /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/jenkins.build/dev-support/test-patch.sh: > line 662: > /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/dev-support/checkstyle_report.py: > No such file or directory > [reference|https://builds.apache.org/job/PreCommit-HBASE-Build/16734//consoleFull] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-7171) Initial web UI for region/memstore/storefiles details
[ https://issues.apache.org/jira/browse/HBASE-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043384#comment-15043384 ] stack commented on HBASE-7171: -- Would getting the info as JSON be a crazy ask? We do that in a few places currently. No harm if it comes later. > Initial web UI for region/memstore/storefiles details > - > > Key: HBASE-7171 > URL: https://issues.apache.org/jira/browse/HBASE-7171 > Project: HBase > Issue Type: Improvement > Components: UI >Reporter: stack >Assignee: Mikhail Antonov > Labels: beginner > Attachments: HBASE-7171.patch, region_details.png, region_list.png, > storefile_details.png > > > Click on a region in UI and get a listing of hfiles in HDFS and summary of > memstore content; click on an HFile and see its content -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-7171) Initial web UI for region/memstore/storefiles details
[ https://issues.apache.org/jira/browse/HBASE-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043383#comment-15043383 ] stack commented on HBASE-7171: -- Beautiful! > Initial web UI for region/memstore/storefiles details > - > > Key: HBASE-7171 > URL: https://issues.apache.org/jira/browse/HBASE-7171 > Project: HBase > Issue Type: Improvement > Components: UI >Reporter: stack >Assignee: Mikhail Antonov > Labels: beginner > Attachments: HBASE-7171.patch, region_details.png, region_list.png, > storefile_details.png > > > Click on a region in UI and get a listing of hfiles in HDFS and summary of > memstore content; click on an HFile and see its content -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14916) Add checkstyle_report.py to other branches
[ https://issues.apache.org/jira/browse/HBASE-14916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043382#comment-15043382 ] stack commented on HBASE-14916: --- How are you testing it locally, [~appy]? Dump the steps in here and I'll try the same... > Add checkstyle_report.py to other branches > -- > > Key: HBASE-14916 > URL: https://issues.apache.org/jira/browse/HBASE-14916 > Project: HBase > Issue Type: Bug >Reporter: Appy >Assignee: Appy > Attachments: HBASE-14916-branch-1-v2.patch, > HBASE-14916-branch-1-v3.patch, HBASE-14916-branch-1.patch > > > Given test-patch.sh is always run from master, and that it now uses > checkstyle_report.py, we should pull back the script to other branches too. > Otherwise we see error like: > /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/jenkins.build/dev-support/test-patch.sh: > line 662: > /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/dev-support/checkstyle_report.py: > No such file or directory > [reference|https://builds.apache.org/job/PreCommit-HBASE-Build/16734//consoleFull] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14923) VerifyReplication should not mask the exception during result comparison
[ https://issues.apache.org/jira/browse/HBASE-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043385#comment-15043385 ] Hudson commented on HBASE-14923: SUCCESS: Integrated in HBase-1.0 #1120 (See [https://builds.apache.org/job/HBase-1.0/1120/]) HBASE-14923 VerifyReplication should not mask the exception during (apurtell: rev 13335ffe02fb357810551d7c5c81b44ca2b5137d) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > VerifyReplication should not mask the exception during result comparison > - > > Key: HBASE-14923 > URL: https://issues.apache.org/jira/browse/HBASE-14923 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14923_v1.patch > > > hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > Line:154 > } catch (Exception e) { > logFailRowAndIncreaseCounter(context, > Counters.CONTENT_DIFFERENT_ROWS, value); > } > Just LOG.error needs to be added for more information for the failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
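The change requested in the description (log the exception instead of only counting the row) might look something like the sketch below. The class `RowComparison` and its fields are hypothetical stand-ins, and `java.util.logging` is used only to keep the example self-contained; the real VerifyReplication code uses its own `LOG` and counters.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative stand-in for the comparison step; all names are hypothetical. */
final class RowComparison {
  private static final Logger LOG = Logger.getLogger(RowComparison.class.getName());

  private long contentDifferentRows = 0;

  /** Compares two values; a failure is counted and also logged with its cause. */
  void compare(String rowKey, String sourceValue, String peerValue) {
    try {
      if (!sourceValue.equals(peerValue)) {
        throw new IllegalStateException("value mismatch");
      }
    } catch (Exception e) {
      // Keep the exception visible instead of only bumping the counter.
      LOG.log(Level.SEVERE, "Exception while comparing row " + rowKey, e);
      contentDifferentRows++;
    }
  }

  long getContentDifferentRows() {
    return contentDifferentRows;
  }
}
```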
[jira] [Commented] (HBASE-14930) check_compatibility.sh needs smarter exit codes
[ https://issues.apache.org/jira/browse/HBASE-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043386#comment-15043386 ] Hudson commented on HBASE-14930: SUCCESS: Integrated in HBase-1.0 #1120 (See [https://builds.apache.org/job/HBase-1.0/1120/]) HBASE-14930 check_compatibility.sh needs smarter exit codes (apurtell: rev 41f107b0f06b0ae3680014551be9e295c735871f) * dev-support/check_compatibility.sh > check_compatibility.sh needs smarter exit codes > --- > > Key: HBASE-14930 > URL: https://issues.apache.org/jira/browse/HBASE-14930 > Project: HBase > Issue Type: Bug >Reporter: Dima Spivak >Assignee: Dima Spivak > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14930_master_v1.patch > > > The check_compatibility.sh tool in dev_support uses the Java API Compliance > Checker to do static analysis of source/binary incompatibilties between two > HBase branches. One problem, though, is that the script has a few instances > where it may return an exit code of 1 (e.g. if Maven steps fail), but this is > the same exit code that the Java ACC tool itself uses to denote that the tool > succeeded, but found incompatibilities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
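The ambiguity described above comes down to two different outcomes sharing exit code 1. One way to separate them, sketched here in Java purely for illustration (the actual tool is a shell script, and the value 2 below is an assumption rather than what the patch uses), is to reserve a distinct code for failures of the script's own steps:

```java
/** Hypothetical exit-code mapping; only the meaning of 1 comes from the issue text. */
final class CompatCheckExit {
  static final int OK = 0;
  // Per the issue, the Java ACC tool uses 1 for "ran fine, found incompatibilities".
  static final int INCOMPATIBILITIES_FOUND = 1;
  // Assumed value: pick any code the checker itself never returns.
  static final int SCRIPT_FAILURE = 2;

  /** Maps the outcome of the build steps and the checker's own exit code to one code. */
  static int exitCode(boolean buildStepsSucceeded, int accExitCode) {
    if (!buildStepsSucceeded) {
      return SCRIPT_FAILURE; // e.g. a Maven step failed before the checker ran
    }
    return accExitCode; // pass the checker's verdict through unchanged
  }
}
```

A caller can then treat 1 as "incompatibilities found" without confusing it with a broken build.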
[jira] [Commented] (HBASE-14922) Delayed flush doesn't work causing flush storms.
[ https://issues.apache.org/jira/browse/HBASE-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043484#comment-15043484 ] stack commented on HBASE-14922: --- I pushed the [~lhofhansl] fix because the cited exception is causing tests to fail, ruining our clean build track record. https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.2/lastCompletedBuild/jdk=latest1.7,label=Hadoop/testReport/org.apache.hadoop.hbase.procedure/TestProcedureManager/org_apache_hadoop_hbase_procedure_TestProcedureManager/ Fix looks good to me. If it is not OK with you, [~eclark], we can reevaluate later... committing to get blue builds back again. > Delayed flush doesn't work causing flush storms. > > > Key: HBASE-14922 > URL: https://issues.apache.org/jira/browse/HBASE-14922 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14922-fix.txt, HBASE-14922-v1.patch, > HBASE-14922-v2.patch, HBASE-14922.patch > > > Starting all regionservers at the same time will mean that most > PeriodicMemstoreFlusher's will be running at the same time. So all of these > threads will queue flushes at about the same time. > This was supposed to be mitigated by Delayed. However that isn't nearly > enough. This results in the immediate filling up and then draining of the > flush queues every hour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14922) Delayed flush doesn't work causing flush storms.
[ https://issues.apache.org/jira/browse/HBASE-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043485#comment-15043485 ] stack commented on HBASE-14922: --- Oh, committed to 1.2+ > Delayed flush doesn't work causing flush storms. > > > Key: HBASE-14922 > URL: https://issues.apache.org/jira/browse/HBASE-14922 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14922-fix.txt, HBASE-14922-v1.patch, > HBASE-14922-v2.patch, HBASE-14922.patch > > > Starting all regionservers at the same time will mean that most > PeriodicMemstoreFlusher's will be running at the same time. So all of these > threads will queue flushes at about the same time. > This was supposed to be mitigated by Delayed. However that isn't nearly > enough. This results in the immediate filling up and then draining of the > flush queues every hour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
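The general idea behind spreading out periodic flushes is to add a random offset to each delay so that regionservers started together drift apart. The following is a minimal sketch of that technique under stated assumptions; it is not the JitterScheduledThreadPoolExecutorImpl code touched by the commits, and the class `FlushJitter` and its `spread` parameter are hypothetical.

```java
import java.util.Random;

/** Illustrative jitter helper; not the JitterScheduledThreadPoolExecutorImpl code. */
final class FlushJitter {
  /**
   * Returns baseDelayMs shifted by a random amount within +/- spread * baseDelayMs.
   * spread is a fraction between 0 and 1 (hypothetical parameter).
   */
  static long jitteredDelayMs(long baseDelayMs, double spread, Random random) {
    if (baseDelayMs < 0 || spread < 0.0 || spread > 1.0) {
      throw new IllegalArgumentException("baseDelayMs must be >= 0 and spread in [0, 1]");
    }
    // nextDouble() is in [0, 1), so the factor below is in [-1, 1).
    double offset = (random.nextDouble() * 2.0 - 1.0) * spread * baseDelayMs;
    return Math.max(0L, baseDelayMs + Math.round(offset));
  }
}
```

With a one-hour base delay and a spread of 0.2, for example, each flusher would fire somewhere between 48 and 72 minutes after the previous run instead of all firing on the hour.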
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043246#comment-15043246 ] Duo Zhang commented on HBASE-14004: --- Sounds great, an incremental unique id of WAL entry is better since it is managed by ourselves. > [Replication] Inconsistency between Memstore and WAL may result in data in > remote cluster that is not in the origin > --- > > Key: HBASE-14004 > URL: https://issues.apache.org/jira/browse/HBASE-14004 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: He Liangliang >Priority: Critical > Labels: replication, wal > > Looks like the current write path can cause inconsistency between > memstore/hfile and WAL which cause the slave cluster has more data than the > master cluster. > The simplified write path looks like: > 1. insert record into Memstore > 2. write record to WAL > 3. sync WAL > 4. rollback Memstore if 3 fails > It's possible that the HDFS sync RPC call fails, but the data is already > (may partially) transported to the DNs which finally get persisted. As a > result, the handler will rollback the Memstore and the later flushed HFile > will also skip this record. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
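The four-step write path in the description, and the way a sync failure can leave the WAL ahead of the Memstore, can be modeled with a toy example. Everything here is a simplification for illustration: `WritePathModel`, its lists, and the `syncReportsFailure` flag are hypothetical and do not correspond to HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the four-step write path; all names are hypothetical. */
final class WritePathModel {
  final List<String> memstore = new ArrayList<>();
  // Entries that actually reached durable storage (what replication would read).
  final List<String> persistedWal = new ArrayList<>();

  /**
   * @param syncReportsFailure simulates a sync RPC that fails on the client side
   *                           even though the DataNodes already persisted the bytes
   * @return true if the write was acknowledged to the caller
   */
  boolean put(String record, boolean syncReportsFailure) {
    memstore.add(record);                  // 1. insert record into Memstore
    persistedWal.add(record);              // 2. write record to WAL (data reaches the DNs)
    boolean synced = !syncReportsFailure;  // 3. sync WAL
    if (!synced) {
      memstore.remove(record);             // 4. rollback Memstore if 3 fails
      return false;
    }
    return true;
  }
}
```

After `put("row1", false)` and `put("row2", true)`, the Memstore holds only `row1` while the persisted WAL holds both records, so anything that reads the WAL (such as replication) sees `row2` even though the origin never acknowledged it.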
[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042795#comment-15042795 ] Hadoop QA commented on HBASE-14822: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12775922/14822-v4.txt against master branch at commit 80afb839ec57bccc54c4777be0b9cfb8fb71df63. ATTACHMENT ID: 12775922 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated new checkstyle errors. Check build console for list of new errors. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + new java.lang.String[] { "Region", "Scan", "ScannerId", "NumberOfRows", "CloseScanner", "NextCallSeq", "ClientHandlesPartials", "ClientHandlesHeartbeats", "TrackScanMetrics", "Renew", }); + TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD, leaseTimeout); {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 zombies{color}. 
No zombie tests found running at the end of the build. Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16776//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16776//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16776//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16776//console This message is automatically generated. > Renewing leases of scanners doesn't work > > > Key: HBASE-14822 > URL: https://issues.apache.org/jira/browse/HBASE-14822 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14 >Reporter: Samarth Jain >Assignee: Lars Hofhansl > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, > 14822-v3-0.98.txt, 14822-v4-0.98.txt, 14822-v4.txt, 14822.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14919) Infrastructure refactoring
[ https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043518#comment-15043518 ] Hadoop QA commented on HBASE-14919: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12775947/HBASE-14919-V01.patch against master branch at commit b1462679e17f9b5827720f3c57eaeff946cfea0e. ATTACHMENT ID: 12775947 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 29 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16777//console This message is automatically generated. > Infrastructure refactoring > -- > > Key: HBASE-14919 > URL: https://issues.apache.org/jira/browse/HBASE-14919 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch > > > Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as > first-class citizen and decoupling memstore scanner from the memstore > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-1422) Refactor to Server Manager
[ https://issues.apache.org/jira/browse/HBASE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043556#comment-15043556 ] Hudson commented on HBASE-1422: --- FAILURE: Integrated in HBase-1.2 #426 (See [https://builds.apache.org/job/HBase-1.2/426/]) HBASE-1422 Delayed flush doesn't work causing flush storms; addendum by (stack: rev 55422749a54bac6b8177ed29be50e856dd623503) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Refactor to Server Manager > -- > > Key: HBASE-1422 > URL: https://issues.apache.org/jira/browse/HBASE-1422 > Project: HBase > Issue Type: Sub-task >Affects Versions: 0.19.2 >Reporter: Evgeny Ryabitskiy >Assignee: Evgeny Ryabitskiy > Fix For: 0.90.0 > > Attachments: HBASE-1422.patch, HBASE-1422_v2.patch, > HBASE-1422_v3.patch > > > This is refactor to Server Manager class from HBASE-1017 > I separate it for reasons: > * Its better to have several small patchs and apply them iterativly then one > great path > * I fu..** tired from synchronising w/ SVN (this class changes > frequently), you can saw 10 patches in HBASE-1017 > > We need this refactoing for reasons: > * Server Manager looks like shi**.. bad thing... > * is every time harder to make any chnages > * it is becoming more ugly every time > What changes are done: > ServerManager has mapping: > * serverName 2 serverInfo, > * serverAddr 2 serverInfo, > * serverName 2 load, > * load 2 severName > 1) serverName 2 load - not necessary if you have serverName 2 serverInfo > 2) All mappings are encapsulated in ServersInfo class (inner class of > ServerManager) > 3) ServersInfo has operations for adding, updating and removing information > of HRS > + some code in RegionServer is puted in synchronised block... cause it is > working with synchronised map... > Note: this task is to make code much much more clear.. 
and it's not going to > change logic, so no much problem is going appear -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14922) Delayed flush doesn't work causing flush storms.
[ https://issues.apache.org/jira/browse/HBASE-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043572#comment-15043572 ] Hudson commented on HBASE-14922: SUCCESS: Integrated in HBase-1.2-IT #328 (See [https://builds.apache.org/job/HBase-1.2-IT/328/]) Revert "Revert "HBASE-14922 Delayed flush doesn't work causing flush (stack: rev 1bcb2c66aed24105c02300baf8bc431f81c9e76e) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Delayed flush doesn't work causing flush storms. > > > Key: HBASE-14922 > URL: https://issues.apache.org/jira/browse/HBASE-14922 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14922-fix.txt, HBASE-14922-v1.patch, > HBASE-14922-v2.patch, HBASE-14922.patch > > > Starting all regionservers at the same time will mean that most > PeriodicMemstoreFlusher's will be running at the same time. So all of these > threads will queue flushes at about the same time. > This was supposed to be mitigated by Delayed. However that isn't nearly > enough. This results in the immediate filling up and then draining of the > flush queues every hour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-1422) Refactor to Server Manager
[ https://issues.apache.org/jira/browse/HBASE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043573#comment-15043573 ] Hudson commented on HBASE-1422: --- SUCCESS: Integrated in HBase-1.2-IT #328 (See [https://builds.apache.org/job/HBase-1.2-IT/328/]) HBASE-1422 Delayed flush doesn't work causing flush storms; addendum by (stack: rev 55422749a54bac6b8177ed29be50e856dd623503) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java Revert "HBASE-1422 Delayed flush doesn't work causing flush storms; (stack: rev 1633db4747a32e02af7b7df9a316adcbeeebd92d) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Refactor to Server Manager > -- > > Key: HBASE-1422 > URL: https://issues.apache.org/jira/browse/HBASE-1422 > Project: HBase > Issue Type: Sub-task >Affects Versions: 0.19.2 >Reporter: Evgeny Ryabitskiy >Assignee: Evgeny Ryabitskiy > Fix For: 0.90.0 > > Attachments: HBASE-1422.patch, HBASE-1422_v2.patch, > HBASE-1422_v3.patch > > > This is refactor to Server Manager class from HBASE-1017 > I separate it for reasons: > * Its better to have several small patchs and apply them iterativly then one > great path > * I fu..** tired from synchronising w/ SVN (this class changes > frequently), you can saw 10 patches in HBASE-1017 > > We need this refactoing for reasons: > * Server Manager looks like shi**.. bad thing... 
> * is every time harder to make any chnages > * it is becoming more ugly every time > What changes are done: > ServerManager has mapping: > * serverName 2 serverInfo, > * serverAddr 2 serverInfo, > * serverName 2 load, > * load 2 severName > 1) serverName 2 load - not necessary if you have serverName 2 serverInfo > 2) All mappings are encapsulated in ServersInfo class (inner class of > ServerManager) > 3) ServersInfo has operations for adding, updating and removing information > of HRS > + some code in RegionServer is puted in synchronised block... cause it is > working with synchronised map... > Note: this task is to make code much much more clear.. and it's not going to > change logic, so no much problem is going appear -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14799) Commons-collections object deserialization remote command execution vulnerability
[ https://issues.apache.org/jira/browse/HBASE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043593#comment-15043593 ] Hudson commented on HBASE-14799: FAILURE: Integrated in HBase-0.98-matrix #269 (See [https://builds.apache.org/job/HBase-0.98-matrix/269/]) HBASE-14799 Adds test category to (tedyu: rev d95d345798faaff893f3c032f7c9c9558d025bb7) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestHbaseObjectWritableFor96Migration.java > Commons-collections object deserialization remote command execution > vulnerability > -- > > Key: HBASE-14799 > URL: https://issues.apache.org/jira/browse/HBASE-14799 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Critical > Fix For: 2.0.0, 0.94.28, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14799-0.98.addendum, HBASE-14799-0.94.patch, > HBASE-14799-0.94.patch, HBASE-14799-0.94.patch, HBASE-14799-0.94.patch, > HBASE-14799-0.94.patch, HBASE-14799-0.98.patch, HBASE-14799-0.98.patch, > HBASE-14799-0.98.patch, HBASE-14799.patch, HBASE-14799.patch > > > Read: > http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/ > TL;DR: If you have commons-collections on your classpath and accept and > process Java object serialization data, then you probably have an exploitable > remote command execution vulnerability. > 0.94 and earlier HBase releases are vulnerable because we might read in and > rehydrate serialized Java objects out of RPC packet data in > HbaseObjectWritable using ObjectInputStream#readObject (see > https://hbase.apache.org/0.94/xref/org/apache/hadoop/hbase/io/HbaseObjectWritable.html#714) > and we have commons-collections on the classpath on the server. > 0.98 also carries some limited exposure to this problem through inclusion of > backwards compatible deserialization code in > HbaseObjectWritableFor96Migration. 
This is used by the 0.94-to-0.98 migration > utility, and by the AccessController when reading permissions from the ACL > table serialized in legacy format by 0.94. Unprivileged users cannot run the > tool nor access the ACL table. > Unprivileged users can however attack a 0.94 installation. An attacker might > be able to use the method discussed on that blog post to capture valid HBase > RPC payloads for 0.94 and prior versions, rewrite them to embed an exploit, > and replay them to trigger a remote command execution with the privileges of > the account under which the HBase RegionServer daemon is running. > We need to make a patch release of 0.94 that changes HbaseObjectWritable to > disallow processing of random Java object serializations. This will be a > compatibility break that might affect old style coprocessors, which quite > possibly may rely on this catch-all in HbaseObjectWritable for custom object > (de)serialization. We can introduce a new configuration setting, > "hbase.allow.legacy.object.serialization", defaulting to false. > To be thorough, we can also use the new configuration setting > "hbase.allow.legacy.object.serialization" (defaulting to false) in 0.98 to > prevent the AccessController from falling back to the vulnerable legacy code. > This turns out to not affect the ability to migrate permissions because > TablePermission implements Writable, which is safe, not Serializable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14799) Commons-collections object deserialization remote command execution vulnerability
[ https://issues.apache.org/jira/browse/HBASE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043597#comment-15043597 ] Hudson commented on HBASE-14799: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1142 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1142/]) HBASE-14799 Adds test category to (tedyu: rev d95d345798faaff893f3c032f7c9c9558d025bb7) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestHbaseObjectWritableFor96Migration.java > Commons-collections object deserialization remote command execution > vulnerability > -- > > Key: HBASE-14799 > URL: https://issues.apache.org/jira/browse/HBASE-14799 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Critical > Fix For: 2.0.0, 0.94.28, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14799-0.98.addendum, HBASE-14799-0.94.patch, > HBASE-14799-0.94.patch, HBASE-14799-0.94.patch, HBASE-14799-0.94.patch, > HBASE-14799-0.94.patch, HBASE-14799-0.98.patch, HBASE-14799-0.98.patch, > HBASE-14799-0.98.patch, HBASE-14799.patch, HBASE-14799.patch > > > Read: > http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/ > TL;DR: If you have commons-collections on your classpath and accept and > process Java object serialization data, then you probably have an exploitable > remote command execution vulnerability. > 0.94 and earlier HBase releases are vulnerable because we might read in and > rehydrate serialized Java objects out of RPC packet data in > HbaseObjectWritable using ObjectInputStream#readObject (see > https://hbase.apache.org/0.94/xref/org/apache/hadoop/hbase/io/HbaseObjectWritable.html#714) > and we have commons-collections on the classpath on the server. > 0.98 also carries some limited exposure to this problem through inclusion of > backwards compatible deserialization code in > HbaseObjectWritableFor96Migration. 
This is used by the 0.94-to-0.98 migration > utility, and by the AccessController when reading permissions from the ACL > table serialized in legacy format by 0.94. Unprivileged users cannot run the > tool nor access the ACL table. > Unprivileged users can however attack a 0.94 installation. An attacker might > be able to use the method discussed on that blog post to capture valid HBase > RPC payloads for 0.94 and prior versions, rewrite them to embed an exploit, > and replay them to trigger a remote command execution with the privileges of > the account under which the HBase RegionServer daemon is running. > We need to make a patch release of 0.94 that changes HbaseObjectWritable to > disallow processing of random Java object serializations. This will be a > compatibility break that might affect old style coprocessors, which quite > possibly may rely on this catch-all in HbaseObjectWritable for custom object > (de)serialization. We can introduce a new configuration setting, > "hbase.allow.legacy.object.serialization", defaulting to false. > To be thorough, we can also use the new configuration setting > "hbase.allow.legacy.object.serialization" (defaulting to false) in 0.98 to > prevent the AccessController from falling back to the vulnerable legacy code. > This turns out to not affect the ability to migrate permissions because > TablePermission implements Writable, which is safe, not Serializable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14932) bulkload fails because file not found
Shuaifeng Zhou created HBASE-14932: -- Summary: bulkload fails because file not found Key: HBASE-14932 URL: https://issues.apache.org/jira/browse/HBASE-14932 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.10 Reporter: Shuaifeng Zhou Fix For: 0.98.17 A single doBulkLoad call may contain several HFiles to load. The call may time out while the regionserver is still loading the files, and the client will then retry the load. While the client retries, the regionserver may still be continuing the original load operation. If some of the files have already been loaded successfully, the retry call throws a FileNotFoundException, which causes the client to retry again and again until the retries are exhausted, and the bulk load fails. When this happens, some files have in fact been loaded successfully, which leaves an inconsistent state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042765#comment-15042765 ] Phil Yang commented on HBASE-14004: --- Is it required to use the size of the serialized binary data? I don't know whether there is a sequentially increasing unique id for each WAL entry. If there is, or if we can add one, we would know the largest id that has been hsynced, right? This id could also help us handle duplicate entries on replay. > [Replication] Inconsistency between Memstore and WAL may result in data in > remote cluster that is not in the origin > --- > > Key: HBASE-14004 > URL: https://issues.apache.org/jira/browse/HBASE-14004 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: He Liangliang >Priority: Critical > Labels: replication, wal > > Looks like the current write path can cause inconsistency between > memstore/hfile and WAL which cause the slave cluster has more data than the > master cluster. > The simplified write path looks like: > 1. insert record into Memstore > 2. write record to WAL > 3. sync WAL > 4. rollback Memstore if 3 fails > It's possible that the HDFS sync RPC call fails, but the data is already > (may partially) transported to the DNs which finally get persisted. As a > result, the handler will rollback the Memstore and the later flushed HFile > will also skip this record. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
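The id-based idea raised in this comment could be sketched as a simple high-water mark: remember the largest entry id known to be synced, and treat any replayed entry at or below it as a duplicate. This is only an illustration of the suggestion; `WalEntryIdTracker` and its methods are hypothetical, and it assumes ids are assigned in strictly increasing order and synced in that order.

```java
/** Hypothetical tracker for a monotonically increasing WAL entry id. */
final class WalEntryIdTracker {
  // -1 means nothing has been synced yet.
  private long highestSyncedId = -1L;

  /** Records that every entry up to and including entryId has been synced. */
  void markSynced(long entryId) {
    highestSyncedId = Math.max(highestSyncedId, entryId);
  }

  long highestSyncedId() {
    return highestSyncedId;
  }

  /** On replay, an entry at or below the high-water mark has been seen before. */
  boolean isDuplicateOnReplay(long entryId) {
    return entryId <= highestSyncedId;
  }
}
```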
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042744#comment-15042744 ] Duo Zhang commented on HBASE-14004: --- {quote} Fixing HBase writing path that we should retry logging WAL in a new file rather than rollback MemStore. {quote} To be clear, this means we will hold the {{WAL.sync}} request if some entries have already been written out but not yet acked, and will not return until we have successfully written them out and received the ack. And if {{WAL.sync}} or {{WAL.write}} fails (maybe because the queue is full), we will still roll back the MemStore, since we can confirm that the WAL entries have not been written out. Right? And I think there is another task for us. Currently DFSOutputStream does not provide a public method to get the acked length. We can open an issue in the HDFS project and use reflection in HBase in the meantime. But there is still a problem: {{hflush}} and {{hsync}} do not return the acked length, so getting the acked length and calling {{hsync}} are two separate operations, and it is hard to get the exact acked length after calling {{hsync}}. Maybe we could first get the current total of bytes written out (not the acked length) and then call {{hsync}}; the acked length after {{hsync}} returns must be at least this value, so it is safe to use this value as the "acked length". Any thoughts? Thanks. > [Replication] Inconsistency between Memstore and WAL may result in data in > remote cluster that is not in the origin > --- > > Key: HBASE-14004 > URL: https://issues.apache.org/jira/browse/HBASE-14004 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: He Liangliang >Priority: Critical > Labels: replication, wal > > Looks like the current write path can cause inconsistency between > memstore/hfile and WAL which cause the slave cluster has more data than the > master cluster. > The simplified write path looks like: > 1. insert record into Memstore > 2. write record to WAL > 3. sync WAL > 4. rollback Memstore if 3 fails > It's possible that the HDFS sync RPC call fails, but the data is already > (may partially) transported to the DNs which finally get persisted. As a > result, the handler will rollback the Memstore and the later flushed HFile > will also skip this record. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
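The "record the written length first, then hsync" idea in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: SyncableOutput and SafeLengthTracker are hypothetical names, not HDFS or HBase classes, and the sketch assumes that a successful hsync covers every byte handed to the stream before the call.

```java
import java.io.IOException;

/** Hypothetical stand-in for a WAL output stream; NOT an HDFS or HBase API. */
interface SyncableOutput {
    /** Total bytes handed to the stream so far (written out, not necessarily acked). */
    long bytesWritten();

    /** Assumed to return only once all previously written bytes are acked. */
    void hsync() throws IOException;
}

/** Tracks a conservative lower bound for the acked length. */
final class SafeLengthTracker {
    private long safeLength = 0L;

    /**
     * Snapshots the written length, then syncs. If the sync succeeds, the real
     * acked length is at least the snapshot, so the snapshot is safe to publish.
     * If the sync throws, the published value is left unchanged.
     */
    synchronized long syncAndRecord(SyncableOutput out) throws IOException {
        long writtenBeforeSync = out.bytesWritten();
        out.hsync();
        // Never move the bound backwards.
        if (writtenBeforeSync > safeLength) {
            safeLength = writtenBeforeSync;
        }
        return safeLength;
    }

    synchronized long safeLength() {
        return safeLength;
    }
}
```

The published value may lag behind the true acked length, since bytes written while the sync is in flight are not counted, but under the assumption above it never runs ahead of it, which is the property a reader of the log would need.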
[jira] [Commented] (HBASE-14930) check_compatibility.sh needs smarter exit codes
[ https://issues.apache.org/jira/browse/HBASE-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042743#comment-15042743 ] Hudson commented on HBASE-14930: FAILURE: Integrated in HBase-1.1-JDK7 #1614 (See [https://builds.apache.org/job/HBase-1.1-JDK7/1614/]) HBASE-14930 check_compatibility.sh needs smarter exit codes (apurtell: rev 86aa45b12c54fbdd5caa4e8261b782ea7e2f8d8e) * dev-support/check_compatibility.sh > check_compatibility.sh needs smarter exit codes > --- > > Key: HBASE-14930 > URL: https://issues.apache.org/jira/browse/HBASE-14930 > Project: HBase > Issue Type: Bug >Reporter: Dima Spivak >Assignee: Dima Spivak > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14930_master_v1.patch > > > The check_compatibility.sh tool in dev_support uses the Java API Compliance > Checker to do static analysis of source/binary incompatibilties between two > HBase branches. One problem, though, is that the script has a few instances > where it may return an exit code of 1 (e.g. if Maven steps fail), but this is > the same exit code that the Java ACC tool itself uses to denote that the tool > succeeded, but found incompatibilities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14923) VerifyReplication should not mask the exception during result comparison
[ https://issues.apache.org/jira/browse/HBASE-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042742#comment-15042742 ] Hudson commented on HBASE-14923: FAILURE: Integrated in HBase-1.1-JDK7 #1614 (See [https://builds.apache.org/job/HBase-1.1-JDK7/1614/]) HBASE-14923 VerifyReplication should not mask the exception during (apurtell: rev a04904c6bdc9ca34d790d137504622b7d48e77ef) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > VerifyReplication should not mask the exception during result comparison > - > > Key: HBASE-14923 > URL: https://issues.apache.org/jira/browse/HBASE-14923 > Project: HBase > Issue Type: Bug > Components: tooling >Affects Versions: 2.0.0, 0.98.16 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14923_v1.patch > > > hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java > Line:154 > } catch (Exception e) { > logFailRowAndIncreaseCounter(context, > Counters.CONTENT_DIFFERENT_ROWS, value); > } > Just LOG.error needs to be added for more information for the failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
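The change requested in the description above (log the exception instead of only bumping the counter) could look roughly like this. It is a self-contained sketch, not the actual VerifyReplication code: RowVerifier, compare, and the plain int counter are hypothetical stand-ins for the mapper, the result comparison, and Counters.CONTENT_DIFFERENT_ROWS, and java.util.logging is used only so that the example has no dependencies.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the verifying mapper; not HBase code. */
final class RowVerifier {
    private static final Logger LOG = Logger.getLogger(RowVerifier.class.getName());

    /** Stand-in for the CONTENT_DIFFERENT_ROWS counter. */
    private int contentDifferentRows = 0;

    /** Hypothetical comparison that throws when the two values differ. */
    private static void compare(String source, String replica) {
        if (!source.equals(replica)) {
            throw new IllegalStateException("values differ: " + source + " vs " + replica);
        }
    }

    /** Returns true when the row matches; otherwise logs the cause and counts the row. */
    boolean verifyRow(String rowKey, String source, String replica) {
        try {
            compare(source, replica);
            return true;
        } catch (Exception e) {
            // Log the exception so that the reason for the mismatch is not masked.
            LOG.log(Level.SEVERE, "Exception while comparing row " + rowKey, e);
            contentDifferentRows++;
            return false;
        }
    }

    int contentDifferentRows() {
        return contentDifferentRows;
    }
}
```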
[jira] [Commented] (HBASE-14919) Infrastructure refactoring
[ https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043501#comment-15043501 ] stack commented on HBASE-14919: --- My fault [~eshcar] I broke our build environment a few days ago messing with improvements. Reschedule any patches you may have queued for hadoopqa. Let me do this one. > Infrastructure refactoring > -- > > Key: HBASE-14919 > URL: https://issues.apache.org/jira/browse/HBASE-14919 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-14919-V01.patch > > > Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as > first-class citizen and decoupling memstore scanner from the memstore > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-1422) Refactor to Server Manager
[ https://issues.apache.org/jira/browse/HBASE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043554#comment-15043554 ] Hudson commented on HBASE-1422: --- SUCCESS: Integrated in HBase-1.3-IT #358 (See [https://builds.apache.org/job/HBase-1.3-IT/358/]) HBASE-1422 Delayed flush doesn't work causing flush storms; addendum by (stack: rev 7d0c08fbcdb6ab84bc8fb78fdc840c7b29720390) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java Revert "HBASE-1422 Delayed flush doesn't work causing flush storms; (stack: rev 693e1dee4cf46159c6c1e255e40a346985db4115) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Refactor to Server Manager > -- > > Key: HBASE-1422 > URL: https://issues.apache.org/jira/browse/HBASE-1422 > Project: HBase > Issue Type: Sub-task >Affects Versions: 0.19.2 >Reporter: Evgeny Ryabitskiy >Assignee: Evgeny Ryabitskiy > Fix For: 0.90.0 > > Attachments: HBASE-1422.patch, HBASE-1422_v2.patch, > HBASE-1422_v3.patch > > > This is refactor to Server Manager class from HBASE-1017 > I separate it for reasons: > * Its better to have several small patchs and apply them iterativly then one > great path > * I fu..** tired from synchronising w/ SVN (this class changes > frequently), you can saw 10 patches in HBASE-1017 > > We need this refactoing for reasons: > * Server Manager looks like shi**.. bad thing... 
> * is every time harder to make any chnages > * it is becoming more ugly every time > What changes are done: > ServerManager has mapping: > * serverName 2 serverInfo, > * serverAddr 2 serverInfo, > * serverName 2 load, > * load 2 severName > 1) serverName 2 load - not necessary if you have serverName 2 serverInfo > 2) All mappings are encapsulated in ServersInfo class (inner class of > ServerManager) > 3) ServersInfo has operations for adding, updating and removing information > of HRS > + some code in RegionServer is puted in synchronised block... cause it is > working with synchronised map... > Note: this task is to make code much much more clear.. and it's not going to > change logic, so no much problem is going appear -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14922) Delayed flush doesn't work causing flush storms.
[ https://issues.apache.org/jira/browse/HBASE-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043553#comment-15043553 ] Hudson commented on HBASE-14922: SUCCESS: Integrated in HBase-1.3-IT #358 (See [https://builds.apache.org/job/HBase-1.3-IT/358/]) Revert "Revert "HBASE-14922 Delayed flush doesn't work causing flush (stack: rev d955cb328046cb7efe78486fd0f4b03258ea2f6a) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Delayed flush doesn't work causing flush storms. > > > Key: HBASE-14922 > URL: https://issues.apache.org/jira/browse/HBASE-14922 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14922-fix.txt, HBASE-14922-v1.patch, > HBASE-14922-v2.patch, HBASE-14922.patch > > > Starting all regionservers at the same time will mean that most > PeriodicMemstoreFlusher's will be running at the same time. So all of these > threads will queue flushes at about the same time. > This was supposed to be mitigated by Delayed. However that isn't nearly > enough. This results in the immediate filling up and then draining of the > flush queues every hour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
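To illustrate why synchronized periodic flushes pile up and how random jitter spreads them out, here is a minimal sketch. It is not the code of JitterScheduledThreadPoolExecutorImpl; JitterSketch, jitteredDelay, and the spread parameter are hypothetical names chosen for the example.

```java
import java.util.Random;

/** Hypothetical helper; NOT the JitterScheduledThreadPoolExecutorImpl logic. */
final class JitterSketch {
    /**
     * Returns baseDelayMs scaled by a random factor in [1 - spread, 1 + spread),
     * so that tasks started at the same instant do not all fire together.
     */
    static long jitteredDelay(long baseDelayMs, double spread, Random rng) {
        // rng.nextDouble() is in [0, 1); map it to [-spread, +spread).
        double factor = 1.0 + spread * (2.0 * rng.nextDouble() - 1.0);
        return Math.max(0L, Math.round(baseDelayMs * factor));
    }

    public static void main(String[] args) {
        long oneHourMs = 3_600_000L;
        Random rng = new Random();
        // Each simulated flusher gets its own delay within +/- 20% of one hour.
        for (int i = 0; i < 5; i++) {
            System.out.println("flusher " + i + " next run in ms: "
                + jitteredDelay(oneHourMs, 0.2, rng));
        }
    }
}
```

With a spread of 0 every flusher fires exactly on the hour, which is the storm described above; a non-zero spread trades a slightly less predictable flush time for a flatter flush queue.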
[jira] [Commented] (HBASE-1422) Refactor to Server Manager
[ https://issues.apache.org/jira/browse/HBASE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043525#comment-15043525 ] Hudson commented on HBASE-1422: --- SUCCESS: Integrated in HBase-1.3 #420 (See [https://builds.apache.org/job/HBase-1.3/420/]) HBASE-1422 Delayed flush doesn't work causing flush storms; addendum by (stack: rev 7d0c08fbcdb6ab84bc8fb78fdc840c7b29720390) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Refactor to Server Manager > -- > > Key: HBASE-1422 > URL: https://issues.apache.org/jira/browse/HBASE-1422 > Project: HBase > Issue Type: Sub-task >Affects Versions: 0.19.2 >Reporter: Evgeny Ryabitskiy >Assignee: Evgeny Ryabitskiy > Fix For: 0.90.0 > > Attachments: HBASE-1422.patch, HBASE-1422_v2.patch, > HBASE-1422_v3.patch > > > This is refactor to Server Manager class from HBASE-1017 > I separate it for reasons: > * Its better to have several small patchs and apply them iterativly then one > great path > * I fu..** tired from synchronising w/ SVN (this class changes > frequently), you can saw 10 patches in HBASE-1017 > > We need this refactoing for reasons: > * Server Manager looks like shi**.. bad thing... > * is every time harder to make any chnages > * it is becoming more ugly every time > What changes are done: > ServerManager has mapping: > * serverName 2 serverInfo, > * serverAddr 2 serverInfo, > * serverName 2 load, > * load 2 severName > 1) serverName 2 load - not necessary if you have serverName 2 serverInfo > 2) All mappings are encapsulated in ServersInfo class (inner class of > ServerManager) > 3) ServersInfo has operations for adding, updating and removing information > of HRS > + some code in RegionServer is puted in synchronised block... cause it is > working with synchronised map... > Note: this task is to make code much much more clear.. 
and it's not going to > change logic, so no much problem is going appear -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-1422) Refactor to Server Manager
[ https://issues.apache.org/jira/browse/HBASE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043538#comment-15043538 ] Hudson commented on HBASE-1422: --- FAILURE: Integrated in HBase-Trunk_matrix #534 (See [https://builds.apache.org/job/HBase-Trunk_matrix/534/]) HBASE-1422 Delayed flush doesn't work causing flush storms; addendum by (stack: rev b1462679e17f9b5827720f3c57eaeff946cfea0e) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Refactor to Server Manager > -- > > Key: HBASE-1422 > URL: https://issues.apache.org/jira/browse/HBASE-1422 > Project: HBase > Issue Type: Sub-task >Affects Versions: 0.19.2 >Reporter: Evgeny Ryabitskiy >Assignee: Evgeny Ryabitskiy > Fix For: 0.90.0 > > Attachments: HBASE-1422.patch, HBASE-1422_v2.patch, > HBASE-1422_v3.patch > > > This is refactor to Server Manager class from HBASE-1017 > I separate it for reasons: > * Its better to have several small patchs and apply them iterativly then one > great path > * I fu..** tired from synchronising w/ SVN (this class changes > frequently), you can saw 10 patches in HBASE-1017 > > We need this refactoing for reasons: > * Server Manager looks like shi**.. bad thing... > * is every time harder to make any chnages > * it is becoming more ugly every time > What changes are done: > ServerManager has mapping: > * serverName 2 serverInfo, > * serverAddr 2 serverInfo, > * serverName 2 load, > * load 2 severName > 1) serverName 2 load - not necessary if you have serverName 2 serverInfo > 2) All mappings are encapsulated in ServersInfo class (inner class of > ServerManager) > 3) ServersInfo has operations for adding, updating and removing information > of HRS > + some code in RegionServer is puted in synchronised block... cause it is > working with synchronised map... > Note: this task is to make code much much more clear.. 
and it's not going to > change logic, so no much problem is going appear -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14919) Infrastructure refactoring
[ https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043498#comment-15043498 ] Eshcar Hillel commented on HBASE-14919: --- Added a link to the review board. [~yuzhihong] can you tell me what the basis is for the -1 overall on the patch? > Infrastructure refactoring > -- > > Key: HBASE-14919 > URL: https://issues.apache.org/jira/browse/HBASE-14919 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-14919-V01.patch > > > Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as > first-class citizen and decoupling memstore scanner from the memstore > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14919) Infrastructure refactoring
[ https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14919: -- Attachment: HBASE-14919-V01.patch Retry hadoopqa > Infrastructure refactoring > -- > > Key: HBASE-14919 > URL: https://issues.apache.org/jira/browse/HBASE-14919 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch > > > Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as > first-class citizen and decoupling memstore scanner from the memstore > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-14822: -- Attachment: 14822-v5.txt * fixes the long line * removed code for ClientSmallScanner (since nothing needs to be renewed) > Renewing leases of scanners doesn't work > > > Key: HBASE-14822 > URL: https://issues.apache.org/jira/browse/HBASE-14822 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14 >Reporter: Samarth Jain >Assignee: Lars Hofhansl > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, > 14822-v3-0.98.txt, 14822-v4-0.98.txt, 14822-v4.txt, 14822-v5.txt, 14822.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14919) Infrastructure refactoring
[ https://issues.apache.org/jira/browse/HBASE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043651#comment-15043651 ] stack commented on HBASE-14919: --- This time the patch does not apply [~eshcar] Want to rebase? > Infrastructure refactoring > -- > > Key: HBASE-14919 > URL: https://issues.apache.org/jira/browse/HBASE-14919 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel > Attachments: HBASE-14919-V01.patch, HBASE-14919-V01.patch > > > Refactoring the MemStore hierarchy, introducing segment (StoreSegment) as > first-class citizen and decoupling memstore scanner from the memstore > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14903) Table Or Region?
[ https://issues.apache.org/jira/browse/HBASE-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043660#comment-15043660 ] 胡托 commented on HBASE-14903: http://note.youdao.com/share/?id=86f628bba69e9de9170b4c0642d169c9=note > Table Or Region? > > > Key: HBASE-14903 > URL: https://issues.apache.org/jira/browse/HBASE-14903 > Project: HBase > Issue Type: Bug > Components: documentation >Affects Versions: 2.0.0 >Reporter: 胡托 >Priority: Blocker > > I've been reading on Latest Reference Guide and try to translated into > Chinese! > I think this sentence "When a table is in the process of splitting," > should be "When a Region is in the process of splitting," on chapter 【62.2. > hbase:meta】。 > By the way,is this document the > latest?【http://hbase.apache.org/book.html#arch.overview】I will translate it! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14903) Table Or Region?
[ https://issues.apache.org/jira/browse/HBASE-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043659#comment-15043659 ] 胡托 commented on HBASE-14903: How do I submit a patch for this? Can you give me some guidance, for example on the document format? > Table Or Region? > > > Key: HBASE-14903 > URL: https://issues.apache.org/jira/browse/HBASE-14903 > Project: HBase > Issue Type: Bug > Components: documentation >Affects Versions: 2.0.0 >Reporter: 胡托 >Priority: Blocker > > I've been reading on Latest Reference Guide and try to translated into > Chinese! > I think this sentence "When a table is in the process of splitting," > should be "When a Region is in the process of splitting," on chapter 【62.2. > hbase:meta】。 > By the way,is this document the > latest?【http://hbase.apache.org/book.html#arch.overview】I will translate it! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14903) Table Or Region?
[ https://issues.apache.org/jira/browse/HBASE-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043661#comment-15043661 ] 胡托 commented on HBASE-14903: http://note.youdao.com/share/?id=86f628bba69e9de9170b4c0642d169c9=note > Table Or Region? > > > Key: HBASE-14903 > URL: https://issues.apache.org/jira/browse/HBASE-14903 > Project: HBase > Issue Type: Bug > Components: documentation >Affects Versions: 2.0.0 >Reporter: 胡托 >Priority: Blocker > > I've been reading on Latest Reference Guide and try to translated into > Chinese! > I think this sentence "When a table is in the process of splitting," > should be "When a Region is in the process of splitting," on chapter 【62.2. > hbase:meta】。 > By the way,is this document the > latest?【http://hbase.apache.org/book.html#arch.overview】I will translate it! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-1422) Refactor to Server Manager
[ https://issues.apache.org/jira/browse/HBASE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043671#comment-15043671 ] Hudson commented on HBASE-1422: --- FAILURE: Integrated in HBase-1.2 #427 (See [https://builds.apache.org/job/HBase-1.2/427/]) Revert "HBASE-1422 Delayed flush doesn't work causing flush storms; (stack: rev 1633db4747a32e02af7b7df9a316adcbeeebd92d) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Refactor to Server Manager > -- > > Key: HBASE-1422 > URL: https://issues.apache.org/jira/browse/HBASE-1422 > Project: HBase > Issue Type: Sub-task >Affects Versions: 0.19.2 >Reporter: Evgeny Ryabitskiy >Assignee: Evgeny Ryabitskiy > Fix For: 0.90.0 > > Attachments: HBASE-1422.patch, HBASE-1422_v2.patch, > HBASE-1422_v3.patch > > > This is refactor to Server Manager class from HBASE-1017 > I separate it for reasons: > * Its better to have several small patchs and apply them iterativly then one > great path > * I fu..** tired from synchronising w/ SVN (this class changes > frequently), you can saw 10 patches in HBASE-1017 > > We need this refactoing for reasons: > * Server Manager looks like shi**.. bad thing... > * is every time harder to make any chnages > * it is becoming more ugly every time > What changes are done: > ServerManager has mapping: > * serverName 2 serverInfo, > * serverAddr 2 serverInfo, > * serverName 2 load, > * load 2 severName > 1) serverName 2 load - not necessary if you have serverName 2 serverInfo > 2) All mappings are encapsulated in ServersInfo class (inner class of > ServerManager) > 3) ServersInfo has operations for adding, updating and removing information > of HRS > + some code in RegionServer is puted in synchronised block... cause it is > working with synchronised map... > Note: this task is to make code much much more clear.. 
and it's not going to > change logic, so no much problem is going appear -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14915) Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport
[ https://issues.apache.org/jira/browse/HBASE-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043669#comment-15043669 ] Heng Chen commented on HBASE-14915: --- Oh, [~stack] I noticed this in the QA output that you posted in the description. {code} Flaked tests: org.apache.hadoop.hbase.client.TestMultiParallel.testFlushCommitsNoAbort(org.apache.hadoop.hbase.client.TestMultiParallel) Run 1: TestMultiParallel.testFlushCommitsNoAbort:241->doTestFlushCommits:293->validateLoadedData:676 null Run 2: PASS {code} It is unrelated to org.apache.hadoop.hbase.mapreduce.TestImportExport. Sorry for my mistake. But is it normal for TestMultiParallel.testFlushCommitsNoAbort to fail? By the way, the QA output that you posted in the comment shows this: {code} Flaked tests: org.apache.hadoop.hbase.master.cleaner.TestLogsCleaner.testZnodeCversionChange(org.apache.hadoop.hbase.master.cleaner.TestLogsCleaner) Run 1: TestLogsCleaner.testZnodeCversionChange:156 » TestTimedOut test timed out afte... Run 2: PASS {code} It seems this is unrelated to TestImportExport too? > Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport > - > > Key: HBASE-14915 > URL: https://issues.apache.org/jira/browse/HBASE-14915 > Project: HBase > Issue Type: Sub-task > Components: hangingTests >Reporter: stack > Attachments: HBASE-14915-branch-1.2.patch > > > This test hangs a bunch: > Here is latest: > https://builds.apache.org/job/HBase-1.2/418/jdk=latest1.7,label=Hadoop/consoleText -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-7171) Initial web UI for region/memstore/storefiles details
[ https://issues.apache.org/jira/browse/HBASE-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043678#comment-15043678 ] Mikhail Antonov commented on HBASE-7171: [~stack] not crazy at all. There's a handy related JIRA for that: HBASE-4040. Let me take a look at it; it should be similar to how it's done in WALPrettyPrinter (and then we could do a similar web UI for it). The refactoring needed is that HFilePrettyPrinter is a tool, invocable from the shell, which outputs to a hardcoded System.out, so we need to make it invocable from regular code (and make sure it stays backward compatible). Meanwhile, any thoughts on the screenshots? Some more info to add in header/columns...? > Initial web UI for region/memstore/storefiles details > - > > Key: HBASE-7171 > URL: https://issues.apache.org/jira/browse/HBASE-7171 > Project: HBase > Issue Type: Improvement > Components: UI >Reporter: stack >Assignee: Mikhail Antonov > Labels: beginner > Attachments: HBASE-7171.patch, region_details.png, region_list.png, > storefile_details.png > > > Click on a region in UI and get a listing of hfiles in HDFS and summary of > memstore content; click on an HFile and see its content -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14915) Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport
[ https://issues.apache.org/jira/browse/HBASE-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043648#comment-15043648 ] stack commented on HBASE-14915: --- It hung again https://builds.apache.org/job/HBase-1.2/jdk=latest1.7,label=Hadoop/426/consoleText I missed your patch [~chenheng] Let me apply. Thank you. > Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport > - > > Key: HBASE-14915 > URL: https://issues.apache.org/jira/browse/HBASE-14915 > Project: HBase > Issue Type: Sub-task > Components: hangingTests >Reporter: stack > Attachments: HBASE-14915-branch-1.2.patch > > > This test hangs a bunch: > Here is latest: > https://builds.apache.org/job/HBase-1.2/418/jdk=latest1.7,label=Hadoop/consoleText -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14915) Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport
[ https://issues.apache.org/jira/browse/HBASE-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043649#comment-15043649 ] stack commented on HBASE-14915: --- Your patch is for TestMultiParallel rather than TestImportExport, is that intentional? We need this on TestMultiParallel too? > Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport > - > > Key: HBASE-14915 > URL: https://issues.apache.org/jira/browse/HBASE-14915 > Project: HBase > Issue Type: Sub-task > Components: hangingTests >Reporter: stack > Attachments: HBASE-14915-branch-1.2.patch > > > This test hangs a bunch: > Here is latest: > https://builds.apache.org/job/HBase-1.2/418/jdk=latest1.7,label=Hadoop/consoleText -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043657#comment-15043657 ] stack commented on HBASE-14004: --- bq. ReplicationSource should only read WAL that is hsynced to prevent slave cluster having data that master losses. This will require big change in how replication works but for the better and replication will be less resource intense because less NN ops (if crash, we ask NN for file length, not ZK? If so, this would be a task we have been needing to do for a long time; i.e. undo keeping replication position in zk). bq. WAL reader can handle duplicate entries, in other words, make WAL logging idempotent. Might have to add some code to reader to skip an entry it has seen before (this may be there already -- need to check). bq. Fixing HBase writing path that we should retry logging WAL in a new file rather than rollback MemStore. This is new but has been done before. I'd be up for helping w/ WAL changes, stuff like keeping around appends until the sync for them comes in (I've messed w/ this before), and would be interested in helping out on replication log length accounting changing it from relying on reopen after it gets EOF and keeping length in zk. You fellas are fixing a few fundamental issues here. Sweet. bq. we will still rollback MemStore since we can confirm that the WAL entries have not been written out. Right? We could try rejiggering the order in which memstore gets updated, putting it off till after the sync. The order we have now came about long time ago when WAL was very different. We might be able to change the order, simplify the write pipeline, and not lose too much perf (or, perhaps, get more perf because we are doing healthier group commits). bq. Maybe we could get current total write out bytes first(not acked length) and then call hsync, the acked length after calling hsync must be larger than this value so it is safe to use this value as "acked length". 
It would be good if hbase could calculate the written length itself. We could try it. What happens if we want to compress WAL or what about crc tax (I suppose this latter would be a constant -- and for the former, maybe we could figure then length... even on compress if per edit or per batch) bq. I don't know if there is a sequence increment unique id for each wal log. There is such a sequenceid but it is by-region, not global. Could keep sequence id by region accounts? (We already do this elsewhere). > [Replication] Inconsistency between Memstore and WAL may result in data in > remote cluster that is not in the origin > --- > > Key: HBASE-14004 > URL: https://issues.apache.org/jira/browse/HBASE-14004 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: He Liangliang >Priority: Critical > Labels: replication, wal > > Looks like the current write path can cause inconsistency between > memstore/hfile and WAL which cause the slave cluster has more data than the > master cluster. > The simplified write path looks like: > 1. insert record into Memstore > 2. write record to WAL > 3. sync WAL > 4. rollback Memstore if 3 fails > It's possible that the HDFS sync RPC call fails, but the data is already > (may partially) transported to the DNs which finally get persisted. As a > result, the handler will rollback the Memstore and the later flushed HFile > will also skip this record. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
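The "skip an entry it has seen before" and "keep sequence id by region" points in the comment above can be combined into a small duplicate filter. The sketch below is an assumption-laden illustration, not HBase's WAL reader: PerRegionDedup and accept are hypothetical names, and it assumes that sequence ids for a given region arrive in increasing order except when an entry is replayed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical duplicate filter keyed by region; not an HBase class. */
final class PerRegionDedup {
    /** Highest sequence id accepted so far, per region name. */
    private final Map<String, Long> highestSeen = new HashMap<>();

    /**
     * Returns true if the entry is new for its region and should be applied,
     * false if an entry with the same or a higher sequence id was already seen.
     */
    boolean accept(String regionName, long sequenceId) {
        Long previous = highestSeen.get(regionName);
        if (previous != null && sequenceId <= previous) {
            return false; // replayed or duplicated entry
        }
        highestSeen.put(regionName, sequenceId);
        return true;
    }
}
```

Because sequence ids are per region rather than global, the filter keeps one high-water mark per region; a single global mark would wrongly drop entries from a region whose ids happen to be lower.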
[jira] [Commented] (HBASE-1422) Refactor to Server Manager
[ https://issues.apache.org/jira/browse/HBASE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043663#comment-15043663 ] Hudson commented on HBASE-1422: --- SUCCESS: Integrated in HBase-1.3 #421 (See [https://builds.apache.org/job/HBase-1.3/421/]) Revert "HBASE-1422 Delayed flush doesn't work causing flush storms; (stack: rev 693e1dee4cf46159c6c1e255e40a346985db4115) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Refactor to Server Manager > -- > > Key: HBASE-1422 > URL: https://issues.apache.org/jira/browse/HBASE-1422 > Project: HBase > Issue Type: Sub-task >Affects Versions: 0.19.2 >Reporter: Evgeny Ryabitskiy >Assignee: Evgeny Ryabitskiy > Fix For: 0.90.0 > > Attachments: HBASE-1422.patch, HBASE-1422_v2.patch, > HBASE-1422_v3.patch > > > This is a refactoring of the ServerManager class, split out from HBASE-1017 > I separated it for these reasons: > * It's better to have several small patches and apply them iteratively than one > big patch > * I am tired of synchronising w/ SVN (this class changes > frequently); you can see 10 patches in HBASE-1017 > > We need this refactoring for these reasons: > * ServerManager is in bad shape > * it is harder every time to make any changes > * it is becoming uglier every time > What changes were made: > ServerManager has these mappings: > * serverName 2 serverInfo, > * serverAddr 2 serverInfo, > * serverName 2 load, > * load 2 serverName > 1) serverName 2 load - not necessary if you have serverName 2 serverInfo > 2) All mappings are encapsulated in the ServersInfo class (inner class of > ServerManager) > 3) ServersInfo has operations for adding, updating and removing information > about HRS > + some code in RegionServer is put in a synchronised block, because it is > working with a synchronised map > Note: this task is to make the code much clearer; it's not going to > change logic, so no big problems should appear
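The encapsulation described in points 1) to 3) above could look roughly like the following. This is a minimal sketch under stated assumptions, not the contents of the attached patches: `ServersInfoSketch`, `ServerInfo`, `addOrUpdate`, `loadOf` and `leastLoaded` are hypothetical names, and server names, addresses and loads are reduced to strings and ints.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.OptionalInt;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical holder that keeps all server mappings consistent behind one lock. */
class ServersInfoSketch {

    /** Hypothetical per-server record; a real record would carry more fields. */
    static final class ServerInfo {
        final String name;
        final String address;
        final int load;

        ServerInfo(String name, String address, int load) {
            this.name = name;
            this.address = address;
            this.load = load;
        }
    }

    private final Map<String, ServerInfo> byName = new HashMap<>();      // serverName 2 serverInfo
    private final Map<String, ServerInfo> byAddress = new HashMap<>();   // serverAddr 2 serverInfo
    private final TreeMap<Integer, Set<String>> namesByLoad = new TreeMap<>(); // load 2 serverName

    /** Adds a server, or replaces its previous record in every mapping. */
    synchronized void addOrUpdate(ServerInfo info) {
        remove(info.name);
        byName.put(info.name, info);
        byAddress.put(info.address, info);
        namesByLoad.computeIfAbsent(info.load, k -> new HashSet<>()).add(info.name);
    }

    /** Removes a server from every mapping; returns the old record or null. */
    synchronized ServerInfo remove(String name) {
        ServerInfo old = byName.remove(name);
        if (old != null) {
            byAddress.remove(old.address);
            Set<String> names = namesByLoad.get(old.load);
            names.remove(name);
            if (names.isEmpty()) {
                namesByLoad.remove(old.load);
            }
        }
        return old;
    }

    /** serverName 2 load needs no map of its own: the load is read from the server record. */
    synchronized OptionalInt loadOf(String name) {
        ServerInfo info = byName.get(name);
        return info == null ? OptionalInt.empty() : OptionalInt.of(info.load);
    }

    /** Names of the servers with the lowest load; empty if no server is registered. */
    synchronized Set<String> leastLoaded() {
        Set<String> result = new HashSet<>();
        if (!namesByLoad.isEmpty()) {
            result.addAll(namesByLoad.firstEntry().getValue());
        }
        return result;
    }
}
```

Keeping the three maps private and mutating them only inside synchronized methods means callers cannot update one mapping and forget another, which is the kind of clean-up the note asks for without changing logic.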
[jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
[ https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043664#comment-15043664 ] Duo Zhang commented on HBASE-14004: --- {quote} This will require big change in how replication works but for the better and replication will be less resource intense because less NN ops (if crash, we ask NN for file length, not ZK? If so, this would be a task we have been needing to do for a long time; i.e. undo keeping replication position in zk). {quote} I think we should have two branches to determine how many entries we can read: one for a closed WAL file, and one for a WAL that is still being written. We can get this information using {{DistributedFileSystem.isFileClosed}}. If the file is already closed, then we can use the length obtained from HDFS. If the file is still open for writing, then we should ask the RS that is writing it for the safe length. If we cannot find the RS (maybe it has already crashed), then we can wait a minute, since the namenode will eventually recover its lease and close the file. {quote} There is such a sequenceid but it is by-region, not global. Could keep sequence id by region accounts? (We already do this elsewhere). {quote} So maybe we still need to use "acked length", not "acked id". But I think this is enough to filter out duplicate WAL entries. > [Replication] Inconsistency between Memstore and WAL may result in data in > remote cluster that is not in the origin > --- > > Key: HBASE-14004 > URL: https://issues.apache.org/jira/browse/HBASE-14004 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: He Liangliang >Priority: Critical > Labels: replication, wal > > Looks like the current write path can cause inconsistency between > memstore/hfile and WAL, which causes the slave cluster to have more data than the > master cluster. > The simplified write path looks like: > 1. insert record into Memstore > 2. write record to WAL > 3. sync WAL > 4. roll back Memstore if 3 fails > It's possible that the HDFS sync RPC call fails, but the data has already > been (maybe partially) transported to the DNs and finally gets persisted. As a > result, the handler will roll back the Memstore and the later flushed HFile > will also skip this record.
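The two-branch decision proposed in the comment above can be sketched as follows. This is only an illustration of the idea under stated assumptions: `WalFileSystem` and `WalWriterLookup` are hypothetical interfaces standing in for HDFS and for an RPC to the region server (only `isFileClosed` mirrors the `DistributedFileSystem` method named in the comment), and no such "safe length" RPC is claimed to exist in HBase.

```java
import java.util.OptionalLong;

/** Sketch of choosing how far a replication reader may read in a WAL file. */
class SafeReadLengthSketch {

    /** Hypothetical view of the file system. */
    interface WalFileSystem {
        boolean isFileClosed(String walPath);

        long fileLength(String walPath);
    }

    /** Hypothetical lookup of the writing region server; empty if it cannot be found. */
    interface WalWriterLookup {
        OptionalLong safeLength(String walPath);
    }

    /**
     * Returns the number of bytes that are safe to read, or empty if the caller
     * should wait and retry (the writer is gone and the file is not yet closed).
     */
    static OptionalLong readableLength(String walPath, WalFileSystem fs, WalWriterLookup writers) {
        if (fs.isFileClosed(walPath)) {
            // Branch 1: closed file, the length reported by the file system is final.
            return OptionalLong.of(fs.fileLength(walPath));
        }
        // Branch 2: file still open, only the writer knows the acked ("safe") length.
        return writers.safeLength(walPath);
    }

    public static void main(String[] args) {
        WalFileSystem openFile = new WalFileSystem() {
            @Override
            public boolean isFileClosed(String walPath) {
                return false;
            }

            @Override
            public long fileLength(String walPath) {
                return 4096L;
            }
        };
        // The writer reports fewer acked bytes than the file system currently sees.
        System.out.println(readableLength("wal-1", openFile, path -> OptionalLong.of(1024L)));
        // The writer cannot be found: the caller waits for lease recovery and retries.
        System.out.println(readableLength("wal-1", openFile, path -> OptionalLong.empty()));
    }
}
```

Returning an empty value rather than blocking leaves the waiting policy to the caller, which matches the suggestion to wait until the namenode recovers the lease and closes the file.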
[jira] [Commented] (HBASE-14922) Delayed flush doesn't work causing flush storms.
[ https://issues.apache.org/jira/browse/HBASE-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043662#comment-15043662 ] Hudson commented on HBASE-14922: SUCCESS: Integrated in HBase-1.3 #421 (See [https://builds.apache.org/job/HBase-1.3/421/]) Revert "Revert "HBASE-14922 Delayed flush doesn't work causing flush (stack: rev d955cb328046cb7efe78486fd0f4b03258ea2f6a) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Delayed flush doesn't work causing flush storms. > > > Key: HBASE-14922 > URL: https://issues.apache.org/jira/browse/HBASE-14922 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14922-fix.txt, HBASE-14922-v1.patch, > HBASE-14922-v2.patch, HBASE-14922.patch > > > Starting all regionservers at the same time will mean that most > PeriodicMemstoreFlushers will be running at the same time. So all of these > threads will queue flushes at about the same time. > This was supposed to be mitigated by Delayed. However, that isn't nearly > enough. This results in the immediate filling up and then draining of the > flush queues every hour.
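One common way to keep periodic tasks on many servers from firing in lockstep is to add random jitter to each period. The sketch below shows that general technique only; it is not the logic of `JitterScheduledThreadPoolExecutorImpl`, whose contents are not shown in this thread. `JitteredDelaySketch`, `jitteredDelayMs` and the 20% spread are assumptions chosen for illustration.

```java
import java.util.Random;

/** Illustrative jitter calculation; names and the spread value are hypothetical. */
class JitteredDelaySketch {

    /**
     * Scales basePeriodMs by a random factor in [1 - spread, 1 + spread),
     * so that tasks started together drift apart over time.
     */
    static long jitteredDelayMs(long basePeriodMs, double spread, Random random) {
        double factor = 1.0 + spread * (2.0 * random.nextDouble() - 1.0);
        return Math.round(basePeriodMs * factor);
    }

    public static void main(String[] args) {
        Random random = new Random();
        long hourMs = 60L * 60L * 1000L; // the hourly period mentioned in the issue
        for (int server = 0; server < 5; server++) {
            System.out.println("server " + server + ": next periodic flush check in "
                    + jitteredDelayMs(hourMs, 0.2, random) + " ms");
        }
    }
}
```

With a spread of 0.2, servers that start at the same instant schedule their next check anywhere between 48 and 72 minutes later, so their flush requests no longer arrive in a single burst.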
[jira] [Commented] (HBASE-14922) Delayed flush doesn't work causing flush storms.
[ https://issues.apache.org/jira/browse/HBASE-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043670#comment-15043670 ] Hudson commented on HBASE-14922: FAILURE: Integrated in HBase-1.2 #427 (See [https://builds.apache.org/job/HBase-1.2/427/]) Revert "Revert "HBASE-14922 Delayed flush doesn't work causing flush (stack: rev 1bcb2c66aed24105c02300baf8bc431f81c9e76e) * hbase-common/src/main/java/org/apache/hadoop/hbase/JitterScheduledThreadPoolExecutorImpl.java > Delayed flush doesn't work causing flush storms. > > > Key: HBASE-14922 > URL: https://issues.apache.org/jira/browse/HBASE-14922 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14922-fix.txt, HBASE-14922-v1.patch, > HBASE-14922-v2.patch, HBASE-14922.patch > > > Starting all regionservers at the same time will mean that most > PeriodicMemstoreFlushers will be running at the same time. So all of these > threads will queue flushes at about the same time. > This was supposed to be mitigated by Delayed. However, that isn't nearly > enough. This results in the immediate filling up and then draining of the > flush queues every hour.