[jira] [Updated] (HBASE-10451) Enable back Tag compression on HFiles
[ https://issues.apache.org/jira/browse/HBASE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10451: --- Attachment: HBASE-10451_V6.patch Tests are passing with this. The IntegrationTestIngestWithVisibilityLabels run also looks fine. The change in TestEncodedSeekers is to remove the encodeOnDisk parameter; we don't have any such setting available now in HCD. Enable back Tag compression on HFiles - Key: HBASE-10451 URL: https://issues.apache.org/jira/browse/HBASE-10451 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Critical Fix For: 0.98.1, 0.99.0 Attachments: HBASE-10451.patch, HBASE-10451_V2.patch, HBASE-10451_V3.patch, HBASE-10451_V4.patch, HBASE-10451_V5.patch, HBASE-10451_V6.patch HBASE-10443 disables tag compression on HFiles. This Jira is to fix the issues found in HBASE-10443 and enable it back. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10594) Speed up TestRestoreSnapshotFromClient a bit
[ https://issues.apache.org/jira/browse/HBASE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910100#comment-13910100 ] Hudson commented on HBASE-10594: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #166 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/166/]) HBASE-10594 Speed up TestRestoreSnapshotFromClient a bit. (larsh: rev 1571142) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestRestoreSnapshotFromClient.java Speed up TestRestoreSnapshotFromClient a bit Key: HBASE-10594 URL: https://issues.apache.org/jira/browse/HBASE-10594 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.18 Attachments: 10594-0.94.txt, 10594-trunk.txt Looking through the longest running test in 0.94 I noticed that TestRestoreSnapshotFromClient runs for over 10 minutes on the jenkins boxes (264s on my local box). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-8304) Bulkload fail to remove files if fs.default.name / fs.defaultFS is configured without default port.
[ https://issues.apache.org/jira/browse/HBASE-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated HBASE-8304: Attachment: (was: HBASE-9537.patch) Bulkload fail to remove files if fs.default.name / fs.defaultFS is configured without default port. --- Key: HBASE-8304 URL: https://issues.apache.org/jira/browse/HBASE-8304 Project: HBase Issue Type: Bug Components: HFile, regionserver Affects Versions: 0.94.5 Reporter: Raymond Liu Labels: bulkloader Attachments: HBASE-8304.patch When fs.default.name or fs.defaultFS in hadoop core-site.xml is configured as hdfs://ip, and hbase.rootdir is configured as hdfs://ip:port/hbaserootdir where port is the hdfs namenode's default port, the bulkload operation will not remove the files in the bulk output dir. Store::bulkLoadHfile will treat hdfs://ip and hdfs://ip:port as different filesystems and go with the copy approach instead of rename. The root cause is that the hbase master rewrites fs.default.name/fs.defaultFS according to hbase.rootdir when the regionserver starts; thus, the dest fs uri from the hregion will not match the src fs uri passed from the client. Any suggestion on the best approach to fix this issue? I kind of think we could check for the default port if the src uri comes without port info. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
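The port-mismatch comparison described in the report can be sketched with a small, hypothetical helper (not HBase code; the default port value of 8020 is an assumption about the HDFS NameNode RPC default): two HDFS URIs that differ only in an explicit default port should compare as the same filesystem once the missing port is normalized.

```java
import java.net.URI;

public class FsUriCompare {
    static final int DEFAULT_HDFS_PORT = 8020; // assumed NameNode default RPC port

    // Normalize a filesystem URI: if no port is given, assume the scheme's default.
    static int effectivePort(URI uri) {
        return uri.getPort() == -1 ? DEFAULT_HDFS_PORT : uri.getPort();
    }

    // Two URIs point at the same filesystem if scheme, host and effective port match.
    static boolean sameFileSystem(URI a, URI b) {
        return a.getScheme().equalsIgnoreCase(b.getScheme())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && effectivePort(a) == effectivePort(b);
    }

    public static void main(String[] args) {
        URI src = URI.create("hdfs://10.0.0.1");        // fs.defaultFS without port
        URI dest = URI.create("hdfs://10.0.0.1:8020");  // hbase.rootdir with explicit default port
        System.out.println(sameFileSystem(src, dest));  // same filesystem -> rename instead of copy
    }
}
```

With such a check in place, bulkload could take the rename path even when only one of the two configured URIs carries the explicit default port.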
[jira] [Updated] (HBASE-10451) Enable back Tag compression on HFiles
[ https://issues.apache.org/jira/browse/HBASE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10451: --- Status: Open (was: Patch Available) Enable back Tag compression on HFiles - Key: HBASE-10451 URL: https://issues.apache.org/jira/browse/HBASE-10451 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Critical Fix For: 0.98.1, 0.99.0 Attachments: HBASE-10451.patch, HBASE-10451_V2.patch, HBASE-10451_V3.patch, HBASE-10451_V4.patch, HBASE-10451_V5.patch HBASE-10443 disables tag compression on HFiles. This Jira is to fix the issues we have found out in HBASE-10443 and enable it back. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-8304) Bulkload fail to remove files if fs.default.name / fs.defaultFS is configured without default port.
[ https://issues.apache.org/jira/browse/HBASE-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated HBASE-8304: Attachment: HBASE-8304.patch Bulkload fail to remove files if fs.default.name / fs.defaultFS is configured without default port. --- Key: HBASE-8304 URL: https://issues.apache.org/jira/browse/HBASE-8304 Project: HBase Issue Type: Bug Components: HFile, regionserver Affects Versions: 0.94.5 Reporter: Raymond Liu Labels: bulkloader Attachments: HBASE-8304.patch When fs.default.name or fs.defaultFS in hadoop core-site.xml is configured as hdfs://ip, and hbase.rootdir is configured as hdfs://ip:port/hbaserootdir where port is the hdfs namenode's default port, the bulkload operation will not remove the files in the bulk output dir. Store::bulkLoadHfile will treat hdfs://ip and hdfs://ip:port as different filesystems and go with the copy approach instead of rename. The root cause is that the hbase master rewrites fs.default.name/fs.defaultFS according to hbase.rootdir when the regionserver starts; thus, the dest fs uri from the hregion will not match the src fs uri passed from the client. Any suggestion on the best approach to fix this issue? I kind of think we could check for the default port if the src uri comes without port info. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10451) Enable back Tag compression on HFiles
[ https://issues.apache.org/jira/browse/HBASE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-10451: --- Status: Patch Available (was: Open) Enable back Tag compression on HFiles - Key: HBASE-10451 URL: https://issues.apache.org/jira/browse/HBASE-10451 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Critical Fix For: 0.98.1, 0.99.0 Attachments: HBASE-10451.patch, HBASE-10451_V2.patch, HBASE-10451_V3.patch, HBASE-10451_V4.patch, HBASE-10451_V5.patch, HBASE-10451_V6.patch HBASE-10443 disables tag compression on HFiles. This Jira is to fix the issues we have found out in HBASE-10443 and enable it back. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10594) Speed up TestRestoreSnapshotFromClient a bit
[ https://issues.apache.org/jira/browse/HBASE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910093#comment-13910093 ] Hudson commented on HBASE-10594: FAILURE: Integrated in HBase-0.94-security #420 (See [https://builds.apache.org/job/HBase-0.94-security/420/]) HBASE-10594 Speed up TestRestoreSnapshotFromClient a bit. (larsh: rev 1571144) * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestRestoreSnapshotFromClient.java Speed up TestRestoreSnapshotFromClient a bit Key: HBASE-10594 URL: https://issues.apache.org/jira/browse/HBASE-10594 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.18 Attachments: 10594-0.94.txt, 10594-trunk.txt Looking through the longest running test in 0.94 I noticed that TestRestoreSnapshotFromClient runs for over 10 minutes on the jenkins boxes (264s on my local box). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10595) HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even after the table dir in hdfs is removed
[ https://issues.apache.org/jira/browse/HBASE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910114#comment-13910114 ] Hadoop QA commented on HBASE-10595: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630609/HBASE-10595-trunk_v2.patch against trunk revision . ATTACHMENT ID: 12630609 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hbase.util.TestHBaseFsck.testSplitDaughtersNotInMeta(TestHBaseFsck.java:1477) at org.apache.hadoop.hbase.util.TestHBaseFsck.testOverlapAndOrphan(TestHBaseFsck.java:859) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8781//console This message is automatically generated. 
HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even after the table dir in hdfs is removed -- Key: HBASE-10595 URL: https://issues.apache.org/jira/browse/HBASE-10595 Project: HBase Issue Type: Bug Components: master, util Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10595-trunk_v1.patch, HBASE-10595-trunk_v2.patch When a table dir (in hdfs) is removed (from outside), HMaster will still return the cached TableDescriptor to the client for a getTableDescriptor request. By contrast, HBaseAdmin.listTables() is handled correctly in the current implementation: for a table whose table dir in hdfs has been removed from outside, getTableDescriptor can still retrieve a valid (old) table descriptor, while listTables says it doesn't exist. This is inconsistent. The reason for this bug is that HMaster (via FSTableDescriptors) doesn't check whether the table dir exists for a getTableDescriptor() request (while it lists all existing table dirs, without first consulting the cache, and returns accordingly for a listTables() request). When a table is deleted via deleteTable, the cache is cleared after the table dir and tableInfo file are removed, so the listTables/getTableDescriptor inconsistency should be transient (though it still exists while the table dir is removed but the cache is not yet cleared) and harder to expose. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
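The fix direction described in the report (check the backing directory before trusting the cache) can be sketched generically. All names below are hypothetical; this is not the FSTableDescriptors code, just the stale-cache-validation idea, with a Set standing in for a FileSystem.exists() check on the table dir.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DescriptorCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Set<String> existingTableDirs; // stands in for checking hdfs

    DescriptorCache(Set<String> existingTableDirs) {
        this.existingTableDirs = existingTableDirs;
    }

    void put(String table, String descriptor) {
        cache.put(table, descriptor);
    }

    // Only serve the cached descriptor if the table dir still exists on disk;
    // otherwise drop the stale entry, so the answer matches what a directory
    // listing (the listTables path) would report.
    String get(String table) {
        if (!existingTableDirs.contains(table)) {
            cache.remove(table);
            return null;
        }
        return cache.get(table);
    }
}
```

The existence check costs one filesystem round trip per lookup, which is the usual trade-off when a cache can be invalidated from outside the process.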
[jira] [Commented] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt
[ https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910125#comment-13910125 ] Nicolas Liochon commented on HBASE-10525: - bq. when the connection.interrupt() is invoked the reader thread gets it.. What happens to the writer thread if it was waiting in the callsToWrite.take() call. In #markClosed, we put a Call on the queue (the DEATH_PILL); this way the writer exits the 'take' method. The reader thread calls #markClosed on any exception, interruptions included. Allow the client to use a different thread for writing to ease interrupt Key: HBASE-10525 URL: https://issues.apache.org/jira/browse/HBASE-10525 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10525.v1.patch, 10525.v2.patch, 10525.v3.patch, 10525.v4.patch, 10525.v5.patch, 10525.v6.patch, 10525.v7.patch, HBaseclient-EventualConsistency.pdf This is an issue in the HBASE-10070 context, but also more generally if you want to interrupt an operation with a limited cost. I will attach a doc with a more detailed explanation. This adds a thread per region server, so it's optional. The first patch activates it by default to see how it behaves on a full hadoop-qa run. The target is for it to be unset by default. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
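The DEATH_PILL mechanism described in the comment is the classic poison-pill pattern: a sentinel object put on the queue so a consumer blocked in take() wakes up and exits cleanly. A generic sketch of that pattern (not the HBase RpcClient code; names are illustrative):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PoisonPillDemo {
    private static final Runnable DEATH_PILL = () -> { };
    private final BlockingQueue<Runnable> callsToWrite = new LinkedBlockingQueue<>();

    // Writer loop: blocks in take(); the pill makes it exit without needing
    // an interrupt to reach it.
    Thread startWriter() {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    Runnable call = callsToWrite.take();
                    if (call == DEATH_PILL) return; // markClosed() was called
                    call.run();
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // restore the flag and exit
            }
        });
        writer.start();
        return writer;
    }

    // What the reader does on any exception: unblock the writer via the pill.
    void markClosed() {
        callsToWrite.offer(DEATH_PILL);
    }

    public static void main(String[] args) throws InterruptedException {
        PoisonPillDemo demo = new PoisonPillDemo();
        Thread writer = demo.startWriter();
        demo.markClosed();
        writer.join(5000);
        System.out.println(writer.isAlive()); // the writer has exited
    }
}
```

Identity comparison (`call == DEATH_PILL`) matters here: the pill is recognized as the one sentinel instance, never confused with a real call.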
[jira] [Commented] (HBASE-8304) Bulkload fail to remove files if fs.default.name / fs.defaultFS is configured without default port.
[ https://issues.apache.org/jira/browse/HBASE-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910126#comment-13910126 ] Hadoop QA commented on HBASE-8304: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630630/HBASE-8304.patch against trunk revision . ATTACHMENT ID: 12630630 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 hadoop1.0{color}. The patch failed to compile against the hadoop 1.0 profile. Here is snippet of errors: {code}[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-server: Compilation failure [ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java:[430,70] cannot find symbol [ERROR] symbol : method getNNServiceRpcAddresses(org.apache.hadoop.conf.Configuration) [ERROR] location: class org.apache.hadoop.hdfs.DFSUtil [ERROR] - [Help 1] org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-server: Compilation failure /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java:[430,70] cannot find symbol symbol : method getNNServiceRpcAddresses(org.apache.hadoop.conf.Configuration) location: class org.apache.hadoop.hdfs.DFSUtil at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213) -- Caused by: 
org.apache.maven.plugin.CompilationFailureException: Compilation failure /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java:[430,70] cannot find symbol symbol : method getNNServiceRpcAddresses(org.apache.hadoop.conf.Configuration) location: class org.apache.hadoop.hdfs.DFSUtil at org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:729){code} Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8783//console This message is automatically generated. Bulkload fail to remove files if fs.default.name / fs.defaultFS is configured without default port. --- Key: HBASE-8304 URL: https://issues.apache.org/jira/browse/HBASE-8304 Project: HBase Issue Type: Bug Components: HFile, regionserver Affects Versions: 0.94.5 Reporter: Raymond Liu Labels: bulkloader Attachments: HBASE-8304.patch When fs.default.name or fs.defaultFS in hadoop core-site.xml is configured as hdfs://ip, and hbase.rootdir is configured as hdfs://ip:port/hbaserootdir where port is the hdfs namenode's default port, the bulkload operation will not remove the files in the bulk output dir. Store::bulkLoadHfile will treat hdfs://ip and hdfs://ip:port as different filesystems and go with the copy approach instead of rename. The root cause is that the hbase master rewrites fs.default.name/fs.defaultFS according to hbase.rootdir when the regionserver starts; thus, the dest fs uri from the hregion will not match the src fs uri passed from the client. Any suggestion on the best approach to fix this issue? I kind of think we could check for the default port if the src uri comes without port info. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
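The hadoop 1.0 compile failure above is the usual cross-version problem: DFSUtil.getNNServiceRpcAddresses does not exist in the older Hadoop profile. One common way around such gaps (a sketch of the general technique, not the actual patch) is to look the method up reflectively and fall back when it is absent, so the same jar compiles and runs against both versions:

```java
import java.lang.reflect.Method;

public class ReflectiveCall {
    // Returns the result of a static clazz.methodName(arg) call if the method
    // exists, or null when running against an older library version, letting
    // the caller take a fallback code path.
    static Object invokeIfPresent(Class<?> clazz, String methodName,
                                  Class<?> argType, Object arg) {
        try {
            Method m = clazz.getMethod(methodName, argType);
            return m.invoke(null, arg);
        } catch (NoSuchMethodException e) {
            return null; // method absent in this version
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Demonstrate with a static JDK method that does exist: Integer.valueOf(String).
        Object present = invokeIfPresent(Integer.class, "valueOf", String.class, "42");
        Object absent = invokeIfPresent(Integer.class, "noSuchMethod", String.class, "42");
        System.out.println(present + " " + absent);
    }
}
```

Reflection avoids the compile-time dependency entirely, which is why it is a common answer to "cannot find symbol" failures in one Hadoop profile but not the other.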
[jira] [Commented] (HBASE-10355) Failover RPC's from client using region replicas
[ https://issues.apache.org/jira/browse/HBASE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910141#comment-13910141 ] Nicolas Liochon commented on HBASE-10355: - bq. The following fragment solves the issue for me. Basically we just rethrow InterruptedIOEx. Can you take a look: This would work. We need as well to exclude the SocketTimeoutException. This is done by a utility class. So it would become: {code}
private RegionLocations getRegionLocations(boolean useCache)
    throws RetriesExhaustedException, DoNotRetryIOException, InterruptedIOException {
  RegionLocations rl;
  try {
    rl = cConnection.locateRegion(tableName, get.getRow(), useCache, true);
  } catch (DoNotRetryIOException e) {
    throw e;
  } catch (RetriesExhaustedException e) {
    throw e;
  } catch (IOException e) {
    ExceptionUtil.rethrowIfInterrupt(e);
    throw new RetriesExhaustedException("Can't get the location", e);
  }
  if (rl == null) {
    throw new RetriesExhaustedException("Can't get the locations");
  }
  return rl;
}
{code} Failover RPC's from client using region replicas Key: HBASE-10355 URL: https://issues.apache.org/jira/browse/HBASE-10355 Project: HBase Issue Type: Sub-task Components: Client Reporter: Enis Soztutar Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10355.v1.patch, 10355.v2.patch, 10355.v3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
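The ExceptionUtil helper mentioned in the comment has a subtle job: SocketTimeoutException extends InterruptedIOException in the JDK, yet a read timeout does not mean the thread was interrupted, so it must be excluded before rethrowing. A behavioral sketch of that distinction (assumed semantics, not the actual HBase ExceptionUtil source):

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;

public class ExceptionUtilSketch {
    // Rethrow only when the exception denotes a real thread interruption.
    // SocketTimeoutException is a subclass of InterruptedIOException, but a
    // timed-out read is retriable, so it must be filtered out first.
    static void rethrowIfInterrupt(Throwable t) throws InterruptedIOException {
        if (t instanceof SocketTimeoutException) {
            return; // a plain timeout: let the caller wrap and retry
        }
        if (t instanceof InterruptedIOException) {
            throw (InterruptedIOException) t;
        }
        if (t instanceof InterruptedException) {
            throw (InterruptedIOException) new InterruptedIOException().initCause(t);
        }
    }

    public static void main(String[] args) {
        try {
            rethrowIfInterrupt(new SocketTimeoutException("read timed out"));
            System.out.println("timeout swallowed, retry path taken");
        } catch (IOException e) {
            System.out.println("unexpected rethrow");
        }
    }
}
```

Without the SocketTimeoutException check, every read timeout would abort the retry loop as if the caller had been interrupted.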
[jira] [Updated] (HBASE-10579) [Documentation]: ExportSnapshot tool package incorrectly documented
[ https://issues.apache.org/jira/browse/HBASE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-10579: Assignee: Aleksandr Shulman [Documentation]: ExportSnapshot tool package incorrectly documented --- Key: HBASE-10579 URL: https://issues.apache.org/jira/browse/HBASE-10579 Project: HBase Issue Type: Bug Components: documentation, snapshots Affects Versions: 0.98.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.2, 0.98.1 Attachments: HBASE-10579-v0.patch Documentation Page: http://hbase.apache.org/book/ops.snapshots.html Expected documentation: The class should be specified as org.apache.hadoop.hbase.snapshot.ExportSnapshot Current documentation: Specified as: org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot This makes sense because the class is located in the org.apache.hadoop.hbase.snapshot package: https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#19 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client
[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910145#comment-13910145 ] Nicolas Liochon commented on HBASE-10566: - bq. I suppose it is ok. Maybe rename the class so it is not confused with Callable. Actually, we never use the fact that it's a java Callable. But changing the name can impact a lot of code. I will try (IntelliJ will do the change for me, but it can make the patch much bigger, I don't know). bq. Is TimeLimitedRpcController left as an exercise to the reader I forgot it (usual stuff: not added to git, so not included in git diff). The patch globally compiles but does not set the timeout all the time. bq. Doesn't callTimeout make more sense for this parameter name? Often timeout indicates a duration, while here I used something like a cutoff time. That's what I wanted to express. There is an implication however: the client and the server time must be in sync. Even if it's a common requirement, I'm not sure I'm not going to change my mind. Thanks a lot for the feedback, I'm going to try to write the full patch. cleanup rpcTimeout in the client Key: HBASE-10566 URL: https://issues.apache.org/jira/browse/HBASE-10566 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10566.sample.patch There are two issues: 1) A confusion between the socket timeout and the call timeout. Socket timeouts should be minimal: a default like 20 seconds, which could be lowered to single-digit timeouts for some apps: if we can not write to the socket in 10 seconds, we have an issue. This is different from the total duration (send query + do query + receive query), which can be longer, as it can include remote calls on the server and so on. Today, we have a single value, so it does not allow us to have low socket read timeouts. 2) The timeout can be different between the calls. 
Typically, if the total time, retries included, is 60 seconds but a call failed after 2 seconds, then the remaining budget is 58s. HBase does this today, but by hacking with a thread-local storage variable. It's a hack (it should have been a parameter of the methods; the TLS allowed bypassing all the layers. Maybe protobuf makes this complicated, to be confirmed), but as well it does not really work, because we can have multithreading issues (we use the updated rpc timeout of someone else, or we create a new BlockingRpcChannelImplementation with a random default timeout). Ideally, we could send the call timeout to the server as well: it would be able to dismiss on its own the calls that it received but that got stuck in the request queue or in internal retries (on hdfs for example). This would make the system more reactive to failure. I think we can solve this now, especially after 10525. The main issue is to find something that fits well with protobuf... Then it should be easy to have a pool of threads for writers and readers, w/o a single thread per region server as today. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
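The "cutoff time" idea from the comment (a single absolute deadline rather than per-call durations threaded through a TLS variable) can be sketched in a few lines: each retry recomputes its remaining budget from the fixed deadline, so 60s total minus 2s spent leaves 58s, with no shared mutable state between calls. This is an illustration of the concept, not the HBase implementation.

```java
public class DeadlineDemo {
    // A call deadline: an absolute cutoff instant, in milliseconds.
    static long deadline(long startMillis, long totalTimeoutMillis) {
        return startMillis + totalTimeoutMillis;
    }

    // Remaining budget for the next attempt; <= 0 means the call is exhausted.
    // Per-thread races disappear because the deadline is an immutable value
    // passed along with the call, not a thread-local that anyone can rewrite.
    static long remaining(long deadlineMillis, long nowMillis) {
        return deadlineMillis - nowMillis;
    }

    public static void main(String[] args) {
        long start = 0;                      // pretend clock, for determinism
        long dl = deadline(start, 60_000);   // 60s overall budget, retries included
        long afterFirstTry = start + 2_000;  // first attempt failed after 2s
        System.out.println(remaining(dl, afterFirstTry)); // 58000 ms left
    }
}
```

Shipping the deadline to the server, as the comment suggests, lets the server drop requests that would expire while sitting in its queue, at the cost of requiring loosely synchronized clocks.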
[jira] [Resolved] (HBASE-10579) [Documentation]: ExportSnapshot tool package incorrectly documented
[ https://issues.apache.org/jira/browse/HBASE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi resolved HBASE-10579. - Resolution: Fixed Fix Version/s: 0.99.0 committed, thanks for the patch [Documentation]: ExportSnapshot tool package incorrectly documented --- Key: HBASE-10579 URL: https://issues.apache.org/jira/browse/HBASE-10579 Project: HBase Issue Type: Bug Components: documentation, snapshots Affects Versions: 0.98.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: HBASE-10579-v0.patch Documentation Page: http://hbase.apache.org/book/ops.snapshots.html Expected documentation: The class should be specified as org.apache.hadoop.hbase.snapshot.ExportSnapshot Current documentation: Specified as: org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot This makes sense because the class is located in the org.apache.hadoop.hbase.snapshot package: https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#19 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10598) Written data can not be read out because MemStore#timeRangeTracker might be updated concurrently
cuijianwei created HBASE-10598: -- Summary: Written data can not be read out because MemStore#timeRangeTracker might be updated concurrently Key: HBASE-10598 URL: https://issues.apache.org/jira/browse/HBASE-10598 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.16 Reporter: cuijianwei In our test environment, we found that written data can't be read out occasionally. After debugging, we found that maximumTimestamp/minimumTimestamp of MemStore#timeRangeTracker might decrease/increase when MemStore#timeRangeTracker is updated concurrently, which might make the MemStore/StoreFile be filtered incorrectly when reading data out. Let's see how the concurrent updating of timeRangeTracker#maximumTimestamp causes this problem. Imagine there are two threads T1 and T2 putting two KeyValues kv1 and kv2. kv1 and kv2 belong to the same Store (the same region), but contain different rowkeys. Consequently, kv1 and kv2 could be updated concurrently. Looking at the implementation of HRegionServer#multi, kv1 and kv2 will be added to the MemStore by HRegion#doMiniBatchMutation#applyFamilyMapToMemstore. Then, MemStore#internalAdd will be invoked and MemStore#timeRangeTracker will be updated by TimeRangeTracker#includeTimestamp as follows: {code}
private void includeTimestamp(final long timestamp) {
  ...
  else if (maximumTimestamp < timestamp) {
    maximumTimestamp = timestamp;
  }
  return;
}
{code} Imagine the current maximumTimestamp is t0 before includeTimestamp is invoked, kv1.timestamp=t1, kv2.timestamp=t2, t1 and t2 are both set by the user (so the user knows the timestamps of kv1 and kv2), and t1 > t2. T1 and T2 will be executed concurrently; therefore, the two threads might both find the current maximumTimestamp is less than the timestamp of its kv. After that, T1 and T2 will both set maximumTimestamp to the timestamp of its kv. If T1 sets maximumTimestamp before T2 does, maximumTimestamp will end up set to t2. 
Then, before any new update with a bigger timestamp has been applied to the MemStore, if we try to read out kv1 by HTable#get and set the timestamp of the 'Get' to t1, the StoreScanner will decide whether the MemStoreScanner (imagining kv1 has not been flushed) should be selected as a candidate scanner by the method MemStoreScanner#shouldUseScanner. The MemStore won't be selected because maximumTimestamp of the MemStore has been set to t2 (t2 < t1). Consequently, the written kv1 can't be read out and kv1 is lost from the user's perspective. If the analysis above is right, after maximumTimestamp of MemStore#timeRangeTracker has been set to t2, the user will experience data loss in the following situations: 1. Before any new write with kv.timestamp >= t1 has been added to the MemStore, a read request of kv1 with timestamp=t1 can not read kv1 out. 2. Before any new put with kv.timestamp >= t1 has been added to the MemStore, if a flush happens, the data of the MemStore will be flushed to a StoreFile with StoreFile#maximumTimestamp set to t2. After that, any read request with timestamp > t2 can not read kv1 before the next compaction (the content of the StoreFile won't change, and kv1.timestamp might also not be included even after compaction). The second situation is much more serious because the incorrect timeRange of the MemStore has been persisted to the file. Similarly, the concurrent update of TimeRangeTracker#minimumTimestamp may also cause this problem. As a simple way to fix the problem, we could add synchronized to TimeRangeTracker#includeTimestamp so that this method won't be invoked concurrently. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
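The race above and the proposed synchronized fix can be sketched as a monotonic tracker: making the check-then-set atomic guarantees the maximum can only grow, so a late writer with a smaller timestamp can no longer overwrite a larger one. This is a sketch of the proposed fix, not the actual HBase TimeRangeTracker class.

```java
public class TimeRangeTrackerSketch {
    private long minimumTimestamp = Long.MAX_VALUE;
    private long maximumTimestamp = Long.MIN_VALUE;

    // synchronized makes the compare-and-assign atomic, so concurrent callers
    // cannot move maximumTimestamp backwards (or minimumTimestamp forwards),
    // which is exactly the regression described in the report.
    synchronized void includeTimestamp(long timestamp) {
        if (timestamp < minimumTimestamp) {
            minimumTimestamp = timestamp;
        }
        if (timestamp > maximumTimestamp) {
            maximumTimestamp = timestamp;
        }
    }

    synchronized long getMax() { return maximumTimestamp; }
    synchronized long getMin() { return minimumTimestamp; }

    public static void main(String[] args) throws InterruptedException {
        TimeRangeTrackerSketch tracker = new TimeRangeTrackerSketch();
        Thread t1 = new Thread(() -> tracker.includeTimestamp(200)); // kv1 at t1
        Thread t2 = new Thread(() -> tracker.includeTimestamp(100)); // kv2 at t2 < t1
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(tracker.getMax()); // stays at 200 regardless of scheduling
    }
}
```

An AtomicLong compare-and-set loop would achieve the same monotonicity without a lock, but synchronized matches the simple fix the report proposes.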
[jira] [Updated] (HBASE-10598) Written data can not be read out because MemStore#timeRangeTracker might be updated concurrently
[ https://issues.apache.org/jira/browse/HBASE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cuijianwei updated HBASE-10598: --- Description: In our test environment, we find written data can't be read out occasionally. After debugging, we found that maximumTimestamp/minimumTimestamp of MemStore#timeRangeTracker might decrease/increase when MemStore#timeRangeTracker is updated concurrently, which might make the MemStore/StoreFile be filtered incorrectly when reading data out. Let's see how the concurrent updating of timeRangeTracker#maximumTimestamp causes this problem. Imagine there are two threads T1 and T2 putting two KeyValues kv1 and kv2. kv1 and kv2 belong to the same Store (so belong to the same region), but contain different rowkeys. Consequently, kv1 and kv2 could be updated concurrently. Looking at the implementation of HRegionServer#multi, kv1 and kv2 will be added to the MemStore by HRegion#doMiniBatchMutation#applyFamilyMapToMemstore. Then, MemStore#internalAdd will be invoked and MemStore#timeRangeTracker will be updated by TimeRangeTracker#includeTimestamp as follows: {code}
private void includeTimestamp(final long timestamp) {
  ...
  else if (maximumTimestamp < timestamp) {
    maximumTimestamp = timestamp;
  }
  return;
}
{code} Imagine the current maximumTimestamp of TimeRangeTracker is t0 before includeTimestamp is invoked, kv1.timestamp=t1, kv2.timestamp=t2, t1 and t2 are both set by the user (so the user knows the timestamps of kv1 and kv2), and t1 > t2 > t0. T1 and T2 will be executed concurrently; therefore, the two threads might both find the current maximumTimestamp is less than the timestamp of its kv. After that, T1 and T2 will both set maximumTimestamp to the timestamp of its kv. If T1 sets maximumTimestamp before T2 does, maximumTimestamp will end up set to t2. 
Then, before any new update with a bigger timestamp has been applied to the MemStore, if we try to read out kv1 by HTable#get and set the timestamp of the 'Get' to t1, the StoreScanner will decide whether the MemStoreScanner (imagining kv1 has not been flushed) should be selected as a candidate scanner by MemStoreScanner#shouldUseScanner. Then, the MemStore won't be selected because maximumTimestamp of the MemStore has been set to t2 (t2 < t1). Consequently, the written kv1 can't be read out and kv1 is lost from the user's perspective. If the analysis above is right, after maximumTimestamp of MemStore#timeRangeTracker has been set to t2, the user will experience data loss in the following situations: 1. Before any new write with kv.timestamp >= t1 has been added to the MemStore, a read request of kv1 with timestamp=t1 can not read kv1 out. 2. Before any new put with kv.timestamp >= t1 has been added to the MemStore, if a flush happens, the data of the MemStore will be flushed to a StoreFile with StoreFile#maximumTimestamp set to t2. After that, any read request with timestamp > t2 can not read kv1 before the next compaction (the content of the StoreFile won't change, and kv1.timestamp might also not be included even after compaction). The second situation is much more serious because the incorrect timeRange of the MemStore has been persisted to the file. Similarly, the concurrent update of TimeRangeTracker#minimumTimestamp may also cause this problem. As a simple way to fix the problem, we could add synchronized to TimeRangeTracker#includeTimestamp so that this method won't be invoked concurrently.
[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt
[ https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10525: Resolution: Fixed Release Note: If hbase.ipc.client.allowsInterrupt is set to true (default being false), the writes are performed in a different thread. This works around a Java limitation with interruptions and I/O, and limits the impact of interrupting a client call. It's strongly recommended to activate this parameter when using tables with multiple replicas. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk, thanks for the reviews, all. (Devaraj, I understood your comment as 'ok if'; I can obviously revert / amend if you want more time for the review.) Same goes for everyone who wants to chime in: this code is obviously critical and complex. Allow the client to use a different thread for writing to ease interrupt Key: HBASE-10525 URL: https://issues.apache.org/jira/browse/HBASE-10525 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10525.v1.patch, 10525.v2.patch, 10525.v3.patch, 10525.v4.patch, 10525.v5.patch, 10525.v6.patch, 10525.v7.patch, HBaseclient-EventualConsistency.pdf This is an issue in the HBASE-10070 context, but also more generally if you want to interrupt an operation with a limited cost. I will attach a doc with a more detailed explanation. This adds a thread per region server, so it's optional. The first patch activates it by default to see how it behaves on a full hadoop-qa run. The target is to be unset by default. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
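The parameter named in the release note above is set in hbase-site.xml; a minimal fragment as a sketch (the property name is taken verbatim from the note, and the value shown enables the non-default behavior recommended for tables with multiple replicas):

```xml
<configuration>
  <property>
    <!-- Off by default; when true, client writes are performed in a
         separate thread so interrupts don't hit the socket I/O directly. -->
    <name>hbase.ipc.client.allowsInterrupt</name>
    <value>true</value>
  </property>
</configuration>
```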
[jira] [Commented] (HBASE-10451) Enable back Tag compression on HFiles
[ https://issues.apache.org/jira/browse/HBASE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910166#comment-13910166 ] Hadoop QA commented on HBASE-10451: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630629/HBASE-10451_V6.patch against trunk revision . ATTACHMENT ID: 12630629 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8782//console This message is automatically generated. 
Enable back Tag compression on HFiles - Key: HBASE-10451 URL: https://issues.apache.org/jira/browse/HBASE-10451 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Critical Fix For: 0.98.1, 0.99.0 Attachments: HBASE-10451.patch, HBASE-10451_V2.patch, HBASE-10451_V3.patch, HBASE-10451_V4.patch, HBASE-10451_V5.patch, HBASE-10451_V6.patch HBASE-10443 disables tag compression on HFiles. This Jira is to fix the issues we have found out in HBASE-10443 and enable it back. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10598) Written data can not be read out because MemStore#timeRangeTracker might be updated concurrently
[ https://issues.apache.org/jira/browse/HBASE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cuijianwei updated HBASE-10598: --- Attachment: HBASE-10598-0.94.v1.patch Written data can not be read out because MemStore#timeRangeTracker might be updated concurrently Key: HBASE-10598 URL: https://issues.apache.org/jira/browse/HBASE-10598 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.16 Reporter: cuijianwei Attachments: HBASE-10598-0.94.v1.patch In our test environment, we find written data can't be read out occasionally. After debugging, we find that maximumTimestamp/minimumTimestamp of MemStore#timeRangeTracker might decrease/increase when MemStore#timeRangeTracker is updated concurrently, which might make the MemStore/StoreFile to be filtered incorrectly when reading data out. Let's see how the concurrent updating of timeRangeTracker#maximumTimestamp cause this problem. Imagining there are two threads T1 and T2 putting two KeyValues kv1 and kv2. kv1 and kv2 belong to the same Store(so belong to the same region), but contain different rowkeys. Consequently, kv1 and kv2 could be updated concurrently. When we see the implementation of HRegionServer#multi, kv1 and kv2 will be add to MemStore by HRegion#applyFamilyMapToMemstore in HRegion#doMiniBatchMutation. Then, MemStore#internalAdd will be invoked and MemStore#timeRangeTracker will be updated by TimeRangeTracker#includeTimestamp as follows: {code} private void includeTimestamp(final long timestamp) { ... else if (maximumTimestamp timestamp) { maximumTimestamp = timestamp; } return; } {code} Imagining the current maximumTimestamp of TimeRangeTracker is t0 before includeTimestamp(...) invoked, kv1.timestamp=t1, kv2.timestamp=t2, t1 and t2 are both set by user(then, user knows the timestamps of kv1 and kv2), and t1 t2 t0. 
T1 and T2 will be executed concurrently, so the two threads might both find the current maximumTimestamp is less than the timestamp of their kv. After that, T1 and T2 will both set maximumTimestamp to the timestamp of their kv. If T1 sets maximumTimestamp before T2 does, maximumTimestamp will end up set to t2. Then, before any new update with a bigger timestamp has been applied to the MemStore, if we try to read out kv1 by HTable#get with the timestamp of the 'Get' set to t1, the StoreScanner will decide whether the MemStoreScanner (imagining kv1 has not been flushed) should be selected as a candidate scanner by MemStoreScanner#shouldUseScanner. The MemStore won't be selected in MemStoreScanner#shouldUseScanner because maximumTimestamp of the MemStore has been set to t2 (t2 < t1). Consequently, the written kv1 can't be read out and kv1 is lost from the user's perspective. If the above analysis is right, after maximumTimestamp of MemStore#timeRangeTracker has been set to t2, the user will experience data loss in the following situations: 1. Before any new write with kv.timestamp > t1 has been added to the MemStore, a read request of kv1 with timestamp=t1 can not read kv1 out. 2. Before any new write with kv.timestamp > t1 has been added to the MemStore, if a flush happens, the data of the MemStore will be flushed to a StoreFile with StoreFile#maximumTimestamp set to t2. After that, any read request with timestamp=t1 can not read kv1 before the next compaction (actually, kv1.timestamp might not be included in the timeRange of the StoreFile even after compaction). The second situation is much more serious because the incorrect timeRange of the MemStore has been persisted to the file. Similarly, the concurrent update of TimeRangeTracker#minimumTimestamp may also cause this problem. As a simple way to fix the problem, we could add synchronized to TimeRangeTracker#includeTimestamp so that this method won't be invoked concurrently. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
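The synchronized fix proposed above can be sketched as follows. This is a simplified illustrative stand-in, not the actual HBase TimeRangeTracker class; the point is that synchronizing includeTimestamp makes the compare-and-assign atomic, so a thread with a smaller timestamp can no longer overwrite a larger one.

```java
// Simplified sketch of making includeTimestamp synchronized so concurrent
// writers cannot interleave the check-then-set (illustrative, not HBase code).
public class TimeRangeTrackerSketch {
    private long minimumTimestamp = Long.MAX_VALUE;
    private long maximumTimestamp = Long.MIN_VALUE;

    // synchronized ensures the comparison and the assignment happen atomically
    public synchronized void includeTimestamp(final long timestamp) {
        if (timestamp < minimumTimestamp) {
            minimumTimestamp = timestamp;
        }
        if (timestamp > maximumTimestamp) {
            maximumTimestamp = timestamp;
        }
    }

    public synchronized long getMin() { return minimumTimestamp; }
    public synchronized long getMax() { return maximumTimestamp; }

    public static void main(String[] args) throws InterruptedException {
        final TimeRangeTrackerSketch tracker = new TimeRangeTrackerSketch();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            final int base = i * 1000;
            threads[i] = new Thread(() -> {
                for (int t = 0; t < 1000; t++) {
                    tracker.includeTimestamp(base + t);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // With the synchronized update, the tracked range covers all writes
        System.out.println(tracker.getMin() + " " + tracker.getMax());
    }
}
```

Without the synchronized keyword, the race described in the report lets the final maximum land on the smaller of two concurrently included timestamps.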
[jira] [Commented] (HBASE-10575) ReplicationSource thread can't be terminated if it runs into the loop to contact peer's zk ensemble and fails continuously
[ https://issues.apache.org/jira/browse/HBASE-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910170#comment-13910170 ] Feng Honghua commented on HBASE-10575: -- [~lhofhansl], thanks for the review! :-) Can it be committed, or any further feedback? Thanks ReplicationSource thread can't be terminated if it runs into the loop to contact peer's zk ensemble and fails continuously -- Key: HBASE-10575 URL: https://issues.apache.org/jira/browse/HBASE-10575 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.1, 0.99.0, 0.94.17 Reporter: Feng Honghua Assignee: Feng Honghua Priority: Critical Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.18 Attachments: HBASE-10575-trunk_v1.patch When ReplicationSource thread runs into the loop to contact peer's zk ensemble, it doesn't check isActive() before each retry, so if the given peer's zk ensemble is not reachable due to some reason, this ReplicationSource thread just can't be terminated by outside such as removePeer etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
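The fix direction described in HBASE-10575 can be sketched as a retry loop that re-checks isActive() before every attempt, so the thread can be stopped externally (e.g. by removePeer) even while the peer's zk ensemble stays unreachable. The names below (isActive, contactPeer) mirror the discussion but this is an illustrative stand-in, not the actual ReplicationSource code.

```java
// Sketch: a retry loop that honors an external termination flag on every
// iteration, instead of spinning until the peer becomes reachable.
public class RetryLoopSketch {
    private volatile boolean active = true;
    private int attempts = 0;

    boolean isActive() { return active; }

    boolean contactPeer() {
        attempts++;
        return false; // simulate an unreachable peer zk ensemble
    }

    void run() {
        // without the isActive() check this loop could never be terminated
        while (isActive()) {
            if (contactPeer()) {
                break;
            }
            if (attempts >= 3) {
                active = false; // simulate termination from outside, e.g. removePeer
            }
        }
    }

    public static void main(String[] args) {
        RetryLoopSketch source = new RetryLoopSketch();
        source.run();
        System.out.println("terminated after " + source.attempts + " attempts");
    }
}
```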
[jira] [Commented] (HBASE-10594) Speed up TestRestoreSnapshotFromClient a bit
[ https://issues.apache.org/jira/browse/HBASE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910185#comment-13910185 ] Hudson commented on HBASE-10594: ABORTED: Integrated in hbase-0.96 #309 (See [https://builds.apache.org/job/hbase-0.96/309/]) HBASE-10594 Speed up TestRestoreSnapshotFromClient a bit. (larsh: rev 1571143) * /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestRestoreSnapshotFromClient.java Speed up TestRestoreSnapshotFromClient a bit Key: HBASE-10594 URL: https://issues.apache.org/jira/browse/HBASE-10594 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.18 Attachments: 10594-0.94.txt, 10594-trunk.txt Looking through the longest running test in 0.94 I noticed that TestRestoreSnapshotFromClient runs for over 10 minutes on the jenkins boxes (264s on my local box). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10579) [Documentation]: ExportSnapshot tool package incorrectly documented
[ https://issues.apache.org/jira/browse/HBASE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910186#comment-13910186 ] Hudson commented on HBASE-10579: ABORTED: Integrated in HBase-0.98 #179 (See [https://builds.apache.org/job/HBase-0.98/179/]) HBASE-10579 ExportSnapshot tool package incorrectly documented (Aleksandr Shulman) (mbertozzi: rev 1571201) * /hbase/branches/0.98/src/main/docbkx/ops_mgt.xml [Documentation]: ExportSnapshot tool package incorrectly documented --- Key: HBASE-10579 URL: https://issues.apache.org/jira/browse/HBASE-10579 Project: HBase Issue Type: Bug Components: documentation, snapshots Affects Versions: 0.98.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: HBASE-10579-v0.patch Documentation Page: http://hbase.apache.org/book/ops.snapshots.html Expected documentation: The class should be specified as org.apache.hadoop.hbase.snapshot.ExportSnapshot Current documentation: Specified as: org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot This makes sense because the class is located in the org.apache.hadoop.hbase.snapshot package: https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#19 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10594) Speed up TestRestoreSnapshotFromClient a bit
[ https://issues.apache.org/jira/browse/HBASE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910189#comment-13910189 ] Hudson commented on HBASE-10594: ABORTED: Integrated in HBase-0.98 #179 (See [https://builds.apache.org/job/HBase-0.98/179/]) HBASE-10594 Speed up TestRestoreSnapshotFromClient a bit. (larsh: rev 1571142) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestRestoreSnapshotFromClient.java Speed up TestRestoreSnapshotFromClient a bit Key: HBASE-10594 URL: https://issues.apache.org/jira/browse/HBASE-10594 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.18 Attachments: 10594-0.94.txt, 10594-trunk.txt Looking through the longest running test in 0.94 I noticed that TestRestoreSnapshotFromClient runs for over 10 minutes on the jenkins boxes (264s on my local box). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt
[ https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910208#comment-13910208 ] Hudson commented on HBASE-10525: FAILURE: Integrated in HBase-TRUNK #4947 (See [https://builds.apache.org/job/HBase-TRUNK/4947/]) HBASE-10525 Allow the client to use a different thread for writing to ease interrupt (nkeywal: rev 1571210) * /hbase/trunk/dev-support/findbugs-exclude.xml * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java Allow the client to use a different thread for writing to ease interrupt Key: HBASE-10525 URL: https://issues.apache.org/jira/browse/HBASE-10525 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10525.v1.patch, 10525.v2.patch, 10525.v3.patch, 10525.v4.patch, 10525.v5.patch, 10525.v6.patch, 10525.v7.patch, HBaseclient-EventualConsistency.pdf This is an issue in the HBASE-10070 context, but as well more generally if you want to interrupt an operation with a limited cost. I will attach a doc with a more detailed explanation. This adds a thread per region server; so it's otional. The first patch activates it by default to see how it behaves on a full hadoop-qa run. The target is to be unset by default. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10579) [Documentation]: ExportSnapshot tool package incorrectly documented
[ https://issues.apache.org/jira/browse/HBASE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910206#comment-13910206 ] Hudson commented on HBASE-10579: FAILURE: Integrated in HBase-TRUNK #4947 (See [https://builds.apache.org/job/HBase-TRUNK/4947/]) HBASE-10579 ExportSnapshot tool package incorrectly documented (Aleksandr Shulman) (mbertozzi: rev 1571200) * /hbase/trunk/src/main/docbkx/ops_mgt.xml [Documentation]: ExportSnapshot tool package incorrectly documented --- Key: HBASE-10579 URL: https://issues.apache.org/jira/browse/HBASE-10579 Project: HBase Issue Type: Bug Components: documentation, snapshots Affects Versions: 0.98.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: HBASE-10579-v0.patch Documentation Page: http://hbase.apache.org/book/ops.snapshots.html Expected documentation: The class should be specified as org.apache.hadoop.hbase.snapshot.ExportSnapshot Current documentation: Specified as: org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot This makes sense because the class is located in the org.apache.hadoop.hbase.snapshot package: https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#19 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10594) Speed up TestRestoreSnapshotFromClient a bit
[ https://issues.apache.org/jira/browse/HBASE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910207#comment-13910207 ] Hudson commented on HBASE-10594: FAILURE: Integrated in HBase-TRUNK #4947 (See [https://builds.apache.org/job/HBase-TRUNK/4947/]) HBASE-10594 Speed up TestRestoreSnapshotFromClient a bit. (larsh: rev 1571141) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestRestoreSnapshotFromClient.java Speed up TestRestoreSnapshotFromClient a bit Key: HBASE-10594 URL: https://issues.apache.org/jira/browse/HBASE-10594 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.18 Attachments: 10594-0.94.txt, 10594-trunk.txt Looking through the longest running test in 0.94 I noticed that TestRestoreSnapshotFromClient runs for over 10 minutes on the jenkins boxes (264s on my local box). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10499) In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException
[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910245#comment-13910245 ] ramkrishna.s.vasudevan commented on HBASE-10499: [~fenghh] The number of flushers should be the default value. I did not change that. Sorry for the late reply. bq. But if you want to raise a JIRA to replace System.currentMillis with EnvironmentEdge.currentMillis Yes, better to change. I am not saying this JIRA is because of this, but just wanted to ensure we change it. Will raise one. In write heavy scenario one of the regions does not get flushed causing RegionTooBusyException -- Key: HBASE-10499 URL: https://issues.apache.org/jira/browse/HBASE-10499 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.98.1, 0.99.0 Attachments: HBASE-10499.patch, hbase-root-master-ip-10-157-0-229.zip, hbase-root-regionserver-ip-10-93-128-92.zip, master_4e39.log, master_576f.log, rs_4e39.log, rs_576f.log, t1.dump, t2.dump, workloada_0.98.dat I got this while testing the 0.98RC. But am not sure if it is specific to this version; doesn't seem so to me. Also it is something similar to HBASE-5312 and HBASE-5568. Using 10 threads I do writes to 4 RS using YCSB. The table created has 200 regions. In one of the runs with a 0.98 server and 0.98 client I faced this problem: the hlogs grew and the system requested flushes for those many regions. One by one everything was flushed except one, and that one remained unflushed. 
The ripple effect of this on the client side {code} com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245) at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73) at com.yahoo.ycsb.ClientThread.run(Client.java:307) Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187) at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171) at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897) at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225) at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232) ... 2 more {code} On one of the RS {code} 2014-02-11 08:45:58,714 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, acc43e4b42c1a041078774f4f20a3ff5 .. 
2014-02-11 08:47:49,580 INFO [regionserver60020.logRoller] wal.FSHLog: Too many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39 {code} {code} 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a delay of 16689 2014-02-11 09:42:44,237 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a delay of 15868 2014-02-11 09:42:54,238 INFO [regionserver60020.periodicFlusher] regionserver.HRegionServer: regionserver60020.periodicFlusher requesting flush for region
[jira] [Created] (HBASE-10599) Replace System.currentMillis() with EnvironmentEdge.currentTimeMillis in memstore flusher and related places
ramkrishna.s.vasudevan created HBASE-10599: -- Summary: Replace System.currentMillis() with EnvironmentEdge.currentTimeMillis in memstore flusher and related places Key: HBASE-10599 URL: https://issues.apache.org/jira/browse/HBASE-10599 Project: HBase Issue Type: Improvement Affects Versions: 0.96.1.1, 0.98.0, 0.99.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.2, 0.98.1, 0.99.0 Memstoreflusher still uses System.currentMillis. Better to replace it with EnvironmentEdge.currentMillis(). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10599) Replace System.currentMillis() with EnvironmentEdge.currentTimeMillis in memstore flusher and related places
[ https://issues.apache.org/jira/browse/HBASE-10599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910259#comment-13910259 ] Jean-Marc Spaggiari commented on HBASE-10599: - Hi Ramkrishna, For my knowledge, why should EnvironmentEdge.currentMillis() be preferred to System.currentMillis()? Replace System.currentMillis() with EnvironmentEdge.currentTimeMillis in memstore flusher and related places Key: HBASE-10599 URL: https://issues.apache.org/jira/browse/HBASE-10599 Project: HBase Issue Type: Improvement Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.2, 0.98.1, 0.99.0 Memstoreflusher still uses System.currentMillis. Better to replace it with EnvironmentEdge.currentMillis(). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
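The usual answer to the question above is testability: an injectable clock lets unit tests substitute a controlled time source instead of depending on the real wall clock. The sketch below mimics the idea behind HBase's EnvironmentEdge/EnvironmentEdgeManager; the class and method names are simplified stand-ins, not the actual HBase API.

```java
// Sketch: an injectable clock abstraction. Tests use a manually advanced
// clock, so time-dependent logic (flush delays, TTLs) becomes deterministic.
public class ClockSketch {
    interface Clock {
        long currentTimeMillis();
    }

    // Production implementation: delegate to the real wall clock
    static class SystemClock implements Clock {
        public long currentTimeMillis() { return System.currentTimeMillis(); }
    }

    // Test implementation: time only moves when the test says so
    static class ManualClock implements Clock {
        private long now;
        ManualClock(long start) { this.now = start; }
        public long currentTimeMillis() { return now; }
        void advance(long millis) { now += millis; }
    }

    // Example time-dependent logic: has a flush delay elapsed?
    static boolean delayElapsed(Clock clock, long startMillis, long delayMillis) {
        return clock.currentTimeMillis() - startMillis >= delayMillis;
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock(0);
        System.out.println(delayElapsed(clock, 0, 5000)); // no time has passed yet
        clock.advance(5000);
        System.out.println(delayElapsed(clock, 0, 5000)); // delay has now elapsed
    }
}
```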
[jira] [Updated] (HBASE-10595) HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even after the table dir in hdfs is removed
[ https://issues.apache.org/jira/browse/HBASE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-10595: - Attachment: HBASE-10595-trunk_v3.patch HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even after the table dir in hdfs is removed -- Key: HBASE-10595 URL: https://issues.apache.org/jira/browse/HBASE-10595 Project: HBase Issue Type: Bug Components: master, util Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10595-trunk_v1.patch, HBASE-10595-trunk_v2.patch, HBASE-10595-trunk_v3.patch When a table dir (in hdfs) is removed (by outside), HMaster will still return the cached TableDescriptor to the client for a getTableDescriptor request. On the contrary, HBaseAdmin.listTables() is handled correctly in the current implementation: for a table whose table dir in hdfs is removed by outside, getTableDescriptor can still retrieve a valid (old) table descriptor, while listTables says it doesn't exist; this is inconsistent. The reason for this bug is that HMaster (via FSTableDescriptors) doesn't check whether the table dir exists for a getTableDescriptor() request (while it lists all existing table dirs, not firstly respecting the cache, and returns accordingly for a listTables() request). When a table is deleted via deleteTable, the cache will be cleared after the table dir and tableInfo file are removed, so the listTables/getTableDescriptor inconsistency should be transient (though it still exists when the table dir is removed while the cache is not cleared) and harder to expose -- This message was sent by Atlassian JIRA (v6.1.5#6160)
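The fix direction described above can be sketched as: before serving a cached TableDescriptor, verify that the table's directory still exists, so a dir removed from under the master no longer yields a stale descriptor. The types below are simplified stand-ins for FSTableDescriptors and the HDFS filesystem, not the actual HBase code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: a descriptor cache that consults the "filesystem" (here a set of
// existing table dirs) before trusting a cached entry.
public class DescriptorCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Set<String> existingTableDirs = new HashSet<>();

    void createTable(String name, String descriptor) {
        existingTableDirs.add(name);
        cache.put(name, descriptor);
    }

    // The fix: check dir existence first; evict and return null if it is gone
    String getTableDescriptor(String name) {
        if (!existingTableDirs.contains(name)) {
            cache.remove(name); // stale entry: table dir was removed externally
            return null;
        }
        return cache.get(name);
    }

    public static void main(String[] args) {
        DescriptorCacheSketch fsTableDescriptors = new DescriptorCacheSketch();
        fsTableDescriptors.createTable("t1", "descriptor-of-t1");
        System.out.println(fsTableDescriptors.getTableDescriptor("t1"));
        // simulate the table dir being removed from under the master
        fsTableDescriptors.existingTableDirs.remove("t1");
        System.out.println(fsTableDescriptors.getTableDescriptor("t1"));
    }
}
```

This keeps getTableDescriptor consistent with listTables, which already lists the existing table dirs rather than trusting the cache first.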
[jira] [Commented] (HBASE-8803) region_mover.rb should move multiple regions at a time
[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910276#comment-13910276 ] Jean-Marc Spaggiari commented on HBASE-8803: Thanks. But you (or any other committer) will have to do it, since I can not (yet). I have been able to apply the current patch without any modification in 0.94. I can rebase if required. region_mover.rb should move multiple regions at a time -- Key: HBASE-8803 URL: https://issues.apache.org/jira/browse/HBASE-8803 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 0.98.0, 0.94.8, 0.95.1 Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Fix For: 0.99.0 Attachments: 8803v5.txt, HBASE-8803-v0-trunk.patch, HBASE-8803-v1-0.94.patch, HBASE-8803-v1-trunk.patch, HBASE-8803-v2-0.94.patch, HBASE-8803-v2-0.94.patch, HBASE-8803-v3-0.94.patch, HBASE-8803-v4-0.94.patch, HBASE-8803-v4-trunk.patch, HBASE-8803-v5-0.94.patch, HBASE-8803-v6-0.94.patch, HBASE-8803-v6-trunk.patch Original Estimate: 48h Remaining Estimate: 48h When there are many regions in a cluster, rolling_restart can take hours because region_mover is moving the regions one by one. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10594) Speed up TestRestoreSnapshotFromClient a bit
[ https://issues.apache.org/jira/browse/HBASE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910288#comment-13910288 ] Hudson commented on HBASE-10594: SUCCESS: Integrated in hbase-0.96-hadoop2 #213 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/213/]) HBASE-10594 Speed up TestRestoreSnapshotFromClient a bit. (larsh: rev 1571143) * /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestRestoreSnapshotFromClient.java Speed up TestRestoreSnapshotFromClient a bit Key: HBASE-10594 URL: https://issues.apache.org/jira/browse/HBASE-10594 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.18 Attachments: 10594-0.94.txt, 10594-trunk.txt Looking through the longest running test in 0.94 I noticed that TestRestoreSnapshotFromClient runs for over 10 minutes on the jenkins boxes (264s on my local box). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10579) [Documentation]: ExportSnapshot tool package incorrectly documented
[ https://issues.apache.org/jira/browse/HBASE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910287#comment-13910287 ] Hudson commented on HBASE-10579: SUCCESS: Integrated in hbase-0.96-hadoop2 #213 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/213/]) HBASE-10579 ExportSnapshot tool package incorrectly documented (Aleksandr Shulman) (mbertozzi: rev 1571202) * /hbase/branches/0.96/src/main/docbkx/ops_mgt.xml [Documentation]: ExportSnapshot tool package incorrectly documented --- Key: HBASE-10579 URL: https://issues.apache.org/jira/browse/HBASE-10579 Project: HBase Issue Type: Bug Components: documentation, snapshots Affects Versions: 0.98.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: HBASE-10579-v0.patch Documentation Page: http://hbase.apache.org/book/ops.snapshots.html Expected documentation: The class should be specified as org.apache.hadoop.hbase.snapshot.ExportSnapshot Current documentation: Specified as: org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot This makes sense because the class is located in the org.apache.hadoop.hbase.snapshot package: https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#19 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10579) [Documentation]: ExportSnapshot tool package incorrectly documented
[ https://issues.apache.org/jira/browse/HBASE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910290#comment-13910290 ] Hudson commented on HBASE-10579: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #167 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/167/]) HBASE-10579 ExportSnapshot tool package incorrectly documented (Aleksandr Shulman) (mbertozzi: rev 1571201) * /hbase/branches/0.98/src/main/docbkx/ops_mgt.xml [Documentation]: ExportSnapshot tool package incorrectly documented --- Key: HBASE-10579 URL: https://issues.apache.org/jira/browse/HBASE-10579 Project: HBase Issue Type: Bug Components: documentation, snapshots Affects Versions: 0.98.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: HBASE-10579-v0.patch Documentation Page: http://hbase.apache.org/book/ops.snapshots.html Expected documentation: The class should be specified as org.apache.hadoop.hbase.snapshot.ExportSnapshot Current documentation: Specified as: org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot This makes sense because the class is located in the org.apache.hadoop.hbase.snapshot package: https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#19 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10591) Sanity check table configuration in createTable
[ https://issues.apache.org/jira/browse/HBASE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910295#comment-13910295 ] Jean-Marc Spaggiari commented on HBASE-10591: - Can we add a force parameter in case anyone really wants to have a value outside of those values, and knows what he is doing? As an example, I have a table with MAX_FILESIZE = '1638400' (less than 2MB). This table handles VERY small keys/values. Value is 1 byte. Keys are less than 32 bytes. But I want this table to be spread over all my servers, so I have to put a small MAX_FILESIZE value. With the current patch, I will not be able to do that anymore, which is bad for me. So I would really prefer to have a force option. Yes, I can use hbase.hregion.max.filesize.limit and set it to 1MB, but since I think this is still a good idea, I want to have this check for my other tables :) My 2¢. Sanity check table configuration in createTable --- Key: HBASE-10591 URL: https://issues.apache.org/jira/browse/HBASE-10591 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.99.0 Attachments: hbase-10591_v1.patch, hbase-10591_v2.patch We had a cluster become completely unoperational because a couple of tables were erroneously created with MAX_FILESIZE set to 4K, which resulted in 180K regions in a short interval, bringing the master down due to HBASE-4246. We can do some sanity checking in master.createTable() and reject the requests. We already check the compression there, so it seems a good place. Alter table should also check for this as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
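The force/override idea requested above can be sketched as a sanity check that rejects out-of-range table settings unless the request explicitly opts out. The names and the 2 MB threshold here are illustrative assumptions for the sketch, not the actual HBase check or configuration keys.

```java
// Sketch: a createTable sanity check with an explicit override flag, so
// operators who know what they are doing can still use unusual settings.
public class SanityCheckSketch {
    static final long MIN_REASONABLE_MAX_FILESIZE = 2L * 1024 * 1024; // 2 MB, illustrative

    static void checkMaxFileSize(long maxFileSize, boolean skipSanityCheck) {
        if (skipSanityCheck) {
            return; // operator explicitly opted out of the check
        }
        if (maxFileSize < MIN_REASONABLE_MAX_FILESIZE) {
            throw new IllegalArgumentException(
                "MAX_FILESIZE " + maxFileSize + " is below " + MIN_REASONABLE_MAX_FILESIZE);
        }
    }

    public static void main(String[] args) {
        boolean rejected = false;
        try {
            checkMaxFileSize(1638400, false); // ~1.6 MB, rejected by default
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        checkMaxFileSize(1638400, true); // allowed when forced
        System.out.println("rejected=" + rejected);
    }
}
```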
[jira] [Resolved] (HBASE-10574) IllegalArgumentException Hadoop Hbase
[ https://issues.apache.org/jira/browse/HBASE-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Marc Spaggiari resolved HBASE-10574. - Resolution: Not A Problem Closing as per request. IllegalArgumentException Hadoop Hbase - Key: HBASE-10574 URL: https://issues.apache.org/jira/browse/HBASE-10574 Project: HBase Issue Type: Test Components: hadoop2 Affects Versions: 0.96.0 Environment: Windows Reporter: SSR Priority: Critical Original Estimate: 96h Remaining Estimate: 96h Hi All, We are trying to load the data to HBase We are able to connect Hbase from Eclipse. We are following the tutorial at: http://courses.coreservlets.com/Course-Materials/pdf/hadoop/04-MapRed-4-InputAndOutput.pdf When we run the program we are getting the below exception. 2014-02-20 10:28:04,099 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(439)) - Cleaning up the staging area file:/tmp/hadoop-yarakanaboinas/mapred/staging/yarakanaboinas1524547448/.staging/job_local1524547448_0001 Exception in thread main java.lang.IllegalArgumentException: Pathname /C:/hdp/hbase-0.96.0.2.0.6.0-0009-hadoop2/lib/hbase-client-0.96.0.2.0.6.0-0009-hadoop2.jar from hdfs://HBADGX7900016:8020/C:/hdp/hbase-0.96.0.2.0.6.0-0009-hadoop2/lib/hbase-client-0.96.0.2.0.6.0-0009-hadoop2.jar is not a valid DFS filename. 
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:184) at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:92) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286) at WordCountMapper.StartWithCountJob_HBase.run(StartWithCountJob_HBase.java:41) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at 
WordCountMapper.StartWithCountJob_HBase.main(StartWithCountJob_HBase.java:44) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10579) [Documentation]: ExportSnapshot tool package incorrectly documented
[ https://issues.apache.org/jira/browse/HBASE-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910356#comment-13910356 ] Hudson commented on HBASE-10579: FAILURE: Integrated in hbase-0.96 #310 (See [https://builds.apache.org/job/hbase-0.96/310/]) HBASE-10579 ExportSnapshot tool package incorrectly documented (Aleksandr Shulman) (mbertozzi: rev 1571202) * /hbase/branches/0.96/src/main/docbkx/ops_mgt.xml [Documentation]: ExportSnapshot tool package incorrectly documented --- Key: HBASE-10579 URL: https://issues.apache.org/jira/browse/HBASE-10579 Project: HBase Issue Type: Bug Components: documentation, snapshots Affects Versions: 0.98.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: HBASE-10579-v0.patch Documentation Page: http://hbase.apache.org/book/ops.snapshots.html Expected documentation: The class should be specified as org.apache.hadoop.hbase.snapshot.ExportSnapshot Current documentation: Specified as: org.apache.hadoop.hbase.snapshot.tool.ExportSnapshot This makes sense because the class is located in the org.apache.hadoop.hbase.snapshot package: https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#19 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10595) HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even after the table dir in hdfs is removed
[ https://issues.apache.org/jira/browse/HBASE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910362#comment-13910362 ] Hadoop QA commented on HBASE-10595: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630659/HBASE-10595-trunk_v3.patch against trunk revision . ATTACHMENT ID: 12630659 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8784//console This message is automatically generated. 
HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even after the table dir in hdfs is removed -- Key: HBASE-10595 URL: https://issues.apache.org/jira/browse/HBASE-10595 Project: HBase Issue Type: Bug Components: master, util Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10595-trunk_v1.patch, HBASE-10595-trunk_v2.patch, HBASE-10595-trunk_v3.patch When a table dir (in HDFS) is removed externally, HMaster will still return the cached TableDescriptor to the client for a getTableDescriptor request. By contrast, HBaseAdmin.listTables() is handled correctly in the current implementation: for a table whose dir in HDFS has been removed externally, getTableDescriptor can still retrieve a valid (old) table descriptor while listTables says the table doesn't exist, which is inconsistent. The reason for this bug is that HMaster (via FSTableDescriptors) doesn't check whether the table dir exists for a getTableDescriptor() request (while for listTables() it lists all existing table dirs, rather than consulting the cache first, and returns accordingly). When a table is deleted via deleteTable, the cache is cleared after the table dir and tableInfo file are removed, so the listTables/getTableDescriptor inconsistency should be transient (though it still exists in the window where the table dir is removed but the cache is not yet cleared) and harder to expose -- This message was sent by Atlassian JIRA (v6.1.5#6160)
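The fix direction described above — validating that the table directory still exists before serving the cached descriptor, as listTables() effectively already does — can be sketched as follows. This is a minimal illustration; the names (DescriptorCache, the Set standing in for a FileSystem existence check) are hypothetical, not the actual FSTableDescriptors API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: serve a cached table descriptor only if the table's
// directory is still present; drop the stale entry otherwise.
class DescriptorCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Set<String> existingDirs; // stands in for a FileSystem check

    DescriptorCache(Set<String> existingDirs) {
        this.existingDirs = existingDirs;
    }

    void put(String table, String descriptor) {
        cache.put(table, descriptor);
    }

    /** Returns the cached descriptor only if the table dir still exists. */
    String get(String table) {
        if (!existingDirs.contains(table)) {
            cache.remove(table); // dir removed out-of-band: evict stale entry
            return null;
        }
        return cache.get(table);
    }
}
```

With this shape, a table whose dir was removed externally is reported as absent by both paths, restoring the listTables/getTableDescriptor consistency the issue asks for.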
[jira] [Commented] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt
[ https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910386#comment-13910386 ] Devaraj Das commented on HBASE-10525: - Yes, [~nkeywal], I just had that question. Thanks for the clarification. Allow the client to use a different thread for writing to ease interrupt Key: HBASE-10525 URL: https://issues.apache.org/jira/browse/HBASE-10525 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10525.v1.patch, 10525.v2.patch, 10525.v3.patch, 10525.v4.patch, 10525.v5.patch, 10525.v6.patch, 10525.v7.patch, HBaseclient-EventualConsistency.pdf This is an issue in the HBASE-10070 context, but also more generally if you want to interrupt an operation with a limited cost. I will attach a doc with a more detailed explanation. This adds a thread per region server, so it's optional. The first patch activates it by default to see how it behaves on a full hadoop-qa run. The target is to be unset by default. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client
[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10566: Status: Patch Available (was: Open) cleanup rpcTimeout in the client Key: HBASE-10566 URL: https://issues.apache.org/jira/browse/HBASE-10566 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10566.sample.patch, 10566.v1.patch There are two issues: 1) A confusion between the socket timeout and the call timeout. Socket timeouts should be minimal: a default like 20 seconds, which could be lowered to single-digit timeouts for some apps: if we cannot write to the socket in 10 seconds, we have an issue. This is different from the total duration (send query + do query + receive query), which can be longer, as it can include remote calls on the server and so on. Today we have a single value; it does not allow us to have low socket read timeouts. 2) The timeout can be different between the calls. Typically, if the total time, retries included, is 60 seconds and the first call failed after 2 seconds, then the remaining budget is 58s. HBase does this today, but by hacking with a thread-local storage variable. It's a hack (it should have been a parameter of the methods; the TLS allowed bypassing all the layers. Maybe protobuf makes this complicated, to be confirmed), but it also does not really work, because we can have multithreading issues (we use the updated rpc timeout of someone else, or we create a new BlockingRpcChannelImplementation with a random default timeout). Ideally, we could send the call timeout to the server as well: it would be able to dismiss on its own the calls that it received but that got stuck in the request queue or in the internal retries (on hdfs for example). This will make the system more reactive to failure. I think we can solve this now, especially after 10525. The main issue is to find something that fits well with protobuf... Then it should be easy to have a pool of threads for writers and readers, without a single thread per region server as today. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
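The "remaining time" bookkeeping from point 2 above can be expressed without a thread-local by carrying an explicit per-operation deadline and deriving each attempt's timeout from it. A minimal sketch, with hypothetical names (this is not the HBase client API):

```java
// Track the operation deadline once, then compute each call attempt's
// timeout from it, instead of stashing an adjusted rpcTimeout in a TLS.
class CallDeadline {
    private final long deadlineMillis;

    CallDeadline(long nowMillis, long totalTimeoutMillis) {
        this.deadlineMillis = nowMillis + totalTimeoutMillis;
    }

    /** Timeout budget for the next attempt; 0 means the deadline passed. */
    long remainingMillis(long nowMillis) {
        return Math.max(0, deadlineMillis - nowMillis);
    }
}
```

This matches the example in the description: with a 60-second total budget and a first attempt failing after 2 seconds, the next attempt gets 58 seconds, and the value is per-operation state rather than shared mutable thread-local state.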
[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client
[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10566: Attachment: 10566.v1.patch cleanup rpcTimeout in the client Key: HBASE-10566 URL: https://issues.apache.org/jira/browse/HBASE-10566 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10566.sample.patch, 10566.v1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10451) Enable back Tag compression on HFiles
[ https://issues.apache.org/jira/browse/HBASE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910450#comment-13910450 ] Anoop Sam John commented on HBASE-10451: Not able to see the test result to check for zombie test !! Enable back Tag compression on HFiles - Key: HBASE-10451 URL: https://issues.apache.org/jira/browse/HBASE-10451 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Critical Fix For: 0.98.1, 0.99.0 Attachments: HBASE-10451.patch, HBASE-10451_V2.patch, HBASE-10451_V3.patch, HBASE-10451_V4.patch, HBASE-10451_V5.patch, HBASE-10451_V6.patch HBASE-10443 disables tag compression on HFiles. This Jira is to fix the issues we have found out in HBASE-10443 and enable it back. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client
[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910452#comment-13910452 ] Nicolas Liochon commented on HBASE-10566: - v1 is a first attempt. I haven't run all the tests locally, but I had no error after a 30-minute run. There are 3 different socket timeouts: connect, read, and write. For all of them, we should be able to set low values, something like 2 / 5 / 5, without any impact. Likely I will need to write a test for this. The existing timeout of 60s is a global timeout for the operation. I need to double-check how we were using the existing operationTimeout; my feeling is that it was buggy and that it was overriding the individual timeout. If that's the case, it's still buggy. cleanup rpcTimeout in the client Key: HBASE-10566 URL: https://issues.apache.org/jira/browse/HBASE-10566 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10566.sample.patch, 10566.v1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10451) Enable back Tag compression on HFiles
[ https://issues.apache.org/jira/browse/HBASE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910513#comment-13910513 ] ramkrishna.s.vasudevan commented on HBASE-10451: +1. You can check the zombie test once. If your testing is satisfactory then commit the patch. Enable back Tag compression on HFiles - Key: HBASE-10451 URL: https://issues.apache.org/jira/browse/HBASE-10451 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Critical Fix For: 0.98.1, 0.99.0 Attachments: HBASE-10451.patch, HBASE-10451_V2.patch, HBASE-10451_V3.patch, HBASE-10451_V4.patch, HBASE-10451_V5.patch, HBASE-10451_V6.patch HBASE-10443 disables tag compression on HFiles. This Jira is to fix the issues we have found out in HBASE-10443 and enable it back. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10587) Master metrics clusterRequests is wrong
[ https://issues.apache.org/jira/browse/HBASE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10587: Priority: Minor (was: Major) Master metrics clusterRequests is wrong --- Key: HBASE-10587 URL: https://issues.apache.org/jira/browse/HBASE-10587 Project: HBase Issue Type: Bug Components: master, metrics Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: hbase-10587.patch In the master JMX, the clusterRequests metric increases very fast. Looking into the code, the calculation is a little bit wrong: it's a counter, but for each region server report the server's total number of requests is added to clusterRequests, which means the same requests are added multiple times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10587) Master metrics clusterRequests is wrong
[ https://issues.apache.org/jira/browse/HBASE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10587: Resolution: Fixed Fix Version/s: 0.99.0 0.98.1 0.96.2 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into 0.96, 0.98, and trunk. Thanks Enis for reviewing it. Master metrics clusterRequests is wrong --- Key: HBASE-10587 URL: https://issues.apache.org/jira/browse/HBASE-10587 Project: HBase Issue Type: Bug Components: master, metrics Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: hbase-10587.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
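One way to keep a cluster-wide counter correct when each region server report carries that server's cumulative request total is to remember the last reported value per server and add only the delta. This is an illustrative sketch of the over-counting problem and one possible remedy, with hypothetical names — not the committed HBASE-10587 patch:

```java
import java.util.HashMap;
import java.util.Map;

// Each report carries a cumulative per-server total; adding the reported
// total every time over-counts. Adding only the delta since the last
// report keeps the cluster counter consistent.
class ClusterRequestCounter {
    private long clusterRequests = 0;
    private final Map<String, Long> lastReported = new HashMap<>();

    void onServerReport(String server, long cumulativeRequests) {
        long prev = lastReported.getOrDefault(server, 0L);
        clusterRequests += Math.max(0, cumulativeRequests - prev);
        lastReported.put(server, cumulativeRequests);
    }

    long get() { return clusterRequests; }
}
```

With the naive "add the total each report" approach, a server repeatedly reporting 100 requests would inflate the counter on every report; with delta accounting it stays at 100 until new requests arrive.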
[jira] [Updated] (HBASE-10597) IOEngine#read() should return the number of bytes transferred
[ https://issues.apache.org/jira/browse/HBASE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10597: --- Attachment: 10597-v2.txt Patch v2 addresses Anoop's comments. IOEngine#read() should return the number of bytes transferred - Key: HBASE-10597 URL: https://issues.apache.org/jira/browse/HBASE-10597 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: 10597-v1.txt, 10597-v2.txt IOEngine#read() is called by BucketCache#getBlock(). IOEngine#read() should return the number of bytes transferred so that BucketCache#getBlock() can check this return value against the length obtained from bucketEntry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10597) IOEngine#read() should return the number of bytes transferred
[ https://issues.apache.org/jira/browse/HBASE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910580#comment-13910580 ] Andrew Purtell commented on HBASE-10597: Checking return values is good. Why only a log message here? Is this an error? How should it be handled?
{code}
Index: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
===================================================================
--- hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java (revision 1571351)
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java (working copy)
@@ -367,7 +367,10 @@
       if (bucketEntry.equals(backingMap.get(key))) {
         int len = bucketEntry.getLength();
         ByteBuffer bb = ByteBuffer.allocate(len);
-        ioEngine.read(bb, bucketEntry.offset());
+        int lenRead = ioEngine.read(bb, bucketEntry.offset());
+        if (lenRead != len) {
+          LOG.warn("Only " + lenRead + " bytes read, " + len + " expected");
+        }
         Cacheable cachedBlock = bucketEntry.deserializerReference(
             deserialiserMap).deserialize(bb, true);
         long timeTaken = System.nanoTime() - start;
{code}
IOEngine#read() should return the number of bytes transferred - Key: HBASE-10597 URL: https://issues.apache.org/jira/browse/HBASE-10597 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: 10597-v1.txt, 10597-v2.txt IOEngine#read() is called by BucketCache#getBlock(). IOEngine#read() should return the number of bytes transferred so that BucketCache#getBlock() can check this return value against the length obtained from bucketEntry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10593) FileInputStream in JenkinsHash#main() is never closed
[ https://issues.apache.org/jira/browse/HBASE-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910585#comment-13910585 ] Andrew Purtell commented on HBASE-10593: bq. Why not work on removing this unused class instead of 'fixing' it? +1 to that FileInputStream in JenkinsHash#main() is never closed - Key: HBASE-10593 URL: https://issues.apache.org/jira/browse/HBASE-10593 Project: HBase Issue Type: Bug Reporter: Ted Yu Priority: Trivial {code} FileInputStream in = new FileInputStream(args[0]); {code} The above FileInputStream is not closed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
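Independent of whether the unused JenkinsHash class is removed, the leak pattern flagged above has a standard fix: try-with-resources, which closes the stream on every exit path, including exceptions. A minimal sketch (the ReadFile/firstByte names are illustrative, not the JenkinsHash code; the actual reading logic is elided):

```java
import java.io.FileInputStream;
import java.io.IOException;

// The flagged pattern was `FileInputStream in = new FileInputStream(args[0]);`
// with no close(). try-with-resources guarantees the close.
class ReadFile {
    static int firstByte(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            return in.read(); // stream auto-closed when this block exits
        }
    }
}
```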
[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client
[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910592#comment-13910592 ] Hadoop QA commented on HBASE-10566: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630699/10566.v1.patch against trunk revision . ATTACHMENT ID: 12630699 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestClientOperationInterrupt Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8785//console This message is automatically generated. 
cleanup rpcTimeout in the client Key: HBASE-10566 URL: https://issues.apache.org/jira/browse/HBASE-10566 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10566.sample.patch, 10566.v1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10597) IOEngine#read() should return the number of bytes transferred
[ https://issues.apache.org/jira/browse/HBASE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910597#comment-13910597 ] Ted Yu commented on HBASE-10597: I thought about throwing exception when there is mismatch in length read. Here is the method signature for BlockCache#getBlock(): {code} Cacheable getBlock(BlockCacheKey cacheKey, boolean caching, boolean repeat); {code} If the above signature is kept, some RuntimeException would be thrown. Is that Okay ? IOEngine#read() should return the number of bytes transferred - Key: HBASE-10597 URL: https://issues.apache.org/jira/browse/HBASE-10597 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: 10597-v1.txt, 10597-v2.txt IOEngine#read() is called by BucketCache#getBlock(). IOEngine#read() should return the number of bytes transferred so that BucketCache#getBlock() can check this return value against the length obtained from bucketEntry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10597) IOEngine#read() should return the number of bytes transferred
[ https://issues.apache.org/jira/browse/HBASE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910599#comment-13910599 ] Andrew Purtell commented on HBASE-10597: bq. If the above signature is kept, some RuntimeException would be thrown. Is that Okay ? Yes, I think so. IOEngine#read() should return the number of bytes transferred - Key: HBASE-10597 URL: https://issues.apache.org/jira/browse/HBASE-10597 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: 10597-v1.txt, 10597-v2.txt IOEngine#read() is called by BucketCache#getBlock(). IOEngine#read() should return the number of bytes transferred so that BucketCache#getBlock() can check this return value against the length obtained from bucketEntry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
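The handling agreed in the exchange above — keeping the BlockCache#getBlock signature and throwing an unchecked exception when ioEngine.read() transfers fewer bytes than the bucket entry's length — can be sketched as follows. The checkRead helper is hypothetical, not the attached patch:

```java
// If fewer bytes were read than the bucket entry's length, the block
// cannot be deserialized safely, so fail fast with a RuntimeException
// rather than widening the getBlock() signature with a checked exception.
class ShortReadCheck {
    static void checkRead(int lenRead, int expected) {
        if (lenRead != expected) {
            throw new RuntimeException(
                "Only " + lenRead + " bytes read, " + expected + " expected");
        }
    }
}
```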
[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client
[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10566: Status: Open (was: Patch Available) cleanup rpcTimeout in the client Key: HBASE-10566 URL: https://issues.apache.org/jira/browse/HBASE-10566 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client
[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10566: Status: Patch Available (was: Open) cleanup rpcTimeout in the client Key: HBASE-10566 URL: https://issues.apache.org/jira/browse/HBASE-10566 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client
[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910602#comment-13910602 ] Nicolas Liochon commented on HBASE-10566: - v2 fixes the test error. I'm not sure we shouldn't get rid of 'wrapException', however: we're spending a lot of time wrapping the exceptions, and then unwrapping them to discover what really happened. cleanup rpcTimeout in the client Key: HBASE-10566 URL: https://issues.apache.org/jira/browse/HBASE-10566 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10597) IOEngine#read() should return the number of bytes transferred
[ https://issues.apache.org/jira/browse/HBASE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10597: --- Attachment: 10597-v3.txt Thanks for the confirmation, Andy. Here is patch v3. IOEngine#read() should return the number of bytes transferred - Key: HBASE-10597 URL: https://issues.apache.org/jira/browse/HBASE-10597 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: 10597-v1.txt, 10597-v2.txt, 10597-v3.txt IOEngine#read() is called by BucketCache#getBlock(). IOEngine#read() should return the number of bytes transferred so that BucketCache#getBlock() can check this return value against the length obtained from bucketEntry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
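The check this issue asks for can be sketched as below. The interface is a deliberately simplified stand-in for IOEngine (not the real signature), showing why a read() that reports the number of bytes transferred lets the caller detect short reads against the expected length from the bucket entry.

```java
import java.io.IOException;

// Simplified stand-in for IOEngine (not the real interface): read() reports
// how many bytes it actually transferred.
interface SimpleIOEngine {
    int read(byte[] dst, long offset, int length) throws IOException;
}

class BlockReader {
    // Mirrors the check BucketCache#getBlock could perform against the
    // length obtained from the bucket entry.
    static byte[] readFully(SimpleIOEngine engine, long offset, int expectedLength)
            throws IOException {
        byte[] buf = new byte[expectedLength];
        int n = engine.read(buf, offset, expectedLength);
        if (n != expectedLength) {
            throw new IOException("short read: expected " + expectedLength
                + " bytes, got " + n);
        }
        return buf;
    }
}
```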
[jira] [Commented] (HBASE-10597) IOEngine#read() should return the number of bytes transferred
[ https://issues.apache.org/jira/browse/HBASE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910624#comment-13910624 ] Andrew Purtell commented on HBASE-10597: +1 on v3 if HadoopQA is happy
[jira] [Commented] (HBASE-10591) Sanity check table configuration in createTable
[ https://issues.apache.org/jira/browse/HBASE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910635#comment-13910635 ] Andrew Purtell commented on HBASE-10591: Funny, on HBASE-10571 I suggested the JIRA be re-scoped, to basically this. Can we also add TTL checks here and close HBASE-10571 as a dup? Sanity check table configuration in createTable --- Key: HBASE-10591 URL: https://issues.apache.org/jira/browse/HBASE-10591 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.99.0 Attachments: hbase-10591_v1.patch, hbase-10591_v2.patch We had a cluster become completely inoperable because a couple of tables were erroneously created with MAX_FILESIZE set to 4K, which resulted in 180K regions in a short interval and brought the master down due to HBASE-4246. We can do some sanity checking in master.createTable() and reject such requests. We already check the compression there, so it seems a good place. Alter table should check for this as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
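The kind of guard the description proposes for master.createTable() could look like the sketch below. The class name, the 2MB floor, and the exact rejection rule are all illustrative assumptions, not the committed patch or actual HBase config keys.

```java
// Hypothetical sketch of a schema sanity check: reject a MAX_FILESIZE so
// small that the table would shatter into an excessive number of regions.
class TableSchemaSanityChecker {
    static final long MIN_MAX_FILESIZE = 2L * 1024 * 1024; // illustrative 2MB floor

    /** Throws if the requested MAX_FILESIZE is set and below the sanity floor. */
    static void checkMaxFileSize(long requestedMaxFileSize) {
        if (requestedMaxFileSize > 0 && requestedMaxFileSize < MIN_MAX_FILESIZE) {
            throw new IllegalArgumentException(
                "MAX_FILESIZE " + requestedMaxFileSize + " is below the sanity floor "
                + MIN_MAX_FILESIZE + "; this would create an excessive number of regions");
        }
    }
}
```

Running the same check from both createTable and modifyTable covers the alter-table case raised in the comments.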
[jira] [Commented] (HBASE-10591) Sanity check table configuration in createTable
[ https://issues.apache.org/jira/browse/HBASE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910638#comment-13910638 ] Andrew Purtell commented on HBASE-10591: I also think sanity checking needs to be done for table schema modifications as well as the initial create.
[jira] [Created] (HBASE-10600) HTable#batch() should perform validation on empty Put
Ted Yu created HBASE-10600: -- Summary: HTable#batch() should perform validation on empty Put Key: HBASE-10600 URL: https://issues.apache.org/jira/browse/HBASE-10600 Project: HBase Issue Type: Bug Reporter: Ted Yu Raised by java8964 in this thread: http://osdir.com/ml/general/2014-02/msg44384.html When an empty Put is passed in the List to HTable#batch(), no validation is performed, whereas an IllegalArgumentException would have been thrown had this empty Put been passed to the simple Put API call. Validation on empty Put should be carried out in HTable#batch(). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
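The missing check amounts to running the single-Put validation over every Put in the batch list. The sketch below uses a minimal stand-in class instead of the real Put, and the validator shape is an assumption about how such a fix could look, not the attached patch.

```java
import java.util.Arrays;
import java.util.List;

// Minimal stand-in for Put, just enough to model "has no cells".
class PutStub {
    private final int cellCount;
    PutStub(int cellCount) { this.cellCount = cellCount; }
    boolean isEmpty() { return cellCount == 0; }
}

class BatchValidator {
    // Apply the same validation the single-Put path performs to each
    // element of the batch, failing fast before anything is sent.
    static void validate(List<PutStub> puts) {
        for (PutStub p : puts) {
            if (p.isEmpty()) {
                throw new IllegalArgumentException("No columns to insert");
            }
        }
    }
}
```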
[jira] [Commented] (HBASE-10355) Failover RPC's from client using region replicas
[ https://issues.apache.org/jira/browse/HBASE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910643#comment-13910643 ] Nicolas Liochon commented on HBASE-10355: - bq. We should either document this very well, or auto-enable interrupts if this jira is used. It's not easy to do that, because the RpcClient does not really know about the replica. Something that we could do, however, is a single check in HTable: if we have a get with Consistency != Strong, we check the value of allowsInterrupt. If false, we log a warning message. The other option would be to throw an IllegalStateException, if we want to say that we support replicas only with this option (and that would make sense). Failover RPC's from client using region replicas Key: HBASE-10355 URL: https://issues.apache.org/jira/browse/HBASE-10355 Project: HBase Issue Type: Sub-task Components: Client Reporter: Enis Soztutar Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10355.v1.patch, 10355.v2.patch, 10355.v3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
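The single check discussed in the comment — stricter of the two options, throwing rather than warning — could be sketched like this. The enum and class names are hypothetical simplifications of the client types involved, not the actual HBase API.

```java
// Hypothetical sketch: reject a timeline-consistent (replica) read when the
// client was not configured to allow RPC interrupts, per the comment above.
enum Consistency { STRONG, TIMELINE }

class ReplicaReadGuard {
    static void checkGet(Consistency consistency, boolean allowsInterrupt) {
        if (consistency != Consistency.STRONG && !allowsInterrupt) {
            throw new IllegalStateException(
                "Consistency " + consistency + " requires interruptible RPCs; "
                + "enable client interrupts to use region replica reads");
        }
    }
}
```

The softer alternative mentioned in the comment would log a warning instead of throwing.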
[jira] [Commented] (HBASE-10591) Sanity check table configuration in createTable
[ https://issues.apache.org/jira/browse/HBASE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910652#comment-13910652 ] Enis Soztutar commented on HBASE-10591: --- bq. As an example, I have a table with MAX_FILESIZE = '1638400' (Less than 2MB). This region handle VERY small keys/values. Value is 1 byte. Keys are less than 32 bytes. But I want this table to be spread over all my servers. So I have to put a small MAX_FILESIZE value. Let me make it a per table configuration. bq. Can we also add TTL checks here and close HBASE-10571 as a dup? Makes sense. bq. I also think sanity checking needs to be done for table schema modifications as well as the initial create. The patch does the checks in modify table as well.
[jira] [Commented] (HBASE-10598) Written data can not be read out because MemStore#timeRangeTracker might be updated concurrently
[ https://issues.apache.org/jira/browse/HBASE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910669#comment-13910669 ] Enis Soztutar commented on HBASE-10598: --- Nice finding! Can we do this with two AtomicLongs with compare and set? Written data can not be read out because MemStore#timeRangeTracker might be updated concurrently Key: HBASE-10598 URL: https://issues.apache.org/jira/browse/HBASE-10598 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.16 Reporter: cuijianwei Attachments: HBASE-10598-0.94.v1.patch In our test environment, we find that written data occasionally can't be read out. After debugging, we find that the maximumTimestamp/minimumTimestamp of MemStore#timeRangeTracker might decrease/increase when MemStore#timeRangeTracker is updated concurrently, which might make the MemStore/StoreFile be filtered out incorrectly when reading data. Let's see how concurrent updating of timeRangeTracker#maximumTimestamp causes this problem. Imagine there are two threads T1 and T2 putting two KeyValues kv1 and kv2. kv1 and kv2 belong to the same Store (so belong to the same region), but contain different rowkeys. Consequently, kv1 and kv2 can be updated concurrently. Looking at the implementation of HRegionServer#multi, kv1 and kv2 will be added to the MemStore by HRegion#applyFamilyMapToMemstore in HRegion#doMiniBatchMutation. Then, MemStore#internalAdd will be invoked and MemStore#timeRangeTracker will be updated by TimeRangeTracker#includeTimestamp as follows: {code} private void includeTimestamp(final long timestamp) { ... else if (maximumTimestamp < timestamp) { maximumTimestamp = timestamp; } return; } {code} Imagine the current maximumTimestamp of TimeRangeTracker is t0 before includeTimestamp(...) is invoked, kv1.timestamp=t1, kv2.timestamp=t2, t1 and t2 are both set by the user (so the user knows the timestamps of kv1 and kv2), and t1 > t2 > t0. 
T1 and T2 are executed concurrently; therefore, the two threads might both find that the current maximumTimestamp is less than the timestamp of their kv. After that, T1 and T2 will both set maximumTimestamp to the timestamp of their kv. If T1 sets maximumTimestamp before T2 does, maximumTimestamp will end up set to t2. Then, before any new update with a bigger timestamp has been applied to the MemStore, if we try to read out kv1 by HTable#get and set the timestamp of the 'Get' to t1, the StoreScanner will decide whether the MemStoreScanner (imagine kv1 has not been flushed) should be selected as a candidate scanner by MemStoreScanner#shouldUseScanner. The MemStore won't be selected in MemStoreScanner#shouldUseScanner because the maximumTimestamp of the MemStore has been set to t2 (t2 < t1). Consequently, the written kv1 can't be read out, and kv1 is lost from the user's perspective. If the above analysis is right, after the maximumTimestamp of MemStore#timeRangeTracker has been set to t2, the user will experience data loss in the following situations: 1. Before any new write with kv.timestamp > t1 has been added to the MemStore, a read request for kv1 with timestamp=t1 can not read kv1 out. 2. Before any new write with kv.timestamp > t1 has been added to the MemStore, if a flush happens, the data of the MemStore will be flushed to a StoreFile with StoreFile#maximumTimestamp set to t2. After that, any read request with timestamp=t1 can not read kv1 before the next compaction (actually, kv1.timestamp might not be included in the timeRange of the StoreFile even after compaction). The second situation is much more serious because the incorrect timeRange of the MemStore has been persisted to the file. Similarly, concurrent update of TimeRangeTracker#minimumTimestamp may also cause this problem. As a simple way to fix the problem, we could add synchronized to TimeRangeTracker#includeTimestamp so that this method won't be invoked concurrently. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
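The lock-free fix Enis suggests — two AtomicLongs updated via compare-and-set — could look roughly like the sketch below (a sketch under that suggestion, not the attached patch). Each bound only ever moves outward, and the CAS loop retries if another thread won the race, so a stale value can never overwrite a newer, wider one.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a lock-free TimeRangeTracker: min can only decrease, max can
// only increase, and the CAS loops retry when a concurrent update wins.
class AtomicTimeRangeTracker {
    private final AtomicLong minimumTimestamp = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong maximumTimestamp = new AtomicLong(Long.MIN_VALUE);

    void includeTimestamp(final long timestamp) {
        long cur;
        // Lower the minimum only while it is still above the new timestamp.
        while ((cur = minimumTimestamp.get()) > timestamp) {
            if (minimumTimestamp.compareAndSet(cur, timestamp)) break;
        }
        // Raise the maximum only while it is still below the new timestamp.
        while ((cur = maximumTimestamp.get()) < timestamp) {
            if (maximumTimestamp.compareAndSet(cur, timestamp)) break;
        }
    }

    long getMin() { return minimumTimestamp.get(); }
    long getMax() { return maximumTimestamp.get(); }
}
```

In the scenario above, even if T1 and T2 race, the loser's CAS fails and it re-reads the new bound, so maximumTimestamp ends at t1, not t2.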
[jira] [Commented] (HBASE-10600) HTable#batch() should perform validation on empty Put
[ https://issues.apache.org/jira/browse/HBASE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910680#comment-13910680 ] Andrew Purtell commented on HBASE-10600: Do you have a patch for this Ted?
[jira] [Commented] (HBASE-10355) Failover RPC's from client using region replicas
[ https://issues.apache.org/jira/browse/HBASE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910688#comment-13910688 ] Devaraj Das commented on HBASE-10355: - bq. Something that we could do however, is to do a single check in HTable: if we have a get with Consistency != Strong, [~nkeywal], wondering if it is possible to set the configuration to have interrupts enabled from the HTable layer and pass it down to the RPC layer.
[jira] [Commented] (HBASE-10595) HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even after the table dir in hdfs is removed
[ https://issues.apache.org/jira/browse/HBASE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910694#comment-13910694 ] Enis Soztutar commented on HBASE-10595: --- Going to the NN to check whether the table dir exists basically means that we should not be using the cache at all. Users are expected not to delete the table directory from the file system, which will cause further inconsistencies. Why do you think this is a problem? HBaseAdmin.getTableDescriptor can wrongly get the previous table's TableDescriptor even after the table dir in hdfs is removed -- Key: HBASE-10595 URL: https://issues.apache.org/jira/browse/HBASE-10595 Project: HBase Issue Type: Bug Components: master, util Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-10595-trunk_v1.patch, HBASE-10595-trunk_v2.patch, HBASE-10595-trunk_v3.patch When a table dir (in hdfs) is removed externally, HMaster will still return the cached TableDescriptor to the client for a getTableDescriptor request. On the contrary, HBaseAdmin.listTables() is handled correctly in the current implementation: for a table whose table dir in hdfs has been removed externally, getTableDescriptor can still retrieve a valid (old) table descriptor, while listTables says it doesn't exist; this is inconsistent. The reason for this bug is that HMaster (via FSTableDescriptors) doesn't check whether the table dir exists for a getTableDescriptor() request (while it lists all existing table dirs, not first consulting the cache, and returns accordingly for a listTables() request). When a table is deleted via deleteTable, the cache is cleared after the table dir and tableInfo file are removed, so the listTables/getTableDescriptor inconsistency should be transient (though it still exists when the table dir is removed while the cache is not yet cleared) and harder to expose. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
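The trade-off under discussion can be illustrated with a toy model of the descriptor cache (all names hypothetical, not FSTableDescriptors itself): lookups are served from memory with no NN round trip, and correctness relies on deletions going through the supported deleteTable path, which invalidates the entry. Removing the table dir out of band leaves a stale entry, exactly as the issue describes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of a descriptor cache that trusts itself: no filesystem check
// per lookup, invalidation only via the supported delete path.
class DescriptorCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    void put(String table, String descriptor) { cache.put(table, descriptor); }

    /** Served straight from memory: fast, but stale if the dir vanished out of band. */
    String get(String table) { return cache.get(table); }

    /** The supported path: deleteTable clears the entry after removing the dir. */
    void onDeleteTable(String table) { cache.remove(table); }
}
```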
[jira] [Updated] (HBASE-10600) HTable#batch() should perform validation on empty Put
[ https://issues.apache.org/jira/browse/HBASE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10600: --- Attachment: 10600-v1.txt Here is patch v1.
[jira] [Updated] (HBASE-10590) Update contents about tracing in the Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HBASE-10590: - Attachment: HBASE-10590-0.patch attaching patch including fixes such as: * moved contents to a newly created file, * fixed formatting of XML, * added usage of the trace command in the HBase shell. Update contents about tracing in the Reference Guide Key: HBASE-10590 URL: https://issues.apache.org/jira/browse/HBASE-10590 Project: HBase Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Priority: Minor Attachments: HBASE-10590-0.patch Adding explanation about client-side settings and the shell command for tracing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10355) Failover RPC's from client using region replicas
[ https://issues.apache.org/jira/browse/HBASE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910708#comment-13910708 ] Nicolas Liochon commented on HBASE-10355: - The connection is shared between the tables, so you don't really know: if the first get is on a table that doesn't have replicas, then the connection will be without the separate writer. HTable knows very little about replicas today. It only sees something when it receives a get with consistency != strong. Note that HBASE-10566 is about being able to have a single path (once the socket timeout is out of the way, we can have a thread pool for the readers and the writers).
[jira] [Updated] (HBASE-10590) Update contents about tracing in the Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HBASE-10590: - Labels: documentaion (was: ) Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-10587) Master metrics clusterRequests is wrong
[ https://issues.apache.org/jira/browse/HBASE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910726#comment-13910726 ] Hudson commented on HBASE-10587: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #168 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/168/]) HBASE-10587 Master metrics clusterRequests is wrong (jxiang: rev 1571357) * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterMetrics.java Master metrics clusterRequests is wrong --- Key: HBASE-10587 URL: https://issues.apache.org/jira/browse/HBASE-10587 Project: HBase Issue Type: Bug Components: master, metrics Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: hbase-10587.patch In the master jmx, the clusterRequests metric increases very fast. Looking into the code, the calculation is a little bit wrong: it's a counter, but for each region server report the server's total number of requests is added to clusterRequests, so the same requests are counted multiple times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
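One way to fix the double counting the description identifies is to track, per region server, the last reported total and add only the delta to the cluster-wide counter. This sketch is an assumption about the shape of such a fix, with illustrative names, not the committed patch.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of delta-based accounting: each region server report carries its
// running total, so the master adds only the increase since the last report
// instead of the whole total every time.
class ClusterRequestCounter {
    private final Map<String, Long> lastReported = new HashMap<>();
    private long clusterRequests;

    synchronized long onServerReport(String serverName, long totalRequestsOnServer) {
        long previous = lastReported.getOrDefault(serverName, 0L);
        clusterRequests += totalRequestsOnServer - previous;
        lastReported.put(serverName, totalRequestsOnServer);
        return clusterRequests;
    }
}
```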
[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client
[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910746#comment-13910746 ] Hadoop QA commented on HBASE-10566: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630733/10566.v2.patch against trunk revision . ATTACHMENT ID: 12630733 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8787//console This message is automatically generated. 
[jira] [Commented] (HBASE-10597) IOEngine#read() should return the number of bytes transferred
[ https://issues.apache.org/jira/browse/HBASE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910754#comment-13910754 ] Hadoop QA commented on HBASE-10597: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630734/10597-v3.txt against trunk revision . ATTACHMENT ID: 12630734 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8788//console This message is automatically generated.
[jira] [Commented] (HBASE-10587) Master metrics clusterRequests is wrong
[ https://issues.apache.org/jira/browse/HBASE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910769#comment-13910769 ] Hudson commented on HBASE-10587: SUCCESS: Integrated in hbase-0.96 #311 (See [https://builds.apache.org/job/hbase-0.96/311/]) HBASE-10587 Master metrics clusterRequests is wrong (jxiang: rev 1571358) * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterMetrics.java
[jira] [Commented] (HBASE-10587) Master metrics clusterRequests is wrong
[ https://issues.apache.org/jira/browse/HBASE-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910809#comment-13910809 ] Hudson commented on HBASE-10587: FAILURE: Integrated in HBase-TRUNK #4948 (See [https://builds.apache.org/job/HBase-TRUNK/4948/]) HBASE-10587 Master metrics clusterRequests is wrong (jxiang: rev 1571354) * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterMetrics.java
[jira] [Updated] (HBASE-10597) IOEngine#read() should return the number of bytes transferred
[ https://issues.apache.org/jira/browse/HBASE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10597: --- Fix Version/s: 0.99.0 0.98.1
[jira] [Updated] (HBASE-10597) IOEngine#read() should return the number of bytes transferred
[ https://issues.apache.org/jira/browse/HBASE-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10597: --- Hadoop Flags: Reviewed IOEngine#read() should return the number of bytes transferred - Key: HBASE-10597 URL: https://issues.apache.org/jira/browse/HBASE-10597 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.1, 0.99.0 Attachments: 10597-v1.txt, 10597-v2.txt, 10597-v3.txt IOEngine#read() is called by BucketCache#getBlock(). IOEngine#read() should return the number of bytes transferred so that BucketCache#getBlock() can check this return value against the length obtained from bucketEntry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
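The contract proposed above can be sketched with hypothetical interfaces (not the actual HBase IOEngine/BucketCache classes): read() reports how many bytes it transferred, and the caller compares that against the length recorded in the bucket entry.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Sketch (hypothetical names): read() returns the number of bytes actually
// transferred so the caller can detect short reads instead of silently
// using a partially filled buffer.
interface IOEngineSketch {
  int read(ByteBuffer dst, long offset) throws IOException;
}

class ArrayIOEngine implements IOEngineSketch {
  private final byte[] store;
  ArrayIOEngine(byte[] store) { this.store = store; }

  @Override
  public int read(ByteBuffer dst, long offset) {
    int len = Math.min(dst.remaining(), store.length - (int) offset);
    dst.put(store, (int) offset, len);
    return len;                    // number of bytes actually transferred
  }
}

class BucketCacheSketch {
  // Mirrors the check BucketCache#getBlock could do with the return value:
  // compare bytes transferred against the length from the bucket entry.
  static ByteBuffer getBlock(IOEngineSketch engine, long offset, int expectedLen)
      throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(expectedLen);
    int n = engine.read(buf, offset);
    if (n != expectedLen) {
      throw new IOException("Short read: expected " + expectedLen + " bytes, got " + n);
    }
    buf.flip();
    return buf;
  }
}
```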
[jira] [Created] (HBASE-10601) Upgrade hadoop to 2.3.0 release
Ted Yu created HBASE-10601: -- Summary: Upgrade hadoop to 2.3.0 release Key: HBASE-10601 URL: https://issues.apache.org/jira/browse/HBASE-10601 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.99.0 Apache Hadoop 2.3.0 has been released. This issue is to upgrade hadoop dependency to 2.3.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10601) Upgrade hadoop to 2.3.0 release
[ https://issues.apache.org/jira/browse/HBASE-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10601: --- Status: Patch Available (was: Open) Upgrade hadoop to 2.3.0 release --- Key: HBASE-10601 URL: https://issues.apache.org/jira/browse/HBASE-10601 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.99.0 Attachments: 10601-v1.txt Apache Hadoop 2.3.0 has been released. This issue is to upgrade hadoop dependency to 2.3.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10601) Upgrade hadoop to 2.3.0 release
[ https://issues.apache.org/jira/browse/HBASE-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10601: --- Attachment: 10601-v1.txt Upgrade hadoop to 2.3.0 release --- Key: HBASE-10601 URL: https://issues.apache.org/jira/browse/HBASE-10601 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.99.0 Attachments: 10601-v1.txt Apache Hadoop 2.3.0 has been released. This issue is to upgrade hadoop dependency to 2.3.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10590) Update contents about tracing in the Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910868#comment-13910868 ] Hadoop QA commented on HBASE-10590: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630779/HBASE-10590-0.patch against trunk revision . ATTACHMENT ID: 12630779 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8789//console This message is automatically generated. Update contents about tracing in the Reference Guide Key: HBASE-10590 URL: https://issues.apache.org/jira/browse/HBASE-10590 Project: HBase Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Priority: Minor Labels: documentaion Attachments: HBASE-10590-0.patch Adding explanation about client side settings and shell command for tracing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10590) Update contents about tracing in the Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10590: -- Resolution: Fixed Fix Version/s: 0.99.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thank you for the excellent addition to our doc. Update contents about tracing in the Reference Guide Key: HBASE-10590 URL: https://issues.apache.org/jira/browse/HBASE-10590 Project: HBase Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Labels: documentaion Fix For: 0.99.0 Attachments: HBASE-10590-0.patch Adding explanation about client side settings and shell command for tracing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-10590) Update contents about tracing in the Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reassigned HBASE-10590: - Assignee: Masatake Iwasaki Made you a contributor Masatake. Update contents about tracing in the Reference Guide Key: HBASE-10590 URL: https://issues.apache.org/jira/browse/HBASE-10590 Project: HBase Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Labels: documentaion Fix For: 0.99.0 Attachments: HBASE-10590-0.patch Adding explanation about client side settings and shell command for tracing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10600) HTable#batch() should perform validation on empty Put
[ https://issues.apache.org/jira/browse/HBASE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10600: --- Attachment: 10600-v2.txt Patch v2 adds a test. HTable#batch() should perform validation on empty Put - Key: HBASE-10600 URL: https://issues.apache.org/jira/browse/HBASE-10600 Project: HBase Issue Type: Bug Reporter: Ted Yu Attachments: 10600-v1.txt, 10600-v2.txt Raised by java8964 in this thread: http://osdir.com/ml/general/2014-02/msg44384.html When an empty Put is passed in the List to HTable#batch(), no validation is performed, whereas IllegalArgumentException would have been thrown had this empty Put gone through the simple Put API call. Validation on empty Put should be carried out in HTable#batch(). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10600) HTable#batch() should perform validation on empty Put
[ https://issues.apache.org/jira/browse/HBASE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10600: --- Assignee: Ted Yu Status: Patch Available (was: Open) HTable#batch() should perform validation on empty Put - Key: HBASE-10600 URL: https://issues.apache.org/jira/browse/HBASE-10600 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Attachments: 10600-v1.txt, 10600-v2.txt Raised by java8964 in this thread: http://osdir.com/ml/general/2014-02/msg44384.html When an empty Put is passed in the List to HTable#batch(), no validation is performed, whereas IllegalArgumentException would have been thrown had this empty Put gone through the simple Put API call. Validation on empty Put should be carried out in HTable#batch(). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
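A minimal sketch of the validation being proposed (hypothetical helper classes, not the actual HTable code): reject empty Puts in a batched list the same way the single-Put path does.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: a Put is "empty" when it carries no cells. The single-Put path
// throws IllegalArgumentException for such a mutation; batch() should apply
// the same check to every mutation in the list before submitting anything.
class PutSketch {
  private final int cellCount;
  PutSketch(int cellCount) { this.cellCount = cellCount; }
  boolean isEmpty() { return cellCount == 0; }
}

class BatchValidator {
  static void validate(List<PutSketch> batch) {
    for (PutSketch p : batch) {
      if (p.isEmpty()) {
        // Same failure mode the simple Put API produces for an empty Put.
        throw new IllegalArgumentException("No columns to insert");
      }
    }
  }
}
```

Validating up front keeps the failure client-side and synchronous, instead of surfacing it later from the batch machinery.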
[jira] [Commented] (HBASE-10601) Upgrade hadoop to 2.3.0 release
[ https://issues.apache.org/jira/browse/HBASE-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910914#comment-13910914 ] Lars Hofhansl commented on HBASE-10601: --- Should we rather have 2.2 and 2.3 as an option, and default to 2.3? Upgrade hadoop to 2.3.0 release --- Key: HBASE-10601 URL: https://issues.apache.org/jira/browse/HBASE-10601 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.99.0 Attachments: 10601-v1.txt Apache Hadoop 2.3.0 has been released. This issue is to upgrade hadoop dependency to 2.3.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10601) Upgrade hadoop to 2.3.0 release
[ https://issues.apache.org/jira/browse/HBASE-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10601: -- Attachment: 10601-0.94.txt Here's what I had in mind for 0.94. I was planning to do that in a separate jira, but might as well do it here. Note the change for protobuf specific to the Hadoop version. Upgrade hadoop to 2.3.0 release --- Key: HBASE-10601 URL: https://issues.apache.org/jira/browse/HBASE-10601 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.99.0 Attachments: 10601-0.94.txt, 10601-v1.txt Apache Hadoop 2.3.0 has been released. This issue is to upgrade hadoop dependency to 2.3.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10600) HTable#batch() should perform validation on empty Put
[ https://issues.apache.org/jira/browse/HBASE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10600: -- Priority: Trivial (was: Major) HTable#batch() should perform validation on empty Put - Key: HBASE-10600 URL: https://issues.apache.org/jira/browse/HBASE-10600 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Trivial Attachments: 10600-v1.txt, 10600-v2.txt Raised by java8964 in this thread: http://osdir.com/ml/general/2014-02/msg44384.html When an empty Put is passed in the List to HTable#batch(), no validation is performed, whereas IllegalArgumentException would have been thrown had this empty Put gone through the simple Put API call. Validation on empty Put should be carried out in HTable#batch(). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10600) HTable#batch() should perform validation on empty Put
[ https://issues.apache.org/jira/browse/HBASE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910917#comment-13910917 ] stack commented on HBASE-10600: --- Why would we allow an empty Put in the first place? HTable#batch() should perform validation on empty Put - Key: HBASE-10600 URL: https://issues.apache.org/jira/browse/HBASE-10600 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Attachments: 10600-v1.txt, 10600-v2.txt Raised by java8964 in this thread: http://osdir.com/ml/general/2014-02/msg44384.html When an empty Put is passed in the List to HTable#batch(), no validation is performed, whereas IllegalArgumentException would have been thrown had this empty Put gone through the simple Put API call. Validation on empty Put should be carried out in HTable#batch(). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10601) Upgrade hadoop to 2.3.0 release
[ https://issues.apache.org/jira/browse/HBASE-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910919#comment-13910919 ] Ted Yu commented on HBASE-10601: Patch v1 is for trunk (0.99). For 0.94, I am fine with Lars' patch. Upgrade hadoop to 2.3.0 release --- Key: HBASE-10601 URL: https://issues.apache.org/jira/browse/HBASE-10601 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.99.0 Attachments: 10601-0.94.txt, 10601-v1.txt Apache Hadoop 2.3.0 has been released. This issue is to upgrade hadoop dependency to 2.3.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10590) Update contents about tracing in the Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910924#comment-13910924 ] Masatake Iwasaki commented on HBASE-10590: -- Thanks [~stack]! Update contents about tracing in the Reference Guide Key: HBASE-10590 URL: https://issues.apache.org/jira/browse/HBASE-10590 Project: HBase Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Labels: documentaion Fix For: 0.99.0 Attachments: HBASE-10590-0.patch Adding explanation about client side settings and shell command for tracing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10601) Upgrade hadoop to 2.3.0 release
[ https://issues.apache.org/jira/browse/HBASE-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10601: --- Status: Open (was: Patch Available) Upgrade hadoop to 2.3.0 release --- Key: HBASE-10601 URL: https://issues.apache.org/jira/browse/HBASE-10601 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.99.0 Attachments: 10601-0.94.txt, 10601-v1.txt Apache Hadoop 2.3.0 has been released. This issue is to upgrade hadoop dependency to 2.3.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client
[ https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910954#comment-13910954 ] stack commented on HBASE-10566: --- Fix javadoc warning on commit? This is a great comment: We're spending a lot of time wrapping the exceptions, and then unwrapping them to discover what really happened. File an issue for this one when you get a chance. Patch looks great to me. Commit. cleanup rpcTimeout in the client Key: HBASE-10566 URL: https://issues.apache.org/jira/browse/HBASE-10566 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch There are two issues: 1) A confusion between the socket timeout and the call timeout. Socket timeouts should be minimal: a default like 20 seconds, that could be lowered to single-digit timeouts for some apps: if we cannot write to the socket in 10 seconds, we have an issue. This is different from the total duration (send query + do query + receive query), which can be longer, as it can include remote calls on the server and so on. Today, we have a single value; it does not allow us to have low socket read timeouts. 2) The timeout can be different between the calls. Typically, if the total time, retries included, is 60 seconds and the first attempt failed after 2 seconds, then the remaining is 58s. HBase does this today, but by hacking with a thread local storage variable. It's a hack (it should have been a parameter of the methods; the TLS allows bypassing all the layers. Maybe protobuf makes this complicated, to be confirmed), but it also does not really work, because we can have multithreading issues (we use the updated rpc timeout of someone else, or we create a new BlockingRpcChannelImplementation with a random default timeout). 
Ideally, we could send the call timeout to the server as well: it will be able to dismiss on its own the calls that it received but got stuck in the request queue or in the internal retries (on hdfs for example). This will make the system more reactive to failure. I think we can solve this now, especially after 10525. The main issue is to find something that fits well with protobuf... Then it should be easy to have a pool of threads for writers and readers, w/o a single thread per region server as today. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
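The distinction drawn in (2) — a fixed operation deadline with a per-call remaining budget, passed explicitly rather than through thread-local storage — can be sketched as follows (hypothetical class, not the patch's actual code):

```java
// Sketch (hypothetical names): instead of stashing the remaining rpc timeout
// in a thread-local, carry an explicit deadline and derive each call's
// timeout from it. An explicit parameter avoids the multithreading hazards
// described above (one thread picking up another thread's updated timeout).
class CallDeadline {
  private final long deadlineMillis;      // absolute deadline for the whole operation
  private final long socketTimeoutMillis; // short, fixed cap on raw socket I/O

  CallDeadline(long nowMillis, long operationTimeoutMillis, long socketTimeoutMillis) {
    this.deadlineMillis = nowMillis + operationTimeoutMillis;
    this.socketTimeoutMillis = socketTimeoutMillis;
  }

  // Remaining budget for the next attempt; 60s total minus 2s spent leaves 58s.
  long remainingCallTimeout(long nowMillis) {
    long remaining = deadlineMillis - nowMillis;
    if (remaining <= 0) {
      throw new RuntimeException("operation timed out before the call was sent");
    }
    return remaining;
  }

  // The socket timeout stays low and fixed, independent of the call budget.
  long socketTimeout() { return socketTimeoutMillis; }
}
```

This keeps the two values separate: the socket timeout stays small and constant, while the call timeout shrinks across retries toward the operation deadline.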
[jira] [Commented] (HBASE-10451) Enable back Tag compression on HFiles
[ https://issues.apache.org/jira/browse/HBASE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910956#comment-13910956 ] Ted Yu commented on HBASE-10451: {code} -TagCompressionContext context = new TagCompressionContext(LRUDictionary.class); +TagCompressionContext context = new TagCompressionContext(LRUDictionary.class, Byte.MAX_VALUE); {code} There are some calls to TagCompressionContext ctor where Short.MAX_VALUE is used. Is the different capacity intended ? Enable back Tag compression on HFiles - Key: HBASE-10451 URL: https://issues.apache.org/jira/browse/HBASE-10451 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Critical Fix For: 0.98.1, 0.99.0 Attachments: HBASE-10451.patch, HBASE-10451_V2.patch, HBASE-10451_V3.patch, HBASE-10451_V4.patch, HBASE-10451_V5.patch, HBASE-10451_V6.patch HBASE-10443 disables tag compression on HFiles. This Jira is to fix the issues we have found out in HBASE-10443 and enable it back. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
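The question above contrasts Byte.MAX_VALUE (127) and Short.MAX_VALUE (32767) as dictionary capacities. A sketch of a capacity-bounded LRU dictionary (hypothetical, not the actual org.apache.hadoop.hbase.io.util.LRUDictionary) shows what that constructor parameter controls:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: the capacity passed to the dictionary bounds how many distinct
// entries (here, tag byte sequences) can be indexed before the least
// recently used one is evicted. Byte.MAX_VALUE vs Short.MAX_VALUE therefore
// changes how many dictionary slots the compression can draw on.
class BoundedLruDictionary<K, V> extends LinkedHashMap<K, V> {
  private final int capacity;

  BoundedLruDictionary(int capacity) {
    super(16, 0.75f, true);       // access-order iteration gives LRU behavior
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > capacity;     // evict once the capacity is exceeded
  }
}
```

A smaller capacity means index references fit in fewer bits but entries are evicted sooner, so whether 127 and 32767 should differ between call sites is exactly the trade-off the reviewer is asking about.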
[jira] [Commented] (HBASE-10601) Upgrade hadoop to 2.3.0 release
[ https://issues.apache.org/jira/browse/HBASE-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13910974#comment-13910974 ] Lars Hofhansl commented on HBASE-10601: --- [~apurtell], [~stack], any opinions for 0.96 and 0.98? Upgrade hadoop to 2.3.0 release --- Key: HBASE-10601 URL: https://issues.apache.org/jira/browse/HBASE-10601 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.99.0 Attachments: 10601-0.94.txt, 10601-v1.txt Apache Hadoop 2.3.0 has been released. This issue is to upgrade hadoop dependency to 2.3.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)