[jira] [Commented] (HBASE-1403) Move transactional and indexer hbase out of core into contrib or out to their own project
[ https://issues.apache.org/jira/browse/HBASE-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1401#comment-1401 ] Liang Li commented on HBASE-1403: - OK, I see it, thank you very much! Move transactional and indexer hbase out of core into contrib or out to their own project - Key: HBASE-1403 URL: https://issues.apache.org/jira/browse/HBASE-1403 Project: HBase Issue Type: Task Reporter: stack Fix For: 0.20.0 It's a bit of work bringing these hbase subclasses along as changes happen in core. They are substantial enough contributions that they could be their own projects -- if there were the will to keep them up. Otherwise, we could move them down into contrib. -- that'd better demarcate core and these hbase customizations. I wrote Clint asking what he thought. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11679) Replace HTable with HTableInterface where backwards-compatible
[ https://issues.apache.org/jira/browse/HBASE-11679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088892#comment-14088892 ] Hadoop QA commented on HBASE-11679: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660322/HBASE_11679.patch against trunk revision . ATTACHMENT ID: 12660322 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 444 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10334//console This message is automatically generated. Replace HTable with HTableInterface where backwards-compatible -- Key: HBASE-11679 URL: https://issues.apache.org/jira/browse/HBASE-11679 Project: HBase Issue Type: Improvement Reporter: Carter Assignee: Carter Attachments: HBASE_11679.patch, HBASE_11679.patch This is a refactor to move more of the code towards using interfaces for proper encapsulation of logic. The amount of code touched is large, but it should be fairly easy to review. It changes variable declarations from HTable to HTableInterface where the following holds: # The declaration being updated won't break assignment # The declaration change does not break the compile (e.g. trying to access non-interface methods) The two main situations are to change something like this:
{code}
HTable h = new HTable(c, tn);
{code}
to
{code}
HTableInterface h = new HTable(c, tn);
{code}
and this:
{code}
public void doSomething(HTable h) { ... }
{code}
to this:
{code}
public void doSomething(HTableInterface h) { ... }
{code}
This gets most of the obvious cases out of the way and prepares for more complicated interface refactors in the future.
In method signatures, I changed parameters, but did _not_ change any public or protected method return values, since that would violate criterion #1 above and break compatibility. -- This message was sent by Atlassian JIRA (v6.2#6252)
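The compatibility rule in the issue (parameter types may widen to the interface; public return types must not change) can be illustrated with a toy stand-in. The classes below are hypothetical placeholders, not the real HBase client API:

```java
// Toy stand-ins for HTableInterface / HTable; names are illustrative only.
interface TableLike {
    String getName();
}

class ConcreteTable implements TableLike {
    private final String name;
    ConcreteTable(String name) { this.name = name; }
    public String getName() { return name; }
}

public class InterfaceRefactorSketch {
    // Safe: widening a parameter from ConcreteTable to TableLike keeps every
    // existing call site compiling, because a ConcreteTable is-a TableLike.
    static String describe(TableLike t) {
        return "table:" + t.getName();
    }

    public static void main(String[] args) {
        // Safe: declare by interface, construct the concrete class.
        TableLike h = new ConcreteTable("demo"); // was: ConcreteTable h = ...
        System.out.println(describe(h));
        // Not done in the patch: changing a public method's *return* type to
        // the interface would break callers that assigned the concrete type.
    }
}
```

The asymmetry is standard variance: parameters can become more general without breaking callers, while return types can only become more specific.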
[jira] [Created] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
Lars Hofhansl created HBASE-11695: - Summary: PeriodicFlusher and WakeFrequency issues Key: HBASE-11695 URL: https://issues.apache.org/jira/browse/HBASE-11695 Project: HBase Issue Type: Bug Affects Versions: 0.94.21 Reporter: Lars Hofhansl We just ran into a flush storm caused by the PeriodicFlusher. Many memstores became eligible for flushing at exactly the same time; the effect we've seen is that the exact same region was flushed multiple times, because the flusher wakes up too often (every 10s). The jitter of 20s is larger than that, and it takes some time to actually flush the memstore. Here's one example. We've seen 100's of these, monopolizing the flush queue and preventing important flushes from happening.
{code}
06-Aug-2014 20:11:56 [regionserver60020.periodicFlusher] INFO org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 13449
06-Aug-2014 20:12:06 [regionserver60020.periodicFlusher] INFO org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 14060
{code}
So we need to increase the period of the PeriodicFlusher to at least the random jitter, and also increase the default random jitter (20s does not help with many regions). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
[ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11695: -- Priority: Critical (was: Major) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
[ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088938#comment-14088938 ] Lars Hofhansl commented on HBASE-11695: --- We can do what the CompactionChecker does and add a multiplier. (As an aside, the default multiplier for the CompactionChecker is 1000, so it would only check every 1000 × 10s ≈ 2h 46m; isn't that too rarely?) Another option is to set the period like this: max(wakeFrequency, 2*jitter, flushInterval/10). I.e. we # do not wake up more often than wakeFrequency # do not wake up such that we would request flush of the same region multiple times (2*jitter) # only wake up often enough to satisfy the flushInterval with an accuracy of 10% The jitter is hardcoded to 20s. wakeFrequency defaults to 10s (it's not actually a frequency, btw), and flushInterval defaults to 1h. So with these defaults we'd wake up to check every 360s, which seems more like it. Or maybe just max(wakeFrequency, 2*jitter)... I.e. every 40s with default settings. But maybe that's too complicated and we just define another multiplier, or a completely new setting - means another config option, though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
[ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reassigned HBASE-11695: - Assignee: Lars Hofhansl -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
[ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088938#comment-14088938 ] Lars Hofhansl edited comment on HBASE-11695 at 8/7/14 7:21 AM: --- We can do what the CompactionChecker does and add a multiplier. (As an aside, the default multiplier for the CompactionChecker is 1000, so it would only check every 1000 × 10s ≈ 2h 46m; isn't that too rarely?) Another option is to set the period like this: max(wakeFrequency, 2*jitter, flushInterval/10). I.e. we # do not wake up more often than wakeFrequency # do not wake up such that we would request flush of the same region multiple times (2*jitter) # only wake up often enough to satisfy the flushInterval with an accuracy of 10% (flushInterval/10) The jitter is hardcoded to 20s. wakeFrequency defaults to 10s (it's not actually a frequency, btw), and flushInterval defaults to 1h. So with these defaults we'd wake up to check every 360s, which seems more like it. Or maybe just max(wakeFrequency, 2*jitter)... I.e. every 40s with default settings. But maybe that's too complicated and we just define another multiplier, or a completely new setting - means another config option, though. was (Author: lhofhansl): We can do what we do what the CompactionChecker does and add a multiplier. (As an aside the default multiplier for the CompactionChecker is 1000, so it would only check every 1s = 2h 46m, isn't that too rarely?) Another option is to set the period like this: max(wakeFrequency, 2*jitter, flushInteral/10). I.e. we # do not wake up more often that wakeFrequency # do not wake up such that we would request flush of the same region multiple times (2*jitter) # only wakeup often enough to satisfy the flushInterval with an accuracy of 10% The jitter is hardcoded to 20s. wakeFrequency defaults to 10s (it's not actually a frequency, btw), and flushInterval defaults to 1h.
So with these defaults we'd wake up to check every 360s, which seems more like it. Or maybe just max(wakeFrequency, 2*jitter)... I.e. every 40s with default settings. But maybe that's too complicate and we just define another multiplier, or a complete new setting - mean another config option, though. -- This message was sent by Atlassian JIRA (v6.2#6252)
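The period formula proposed in the comment above can be sketched as a small helper. This is a hypothetical illustration of max(wakeFrequency, 2*jitter, flushInterval/10), not actual HBase code; the method and class names are invented:

```java
public class PeriodicFlusherPeriodSketch {
    // Proposed: period = max(wakeFrequency, 2 * jitter, flushInterval / 10), so
    // the chore (1) never wakes more often than wakeFrequency, (2) cannot request
    // a flush of the same region twice within one jitter window, and (3) still
    // hits the flushInterval with roughly 10% accuracy.
    static long periodMs(long wakeFrequencyMs, long jitterMs, long flushIntervalMs) {
        return Math.max(wakeFrequencyMs, Math.max(2 * jitterMs, flushIntervalMs / 10));
    }

    public static void main(String[] args) {
        // With the defaults quoted above (wakeFrequency 10s, jitter 20s,
        // flushInterval 1h) the check would run every 360s.
        System.out.println(periodMs(10_000L, 20_000L, 3_600_000L));
    }
}
```

Dropping the flushInterval term, i.e. max(wakeFrequency, 2*jitter), yields 40s with the same defaults, matching the alternative mentioned in the comment.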
[jira] [Commented] (HBASE-11693) Backport HBASE-11026 (Provide option to filter out all rows in PerformanceEvaluation tool) to 0.94
[ https://issues.apache.org/jira/browse/HBASE-11693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088951#comment-14088951 ] Lars Hofhansl commented on HBASE-11693: --- +1 Backport HBASE-11026 (Provide option to filter out all rows in PerformanceEvaluation tool) to 0.94 -- Key: HBASE-11693 URL: https://issues.apache.org/jira/browse/HBASE-11693 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 0.94.23 Attachments: HBASE-11693.patch Backport HBASE-11026 (Provide option to filter out all rows in PerformanceEvaluation tool) to 0.94 branch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11690) Backport HBASE-5934 (Add the ability for Performance Evaluation to set the table compression) to 0.94
[ https://issues.apache.org/jira/browse/HBASE-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088959#comment-14088959 ] Lars Hofhansl commented on HBASE-11690: --- +1 Backport HBASE-5934 (Add the ability for Performance Evaluation to set the table compression) to 0.94 - Key: HBASE-11690 URL: https://issues.apache.org/jira/browse/HBASE-11690 Project: HBase Issue Type: Task Affects Versions: 0.94.23 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Attachments: HBASE-11690.patch Backport HBASE-5934 (Add the ability for Performance Evaluation to set the table compression) to 0.94 branch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11691) Backport HBASE-7156 (Add Data Block Encoding and -D opts to Performance Evaluation) to 0.94
[ https://issues.apache.org/jira/browse/HBASE-11691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088961#comment-14088961 ] Lars Hofhansl commented on HBASE-11691: --- Looks good to me. Thanks [~apurtell]. +1 Backport HBASE-7156 (Add Data Block Encoding and -D opts to Performance Evaluation) to 0.94 --- Key: HBASE-11691 URL: https://issues.apache.org/jira/browse/HBASE-11691 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 0.94.23 Attachments: HBASE-11691.patch Backport HBASE-7156 (Add Data Block Encoding and -D opts to Performance Evaluation) to 0.94 branch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
[ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11695: -- Fix Version/s: 0.98.6 0.94.23 2.0.0 0.99.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-4593) Design and document the official procedure for posting patches, commits, commit messages, etc. to smooth process and make integration with tools easier
[ https://issues.apache.org/jira/browse/HBASE-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088974#comment-14088974 ] Hadoop QA commented on HBASE-4593: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660324/HBASE-4593.patch against trunk revision . ATTACHMENT ID: 12660324 {color:red}-1 @author{color}. The patch appears to contain 2 @author tags which the Hadoop community has agreed to not allow in code contributions. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + xlink:href=https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner) +programlisting language=bournemvn clean install -DskipTests/programlisting + xlink:href=http://michaelmorello.blogspot.com/2011/09/hbase-subversion-eclipse-windows.html; +programlisting language=xml![CDATA[settings xmlns=http://maven.apache.org/SETTINGS/1.0.0; +$ MAVEN_OPTS=-Xmx2g mvn clean install -DskipTests assembly:single -Dassembly.file=hbase-assembly/src/main/assembly/src.xml -Prelease + $ MAVEN_OPTS=-Xmx3g mvn clean install -DskipTests javadoc:aggregate site assembly:single -Prelease + filenamehttps://svn.apache.org/repos/asf/hbase/hbase.apache.org/trunk/filename. 
+ xlink:href=http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html; + classnameorg.apache.hadoop.hbase.chaos.factories.MonkeyConstants/classname + xlink:href=http://blog.cloudera.com/blog/2013/09/how-to-test-hbase-applications-using-popular-tools/; {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.TestRegionRebalancing {color:red}-1 core zombie tests{color}. There are 3 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10335//console This 
message is automatically generated. Design and document the official procedure for posting patches, commits, commit messages, etc. to smooth process and make integration with tools easier --- Key: HBASE-4593 URL: https://issues.apache.org/jira/browse/HBASE-4593 Project: HBase Issue Type: Task Components: documentation Reporter: Jonathan Gray Assignee: Misty Stanley-Jones Attachments: HBASE-4593.patch, HBASE-4593.pdf I have been
[jira] [Commented] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
[ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089023#comment-14089023 ] Nicolas Liochon commented on HBASE-11695: - When we request a flush, we check that there is not already a request by checking regionsInQueue. Should we remove the region from regionsInQueue once the flush is done, and not just before doing it? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
[ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089080#comment-14089080 ] Qiang Tian commented on HBASE-11695: Interesting. From the latest code, there could be multiple code paths that request flush (PeriodicFlusher might be innocent :-)). And yes, in MemStoreFlusher#flushRegion, the entry is removed from regionsInQueue before the flush. -- This message was sent by Atlassian JIRA (v6.2#6252)
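The timing question discussed in this thread (remove a region from regionsInQueue before or after its flush runs) can be modeled with a toy queue. This is a hypothetical sketch, not the real MemStoreFlusher; the duringFlush callback stands in for a flush that takes several seconds:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public class FlushQueueSketch {
    private final Set<String> regionsInQueue = new HashSet<>();
    private final Queue<String> flushQueue = new ArrayDeque<>();

    // Request a flush; deduplicated against regionsInQueue, as in the comment above.
    boolean requestFlush(String region) {
        if (regionsInQueue.contains(region)) return false; // already queued
        regionsInQueue.add(region);
        flushQueue.add(region);
        return true;
    }

    // If the region is removed from regionsInQueue *before* the flush runs,
    // a periodic requester can re-enqueue the same region mid-flush; removing
    // it *after* the flush closes that window.
    void flushNext(boolean removeBeforeFlush, Runnable duringFlush) {
        String region = flushQueue.poll();
        if (region == null) return;
        if (removeBeforeFlush) regionsInQueue.remove(region);
        duringFlush.run(); // stands in for the actual (slow) flush
        if (!removeBeforeFlush) regionsInQueue.remove(region);
    }

    public static void main(String[] args) {
        FlushQueueSketch f = new FlushQueueSketch();
        f.requestFlush("r1");
        // With remove-before-flush, a request arriving mid-flush is accepted again.
        f.flushNext(true, () -> System.out.println("dup accepted: " + f.requestFlush("r1")));
    }
}
```

With removeBeforeFlush=false the mid-flush request is rejected, which is the behavior Nicolas Liochon's question points toward.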
[jira] [Updated] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-11339: - Attachment: HBase MOB Design-v4.pdf A new version of the design document. In this new version, the value of a cell in the mob column family consists of two parts: 1. The value size of the mob data (first 8 bytes). 2. The path of the mob file. In the old version, the value only had the path of the mob file. HBase MOB - Key: HBASE-11339 URL: https://issues.apache.org/jira/browse/HBASE-11339 Project: HBase Issue Type: Umbrella Components: regionserver, Scanners Reporter: Jingcheng Du Assignee: Jingcheng Du Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide .docx, hbase-11339-in-dev.patch It's quite useful to save medium-sized binary data like images and documents in Apache HBase. Unfortunately, directly saving binary MOB (medium object) data to HBase leads to worse performance due to frequent splits and compactions. In this design, the MOB data are stored in a more efficient way, which keeps high write/read performance and guarantees data consistency in Apache HBase. -- This message was sent by Atlassian JIRA (v6.2#6252)
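The two-part reference cell described in the v4 design summary above can be sketched as a byte layout. This is an assumed encoding for illustration only (8-byte big-endian size followed by the path bytes), not the actual MOB implementation:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MobRefCellSketch {
    // Assumed layout per the design summary: first 8 bytes = value size of the
    // mob data, remaining bytes = path of the mob file.
    static byte[] encode(long mobDataSize, String mobFilePath) {
        byte[] path = mobFilePath.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + path.length)
                .putLong(mobDataSize) // part 1: size of the mob data
                .put(path)            // part 2: path of the mob file
                .array();
    }

    static long mobDataSize(byte[] value) {
        return ByteBuffer.wrap(value).getLong();
    }

    static String mobFilePath(byte[] value) {
        return new String(value, 8, value.length - 8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] v = encode(1234L, "mobdir/f1");
        System.out.println(mobDataSize(v) + " " + mobFilePath(v));
    }
}
```

Carrying the size in the reference cell lets a scanner report the mob value's length without opening the mob file, which is presumably the motivation for adding part 1 in v4.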
[jira] [Updated] (HBASE-11643) Read and write MOB in HBase
[ https://issues.apache.org/jira/browse/HBASE-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-11643: - Attachment: HBase-11643.diff Hi all, the 1st patch is uploaded. Please help review and comment. Thanks a lot! Read and write MOB in HBase --- Key: HBASE-11643 URL: https://issues.apache.org/jira/browse/HBASE-11643 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Jingcheng Du Assignee: Jingcheng Du Attachments: HBase-11643.diff Reading and writing of MOB in HBase are implemented in this JIRA. Normally, Cells are saved in the MemStore and flushed to HFiles when necessary. For MOB, the Cells are saved in the MemStore as well, but they're flushed elsewhere, out of HBase, in the format of HFiles. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11643) Read and write MOB in HBase
[ https://issues.apache.org/jira/browse/HBASE-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089115#comment-14089115 ] Jingcheng Du commented on HBASE-11643: -- You can also find the code on the review board through the link https://reviews.apache.org/r/24448/. Thanks! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11643) Read and write MOB in HBase
[ https://issues.apache.org/jira/browse/HBASE-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HBASE-11643: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11643) Read and write MOB in HBase
[ https://issues.apache.org/jira/browse/HBASE-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089123#comment-14089123 ] Hadoop QA commented on HBASE-11643: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660367/HBase-11643.diff against trunk revision . ATTACHMENT ID: 12660367 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 20 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10337//console This message is automatically generated. Read and write MOB in HBase --- Key: HBASE-11643 URL: https://issues.apache.org/jira/browse/HBASE-11643 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Jingcheng Du Assignee: Jingcheng Du Attachments: HBase-11643.diff Read and write support for MOB in HBase is implemented in this JIRA. Normally, the Cells are saved in the MemStore and flushed to HFiles when necessary. For MOB, the Cells are saved in the MemStore as well, but they're flushed elsewhere, outside of HBase, in the format of HFiles. -- This message was sent by Atlassian JIRA (v6.2#6252)
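The flush behavior described in the HBASE-11643 entries above can be modeled roughly as follows. This is a hypothetical, self-contained sketch (not the code from the attached HBase-11643.diff): on flush, values larger than some threshold go to a separate MOB file while a small stand-in reference cell stays in the normal HFile. The threshold, the 8-byte reference stand-in, and all names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MobFlushSketch {
    // Hypothetical size threshold (bytes) above which a value is treated as a MOB.
    static final int MOB_THRESHOLD = 100;

    static boolean isMobCell(byte[] value) {
        return value.length > MOB_THRESHOLD;
    }

    // Split the memstore contents into a normal-HFile payload and a MOB-file payload.
    static Map<String, List<byte[]>> flush(List<byte[]> memstore) {
        Map<String, List<byte[]>> out = new HashMap<>();
        out.put("hfile", new ArrayList<>());
        out.put("mobfile", new ArrayList<>());
        for (byte[] v : memstore) {
            if (isMobCell(v)) {
                out.get("mobfile").add(v);         // large value written to the MOB file
                out.get("hfile").add(new byte[8]); // stand-in for a small reference cell
            } else {
                out.get("hfile").add(v);           // small value stays inline
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> memstore = Arrays.asList(new byte[10], new byte[500]);
        Map<String, List<byte[]>> files = flush(memstore);
        // Both cells stay visible through the normal HFile; only the large one's
        // bytes live in the MOB file.
        System.out.println(files.get("hfile").size() + " / " + files.get("mobfile").size());
    }
}
```

The point of the split is that compactions of the normal HFiles never need to rewrite the large values themselves, only the small reference cells.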
[jira] [Updated] (HBASE-11527) Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap.
[ https://issues.apache.org/jira/browse/HBASE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-11527: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Stack for the review. I have added a TODO in HConstants as suggested. Pushed to master. Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap. - Key: HBASE-11527 URL: https://issues.apache.org/jira/browse/HBASE-11527 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-11527.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11689) Track meta in transition
[ https://issues.apache.org/jira/browse/HBASE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089176#comment-14089176 ] Jimmy Xiang commented on HBASE-11689: - It's a different. ZK-based assignment uses ZK to coordinate, zk-less uses RPC. We can use the meta location znode to do the tracking. Currently, the meta location znode has a PB MetaRegionServer. We can add the state info to it, just as we did for region state in meta row. Track meta in transition Key: HBASE-11689 URL: https://issues.apache.org/jira/browse/HBASE-11689 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang With ZK-less region assignment, user regions in transition are tracked in meta. We need a way to track meta in transition too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HBASE-11689) Track meta in transition
[ https://issues.apache.org/jira/browse/HBASE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089176#comment-14089176 ] Jimmy Xiang edited comment on HBASE-11689 at 8/7/14 12:41 PM: -- It's different. ZK-based assignment uses ZK to coordinate, zk-less uses RPC. We can use the meta location znode to do the tracking. Currently, the meta location znode has a PB MetaRegionServer. We can add the state info to it, just as we did for region state in meta row. was (Author: jxiang): It's a different. ZK-based assignment uses ZK to coordinate, zk-less uses RPC. We can use the meta location znode to do the tracking. Currently, the meta location znode has a PB MetaRegionServer. We can add the state info to it, just as we did for region state in meta row. Track meta in transition Key: HBASE-11689 URL: https://issues.apache.org/jira/browse/HBASE-11689 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang With ZK-less region assignment, user regions in transition are tracked in meta. We need a way to track meta in transition too. -- This message was sent by Atlassian JIRA (v6.2#6252)
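The idea in the comment above can be sketched as follows: extend the payload stored in the meta location znode so it carries a transition state alongside the server name, mirroring what is done for user-region state in the meta row. This is an illustrative model only; the field names, the string encoding, and the state set are assumptions, not the real PB MetaRegionServer message.

```java
public class MetaLocationSketch {
    // Simplified region states; the real set in HBase is larger.
    enum State { OFFLINE, OPENING, OPEN, CLOSING }

    final String serverName;
    final State state;

    MetaLocationSketch(String serverName, State state) {
        this.serverName = serverName;
        this.state = state;
    }

    // Stand-in for the protobuf bytes written to the meta location znode.
    String toZnodePayload() {
        return serverName + "|" + state;
    }

    static MetaLocationSketch fromZnodePayload(String payload) {
        String[] parts = payload.split("\\|");
        return new MetaLocationSketch(parts[0], State.valueOf(parts[1]));
    }

    public static void main(String[] args) {
        MetaLocationSketch loc =
            new MetaLocationSketch("rs1.example.com,16020,1407000000000", State.OPENING);
        // A reader can now tell that meta is in transition, not just where it lives.
        System.out.println(fromZnodePayload(loc.toZnodePayload()).state);
    }
}
```

With only a single meta region, one znode payload like this is enough to track meta in transition, which matches the "+1 on recording in znode meta transition state for now" position below.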
[jira] [Commented] (HBASE-4593) Design and document the official procedure for posting patches, commits, commit messages, etc. to smooth process and make integration with tools easier
[ https://issues.apache.org/jira/browse/HBASE-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089273#comment-14089273 ] Sean Busbey commented on HBASE-4593: There are a few places in the rendered PDF where it looks like spacing is missing between plain text and formatted (e.g. links, monospace). It looks like there are spaces in the source patch though, so probably a rendering artifact? What's the difference between section 18.1 and 18.10? They seem to be aimed at the same thing. If they're not, 18.1 should have a pointer to 18.10. {quote} +para If you are looking to contribute to Apache HBase, look for issues in JIRA tagged with +the label 'beginner': link + xlink:href=https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner) +project = HBASE AND labels in (beginner)/link. These are issues HBase {quote} Personally, I think using issues in JIRA tagged with the label 'beginner' as the link text (instead of the jira query text) makes the section read clearer. {quote} +link xlink:href=http://git.apache.org/;Apache Git/link page. /para {quote} nit: trailing whitespace between '.' and end of paragraph. {quote} +section +titleOther IDEs/title +paraTODO - Please contribute/para /section {quote} Can we make this more specific or leave it out? Ideally a follow-on umbrella jira that we can reference here. For subtasks of that umbrella, I know IntelliJ is popular with the IDE users I talk to. {quote} +codecompile-protobuf/code to do this./para +programlisting language=bournemvn compile -Dcompile-protobuf/programlisting +programlisting language=bournemvn compile -Pcompile-protobuf/programlisting {quote} This looks like I need to run both of these commands to rebuild the protobufs. I believe I only need to run one of them. We should pick which one is preferred and only suggest that one. The protoc.path example looks like we prefer the -Dcompile-protobuf version. 
I'm pretty sure the idiomatic way for maven is the profile, so perhaps we should make both use that? {quote} +section xml:id=build.snappy +titleBuilding in snappy compression support/title +paraPass code-Dsnappy/code to trigger the codesnappy/code maven profile for +building Google Snappy native libraries into HBase. See also xref +linkend=snappy.compression//para {quote} Similar to the protobuf bit, can we change this to use the profile directly? {quote} +paraHBase 0.96.x will run on Hadoop 1.x or Hadoop 2.x. HBase 0.98 still runs on both, +but HBase 0.98 deprecates use of Hadoop 1. HBase 1.x will emphasisnot/emphasis +run on Hadoop 1. In the following procedures, we make a distinction between HBase +1.x builds and the awkward process involved building HBase 0.96/0.98 for either +Hadoop 1 or Hadoop 2 targets. /para {quote} Maybe end with a link to the java ref for more info? {quote} +formalpara +titleMaven Version/title +paraYou must use maven 3.0.x (Check by running commandmvn -version/command). /para +/formalpara {quote} Is this generally true? Should we add it to the section on basic compilation? Seems more likely to trip up a new person trying to build than someone creating a release candidate. {quote} +titleBefore You Begin/title +paraBefore you make a release candidate, do a practise run by deploying a {quote} Do we have a documentation style guide that covers British v American english usage? {quote} +intervention is needed here), the checking of the produced artifacts to ensure +they are 'good' -- e.g. undoing the produced tarballs, eyeballing them to make +sure they look right then starting and checking all is running properly -- and {quote} unpacking would be clearer than undoing {quote} +titleIf you used the filenamemake_rc.sh/filename script instead of doing +the above manually,, do your sanity checks now./title {quote} nit: extraneous comma. 
{quote} +docbkx:generate-html/command (TODO: It looks like you have to run commandmvn +site/command first because docbkx wants to include a transformed +filenamehbase-default.xml/filename. Fix). When you run mvn site, we do the {quote} Can we make a jira for this and then reference it here? {quote} +section xml:id=hbase.rc.voting +titleVoting on Release Candidates/title +para Everyone is encouraged to try and vote on HBase release candidates.
[jira] [Commented] (HBASE-11692) Document how to do a manual region split
[ https://issues.apache.org/jira/browse/HBASE-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089278#comment-14089278 ] Sean Busbey commented on HBASE-11692: - I would definitely like us to have docs on common ops needs, such as handling manual splits. Document how to do a manual region split Key: HBASE-11692 URL: https://issues.apache.org/jira/browse/HBASE-11692 Project: HBase Issue Type: Task Components: documentation Reporter: Misty Stanley-Jones {quote} -- Forwarded message -- From: Liu, Ming (HPIT-GADSC) ming.l...@hp.com Date: Tue, Aug 5, 2014 at 11:28 PM Subject: Why hbase need manual split? To: u...@hbase.apache.org u...@hbase.apache.org Hi, all, As I understand, HBase will automatically split a region when the region is too big. So in what scenario, user needs to do a manual split? Could someone kindly give me some examples that user need to do the region split explicitly via HBase Shell or Java API? Thanks very much. Regards, Ming {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11692) Document how to do a manual region split
[ https://issues.apache.org/jira/browse/HBASE-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-11692: Description: {quote} -- Forwarded message -- From: Liu, Ming (HPIT-GADSC) ming.l...@hp.com Date: Tue, Aug 5, 2014 at 11:28 PM Subject: Why hbase need manual split? To: u...@hbase.apache.org u...@hbase.apache.org Hi, all, As I understand, HBase will automatically split a region when the region is too big. So in what scenario, user needs to do a manual split? Could someone kindly give me some examples that user need to do the region split explicitly via HBase Shell or Java API? Thanks very much. Regards, Ming {quote} was: {code} -- Forwarded message -- From: Liu, Ming (HPIT-GADSC) ming.l...@hp.com Date: Tue, Aug 5, 2014 at 11:28 PM Subject: Why hbase need manual split? To: u...@hbase.apache.org u...@hbase.apache.org Hi, all, As I understand, HBase will automatically split a region when the region is too big. So in what scenario, user needs to do a manual split? Could someone kindly give me some examples that user need to do the region split explicitly via HBase Shell or Java API? Thanks very much. Regards, Ming {code} Document how to do a manual region split Key: HBASE-11692 URL: https://issues.apache.org/jira/browse/HBASE-11692 Project: HBase Issue Type: Task Components: documentation Reporter: Misty Stanley-Jones {quote} -- Forwarded message -- From: Liu, Ming (HPIT-GADSC) ming.l...@hp.com Date: Tue, Aug 5, 2014 at 11:28 PM Subject: Why hbase need manual split? To: u...@hbase.apache.org u...@hbase.apache.org Hi, all, As I understand, HBase will automatically split a region when the region is too big. So in what scenario, user needs to do a manual split? Could someone kindly give me some examples that user need to do the region split explicitly via HBase Shell or Java API? Thanks very much. Regards, Ming {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11629) Operational concerns for Replication should call out ZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089285#comment-14089285 ] Sean Busbey commented on HBASE-11629: - +1 LGTM {quote} @@ -1141,10 +1141,10 @@ false replicate and keeps track of the current position inside ZooKeeper to simplify failure recovery. That position, as well as the queue of WALs to process, may be different for every slave cluster./para - + paraThe clusters participating in replication can be of different sizes. The master cluster relies on randomization to attempt to balance the stream of replication on the slave clusters/para - + paraHBase supports master/master and cyclic replication as well as replication to multiple slaves./para {quote} nit: unrelated whitespace change. Operational concerns for Replication should call out ZooKeeper -- Key: HBASE-11629 URL: https://issues.apache.org/jira/browse/HBASE-11629 Project: HBase Issue Type: Bug Components: documentation, Replication Reporter: Sean Busbey Assignee: Misty Stanley-Jones Attachments: HBASE-11629.patch Our [design invariants state that ZooKeeper data is safe to delete|http://hbase.apache.org/book/developing.html#design.invariants.zk.data]. However, [replication only stores its data in zookeeper|http://hbase.apache.org/replication.html#Replication_Zookeeper_State]. This can lead to operators accidentally disabling their replication set up while attempting to recover from an unrelated issue by clearing the zk state. We should update the [operational concerns section on replication|http://hbase.apache.org/book/cluster_replication.html] to call out that the /hbase/replication tree should not be deleted. We should probably also add a warning to the set up steps. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11527) Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap.
[ https://issues.apache.org/jira/browse/HBASE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089289#comment-14089289 ] Hudson commented on HBASE-11527: FAILURE: Integrated in HBase-TRUNK #5378 (See [https://builds.apache.org/job/HBase-TRUNK/5378/]) HBASE-11527 Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap. (Anoop) (anoopsamjohn: rev 12d9697d934df90e0ed0261aa20446120c1086a6) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreChunkPool.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java * hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultHeapMemoryTuner.java * hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * hbase-common/src/main/java/org/apache/hadoop/hbase/HBaseConfiguration.java * hbase-common/src/main/java/org/apache/hadoop/hbase/io/util/HeapMemorySizeUtil.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestBlockCacheReporting.java * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheConfig.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HeapMemoryManager.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap. - Key: HBASE-11527 URL: https://issues.apache.org/jira/browse/HBASE-11527 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-11527.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HBASE-6290) Add a function a mark a server as dead and start the recovery the process
[ https://issues.apache.org/jira/browse/HBASE-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Talat UYARER reassigned HBASE-6290: --- Assignee: Talat UYARER (was: Nicolas Liochon) Add a function a mark a server as dead and start the recovery the process - Key: HBASE-6290 URL: https://issues.apache.org/jira/browse/HBASE-6290 Project: HBase Issue Type: Improvement Components: monitoring Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Talat UYARER Priority: Minor Labels: beginner ZooKeeper is used as a monitoring tool: we use znodes and we start the recovery process when a znode is deleted by ZK because it got a timeout. This timeout is defaulted to 90 seconds, and often set to 30s. However, some HW issues could be detected by specialized hw monitoring tools before the ZK timeout. For this reason, it makes sense to offer a very simple function to mark a RS as dead. This should not take in It could be a hbase shell function such as considerAsDead ipAddress|serverName This would delete all the znodes of the server running on this box, starting the recovery process. Such a function would be easily callable (at callers risk) by any fault detection tool... We could have issues identifying the right region servers around ipv4 vs ipv6 and multi-networked boxes, however. -- This message was sent by Atlassian JIRA (v6.2#6252)
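The mechanism proposed in HBASE-6290 can be sketched with a toy model. In HBase, deleting a region server's ephemeral znode is what triggers its recovery, so a `considerAsDead` call amounts to deleting every znode belonging to the target host. Below, a plain Map stands in for the /hbase/rs znode tree; the method name comes from the ticket, everything else is a hypothetical illustration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ConsiderAsDeadSketch {
    // Keys model znode names like "host1,16020,1407000000000";
    // values record which physical host each server runs on.
    static int considerAsDead(Map<String, String> rsZnodes, String host) {
        int removed = 0;
        Iterator<Map.Entry<String, String>> it = rsZnodes.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().equals(host)) {
                it.remove(); // deleting the znode is what kicks off recovery
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, String> znodes = new HashMap<>();
        znodes.put("host1,16020,1407000000000", "host1");
        znodes.put("host1,16021,1407000000001", "host1"); // second RS on the same box
        znodes.put("host2,16020,1407000000002", "host2");
        // Mark host1 dead immediately instead of waiting ~90s for the ZK timeout.
        System.out.println(considerAsDead(znodes, "host1"));
    }
}
```

The host-to-server matching step is exactly where the ipv4/ipv6/multi-homed ambiguity the reporter mentions would bite in a real implementation.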
[jira] [Commented] (HBASE-11527) Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap.
[ https://issues.apache.org/jira/browse/HBASE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089337#comment-14089337 ] stack commented on HBASE-11527: --- [~anoop.hbase] Can you add release note sir? Should this come back into branch-1 as addition to our ergonomic story where mem sizing shifts with work load? Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap. - Key: HBASE-11527 URL: https://issues.apache.org/jira/browse/HBASE-11527 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-11527.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11629) Operational concerns for Replication should call out ZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-11629: -- Resolution: Fixed Fix Version/s: 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to master. Thanks [~misty] (and [~busbey] for review) Operational concerns for Replication should call out ZooKeeper -- Key: HBASE-11629 URL: https://issues.apache.org/jira/browse/HBASE-11629 Project: HBase Issue Type: Bug Components: documentation, Replication Reporter: Sean Busbey Assignee: Misty Stanley-Jones Fix For: 2.0.0 Attachments: HBASE-11629.patch Our [design invariants state that ZooKeeper data is safe to delete|http://hbase.apache.org/book/developing.html#design.invariants.zk.data]. However, [replication only stores its data in zookeeper|http://hbase.apache.org/replication.html#Replication_Zookeeper_State]. This can lead to operators accidentally disabling their replication set up while attempting to recover from an unrelated issue by clearing the zk state. We should update the [operational concerns section on replication|http://hbase.apache.org/book/cluster_replication.html] to call out that the /hbase/replication tree should not be deleted. We should probably also add a warning to the set up steps. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11692) Document how to do a manual region split
[ https://issues.apache.org/jira/browse/HBASE-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089342#comment-14089342 ] stack commented on HBASE-11692: --- I liked the exposition in the mail thread on why you'd do it and that automated splitting does not help w/, for instance, timeseries. Document how to do a manual region split Key: HBASE-11692 URL: https://issues.apache.org/jira/browse/HBASE-11692 Project: HBase Issue Type: Task Components: documentation Reporter: Misty Stanley-Jones {quote} -- Forwarded message -- From: Liu, Ming (HPIT-GADSC) ming.l...@hp.com Date: Tue, Aug 5, 2014 at 11:28 PM Subject: Why hbase need manual split? To: u...@hbase.apache.org u...@hbase.apache.org Hi, all, As I understand, HBase will automatically split a region when the region is too big. So in what scenario, user needs to do a manual split? Could someone kindly give me some examples that user need to do the region split explicitly via HBase Shell or Java API? Thanks very much. Regards, Ming {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
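One concrete thing such documentation could show is how an operator picks an explicit split key, since size-based auto-splitting chooses the key for you. The helper below is a hypothetical illustration (not HBase's actual Bytes.split code) of computing the byte-wise midpoint of a region's start and end keys, the kind of value you would then pass to a manual split via the shell or admin API.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class SplitPointSketch {
    // Midpoint of two row keys, treating them as unsigned big-endian numbers
    // padded on the right to equal length (keys compare lexicographically).
    static byte[] midpoint(byte[] start, byte[] end) {
        int len = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, len));
        byte[] raw = lo.add(hi).shiftRight(1).toByteArray();
        byte[] out = new byte[len];
        // Right-align into a fixed-width key, dropping any leading sign byte.
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        byte[] mid = midpoint("a".getBytes(), "c".getBytes());
        System.out.println(new String(mid)); // halfway between 'a' and 'c'
    }
}
```

For the timeseries case from the thread, a midpoint is usually the wrong choice anyway; the operator would pick a key just ahead of the current write frontier, which is precisely why the decision has to be manual.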
[jira] [Commented] (HBASE-11689) Track meta in transition
[ https://issues.apache.org/jira/browse/HBASE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089349#comment-14089349 ] stack commented on HBASE-11689: --- bq. We can add the state info to it, just as we did for region state in meta row. We'd have one system to look up user regions and another to look up meta regions? bq. Are we going to support many small meta regions? Lets back up and talk about being able to split meta first. Thats a big change. Lets make a system that will work for two and two hundred regions in meta. +1 on recording in znode meta transition state for now while one meta region only. Track meta in transition Key: HBASE-11689 URL: https://issues.apache.org/jira/browse/HBASE-11689 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Jimmy Xiang With ZK-less region assignment, user regions in transition are tracked in meta. We need a way to track meta in transition too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (HBASE-11527) Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap.
[ https://issues.apache.org/jira/browse/HBASE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John reopened HBASE-11527: Sure let me make patch for branch-1 as well. Reopening the jira. Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap. - Key: HBASE-11527 URL: https://issues.apache.org/jira/browse/HBASE-11527 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-11527.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
[ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089376#comment-14089376 ] Lars Hofhansl commented on HBASE-11695: --- That is true, we do not actually flush the same region multiple times (except for a race Nicolas mentions), we just request the flush multiple times. In our case the storm was caused by many regions being eligible for periodic flushing at the same time, i.e. they've all been written to slowly, not filling up within an hour. I also want to increase the jitter. It is still pointless to wake up the flusher thread every 10s when the jitter is 20s (or more) and the requested flush interval is 3600s. PeriodicFlusher and WakeFrequency issues Key: HBASE-11695 URL: https://issues.apache.org/jira/browse/HBASE-11695 Project: HBase Issue Type: Bug Affects Versions: 0.94.21 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Critical Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6 We just ran into a flush storm caused by the PeriodicFlusher. Many memstores became eligible for flushing at exactly the same time, the effect we've seen is that the exact same region was flushed multiple times, because the flusher wakes up too often (every 10s). The jitter of 20s is larger than that and it takes some time to actually flush the memstore. Here's one example. We've seen 100's of these, monopolizing the flush queue and preventing important flushes from happening. {code} 06-Aug-2014 20:11:56 [regionserver60020.periodicFlusher] INFO org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. 
after a delay of 13449 06-Aug-2014 20:12:06 [regionserver60020.periodicFlusher] INFO org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 14060 {code} So we need to increase the period of the PeriodicFlusher to at least the random jitter, also increase the default random jitter (20s does not help with many regions). -- This message was sent by Atlassian JIRA (v6.2#6252)
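The timing problem can be made concrete with a small simulation. Using the numbers from the ticket (3600s flush interval, 20s jitter, 10s wake frequency; all other details are illustrative assumptions), regions that became eligible together land in just a couple of flusher wake-up windows, so their flush requests arrive in a burst. Widening the jitter spreads them out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class FlushJitterSketch {
    // Each region's periodic flush fires at interval + a random jitter.
    static List<Long> flushDeadlines(int regions, long intervalMs, long jitterMs, long seed) {
        Random rnd = new Random(seed);
        List<Long> deadlines = new ArrayList<>();
        for (int i = 0; i < regions; i++) {
            deadlines.add(intervalMs + (long) (rnd.nextDouble() * jitterMs));
        }
        Collections.sort(deadlines);
        return deadlines;
    }

    // How many flush requests land in the busiest flusher wake-up window.
    static int maxPerWakeup(List<Long> deadlines, long wakeFrequencyMs) {
        Map<Long, Integer> buckets = new HashMap<>();
        int max = 0;
        for (long d : deadlines) {
            int n = buckets.merge(d / wakeFrequencyMs, 1, Integer::sum);
            max = Math.max(max, n);
        }
        return max;
    }

    public static void main(String[] args) {
        // 1000 regions, 3600s interval, 20s jitter, 10s wake frequency.
        List<Long> d = flushDeadlines(1000, 3_600_000L, 20_000L, 42L);
        System.out.println("burst with 20s jitter:  " + maxPerWakeup(d, 10_000L));
        // Same regions with a 600s jitter: the burst per wake-up collapses.
        List<Long> d2 = flushDeadlines(1000, 3_600_000L, 600_000L, 42L);
        System.out.println("burst with 600s jitter: " + maxPerWakeup(d2, 10_000L));
    }
}
```

This is exactly the fix direction stated in the description: make the jitter (and the flusher period) large relative to the wake frequency so the eligible regions do not all collide in the same few wake-ups.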
[jira] [Commented] (HBASE-11611) Clean up ZK-based region assignment
[ https://issues.apache.org/jira/browse/HBASE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089384#comment-14089384 ] Jimmy Xiang commented on HBASE-11611: - No test failure but couple replica related tests hang. I messed it up in address some review comments. Checked in an addendum to fix it. Clean up ZK-based region assignment --- Key: HBASE-11611 URL: https://issues.apache.org/jira/browse/HBASE-11611 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0 Attachments: hbase-11611.patch, hbase-11611_v1.patch, hbase-11611_v2.patch We can clean up the ZK-based region assignment code and use the ZK-less one in the master branch, to make the code easier to understand and maintain. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11611) Clean up ZK-based region assignment
[ https://issues.apache.org/jira/browse/HBASE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-11611: Attachment: hbase-11611.addendum Clean up ZK-based region assignment --- Key: HBASE-11611 URL: https://issues.apache.org/jira/browse/HBASE-11611 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0 Attachments: hbase-11611.addendum, hbase-11611.patch, hbase-11611_v1.patch, hbase-11611_v2.patch We can clean up the ZK-based region assignment code and use the ZK-less one in the master branch, to make the code easier to understand and maintain. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HBASE-11611) Clean up ZK-based region assignment
[ https://issues.apache.org/jira/browse/HBASE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089384#comment-14089384 ] Jimmy Xiang edited comment on HBASE-11611 at 8/7/14 4:14 PM: - No test failure but couple replica related tests hang. I messed it up in addressing some review comments. Checked in the attached addendum to fix it. was (Author: jxiang): No test failure but couple replica related tests hang. I messed it up in address some review comments. Checked in an addendum to fix it. Clean up ZK-based region assignment --- Key: HBASE-11611 URL: https://issues.apache.org/jira/browse/HBASE-11611 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0 Attachments: hbase-11611.addendum, hbase-11611.patch, hbase-11611_v1.patch, hbase-11611_v2.patch We can clean up the ZK-based region assignment code and use the ZK-less one in the master branch, to make the code easier to understand and maintain. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-6617) ReplicationSourceManager should be able to track multiple WAL paths
[ https://issues.apache.org/jira/browse/HBASE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-6617: --- Component/s: Replication ReplicationSourceManager should be able to track multiple WAL paths --- Key: HBASE-6617 URL: https://issues.apache.org/jira/browse/HBASE-6617 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Ted Yu Currently ReplicationSourceManager uses logRolled() to receive notification about new HLog and remembers it in latestPath. When region server has multiple WAL support, we need to keep track of multiple Path's in ReplicationSourceManager -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11679) Replace HTable with HTableInterface where backwards-compatible
[ https://issues.apache.org/jira/browse/HBASE-11679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089386#comment-14089386 ] stack commented on HBASE-11679: --- Fails to apply since yesterday's big commit (files removed and two little rejects). I tried to bring it over but TestEndToEndSplitTransaction proves a little awkward. See what you think [~carterpage] Went through the patch looking for API breakage, to see if "...did not change any public or protected method return values, since that would violate criteria #1 above and break compatibility" holds. LGTM. Bulk of changes are in tests; otherwise, internal changes; no API change. +1 on commit to master. +1 on commit to branch-1 but such a big change probably needs [~enis] blessing. Replace HTable with HTableInterface where backwards-compatible -- Key: HBASE-11679 URL: https://issues.apache.org/jira/browse/HBASE-11679 Project: HBase Issue Type: Improvement Reporter: Carter Assignee: Carter Attachments: HBASE_11679.patch, HBASE_11679.patch This is a refactor to move more of the code towards using interfaces for proper encapsulation of logic. The amount of code touched is large, but it should be fairly easy to review. It changes variable declarations from HTable to HTableInterface where the following holds: # The declaration being updated won't break assignment # The declaration change does not break the compile (eg trying to access non-interface methods) The two main situations are to change something like this: {code} HTable h = new HTable(c, tn); {code} to {code} HTableInterface h = new HTable(c, tn); {code} and this: {code} public void doSomething(HTable h) { ... } {code} to this: {code} public void doSomething(HTableInterface h) { ... } {code} This gets most of the obvious cases out of the way and prepares for more complicated interface refactors in the future. 
In method signatures, I changed parameters, but did _not_ change any public or protected method return values, since that would violate criteria #1 above and break compatibility. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11629) Operational concerns for Replication should call out ZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089390#comment-14089390 ] Hudson commented on HBASE-11629: ABORTED: Integrated in HBase-TRUNK #5379 (See [https://builds.apache.org/job/HBase-TRUNK/5379/]) HBASE-11629 Operational concerns for Replication should call out ZooKeeper (Misty Stanley Jones) (stack: rev 3fdc6a2b728c7280132bd31831f3838c24b0d0e3) * src/main/docbkx/developer.xml * src/main/docbkx/ops_mgt.xml Operational concerns for Replication should call out ZooKeeper -- Key: HBASE-11629 URL: https://issues.apache.org/jira/browse/HBASE-11629 Project: HBase Issue Type: Bug Components: documentation, Replication Reporter: Sean Busbey Assignee: Misty Stanley-Jones Fix For: 2.0.0 Attachments: HBASE-11629.patch Our [design invariants state that ZooKeeper data is safe to delete|http://hbase.apache.org/book/developing.html#design.invariants.zk.data]. However, [replication only stores its data in zookeeper|http://hbase.apache.org/replication.html#Replication_Zookeeper_State]. This can lead to operators accidentally disabling their replication set up while attempting to recover from an unrelated issue by clearing the zk state. We should update the [operational concerns section on replication|http://hbase.apache.org/book/cluster_replication.html] to call out that the /hbase/replication tree should not be deleted. We should probably also add a warning to the set up steps. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11575) Pseudo distributed mode does not work as documented
[ https://issues.apache.org/jira/browse/HBASE-11575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089405#comment-14089405 ] Jimmy Xiang commented on HBASE-11575: - Unfortunately, it is in branch-1 too. The problem is that master uses the same RPC port as a region server now. Not sure how to handle this better. Pseudo distributed mode does not work as documented Key: HBASE-11575 URL: https://issues.apache.org/jira/browse/HBASE-11575 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Jimmy Xiang Priority: Critical Fix For: 1.0.0, 2.0.0 Attachments: hbase-11575.patch After master-RS colocation, now the pseudo dist-mode does not work as documented since you cannot start a region server in the same port 16020. I think we can either select a random port (and info port) for the master's region server, or document how to do a pseudo-distributed setup in the book. [~jxiang] wdyt? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11604) Disable co-locating meta/master by default
[ https://issues.apache.org/jira/browse/HBASE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-11604: Status: Open (was: Patch Available) Disable co-locating meta/master by default -- Key: HBASE-11604 URL: https://issues.apache.org/jira/browse/HBASE-11604 Project: HBase Issue Type: Task Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 1.0.0 Attachments: hbase-11604.patch To avoid possible confusion, it's better to keep the original deployment scheme in 1.0. ZK-less region assignment is off by default in 1.0 already. We should, by default, not assign any region to the master or a backup master. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11604) Disable co-locating meta/master by default
[ https://issues.apache.org/jira/browse/HBASE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089409#comment-14089409 ] Jimmy Xiang commented on HBASE-11604: - Cancelled the patch for now. Just realized that this change may impact the standalone/pseudo-distributed modes. Let me think about it more. Disable co-locating meta/master by default -- Key: HBASE-11604 URL: https://issues.apache.org/jira/browse/HBASE-11604 Project: HBase Issue Type: Task Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 1.0.0 Attachments: hbase-11604.patch To avoid possible confusion, it's better to keep the original deployment scheme in 1.0. ZK-less region assignment is off by default in 1.0 already. We should, by default, not assign any region to the master or a backup master. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11527) Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap.
[ https://issues.apache.org/jira/browse/HBASE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-11527: --- Attachment: HBASE-11527_addendum.patch I had missed considering the L2 block cache heap size when checking the size rule for max memstore heap vs min block cache heap and min memstore heap vs max block cache heap. This addendum fixes that. Will give a branch-1 patch soon. Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap. - Key: HBASE-11527 URL: https://issues.apache.org/jira/browse/HBASE-11527 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-11527.patch, HBASE-11527_addendum.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
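The invariant being extended here can be sketched roughly as follows. The class, method, and 20% free-heap constant below are illustrative assumptions, not the actual HBase validation code: the point is that an on-heap L2 must be counted alongside the memstore and L1 fractions when checking that enough heap remains free.

```java
// Hedged sketch of the free-heap check: memstore fraction plus total on-heap
// block cache fraction (L1 plus any on-heap L2) must leave a minimum free heap.
public class FreeHeapCheckSketch {
    static final float MIN_FREE_HEAP = 0.2f; // assumed minimum free-heap fraction

    static boolean enoughFreeHeap(float memstore, float l1Cache, float onHeapL2) {
        return memstore + l1Cache + onHeapL2 <= 1.0f - MIN_FREE_HEAP;
    }

    public static void main(String[] args) {
        // Ignoring L2, 0.4 memstore + 0.3 L1 passes the check...
        System.out.println(enoughFreeHeap(0.4f, 0.3f, 0.0f));
        // ...but once a 0.2 on-heap L2 is counted, the same config must fail.
        System.out.println(enoughFreeHeap(0.4f, 0.3f, 0.2f));
    }
}
```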
[jira] [Updated] (HBASE-11527) Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap.
[ https://issues.apache.org/jira/browse/HBASE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-11527: --- Status: Patch Available (was: Reopened) Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap. - Key: HBASE-11527 URL: https://issues.apache.org/jira/browse/HBASE-11527 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-11527.patch, HBASE-11527_addendum.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089418#comment-14089418 ] Anoop Sam John commented on HBASE-5699: --- My idea is to make a multi-WAL impl which helps write throughput as well as MTTR (the MTTR benefit applies when we have the distributed log replay mode). If we have a region grouping policy for selecting the regions assigned to each WAL in the multi-WAL setup, we can try our best to allocate all those regions to the same RS on a crash. Then that RS can read the WAL and replay it locally, and the distributed log replay batch calls would not have to go over RPC. Lots of Qs and corner cases there, but we can discuss more and try to make it better. Run with 1 WAL in HRegionServer - Key: HBASE-5699 URL: https://issues.apache.org/jira/browse/HBASE-5699 Project: HBase Issue Type: Improvement Components: Performance Reporter: binlijin Assignee: Li Pi Priority: Critical Attachments: PerfHbase.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11679) Replace HTable with HTableInterface where backwards-compatible
[ https://issues.apache.org/jira/browse/HBASE-11679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089434#comment-14089434 ] Carter commented on HBASE-11679: I'll merge down and confirm there are no deltas in the test results between it and master. That shouldn't hold up Enis from taking a look. Replace HTable with HTableInterface where backwards-compatible -- Key: HBASE-11679 URL: https://issues.apache.org/jira/browse/HBASE-11679 Project: HBase Issue Type: Improvement Reporter: Carter Assignee: Carter Attachments: HBASE_11679.patch, HBASE_11679.patch This is a refactor to move more of the code towards using interfaces for proper encapsulation of logic. The amount of code touched is large, but it should be fairly easy to review. It changes variable declarations from HTable to HTableInterface where the following holds: # The declaration being updated won't break assignment # The declaration change does not break the compile (eg trying to access non-interface methods) The two main situations are to change something like this: {code} HTable h = new HTable(c, tn); {code} to {code} HTableInterface h = new HTable(c, tn); {code} and this: {code} public void doSomething(HTable h) { ... } {code} to this: {code} public void doSomething(HTableInterface h) { ... } {code} This gets most of the obvious cases out of the way and prepares for more complicated interface refactors in the future. In method signatures, I changed parameters, but did _not_ change any public or protected method return values, since that would violate criteria #1 above and break compatibility. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089439#comment-14089439 ] Sean Busbey commented on HBASE-5699: [~anoop.hbase], that sounds like a combination of this and the ideas in HBASE-8610? Run with 1 WAL in HRegionServer - Key: HBASE-5699 URL: https://issues.apache.org/jira/browse/HBASE-5699 Project: HBase Issue Type: Improvement Components: Performance Reporter: binlijin Assignee: Li Pi Priority: Critical Attachments: PerfHbase.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11696) Make CombinedBlockCache resizable.
Anoop Sam John created HBASE-11696: -- Summary: Make CombinedBlockCache resizable. Key: HBASE-11696 URL: https://issues.apache.org/jira/browse/HBASE-11696 Project: HBase Issue Type: Improvement Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 HBASE-5349 adds auto tuning of the memstore heap and block cache heap. The block cache needs to be resizable for this; CombinedBlockCache is not marked resizable now, but we can make it so. On resize, the L1 cache (i.e. the LRU cache) can get resized. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089464#comment-14089464 ] Anoop Sam John commented on HBASE-5699: --- Ya collective ideas around MultiWAL Run with 1 WAL in HRegionServer - Key: HBASE-5699 URL: https://issues.apache.org/jira/browse/HBASE-5699 Project: HBase Issue Type: Improvement Components: Performance Reporter: binlijin Assignee: Li Pi Priority: Critical Attachments: PerfHbase.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
[ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-11695: -- Attachment: 11695-trunk.txt Trivial patch; this alleviates the storms a bit and also avoids running the chore more often than makes sense. PeriodicFlusher and WakeFrequency issues Key: HBASE-11695 URL: https://issues.apache.org/jira/browse/HBASE-11695 Project: HBase Issue Type: Bug Affects Versions: 0.94.21 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Critical Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6 Attachments: 11695-trunk.txt We just ran into a flush storm caused by the PeriodicFlusher. Many memstores became eligible for flushing at exactly the same time; the effect we've seen is that the exact same region was flushed multiple times, because the flusher wakes up too often (every 10s). The jitter of 20s is larger than that, and it takes some time to actually flush the memstore. Here's one example. We've seen 100's of these, monopolizing the flush queue and preventing important flushes from happening. {code} 06-Aug-2014 20:11:56 [regionserver60020.periodicFlusher] INFO org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 13449 06-Aug-2014 20:12:06 [regionserver60020.periodicFlusher] INFO org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 14060 {code} So we need to increase the period of the PeriodicFlusher to at least the random jitter, and also increase the default random jitter (20s does not help with many regions). -- This message was sent by Atlassian JIRA (v6.2#6252)
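The mismatch in the report above comes down to the chore period versus the jitter bound. A minimal sketch using the 10s wake frequency and 20s jitter mentioned in the issue (the class and constant names are illustrative, not the actual HBase chore code): whenever the random flush delay can exceed the wake frequency, the next chore wakeup still sees the region as due and requests a duplicate flush.

```java
// Sketch of the PeriodicFlusher duplicate-flush condition.
public class FlushJitterSketch {
    static final long WAKE_FREQUENCY_MS = 10_000; // chore wakes every 10s
    static final long MAX_JITTER_MS = 20_000;     // flush delay jittered up to 20s

    public static void main(String[] args) {
        // A delayed flush scheduled with jitter > wake frequency has not yet
        // run when the chore wakes again, so the same region is re-queued.
        // The fix direction: make the chore period at least the max jitter.
        boolean duplicateFlushPossible = MAX_JITTER_MS > WAKE_FREQUENCY_MS;
        System.out.println(duplicateFlushPossible);
    }
}
```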
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089487#comment-14089487 ] ramkrishna.s.vasudevan commented on HBASE-5699: --- Grouping regions should in itself be a pluggable module, because a simple policy could be based on a specific factor (like grouping every 5 regions) or could be based on names. To start with we could do simple grouping. bq.we can try max to allocate all those regions to same RS on crash. So this RS can read this WAL and replay locally. To replay locally, should we avoid the RPC entirely? Is that possible in the new distributed log replay? It tries to do table.batchMutate(), right? Need to see the code to confirm this. Run with 1 WAL in HRegionServer - Key: HBASE-5699 URL: https://issues.apache.org/jira/browse/HBASE-5699 Project: HBase Issue Type: Improvement Components: Performance Reporter: binlijin Assignee: Li Pi Priority: Critical Attachments: PerfHbase.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11696) Make CombinedBlockCache resizable.
[ https://issues.apache.org/jira/browse/HBASE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089494#comment-14089494 ] stack commented on HBASE-11696: --- On CBC, L1 is META blocks only. I could imagine it running a while, then ergonomics asking what its usage is and shrinking down the L1 so it was the size of the META block cache only. It could check back on occasion to see if there are evictions; if so, size up the L1 until evictions go to zero again. This would be sweet. Make CombinedBlockCache resizable. -- Key: HBASE-11696 URL: https://issues.apache.org/jira/browse/HBASE-11696 Project: HBase Issue Type: Improvement Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 HBASE-5349 adds auto tuning of the memstore heap and block cache heap. The block cache needs to be resizable for this; CombinedBlockCache is not marked resizable now, but we can make it so. On resize, the L1 cache (i.e. the LRU cache) can get resized. -- This message was sent by Atlassian JIRA (v6.2#6252)
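The ergonomics idea described above can be sketched as a tiny feedback rule. Everything here is a hypothetical illustration (there is no real L1TunerSketch class in HBase): shrink L1 toward its observed usage while there are no evictions, and grow it back in steps whenever evictions appear.

```java
// Hedged sketch of the L1 sizing feedback loop stack describes.
public class L1TunerSketch {
    // One tuning step: grow on evictions, otherwise shrink toward actual usage.
    static long tune(long currentL1, long usedByMeta, long evictions, long step) {
        if (evictions > 0) {
            return currentL1 + step;              // size up until evictions stop
        }
        return Math.max(usedByMeta, currentL1 - step); // shrink toward META usage
    }

    public static void main(String[] args) {
        long l1 = 1024;
        l1 = tune(l1, 256, 0, 128); // no evictions observed: shrink to 896
        l1 = tune(l1, 256, 5, 128); // evictions observed: grow back to 1024
        System.out.println(l1);
    }
}
```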
[jira] [Commented] (HBASE-11695) PeriodicFlusher and WakeFrequency issues
[ https://issues.apache.org/jira/browse/HBASE-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089498#comment-14089498 ] stack commented on HBASE-11695: --- Go for it (presuming you've tried it). PeriodicFlusher and WakeFrequency issues Key: HBASE-11695 URL: https://issues.apache.org/jira/browse/HBASE-11695 Project: HBase Issue Type: Bug Affects Versions: 0.94.21 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Critical Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6 Attachments: 11695-trunk.txt We just ran into a flush storm caused by the PeriodicFlusher. Many memstore became eligible for flushing at exactly the same time, the effect we've seen is that the exact same region was flushed multiple times, because the flusher wakes up too often (every 10s). The jitter of 20s is larger than that and it takes some time to actually flush the memstore. Here's one example. We've seen 100's of these, monopolizing the flush queue and preventing important flushes from happening. {code} 06-Aug-2014 20:11:56 [regionserver60020.periodicFlusher] INFO org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 13449 06-Aug-2014 20:12:06 [regionserver60020.periodicFlusher] INFO org.apache.hadoop.hbase.regionserver.HRegionServer[1397]-- regionserver60020.periodicFlusher requesting flush for region tsdb,\x00\x00\x0AO\xCF* \x00\x00\x01\x00\x01\x1F\x00\x00\x03\x00\x00\x0C,1340147003629.ef4a680b962592de910d0fdeb376dfc2. after a delay of 14060 {code} So we need to increase the period of the PeriodicFlusher to at least the random jitter, also increase the default random jitter (20s does not help with many regions). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11697) Improve the 'Too many blocks' message on UI blockcache status page
[ https://issues.apache.org/jira/browse/HBASE-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11697: --- Attachment: TooManyBlocks.png Improve the 'Too many blocks' message on UI blockcache status page -- Key: HBASE-11697 URL: https://issues.apache.org/jira/browse/HBASE-11697 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Priority: Minor Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: TooManyBlocks.png If metrics calculations over blockcache contents stopped after examining hbase.ui.blockcache.by.file.max items, the UI will put up a message. However, this notion of too many blocks / fullness refers to structures used for calculating blockcache metrics. See BlockCacheUtil. We should improve this message so it does not leave a user the impression the blockcache may be in a bad state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089495#comment-14089495 ] Anoop Sam John commented on HBASE-5699: --- Yes, the grouping logic should be a pluggable module. The grouping can be per-table region wise or across all regions; it should be in line with the balancing strategy (per table or not). bq.To replay locally we should avoid the RPC itself totally? Is it possible in the new distributed log replay? Have not checked the code deeply; these are high-level thoughts only. We can check more. If we can avoid RPCs in the replay, that would be great IMO. Run with 1 WAL in HRegionServer - Key: HBASE-5699 URL: https://issues.apache.org/jira/browse/HBASE-5699 Project: HBase Issue Type: Improvement Components: Performance Reporter: binlijin Assignee: Li Pi Priority: Critical Attachments: PerfHbase.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
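The pluggable grouping idea discussed in this thread could look roughly like the sketch below. The interface and strategy names are hypothetical, not an actual HBase API: the key property is that a region deterministically maps to one WAL group, so on a crash all regions of a group can be reassigned together and their WAL replayed locally.

```java
// Illustrative sketch of a pluggable region-to-WAL grouping policy.
public class WalGroupingSketch {
    // Hypothetical strategy interface: map a region to one of N WAL groups.
    interface RegionGroupingStrategy {
        int groupFor(String encodedRegionName, int numWals);
    }

    // A simple policy: hash the region name into a WAL group.
    static class HashGrouping implements RegionGroupingStrategy {
        public int groupFor(String region, int numWals) {
            return Math.floorMod(region.hashCode(), numWals);
        }
    }

    public static void main(String[] args) {
        RegionGroupingStrategy s = new HashGrouping();
        int g = s.groupFor("region-abc", 4);
        // Determinism is what lets a crashed server's group be replayed as a unit.
        System.out.println(g == s.groupFor("region-abc", 4));
    }
}
```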
[jira] [Created] (HBASE-11697) Improve the 'Too many blocks' message on UI blockcache status page
Andrew Purtell created HBASE-11697: -- Summary: Improve the 'Too many blocks' message on UI blockcache status page Key: HBASE-11697 URL: https://issues.apache.org/jira/browse/HBASE-11697 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Priority: Minor Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: TooManyBlocks.png If metrics calculations over blockcache contents stopped after examining hbase.ui.blockcache.by.file.max items, the UI will put up a message. However, this notion of too many blocks / fullness refers to structures used for calculating blockcache metrics. See BlockCacheUtil. We should improve this message so it does not leave a user the impression the blockcache may be in a bad state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11697) Improve the 'Too many blocks' message on UI blockcache status page
[ https://issues.apache.org/jira/browse/HBASE-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089501#comment-14089501 ] stack commented on HBASE-11697: --- You are right Andrew. The message was meant to convey "too many blocks to show in the UI without overwhelming it"; nothing wrong w/ your blockcache. Improve the 'Too many blocks' message on UI blockcache status page -- Key: HBASE-11697 URL: https://issues.apache.org/jira/browse/HBASE-11697 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Priority: Minor Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: TooManyBlocks.png If metrics calculations over blockcache contents stopped after examining hbase.ui.blockcache.by.file.max items, the UI will put up a message. However, this notion of too many blocks / fullness refers to structures used for calculating blockcache metrics. See BlockCacheUtil. We should improve this message so it does not leave a user the impression the blockcache may be in a bad state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11527) Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap.
[ https://issues.apache.org/jira/browse/HBASE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089507#comment-14089507 ] stack commented on HBASE-11527: --- [~anoop.hbase] This calculation is only for the on-heap case, right? Patch looks good, but how about a test? Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap. - Key: HBASE-11527 URL: https://issues.apache.org/jira/browse/HBASE-11527 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-11527.patch, HBASE-11527_addendum.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11527) Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap.
[ https://issues.apache.org/jira/browse/HBASE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089522#comment-14089522 ] Anoop Sam John commented on HBASE-11527: Yes on heap. I can add a test to TestHeapMemoryManager where L2 heap cache is also present. Let me check. Will give patch tomorrow boss. Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap. - Key: HBASE-11527 URL: https://issues.apache.org/jira/browse/HBASE-11527 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-11527.patch, HBASE-11527_addendum.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11696) Make CombinedBlockCache resizable.
[ https://issues.apache.org/jira/browse/HBASE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089529#comment-14089529 ] Anoop Sam John commented on HBASE-11696: bq.On CBC, L1 is META blocks only. META alone? We have HCD#setCacheDataInL1(boolean value), right? Have not gone through the entire code flow of that; correct me if I am wrong. bq. I could imagine it running a while, then ergonomics asking what its usage is and shrinking down the L1 so it was the size of the META block cache only. It could check back on occasion to see if there are evictions; if so, size up the L1 until evictions go to zero again. Ya, the tuner framework will do this work. Make CombinedBlockCache resizable. -- Key: HBASE-11696 URL: https://issues.apache.org/jira/browse/HBASE-11696 Project: HBase Issue Type: Improvement Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 HBASE-5349 adds auto tuning of the memstore heap and block cache heap. The block cache needs to be resizable for this; CombinedBlockCache is not marked resizable now, but we can make it so. On resize, the L1 cache (i.e. the LRU cache) can get resized. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11527) Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap.
[ https://issues.apache.org/jira/browse/HBASE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089539#comment-14089539 ] Hadoop QA commented on HBASE-11527: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660394/HBASE-11527_addendum.patch against trunk revision . ATTACHMENT ID: 12660394 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10339//console This message is automatically generated. Cluster free memory limit check should consider L2 block cache size also when L2 cache is onheap. - Key: HBASE-11527 URL: https://issues.apache.org/jira/browse/HBASE-11527 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-11527.patch, HBASE-11527_addendum.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11553) Abstract visibility label related services into an interface
[ https://issues.apache.org/jira/browse/HBASE-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089542#comment-14089542 ] Anoop Sam John commented on HBASE-11553: Pls see ExpAsStringVisibilityLabelServiceImpl for an alternate impl for VisibilityLabelService. Abstract visibility label related services into an interface Key: HBASE-11553 URL: https://issues.apache.org/jira/browse/HBASE-11553 Project: HBase Issue Type: Improvement Components: security Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11553.patch, HBASE-11553.patch, HBASE-11553_V2.patch, HBASE-11553_V3.patch - storage and retrieval of label dictionary and authentication sets - marshalling and unmarshalling of visibility expression representations in operation attributes and cell tags - management of assignment of authorizations to principals This will allow us to introduce additional serde implementations for visibility expressions, for example storing as strings in some places and compressed/tokenized representation in others in order to support additional use cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6626) Add a chapter on HDFS in the troubleshooting section of the HBase reference guide.
[ https://issues.apache.org/jira/browse/HBASE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089578#comment-14089578 ] Jonathan Hsieh commented on HBASE-6626: --- I'm taking a look. Add a chapter on HDFS in the troubleshooting section of the HBase reference guide. -- Key: HBASE-6626 URL: https://issues.apache.org/jira/browse/HBASE-6626 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Misty Stanley-Jones Priority: Blocker Attachments: HBASE-6626.patch, troubleshooting.txt I looked mainly at the major failure case, but here is what I have: New sub chapter in the existing chapter Troubleshooting and Debugging HBase: HDFS HBASE 1) HDFS HBase 2) Connection related settings 2.1) Number of retries 2.2) Timeouts 3) Log samples 1) HDFS HBase HBase uses HDFS to store its HFiles, i.e. the core HBase files, and the Write-Ahead-Logs, i.e. the files that will be used to restore the data after a crash. In both cases, the reliability of HBase comes from the fact that HDFS writes the data to multiple locations. To be efficient, HBase needs the data to be available locally, hence it's highly recommended to have the HDFS datanode on the same machines as the HBase Region Servers. Detailed information on how HDFS works can be found at [1]. Important features are: - HBase is a client application of HDFS, i.e. it uses the HDFS DFSClient class. This class can appear in HBase logs with other HDFS client related logs. - Some HDFS settings are HDFS-server-side, i.e. must be set on the HDFS side, while some others are HDFS-client-side, i.e. must be set in HBase, and some must be set in both places. - The HDFS writes are pipelined from one datanode to another. When writing, there are communications between: - HBase and HDFS namenode, through the HDFS client classes. - HBase and HDFS datanodes, through the HDFS client classes. 
- HDFS datanodes between themselves: issues in these communications are in the HDFS logs, not HBase's. HDFS writes are always local when possible. As a consequence, there should not be many write errors in HBase Region Servers: they write to the local datanode. If this datanode can't replicate the blocks, it will appear in its logs, not in the region servers' logs. - Datanodes can be contacted through the ipc.Client interface (once again this class can show up in HBase logs) and the data transfer interface (usually shows up as the DataNode class in the HBase logs). These are on different ports (defaults being: 50010 and 50020). - To understand exactly what's going on, you must look at the HDFS log files as well: HBase logs represent the client side. - With the default settings, HDFS needs 630s to mark a datanode as dead. For this reason, this node will still be tried by HBase or by other datanodes when writing and reading until HDFS definitively decides it's dead. This will add some extra lines in the logs. This monitoring is performed by the NameNode. - The HDFS clients (i.e. HBase using HDFS client code) don't fully rely on the NameNode, but can temporarily mark a node as dead if they had an error when they tried to use it. 2) Settings for retries and timeouts 2.1) Retries ipc.client.connect.max.retries Default 10 Indicates the number of retries a client will make to establish a server connection. Not taken into account if the error is a SocketTimeout. In this case the number of retries is 45 (fixed on branch, HADOOP-7932 or in HADOOP-7397). For SASL, the number of retries is hard-coded to 15. Can be increased, especially if the socket timeouts have been lowered. ipc.client.connect.max.retries.on.timeouts Default 45 If you have HADOOP-7932, max number of retries on timeout. The counter is different from ipc.client.connect.max.retries, so if you mix in socket errors you will get 55 retries with the default values. Could be lowered, once it is available. 
With HADOOP-7397, ipc.client.connect.max.retries is reused, so there would be 10 tries. dfs.client.block.write.retries Default 3 Number of tries for the client when writing a block. After a failure, it will connect to the namenode and get a new location, sending the list of the datanodes already tried without success. Could be increased, especially if the socket timeouts have been lowered. See HBASE-6490. dfs.client.block.write.locateFollowingBlock.retries Default 5 Number of retries to the namenode when the client got NotReplicatedYetException, i.e. the existing nodes of the files are not yet replicated to dfs.replication.min. This should not impact HBase, as dfs.replication.min
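The "55 retries with the default values" note above is simple arithmetic, sketched here for clarity. The class name is illustrative, and the exact counters depend on the Hadoop version as the draft says: timeout retries and other connect retries are tracked separately, so mixing error types can exhaust both budgets.

```java
// Worst-case connect attempts when socket timeouts and other connect errors mix,
// using the defaults quoted in the draft above.
public class RetryCountSketch {
    public static void main(String[] args) {
        int connectRetries = 10; // ipc.client.connect.max.retries default
        int timeoutRetries = 45; // ipc.client.connect.max.retries.on.timeouts default
        // Separate counters: the combined worst case is their sum.
        System.out.println(connectRetries + timeoutRetries);
    }
}
```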
[jira] [Commented] (HBASE-11611) Clean up ZK-based region assignment
[ https://issues.apache.org/jira/browse/HBASE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089605#comment-14089605 ] Hudson commented on HBASE-11611: SUCCESS: Integrated in HBase-TRUNK #5380 (See [https://builds.apache.org/job/HBase-TRUNK/5380/]) HBASE-11611 Addendum to fix hanging tests (jxiang: rev 041a2ba948e7aa04d814479b7ed81bc47ce14332) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Clean up ZK-based region assignment --- Key: HBASE-11611 URL: https://issues.apache.org/jira/browse/HBASE-11611 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0 Attachments: hbase-11611.addendum, hbase-11611.patch, hbase-11611_v1.patch, hbase-11611_v2.patch We can clean up the ZK-based region assignment code and use the ZK-less one in the master branch, to make the code easier to understand and maintain. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work started] (HBASE-10378) Divide HLog interface into User and Implementor specific interfaces
[ https://issues.apache.org/jira/browse/HBASE-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-10378 started by Sean Busbey. Divide HLog interface into User and Implementor specific interfaces --- Key: HBASE-10378 URL: https://issues.apache.org/jira/browse/HBASE-10378 Project: HBase Issue Type: Sub-task Components: wal Reporter: Himanshu Vashishtha Assignee: Sean Busbey Attachments: 10378-1.patch, 10378-2.patch HBASE-5937 introduces the HLog interface as a first step to support multiple WAL implementations. This interface is a good start, but has some limitations/drawbacks in its current state, such as: 1) There is no clear distinction b/w User and Implementor APIs, and it provides APIs both for WAL users (append, sync, etc) and also WAL implementors (Reader/Writer interfaces, etc). There are APIs which are very much implementation specific (getFileNum, etc) and a user such as a RegionServer shouldn't know about it. 2) There are about 14 methods in FSHLog which are not present in HLog interface but are used at several places in the unit test code. These tests typecast HLog to FSHLog, which makes it very difficult to test multiple WAL implementations without doing some ugly checks. I'd like to propose some changes in HLog interface that would ease the multi WAL story: 1) Have two interfaces WAL and WALService. WAL provides APIs for implementors. WALService provides APIs for users (such as RegionServer). 2) A skeleton implementation of the above two interface as the base class for other WAL implementations (AbstractWAL). It provides required fields for all subclasses (fs, conf, log dir, etc). Make a minimal set of test only methods and add this set in AbstractWAL. 
3) HLogFactory returns a WALService reference when creating a WAL instance; if a user needs to access impl-specific APIs (there are unit tests which get WAL from a HRegionServer and then call impl-specific APIs), use AbstractWAL type casting; 4) Make TestHLog abstract and let all implementors provide their respective test class which extends TestHLog (TestFSHLog, for example). -- This message was sent by Atlassian JIRA (v6.2#6252)
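The proposed split can be sketched roughly as follows. The names WAL, WALService, and AbstractWAL come from the proposal above; every method name and the toy implementation are hypothetical stand-ins, not the final HBase API:

```java
// Sketch of the proposed user/implementor interface split. Method names and
// the in-memory implementation are illustrative only.
public class WalSplitSketch {
    /** User-facing API: what a RegionServer would call (append, sync, ...). */
    interface WALService {
        long append(String entry);
        void sync();
    }

    /** Implementor-facing API: lifecycle hooks a WAL implementation provides. */
    interface WAL extends WALService {
        void rollWriter();
        void close();
    }

    /** Skeleton base class holding state shared by implementations. */
    static abstract class AbstractWAL implements WAL {
        protected long sequence = 0; // stand-in for shared fields (fs, conf, log dir)
        @Override public long append(String entry) { return ++sequence; }
        @Override public void sync() { /* a real implementation would flush here */ }
    }

    /** Toy implementation, standing in for FSHLog or an alternative WAL. */
    static class MemWAL extends AbstractWAL {
        @Override public void rollWriter() { }
        @Override public void close() { }
    }

    public static void main(String[] args) {
        WALService wal = new MemWAL(); // users only ever see WALService
        wal.append("edit-1");
        System.out.println(wal.append("edit-2"));
    }
}
```

Tests that need impl-specific methods would cast to AbstractWAL, as point 3 suggests, instead of hard-coding a concrete class.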
[jira] [Commented] (HBASE-11685) Incr/decr on the reference count of HConnectionImplementation need be atomic
[ https://issues.apache.org/jira/browse/HBASE-11685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089623#comment-14089623 ] Andrew Purtell commented on HBASE-11685: There's a spelling error in the log message. Please change nagative to negative. Is there any way to identify for which connection the ref count went negative? Would aid in debugging. Otherwise lgtm. Incr/decr on the reference count of HConnectionImplementation need be atomic - Key: HBASE-11685 URL: https://issues.apache.org/jira/browse/HBASE-11685 Project: HBase Issue Type: Bug Components: Client Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0 Attachments: HBASE-11685-trunk-v1.diff, HBASE-11685-trunk-v2.diff, HBASE-11685-trunk-v3.diff Currently, the incr/decr operations on the ref count of HConnectionImplementation are not atomic. This may cause the ref count to stay larger than 0 so that the connection is never closed. {code} /** * Increment this client's reference count. */ void incCount() { ++refCount; } /** * Decrement this client's reference count. */ void decCount() { if (refCount > 0) { --refCount; } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
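The direction of the fix can be sketched with java.util.concurrent.atomic.AtomicInteger. The class and method names below mirror the snippet in the report but are illustrative, not the actual HConnectionImplementation code:

```java
// Sketch: an atomic reference count so concurrent incCount/decCount calls
// cannot lose updates. Names mirror the JIRA snippet; not the real HBase code.
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicRefCount {
    private final AtomicInteger refCount = new AtomicInteger(0);

    void incCount() { refCount.incrementAndGet(); }

    void decCount() {
        // decrement but never go below zero, mirroring the guarded --refCount above
        refCount.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    int count() { return refCount.get(); }

    public static void main(String[] args) throws InterruptedException {
        AtomicRefCount c = new AtomicRefCount();
        Runnable inc = () -> { for (int i = 0; i < 100_000; i++) c.incCount(); };
        Thread t1 = new Thread(inc), t2 = new Thread(inc);
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(c.count()); // 200000: no increments lost
    }
}
```

With a plain int, two racing ++refCount calls can collapse into one, which is exactly how the count drifts and the connection leaks.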
[jira] [Commented] (HBASE-11553) Abstract visibility label related services into an interface
[ https://issues.apache.org/jira/browse/HBASE-11553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089631#comment-14089631 ] Andrew Purtell commented on HBASE-11553: Can you update RB with the latest patch Anoop? There are a lot of lines of change here. Abstract visibility label related services into an interface Key: HBASE-11553 URL: https://issues.apache.org/jira/browse/HBASE-11553 Project: HBase Issue Type: Improvement Components: security Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11553.patch, HBASE-11553.patch, HBASE-11553_V2.patch, HBASE-11553_V3.patch - storage and retrieval of label dictionary and authentication sets - marshalling and unmarshalling of visibility expression representations in operation attributes and cell tags - management of assignment of authorizations to principals This will allow us to introduce additional serde implementations for visibility expressions, for example storing as strings in some places and compressed/tokenized representation in others in order to support additional use cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11696) Make CombinedBlockCache resizable.
[ https://issues.apache.org/jira/browse/HBASE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089641#comment-14089641 ] stack commented on HBASE-11696: --- META is not the meta table, it's INDEX+BLOOM blocks. Make CombinedBlockCache resizable. -- Key: HBASE-11696 URL: https://issues.apache.org/jira/browse/HBASE-11696 Project: HBase Issue Type: Improvement Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 HBASE-5349 adds auto tuning of memstore heap and block cache heap. The block cache needs to be resizable for this. CombinedBlockCache is not marked resizable now. We can make it so. On resize, the L1 cache (i.e. the LRU cache) can get resized. -- This message was sent by Atlassian JIRA (v6.2#6252)
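The resize-delegation idea can be sketched as follows; all names are illustrative, and the real CombinedBlockCache/LruBlockCache APIs differ:

```java
// Sketch of "resize only the L1 tier": a combined cache becomes resizable by
// forwarding resize to its on-heap LRU tier while the L2 tier stays fixed.
// Names are hypothetical, not the HBase BlockCache API.
public class ResizableCombinedCache {
    interface ResizableCache {
        void setMaxSize(long bytes);
        long getMaxSize();
    }

    static class LruCache implements ResizableCache {
        private long maxSize;
        LruCache(long maxSize) { this.maxSize = maxSize; }
        @Override public void setMaxSize(long bytes) { this.maxSize = bytes; }
        @Override public long getMaxSize() { return maxSize; }
    }

    static class CombinedCache implements ResizableCache {
        private final LruCache l1;  // on-heap tier, resizable
        private final long l2Size;  // e.g. a bucket cache; fixed in this sketch
        CombinedCache(LruCache l1, long l2Size) { this.l1 = l1; this.l2Size = l2Size; }
        @Override public void setMaxSize(long bytes) { l1.setMaxSize(bytes); } // only L1 resizes
        @Override public long getMaxSize() { return l1.getMaxSize() + l2Size; }
    }

    public static void main(String[] args) {
        CombinedCache c = new CombinedCache(new LruCache(100), 400);
        c.setMaxSize(50);                  // heap pressure: shrink the L1 tier only
        System.out.println(c.getMaxSize()); // 450
    }
}
```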
[jira] [Updated] (HBASE-11678) BucketCache ramCache fills heap after running a few hours
[ https://issues.apache.org/jira/browse/HBASE-11678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-11678: -- Attachment: 11678v3.txt Why ain't this running? Retry. BucketCache ramCache fills heap after running a few hours - Key: HBASE-11678 URL: https://issues.apache.org/jira/browse/HBASE-11678 Project: HBase Issue Type: Bug Components: BlockCache Affects Versions: 0.99.0, 0.98.5, 2.0.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: 0001-When-we-failed-add-an-entry-failing-with-a-CacheFull.patch, 11678v2.txt, 11678v2.txt, 11678v3.txt, 11678v3.txt, gc_crash_unevenblocks_with_lots_of_evictions.png, gc_over_12_hours_unevenblocks_with_lots_of_evictions.png Testing BucketCache, my heap filled after running for hours. Dumping heap, culprit is the ramCache Map in BucketCache. Tried running with more writer threads but made no difference. Trying to figure now how our accounting is going wonky. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11661) Quickstart chapter claims standalone mode has multiple processes
[ https://issues.apache.org/jira/browse/HBASE-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-11661: -- Resolution: Fixed Fix Version/s: 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to master. Thanks [~busbey+li...@cloudera.com] ([~misty] for review) Quickstart chapter claims standalone mode has multiple processes Key: HBASE-11661 URL: https://issues.apache.org/jira/browse/HBASE-11661 Project: HBase Issue Type: Bug Components: documentation Reporter: Sean Busbey Assignee: Sean Busbey Priority: Minor Fix For: 2.0.0 Attachments: HBASE_11661-v1.patch The quickstart chapter on launching a standalone hbase says to validate the launch by checking on both the HMaster and HRegionServer processes: {quote} 4. The bin/start-hbase.sh script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that you have one running process called HMaster and at least one called HRegionServer. {quote} This is incorrect. In standalone mode there is only a single HMaster process. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11659) Region state RPC call is not idempotent
[ https://issues.apache.org/jira/browse/HBASE-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089654#comment-14089654 ] stack commented on HBASE-11659: --- [~jxiang] Commit? Region state RPC call is not idempotent --- Key: HBASE-11659 URL: https://issues.apache.org/jira/browse/HBASE-11659 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Virag Kothari Assignee: Virag Kothari Attachments: HBASE-11659.patch Here is the scenario on 0.98 with zk-less assignment: The master gets an OPEN RPC call from a region server, so it moves the region state from PENDING_OPEN to OPEN. But the call times out on the region server, and the region server retries sending the OPEN call. However, now the master throws an Exception saying the region is not PENDING_OPEN. So the region server aborts the region on receiving that exception and sends FAILED_OPEN to the master. But the master cannot change its state from FAILED_OPEN to OPEN, so eventually the master keeps the state as OPEN while the actual region is no longer open on the region server. The master should not throw an exception on receiving OPEN RPC calls multiple times. -- This message was sent by Atlassian JIRA (v6.2#6252)
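The idempotency requirement in the last sentence can be sketched as a state-transition check that accepts a repeated OPEN report instead of rejecting it. This is a minimal illustration, not the actual AssignmentManager code; only the state names come from the report:

```java
// Sketch: an idempotent OPEN handler. A retried OPEN report for a region
// already marked OPEN is acknowledged as a no-op rather than rejected,
// so an RPC timeout plus retry cannot abort a healthy region.
public class IdempotentOpen {
    enum State { PENDING_OPEN, OPEN, FAILED_OPEN }

    static State state = State.PENDING_OPEN;

    /** Returns true if the OPEN report is accepted. */
    static boolean onOpenReport() {
        if (state == State.PENDING_OPEN || state == State.OPEN) {
            state = State.OPEN; // first report transitions; a retry is a no-op
            return true;
        }
        return false;           // genuinely unexpected, e.g. FAILED_OPEN
    }

    public static void main(String[] args) {
        System.out.println(onOpenReport()); // true: PENDING_OPEN -> OPEN
        System.out.println(onOpenReport()); // true: retried RPC, still accepted
    }
}
```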
[jira] [Updated] (HBASE-11333) Remove deprecated class MetaMigrationConvertingToPB
[ https://issues.apache.org/jira/browse/HBASE-11333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-11333: -- Attachment: 11333v2.txt Patch to purge this stuff from master branch. Remove deprecated class MetaMigrationConvertingToPB --- Key: HBASE-11333 URL: https://issues.apache.org/jira/browse/HBASE-11333 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.99.0 Reporter: Mikhail Antonov Assignee: Mikhail Antonov Priority: Trivial Fix For: 0.99.0 Attachments: 11333v2.txt, HBASE-11333.patch MetaMigrationConvertingToPB is marked deprecated and to be deleted next major release after 0.96. Is that the time? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11613) get_counter shell command is not displaying the result for counter columns.
[ https://issues.apache.org/jira/browse/HBASE-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089684#comment-14089684 ] stack commented on HBASE-11613: --- [~sreenivasulureddy] You see JMs comment above asking for info? get_counter shell command is not displaying the result for counter columns. - Key: HBASE-11613 URL: https://issues.apache.org/jira/browse/HBASE-11613 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.98.3 Reporter: Y. SREENIVASULU REDDY Priority: Minor Perform the following operations in the HBase shell prompt. 1. Create a table with one column family. 2. Insert some data into the table. 3. Perform an increment operation on any column qualifier, eg: incr 't', 'r1', 'f:c1' 4. Then run the get_counter query; it throws a no-counter-found message to the user. {code} eg: hbase(main):010:0> get_counter 't', 'r1', 'f', 'c1' No counter found at specified coordinates {code} And a wrong message is shown to the user while executing the get_counter query. {code} hbase(main):009:0> get_counter 't', 'r1', 'f' ERROR: wrong number of arguments (3 for 4) Here is some help for this command: Return a counter cell value at specified table/row/column coordinates. A cell cell should be managed with atomic increment function oh HBase and the data should be binary encoded. Example: hbase> get_counter 'ns1:t1', 'r1', 'c1' hbase> get_counter 't1', 'r1', 'c1' The same commands also can be run on a table reference. Suppose you had a reference t to table 't1', the corresponding command would be: hbase> t.get_counter 'r1', 'c1' {code} {code} Problem: the examples give 3 arguments but the command asks for 4. If run with 3 arguments it throws an error; if run with 4 arguments, the No counter found at specified coordinates message is shown even though the counter is specified. {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11333) Remove deprecated class MetaMigrationConvertingToPB
[ https://issues.apache.org/jira/browse/HBASE-11333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089687#comment-14089687 ] Hadoop QA commented on HBASE-11333: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660442/11333v2.txt against trunk revision . ATTACHMENT ID: 12660442 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10341//console This message is automatically generated. Remove deprecated class MetaMigrationConvertingToPB --- Key: HBASE-11333 URL: https://issues.apache.org/jira/browse/HBASE-11333 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.99.0 Reporter: Mikhail Antonov Assignee: Mikhail Antonov Priority: Trivial Fix For: 0.99.0 Attachments: 11333v2.txt, HBASE-11333.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11698) TestFullLogReconstruction has duplicate config setting
Sean Busbey created HBASE-11698: --- Summary: TestFullLogReconstruction has duplicate config setting Key: HBASE-11698 URL: https://issues.apache.org/jira/browse/HBASE-11698 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.98.5 Reporter: Sean Busbey Priority: Trivial The 0.98 branch's version of TestFullLogReconstruction has a duplicate config setting during setupBeforeClass {code} 54 // faster failover with cluster.shutdown();fs.close() idiom 55 c.setInt("hbase.ipc.client.connect.max.retries", 1); 56 c.setInt("hbase.ipc.client.connect.max.retries", 1); {code} the 0.98.4 release has a line for setting this config and one for the HDFS version. the branch-1 and master versions of this set of patches have only the setting of the hbase one, so I think that's the correct behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11659) Region state RPC call is not idempotent
[ https://issues.apache.org/jira/browse/HBASE-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089707#comment-14089707 ] Virag Kothari commented on HBASE-11659: --- Let me update based on Jimmy's comments and also will add a unit test. Region state RPC call is not idempotent --- Key: HBASE-11659 URL: https://issues.apache.org/jira/browse/HBASE-11659 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Virag Kothari Assignee: Virag Kothari Attachments: HBASE-11659.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11598) Add simple rpc throttling
[ https://issues.apache.org/jira/browse/HBASE-11598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089720#comment-14089720 ] Matteo Bertozzi commented on HBASE-11598: - Switched to two methods only: set_quota and list_quotas. To add/replace/update a quota setting you use set_quota with all the options to tune a particular quota type. {noformat} hbase> set_quota TYPE => THROTTLE, USER => 'bob', LIMIT => '10req/sec' hbase> set_quota TYPE => THROTTLE, USER => 'bob', TABLE => 't1', LIMIT => '64M/min' hbase> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => '100req/sec' hbase> set_quota TYPE => THROTTLE, NAMESPACE => 'n1', LIMIT => '100M/min' {noformat} To remove something you just set the limit to NONE {noformat} hbase> set_quota TYPE => THROTTLE, USER => 'bob', LIMIT => NONE {noformat} To list the quotas you use list_quotas (which is a scanner) and you get all the details; you can specify filters {noformat} hbase> list_quotas OWNER QUOTAS NAMESPACE => n1 {TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 100M/min, SCOPE => MACHINE} TABLE => t2 {TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_NUMBER, LIMIT => 100req/sec, SCOPE => MACHINE} USER => bob {TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_NUMBER, LIMIT => 10req/sec, SCOPE => MACHINE} USER => bob, TABLE => t1 {TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 64M/min, SCOPE => MACHINE} 4 row(s) in 0.1360 seconds hbase> list_quotas USER => 'bo.*' OWNER QUOTAS USER => bob {TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_NUMBER, LIMIT => 10req/sec, SCOPE => MACHINE} USER => bob, TABLE => t1 {TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 64M/min, SCOPE => MACHINE} 2 row(s) in 0.0210 seconds {noformat} Add simple rpc throttling - Key: HBASE-11598 URL: https://issues.apache.org/jira/browse/HBASE-11598 Project: HBase Issue Type: New Feature Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 1.0.0, 2.0.0 Add a simple version of rpc throttling. 
(by simple I mean something that requires as few changes as possible to the core) The idea is to add a hbase:quota table to store the user/table quota information. Add a couple of APIs on the client like throttleUser() and throttleTable(), and on the server side before executing the request we check the quota; if it is exceeded, an exception is thrown. The quota will be per-machine. There will be a flag QuotaScope that will be used in the future to specify the quota at cluster level instead of per machine. (A limit of 100req/min means that each machine can execute 100req/min with a per-machine scope). This will be the first cut, a simple solution that requires very few changes to the core. Later on we can make the client aware of the ThrottlingException and deal with it in a smarter way. Also we need to change the RPC code a bit to be able to yield the operation if the quota will be available not too far in the future, and avoid going back to the client for a few seconds. REVIEW BOARD: https://reviews.apache.org/r/23981 -- This message was sent by Atlassian JIRA (v6.2#6252)
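The per-machine "100req/min" semantics described above amount to a windowed counter. A minimal sketch follows; the class and method names are illustrative, not the HBase quota API:

```java
// Sketch: a per-machine fixed-window throttle in the spirit of the
// "100req/min" quotas above. Names are hypothetical, not the HBase API.
public class SimpleThrottle {
    private final long limit;       // requests allowed per window
    private final long windowNanos; // window length
    private long windowStart;
    private long used;

    SimpleThrottle(long limit, long windowNanos) {
        this.limit = limit;
        this.windowNanos = windowNanos;
        this.windowStart = System.nanoTime();
    }

    /** Returns true if the request fits in the current window's quota. */
    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        if (now - windowStart >= windowNanos) { // window rolled over: reset
            windowStart = now;
            used = 0;
        }
        if (used < limit) {
            used++;
            return true;
        }
        return false; // a server would surface this as a ThrottlingException
    }

    public static void main(String[] args) {
        SimpleThrottle t = new SimpleThrottle(3, java.util.concurrent.TimeUnit.MINUTES.toNanos(1));
        for (int i = 0; i < 4; i++) {
            System.out.println(t.tryAcquire()); // true, true, true, then false
        }
    }
}
```

The "yield until the quota frees up" refinement mentioned above would replace the false branch with a short server-side wait when the window is about to roll over.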
[jira] [Updated] (HBASE-11333) Remove deprecated class MetaMigrationConvertingToPB
[ https://issues.apache.org/jira/browse/HBASE-11333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-11333: -- Attachment: 11333v3.txt What I committed (just removing stuff). Remove deprecated class MetaMigrationConvertingToPB --- Key: HBASE-11333 URL: https://issues.apache.org/jira/browse/HBASE-11333 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.99.0 Reporter: Mikhail Antonov Assignee: Mikhail Antonov Priority: Trivial Fix For: 2.0.0 Attachments: 11333v2.txt, 11333v3.txt, HBASE-11333.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11333) Remove deprecated class MetaMigrationConvertingToPB
[ https://issues.apache.org/jira/browse/HBASE-11333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-11333: -- Resolution: Fixed Fix Version/s: (was: 0.99.0) 2.0.0 Status: Resolved (was: Patch Available) Pushed to master. Remove deprecated class MetaMigrationConvertingToPB --- Key: HBASE-11333 URL: https://issues.apache.org/jira/browse/HBASE-11333 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.99.0 Reporter: Mikhail Antonov Assignee: Mikhail Antonov Priority: Trivial Fix For: 2.0.0 Attachments: 11333v2.txt, 11333v3.txt, HBASE-11333.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11611) Clean up ZK-based region assignment
[ https://issues.apache.org/jira/browse/HBASE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-11611: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) HBase master jenkins is blue again. Let's keep it so. Clean up ZK-based region assignment --- Key: HBASE-11611 URL: https://issues.apache.org/jira/browse/HBASE-11611 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 2.0.0 Attachments: hbase-11611.addendum, hbase-11611.patch, hbase-11611_v1.patch, hbase-11611_v2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11697) Improve the 'Too many blocks' message on UI blockcache status page
[ https://issues.apache.org/jira/browse/HBASE-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-11697: Attachment: HBASE-11697.patch Something like that? Improve the 'Too many blocks' message on UI blockcache status page -- Key: HBASE-11697 URL: https://issues.apache.org/jira/browse/HBASE-11697 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Priority: Minor Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11697.patch, TooManyBlocks.png If metrics calculations over blockcache contents stopped after examining hbase.ui.blockcache.by.file.max items, the UI will put up a message. However, this notion of too many blocks / fullness refers to structures used for calculating blockcache metrics. See BlockCacheUtil. We should improve this message so it does not leave a user the impression the blockcache may be in a bad state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11659) Region state RPC call is not idempotent
[ https://issues.apache.org/jira/browse/HBASE-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089770#comment-14089770 ] Jimmy Xiang commented on HBASE-11659: - Cool, a unit test really helps. Thanks. Region state RPC call is not idempotent --- Key: HBASE-11659 URL: https://issues.apache.org/jira/browse/HBASE-11659 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Virag Kothari Assignee: Virag Kothari Attachments: HBASE-11659.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6626) Add a chapter on HDFS in the troubleshooting section of the HBase reference guide.
[ https://issues.apache.org/jira/browse/HBASE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089791#comment-14089791 ] Jonathan Hsieh commented on HBASE-6626: --- Lgtm misty. I'm going to commit this -- it is an improvement on the current vacuous state. We can add more improvements in follow up issues. Committed to trunk/branch-1. Doesn't apply cleanly to 0.98 but i think we are ok just committing the docs to branch-1/master since that is where they are pulled from these days. Add a chapter on HDFS in the troubleshooting section of the HBase reference guide. -- Key: HBASE-6626 URL: https://issues.apache.org/jira/browse/HBASE-6626 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Misty Stanley-Jones Priority: Blocker Fix For: 0.99.0, 2.0.0 Attachments: HBASE-6626.patch, troubleshooting.txt I looked mainly at the major failure case, but here is what I have: New sub chapter in the existing chapter Troubleshooting and Debugging HBase: HDFS & HBase 1) HDFS & HBase 2) Connection related settings 2.1) Number of retries 2.2) Timeouts 3) Log samples 1) HDFS & HBase HBase uses HDFS to store its HFiles, i.e. the core HBase files and the Write-Ahead-Logs, i.e. the files that will be used to restore the data after a crash. In both cases, the reliability of HBase comes from the fact that HDFS writes the data to multiple locations. To be efficient, HBase needs the data to be available locally, hence it's highly recommended to have the HDFS datanode on the same machines as the HBase Region Servers. Detailed information on how HDFS works can be found at [1]. Important features are: - HBase is a client application of HDFS, i.e. uses the HDFS DFSClient class. This class can appear in HBase logs with other HDFS client related logs. - Some HDFS settings are HDFS-server-side, i.e.
must be set on the HDFS side, while some others are HDFS-client-side, i.e. must be set in HBase, while some others must be set in both places. - the HDFS writes are pipelined from one datanode to another. When writing, there are communications between: - HBase and HDFS namenode, through the HDFS client classes. - HBase and HDFS datanodes, through the HDFS client classes. - HDFS datanodes between themselves: issues on these communications are in the HDFS logs, not HBase's. HDFS writes are always local when possible. As a consequence, there should not be many write errors in HBase Region Servers: they write to the local datanode. If this datanode can't replicate the blocks, that will appear in its logs, not in the region server logs. - datanodes can be contacted through the ipc.Client interface (once again this class can show up in HBase logs) and the data transfer interface (usually showing up as the DataNode class in the HBase logs). They are on different ports (defaults: 50010 and 50020). - To understand exactly what's going on, you must look at the HDFS log files as well: HBase logs represent the client side. - With the default settings, HDFS needs 630s to mark a datanode as dead. Until HDFS definitively decides it's dead, the node will still be tried by HBase or by other datanodes when writing and reading, which will add some extra lines to the logs. This monitoring is performed by the NameNode. - The HDFS clients (i.e. HBase using HDFS client code) don't fully rely on the NameNode, but can temporarily mark a node as dead if they had an error when they tried to use it. 2) Settings for retries and timeouts 2.1) Retries ipc.client.connect.max.retries Default 10 Indicates the number of retries a client will make to establish a server connection. Not taken into account if the error is a SocketTimeout; in that case the number of retries is 45 (fixed on branch, HADOOP-7932 or in HADOOP-7397). For SASL, the number of retries is hard-coded to 15. Can be increased, especially if the socket timeouts have been lowered. 
ipc.client.connect.max.retries.on.timeouts Default 45 If you have HADOOP-7932, the max number of retries on timeout. This counter is separate from ipc.client.connect.max.retries, so if you mix socket errors you will get 55 retries with the default values. Could be lowered, once it is available. With HADOOP-7397 ipc.client.connect.max.retries is reused, so there would be 10 tries. dfs.client.block.write.retries Default 3 Number of tries for the client when writing a block. After a failure, the client will connect to the namenode and get a new location, sending the list of the datanodes already tried without success. Could be increased, especially if
[jira] [Updated] (HBASE-6626) Add a chapter on HDFS in the troubleshooting section of the HBase reference guide.
[ https://issues.apache.org/jira/browse/HBASE-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-6626: -- Resolution: Fixed Fix Version/s: 2.0.0 0.99.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Add a chapter on HDFS in the troubleshooting section of the HBase reference guide. -- Key: HBASE-6626 URL: https://issues.apache.org/jira/browse/HBASE-6626 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Misty Stanley-Jones Priority: Blocker Fix For: 0.99.0, 2.0.0 Attachments: HBASE-6626.patch, troubleshooting.txt I looked mainly at the major failure case, but here is what I have: New sub chapter in the existing chapter Troubleshooting and Debugging HBase: HDFS HBASE 1) HDFS HBase 2) Connection related settings 2.1) Number of retries 2.2) Timeouts 3) Log samples 1) HDFS HBase HBase uses HDFS to store its HFile, i.e. the core HBase files and the Write-Ahead-Logs, i.e. the files that will be used to restore the data after a crash. In both cases, the reliability of HBase comes from the fact that HDFS writes the data to multiple locations. To be efficient, HBase needs the data to be available locally, hence it's highly recommended to have the HDFS datanode on the same machines as the HBase Region Servers. Detailed information on how HDFS works can be found at [1]. Important features are: - HBase is a client application of HDFS, i.e. uses the HDFS DFSClient class. This class can appears in HBase logs with other HDFS client related logs. - Some HDFS settings are HDFS-server-side, i.e. must be set on the HDFS side, while some other are HDFS-client-side, i.e. must be set in HBase, while some other must be set in both places. - the HDFS writes are pipelined from one datanode to another. When writing, there are communications between: - HBase and HDFS namenode, through the HDFS client classes. 
- HBase and HDFS datanodes, through the HDFS client classes. - HDFS datanode between themselves: issues on these communications are in HDFS logs, not HBase. HDFS writes are always local when possible. As a consequence, there should not be much write error in HBase Region Servers: they write to the local datanode. If this datanode can't replicate the blocks, it will appear in its logs, not in the region servers logs. - datanodes can be contacted through the ipc.Client interface (once again this class can shows up in HBase logs) and the data transfer interface (usually shows up as the DataNode class in the HBase logs). There are on different ports (defaults being: 50010 and 50020). - To understand exactly what's going on, you must look that the HDFS log files as well: HBase logs represent the client side. - With the default setting, HDFS needs 630s to mark a datanode as dead. For this reason, this node will still be tried by HBase or by other datanodes when writing and reading until HDFS definitively decides it's dead. This will add some extras lines in the logs. This monitoring is performed by the NameNode. - The HDFS clients (i.e. HBase using HDFS client code) don't fully rely on the NameNode, but can mark temporally a node as dead if they had an error when they tried to use it. 2) Settings for retries and timeouts 2.1) Retries ipc.client.connect.max.retries Default 10 Indicates the number of retries a client will make to establish a server connection. Not taken into account if the error is a SocketTimeout. In this case the number of retries is 45 (fixed on branch, HADOOP-7932 or in HADOOP-7397). For SASL, the number of retries is hard-coded to 15. Can be increased, especially if the socket timeouts have been lowered. ipc.client.connect.max.retries.on.timeouts Default 45 If you have HADOOP-7932, max number of retries on timeout. Counter is different than ipc.client.connect.max.retries so if you mix the socket errors you will get 55 retries with the default values. 
Could be lowered, once it is available. With HADOOP-7397, ipc.client.connect.max.retries is reused, so there would be 10 tries.

dfs.client.block.write.retries
Default 3
Number of tries for the client when writing a block. After a failure, it will connect to the namenode and get a new location, sending the list of the datanodes already tried without success. Could be increased, especially if the socket timeouts have been lowered. See HBASE-6490.

dfs.client.block.write.locateFollowingBlock.retries
Default 5
Number of retries to the namenode when the client gets NotReplicatedYetException, i.e. the existing
[jira] [Updated] (HBASE-11673) TestIOFencing#testFencingAroundCompactionAfterWALSync fails
[ https://issues.apache.org/jira/browse/HBASE-11673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-11673: --- Attachment: testFencingAroundCompactionAfterWALSync.tar.gz test output from Jenkins. TestIOFencing#testFencingAroundCompactionAfterWALSync fails --- Key: HBASE-11673 URL: https://issues.apache.org/jira/browse/HBASE-11673 Project: HBase Issue Type: Test Reporter: Qiang Tian Assignee: Sergey Soldatov Fix For: 2.0.0 Attachments: HBASE_11673-v1.patch, testFencingAroundCompactionAfterWALSync.tar.gz got several test failure on the latest build: {quote} [tianq@bdvm101 surefire-reports]$ ls -1t|grep Tests run * |grep FAILURE org.apache.hadoop.hbase.client.TestReplicasClient.txt:Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 38.706 sec FAILURE! org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas.txt:Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 30.669 sec FAILURE! org.apache.hadoop.hbase.regionserver.TestRegionReplicas.txt:Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 39.113 sec FAILURE! org.apache.hadoop.hbase.TestIOFencing.txt:Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 177.071 sec FAILURE! 
{quote} the first one: {quote} failure message=Timed out waiting for the region to flush type=java.lang.AssertionErrorjava.lang.AssertionError: Timed out waiting for the region to flush -at org.junit.Assert.fail(Assert.java:88) -at org.junit.Assert.assertTrue(Assert.java:41) -at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:291) -at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompactionAfterWALSync(TestIOFencing.java:236) -at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) -at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) -at java.lang.reflect.Method.invoke(Method.java:606) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HBASE-3361) Modularize Maven Structure for Tests
[ https://issues.apache.org/jira/browse/HBASE-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-3361: - Assignee: Jonathan Hsieh (was: Misty Stanley-Jones) Modularize Maven Structure for Tests Key: HBASE-3361 URL: https://issues.apache.org/jira/browse/HBASE-3361 Project: HBase Issue Type: Improvement Components: documentation Reporter: Ed Kohlwey Assignee: Jonathan Hsieh Attachments: HBASE-3361.patch There's a few reasons to break tests out into their own module: 1. Allowing maven users to easily re-consume test utilities as part of a test package which doesn't pollute the runtime classpath 2. Putting integration tests (tests that create or require a cluster) in their own module allows users to easily rebuild and test the core of HBase without running long-running tests, reducing the developer iteration loop After some discussions with Stack on IRC, it sounds like there was some historic investigation of this which was abandoned because the module system was becoming too complex. I'd suggest that rather than trying to break out components all at once into their modules, evaluate creation of modules on a case-by-case basis and only create them when there's a significant use case justification. I created a sample of what I'm thinking about (based on the current trunk) and posted it on github git://github.com/ekohlwey/modularized-hbase.git -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-3361) Modularize Maven Structure for Tests
[ https://issues.apache.org/jira/browse/HBASE-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089807#comment-14089807 ] Jonathan Hsieh commented on HBASE-3361: --- i don't think this is correct or relevant any more (this is from 0.92/0.90 days before we modularized the maven build). I'm just going to close this won't fix. Modularize Maven Structure for Tests Key: HBASE-3361 URL: https://issues.apache.org/jira/browse/HBASE-3361 Project: HBase Issue Type: Improvement Components: documentation Reporter: Ed Kohlwey Assignee: Misty Stanley-Jones Attachments: HBASE-3361.patch There's a few reasons to break tests out into their own module: 1. Allowing maven users to easily re-consume test utilities as part of a test package which doesn't pollute the runtime classpath 2. Putting integration tests (tests that create or require a cluster) in their own module allows users to easily rebuild and test the core of HBase without running long-running tests, reducing the developer iteration loop After some discussions with Stack on IRC, it sounds like there was some historic investigation of this which was abandoned because the module system was becoming too complex. I'd suggest that rather than trying to break out components all at once into their modules, evaluate creation of modules on a case-by-case basis and only create them when there's a significant use case justification. I created a sample of what I'm thinking about (based on the current trunk) and posted it on github git://github.com/ekohlwey/modularized-hbase.git -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-3361) Modularize Maven Structure for Tests
[ https://issues.apache.org/jira/browse/HBASE-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-3361: -- Resolution: Won't Fix Status: Resolved (was: Patch Available) Modularize Maven Structure for Tests Key: HBASE-3361 URL: https://issues.apache.org/jira/browse/HBASE-3361 Project: HBase Issue Type: Improvement Components: documentation Reporter: Ed Kohlwey Assignee: Misty Stanley-Jones Attachments: HBASE-3361.patch There's a few reasons to break tests out into their own module: 1. Allowing maven users to easily re-consume test utilities as part of a test package which doesn't pollute the runtime classpath 2. Putting integration tests (tests that create or require a cluster) in their own module allows users to easily rebuild and test the core of HBase without running long-running tests, reducing the developer iteration loop After some discussions with Stack on IRC, it sounds like there was some historic investigation of this which was abandoned because the module system was becoming too complex. I'd suggest that rather than trying to break out components all at once into their modules, evaluate creation of modules on a case-by-case basis and only create them when there's a significant use case justification. I created a sample of what I'm thinking about (based on the current trunk) and posted it on github git://github.com/ekohlwey/modularized-hbase.git -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11697) Improve the 'Too many blocks' message on UI blockcache status page
[ https://issues.apache.org/jira/browse/HBASE-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11697: --- Assignee: Mikhail Antonov (was: Andrew Purtell) Improve the 'Too many blocks' message on UI blockcache status page -- Key: HBASE-11697 URL: https://issues.apache.org/jira/browse/HBASE-11697 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Mikhail Antonov Priority: Minor Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11697.patch, TooManyBlocks.png If metrics calculations over blockcache contents stopped after examining hbase.ui.blockcache.by.file.max items, the UI will put up a message. However, this notion of too many blocks / fullness refers to structures used for calculating blockcache metrics. See BlockCacheUtil. We should improve this message so it does not leave a user the impression the blockcache may be in a bad state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HBASE-11697) Improve the 'Too many blocks' message on UI blockcache status page
[ https://issues.apache.org/jira/browse/HBASE-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reassigned HBASE-11697: -- Assignee: Andrew Purtell Improve the 'Too many blocks' message on UI blockcache status page -- Key: HBASE-11697 URL: https://issues.apache.org/jira/browse/HBASE-11697 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11697.patch, TooManyBlocks.png If metrics calculations over blockcache contents stopped after examining hbase.ui.blockcache.by.file.max items, the UI will put up a message. However, this notion of too many blocks / fullness refers to structures used for calculating blockcache metrics. See BlockCacheUtil. We should improve this message so it does not leave a user the impression the blockcache may be in a bad state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11697) Improve the 'Too many blocks' message on UI blockcache status page
[ https://issues.apache.org/jira/browse/HBASE-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089826#comment-14089826 ] Andrew Purtell commented on HBASE-11697: +1 Will commit shortly unless objection. Thanks for the patch [~mantonov]! Improve the 'Too many blocks' message on UI blockcache status page -- Key: HBASE-11697 URL: https://issues.apache.org/jira/browse/HBASE-11697 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Mikhail Antonov Priority: Minor Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11697.patch, TooManyBlocks.png If metrics calculations over blockcache contents stopped after examining hbase.ui.blockcache.by.file.max items, the UI will put up a message. However, this notion of too many blocks / fullness refers to structures used for calculating blockcache metrics. See BlockCacheUtil. We should improve this message so it does not leave a user the impression the blockcache may be in a bad state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11697) Improve the 'Too many blocks' message on UI blockcache status page
[ https://issues.apache.org/jira/browse/HBASE-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089830#comment-14089830 ] Mikhail Antonov commented on HBASE-11697: - Sure - thanks [~apurtell]! Improve the 'Too many blocks' message on UI blockcache status page -- Key: HBASE-11697 URL: https://issues.apache.org/jira/browse/HBASE-11697 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Mikhail Antonov Priority: Minor Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11697.patch, TooManyBlocks.png If metrics calculations over blockcache contents stopped after examining hbase.ui.blockcache.by.file.max items, the UI will put up a message. However, this notion of too many blocks / fullness refers to structures used for calculating blockcache metrics. See BlockCacheUtil. We should improve this message so it does not leave a user the impression the blockcache may be in a bad state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11678) BucketCache ramCache fills heap after running a few hours
[ https://issues.apache.org/jira/browse/HBASE-11678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089836#comment-14089836 ] Hadoop QA commented on HBASE-11678: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660434/11678v3.txt against trunk revision . ATTACHMENT ID: 12660434 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.TestRegionRebalancing Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10340//console This message is automatically generated. 
BucketCache ramCache fills heap after running a few hours - Key: HBASE-11678 URL: https://issues.apache.org/jira/browse/HBASE-11678 Project: HBase Issue Type: Bug Components: BlockCache Affects Versions: 0.99.0, 0.98.5, 2.0.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: 0001-When-we-failed-add-an-entry-failing-with-a-CacheFull.patch, 11678v2.txt, 11678v2.txt, 11678v3.txt, 11678v3.txt, gc_crash_unevenblocks_with_lots_of_evictions.png, gc_over_12_hours_unevenblocks_with_lots_of_evictions.png Testing BucketCache, my heap filled after running for hours. Dumping heap, culprit is the ramCache Map in BucketCache. Tried running with more writer threads but made no difference. Trying to figure now how our accounting is going wonky. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11476) Expand 'Conceptual View' section of Data Model chapter
[ https://issues.apache.org/jira/browse/HBASE-11476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089851#comment-14089851 ] Jonathan Hsieh commented on HBASE-11476: Let's give this another go -- here's some feedback for the next rev. {quote} + <para>A row in HBase consists of a row key and one or more column families. If you think +of a row as a key-value pair, the column families are the value.</para> {quote} Might be simpler to say a row consists of a rowkey and one or more columns with values associated with them. {quote} + <para>A column family loosely corresponds to a type of data. Each row in a table has the +same column families, though a given row might not store anything in a given column +family. If an HBase table is a multi-dimensional map, the column family is a second +dimension.</para> {quote} How about something more like this: Column families physically colocate a set of columns and their values, often for performance reasons. Each column family has a set of storage properties (in-memory caching, compression, data block encoding, etc). {quote} + <para>A timestamp is written alongside each value, and is the identifier for a given +version of a value. By default, the timestamp represents the time on the RegionServer +when the data was written, but you can specify a different timestamp value when you put +data into the cell.</para> {quote} We should probably say timestamps are an advanced feature, only exposed for use in special cases that are deeply aware of and integrated with HBase. Direct use of these is discouraged -- encoding a timestamp at the application level is generally preferred. I'll do another pass that looks at the tables/examples. 
Expand 'Conceptual View' section of Data Model chapter --- Key: HBASE-11476 URL: https://issues.apache.org/jira/browse/HBASE-11476 Project: HBase Issue Type: Bug Components: documentation Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Attachments: HBASE-11476.patch Could use some updating and expansion to emphasize the differences between HBase and an RDBMS. I found http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable which is just excellent and we should link to it. -- This message was sent by Atlassian JIRA (v6.2#6252)
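The conceptual view being revised above is often illustrated as a sorted map of maps (for instance in the "Understanding HBase and BigTable" article the issue links to). A toy model, not HBase client code:

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the HBase conceptual view: rowkey -> family -> qualifier
// -> (timestamp -> value), with timestamps sorted newest-first so a read
// returns the latest version of a cell by default.
public class ConceptualView {
    static final TreeMap<String, TreeMap<String, TreeMap<String, NavigableMap<Long, String>>>> table = new TreeMap<>();

    static void put(String row, String family, String qual, long ts, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             .computeIfAbsent(qual, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
             .put(ts, value);
    }

    /** Latest version of a cell, or null if the cell is absent. */
    static String get(String row, String family, String qual) {
        TreeMap<String, TreeMap<String, NavigableMap<Long, String>>> families = table.get(row);
        if (families == null) return null;
        TreeMap<String, NavigableMap<Long, String>> quals = families.get(family);
        if (quals == null) return null;
        NavigableMap<Long, String> versions = quals.get(qual);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        put("row1", "cf", "a", 100L, "old");
        put("row1", "cf", "a", 200L, "new");
        System.out.println(get("row1", "cf", "a"));
    }
}
```

The nesting also shows why a row need not store anything in a given column family: an absent inner map is simply an empty cell, not a stored null.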
[jira] [Commented] (HBASE-11508) Document changes to IPC config parameters from HBASE-11492
[ https://issues.apache.org/jira/browse/HBASE-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089857#comment-14089857 ] Jonathan Hsieh commented on HBASE-11508: for these kinds of changes it is really helpful to say which versions are old and which are new. It seems we have one set of docs for all of the 0.94 line, one for the 0.98/1.0 line, and will have another for the 2.0 line -- making this clear will prevent confusion. Document changes to IPC config parameters from HBASE-11492 -- Key: HBASE-11508 URL: https://issues.apache.org/jira/browse/HBASE-11508 Project: HBase Issue Type: Sub-task Components: regionserver Reporter: Misty Stanley-Jones Assignee: Misty Stanley-Jones Fix For: 2.0.0 Attachments: HBASE-11492.patch, HBASE-11508-1.patch
[jira] [Assigned] (HBASE-9005) Improve documentation around KEEP_DELETED_CELLS, time range scans, and delete markers
[ https://issues.apache.org/jira/browse/HBASE-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-9005: - Assignee: Jonathan Hsieh (was: Misty Stanley-Jones) Improve documentation around KEEP_DELETED_CELLS, time range scans, and delete markers - Key: HBASE-9005 URL: https://issues.apache.org/jira/browse/HBASE-9005 Project: HBase Issue Type: Bug Components: documentation Reporter: Lars Hofhansl Assignee: Jonathan Hsieh Priority: Minor Fix For: 0.99.0 Attachments: 9005.txt, HBASE-9005-1.patch Without KEEP_DELETED_CELLS all timerange queries are broken if their range covers a delete marker. As some internal discussions with colleagues showed, this feature is not well understood or documented.
[jira] [Commented] (HBASE-9005) Improve documentation around KEEP_DELETED_CELLS, time range scans, and delete markers
[ https://issues.apache.org/jira/browse/HBASE-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089869#comment-14089869 ] Jonathan Hsieh commented on HBASE-9005: --- can you add a link to http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html#KEEP_DELETED_CELLS instead of the current generic link to HCD (which has no javadoc)? Pointing to it and saying it is a boolean makes it much more actionable. Bonus points if an example were provided to change it in the shell. (maybe not just here, but Improve documentation around KEEP_DELETED_CELLS, time range scans, and delete markers - Key: HBASE-9005 URL: https://issues.apache.org/jira/browse/HBASE-9005 Project: HBase Issue Type: Bug Components: documentation Reporter: Lars Hofhansl Assignee: Jonathan Hsieh Priority: Minor Fix For: 0.99.0 Attachments: 9005.txt, HBASE-9005-1.patch Without KEEP_DELETED_CELLS all timerange queries are broken if their range covers a delete marker. As some internal discussions with colleagues showed, this feature is not well understood or documented.
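The shell example asked for above might look like this (the table name 't1' and family name 'cf' are made up for illustration; in this era's shell, KEEP_DELETED_CELLS is a boolean column-family attribute):

```
hbase> alter 't1', NAME => 'cf', KEEP_DELETED_CELLS => true
hbase> describe 't1'
```

After the alter, describe should show the attribute set on the 'cf' family, so timerange scans over that family keep seeing cells covered by delete markers.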
[jira] [Created] (HBASE-11699) Region servers exclusion list to HMaster.
Gomathivinayagam Muthuvinayagam created HBASE-11699: --- Summary: Region servers exclusion list to HMaster. Key: HBASE-11699 URL: https://issues.apache.org/jira/browse/HBASE-11699 Project: HBase Issue Type: New Feature Components: Admin, Client, regionserver, Zookeeper Affects Versions: 0.98.3 Reporter: Gomathivinayagam Muthuvinayagam Priority: Minor Fix For: 0.98.3 Currently HBase does not support adding a set of region servers to an exclusion list, which would let administrators prevent accidental startups of region servers that would otherwise join the cluster. There was initially some work done, available in https://issues.apache.org/jira/browse/HBASE-3833, but it was not pursued after that. I am planning to contribute it as a patch, and I would like to make some improvements as well. Instead of storing the exclusion entries in a file, I am planning to store them in ZooKeeper. Can anyone suggest thoughts on this?
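The check the reporter proposes could be sketched like this. It is a toy model for discussion only: the exclusion list is represented as a plain in-memory Set rather than znode children, and any znode path or server-name format would be an implementation choice, not something decided on this issue:

```java
import java.util.Set;

// Toy model of the proposed feature: before letting a region server join,
// the master consults an exclusion list (which the reporter suggests
// keeping in ZooKeeper instead of a local file).
public class ExclusionCheck {
    private final Set<String> excluded;

    ExclusionCheck(Set<String> excluded) {
        this.excluded = excluded;
    }

    /** True if the region server (identified as host:port) may join. */
    boolean mayJoin(String serverName) {
        return !excluded.contains(serverName);
    }

    public static void main(String[] args) {
        ExclusionCheck check = new ExclusionCheck(Set.of("rs1.example.com:16020"));
        System.out.println(check.mayJoin("rs2.example.com:16020"));
    }
}
```

Storing the list in ZooKeeper rather than a file means every master (active or standby) sees the same list and updates take effect without a restart, which is presumably the motivation for the change.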
[jira] [Commented] (HBASE-11697) Improve the 'Too many blocks' message on UI blockcache status page
[ https://issues.apache.org/jira/browse/HBASE-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089882#comment-14089882 ] stack commented on HBASE-11697: --- +1 Thanks for cleaning up my scary message [~mantonov] Improve the 'Too many blocks' message on UI blockcache status page -- Key: HBASE-11697 URL: https://issues.apache.org/jira/browse/HBASE-11697 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Mikhail Antonov Priority: Minor Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11697.patch, TooManyBlocks.png If metrics calculations over blockcache contents stopped after examining hbase.ui.blockcache.by.file.max items, the UI will put up a message. However, this notion of too many blocks / fullness refers to structures used for calculating blockcache metrics. See BlockCacheUtil. We should improve this message so it does not leave a user the impression the blockcache may be in a bad state. -- This message was sent by Atlassian JIRA (v6.2#6252)