[jira] [Created] (HBASE-19960) Doc test timeouts and test categories in hbase2
stack created HBASE-19960: - Summary: Doc test timeouts and test categories in hbase2 Key: HBASE-19960 URL: https://issues.apache.org/jira/browse/HBASE-19960 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Fix For: 2.0.0-beta-2 Write up that Categories are no longer acted upon, that we no longer timeout test methods. Write up that if a test goes longer than ten minutes, it is killed. Make passing reference to how it used to be but don't spend much time on it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19959) How much RAM space is to be really consumed by the memstore?
Chance Li created HBASE-19959: - Summary: How much RAM space is to be really consumed by the memstore? Key: HBASE-19959 URL: https://issues.apache.org/jira/browse/HBASE-19959 Project: HBase Issue Type: Brainstorming Components: regionserver Reporter: Chance Li Let's consider this scenario, where memstoreLAB and ChunkPool are enabled and the max memstore size is 10G: after some time all pooled chunks have been created, and then all data is flushed. Now the memstore size is 0, but 10G of RAM is actually consumed. If we then keep writing big cells, which use the JVM heap rather than the chunk pool, the memstore size will grow back to 10G (maybe more because of overhead). At this point the RAM actually consumed is 20G (10G of pooled chunks + 10G of Java objects), but the max memstore size is only 10G. My point is that the max memstore size should account not only for the cell "size" but also for the RAM really used. This would be strict memory management: the max memstore size limits the RAM space that the memstore and related modules can use. This scenario rarely occurs in practice; it's just about making the memory management semantically robust. What do you think? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
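The double-counting gap described in this scenario can be sketched with a few lines of arithmetic. This is an illustrative toy, not HBase code; the class and method names are hypothetical, and the 10G figures come straight from the scenario:

```java
// Illustrative only -- not HBase code. Models the gap between the
// memstore's tracked "size" (cell data only) and actual RAM usage
// (retained pooled chunks plus on-heap cells).
public class MemstoreAccountingSketch {
    static final long GB = 1L << 30;

    // Tracked memstore size: only live cell bytes are counted.
    static long trackedSize(long cellBytesOnHeap) {
        return cellBytesOnHeap;
    }

    // Actual RAM: pooled chunks stay allocated after a flush, and
    // big cells bypass the chunk pool and land on the JVM heap too.
    static long actualRam(long pooledChunkBytes, long cellBytesOnHeap) {
        return pooledChunkBytes + cellBytesOnHeap;
    }

    public static void main(String[] args) {
        long pooled = 10 * GB;   // chunk pool fully populated, then flushed
        long bigCells = 10 * GB; // big cells written afterwards, on heap
        System.out.println("tracked=" + trackedSize(bigCells) / GB
            + "G actual=" + actualRam(pooled, bigCells) / GB + "G");
    }
}
```

The tracked figure stays at the configured 10G cap while actual RAM is 20G, which is exactly the mismatch the proposal wants the limit to cover.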
[jira] [Resolved] (HBASE-19958) General framework to transit sync replication state
[ https://issues.apache.org/jira/browse/HBASE-19958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-19958. --- Resolution: Invalid Network lag... > General framework to transit sync replication state > --- > > Key: HBASE-19958 > URL: https://issues.apache.org/jira/browse/HBASE-19958 > Project: HBase > Issue Type: Sub-task >Reporter: Duo Zhang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19957) General framework to transit sync replication state
Duo Zhang created HBASE-19957: - Summary: General framework to transit sync replication state Key: HBASE-19957 URL: https://issues.apache.org/jira/browse/HBASE-19957 Project: HBase Issue Type: Sub-task Reporter: Duo Zhang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19958) General framework to transit sync replication state
Duo Zhang created HBASE-19958: - Summary: General framework to transit sync replication state Key: HBASE-19958 URL: https://issues.apache.org/jira/browse/HBASE-19958 Project: HBase Issue Type: Sub-task Reporter: Duo Zhang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19956) Remove category as a consideration timing out tests; set all test to timeout at 10minutes regardless
stack created HBASE-19956: - Summary: Remove category as a consideration timing out tests; set all test to timeout at 10minutes regardless Key: HBASE-19956 URL: https://issues.apache.org/jira/browse/HBASE-19956 Project: HBase Issue Type: Sub-task Reporter: stack Appy suggestion from parent issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Anonymous survey: Apache HBase 1.x Usage
Response to the survey so far. I think it confirms our expectations. Multiple choice was allowed so percentages will not add up to 100%. 1.0: 8% 1.1: 21% 1.2: 47% 1.3: 24% 1.4: 8% 1.5: 5% On Fri, Feb 2, 2018 at 3:40 PM, Andrew Purtell wrote: > Please take this anonymous survey to > let us know what version of Apache HBase 1.x you are using in production > now or are planning to use in production in the next year or so. > > Multiple choices are allowed. > > There is no "I'm not using 1.x" choice. Consider upgrading! (smile) > > https://www.surveymonkey.com/r/8WQ8QY6 >
[ANNOUNCE] Apache HBase 1.4.1 is now available for download
The HBase team is happy to announce the immediate availability of Apache HBase 1.4.1! Apache HBase is an open-source, distributed, versioned, non-relational database. Apache HBase gives you low latency random access to billions of rows with millions of columns atop non-specialized hardware. To learn more about HBase, see https://hbase.apache.org/. Download through an ASF mirror: https://www.apache.org/dyn/closer.lua/hbase/1.4.1 HBase 1.4.1 is the second release of the new HBase 1.4 line, continuing on the theme of bringing a stable, reliable database to the Apache Big Data ecosystem and beyond. For instructions on verifying ASF release downloads, please see https://www.apache.org/dyn/closer.cgi#verify Project member signature keys can be found at https://www.apache.org/dist/hbase/KEYS Thanks to all the contributors who made this release possible! A list of the 38 issues resolved in this release can be found at https://s.apache.org/tx1w and following this announcement. Important changes include: HBASE-11409 (Add more flexibility for input directory structure to LoadIncrementalHFiles) Allows users to bulk load entire tables from HDFS by specifying the parameter -loadTable. This lets you pass in a table-level directory and have all regions' column families bulk loaded; if you do not specify the -loadTable parameter, LoadIncrementalHFiles will work as before. Note: you must have a pre-created table to run with -loadTable; it will not create one for you. HBASE-15321 (Ability to open a HRegion from HDFS snapshot.) HRegion.openReadOnlyFileSystemHRegion() provides the ability to open an HRegion from a read-only HDFS snapshot. Because HDFS snapshots are read-only, no cleanup happens when using this API. 
HBASE-17513 (Thrift Server 1 uses different QOP settings than RPC and Thrift Server 2 and can easily be misconfigured so there is no encryption when the operator expects it) This change fixes an issue where users could have unintentionally configured the HBase Thrift1 server to run without wire-encryption, when they believed they had configured the Thrift1 server to do so. HBASE-19163 ("Maximum lock count exceeded" from region server's batch processing) When there are many mutations against the same row in a batch, each mutation acquires a shared row lock, which can exceed the maximum shared lock count the Java ReadWriteLock supports (64k). Along with other optimizations, the batch is now divided into multiple minibatches as needed. A new config, hbase.regionserver.minibatch.size, limits the maximum number of mutations in a minibatch. The default value is 20000. HBASE-19358 (Improve the stability of splitting log when do fail over) HBASE-19358 introduced a new property, hbase.split.writer.creation.bounded, to limit the open writers for each WALSplitter. If set to true, we won't open any writer for recovered.edits until the entries accumulated in memory reach hbase.regionserver.hlog.splitlog.buffersize (which defaults to 128M), and will write and close the file in one go instead of keeping the writer open. It is false by default and we recommend setting it to true if your cluster has a high region load (like more than 300 regions per RS), especially when you have observed obvious NN/HDFS slowdown during HBase (single RS or cluster) failover. HBASE-19483 (Add proper privilege check for rsgroup commands) RSGroup commands are now restricted unless access is granted at the global, namespace, or table level. Best, The HBase Dev Team HBASE-11409 Add more flexibility for input directory structure to LoadIncrementalHFiles HBASE-15321 Ability to open a HRegion from hdfs snapshot. 
HBASE-15580 Tag coprocessor limitedprivate scope to StoreFile.Reader HBASE-17079 HBase build fails on windows, hbase-archetype-builder is reason for failure HBASE-17513 Thrift Server 1 uses different QOP settings than RPC and Thrift Server 2 and can easily be misconfigured so there is no encryption when the operator expects it. HBASE-18625 Splitting of region with replica, doesn't update region list in serverHolding. A server crash leads to overlap. HBASE-19125 TestReplicator is flaky HBASE-19163 "Maximum lock count exceeded" from region server's batch processing HBASE-19358 Improve the stability of splitting log when do fail over HBASE-19378 Backport HBASE-19252 "Move the transform logic of FilterList into transformCell() method to avoid extra ref to question cell" HBASE-19383 [1.2] java.lang.AssertionError: expected:<2> but was:<1> at org.apache.hadoop.hbase.TestChoreService.testTriggerNowFailsWhenNotScheduled(TestChoreService.java:707) HBASE-19424 Metrics servlet throws NPE HBASE-19468 FNFE during scans and flushes
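The release notes mention two new tunable properties. For operators who want to try them, here is a hedged hbase-site.xml sketch; the values are illustrative examples, not tuning recommendations:

```xml
<!-- Example value only. Cap on mutations per minibatch (HBASE-19163). -->
<property>
  <name>hbase.regionserver.minibatch.size</name>
  <value>20000</value>
</property>

<!-- Example value only. Bound writer creation during WAL splitting
     (HBASE-19358); off by default, suggested above for clusters with
     high region counts per RS. -->
<property>
  <name>hbase.split.writer.creation.bounded</name>
  <value>true</value>
</property>
```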
[jira] [Created] (HBASE-19955) Put back CategoryBased test method timeout Annotation
stack created HBASE-19955: - Summary: Put back CategoryBased test method timeout Annotation Key: HBASE-19955 URL: https://issues.apache.org/jira/browse/HBASE-19955 Project: HBase Issue Type: Sub-task Components: test Reporter: stack See parent issue. If we decide to put back method-based timeouts, here's a patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19947) MR jobs using ITU use wrong filesystem
[ https://issues.apache.org/jira/browse/HBASE-19947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob resolved HBASE-19947. --- Resolution: Fixed Removed extra conf line and pushed to branch-2 and master. Thanks for review, stack. > MR jobs using ITU use wrong filesystem > -- > > Key: HBASE-19947 > URL: https://issues.apache.org/jira/browse/HBASE-19947 > Project: HBase > Issue Type: Task > Components: integration tests >Reporter: stack >Assignee: Mike Drob >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19947.patch > > > Discovered by [~stack] as a result of HBASE-19841 > IntegrationTestUtil subclasses HBaseTestingUtility, which now sets local FS as > the default. When ITU is run against a mini cluster we reset it to the newly > created DFS, but when it runs against an already existing distributed > cluster, we forget to point our conf at the right place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19954) ShutdownHook should check whether shutdown hook is tracked by ShutdownHookManager
Ted Yu created HBASE-19954: -- Summary: ShutdownHook should check whether shutdown hook is tracked by ShutdownHookManager Key: HBASE-19954 URL: https://issues.apache.org/jira/browse/HBASE-19954 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Currently ShutdownHook#suppressHdfsShutdownHook() does the following: {code} synchronized (fsShutdownHooks) { boolean isFSCacheDisabled = fs.getConf().getBoolean("fs.hdfs.impl.disable.cache", false); if (!isFSCacheDisabled && !fsShutdownHooks.containsKey(hdfsClientFinalizer) && !ShutdownHookManager.deleteShutdownHook(hdfsClientFinalizer)) { {code} There is no check that ShutdownHookManager still tracks the shutdown hook, leading to potential RuntimeException (as can be observed in hadoop3 Jenkins job). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19953) Avoid calling post* hook when procedure fails
Josh Elser created HBASE-19953: -- Summary: Avoid calling post* hook when procedure fails Key: HBASE-19953 URL: https://issues.apache.org/jira/browse/HBASE-19953 Project: HBase Issue Type: Bug Components: master, proc-v2 Reporter: Ramesh Mani Assignee: Josh Elser Fix For: 2.0.0-beta-2 Ramesh pointed out a case where I think we're mishandling some post* MasterObserver hooks. Specifically, I'm looking at deleteNamespace. We synchronously execute the DeleteNamespace procedure. When the user provides a namespace that isn't empty, the procedure does a rollback (which is just a no-op), but this doesn't propagate an exception up to the NonceProcedureRunnable in {{HMaster#deleteNamespace}}. It took Ramesh pointing it out for me to see that the code executes a bit differently than we expect. I think we need to double-check our post hooks and make sure we aren't invoking them when the procedure actually failed. cc/ [~Apache9], [~stack]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19791) TestZKAsyncRegistry hangs
[ https://issues.apache.org/jira/browse/HBASE-19791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-19791. --- Resolution: Fixed Re-resolving. It no longer shows in flakies list. > TestZKAsyncRegistry hangs > - > > Key: HBASE-19791 > URL: https://issues.apache.org/jira/browse/HBASE-19791 > Project: HBase > Issue Type: Sub-task >Reporter: Duo Zhang >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: 0001-HBASE-19791-do-nothing.patch, jstack, output > > > It hangs in TEST_UTIL.shutdownMiniCluster() for me locally. > Will upload the test output and jstack result for further digging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19840) Flakey TestMetaWithReplicas
[ https://issues.apache.org/jira/browse/HBASE-19840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-19840. --- Resolution: Fixed Re-resolving. This fell off the flakies list. > Flakey TestMetaWithReplicas > --- > > Key: HBASE-19840 > URL: https://issues.apache.org/jira/browse/HBASE-19840 > Project: HBase > Issue Type: Sub-task > Components: flakey, test >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19840.master.001.patch, > HBASE-19840.master.001.patch > > > Failing about 15% of the time.. In testShutdownHandling.. > [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html] > > Adding some debug. Its hard to follow what is going on in this test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: QuotaExceededException as a DoNotRetryIOException?
Hi Mike, You are right. For rpc throttling, definitely it is retryable. For storage quota, I think it should fail fast (non-retryable). We probably need to separate these two types of exceptions; I will do some more research and follow up. Thanks, Huaxiang > On Feb 7, 2018, at 9:16 AM, Mike Drob wrote: > > I think, philosophically, there can be two kinds of QEE - > > For throttling, we can retry. The quota is a temporal quota - you have done > too many operations this minute, please try again next minute and > everything will work. > For storage, we shouldn't retry. The quota is a fixed quota - you have > exceeded your allotted disk space, please do not try again until you have > remedied the situation. > > Our current usage conflates the two, sometimes it is correct, sometimes not. > > On Wed, Feb 7, 2018 at 11:00 AM, Huaxiang Sun wrote: > >> Hi Stack, >> >> I run into a case that a mapreduce job in hive cannot finish because >> it runs into a QEE. >> I need to look into the hive mr task to see if QEE is not handled >> correctly in hbase code or in hive code. >> >> I am thinking that if QEE is a retryable exception, then it should be >> taken care of by the hbase code. >> I will check more and report back. >> >> Thanks, >> Huaxiang >> >>> On Feb 7, 2018, at 8:23 AM, Stack wrote: >>> >>> QEE being a DNRIOE seems right on the face of it. >>> >>> But if throttling, a DNRIOE is inappropriate. Where you seeing a QEE in a >>> throttling scenario Huaxiang? >>> >>> Thanks, >>> S >>> >>> >>> On Tue, Feb 6, 2018 at 4:56 PM, Huaxiang Sun wrote: >>> Hi HBase devs, I found that QuotaExceededException is a DoNotRetryIOException, >> which is a bit strange from user’s point of view. For rpc throttling, the exception is retryable and it tells app to slow down and retry later. Any thoughts? Thanks, Huaxiang >> >>
Re: QuotaExceededException as a DoNotRetryIOException?
I think, philosophically, there can be two kinds of QEE - For throttling, we can retry. The quota is a temporal quota - you have done too many operations this minute, please try again next minute and everything will work. For storage, we shouldn't retry. The quota is a fixed quota - you have exceeded your allotted disk space, please do not try again until you have remedied the situation. Our current usage conflates the two, sometimes it is correct, sometimes not. On Wed, Feb 7, 2018 at 11:00 AM, Huaxiang Sun wrote: > Hi Stack, > > I run into a case that a mapreduce job in hive cannot finish because > it runs into a QEE. > I need to look into the hive mr task to see if QEE is not handled > correctly in hbase code or in hive code. > > I am thinking that if QEE is a retryable exception, then it should be > taken care of by the hbase code. > I will check more and report back. > > Thanks, > Huaxiang > > > On Feb 7, 2018, at 8:23 AM, Stack wrote: > > > > QEE being a DNRIOE seems right on the face of it. > > > > But if throttling, a DNRIOE is inappropriate. Where you seeing a QEE in a > > throttling scenario Huaxiang? > > > > Thanks, > > S > > > > > > On Tue, Feb 6, 2018 at 4:56 PM, Huaxiang Sun wrote: > > > >> Hi HBase devs, > >> > >> I found that QuotaExceededException is a DoNotRetryIOException, which > >> is a bit strange from user’s point of view. > >> For rpc throttling, the exception is retryable and it tells app to > >> slow down and retry later. > >> > >> Any thoughts? > >> > >> Thanks, > >> Huaxiang > >
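The two-kinds distinction drawn here can be put in code. A purely illustrative sketch — the type below is hypothetical and is not HBase's actual quota exception hierarchy:

```java
// Illustrative only -- not HBase's real exception hierarchy.
// A temporal (throttling) quota resets with the next time window,
// so the violation is retryable; a fixed (storage) quota does not
// clear on its own, so retrying is pointless.
class QuotaViolation {
    enum Kind { TEMPORAL, FIXED }

    final Kind kind;

    QuotaViolation(Kind kind) {
        this.kind = kind;
    }

    boolean retryable() {
        return kind == Kind.TEMPORAL;
    }
}
```

Under this framing, only the TEMPORAL case belongs behind a retry loop; the FIXED case should surface as a do-not-retry failure, which matches the separation of the two exception types discussed in this thread.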
Re: QuotaExceededException as a DoNotRetryIOException?
Hi Stack, I run into a case that a mapreduce job in hive cannot finish because it runs into a QEE. I need to look into the hive mr task to see if QEE is not handled correctly in hbase code or in hive code. I am thinking that if QEE is a retryable exception, then it should be taken care of by the hbase code. I will check more and report back. Thanks, Huaxiang > On Feb 7, 2018, at 8:23 AM, Stack wrote: > > QEE being a DNRIOE seems right on the face of it. > > But if throttling, a DNRIOE is inappropriate. Where you seeing a QEE in a > throttling scenario Huaxiang? > > Thanks, > S > > > On Tue, Feb 6, 2018 at 4:56 PM, Huaxiang Sun wrote: > >> Hi HBase devs, >> >> I found that QuotaExceededException is a DoNotRetryIOException, which >> is a bit strange from user’s point of view. >> For rpc throttling, the exception is retryable and it tells app to >> slow down and retry later. >> >> Any thoughts? >> >> Thanks, >> Huaxiang
Re: HBaseCon Plans?
On Fri, Feb 2, 2018 at 9:13 PM, Mike Drob wrote: > Hi folks, has there been any consideration put forth toward the next > HBaseCon? The last one was very productive for me personally, but I hadn't > heard anything about the schedule for 2018 so figured I could ask on list. > > Mike > It's been kinda quiet this year in terms of hbasecon2018. We, the community, have been running the last bunch hosted by a generous, main sponsor (Huawei in Shenzhen and Google on east and west coast). If there was the interest, we could go beat the bushes to turn up a venue and a date. Wouldn't have to be a grand affair. Thanks, St.Ack
Re: QuotaExceededException as a DoNotRetryIOException?
QEE being a DNRIOE seems right on the face of it. But if throttling, a DNRIOE is inappropriate. Where you seeing a QEE in a throttling scenario Huaxiang? Thanks, S On Tue, Feb 6, 2018 at 4:56 PM, Huaxiang Sun wrote: > Hi HBase devs, > > I found that QuotaExceededException is a DoNotRetryIOException, which > is a bit strange from user’s point of view. > For rpc throttling, the exception is retryable and it tells app to > slow down and retry later. > > Any thoughts? > > Thanks, > Huaxiang
[jira] [Resolved] (HBASE-8997) Restore the disabled test TestHCM.testDeleteForZKConnLeak
[ https://issues.apache.org/jira/browse/HBASE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Hentschel resolved HBASE-8997. -- Resolution: Not A Problem Closed this one, because the test does not exist anymore. > Restore the disabled test TestHCM.testDeleteForZKConnLeak > - > > Key: HBASE-8997 > URL: https://issues.apache.org/jira/browse/HBASE-8997 > Project: HBase > Issue Type: Bug >Reporter: stack >Priority: Major > > hbase-8996 disabled the test because it is flakey. See hbase-8996 for the > thread dump when test was hung. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-11156) Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
[ https://issues.apache.org/jira/browse/HBASE-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Hentschel resolved HBASE-11156. --- Resolution: Not A Problem Closed this one, because the discussion was moved to the mailing list. > Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use > io.native.lib.available > - > > Key: HBASE-11156 > URL: https://issues.apache.org/jira/browse/HBASE-11156 > Project: HBase > Issue Type: Bug > Components: Admin >Affects Versions: 0.96.1.1 >Reporter: Jiten >Priority: Critical > > # hbase shell > 2014-05-13 14:51:41,582 INFO [main] Configuration.deprecation: > hadoop.native.lib is deprecated. Instead, use io.native.lib.available > HBase Shell; enter 'help' for list of supported commands. > Type "exit" to leave the HBase Shell > Version 0.96.1.1-cdh5.0.0, rUnknown, Thu Mar 27 23:01:59 PDT 2014. > Not able to create table in Hbase. Please help -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19952) Find tests which are declared with wrong category
Duo Zhang created HBASE-19952: - Summary: Find tests which are declared with wrong category Key: HBASE-19952 URL: https://issues.apache.org/jira/browse/HBASE-19952 Project: HBase Issue Type: Sub-task Reporter: Duo Zhang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19951) Cleanup the explicit timeout value for test method
Duo Zhang created HBASE-19951: - Summary: Cleanup the explicit timeout value for test method Key: HBASE-19951 URL: https://issues.apache.org/jira/browse/HBASE-19951 Project: HBase Issue Type: Sub-task Reporter: Duo Zhang As said in the parent issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)