[jira] [Commented] (HBASE-16172) Unify the retry logic in ScannerCallableWithReplicas and RpcRetryingCallerWithReadReplicas
[ https://issues.apache.org/jira/browse/HBASE-16172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363979#comment-15363979 ] Nicolas Liochon commented on HBASE-16172:

bq. is there any need for the 'synchronized' in RpcRetryingCallerWithReadReplicas.call()?

It looks like the "synchronized" can be safely removed.

> Unify the retry logic in ScannerCallableWithReplicas and RpcRetryingCallerWithReadReplicas
> ------------------------------------------------------------------------------------------
>
> Key: HBASE-16172
> URL: https://issues.apache.org/jira/browse/HBASE-16172
> Project: HBase
> Issue Type: Bug
> Reporter: Yu Li
> Assignee: Ted Yu
> Attachments: 16172.v1.txt, 16172.v2.txt
>
> The issue is pointed out by [~devaraj] in HBASE-16132 (thanks, D.D.): in
> {{RpcRetryingCallerWithReadReplicas#call}} we call
> {{ResultBoundedCompletionService#take}} instead of {{poll}}, dead-waiting on
> the second replica if the first one timed out, while in
> {{ScannerCallableWithReplicas#call}} we still use
> {{ResultBoundedCompletionService#poll}} with a timeout for the second replica.
> This JIRA aims at discussing whether to unify the logic in these two kinds of
> callers with region replicas, and at taking action if necessary.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
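The difference discussed above ({{take}} dead-waiting vs {{poll}} with a timeout) can be sketched with the plain JDK {{CompletionService}}; the class name, replica names, and delays below are illustrative, not the HBase API:

```java
import java.util.concurrent.*;

// Sketch of the two waiting strategies discussed above, using the plain JDK
// CompletionService (names and timings are illustrative, not the HBase API).
public class ReplicaPollDemo {

    static String fetchWithFallback() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);

            // Primary replica: simulated as slow.
            cs.submit(() -> { Thread.sleep(500); return "primary"; });

            // ScannerCallableWithReplicas style: poll with a timeout, so we
            // notice the primary is slow and fan out to the other replicas.
            Future<String> f = cs.poll(100, TimeUnit.MILLISECONDS);
            if (f == null) {
                cs.submit(() -> "secondary");
                // RpcRetryingCallerWithReadReplicas style: once everything is
                // in flight, take() dead-waits for whichever finishes first.
                f = cs.take();
            }
            return f.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchWithFallback()); // the fast secondary wins
    }
}
```

The dead-wait in {{take}} is only safe once every replica has been asked; before that point, a {{poll}} timeout is what gives the caller the chance to fan out.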
[jira] [Commented] (HBASE-15436) BufferedMutatorImpl.flush() appears to get stuck
[ https://issues.apache.org/jira/browse/HBASE-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217714#comment-15217714 ] Nicolas Liochon commented on HBASE-15436:

bq. There should be a cap size for the size above which we should block the writes. We should not take more than this limit. Maybe something like 1.5 times the flush size.

We definitely want to take more than this limit, but maybe not as much as what we're taking today (or maybe we want to be clearer on what these settings mean). There is a limit, given by the number of tasks executed in parallel (hbase.client.max.total.tasks). If I understand correctly, this setting is now per client (and not per htable). Ideally these parameters should be hidden from the user (i.e. the defaults are ok for a standard client w/o too many memory constraints).

bq. How long should we wait? Should we come out faster?

iirc, a long time ago the buffer was attached to the Table object, so the policy (or at least the objective :-)) when one of the puts had failed (i.e. reached the max retry number) was simple: all the operations currently in the buffer were considered as failed as well, even if we had not even tried to send them. As a consequence, the buffer was empty after the failure of a single put. It was then up to the client to continue or not. Maybe we should do the same with the BufferedMutator, for all cases, close or not? I haven't looked at the BufferedMutator code, but I can have a look if you wish [~anoop.hbase].

bq. What if we were doing a multi Get to the META table to know the region location for N mutations at a time?

It seems like a good idea. There are many possible optimisations in how we use meta, and this is one of them.
> BufferedMutatorImpl.flush() appears to get stuck
> ------------------------------------------------
>
> Key: HBASE-15436
> URL: https://issues.apache.org/jira/browse/HBASE-15436
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.0.2
> Reporter: Sangjin Lee
> Attachments: hbaseException.log, threaddump.log
>
> We noticed an instance where the thread that was executing a flush
> ({{BufferedMutatorImpl.flush()}}) got stuck when the (local one-node) cluster
> shut down, and was unable to get out of that stuck state.
> The setup is a single-node HBase cluster, and apparently the cluster went
> away while the client was executing flush. The flush eventually logged a
> failure after 30+ minutes of retrying. That is understandable.
> What is unexpected is that the thread is stuck in this state (i.e. in the
> {{flush()}} call). I would have expected the {{flush()}} call to return after
> the complete failure.
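The old Table-buffer failure policy described in the comment above (one put exhausting its retries empties the whole buffer and surfaces everything as failed) can be sketched in plain Java; the class and method names below are made up for illustration and are not the HBase API:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (not the HBase API) of the old write-buffer policy
// described above: once a single buffered put exhausts its retries, every
// operation still in the buffer is reported as failed and the buffer is
// emptied, leaving the client free to decide whether to continue.
public class WriteBufferSketch {
    private final List<String> buffer = new ArrayList<>();

    public void add(String op) {
        buffer.add(op);
    }

    /** Called when one put reaches its max retry number; returns everything dropped. */
    public List<String> failAll() {
        List<String> dropped = new ArrayList<>(buffer);
        buffer.clear(); // the buffer is empty after the failure of a single put
        return dropped;
    }

    public int size() {
        return buffer.size();
    }

    public static void main(String[] args) {
        WriteBufferSketch b = new WriteBufferSketch();
        b.add("put-1");
        b.add("put-2");
        b.add("put-3");
        System.out.println(b.failAll().size()); // all three reported as failed
        System.out.println(b.size());           // buffer left empty
    }
}
```

The appeal of this policy is that a flush() call can always terminate: a single hard failure drains the buffer instead of leaving later operations waiting on retries of their own.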
[jira] [Commented] (HBASE-10605) Manage the call timeout in the server
[ https://issues.apache.org/jira/browse/HBASE-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183013#comment-15183013 ] Nicolas Liochon commented on HBASE-10605:

bq. I think the client rpc should include the timeout parameter

Yes, we would need to forward the timeout (not the submit time, because we don't want to rely on having the server and client clocks in sync: the server can use its own clock). Note that there is already a check in the server: the request is cancelled if the client is disconnected (i.e. the tcp connection is closed).

> Manage the call timeout in the server
> -------------------------------------
>
> Key: HBASE-10605
> URL: https://issues.apache.org/jira/browse/HBASE-10605
> Project: HBase
> Issue Type: Improvement
> Components: IPC/RPC, regionserver
> Affects Versions: 0.99.0
> Reporter: Nicolas Liochon
>
> Since HBASE-10566, we have an explicit call timeout available in the client.
> We could forward it to the server, and use this information as follows:
> - if the call is still in the queue, just cancel it
> - if the call is under execution, make this information available in
> RpcCallContext (actually change RpcCallContext#disconnectSince to
> something more generic), so it can be used by the query under execution to
> stop its execution
> - in the future, interrupt it to manage the case 'stuck on a dead datanode'
> or something similar
> - if the operation has finished, don't send the reply to the client, as by
> definition the client is not interested anymore.
> From this, it will be easy to manage the cancellation:
> disconnect/timeout/cancellation are similar from a service execution PoV
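Forwarding the timeout rather than the submit time, as suggested above, lets the server stamp the call with its own clock on arrival and never compare timestamps across machines. A minimal sketch with hypothetical names (this is not the HBase RPC code):

```java
// Sketch of the clock-skew-free deadline check suggested above: the client
// forwards only its timeout; the server stamps the arrival with its own clock,
// so client and server clocks never need to be in sync. Names are hypothetical.
public class CallDeadlineDemo {

    static final class Call {
        final long receivedAtMs;    // server clock at arrival
        final int clientTimeoutMs;  // forwarded in the RPC header

        Call(long receivedAtMs, int clientTimeoutMs) {
            this.receivedAtMs = receivedAtMs;
            this.clientTimeoutMs = clientTimeoutMs;
        }

        /** True once the client has certainly stopped waiting for the result. */
        boolean expired(long serverNowMs) {
            return serverNowMs - receivedAtMs >= clientTimeoutMs;
        }
    }

    public static void main(String[] args) {
        Call c = new Call(1_000, 200); // arrived at t=1000ms, client waits 200ms
        System.out.println(c.expired(1_100)); // false: still worth executing
        System.out.println(c.expired(1_300)); // true: cancel if queued, skip the reply if done
    }
}
```

The same `expired` check covers every stage listed in the description: drop the call if it is still queued, expose the flag to the handler while it runs, and suppress the reply once it has finished.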
[jira] [Commented] (HBASE-10605) Manage the call timeout in the server
[ https://issues.apache.org/jira/browse/HBASE-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105088#comment-15105088 ] Nicolas Liochon commented on HBASE-10605:

Hi [~java8964], what do you need to know? The point of this jira is that the server should not continue to handle a request if we know that the client has already stopped waiting for the result.
[jira] [Updated] (HBASE-10605) Manage the call timeout in the server
[ https://issues.apache.org/jira/browse/HBASE-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10605:

Assignee: (was: Nicolas Liochon)
[jira] [Commented] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029108#comment-15029108 ] Nicolas Liochon commented on HBASE-14580:

It made it to the 0.98 branch but not to 1.1. [~ndimiduk], do you want it? I checked: the patch can be applied and works as expected. I can do the commit, just tell me the version number I should use (1.1.3? another one?)

> Make the HBaseMiniCluster compliant with Kerberos
> -------------------------------------------------
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
> Issue Type: Improvement
> Components: security, test
> Affects Versions: 2.0.0
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Fix For: 2.0.0, 1.3.0, 1.2.1, 0.98.16
> Attachments: hbase-14580.v2.patch, hbase-14580.v2.patch, patch-14580.v1.patch
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict
> caused by HBaseTestingUtility:
> {code}
> public static User getDifferentUser(final Configuration c,
>     final String differentiatingSuffix) throws IOException {
>   // snip
>   String username = User.getCurrent().getName() + differentiatingSuffix; // <-- problem here
>   User user = User.createUserForTesting(c, username, new String[]{"supergroup"});
>   return user;
> }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that
> there is another option (any other opinion?). However this user is not in a
> group, so we have logs like 'WARN [IPC Server handler 9 on 61366]
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521))
> - No groups available for user securedUser'. I'm not sure of its impact.
> [~apurtell], what do you think?
[jira] [Commented] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029161#comment-15029161 ] Nicolas Liochon commented on HBASE-14580:

I'm not sure I would call this a feature :-). If it's good enough for 0.98, it's good enough for 1.1 imho. No problem for me anyway. In general, I don't really like when there are holes like this (available in version x and x+2 but not x+1), but I agree that for this specific jira it's unlikely to be visible to anybody.
[jira] [Commented] (HBASE-14700) Support a "permissive" mode for secure clusters to allow "simple" auth clients
[ https://issues.apache.org/jira/browse/HBASE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982573#comment-14982573 ] Nicolas Liochon commented on HBASE-14700:

+1 from me as well. I'm closing HBASE-14579 as this jira includes a fix for it as well.

> Support a "permissive" mode for secure clusters to allow "simple" auth clients
> ------------------------------------------------------------------------------
>
> Key: HBASE-14700
> URL: https://issues.apache.org/jira/browse/HBASE-14700
> Project: HBase
> Issue Type: Improvement
> Components: security
> Reporter: Gary Helmling
> Assignee: Gary Helmling
> Fix For: 2.0.0
> Attachments: HBASE-14700-v2.patch, HBASE-14700-v3.patch, HBASE-14700.patch
>
> When implementing HBase security for an existing cluster, it can be useful to
> support mixed secure and insecure clients while all client configurations are
> migrated over to secure authentication.
> We currently have an option to allow secure clients to fall back to simple
> auth against insecure clusters. By providing an analogous setting for
> servers, we would allow a phased rollout of security:
> # First, security can be enabled on the cluster servers, with the
> "permissive" mode enabled
> # Clients can be converted to using secure authentication incrementally
> # The server audit logs allow identification of clients still using simple
> auth to connect
> # Finally, when sufficient clients have been converted to secure operation,
> the server-side "permissive" mode can be removed, allowing completely secure
> operation.
> Obviously with this enabled there is no effective access control, but this
> would still be a useful tool to enable a smooth operational rollout of
> security. Permissive mode would of course be disabled by default. Enabling
> it should produce a big scary warning in the logs on startup, and possibly be
> flagged on relevant UIs.
[jira] [Updated] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
[ https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14579:

Resolution: Duplicate
Status: Resolved (was: Patch Available)

> Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
> ---------------------------------------------------------------------------------
>
> Key: HBASE-14579
> URL: https://issues.apache.org/jira/browse/HBASE-14579
> Project: HBase
> Issue Type: Bug
> Components: security
> Affects Versions: 1.0.0, 1.2.0, 0.98.15
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Priority: Minor
> Fix For: 2.0.0
> Attachments: hbase-14579.patch
>
> That's the HBase version of HADOOP-10683.
> We see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful for securedUser/localh...@example.com (auth:SIMPLE)??
> while we would like to see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful for securedUser/localh...@example.com (auth:KERBEROS)??
> The fix is simple, but it means we need hadoop 2.5+.
> There are also a lot of cases where HBase calls "createUser" w/o specifying
> the authentication method... I don't have the solution for these ones.
[jira] [Commented] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
[ https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982571#comment-14982571 ] Nicolas Liochon commented on HBASE-14579:

bq. Does this also happen for users authenticated with authentication tokens ("auth:SIMPLE" instead of "auth:TOKEN" or "auth:DIGEST")?

For digest, I think it's ok; the code in RpcServer is
{code}
private UserGroupInformation getAuthorizedUgi(String authorizedId) throws IOException {
  if (this.authMethod == AuthMethod.DIGEST) {
    TokenIdentifier tokenId = HBaseSaslRpcServer.getIdentifier(authorizedId,
        RpcServer.this.secretManager);
    UserGroupInformation ugi = tokenId.getUser();
    if (ugi == null) {
      throw new AccessDeniedException("Can't retrieve username from tokenIdentifier.");
    } else {
      ugi.addTokenIdentifier(tokenId);
      return ugi;
    }
  } else {
    return UserGroupInformation.createRemoteUser(authorizedId); // <-- auth method replaced by "SIMPLE"
  }
}
{code}

bq. The latest patch (v3) for HBASE-14700 contains a fix for the UGI auth method logged. Please take a look there if you have a chance.

Looking...
[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958560#comment-14958560 ] Nicolas Liochon commented on HBASE-11590:

The issue is that the ThreadPoolExecutor has leaked all over the place, often for monitoring reasons. A lot of code depends on ThreadPoolExecutor rather than on ExecutorService... For example, see
{code}
/**
 * This class will coalesce increments from a thrift server if
 * hbase.regionserver.thrift.coalesceIncrement is set to true. Turning this
 * config to true will cause the thrift server to queue increments into an
 * instance of this class. The thread pool associated with this class will drain
 * the coalesced increments as the thread is able. This can cause data loss if the
 * thrift server dies or is shut down before everything in the queue is drained.
 */
public class IncrementCoalescer implements IncrementCoalescerMBean {
  // snip

  // MBean get/set methods
  public int getQueueSize() {
    return pool.getQueue().size();
  }

  public int getMaxQueueSize() {
    return this.maxQueueSize;
  }

  public void setMaxQueueSize(int newSize) {
    this.maxQueueSize = newSize;
  }

  public long getPoolCompletedTaskCount() {
    return pool.getCompletedTaskCount();
  }

  public long getPoolTaskCount() {
    return pool.getTaskCount();
  }

  public int getPoolLargestPoolSize() {
    return pool.getLargestPoolSize();
  }

  public int getCorePoolSize() {
    return pool.getCorePoolSize();
  }

  public void setCorePoolSize(int newCoreSize) {
    pool.setCorePoolSize(newCoreSize);
  }

  public int getMaxPoolSize() {
    return pool.getMaximumPoolSize();
  }

  public void setMaxPoolSize(int newMaxSize) {
    pool.setMaximumPoolSize(newMaxSize);
  }
{code}
I'm going to limit this patch to the easy/client stuff...
> use a specific ThreadPoolExecutor
> ---------------------------------
>
> Key: HBASE-11590
> URL: https://issues.apache.org/jira/browse/HBASE-11590
> Project: HBase
> Issue Type: Bug
> Components: Client, Performance
> Affects Versions: 1.0.0, 2.0.0
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Priority: Minor
> Fix For: 2.0.0
> Attachments: ExecutorServiceTest.java, LifoThreadPoolExecutorSQP.java, UnitQueueP.java, UnitQueuePU.java, tp.patch
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create
> (by default) 256 threads even if we just need a few.
> The attached TPE creates threads only if we have something in the queue.
> On a PE test with replicas on, it improved the 99th latency percentile by 5%.
> Warning: there are likely some race conditions, but I'm posting it here
> because there may be an implementation available somewhere we can use, or
> a good reason not to do that. So feedback welcome as usual.
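The behavior described above (the stock JDK pool growing to its full core size even under a light load) can be reproduced with the plain JDK; this sketch is not HBase code, just the documented ThreadPoolExecutor behavior:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Reproduces the JDK behavior described above: ThreadPoolExecutor starts a
// new thread for every submitted task until corePoolSize is reached, even
// when existing threads are already idle, and core threads never die by
// default. Plain JDK, nothing HBase-specific.
public class CoreThreadDemo {

    static int poolSizeAfter(int tasks) throws InterruptedException {
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(
            256, 256, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < tasks; i++) {
            tpe.submit(() -> { });    // trivial task, finished almost instantly
        }
        Thread.sleep(200);            // let all tasks complete
        int size = tpe.getPoolSize(); // still one live thread per submitted task
        tpe.shutdown();
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        // Ten tiny tasks leave ten live threads behind, not one:
        System.out.println(poolSizeAfter(10));
        // allowCoreThreadTimeOut(true) is the stock mitigation: idle core
        // threads then die after the keep-alive delay.
    }
}
```

With corePoolSize at 256, a client that ever sees a burst of 256 concurrent tasks keeps 256 threads alive for good, which is the cost the attached TPE tries to avoid.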
[jira] [Updated] (HBASE-14521) Unify the semantic of hbase.client.retries.number
[ https://issues.apache.org/jira/browse/HBASE-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14521:

Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: (was: 1.3.0)
Status: Resolved (was: Patch Available)

Committed to master. I didn't commit to the 1.x branches because it's a behavior change... Thanks for the patch, Yu!

> Unify the semantic of hbase.client.retries.number
> -------------------------------------------------
>
> Key: HBASE-14521
> URL: https://issues.apache.org/jira/browse/HBASE-14521
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.14, 1.1.2
> Reporter: Yu Li
> Assignee: Yu Li
> Fix For: 2.0.0
> Attachments: HBASE-14521.patch, HBASE-14521_v2.patch, HBASE-14521_v3.patch
>
> From the name of the _hbase.client.retries.number_ property, it should be the
> maximum number of *retries*; that is, if we set the property to 1, there should
> be 2 attempts in total. However, there are two different semantics in the
> current code base.
> For example, in ConnectionImplementation#locateRegionInMeta:
> {code}
> int localNumRetries = (retry ? numTries : 1);
> for (int tries = 0; true; tries++) {
>   if (tries >= localNumRetries) {
>     throw new NoServerForRegionException("Unable to find region for "
>         + Bytes.toStringBinary(row) + " in " + tableName + " after " + numTries + " tries.");
>   }
> {code}
> the retries number is regarded as the maximum number of *tries*.
> While in RpcRetryingCallerImpl#callWithRetries:
> {code}
> for (int tries = 0;; tries++) {
>   long expectedSleep;
>   try {
>     callable.prepare(tries != 0); // if called with false, check table status on ZK
>     interceptor.intercept(context.prepare(callable, tries));
>     return callable.call(getRemainingTime(callTimeout));
>   } catch (PreemptiveFastFailException e) {
>     throw e;
>   } catch (Throwable t) {
>     ...
>     if (tries >= retries - 1) {
>       throw new RetriesExhaustedException(tries, exceptions);
>     }
> {code}
> it's regarded as exactly the number of *retries* (try a call first unconditionally,
> then check whether to retry or whether the maximum retry number is exceeded).
> This inconsistency causes misunderstanding in usage; for instance, one of our
> customers set the property to zero expecting a single call but instead
> received NoServerForRegionException.
> We should unify the semantic of the property, and I suggest keeping the
> original one: retries rather than total tries.
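The two semantics contrasted above can be distilled into a few lines; the method names below are hypothetical, and `n` stands for the hbase.client.retries.number setting:

```java
// Distills the two loop shapes quoted above (hypothetical names): one treats
// the setting as the total number of tries, the other as the number of
// retries after a first unconditional attempt.
public class RetrySemantics {

    /** locateRegionInMeta style: the setting caps the *total* tries. */
    static int attemptsAsMaxTries(int n) {
        int attempts = 0;
        for (int tries = 0; tries < n; tries++) {
            attempts++;
        }
        return attempts;
    }

    /** callWithRetries style: one unconditional try, then up to n retries. */
    static int attemptsAsRetries(int n) {
        int attempts = 0;
        for (int tries = 0; ; tries++) {
            attempts++;
            if (tries >= n) {
                break; // retries exhausted
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Setting the property to 0 is exactly the surprising case reported:
        System.out.println(attemptsAsMaxTries(0)); // 0 attempts -> immediate failure
        System.out.println(attemptsAsRetries(0));  // 1 attempt  -> the call is still made
    }
}
```

Setting the property to 0 is where the two semantics diverge most visibly: zero total tries means the operation fails before any call is made, while zero retries still performs one attempt.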
[jira] [Updated] (HBASE-11590) use a specific ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-11590:

Attachment: HBASE-11590.v1.patch
[jira] [Updated] (HBASE-11590) use a specific ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-11590:

Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958906#comment-14958906 ] Nicolas Liochon commented on HBASE-11590:

The patch compiles locally, but that's all I checked.
Client side: use ForkJoin instead of ThreadPoolExecutor; remove the monitoring linked to ThreadPoolExecutor.
Server side: when possible, use the interface (ExecutorService) instead of the implementation (ThreadPoolExecutor).
[jira] [Commented] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
[ https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954636#comment-14954636 ] Nicolas Liochon commented on HBASE-14579:

Thanks Stack, yes, that would be great. I could change the script, but I can't easily test it right now.
[jira] [Commented] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954583#comment-14954583 ] Nicolas Liochon commented on HBASE-14580:

This second run makes more sense :-). I'm going to commit on the master branch. [~ndimiduk], you may want this for the 1.2 branch?
[jira] [Commented] (HBASE-14268) Improve KeyLocker
[ https://issues.apache.org/jira/browse/HBASE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954690#comment-14954690 ] Nicolas Liochon commented on HBASE-14268:

I just saw that, I'm having a look.

> Improve KeyLocker
> -----------------
>
> Key: HBASE-14268
> URL: https://issues.apache.org/jira/browse/HBASE-14268
> Project: HBase
> Issue Type: Improvement
> Components: util
> Reporter: Hiroshi Ikeda
> Assignee: Hiroshi Ikeda
> Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0
> Attachments: 14268-V5.patch, HBASE-14268-V2.patch, HBASE-14268-V3.patch, HBASE-14268-V4.patch, HBASE-14268-V5.patch, HBASE-14268-V5.patch, HBASE-14268-V6.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268.patch, KeyLockerIncrKeysPerformance.java, KeyLockerPerformance.java, ReferenceTestApp.java
>
> 1. The implementation of {{KeyLocker}} uses atomic variables inside a
> synchronized block, which doesn't make sense. Moreover, the logic inside the
> synchronized block is not trivial, so it reduces performance in heavy
> multi-threaded environments.
> 2. {{KeyLocker}} gives an instance of {{ReentrantLock}} which is already
> locked, but that doesn't follow the contract of {{ReentrantLock}}, because you
> are not allowed to freely invoke lock/unlock methods under that contract.
> That introduces a potential risk: whenever you see a variable of the type
> {{ReentrantLock}}, you have to pay attention to where the included instance
> is coming from.
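A contract-respecting alternative to handing out a pre-locked lock, as point 2 above suggests, is to hand out the lock unlocked and let the caller drive lock()/unlock() itself. A minimal sketch follows; the class is hypothetical, not the HBase KeyLocker, and for brevity it never evicts entries, which a real implementation would have to handle:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-key locker (not the HBase KeyLocker): it returns the lock
// unlocked so the caller can use the standard lock()/unlock() idiom, which is
// the ReentrantLock contract discussed above. For brevity it never removes
// entries, so a real implementation would need reference counting or eviction.
public class SimpleKeyLocker<K> {
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Returns the (unlocked) lock dedicated to this key, creating it on first use. */
    public ReentrantLock getLock(K key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    public static void main(String[] args) {
        SimpleKeyLocker<String> locker = new SimpleKeyLocker<>();
        ReentrantLock l = locker.getLock("row-1");
        l.lock();
        try {
            // critical section for "row-1"; other keys are not blocked
        } finally {
            l.unlock();
        }
        System.out.println(locker.getLock("row-1") == l); // same lock per key
    }
}
```

Because the caller holds an ordinary ReentrantLock, the familiar lock-in-try/finally idiom applies unchanged, and no reader of the code has to wonder whether the lock arrived pre-locked.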
[jira] [Commented] (HBASE-14268) Improve KeyLocker
[ https://issues.apache.org/jira/browse/HBASE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954701#comment-14954701 ] Nicolas Liochon commented on HBASE-14268: - [~sreenivasulureddy] It should be ok now. I added the two missing files. > Improve KeyLocker > - > > Key: HBASE-14268 > URL: https://issues.apache.org/jira/browse/HBASE-14268 > Project: HBase > Issue Type: Improvement > Components: util >Reporter: Hiroshi Ikeda >Assignee: Hiroshi Ikeda >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14268-V5.patch, HBASE-14268-V2.patch, > HBASE-14268-V3.patch, HBASE-14268-V4.patch, HBASE-14268-V5.patch, > HBASE-14268-V5.patch, HBASE-14268-V6.patch, HBASE-14268-V7.patch, > HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, > HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, > HBASE-14268.patch, KeyLockerIncrKeysPerformance.java, > KeyLockerPerformance.java, ReferenceTestApp.java > > > 1. In the implementation of {{KeyLocker}} it uses atomic variables inside a > synchronized block, which doesn't make sense. Moreover, the logic inside the > synchronized block is not trivial, so it reduces performance in heavily > multi-threaded environments. > 2. {{KeyLocker}} gives out an instance of {{ReentrantLock}} which is already > locked, but this doesn't follow the contract of {{ReentrantLock}} because you > are not allowed to freely invoke lock/unlock methods under that contract. > That introduces a potential risk; whenever you see a variable of the type > {{ReentrantLock}}, you should pay attention to where the included instance is > coming from. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
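To illustrate the contract point above: a per-key locker can hand out an *unlocked* {{ReentrantLock}} and let the caller drive lock()/unlock() itself. The sketch below (a hypothetical {{SimpleKeyLocker}}, not the HBase implementation, and it omits the reclamation of unused locks that the real class needs) shows the pattern without mixing atomics into a synchronized block:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not the HBase KeyLocker: hands out an UNLOCKED
// lock per key, so the caller performs lock()/unlock() itself, which is
// what the ReentrantLock contract expects.
public class SimpleKeyLocker<K> {
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    public ReentrantLock getLock(K key) {
        // computeIfAbsent replaces the synchronized-block-plus-atomics mix
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    public static void main(String[] args) {
        SimpleKeyLocker<String> locker = new SimpleKeyLocker<>();
        ReentrantLock l = locker.getLock("region-1");
        l.lock();              // the caller acquires...
        try {
            System.out.println("held=" + l.isHeldByCurrentThread());
        } finally {
            l.unlock();        // ...and releases, on the same thread
        }
    }
}
```

With computeIfAbsent, two threads asking for the same key always receive the same lock instance, and each thread owns its own lock/unlock pair instead of receiving a lock acquired by someone else.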
[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14580: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed on master only > Make the HBaseMiniCluster compliant with Kerberos > - > > Key: HBASE-14580 > URL: https://issues.apache.org/jira/browse/HBASE-14580 > Project: HBase > Issue Type: Improvement > Components: security, test >Affects Versions: 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon > Fix For: 2.0.0 > > Attachments: hbase-14580.v2.patch, hbase-14580.v2.patch, > patch-14580.v1.patch > > > When using MiniKDC and the minicluster in a unit test, there is a conflict > caused by HBaseTestingUtility: > {code} > public static User getDifferentUser(final Configuration c, > final String differentiatingSuffix) > throws IOException { >// snip > String username = User.getCurrent().getName() + > differentiatingSuffix; < problem here > User user = User.createUserForTesting(c, username, > new String[]{"supergroup"}); > return user; > } > {code} > This creates users like securedUser/localh...@example.com.hfs.0, and this > does not work. > My fix is to return the current user when Kerberos is set. I don't think that > there is another option (any other opinion?). However this user is not in a > group so we have logs like 'WARN [IPC Server handler 9 on 61366] > security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) > - No groups available for user securedUser' I'm not sure of its impact. > [~apurtell], what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-14521) Unify the semantic of hbase.client.retries.number
[ https://issues.apache.org/jira/browse/HBASE-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955354#comment-14955354 ] Nicolas Liochon edited comment on HBASE-14521 at 10/13/15 5:52 PM: --- Yep [~carp84], I think your analysis is correct: it was a workaround. While looking again at the patch, I found a typo that I will fix on commit > public RetriesExhaustedException(final int numReries, I'm +1, I will commit on master tomorrow my time if nobody disagrees. was (Author: nkeywal): Yep [~carp84], I think your analysis is correct: it was a workaround. While looking again at the patch, I found a typo that I will fix on commit > public RetriesExhaustedException(final int numReries, I'm +1, I will commit on branch2 tomorrow my time if nobody disagrees. > Unify the semantic of hbase.client.retries.number > - > > Key: HBASE-14521 > URL: https://issues.apache.org/jira/browse/HBASE-14521 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14, 1.1.2 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-14521.patch, HBASE-14521_v2.patch, > HBASE-14521_v3.patch > > > From the name of the _hbase.client.retries.number_ property, it should be the > maximum number of *retries*; in other words, if we set the property to 1, there should > be 2 attempts in total. However, there are two different semantics in use in the current code base. > For example, in ConnectionImplementation#locateRegionInMeta: > {code} > int localNumRetries = (retry ? numTries : 1); > for (int tries = 0; true; tries++) { > if (tries >= localNumRetries) { > throw new NoServerForRegionException("Unable to find region for " > + Bytes.toStringBinary(row) + " in " + tableName + > " after " + numTries + " tries."); > } > {code} > the retries number is regarded as the maximum number of *tries* > While in RpcRetryingCallerImpl#callWithRetries: > {code} > for (int tries = 0;; tries++) { > long expectedSleep; > try { > callable.prepare(tries != 0); // if called with false, check table > status on ZK > interceptor.intercept(context.prepare(callable, tries)); > return callable.call(getRemainingTime(callTimeout)); > } catch (PreemptiveFastFailException e) { > throw e; > } catch (Throwable t) { > ... > if (tries >= retries - 1) { > throw new RetriesExhaustedException(tries, exceptions); > } > {code} > it's regarded as exactly *retries* (try a call first unconditionally, then > check whether to retry or whether the maximum retry number is exceeded) > This inconsistency will cause misunderstanding in usage, such as one of our > customers setting the property to zero expecting one single call but finally > receiving NoServerForRegionException. > We should unify the semantics of the property, and I suggest keeping the > original meaning of retries rather than total tries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
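The two semantics quoted above can be reduced to a pair of toy loops (hypothetical method names, not the HBase code). With the property set to 0, the locate-style loop makes no attempt at all, which is exactly why a user expecting "one single call" saw a NoServerForRegionException, while the rpc-caller-style loop still makes one real call:

```java
// Sketch of the two retry semantics discussed in this issue
// (hypothetical helper, not HBase code).
class RetrySemantics {
    // locateRegionInMeta-style: the property caps the TOTAL number of tries.
    static int attemptsAsMaxTries(int retries) {
        int attempts = 0;
        for (int tries = 0; true; tries++) {
            if (tries >= retries) break; // bail out BEFORE attempting
            attempts++;
        }
        return attempts;
    }

    // callWithRetries-style: always try once, then the (tries >= retries - 1)
    // check allows max(1, retries) attempts overall.
    static int attemptsAsRetries(int retries) {
        int attempts = 0;
        for (int tries = 0; ; tries++) {
            attempts++;                      // unconditional first try
            if (tries >= retries - 1) break;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // property = 0: no attempt at all vs. one real call
        System.out.println(attemptsAsMaxTries(0) + " vs " + attemptsAsRetries(0)); // prints "0 vs 1"
    }
}
```

Either semantic is defensible on its own; the bug is that the two loops disagree for the same property value.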
[jira] [Commented] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955296#comment-14955296 ] Nicolas Liochon commented on HBASE-14580: - Committed to the 1.2 branch. > Make the HBaseMiniCluster compliant with Kerberos > - > > Key: HBASE-14580 > URL: https://issues.apache.org/jira/browse/HBASE-14580 > Project: HBase > Issue Type: Improvement > Components: security, test >Affects Versions: 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon > Fix For: 2.0.0, 1.2.1 > > Attachments: hbase-14580.v2.patch, hbase-14580.v2.patch, > patch-14580.v1.patch > > > When using MiniKDC and the minicluster in a unit test, there is a conflict > caused by HBaseTestingUtility: > {code} > public static User getDifferentUser(final Configuration c, > final String differentiatingSuffix) > throws IOException { >// snip > String username = User.getCurrent().getName() + > differentiatingSuffix; < problem here > User user = User.createUserForTesting(c, username, > new String[]{"supergroup"}); > return user; > } > {code} > This creates users like securedUser/localh...@example.com.hfs.0, and this > does not work. > My fix is to return the current user when Kerberos is set. I don't think that > there is another option (any other opinion?). However this user is not in a > group so we have logs like 'WARN [IPC Server handler 9 on 61366] > security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) > - No groups available for user securedUser' I'm not sure of its impact. > [~apurtell], what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14580: Fix Version/s: 1.2.1 > Make the HBaseMiniCluster compliant with Kerberos > - > > Key: HBASE-14580 > URL: https://issues.apache.org/jira/browse/HBASE-14580 > Project: HBase > Issue Type: Improvement > Components: security, test >Affects Versions: 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon > Fix For: 2.0.0, 1.2.1 > > Attachments: hbase-14580.v2.patch, hbase-14580.v2.patch, > patch-14580.v1.patch > > > When using MiniKDC and the minicluster in a unit test, there is a conflict > caused by HBaseTestingUtility: > {code} > public static User getDifferentUser(final Configuration c, > final String differentiatingSuffix) > throws IOException { >// snip > String username = User.getCurrent().getName() + > differentiatingSuffix; < problem here > User user = User.createUserForTesting(c, username, > new String[]{"supergroup"}); > return user; > } > {code} > This creates users like securedUser/localh...@example.com.hfs.0, and this > does not work. > My fix is to return the current user when Kerberos is set. I don't think that > there is another option (any other opinion?). However this user is not in a > group so we have logs like 'WARN [IPC Server handler 9 on 61366] > security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) > - No groups available for user securedUser' I'm not sure of its impact. > [~apurtell], what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14521) Unify the semantic of hbase.client.retries.number
[ https://issues.apache.org/jira/browse/HBASE-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955354#comment-14955354 ] Nicolas Liochon commented on HBASE-14521: - Yep [~carp84], I think your analysis is correct: it was a workaround. While looking again at the patch, I found a typo that I will fix on commit > public RetriesExhaustedException(final int numReries, I'm +1, I will commit on branch2 tomorrow my time if nobody disagrees. > Unify the semantic of hbase.client.retries.number > - > > Key: HBASE-14521 > URL: https://issues.apache.org/jira/browse/HBASE-14521 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14, 1.1.2 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-14521.patch, HBASE-14521_v2.patch, > HBASE-14521_v3.patch > > > From the name of the _hbase.client.retries.number_ property, it should be the > maximum number of *retries*; in other words, if we set the property to 1, there should > be 2 attempts in total. However, there are two different semantics in use in the current code base. > For example, in ConnectionImplementation#locateRegionInMeta: > {code} > int localNumRetries = (retry ? numTries : 1); > for (int tries = 0; true; tries++) { > if (tries >= localNumRetries) { > throw new NoServerForRegionException("Unable to find region for " > + Bytes.toStringBinary(row) + " in " + tableName + > " after " + numTries + " tries."); > } > {code} > the retries number is regarded as the maximum number of *tries* > While in RpcRetryingCallerImpl#callWithRetries: > {code} > for (int tries = 0;; tries++) { > long expectedSleep; > try { > callable.prepare(tries != 0); // if called with false, check table > status on ZK > interceptor.intercept(context.prepare(callable, tries)); > return callable.call(getRemainingTime(callTimeout)); > } catch (PreemptiveFastFailException e) { > throw e; > } catch (Throwable t) { > ... > if (tries >= retries - 1) { > throw new RetriesExhaustedException(tries, exceptions); > } > {code} > it's regarded as exactly *retries* (try a call first unconditionally, then > check whether to retry or whether the maximum retry number is exceeded) > This inconsistency will cause misunderstanding in usage, such as one of our > customers setting the property to zero expecting one single call but finally > receiving NoServerForRegionException. > We should unify the semantics of the property, and I suggest keeping the > original meaning of retries rather than total tries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952237#comment-14952237 ] Nicolas Liochon commented on HBASE-11590: - > maybe just because it is more parsimonious in its thread use? That's the magic part: even if there is a single thread in the pool, it's faster than the others. I didn't check if it consumes more CPU or not, however. I will do the patch to use ForkJoin soon (hopefully today, if not next week). > use a specific ThreadPoolExecutor > - > > Key: HBASE-11590 > URL: https://issues.apache.org/jira/browse/HBASE-11590 > Project: HBase > Issue Type: Bug > Components: Client, Performance >Affects Versions: 1.0.0, 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Minor > Fix For: 2.0.0 > > Attachments: ExecutorServiceTest.java, > LifoThreadPoolExecutorSQP.java, UnitQueueP.java, UnitQueuePU.java, tp.patch > > > The JDK TPE creates all the threads in the pool. As a consequence, we create > (by default) 256 threads even if we just need a few. > The attached TPE creates threads only if we have something in the queue. > On a PE test with replicas on, it improved the 99th latency percentile by 5%. > Warning: there are likely some race conditions, but I'm posting it here > because there may be an implementation available somewhere we can use, or > a good reason not to do that. So feedback welcome as usual. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14580: Attachment: hbase-14580.v2.patch > Make the HBaseMiniCluster compliant with Kerberos > - > > Key: HBASE-14580 > URL: https://issues.apache.org/jira/browse/HBASE-14580 > Project: HBase > Issue Type: Improvement > Components: security, test >Affects Versions: 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon > Fix For: 2.0.0 > > Attachments: hbase-14580.v2.patch, patch-14580.v1.patch > > > When using MiniKDC and the minicluster in a unit test, there is a conflict > caused by HBaseTestingUtility: > {code} > public static User getDifferentUser(final Configuration c, > final String differentiatingSuffix) > throws IOException { >// snip > String username = User.getCurrent().getName() + > differentiatingSuffix; < problem here > User user = User.createUserForTesting(c, username, > new String[]{"supergroup"}); > return user; > } > {code} > This creates users like securedUser/localh...@example.com.hfs.0, and this > does not work. > My fix is to return the current user when Kerberos is set. I don't think that > there is another option (any other opinion?). However this user is not in a > group so we have logs like 'WARN [IPC Server handler 9 on 61366] > security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) > - No groups available for user securedUser' I'm not sure of its impact. > [~apurtell], what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14580: Status: Patch Available (was: Open) > Make the HBaseMiniCluster compliant with Kerberos > - > > Key: HBASE-14580 > URL: https://issues.apache.org/jira/browse/HBASE-14580 > Project: HBase > Issue Type: Improvement > Components: security, test >Affects Versions: 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon > Fix For: 2.0.0 > > Attachments: hbase-14580.v2.patch, patch-14580.v1.patch > > > When using MiniKDC and the minicluster in a unit test, there is a conflict > caused by HBaseTestingUtility: > {code} > public static User getDifferentUser(final Configuration c, > final String differentiatingSuffix) > throws IOException { >// snip > String username = User.getCurrent().getName() + > differentiatingSuffix; < problem here > User user = User.createUserForTesting(c, username, > new String[]{"supergroup"}); > return user; > } > {code} > This creates users like securedUser/localh...@example.com.hfs.0, and this > does not work. > My fix is to return the current user when Kerberos is set. I don't think that > there is another option (any other opinion?). However this user is not in a > group so we have logs like 'WARN [IPC Server handler 9 on 61366] > security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) > - No groups available for user securedUser' I'm not sure of its impact. > [~apurtell], what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
[ https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952250#comment-14952250 ] Nicolas Liochon commented on HBASE-14579: - I'm not sure I correctly understand the test patch script. Can I just change the property {code} # All supported Hadoop versions that we want to test the compilation with HADOOP2_VERSIONS="2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1" {code} to {code} HADOOP2_VERSIONS="2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1" {code} Or is there a risk of hiding a problem a patch could cause to the 0.98 release (and even the 0.94)? We will need to update the matrix in the hbase book as well... > Users authenticated with KERBEROS are recorded as being authenticated with > SIMPLE > - > > Key: HBASE-14579 > URL: https://issues.apache.org/jira/browse/HBASE-14579 > Project: HBase > Issue Type: Bug > Components: security >Affects Versions: 1.0.0, 1.2.0, 0.98.15 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Minor > Fix For: 2.0.0 > > Attachments: hbase-14579.patch > > > That's the HBase version of HADOOP-10683. > We see: > ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful > for securedUser/localh...@example.com (auth:SIMPLE)?? > while we would like to see: > ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful > for securedUser/localh...@example.com (auth:KERBEROS)?? > The fix is simple, but it means we need hadoop 2.5+. > There are also a lot of cases where HBase calls "createUser" w/o specifying > the authentication method... I don't have the solution for these ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14580: Status: Open (was: Patch Available) > Make the HBaseMiniCluster compliant with Kerberos > - > > Key: HBASE-14580 > URL: https://issues.apache.org/jira/browse/HBASE-14580 > Project: HBase > Issue Type: Improvement > Components: security, test >Affects Versions: 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon > Fix For: 2.0.0 > > Attachments: patch-14580.v1.patch > > > When using MiniKDC and the minicluster in a unit test, there is a conflict > caused by HBaseTestingUtility: > {code} > public static User getDifferentUser(final Configuration c, > final String differentiatingSuffix) > throws IOException { >// snip > String username = User.getCurrent().getName() + > differentiatingSuffix; < problem here > User user = User.createUserForTesting(c, username, > new String[]{"supergroup"}); > return user; > } > {code} > This creates users like securedUser/localh...@example.com.hfs.0, and this > does not work. > My fix is to return the current user when Kerberos is set. I don't think that > there is another option (any other opinion?). However this user is not in a > group so we have logs like 'WARN [IPC Server handler 9 on 61366] > security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) > - No groups available for user securedUser' I'm not sure of its impact. > [~apurtell], what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952241#comment-14952241 ] Nicolas Liochon commented on HBASE-14580: - > I think it's just a log spam issue – see HADOOP-12450. Oh, yeah. Thanks for the pointer. > Instead of hard-coding the config values, can you use > User.isHBaseSecurityEnabled(c)? Yes, you're right. I updated the patch. > The username suffixes were fed into the data dirs used by each DN/RS for a > "distributed" minicluster setup > So, as I understand it, that would not be an issue here as Kerberos would > only be supported with a single node setup? I'm not sure here: I don't see why the user name is needed in the data dirs. But in any case, this patch does not break anything, as the suffix approach clashes with the Kerberos realm... > As Gary said, this is harmless unless your test depends on a valid group > membership. Thanks. Thanks for the reviews, all. I will commit the v2 if hadoop-qa passes with the v2. > Make the HBaseMiniCluster compliant with Kerberos > - > > Key: HBASE-14580 > URL: https://issues.apache.org/jira/browse/HBASE-14580 > Project: HBase > Issue Type: Improvement > Components: security, test >Affects Versions: 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon > Fix For: 2.0.0 > > Attachments: patch-14580.v1.patch > > > When using MiniKDC and the minicluster in a unit test, there is a conflict > caused by HBaseTestingUtility: > {code} > public static User getDifferentUser(final Configuration c, > final String differentiatingSuffix) > throws IOException { >// snip > String username = User.getCurrent().getName() + > differentiatingSuffix; < problem here > User user = User.createUserForTesting(c, username, > new String[]{"supergroup"}); > return user; > } > {code} > This creates users like securedUser/localh...@example.com.hfs.0, and this > does not work. > My fix is to return the current user when Kerberos is set. I don't think that > there is another option (any other opinion?). However this user is not in a > group so we have logs like 'WARN [IPC Server handler 9 on 61366] > security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) > - No groups available for user securedUser' I'm not sure of its impact. > [~apurtell], what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
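The shape of the fix can be sketched as follows, with the HBase {{User}} API replaced by plain strings so the example stands alone (the class, method, and flag names here are hypothetical; the real patch keys off {{User.isHBaseSecurityEnabled(c)}} as suggested in the review):

```java
// Hedged sketch of the HBASE-14580 fix. kerberosEnabled stands in for
// User.isHBaseSecurityEnabled(conf); currentUser stands in for
// User.getCurrent().getName().
class DifferentUserSketch {
    static String differentUserName(boolean kerberosEnabled,
                                    String currentUser,
                                    String differentiatingSuffix) {
        if (kerberosEnabled) {
            // "principal@REALM" + suffix would be an invalid Kerberos
            // principal, so keep the current (kinit'ed) user as-is.
            return currentUser;
        }
        return currentUser + differentiatingSuffix;
    }

    public static void main(String[] args) {
        String principal = "securedUser/localhost@EXAMPLE.COM";
        System.out.println(differentUserName(true, principal, ".hfs.0"));
        System.out.println(differentUserName(false, "testUser", ".hfs.0"));
    }
}
```

Appending the suffix after the realm is what produced the broken principals described in the issue; with Kerberos on, the principal is returned unchanged.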
[jira] [Commented] (HBASE-14479) Apply the Leader/Followers pattern to RpcServer's Reader
[ https://issues.apache.org/jira/browse/HBASE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952235#comment-14952235 ] Nicolas Liochon commented on HBASE-14479: - Yeah, I tried to get rid of this array of readers a while back, but I didn't push the patch because I didn't get any significant result. Nice work, [~ikeda]. > Apply the Leader/Followers pattern to RpcServer's Reader > > > Key: HBASE-14479 > URL: https://issues.apache.org/jira/browse/HBASE-14479 > Project: HBase > Issue Type: Improvement > Components: IPC/RPC, Performance >Reporter: Hiroshi Ikeda >Assignee: Hiroshi Ikeda >Priority: Minor > Attachments: HBASE-14479-V2 (1).patch, HBASE-14479-V2.patch, > HBASE-14479-V2.patch, HBASE-14479.patch, gc.png, gets.png, io.png, median.png > > > {{RpcServer}} uses multiple selectors to read data for load distribution, but > the distribution is just done by round-robin. It is uncertain, especially over > a long run, whether load is equally divided and resources are used without > being wasted. > Moreover, multiple selectors may cause excessive context switches which give > priority to low latency (while we just add the requests to queues), and they may > reduce the throughput of the whole server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
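For context, the round-robin distribution being discussed amounts to something like the sketch below (a hypothetical class modeled on the pattern, not copied from RpcServer): each new connection goes to the next reader in a fixed array, regardless of how busy that reader already is.

```java
// Minimal sketch of round-robin reader selection (hypothetical; the real
// RpcServer assigns connections to reader/selector threads similarly).
class ReaderPool {
    private final String[] readers;
    private int next = 0;

    ReaderPool(int n) {
        readers = new String[n];
        for (int i = 0; i < n; i++) readers[i] = "reader-" + i;
    }

    // Round-robin: ignores per-reader load, which is the concern above.
    synchronized String pickReader() {
        String r = readers[next];
        next = (next + 1) % readers.length;
        return r;
    }

    public static void main(String[] args) {
        ReaderPool pool = new ReaderPool(3);
        for (int i = 0; i < 5; i++) {
            System.out.println("conn-" + i + " -> " + pool.pickReader());
        }
    }
}
```

A Leader/Followers design removes this static assignment: idle threads take the next event themselves, so the load balances naturally instead of by position in the array.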
[jira] [Commented] (HBASE-14521) Unify the semantic of hbase.client.retries.number
[ https://issues.apache.org/jira/browse/HBASE-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950283#comment-14950283 ] Nicolas Liochon commented on HBASE-14521: - It's a good point: the existing implementation is confusing. The patch looks good. It contains a lot of cleanup that will make the code easier to read (thanks, Yu!). I'm surprised by this: {code} @@ -137,7 +137,6 @@ public class TestAsyncProcess { AsyncRequestFutureImpl r = super.createAsyncRequestFuture( DUMMY_TABLE, actions, nonceGroup, pool, callback, results, needResults); allReqs.add(r); - callsCt.incrementAndGet(); <=== We should continue to count the calls, no? return r; } {code} Note that setting retries to zero is most of the time an error, as we can have a retry in many cases, for example if the client cache is not up to date (contains the wrong region server for a region). > Unify the semantic of hbase.client.retries.number > - > > Key: HBASE-14521 > URL: https://issues.apache.org/jira/browse/HBASE-14521 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14, 1.1.2 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-14521.patch, HBASE-14521_v2.patch, > HBASE-14521_v3.patch > > > From the name of the _hbase.client.retries.number_ property, it should be the > maximum number of *retries*; in other words, if we set the property to 1, there should > be 2 attempts in total. However, there are two different semantics in use in the current code base. > For example, in ConnectionImplementation#locateRegionInMeta: > {code} > int localNumRetries = (retry ? numTries : 1); > for (int tries = 0; true; tries++) { > if (tries >= localNumRetries) { > throw new NoServerForRegionException("Unable to find region for " > + Bytes.toStringBinary(row) + " in " + tableName + > " after " + numTries + " tries."); > } > {code} > the retries number is regarded as the maximum number of *tries* > While in RpcRetryingCallerImpl#callWithRetries: > {code} > for (int tries = 0;; tries++) { > long expectedSleep; > try { > callable.prepare(tries != 0); // if called with false, check table > status on ZK > interceptor.intercept(context.prepare(callable, tries)); > return callable.call(getRemainingTime(callTimeout)); > } catch (PreemptiveFastFailException e) { > throw e; > } catch (Throwable t) { > ... > if (tries >= retries - 1) { > throw new RetriesExhaustedException(tries, exceptions); > } > {code} > it's regarded as exactly *retries* (try a call first unconditionally, then > check whether to retry or whether the maximum retry number is exceeded) > This inconsistency will cause misunderstanding in usage, such as one of our > customers setting the property to zero expecting one single call but finally > receiving NoServerForRegionException. > We should unify the semantics of the property, and I suggest keeping the > original meaning of retries rather than total tries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948255#comment-14948255 ] Nicolas Liochon commented on HBASE-11590: - Hey [~saint@gmail.com] Attached some tests comparing ThreadPoolExecutor (the one we use currently), ForkJoinPool (available in jdk1.7+) and LifoThreadPoolExecutorSQP (the one mentioned in the stackoverflow discussion). - the critical use case is: 1) do a table.batch(puts) that needs a lot of threads 2) then do a loop { table.get(get) }, this needs a single thread but each call may use any of the threads in the pool, resetting the keepalive timeout => they may never expire. ThreadPoolExecutor is actually worse: it tries to create a thread even if there are already enough threads available. See the code for the details, but here is the interesting case with a thread pool of 1000 threads while we need only 1 thread. {quote} * ForkJoinPool maxThread=1000, immediateGet=true, LOOP=200 * ForkJoinPool total=68942ms * ForkJoinPool step1=68657ms * ForkJoinPool step2=284ms * ForkJoinPool threads: 6, 1006, 456, 6 <=== we have 456 threads instead of the ideal 7 * ThreadPoolExecutor maxThread=1000, immediateGet=true, LOOP=200 * ThreadPoolExecutor total=107449ms <=== very slow * ThreadPoolExecutor step1=107145ms * ThreadPoolExecutor step2=304ms * ThreadPoolExecutor threads: 6, 1006, 889, 6 <== keeps nearly all the threads - * LifoThreadPoolExecutorSQP maxThread=1000, immediateGet=true, LOOP=200 * LifoThreadPoolExecutorSQP total=4805ms < quite fast * LifoThreadPoolExecutorSQP step1=4803ms * LifoThreadPoolExecutorSQP step2=1ms * LifoThreadPoolExecutorSQP threads: 6, 248, 8, 6 <== removes the threads quickly {quote} You may want to rerun the tests to see if you reproduce them. I included my results in the code. - The root issue is that we need a LIFO poll/lock but it does not exist. - LifoThreadPoolExecutorSQP solves this with a LIFO queue for the threads waiting for work. But it comes with an LGPL license, and the code is not trivial. A bug there could be difficult to find. It is however incredible to see how much faster/better it is compared to the other pools. - ForkJoinPool is better than TPE. It's not as good as LifoThreadPoolExecutorSQP, but it's much closer to what we need. It's available in JDK 1.7, so it looks like a safe bet for HBase 1.+ ForkJoinPool: threads are created only if there are waiting tasks. They expire after 2 seconds (it's hardcoded in the JDK code). They are not LIFO, and the task allocation is not as fast as the one in LifoThreadPoolExecutorSQP. => Proposition: Let's migrate to ForkJoinPool. If someone has time to try LifoThreadPoolExecutorSQP it could be interesting in the future (if the license can be changed)... > use a specific ThreadPoolExecutor > - > > Key: HBASE-11590 > URL: https://issues.apache.org/jira/browse/HBASE-11590 > Project: HBase > Issue Type: Bug > Components: Client, Performance >Affects Versions: 1.0.0, 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Minor > Fix For: 2.0.0 > > Attachments: tp.patch > > > The JDK TPE creates all the threads in the pool. As a consequence, we create > (by default) 256 threads even if we just need a few. > The attached TPE creates threads only if we have something in the queue. > On a PE test with replicas on, it improved the 99th latency percentile by 5%. > Warning: there are likely some race conditions, but I'm posting it here > because there may be an implementation available somewhere we can use, or > a good reason not to do that. So feedback welcome as usual. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
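The ThreadPoolExecutor behaviour called out above (a brand-new thread is created for a task even when an idle thread is available, as long as the pool is below core size) is easy to reproduce. This is a self-contained demo of the documented ThreadPoolExecutor semantics, not the HBase client pool:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Demonstrates the TPE behaviour discussed above: below corePoolSize,
// each submit() spins up a fresh thread even if an idle thread exists.
class TpeDemo {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            256, 256, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        pool.submit(() -> { }).get();   // task done, one idle thread now
        System.out.println("after 1st task: " + pool.getPoolSize()); // 1

        pool.submit(() -> { }).get();   // idle thread available, but TPE
                                        // still adds a second one (< core)
        System.out.println("after 2nd task: " + pool.getPoolSize()); // 2
        pool.shutdown();
    }
}
```

Below corePoolSize every submitted task gets a fresh thread, so with the HBase default of 256 threads (and keep-alive reset on every reuse, as described in the comment) the pool can stay large long after the burst of puts has passed.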
[jira] [Updated] (HBASE-11590) use a specific ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-11590: Attachment: ExecutorServiceTest.java UnitQueuePU.java UnitQueueP.java LifoThreadPoolExecutorSQP.java > use a specific ThreadPoolExecutor > - > > Key: HBASE-11590 > URL: https://issues.apache.org/jira/browse/HBASE-11590 > Project: HBase > Issue Type: Bug > Components: Client, Performance >Affects Versions: 1.0.0, 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Minor > Fix For: 2.0.0 > > Attachments: ExecutorServiceTest.java, > LifoThreadPoolExecutorSQP.java, UnitQueueP.java, UnitQueuePU.java, tp.patch > > > The JDK TPE creates all the threads in the pool. As a consequence, we create > (by default) 256 threads even if we just need a few. > The attached TPE creates threads only if we have something in the queue. > On a PE test with replicas on, it improved the 99th latency percentile by 5%. > Warning: there are likely some race conditions, but I'm posting it here > because there may be an implementation available somewhere we can use, or > a good reason not to do that. So feedback welcome as usual. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
[ https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14579: Hadoop Flags: Incompatible change Status: Patch Available (was: Open) > Users authenticated with KERBEROS are recorded as being authenticated with > SIMPLE > - > > Key: HBASE-14579 > URL: https://issues.apache.org/jira/browse/HBASE-14579 > Project: HBase > Issue Type: Bug > Components: security >Affects Versions: 0.98.15, 1.0.0, 1.2.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Minor > Fix For: 2.0.0 > > Attachments: hbase-14579.patch > > > That's the HBase version of HADOOP-10683. > The fix is simple, but it means we need hadoop 2.5+. > There is also a lot of cases where HBase calls "createUser" w/o specifying > the authentication method... I don"'t have the solution for these ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
[ https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14579: Attachment: hbase-14579.patch > Users authenticated with KERBEROS are recorded as being authenticated with > SIMPLE > - > > Key: HBASE-14579 > URL: https://issues.apache.org/jira/browse/HBASE-14579 > Project: HBase > Issue Type: Bug > Components: security >Affects Versions: 1.0.0, 1.2.0, 0.98.15 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Minor > Fix For: 2.0.0 > > Attachments: hbase-14579.patch > > > That's the HBase version of HADOOP-10683. > The fix is simple, but it means we need hadoop 2.5+. > There is also a lot of cases where HBase calls "createUser" w/o specifying > the authentication method... I don"'t have the solution for these ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
Nicolas Liochon created HBASE-14580: --- Summary: Make the HBaseMiniCluster compliant with Kerberos Key: HBASE-14580 URL: https://issues.apache.org/jira/browse/HBASE-14580 Project: HBase Issue Type: Improvement Components: security, test Affects Versions: 2.0.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 2.0.0 When using MiniKDC and the minicluster in a unit test, there is a conflict caused by HBaseTestingUtility: {code} public static User getDifferentUser(final Configuration c, final String differentiatingSuffix) throws IOException { // snip String username = User.getCurrent().getName() + differentiatingSuffix; // < problem here User user = User.createUserForTesting(c, username, new String[]{"supergroup"}); return user; } {code} This creates users like securedUser/localh...@example.com.hfs.0, and this does not work. My fix is to return the current user when Kerberos is set. I don't think that there is another option (any other opinion?). However this user is not in a group so we have logs like 'WARN [IPC Server handler 9 on 61366] security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user securedUser' I'm not sure of its impact. [~apurtell], what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
[ https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14579: Description: That's the HBase version of HADOOP-10683. We see: ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful for securedUser/localh...@example.com (auth:SIMPLE)?? while we would like to see: ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful for securedUser/localh...@example.com (auth:KERBEROS)?? The fix is simple, but it means we need hadoop 2.5+. There is also a lot of cases where HBase calls "createUser" w/o specifying the authentication method... I don"'t have the solution for these ones. was: That's the HBase version of HADOOP-10683. The fix is simple, but it means we need hadoop 2.5+. There is also a lot of cases where HBase calls "createUser" w/o specifying the authentication method... I don"'t have the solution for these ones. > Users authenticated with KERBEROS are recorded as being authenticated with > SIMPLE > - > > Key: HBASE-14579 > URL: https://issues.apache.org/jira/browse/HBASE-14579 > Project: HBase > Issue Type: Bug > Components: security >Affects Versions: 1.0.0, 1.2.0, 0.98.15 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Minor > Fix For: 2.0.0 > > Attachments: hbase-14579.patch > > > That's the HBase version of HADOOP-10683. > We see: > ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful > for securedUser/localh...@example.com (auth:SIMPLE)?? > while we would like to see: > ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful > for securedUser/localh...@example.com (auth:KERBEROS)?? > The fix is simple, but it means we need hadoop 2.5+. > There is also a lot of cases where HBase calls "createUser" w/o specifying > the authentication method... I don"'t have the solution for these ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
[ https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948684#comment-14948684 ] Nicolas Liochon commented on HBASE-14579: - > The patch appears to cause mvn compile goal to fail with Hadoop version 2.4.0. Yes. Is that an issue for the 2.0 branch? > Users authenticated with KERBEROS are recorded as being authenticated with > SIMPLE > - > > Key: HBASE-14579 > URL: https://issues.apache.org/jira/browse/HBASE-14579 > Project: HBase > Issue Type: Bug > Components: security >Affects Versions: 1.0.0, 1.2.0, 0.98.15 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Minor > Fix For: 2.0.0 > > Attachments: hbase-14579.patch > > > That's the HBase version of HADOOP-10683. > We see: > ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful > for securedUser/localh...@example.com (auth:SIMPLE)?? > while we would like to see: > ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful > for securedUser/localh...@example.com (auth:KERBEROS)?? > The fix is simple, but it means we need hadoop 2.5+. > There is also a lot of cases where HBase calls "createUser" w/o specifying > the authentication method... I don"'t have the solution for these ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos
[ https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-14580: Status: Patch Available (was: Open) > Make the HBaseMiniCluster compliant with Kerberos > - > > Key: HBASE-14580 > URL: https://issues.apache.org/jira/browse/HBASE-14580 > Project: HBase > Issue Type: Improvement > Components: security, test >Affects Versions: 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon > Fix For: 2.0.0 > > Attachments: patch-14580.v1.patch > > > When using MiniKDC and the minicluster in a unit test, there is a conflict > caused by HBaseTestingUtility: > {code} > public static User getDifferentUser(final Configuration c, > final String differentiatingSuffix) > throws IOException { >// snip > String username = User.getCurrent().getName() + > differentiatingSuffix; < problem here > User user = User.createUserForTesting(c, username, > new String[]{"supergroup"}); > return user; > } > {code} > This creates users like securedUser/localh...@example.com.hfs.0, and this > does not work. > My fix is to return the current user when Kerberos is set. I don't think that > there is another option (any other opinion?). However this user is not in a > group so we have logs like 'WARN [IPC Server handler 9 on 61366] > security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) > - No groups available for user securedUser' I'm not sure of its impact. > [~apurtell], what do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
Nicolas Liochon created HBASE-14579: --- Summary: Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE Key: HBASE-14579 URL: https://issues.apache.org/jira/browse/HBASE-14579 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.98.15, 1.0.0, 1.2.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 2.0.0 That's the HBase version of HADOOP-10683. The fix is simple, but it means we need hadoop 2.5+. There are also a lot of cases where HBase calls "createUser" w/o specifying the authentication method... I don't have the solution for these ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804373#comment-14804373 ] Nicolas Liochon commented on HBASE-11590: - If we cut down the timeout, it's more or less equivalent to not having a thread pool at all. One of the things I don't like in many solutions (the TPE I wrote myself included) is that we have a race condition: we may create a thread even if it's not needed. I'm off for 3 days, but I will try to find a reasonable solution next week. > use a specific ThreadPoolExecutor > - > > Key: HBASE-11590 > URL: https://issues.apache.org/jira/browse/HBASE-11590 > Project: HBase > Issue Type: Bug > Components: Client, Performance >Affects Versions: 1.0.0, 2.0.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Minor > Fix For: 2.0.0 > > Attachments: tp.patch > > > The JDK TPE creates all the threads in the pool. As a consequence, we create > (by default) 256 threads even if we just need a few. > The attached TPE creates threads only if we have something in the queue. > On a PE test with replica on, it improved the 99th latency percentile by 5%. > Warning: there are likely some race conditions, but I'm posting it here > because there may be an implementation available somewhere we can use, or > a good reason not to do that. So feedback welcome as usual. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14768844#comment-14768844 ] Nicolas Liochon commented on HBASE-10449: - What's happening for the expire is: - we have a 60s timeout with 256 threads. - let's imagine we have 1 query per second. We will still have 60 threads, because each new request will create a new thread until we reach coreSize. As the timeout is 60s, the oldest threads will expire after 60s. I haven't double-checked, but I believe that the threads are needed because of the old i/o pattern. So we do need a max in the x00 range (it's like this since 0.90 at least. In theory, it's good for small clusters (100 nodes), but not as good if the cluster is composed of thousands of nodes) I did actually spend some time on this a year ago, in HBASE-11590. @stack, what do you think of the approach? I can finish the work I started there. But I will need a review. There are also some ideas/hacks in http://stackoverflow.com/questions/19528304/how-to-get-the-threadpoolexecutor-to-increase-threads-to-max-before-queueing/19528305#19528305 I haven't reviewed them yet. > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
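[Not from the jira — a plain-JDK sketch of the "each new request creates a new thread until we reach coreSize" behavior described above: below corePoolSize the JDK pool always starts a fresh thread, even when the previous one is already idle.]

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreGrowthDemo {
    // Submits 'tasks' requests one at a time, each completing before the next is
    // submitted, and reports the final pool size.
    public static int poolSizeAfterSequentialTasks(int coreSize, int tasks) throws Exception {
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(coreSize, coreSize,
                60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        for (int i = 0; i < tasks; i++) {
            tpe.submit(() -> { }).get(); // the previous thread is idle when the next task arrives
        }
        int size = tpe.getPoolSize();
        tpe.shutdown();
        return size;
    }

    public static void main(String[] args) throws Exception {
        // Even at one request at a time, the pool keeps growing until coreSize:
        // 5 strictly sequential tasks still leave 5 threads alive.
        System.out.println(poolSizeAfterSequentialTasks(60, 5));
    }
}
```

So with coreSize=256 and a 60s keep-alive, one request per second is enough to keep ~60 threads around, exactly as the comment says.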
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790660#comment-14790660 ] Nicolas Liochon commented on HBASE-10449: - > I was thinking that we'd go to core size – say # of cores – and then if one > request a second, we'd just stay at core size because there would be a free > thread when the request-per-second came in (assuming request took a good deal > < a second). I expect that if we have more than coreSize calls in timeout (256 vs 60 seconds in our case) then we always have coreSize threads. > Didn't we have a mock server somewhere such that we could standup a client > with no friction and watch it in operation? I thought we'd make such a > beast Yep, you built one, we used it when we looked at the perf issues in the client (the protobuf nightmare if you remember ;:-)). > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791129#comment-14791129 ] Nicolas Liochon commented on HBASE-10449: - The algo for the ThreadPoolExecutor is:
{code}
onNewTask() {
  if (currentSize < coreSize)
    createNewThread()
  else
    reuseThread()
}
{code}
And there is a timeout for each thread. So if we do a coreSize of 2, a timeout of 20s, and a query every 15s, we have:
- 0s: query1: create thread1, poolSize=1
- 15s: query2: create thread2, poolSize=2
- 20s: close thread1, poolSize=1
- 30s: query3: create thread3, poolSize=2
- 35s: close thread2, poolSize=1
- 45s: query4: create thread4, poolSize=2
And so on. So even if we have 1 query every 15s, we have 2 threads in the pool nearly all the time. > Yes. Smile. Need to revive it for here and for doing client timeouts I found the code in TestClientNoCluster#run , ready to be reused! I think we need to go for a hack like in Stackoverflow or for a different implementation for TPE like HBASE-11590... > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791145#comment-14791145 ] Nicolas Liochon commented on HBASE-10449: - It's the former: in this case, the queries are queued. A new thread will be created only when the queue is full. Then, if we reach maxThreads and the queue is full the new tasks are rejected. In our case the queue is nearly unbounded, so we stay with corePoolSize. > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
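[Not from the jira — a plain-JDK sketch of the point above: with a (nearly) unbounded queue, tasks past corePoolSize are queued rather than given new threads, so the pool never approaches maxPoolSize.]

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueDemo {
    // Pool size observed while 'tasks' blocked requests are in flight.
    public static int poolSizeUnderLoad(int coreSize, int maxSize, int tasks) throws Exception {
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(coreSize, maxSize,
                60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>()); // effectively unbounded queue
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            tpe.execute(() -> {
                try { release.await(); } catch (InterruptedException e) { }
            });
        }
        int size = tpe.getPoolSize(); // extra tasks sit in the queue instead of creating threads
        release.countDown();
        tpe.shutdown();
        return size;
    }

    public static void main(String[] args) throws Exception {
        // maxSize 256 is never reached: with 10 concurrent blocked tasks and
        // coreSize 2, the pool stays at 2 threads.
        System.out.println(poolSizeUnderLoad(2, 256, 10));
    }
}
```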
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746919#comment-14746919 ] Nicolas Liochon commented on HBASE-10449: - Actually I'm having two doubts: - the core threads should already have this timeout, no? We should not see 256 threads, because they should expire already - IIRC, this thread pool is used when connecting to the various regionservers, and they block until they have an answer. So with 4 core threads (for example), it means that if we do a multi we contact 4 servers simultaneously at most. The threads are not really using CPUs, they're waiting (old i/o style). But maybe it has changed? > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746897#comment-14746897 ] Nicolas Liochon commented on HBASE-10449: - As I understand the doc, if we do that we create maxThreads and then reject all the tasks. Not really useful. But the patch in HBASE-14433 seems ok: - we create up to core threads (Runtime.getRuntime().availableProcessors()). If we have 10 tasks in parallel we still have Runtime.getRuntime().availableProcessors() threads. - they expire quite quickly (because we do allowCoreThreadTimeOut(true);) May be we should set maxThreads to coreThreads as well and increase HConstants.DEFAULT_HBASE_CLIENT_MAX_TOTAL_TASKS. But I'm +1 with HBASE-14433 as it is now. > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
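[Not the HBASE-14433 patch itself — just a JDK-only illustration of the two properties mentioned above: the pool is capped at availableProcessors() core threads, and with allowCoreThreadTimeOut(true) even those expire once idle. The 100ms keep-alive here is only to make the expiry observable quickly.]

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreTimeoutDemo {
    // Lets the core threads expire and reports the pool size once they are gone.
    public static int poolSizeAfterExpiry() throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(cores, cores,
                100L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        tpe.allowCoreThreadTimeOut(true); // core threads are no longer immortal
        tpe.submit(() -> { }).get();      // creates one thread, which then sits idle
        // Poll until the idle core thread has timed out (up to ~5s).
        for (int i = 0; i < 100 && tpe.getPoolSize() > 0; i++) {
            Thread.sleep(50L);
        }
        int size = tpe.getPoolSize();
        tpe.shutdown();
        return size;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(poolSizeAfterExpiry());
    }
}
```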
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746863#comment-14746863 ] Nicolas Liochon commented on HBASE-10449: - Sorry for the delay, I'm seeing this now only. Let me have a look. > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager
[ https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746875#comment-14746875 ] Nicolas Liochon commented on HBASE-10449: - > Where does 'Create a single thread, queue all the tasks for this thread.' > come from? This is what HBASE-9917 actually implemented: with the ThreadPoolExecutor if the task queue is unbounded, it does not create new threads: From: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html If fewer than corePoolSize threads are running, the Executor always prefers adding a new thread rather than queuing. If corePoolSize or more threads are running, the Executor always prefers queuing a request rather than adding a new thread. If a request cannot be queued, a new thread is created unless this would exceed maximumPoolSize, in which case, the task will be rejected. But having less than 256 threads is fine. This was just restoring the previous value. > Wrong execution pool configuration in HConnectionManager > > > Key: HBASE-10449 > URL: https://issues.apache.org/jira/browse/HBASE-10449 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.99.0, 0.96.1.1 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: HBASE-10449.v1.patch > > > There is a confusion in the configuration of the pool. The attached patch > fixes this. This may change the client performances, as we were using a > single thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
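[Not from the jira — a sketch of the three-step admission rule quoted above from the ThreadPoolExecutor javadoc, written as a plain decision function. With a nearly unbounded queue the "queue full" branches are never taken, which is why the pool sticks at corePoolSize.]

```java
public class AdmissionPolicy {
    // The decision the JDK ThreadPoolExecutor makes for each submitted task,
    // following the javadoc rules quoted in the comment above.
    public static String decide(int poolSize, int coreSize, int maxSize, boolean queueHasRoom) {
        if (poolSize < coreSize) return "new-thread"; // prefers a new thread below core
        if (queueHasRoom) return "queue";             // then prefers queuing
        if (poolSize < maxSize) return "new-thread";  // queue full: grow up to max
        return "reject";                              // queue full and already at max
    }

    public static void main(String[] args) {
        System.out.println(decide(1, 2, 256, true));    // below core: new-thread
        System.out.println(decide(2, 2, 256, true));    // at core, queue has room: queue
        System.out.println(decide(2, 2, 256, false));   // queue full: new-thread
        System.out.println(decide(256, 2, 256, false)); // full everywhere: reject
    }
}
```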
[jira] [Updated] (HBASE-13865) Increase the default value for hbase.hregion.memstore.block.multipler from 2 to 4 (part 2)
[ https://issues.apache.org/jira/browse/HBASE-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-13865: Release Note: Increase default hbase.hregion.memstore.block.multiplier from 2 to 4 in the code to match the default value in the config files. (was: Increase hbase.hregion.memstore.block.multiplier from 2 to 4) Increase the default value for hbase.hregion.memstore.block.multipler from 2 to 4 (part 2) -- Key: HBASE-13865 URL: https://issues.apache.org/jira/browse/HBASE-13865 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 2.0.0 Reporter: Vladimir Rodionov Assignee: Gabor Liptak Priority: Trivial Fix For: 2.0.0, 0.98.14, 1.3.0, 1.2.1, 1.0.3, 1.1.3 Attachments: HBASE-13865.1.patch, HBASE-13865.2.patch, HBASE-13865.2.patch Its 4 in the book and 2 in a current master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13865) Increase the default value for hbase.hregion.memstore.block.multipler from 2 to 4 (part 2)
[ https://issues.apache.org/jira/browse/HBASE-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659909#comment-14659909 ] Nicolas Liochon commented on HBASE-13865: - Hey Nick :-) If I'm not mistaken (I'm always confused by the various config files...), the patch should not change the behavior for most common deployments, because the value is set to 4 in the hbase-default.xml (and for the users who set it to 2: the xml config is used first, it won't change for them as well). So: - The patch is a good cleanup imho - It's safe as it does not change the behavior. +1 I updated the release notes. Increase the default value for hbase.hregion.memstore.block.multipler from 2 to 4 (part 2) -- Key: HBASE-13865 URL: https://issues.apache.org/jira/browse/HBASE-13865 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 2.0.0 Reporter: Vladimir Rodionov Assignee: Gabor Liptak Priority: Trivial Fix For: 2.0.0, 0.98.14, 1.3.0, 1.2.1, 1.0.3, 1.1.3 Attachments: HBASE-13865.1.patch, HBASE-13865.2.patch, HBASE-13865.2.patch Its 4 in the book and 2 in a current master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13865) Increase the default value for hbase.hregion.memstore.block.multipler from 2 to 4 (part 2)
[ https://issues.apache.org/jira/browse/HBASE-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-13865: Component/s: (was: documentation) regionserver Increase the default value for hbase.hregion.memstore.block.multipler from 2 to 4 (part 2) -- Key: HBASE-13865 URL: https://issues.apache.org/jira/browse/HBASE-13865 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 2.0.0 Reporter: Vladimir Rodionov Assignee: Gabor Liptak Priority: Trivial Fix For: 2.0.0, 0.98.14, 1.3.0, 1.2.1, 1.0.3, 1.1.3 Attachments: HBASE-13865.1.patch, HBASE-13865.2.patch, HBASE-13865.2.patch Its 4 in the book and 2 in a current master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610738#comment-14610738 ] Nicolas Liochon commented on HBASE-13992: - +1 as well for me. How does it work for the binaries version, will we have to enter into the scala game, i.e. hbase-spark-2_10? What about the spark version? The spark-hadoop version? Integrate SparkOnHBase into HBase - Key: HBASE-13992 URL: https://issues.apache.org/jira/browse/HBASE-13992 Project: HBase Issue Type: Bug Reporter: Ted Malaska Assignee: Ted Malaska Labels: spark This Jira is to ask if SparkOnHBase can find a home in side HBase core. Here is the github: https://github.com/cloudera-labs/SparkOnHBase I am the core author of this project and the license is Apache 2.0 A blog explaining this project is here http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/ A spark Streaming example is here http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/ A real customer using this in produce is blogged here http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/ Please debate and let me know what I can do to make this happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13647) Default value for hbase.client.operation.timeout is too high
[ https://issues.apache.org/jira/browse/HBASE-13647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566396#comment-14566396 ] Nicolas Liochon commented on HBASE-13647: - I would recommend 20 minutes. The idea is that if a machine fails and the recovery needs a hdfs timeout (10:30 mins) we have some extra time. As well, iirc with the default retries number and pause we are around 15 minutes today. It seems better to default above that. I kept the operation timeout in the htable stuff (but it's not me who put it there :-) ), but now I wonder if we should not just remove it from this code path: it overlaps with the number of retries, and does it add that much value? Default value for hbase.client.operation.timeout is too high Key: HBASE-13647 URL: https://issues.apache.org/jira/browse/HBASE-13647 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 1.0.1, 0.98.13, 1.2.0, 1.1.1 Reporter: Andrey Stepachev Assignee: Andrey Stepachev Priority: Blocker Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1 Attachments: HBASE-13647.patch, HBASE-13647.v2.patch Default value for hbase.client.operation.timeout is too high, it is LONG.Max. That value will block any service calls to coprocessor endpoints indefinitely. Should we introduce better default value for that? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
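[Not from the jira — a sketch of where the "~15 minutes with the default retries and pause" figure comes from. The multiplier table below is assumed to match HConstants.RETRY_BACKOFF ({1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200}); treat the table and the defaults as assumptions and check your HBase version.]

```java
public class RetryPauseMath {
    // Backoff multipliers, assumed from HConstants.RETRY_BACKOFF.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Total sleep time across 'retries' attempts for a given base pause.
    public static long totalPauseMs(long pauseMs, int retries) {
        long total = 0;
        for (int tries = 0; tries < retries; tries++) {
            int idx = Math.min(tries, RETRY_BACKOFF.length - 1); // later tries reuse the last multiplier
            total += pauseMs * RETRY_BACKOFF[idx];
        }
        return total;
    }

    public static void main(String[] args) {
        // With pause=100ms, 13 tries already sleep ~88s, and every extra try
        // adds 20s more; 30+ tries puts the cumulative sleep in the several-minute
        // range, before even counting the per-try RPC time.
        System.out.println(totalPauseMs(100, 13) + " ms");
    }
}
```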
[jira] [Comment Edited] (HBASE-12116) Hot contention spots; writing
[ https://issues.apache.org/jira/browse/HBASE-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496031#comment-14496031 ] Nicolas Liochon edited comment on HBASE-12116 at 4/15/15 10:50 AM: --- I had a look at crc a while ago. My understanding back then was that there are specific instruction in x86 processors to calculate crc, unfortunately a little bit different than the standard crc32. When I was looking at Intel/Hadoop roadmap 2 years ago, it looked like Intel was planning to do the changes in hadoop to use the hw one. There is some info here: http://www.strchr.com/crc32_popcnt was (Author: nkeywal): I had a look at crc a while ago. My understanding back then was that there are specific instruction in x86 processors to calculate crc, unfortunately a little bit different than the standard crc32. When I was looking at Intel/Hadoop roadmap 2 years ago, it looked like Intel was planning to do the changes in hadoop to use the hw one. There are some info here: http://www.strchr.com/crc32_popcnt Hot contention spots; writing - Key: HBASE-12116 URL: https://issues.apache.org/jira/browse/HBASE-12116 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Attachments: 12116.checkForReplicas.txt, 12116.stringify.and.cache.scanner.maxsize.txt, 12116.txt, Screen Shot 2014-09-29 at 5.12.51 PM.png, Screen Shot 2014-09-30 at 10.39.34 PM.png, Screen Shot 2015-04-13 at 2.03.05 PM.png, perf.write3.svg, perf.write4.svg Playing with flight recorder, here are some write-time contentious synchronizations/locks (picture coming) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12116) Hot contention spots; writing
[ https://issues.apache.org/jira/browse/HBASE-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496031#comment-14496031 ] Nicolas Liochon commented on HBASE-12116: - I had a look at crc a while ago. My understanding back then was that there are specific instructions in x86 processors to calculate crc, unfortunately a little bit different than the standard crc32. When I was looking at Intel/Hadoop roadmap 2 years ago, it looked like Intel was planning to do the changes in hadoop to use the hw one. There is some info here: http://www.strchr.com/crc32_popcnt Hot contention spots; writing - Key: HBASE-12116 URL: https://issues.apache.org/jira/browse/HBASE-12116 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Attachments: 12116.checkForReplicas.txt, 12116.stringify.and.cache.scanner.maxsize.txt, 12116.txt, Screen Shot 2014-09-29 at 5.12.51 PM.png, Screen Shot 2014-09-30 at 10.39.34 PM.png, Screen Shot 2015-04-13 at 2.03.05 PM.png, perf.write3.svg, perf.write4.svg Playing with flight recorder, here are some write-time contentious synchronizations/locks (picture coming) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
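[Not from the jira — a short illustration of the "a little bit different than the standard crc32" point above. The x86 SSE4.2 `crc32` instruction implements CRC-32C (Castagnoli polynomial), not the zlib CRC-32; the JDK exposes both (java.util.zip.CRC32C since Java 9). The hex values in the comments are the standard check values for the input "123456789".]

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C; // Java 9+

public class CrcDemo {
    public static long crc32(byte[] data) {
        CRC32 c = new CRC32(); // the "standard" (zlib) polynomial
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static long crc32c(byte[] data) {
        CRC32C c = new CRC32C(); // Castagnoli polynomial, what the SSE4.2 instruction computes
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        // The two polynomials give different checksums for the same input.
        System.out.printf("CRC-32:  %08X%n", crc32(check));  // CBF43926
        System.out.printf("CRC-32C: %08X%n", crc32c(check)); // E3069283
    }
}
```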
[jira] [Commented] (HBASE-13272) Get.setClosestRowBefore() breaks specific column Get
[ https://issues.apache.org/jira/browse/HBASE-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14371234#comment-14371234 ] Nicolas Liochon commented on HBASE-13272: - The HTable#getRowOrBefore does a get#setClosestRowBefore(true); Yeah, I should have deprecated both. I think setClosestRowBefore is really old, but may be I'm wrong. From the code:
- It seems it's not used in HBase now - I have not found a test either.
- It seems it does not work if you're hitting a region boundary (i.e. the closest_row_before is in another region).
- It's limited to a single family as well (RSRpcServices.java): {code}"get ClosestRowBefore supports one and only one family now, not " + get.getColumnCount() + " families"{code}
I think this can be replaced by the reverseScanner, hopefully reverseScanner covers more usages. My guess is that it leaked: getRowOrBefore was purely internal and got deprecated in 0.92:
{code}
 * @deprecated As of version 0.92 this method is deprecated without
 * replacement. Since version 0.96+, you can use reversed scan.
 * getRowOrBefore is used internally to find entries in hbase:meta and makes
 * various assumptions about the table (which are true for hbase:meta but not
 * in general) to be efficient.
{code}
My guess is that Get#setClosestRowBefore was there only for the meta table and has been forgotten on the deprecation path. Now I'm not against a fix, we're open source :-) and anyway we can't remove the feature in less than two hbase releases. But from the client code point of view using the reverse scanner seems safer. imho setClosestRowBefore should be deprecated as soon as possible: very ad-hoc, not used in the internal code, not tested, fails on cross-boundary calls, fails on multiple families, and this jira as a bounty: these are good reasons imho. 
Get.setClosestRowBefore() breaks specific column Get Key: HBASE-13272 URL: https://issues.apache.org/jira/browse/HBASE-13272 Project: HBase Issue Type: Bug Reporter: stack Priority: Trivial Via [~larsgeorge] Get.setClosestRowBefore() is breaking a specific Get that specifies a column. If you set the latter to true it will return the _entire_ row! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
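The getRowOrBefore semantics discussed above — return the row matching the key exactly, or the one immediately preceding it — are exactly a floor lookup on a sorted map, which is also what a reversed scan starting at the key returns first. A self-contained sketch of that equivalence using only the JDK (row keys are hypothetical; in the HBase client the reversed-scan path would be something like a `Scan` with `setReversed(true)`):

```java
import java.util.TreeMap;

public class ClosestRowBefore {
    public static void main(String[] args) {
        // A sorted "region" keyed by row, standing in for a slice of an HBase table.
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("row-10", "a");
        rows.put("row-20", "b");
        rows.put("row-30", "c");

        // getRowOrBefore semantics: exact match, or the row immediately before it
        // -- i.e. a floor lookup.
        System.out.println(rows.floorKey("row-25")); // row-20 (closest before)
        System.out.println(rows.floorKey("row-20")); // row-20 (exact match wins)

        // A reversed iteration starting at the key yields the same first result,
        // which is why a reversed scan can replace setClosestRowBefore; unlike
        // the Get path, a real scan also keeps working across region boundaries.
        String viaReverse = rows.headMap("row-25", true)
                                .descendingMap().firstKey();
        System.out.println(viaReverse); // row-20
    }
}
```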
[jira] [Updated] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
[ https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-13286: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) committed to master, thanks for the reviews! Minimum timeout for a rpc call could be 1 ms instead of 2 seconds - Key: HBASE-13286 URL: https://issues.apache.org/jira/browse/HBASE-13286 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 1.0.0, 0.98.12 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 1.1.0 Attachments: 13286.patch There is a check in the client to be sure that we don't use a timeout of zero (i.e. infinite). This includes setting the minimal timeout for an rpc call to 2 seconds. However, it makes sense for some calls (typically gets going to the cache) to have much lower timeouts. So it's better to do the check vs. zero but with a minimal timeout of 1. I fixed a typo and a wrong comment in this patch as well. I don't understand this code: {code} // t could be a RemoteException so go around again. translateException(t); // We don't use the result? {code} but maybe it's good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
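The change described in HBASE-13286 — keep the guard against a zero (infinite) timeout, but stop silently raising everything to a 2-second floor — amounts to lowering a clamp. A hypothetical condensation of that check (the method name is made up for illustration; it is not the actual HBase client code):

```java
public class RpcTimeout {
    /**
     * Floor a configured rpc timeout. The old behaviour raised every value to a
     * 2000 ms minimum, so a 1 ms timeout silently became 2 s. The fix keeps the
     * guard against 0 (which would mean "wait forever") but lets callers go as
     * low as 1 ms, e.g. for gets expected to be served from the cache.
     */
    static int enforceMinimum(int timeoutMs) {
        return Math.max(1, timeoutMs);
    }

    public static void main(String[] args) {
        System.out.println(enforceMinimum(0));    // 1: zero is never allowed through
        System.out.println(enforceMinimum(1));    // 1: now legal, was silently 2000
        System.out.println(enforceMinimum(5000)); // 5000: large values pass through
    }
}
```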
[jira] [Commented] (HBASE-13272) Get.setClosestRowBefore() breaks specific column Get
[ https://issues.apache.org/jira/browse/HBASE-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368872#comment-14368872 ] Nicolas Liochon commented on HBASE-13272: - On the other hand if it's broken it's not that useful to keep it :-) Get.setClosestRowBefore() breaks specific column Get Key: HBASE-13272 URL: https://issues.apache.org/jira/browse/HBASE-13272 Project: HBase Issue Type: Bug Reporter: stack Priority: Trivial Via [~larsgeorge] Get.setClosestRowBefore() is breaking a specific Get that specifies a column. If you set the latter to true it will return the _entire_ row! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13272) Get.setClosestRowBefore() breaks specific column Get
[ https://issues.apache.org/jira/browse/HBASE-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368871#comment-14368871 ] Nicolas Liochon commented on HBASE-13272: - I'm +1 for removing it (I thought I had deprecated it already; maybe I'm wrong or I missed some of the interfaces), but it needs to be done carefully: we need to keep it on the server/protobuf for a while as we want the old clients to be able to speak to the new servers. Get.setClosestRowBefore() breaks specific column Get Key: HBASE-13272 URL: https://issues.apache.org/jira/browse/HBASE-13272 Project: HBase Issue Type: Bug Reporter: stack Priority: Trivial Via [~larsgeorge] Get.setClosestRowBefore() is breaking a specific Get that specifies a column. If you set the latter to true it will return the _entire_ row! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13188) java.lang.ArithmeticException issue in BoundedByteBufferPool.putBuffer
[ https://issues.apache.org/jira/browse/HBASE-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369261#comment-14369261 ] Nicolas Liochon commented on HBASE-13188: - [~saint@gmail.com] if you're interested, there is a ByteBufferPool in HBASE-9535. If I understand correctly, 9535 is now irrelevant (please close it if that's the case), but maybe there is some code to reuse there. java.lang.ArithmeticException issue in BoundedByteBufferPool.putBuffer -- Key: HBASE-13188 URL: https://issues.apache.org/jira/browse/HBASE-13188 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13188.patch Running a range scan with the PE tool with 25 threads gives this error: {code} java.lang.ArithmeticException: / by zero at org.apache.hadoop.hbase.io.BoundedByteBufferPool.putBuffer(BoundedByteBufferPool.java:104) at org.apache.hadoop.hbase.ipc.RpcServer$Call.done(RpcServer.java:325) at org.apache.hadoop.hbase.ipc.RpcServer$Responder.processResponse(RpcServer.java:1078) at org.apache.hadoop.hbase.ipc.RpcServer$Responder.processAllResponses(RpcServer.java:1103) at org.apache.hadoop.hbase.ipc.RpcServer$Responder.doAsyncWrite(RpcServer.java:1036) at org.apache.hadoop.hbase.ipc.RpcServer$Responder.doRunLoop(RpcServer.java:956) at org.apache.hadoop.hbase.ipc.RpcServer$Responder.run(RpcServer.java:891) {code} I checked the trunk code as well. The comment in the code suggests that the size will not be exact, so there is a chance that it could even be 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
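The `/ by zero` above comes from averaging a running total over a counter that, as the issue description notes, is only approximately maintained and can legitimately be 0. A hypothetical, condensed sketch of the pattern and its guard (this is not the actual BoundedByteBufferPool code, just the shape of the bug and fix):

```java
import java.util.concurrent.atomic.AtomicLong;

public class RunningAverage {
    private final AtomicLong totalBytes = new AtomicLong();
    private final AtomicLong count = new AtomicLong();

    void record(int size) {
        totalBytes.addAndGet(size);
        count.incrementAndGet();
    }

    /**
     * Average buffer size. The two counters are updated independently, so a
     * reader may observe count == 0 even after some activity -- dividing
     * unconditionally is exactly the ArithmeticException in the stack trace.
     */
    long average() {
        long c = count.get();
        return c == 0 ? 0 : totalBytes.get() / c; // guard the division
    }

    public static void main(String[] args) {
        RunningAverage avg = new RunningAverage();
        System.out.println(avg.average()); // 0, not ArithmeticException
        avg.record(100);
        avg.record(300);
        System.out.println(avg.average()); // 200
    }
}
```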
[jira] [Comment Edited] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
[ https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369735#comment-14369735 ] Nicolas Liochon edited comment on HBASE-13286 at 3/19/15 5:17 PM: -- bq. Good catch. Not really, it's me who put it a while ago (to fix the infinite timeout), and I've been questioning myself about this for a while. :-) bq. Why remove MIN_RPC_TIMEOUT though? Why not just set it to 1 instead of 2000? I thought it would make the code simpler to read. As you like, I can change it. For the stack during the tests: org.apache.oozie.test.MiniHCatServer$1.run(MiniHCatServer.java:137) It's because H10 is configured to run two builds in parallel, and this is looking for trouble. We ran with an oozie build. From what I see, the findbugs warning does not come from this patch. I will commit tomorrow my time if there is no objection. was (Author: nkeywal): bq. Good catch. Not really, it's me who put it a while ago (to fix the infinite timeout), and I've been questioning myself about this for a while. :-) bq. Why remove MIN_RPC_TIMEOUT though? Why not just set it to 1 instead of 2000? I thought it would make the code simpler to read. As you like, I can change it. For the stack during the tests: org.apache.oozie.test.MiniHCatServer$1.run(MiniHCatServer.java:137) It's because H10 is configured to run two builds in parallel, and this is looking for trouble. We ran with an oozie build. From what I see, the findbugs warning is not from me. I will commit tomorrow my time if there is no objection. Minimum timeout for a rpc call could be 1 ms instead of 2 seconds - Key: HBASE-13286 URL: https://issues.apache.org/jira/browse/HBASE-13286 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 1.0.0, 0.98.12 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 1.1.0 Attachments: 13286.patch There is a check in the client to be sure that we don't use a timeout of zero (i.e. infinite). 
This includes setting the minimal timeout for an rpc call to 2 seconds. However, it makes sense for some calls (typically gets going to the cache) to have much lower timeouts. So it's better to do the check vs. zero but with a minimal timeout of 1. I fixed a typo and a wrong comment in this patch as well. I don't understand this code: {code} // t could be a RemoteException so go around again. translateException(t); // We don't use the result? {code} but maybe it's good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
[ https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369735#comment-14369735 ] Nicolas Liochon commented on HBASE-13286: - bq. Good catch. Not really, it's me who put it a while ago (to fix the infinite timeout), and I've been questioning myself about this for a while. :-) bq. Why remove MIN_RPC_TIMEOUT though? Why not just set it to 1 instead of 2000? I thought it would make the code simpler to read. As you like, I can change it. For the stack during the tests: org.apache.oozie.test.MiniHCatServer$1.run(MiniHCatServer.java:137) It's because H10 is configured to run two builds in parallel, and this is looking for trouble. We ran with an oozie build. From what I see, the findbugs warning does not come from me. I will commit tomorrow my time if there is no objection. Minimum timeout for a rpc call could be 1 ms instead of 2 seconds - Key: HBASE-13286 URL: https://issues.apache.org/jira/browse/HBASE-13286 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 1.0.0, 0.98.12 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 1.1.0 Attachments: 13286.patch There is a check in the client to be sure that we don't use a timeout of zero (i.e. infinite). This includes setting the minimal timeout for an rpc call to 2 seconds. However, it makes sense for some calls (typically gets going to the cache) to have much lower timeouts. So it's better to do the check vs. zero but with a minimal timeout of 1. I fixed a typo and a wrong comment in this patch as well. I don't understand this code: {code} // t could be a RemoteException so go around again. translateException(t); // We don't use the result? {code} but maybe it's good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
[ https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-13286: Attachment: 13286.patch Minimum timeout for a rpc call could be 1 ms instead of 2 seconds - Key: HBASE-13286 URL: https://issues.apache.org/jira/browse/HBASE-13286 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 1.0.0, 0.98.12 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 1.1.0 Attachments: 13286.patch There is a check in the client to be sure that we don't use a timeout of zero (i.e. infinite). This includes setting the minimal timeout for an rpc call to 2 seconds. However, it makes sense for some calls (typically gets going to the cache) to have much lower timeouts. So it's better to do the check vs. zero but with a minimal timeout of 0. I fixed a typo and a wrong comment in this patch as well. I don't understand this code: {code} // t could be a RemoteException so go around again. translateException(t); // We don't use the result? {code} but maybe it's good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
[ https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-13286: Description: There is a check in the client to be sure that we don't use a timeout of zero (i.e. infinite). This includes setting the minimal timeout for an rpc call to 2 seconds. However, it makes sense for some calls (typically gets going to the cache) to have much lower timeouts. So it's better to do the check vs. zero but with a minimal timeout of 1. I fixed a typo and a wrong comment in this patch as well. I don't understand this code: {code} // t could be a RemoteException so go around again. translateException(t); // We don't use the result? {code} but maybe it's good. was: There is a check in the client to be sure that we don't use a timeout of zero (i.e. infinite). This includes setting the minimal timeout for an rpc call to 2 seconds. However, it makes sense for some calls (typically gets going to the cache) to have much lower timeouts. So it's better to do the check vs. zero but with a minimal timeout of 0. I fixed a typo and a wrong comment in this patch as well. I don't understand this code: {code} // t could be a RemoteException so go around again. translateException(t); // We don't use the result? {code} but maybe it's good. Minimum timeout for a rpc call could be 1 ms instead of 2 seconds - Key: HBASE-13286 URL: https://issues.apache.org/jira/browse/HBASE-13286 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 1.0.0, 0.98.12 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 1.1.0 Attachments: 13286.patch There is a check in the client to be sure that we don't use a timeout of zero (i.e. infinite). This includes setting the minimal timeout for an rpc call to 2 seconds. However, it makes sense for some calls (typically gets going to the cache) to have much lower timeouts. So it's better to do the check vs. 
zero but with a minimal timeout of 1. I fixed a typo and a wrong comment in this patch as well. I don't understand this code: {code} // t could be a RemoteException so go around again. translateException(t); // We don't use the result? {code} but maybe it's good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
[ https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369451#comment-14369451 ] Nicolas Liochon commented on HBASE-13286: - Thanks Ted, let's see what hadoop-qa says; I hope I won't discover an ocean of race conditions here :-). Minimum timeout for a rpc call could be 1 ms instead of 2 seconds - Key: HBASE-13286 URL: https://issues.apache.org/jira/browse/HBASE-13286 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 1.0.0, 0.98.12 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 1.1.0 Attachments: 13286.patch There is a check in the client to be sure that we don't use a timeout of zero (i.e. infinite). This includes setting the minimal timeout for an rpc call to 2 seconds. However, it makes sense for some calls (typically gets going to the cache) to have much lower timeouts. So it's better to do the check vs. zero but with a minimal timeout of 1. I fixed a typo and a wrong comment in this patch as well. I don't understand this code: {code} // t could be a RemoteException so go around again. translateException(t); // We don't use the result? {code} but maybe it's good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
Nicolas Liochon created HBASE-13286: --- Summary: Minimum timeout for a rpc call could be 1 ms instead of 2 seconds Key: HBASE-13286 URL: https://issues.apache.org/jira/browse/HBASE-13286 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 0.98.12, 1.0.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 1.1.0 There is a check in the client to be sure that we don't use a timeout of zero (i.e. infinite). This includes setting the minimal timeout for an rpc call to 2 seconds. However, it makes sense for some calls (typically gets going to the cache) to have much lower timeouts. So it's better to do the check vs. zero but with a minimal timeout of 0. I fixed a typo and a wrong comment in this patch as well. I don't understand this code: {code} // t could be a RemoteException so go around again. translateException(t); // We don't use the result? {code} but maybe it's good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
[ https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-13286: Status: Patch Available (was: Open) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds - Key: HBASE-13286 URL: https://issues.apache.org/jira/browse/HBASE-13286 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 0.98.12, 1.0.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 1.1.0 Attachments: 13286.patch There is a check in the client to be sure that we don't use a timeout of zero (i.e. infinite). This includes setting the minimal timeout for an rpc call to 2 seconds. However, it makes sense for some calls (typically gets going to the cache) to have much lower timeouts. So it's better to do the check vs. zero but with a minimal timeout of 0. I fixed a typo and a wrong comment in this patch as well. I don't understand this code: {code} // t could be a RemoteException so go around again. translateException(t); // We don't use the result? {code} but maybe it's good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13271) Table#puts(List&lt;Put&gt;) operation is indeterminate; remove!
[ https://issues.apache.org/jira/browse/HBASE-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368052#comment-14368052 ] Nicolas Liochon commented on HBASE-13271: - Oh ok. Thanks for the explanation. Then the call to batch seems to be the perfect solution. Table#puts(List&lt;Put&gt;) operation is indeterminate; remove! - Key: HBASE-13271 URL: https://issues.apache.org/jira/browse/HBASE-13271 Project: HBase Issue Type: Improvement Components: API Affects Versions: 1.0.0 Reporter: stack Another API issue found by [~larsgeorge]: Table.put(List&lt;Put&gt;) is questionable after the API change. {code} [Mar-17 9:21 AM] Lars George: Table.put(List&lt;Put&gt;) is weird since you cannot flush partial lists [Mar-17 9:21 AM] Lars George: Say out of 5 the third is broken, then the put() call returns with a local exception (say empty Put) and then you have 2 that are in the buffer [Mar-17 9:21 AM] Lars George: but how to you force commit them? [Mar-17 9:22 AM] Lars George: In the past you would call flushCache(), but that is gone now [Mar-17 9:22 AM] Lars George: and flush() is not available on a Table [Mar-17 9:22 AM] Lars George: And you cannot access the underlying BufferedMutation neither [Mar-17 9:23 AM] Lars George: You can *only* add more Puts if you can, or call close() [Mar-17 9:23 AM] Lars George: that is just weird to explain {code} So, Table needs to get flush back or we deprecate this method or it flushes immediately and does not return until complete in the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13272) Get.setClosestRowBefore() breaks specific column Get
[ https://issues.apache.org/jira/browse/HBASE-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367886#comment-14367886 ] Nicolas Liochon commented on HBASE-13272: - setClosestRowBefore is superseded by reverse scan, imho. IIRC we don't use it internally anymore (the region locator uses the reverse scan). Get.setClosestRowBefore() breaks specific column Get Key: HBASE-13272 URL: https://issues.apache.org/jira/browse/HBASE-13272 Project: HBase Issue Type: Bug Reporter: stack Via [~larsgeorge] Get.setClosestRowBefore() is breaking a specific Get that specifies a column. If you set the latter to true it will return the _entire_ row! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13271) Table#puts(List&lt;Put&gt;) operation is indeterminate; remove!
[ https://issues.apache.org/jira/browse/HBASE-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367934#comment-14367934 ] Nicolas Liochon commented on HBASE-13271: - [Mar-17 9:21 AM] Lars George: Say out of 5 the third is broken, then the put() call returns with a local exception (say empty Put) and then you have 2 that are in the buffer [Mar-17 9:21 AM] Lars George: but how to you force commit them? [Mar-17 9:22 AM] Lars George: In the past you would call flushCache(), but that is gone now If they failed the first time, why would they succeed the second time? Why are they still in the buffer if it failed? Why is flushCache not available? Flushing the commits should be available to the end user, no? Table#puts(List&lt;Put&gt;) operation is indeterminate; remove! - Key: HBASE-13271 URL: https://issues.apache.org/jira/browse/HBASE-13271 Project: HBase Issue Type: Improvement Components: API Affects Versions: 1.0.0 Reporter: stack Another API issue found by [~larsgeorge]: Table.put(List&lt;Put&gt;) is questionable after the API change. {code} [Mar-17 9:21 AM] Lars George: Table.put(List&lt;Put&gt;) is weird since you cannot flush partial lists [Mar-17 9:21 AM] Lars George: Say out of 5 the third is broken, then the put() call returns with a local exception (say empty Put) and then you have 2 that are in the buffer [Mar-17 9:21 AM] Lars George: but how to you force commit them? [Mar-17 9:22 AM] Lars George: In the past you would call flushCache(), but that is gone now [Mar-17 9:22 AM] Lars George: and flush() is not available on a Table [Mar-17 9:22 AM] Lars George: And you cannot access the underlying BufferedMutation neither [Mar-17 9:23 AM] Lars George: You can *only* add more Puts if you can, or call close() [Mar-17 9:23 AM] Lars George: that is just weird to explain {code} So, Table needs to get flush back or we deprecate this method or it flushes immediately and does not return until complete in the implementation. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13271) Table#puts(List&lt;Put&gt;) operation is indeterminate; remove!
[ https://issues.apache.org/jira/browse/HBASE-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367989#comment-14367989 ] Nicolas Liochon commented on HBASE-13271: - bq. There shouldn't be a need to flush any buffer in Table. Autoflush should always be true. The idea is that users who want autoflush=false should use the new BufferedMutator interface rather than a table. Ok (if we can flush the BufferedMutator on demand it's fine). bq. I personally think that the put(List&lt;Put&gt;) method is useful I agree. And I think a lot of people depend on it: they use it to get better performance than calling put(Put) multiple times. bq. Maybe HTable.put(List&lt;Put&gt;) should use HTable.batch() rather than BufferedMutator.mutate for the autoflush=true case From what I know of the code I like this idea. But it seems that Lars's issue is with autoflush=false? Thanks Solomon. Table#puts(List&lt;Put&gt;) operation is indeterminate; remove! - Key: HBASE-13271 URL: https://issues.apache.org/jira/browse/HBASE-13271 Project: HBase Issue Type: Improvement Components: API Affects Versions: 1.0.0 Reporter: stack Another API issue found by [~larsgeorge]: Table.put(List&lt;Put&gt;) is questionable after the API change. {code} [Mar-17 9:21 AM] Lars George: Table.put(List&lt;Put&gt;) is weird since you cannot flush partial lists [Mar-17 9:21 AM] Lars George: Say out of 5 the third is broken, then the put() call returns with a local exception (say empty Put) and then you have 2 that are in the buffer [Mar-17 9:21 AM] Lars George: but how to you force commit them? 
[Mar-17 9:22 AM] Lars George: In the past you would call flushCache(), but that is gone now [Mar-17 9:22 AM] Lars George: and flush() is not available on a Table [Mar-17 9:22 AM] Lars George: And you cannot access the underlying BufferedMutation neither [Mar-17 9:23 AM] Lars George: You can *only* add more Puts if you can, or call close() [Mar-17 9:23 AM] Lars George: that is just weird to explain {code} So, Table needs to get flush back or we deprecate this method or it flushes immediately and does not return until complete in the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
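The semantic difference under discussion — batch() attempts every operation and reports all failures together, while a buffered put can stop at the first local validation error and strand the rest in the buffer — can be illustrated without any HBase dependency. A self-contained sketch with made-up types (an empty value stands in for a locally invalid Put, like Lars's "empty Put" example):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchVsBuffered {
    /** Minimal stand-in for a Put: empty payloads count as "locally invalid". */
    static final class Op {
        final String row;
        final String value;
        Op(String row, String value) { this.row = row; this.value = value; }
    }

    /**
     * batch()-style semantics: every op is attempted; failures are collected
     * and returned together, and the valid ops are still applied. Nothing is
     * left sitting in a buffer waiting for a flush that the API no longer exposes.
     */
    static List<Op> batch(List<Op> ops, List<Op> applied) {
        List<Op> failed = new ArrayList<>();
        for (Op op : ops) {
            if (op.value.isEmpty()) failed.add(op); else applied.add(op);
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Op> ops = List.of(new Op("r1", "a"), new Op("r2", "b"),
                               new Op("r3", ""), new Op("r4", "d"), new Op("r5", "e"));
        List<Op> applied = new ArrayList<>();
        List<Op> failed = batch(ops, applied);
        // All five are attempted: one failure reported, four applied.
        System.out.println(applied.size() + " applied, " + failed.size() + " failed");
    }
}
```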
[jira] [Commented] (HBASE-13219) Issues with PE tool in trunk
[ https://issues.apache.org/jira/browse/HBASE-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360723#comment-14360723 ] Nicolas Liochon commented on HBASE-13219: - bq. What was the behavior before HBASE-11390? Multiple connections? Yeah, exactly. I kept it to make comparison between multiple versions possible. It can help to find some bottlenecks (multiple connections means multiple tcp connections, multiple pools and so on). But simplicity is good as well, so both options are ok to me. Issues with PE tool in trunk Key: HBASE-13219 URL: https://issues.apache.org/jira/browse/HBASE-13219 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: t1 - PE tool tries to create the TestTable and waits for it to be enabled and just hangs there. Previously this was not happening and the PE tool used to run fine after the table creation. - When we try to scan with 25 threads the PE tool fails after some time saying Unable to create native threads. I lost the stack trace now, but I could get it easily. It happens here {code} public void submit(RetryingCallable&lt;V&gt; task, int callTimeout, int id) { QueueingFuture&lt;V&gt; newFuture = new QueueingFuture&lt;V&gt;(task, callTimeout); executor.execute(Trace.wrap(newFuture)); tasks[id] = newFuture; } {code} in ResultBoundedCompletionService. This is also new. Previously it used to work with 25 threads without any issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13099) Scans as in DynamoDB
[ https://issues.apache.org/jira/browse/HBASE-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338467#comment-14338467 ] Nicolas Liochon commented on HBASE-13099: - The 1 MB limit could be changed / made configurable. The scan could finish if we are at the end of a row and one of these conditions is met: - we already have more than XX MB - the scan has been running for more than YY seconds - the scan reached the end of a region. This could simplify some code and make the server less sensitive to client issues. It would also allow removing the small scan code in the client (and, for all the clients that are doing small scans w/o setting the small flag, it would be faster). Scans as in DynamoDB Key: HBASE-13099 URL: https://issues.apache.org/jira/browse/HBASE-13099 Project: HBase Issue Type: Brainstorming Components: Client, regionserver Reporter: Nicolas Liochon cc: [~saint@gmail.com] - as discussed offline. DynamoDB has a very simple way to manage scans server side: ??citation?? The data returned from a Query or Scan operation is limited to 1 MB; this means that if you scan a table that has more than 1 MB of data, you'll need to perform another Scan operation to continue to the next 1 MB of data in the table. If you query or scan for specific attributes that match values that amount to more than 1 MB of data, you'll need to perform another Query or Scan request for the next 1 MB of data. To do this, take the LastEvaluatedKey value from the previous request, and use that value as the ExclusiveStartKey in the next request. This will let you progressively query or scan for new data in 1 MB increments. When the entire result set from a Query or Scan has been processed, the LastEvaluatedKey is null. This indicates that the result set is complete (i.e. the operation processed the “last page” of data). ??citation?? This means that there is no state server side: the work is done client side. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
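The DynamoDB paging scheme quoted above — take the LastEvaluatedKey from one response and feed it back as the ExclusiveStartKey of the next — is just a client-driven loop over a sorted keyspace. A self-contained sketch over a JDK TreeMap, with the 1 MB cap modeled as a row-count page size for simplicity (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class PagedScan {
    /**
     * One stateless "scan page": up to pageSize rows strictly after
     * exclusiveStartKey. Returns the last evaluated key when more data
     * remains, or null when the table is exhausted -- mirroring
     * DynamoDB's LastEvaluatedKey contract.
     */
    static String scanPage(TreeMap<String, String> table, String exclusiveStartKey,
                           int pageSize, List<String> out) {
        SortedMap<String, String> rest = (exclusiveStartKey == null)
                ? table : table.tailMap(exclusiveStartKey, false);
        int n = 0;
        String last = null;
        for (String key : rest.keySet()) {
            if (n++ == pageSize) return last; // page full; more data remains
            out.add(key);
            last = key;
        }
        return null; // exhausted: no LastEvaluatedKey
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 7; i++) table.put("row-" + i, "v" + i);

        // The client drives the loop; the server keeps no scanner state.
        List<String> seen = new ArrayList<>();
        String cursor = null;
        do {
            cursor = scanPage(table, cursor, 3, seen);
        } while (cursor != null);
        System.out.println(seen.size()); // 7
    }
}
```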
[jira] [Created] (HBASE-13099) Scans as in DynamoDB
Nicolas Liochon created HBASE-13099: --- Summary: Scans as in DynamoDB Key: HBASE-13099 URL: https://issues.apache.org/jira/browse/HBASE-13099 Project: HBase Issue Type: Brainstorming Components: Client, regionserver Reporter: Nicolas Liochon cc: [~saint@gmail.com] - as discussed offline. DynamoDB has a very simple way to manage scans server side: ??citation?? The data returned from a Query or Scan operation is limited to 1 MB; this means that if you scan a table that has more than 1 MB of data, you'll need to perform another Scan operation to continue to the next 1 MB of data in the table. If you query or scan for specific attributes that match values that amount to more than 1 MB of data, you'll need to perform another Query or Scan request for the next 1 MB of data. To do this, take the LastEvaluatedKey value from the previous request, and use that value as the ExclusiveStartKey in the next request. This will let you progressively query or scan for new data in 1 MB increments. When the entire result set from a Query or Scan has been processed, the LastEvaluatedKey is null. This indicates that the result set is complete (i.e. the operation processed the “last page” of data). ??citation?? This means that there is no state server side: the work is done client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12995) Document that HConnection#getTable methods do not check table existence since 0.98.1
[ https://issues.apache.org/jira/browse/HBASE-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312892#comment-14312892 ] Nicolas Liochon commented on HBASE-12995: - Yep, I confirm it comes from the 10080 change. +1 for the javadoc change, I should have done it in the original jira. Document that HConnection#getTable methods do not check table existence since 0.98.1 Key: HBASE-12995 URL: https://issues.apache.org/jira/browse/HBASE-12995 Project: HBase Issue Type: Task Affects Versions: 0.98.1 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 [~jamestaylor] mentioned that recently Phoenix discovered at some point the {{HConnection#getTable}} lightweight table reference methods stopped throwing TableNotFoundExceptions. It used to be (in 0.94 and 0.96) that all APIs that construct HTables would check if the table is locatable and throw exceptions if not. Now, if using the {{HConnection#getTable}} APIs, such exceptions will only be thrown at the time of the first operation submitted using the table reference, should a problem be detected then. We did a bisect and it seems this was changed in the 0.98.1 release by HBASE-10080. Since the change has now shipped in 10 0.98 releases in total, we should just document the change in the javadoc of the HConnection class (Connection in branch-1+). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12974) Opaque AsyncProcess failure: RetriesExhaustedWithDetailsException but no detail
[ https://issues.apache.org/jira/browse/HBASE-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307249#comment-14307249 ] Nicolas Liochon commented on HBASE-12974: - bq. 1 time only We don't keep the history of the exceptions. The time is only about the last exception. So if you have 1 action that failed, you will have '1 time'. If 10 actions fail for the same reason, you will have '10 times'. Yes, it's kind of useless. We used to start to log after 10 retries or so, so the log should contain more information (at the info level iirc). Opaque AsyncProcess failure: RetriesExhaustedWithDetailsException but no detail --- Key: HBASE-12974 URL: https://issues.apache.org/jira/browse/HBASE-12974 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0 Reporter: stack Assignee: stack I'm trying to do longer running tests but when I up the numbers for a task I run into this: {code} 2015-02-04 15:35:10,267 FATAL [IPC Server handler 17 on 43975] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1419986015214_0204_m_02_3 - exited : org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227) at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207) at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1658) at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:208) at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:141) at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98) at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:449) at 
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:407) at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:355) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) {code} Its telling me an action failed but 1 time only with an empty IOE? I'm kinda stumped. Starting up this issue to see if I can get to the bottom of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
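To make the counting behavior from the comment concrete: the exception keeps only the *last* exception per failed action, so "IOException: 1 time" means one action failed, not one attempt. A toy sketch of that aggregation (class and method names are illustrative, not the actual AsyncProcess$BatchErrors code):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the error aggregation described above: one (last) exception
// name is kept per failed action, and actions are counted per exception name.
public class ErrorSummary {
    static Map<String, Integer> summarize(List<String> lastExceptionPerAction) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : lastExceptionPerAction) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // one failed action -> "IOException: 1 time"
        System.out.println(summarize(Arrays.asList("IOException")));
        // ten actions failing for the same reason -> "IOException: 10 times"
        System.out.println(summarize(Collections.nCopies(10, "IOException")));
    }
}
```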
[jira] [Commented] (HBASE-12964) Add the ability for hbase-daemon.sh to start in the foreground
[ https://issues.apache.org/jira/browse/HBASE-12964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304093#comment-14304093 ] Nicolas Liochon commented on HBASE-12964: - I read the patch, w/o actually testing it. It seems ok to me. +1 if it works for you, Elliott. Add the ability for hbase-daemon.sh to start in the foreground -- Key: HBASE-12964 URL: https://issues.apache.org/jira/browse/HBASE-12964 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 2.0.0, 0.98.10 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.1.0, 0.98.11 Attachments: HBASE-12964-v1.patch, HBASE-12964-v2.patch, HBASE-12964.patch The znode cleaner is awesome and gives great benefits. As more and more deployments start using containers some of them will want to run things in the foreground. hbase-daemon.sh should allow that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10942) support parallel request cancellation for multi-get
[ https://issues.apache.org/jira/browse/HBASE-10942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302028#comment-14302028 ] Nicolas Liochon commented on HBASE-10942: - Time goes by ;-) LGTM, +1 support parallel request cancellation for multi-get --- Key: HBASE-10942 URL: https://issues.apache.org/jira/browse/HBASE-10942 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Nicolas Liochon Fix For: hbase-10070 Attachments: 10942-1.1.txt, 10942-for-98.zip, 10942.patch, HBASE-10942.01.patch, HBASE-10942.02.patch, HBASE-10942.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12684) Add new AsyncRpcClient
[ https://issues.apache.org/jira/browse/HBASE-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291974#comment-14291974 ] Nicolas Liochon commented on HBASE-12684: - Sorry, I'm seeing this only now (I missed the message on the 15th), but yep, I like this. And I like the configurable RPC implementation, great as well. Add new AsyncRpcClient -- Key: HBASE-12684 URL: https://issues.apache.org/jira/browse/HBASE-12684 Project: HBase Issue Type: Improvement Components: Client Reporter: Jurriaan Mous Assignee: Jurriaan Mous Fix For: 2.0.0, 1.1.0 Attachments: HBASE-12684-DEBUG2.patch, HBASE-12684-DEBUG3.patch, HBASE-12684-v1.patch, HBASE-12684-v10.patch, HBASE-12684-v11.patch, HBASE-12684-v12.patch, HBASE-12684-v13.patch, HBASE-12684-v14.patch, HBASE-12684-v15.patch, HBASE-12684-v16.patch, HBASE-12684-v17.patch, HBASE-12684-v17.patch, HBASE-12684-v18.patch, HBASE-12684-v19.1.patch, HBASE-12684-v19.patch, HBASE-12684-v19.patch, HBASE-12684-v2.patch, HBASE-12684-v20-heapBuffer.patch, HBASE-12684-v20.patch, HBASE-12684-v21-heapBuffer.1.patch, HBASE-12684-v21-heapBuffer.patch, HBASE-12684-v21.patch, HBASE-12684-v22.patch, HBASE-12684-v23-epoll.patch, HBASE-12684-v24.patch, HBASE-12684-v24.patch, HBASE-12684-v24.patch, HBASE-12684-v24.patch, HBASE-12684-v24.patch, HBASE-12684-v25.patch, HBASE-12684-v26.patch, HBASE-12684-v27.patch, HBASE-12684-v27.patch, HBASE-12684-v28.patch, HBASE-12684-v29.patch, HBASE-12684-v3.patch, HBASE-12684-v30.patch, HBASE-12684-v30.patch, HBASE-12684-v30.patch, HBASE-12684-v31.patch, HBASE-12684-v31.patch, HBASE-12684-v31.patch, HBASE-12684-v4.patch, HBASE-12684-v5.patch, HBASE-12684-v6.patch, HBASE-12684-v7.patch, HBASE-12684-v8.patch, HBASE-12684-v9.patch, HBASE-12684.patch, Screen Shot 2015-01-11 at 11.55.32 PM.png, myrecording.jfr, q.png, requests.png With the changes in HBASE-12597 it is possible to add new RpcClients. 
This issue is about adding a new Async RpcClient which would enable HBase to do non-blocking protobuf service communication. Besides delivering a new AsyncRpcClient I would also like to ask what it would take to replace the current RpcClient? This would enable us to simplify async code in some later issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12611) Create autoCommit() method and remove clearBufferOnFail
[ https://issues.apache.org/jira/browse/HBASE-12611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234055#comment-14234055 ] Nicolas Liochon commented on HBASE-12611: - bq. stack and Nick Dimiduk came to the conclusion that the flush method should be called autoCommit() similar to the SQL APIs. Sorry for being late in the game. The meaning in SQL is slightly different. In jdbc, whatever the value for autoCommit, the query will be sent to the server and executed. autoCommit is set to false if the client application wants to send multiple queries within a single transaction (and then it will do a begin/commit explicitly). I haven't double checked if it's the standard or an implementation detail (the docs are not very clear), but it's unlikely to change anyway: there is another set of methods for batches in jdbc. Our old autoFlush is different as it impacts the client behavior. I think we're creating confusion here. Moreover, if we add transactions between rows in the future, then may be we will want to use autoCommit for what it really is. As I'm very late here I leave the decision to you, but we should at least be clear in the javadoc imho. bq. Do we also want to change the default with this patch? I like the fact that HBase is secure by default; changing it would be very confusing for the users as well imho.
{code}
- public boolean isAutoFlush() {
-   return autoFlush;
+ public boolean getAutoCommit() {
+   return autoCommit;
{code}
If I'm not wrong we use 'is' for getters on boolean? Create autoCommit() method and remove clearBufferOnFail --- Key: HBASE-12611 URL: https://issues.apache.org/jira/browse/HBASE-12611 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.99.2 Reporter: Solomon Duskis Assignee: Solomon Duskis Fix For: 1.0.0 Attachments: HBASE-12611.patch There was quite a bit of good discussion on HBASE-12490 about this topic.
[~stack] and [~ndimiduk] came to the conclusion that the flush method should be called autoCommit() similar to the SQL APIs. [~ndimiduk] also suggested that clearBufferOnFail should be removed from HTable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
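The distinction the comment draws can be made concrete: in JDBC, autoCommit=false still sends every statement to the server and only defers the transaction commit, whereas HBase's old autoFlush=false defers sending entirely by buffering mutations client-side. A toy model of that client-side buffering (illustrative names only, not the real HTable/BufferedMutator API):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the autoFlush behavior discussed above: with autoFlush=true
// every mutation is "sent" immediately; with autoFlush=false mutations sit
// in a write buffer until flush() is called.
public class ToyBufferedTable {
    private final boolean autoFlush;
    private final List<String> buffer = new ArrayList<>();
    final List<String> sent = new ArrayList<>();   // stands in for the server

    ToyBufferedTable(boolean autoFlush) { this.autoFlush = autoFlush; }

    void put(String mutation) {
        buffer.add(mutation);
        if (autoFlush) flush();   // send right away, JDBC-statement style
    }

    void flush() {
        sent.addAll(buffer);      // everything buffered goes out together
        buffer.clear();
    }

    public static void main(String[] args) {
        ToyBufferedTable t = new ToyBufferedTable(false);
        t.put("row1/cf:q=a");
        t.put("row2/cf:q=b");
        System.out.println(t.sent.size()); // 0: still buffered
        t.flush();
        System.out.println(t.sent.size()); // 2: sent on flush
    }
}
```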
[jira] [Commented] (HBASE-12557) Introduce timeout mechanism for IP to rack resolution
[ https://issues.apache.org/jira/browse/HBASE-12557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227721#comment-14227721 ] Nicolas Liochon commented on HBASE-12557: - bq. Still looking for a way to make lengthy DNS related call. Suspending the dns process (kill -STOP) should do it? Introduce timeout mechanism for IP to rack resolution - Key: HBASE-12557 URL: https://issues.apache.org/jira/browse/HBASE-12557 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: 12557-v1.txt Config parameter, hbase.util.ip.to.rack.determiner, determines the class which does IP to rack resolution. The actual resolution may be lengthy. This JIRA is a continuation of HBASE-12554 where a mock DNSToSwitchMapping is used for rack resolution. A timeout parameter, hbase.ip.to.rack.determiner.timeout, is proposed whose value governs how long RackManager waits before rack resolution is stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12557) Introduce timeout mechanism for IP to rack resolution
[ https://issues.apache.org/jira/browse/HBASE-12557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227785#comment-14227785 ] Nicolas Liochon commented on HBASE-12557: - Agreed (or you can add a hook to ease tests; this saves you from using mockito). If you want to test that we don't leak resources (i.e. that the dns client implementation correctly supports an interruption), then you can't do that here; it will be an integration test then. Introduce timeout mechanism for IP to rack resolution - Key: HBASE-12557 URL: https://issues.apache.org/jira/browse/HBASE-12557 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: 12557-v1.txt Config parameter, hbase.util.ip.to.rack.determiner, determines the class which does IP to rack resolution. The actual resolution may be lengthy. This JIRA is a continuation of HBASE-12554 where a mock DNSToSwitchMapping is used for rack resolution. A timeout parameter, hbase.ip.to.rack.determiner.timeout, is proposed whose value governs how long RackManager waits before rack resolution is stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
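The proposed timeout around a possibly-slow resolver call can be sketched with a single-thread executor and a bounded Future.get; the resolver interface, the fallback rack, and the method names here are illustrative, not the actual DNSToSwitchMapping/RackManager API:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hedged sketch: run the (potentially lengthy) IP-to-rack resolution on a
// worker thread and give up after timeoutMs, instead of blocking the caller.
public class TimedResolver {
    static String resolveWithTimeout(Callable<String> resolver, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> f = pool.submit(resolver);
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);          // interrupt the resolver thread
            return "/default-rack";  // fall back instead of waiting forever
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolveWithTimeout(() -> "/rack1", 1000));
        System.out.println(resolveWithTimeout(() -> {
            Thread.sleep(5_000);     // simulate a hung DNS lookup
            return "/rack1";
        }, 100));                    // falls back after ~100 ms
    }
}
```

As the comment notes, whether the underlying DNS client actually honors the interruption (i.e. doesn't leak a thread) is only testable in an integration test.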
[jira] [Commented] (HBASE-12490) Replace uses of setAutoFlush(boolean, boolean)
[ https://issues.apache.org/jira/browse/HBASE-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227815#comment-14227815 ] Nicolas Liochon commented on HBASE-12490: - bq. It seems reasonable to me to remove it, that's a decision beyond my paygrade Well if you do that patch you get some decision power :-) [~ndimiduk], any opinion? Replace uses of setAutoFlush(boolean, boolean) -- Key: HBASE-12490 URL: https://issues.apache.org/jira/browse/HBASE-12490 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.99.2 Reporter: Solomon Duskis Assignee: Solomon Duskis Attachments: HBASE-12490.patch, HBASE-12490B.patch, HBASE-12490B.patch, HBASE-12490B.patch, HBASE-12490C.patch The various uses of setAutoFlush() seem to need some tlc. There's a note in HTableInterface: @deprecated in 0.99 since setting clearBufferOnFail is deprecated. Use setAutoFlushTo(boolean) instead. It would be ideal to change all internal uses of setAutoFlush(boolean, boolean) to use setAutoFlushTo, if possible. HTable.setAutoFlush(boolean, boolean) is used in a handful of places. setAutoFlush(false, false) has the same results as HTable.setAutoFlush(false). Calling HTable.setAutoFlush(false, true) has the same affect as Table.setAutoFlushTo(false), assuming HTable.setAutoFlush(false) was not called previously (by default, the second parameter, clearBufferOnFail, is true and should remain true according to the comments). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12490) Replace uses of setAutoFlush(boolean, boolean)
[ https://issues.apache.org/jira/browse/HBASE-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224866#comment-14224866 ] Nicolas Liochon commented on HBASE-12490: - For stuff like: {code} -ht.setAutoFlush(false, false); +ht.setAutoFlush(false); {code} It's not a big deal, but I don't really like the 'setAutoFlush(boolean)', because it looks like a setter while actually it's not. I do prefer 'setAutoFlush(boolean, boolean)' because there is no confusion with a setter, so it's easier for the reader. The implicit setting of the clearBufferOnFail on something named like a setter is really confusing imho. I'm not -1, but I'm -0, if I'm the only one confused here... :-) Replace uses of setAutoFlush(boolean, boolean) -- Key: HBASE-12490 URL: https://issues.apache.org/jira/browse/HBASE-12490 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.99.2 Reporter: Solomon Duskis Assignee: Solomon Duskis Attachments: HBASE-12490.patch, HBASE-12490B.patch, HBASE-12490B.patch, HBASE-12490B.patch, HBASE-12490C.patch The various uses of setAutoFlush() seem to need some tlc. There's a note in HTableInterface: @deprecated in 0.99 since setting clearBufferOnFail is deprecated. Use setAutoFlushTo(boolean) instead. It would be ideal to change all internal uses of setAutoFlush(boolean, boolean) to use setAutoFlushTo, if possible. HTable.setAutoFlush(boolean, boolean) is used in a handful of places. setAutoFlush(false, false) has the same results as HTable.setAutoFlush(false). Calling HTable.setAutoFlush(false, true) has the same affect as Table.setAutoFlushTo(false), assuming HTable.setAutoFlush(false) was not called previously (by default, the second parameter, clearBufferOnFail, is true and should remain true according to the comments). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12490) Replace uses of setAutoFlush(boolean, boolean)
[ https://issues.apache.org/jira/browse/HBASE-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224939#comment-14224939 ] Nicolas Liochon commented on HBASE-12490: - Yeah, I saw it but I was ok with your answer so I didn't comment :-) Let's try to decide in this jira (Nick should see it). My point of view is:
- we should not change the meaning of setAutoFlush(boolean), as it would be confusing during the upgrade (i.e. someone upgrading from 0.98 to 1.0 would have code that compiles but with a hidden behavior change)
- we should not use setAutoFlush(boolean); may be we should remove it in 1.0, because of the confusion around a setter-like method that is not a setter.
- I don't think that we need to keep clearBufferOnFail (i.e. we could remove it in 1.0), but may be I'm wrong here. If we do that then we can keep setAutoFlush(boolean): it will become a real setter (and then the points above are not an issue anymore).
Replace uses of setAutoFlush(boolean, boolean) -- Key: HBASE-12490 URL: https://issues.apache.org/jira/browse/HBASE-12490 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.99.2 Reporter: Solomon Duskis Assignee: Solomon Duskis Attachments: HBASE-12490.patch, HBASE-12490B.patch, HBASE-12490B.patch, HBASE-12490B.patch, HBASE-12490C.patch The various uses of setAutoFlush() seem to need some tlc. There's a note in HTableInterface: @deprecated in 0.99 since setting clearBufferOnFail is deprecated. Use setAutoFlushTo(boolean) instead. It would be ideal to change all internal uses of setAutoFlush(boolean, boolean) to use setAutoFlushTo, if possible. HTable.setAutoFlush(boolean, boolean) is used in a handful of places. setAutoFlush(false, false) has the same results as HTable.setAutoFlush(false).
Calling HTable.setAutoFlush(false, true) has the same effect as Table.setAutoFlushTo(false), assuming HTable.setAutoFlush(false) was not called previously (by default, the second parameter, clearBufferOnFail, is true and should remain true according to the comments). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12534) Wrong region location cache in client after regions are moved
[ https://issues.apache.org/jira/browse/HBASE-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224951#comment-14224951 ] Nicolas Liochon commented on HBASE-12534: - bq. it seems we can simply get rid of MIN_RPC_TIMEOUT I'm not against removing it (may be it's too much of a corner case) but it solves more than a configuration issue. With the settings above
{code}
hbase.rpc.timeout=1000
hbase.client.operation.timeout=1200
{code}
if the first try fails after 1080ms, then the second try will have a rpc.timeout of 120ms (hbase.client.pause put aside). The MIN_RPC_TIMEOUT will say 'that's too low, let's set it to something more reasonable'. We can remove it. What we need to detect however is a setting of 0 (if not it will be an infinite timeout). Wrong region location cache in client after regions are moved - Key: HBASE-12534 URL: https://issues.apache.org/jira/browse/HBASE-12534 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Critical Labels: client Attachments: HBASE-12534-0.94-v1.diff, HBASE-12534-v1.diff In our 0.94 hbase cluster, we found that the client got a wrong region location cache and did not update it after a region was moved to another regionserver. The reason is a wrong client config and a bug in RpcRetryingCaller of the hbase client. The rpc configs are the following:
{code}
hbase.rpc.timeout=1000
hbase.client.pause=200
hbase.client.operation.timeout=1200
{code}
But the client retry number is 3:
{code}
hbase.client.retries.number=3
{code}
Assume that a region is at regionserver A and is then moved to regionserver B. The client tries to make a call to regionserver A and gets a NotServingRegionException. Because the retry number is not 1, the region server location cache is not cleaned.
See: RpcRetryingCaller.java#141 and RegionServerCallable.java#127
{code}
@Override
public void throwable(Throwable t, boolean retrying) {
  if (t instanceof SocketTimeoutException || /* ... */) {
    // ...
  } else if (t instanceof NotServingRegionException && !retrying) {
    // Purge cache entries for this specific region from hbase:meta cache
    // since we don't call connect(true) when number of retries is 1.
    getConnection().deleteCachedRegionLocation(location);
  }
}
{code}
But the call does not retry; it throws a SocketTimeoutException because the time the call will take is larger than the operation timeout. See RpcRetryingCaller.java#152
{code}
expectedSleep = callable.sleep(pause, tries + 1);
// If, after the planned sleep, there won't be enough time left, we stop now.
long duration = singleCallDuration(expectedSleep);
if (duration > callTimeout) {
  String msg = "callTimeout=" + callTimeout + ", callDuration=" + duration
      + ": " + callable.getExceptionMessageAdditionalDetail();
  throw (SocketTimeoutException) (new SocketTimeoutException(msg).initCause(t));
}
{code}
In the end, the wrong region location will never be cleaned up. [~lhofhansl] In hbase 0.94, the MIN_RPC_TIMEOUT in singleCallDuration is 2000 by default, which triggers this bug.
{code}
private long singleCallDuration(final long expectedSleep) {
  return (EnvironmentEdgeManager.currentTimeMillis() - this.globalStartTime)
      + MIN_RPC_TIMEOUT + expectedSleep;
}
{code}
But there is risk in master code too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
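The time-budget arithmetic described above can be sketched as follows. The 2000 ms MIN_RPC_TIMEOUT default and the overshoot check are taken from the quoted 0.94 code; the class and method names here are illustrative, not HBase's actual RpcRetryingCaller API:

```java
// Hedged sketch of the retry time-budget logic discussed in this issue.
public class RetryBudget {
    static final long MIN_RPC_TIMEOUT = 2000; // 0.94 default from the report

    // Time left in the operation budget for the next try.
    static long remaining(long operationTimeout, long elapsed) {
        return operationTimeout - elapsed;
    }

    // The quoted check: elapsed time + a minimum rpc timeout + the planned
    // sleep must still fit in the operation timeout, otherwise stop retrying.
    static boolean wouldOvershoot(long operationTimeout, long elapsed, long plannedSleep) {
        return elapsed + MIN_RPC_TIMEOUT + plannedSleep > operationTimeout;
    }

    public static void main(String[] args) {
        // hbase.client.operation.timeout=1200, first try failed after 1080 ms:
        // only 120 ms remain, and with MIN_RPC_TIMEOUT=2000 no retry fits, so
        // the caller throws SocketTimeoutException instead of retrying --
        // which is why the stale region location is never purged.
        System.out.println(remaining(1200, 1080));            // 120
        System.out.println(wouldOvershoot(1200, 1080, 200));  // true
    }
}
```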
[jira] [Commented] (HBASE-12534) Wrong region location cache in client after regions are moved
[ https://issues.apache.org/jira/browse/HBASE-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222873#comment-14222873 ] Nicolas Liochon commented on HBASE-12534: - MIN_RPC_TIMEOUT is linked to operation timeout: w/o it we could send a request w/o giving enough time to the server. As well, until recently the rpc timeout was not thread-safe: it was set for all calls. So may be this min timeout saves us in the .94/.96 versions (not sure about .98). May be this min timeout should be configurable (cf. hbase.rpc.timeout=1000, which is lower than the min timeout) Wrong region location cache in client after regions are moved - Key: HBASE-12534 URL: https://issues.apache.org/jira/browse/HBASE-12534 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Critical Labels: client Attachments: HBASE-12534-0.94-v1.diff, HBASE-12534-v1.diff In our 0.94 hbase cluster, we found that the client got a wrong region location cache and did not update it after a region was moved to another regionserver. The reason is a wrong client config and a bug in RpcRetryingCaller of the hbase client. The rpc configs are the following:
{code}
hbase.rpc.timeout=1000
hbase.client.pause=200
hbase.client.operation.timeout=1200
{code}
But the client retry number is 3:
{code}
hbase.client.retries.number=3
{code}
Assume that a region is at regionserver A and is then moved to regionserver B. The client tries to make a call to regionserver A and gets a NotServingRegionException. Because the retry number is not 1, the region server location cache is not cleaned.
See: RpcRetryingCaller.java#141 and RegionServerCallable.java#127
{code}
@Override
public void throwable(Throwable t, boolean retrying) {
  if (t instanceof SocketTimeoutException || /* ... */) {
    // ...
  } else if (t instanceof NotServingRegionException && !retrying) {
    // Purge cache entries for this specific region from hbase:meta cache
    // since we don't call connect(true) when number of retries is 1.
    getConnection().deleteCachedRegionLocation(location);
  }
}
{code}
But the call does not retry; it throws a SocketTimeoutException because the time the call will take is larger than the operation timeout. See RpcRetryingCaller.java#152
{code}
expectedSleep = callable.sleep(pause, tries + 1);
// If, after the planned sleep, there won't be enough time left, we stop now.
long duration = singleCallDuration(expectedSleep);
if (duration > callTimeout) {
  String msg = "callTimeout=" + callTimeout + ", callDuration=" + duration
      + ": " + callable.getExceptionMessageAdditionalDetail();
  throw (SocketTimeoutException) (new SocketTimeoutException(msg).initCause(t));
}
{code}
In the end, the wrong region location will never be cleaned up. [~lhofhansl] In hbase 0.94, the MIN_RPC_TIMEOUT in singleCallDuration is 2000 by default, which triggers this bug.
{code}
private long singleCallDuration(final long expectedSleep) {
  return (EnvironmentEdgeManager.currentTimeMillis() - this.globalStartTime)
      + MIN_RPC_TIMEOUT + expectedSleep;
}
{code}
But there is risk in master code too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12354) Update dependencies in time for 1.0 release
[ https://issues.apache.org/jira/browse/HBASE-12354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188316#comment-14188316 ] Nicolas Liochon commented on HBASE-12354: - +1, there is a +1 from Enis above as well. Update dependencies in time for 1.0 release --- Key: HBASE-12354 URL: https://issues.apache.org/jira/browse/HBASE-12354 Project: HBase Issue Type: Sub-task Components: dependencies Reporter: stack Assignee: stack Fix For: 2.0.0, 0.99.2 Attachments: 12354.txt, 12354v2.txt Going through and updating egregiously old dependencies for 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12293) Tests are logging too much
[ https://issues.apache.org/jira/browse/HBASE-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178404#comment-14178404 ] Nicolas Liochon commented on HBASE-12293: - Tests should be at info level at a minimum, as in production: if not, we will discover in production/integration tests that we log too much (or worse, trigger NPEs or stuff like this). For the same reason, I prefer to use the debug level in tests, to be sure that I won't have surprises (NPE) if I try to use them. What I did in the past is reuse the info from the apache build (run time and logs), and look at both the log size and the log rate per test to prioritize the tests I was looking at. Then I was just improving the logs around these areas. Tests are logging too much -- Key: HBASE-12293 URL: https://issues.apache.org/jira/browse/HBASE-12293 Project: HBase Issue Type: Bug Components: test Reporter: Dima Spivak Assignee: Dima Spivak Priority: Minor In trying to solve HBASE-12285, it was pointed out that tests are writing too much to output again. At best, this is a sloppy practice and, at worst, it leaves us open to builds breaking when our test tools can't handle the flood. If [~nkeywal] would be willing to give me a little bit of mentoring on how he dealt with this problem a few years back, I'd be happy to add it to my plate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091
[ https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178414#comment-14178414 ] Nicolas Liochon commented on HBASE-12285: - I think changing the log level is not a good idea (I added a comment in the related jira: it's very common to discover NPE when you activate logs, and it's a very bad user experience: something does not work as expected, you activate the debug logs to understand and then you get a NPE.). If we don't want to pay the testing cost of the debug logs, then I'm +1 for removing them (seriously: they are becoming useless as we now run info by default). But if we keep them in the code we must keep them in the tests. Builds are failing, possibly because of SUREFIRE-1091 - Key: HBASE-12285 URL: https://issues.apache.org/jira/browse/HBASE-12285 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dima Spivak Assignee: Dima Spivak Priority: Blocker Attachments: HBASE-12285_branch-1_v1.patch Our branch-1 builds on builds.apache.org have been failing in recent days after we switched over to an official version of Surefire a few days back (HBASE-4955). The version we're using, 2.17, is hit by a bug ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results in an IOException, which looks like what we're seeing on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091
[ https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179414#comment-14179414 ] Nicolas Liochon commented on HBASE-12285: - Sure we can try. But when will we go back to the good setting? Builds are failing, possibly because of SUREFIRE-1091 - Key: HBASE-12285 URL: https://issues.apache.org/jira/browse/HBASE-12285 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dima Spivak Assignee: Dima Spivak Priority: Blocker Attachments: HBASE-12285_branch-1_v1.patch Our branch-1 builds on builds.apache.org have been failing in recent days after we switched over to an official version of Surefire a few days back (HBASE-4955). The version we're using, 2.17, is hit by a bug ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results in an IOException, which looks like what we're seeing on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091
[ https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175960#comment-14175960 ] Nicolas Liochon commented on HBASE-12285: - surefire-1091 is a very good suspect, because in our private surefire version the implementation for this was different. This said, may be we just log too much in the test(s)? I've done some cleanup there nearly 3 years ago, but this belongs to the never ending story category... Builds are failing, possibly because of SUREFIRE-1091 - Key: HBASE-12285 URL: https://issues.apache.org/jira/browse/HBASE-12285 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dima Spivak Assignee: Dima Spivak Priority: Blocker Our branch-1 builds on builds.apache.org have been failing in recent days after we switched over to an official version of Surefire a few days back (HBASE-4955). The version we're using, 2.17, is hit by a bug ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results in an IOException, which looks like what we're seeing on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11835) Wrong managenement of non expected calls in the client
[ https://issues.apache.org/jira/browse/HBASE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160384#comment-14160384 ] Nicolas Liochon commented on HBASE-11835: - The failures are very likely unrelated. I plan to commit this this week if nobody disagrees. Wrong managenement of non expected calls in the client -- Key: HBASE-11835 URL: https://issues.apache.org/jira/browse/HBASE-11835 Project: HBase Issue Type: Bug Components: Client Affects Versions: 1.0.0, 2.0.0, 0.98.6 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 2.0.0, 0.99.1 Attachments: 11835.rebase.patch, rpcClient.patch If a call is purged or canceled we try to skip the reply from the server, but we read the wrong number of bytes so we corrupt the tcp channel. It's hidden as it triggers retry and so on, but it's bad for performances obviously. It happens with cell blocks. [~ram_krish_86], [~saint@gmail.com], you know this part better than me, do you agree with the analysis and the patch? The changes in rpcServer are not fully related: as the client close the connections in such situation, I observed both ClosedChannelException and CancelledKeyException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12148) Remove TimeRangeTracker as point of contention when many threads writing a Store
[ https://issues.apache.org/jira/browse/HBASE-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160412#comment-14160412 ] Nicolas Liochon commented on HBASE-12148: - bq. I almost feel we should doc this and move on, if anyone is running the server side on a 32 bit JVM they shouldn't. But yeah the potential for torn reads isn't good. +1. As well, IIRC, there are other parts of the code where we rely on atomic ops for 64-bit stuff (as we don't test on 32 bits, what I said is likely true, following the usual pattern: not tested means not working). Remove TimeRangeTracker as point of contention when many threads writing a Store Key: HBASE-12148 URL: https://issues.apache.org/jira/browse/HBASE-12148 Project: HBase Issue Type: Sub-task Components: Performance Affects Versions: 2.0.0, 0.99.1 Reporter: stack Assignee: stack Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: 12148.txt, 12148.txt, 12148v2.txt, 12148v2.txt, Screen Shot 2014-10-01 at 3.39.46 PM.png, Screen Shot 2014-10-01 at 3.41.07 PM.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
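The torn-read concern comes from the JMM: on a 32-bit JVM a plain (non-volatile) long may be written in two 32-bit halves, so a concurrent reader can see half of one write and half of another. A lock-free sketch of a minimum-timestamp tracker that avoids both tearing and a shared lock (illustrative only, not the actual TimeRangeTracker code):

```java
import java.util.concurrent.atomic.AtomicLong;

// AtomicLong reads/writes are guaranteed atomic on all platforms, and the
// CAS loop removes the synchronized block that made the tracker a point of
// contention for many writer threads.
public class TimeRangeSketch {
    private final AtomicLong minimumTimestamp = new AtomicLong(Long.MAX_VALUE);

    public void includeTimestamp(long ts) {
        long cur;
        // classic CAS loop: only ever move the minimum downwards
        while (ts < (cur = minimumTimestamp.get())) {
            if (minimumTimestamp.compareAndSet(cur, ts)) break;
        }
    }

    public long getMin() { return minimumTimestamp.get(); }

    public static void main(String[] args) {
        TimeRangeSketch t = new TimeRangeSketch();
        t.includeTimestamp(50);
        t.includeTimestamp(10);
        t.includeTimestamp(30);
        System.out.println(t.getMin()); // 10
    }
}
```

Simply declaring the field volatile would also fix the tearing (the JMM guarantees atomic access to volatile longs) but would not fix the contention.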
[jira] [Commented] (HBASE-12153) Fixing TestReplicaWithCluster
[ https://issues.apache.org/jira/browse/HBASE-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156252#comment-14156252 ] Nicolas Liochon commented on HBASE-12153: - Yeah, I actually don't much like timeouts in tests because they have to be removed during debugging sessions (the test is from me, but the timeouts are from Stack ;-) ). It's a workaround for surefire zombies... +1 for the patch. Fixing TestReplicaWithCluster - Key: HBASE-12153 URL: https://issues.apache.org/jira/browse/HBASE-12153 Project: HBase Issue Type: Bug Components: test Affects Versions: 1.0.0 Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Priority: Trivial Fix For: 1.0.0 Attachments: 0001-FixTestReplicaWithCluster.patch This test takes about 30 ~ 40 seconds depending upon the resources available. Doesn't make sense to have such a tight bound (30s) on the unit test. [~nkeywal], what do you think ? Did you intend to have such a tight bound while adding the test here ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11835) Wrong managenement of non expected calls in the client
[ https://issues.apache.org/jira/browse/HBASE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156258#comment-14156258 ] Nicolas Liochon commented on HBASE-11835: - it got lost somewhere in a todo list. Let me have a look again. Wrong managenement of non expected calls in the client -- Key: HBASE-11835 URL: https://issues.apache.org/jira/browse/HBASE-11835 Project: HBase Issue Type: Bug Components: Client Affects Versions: 1.0.0, 2.0.0, 0.98.6 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 2.0.0, 0.98.7, 0.99.1 Attachments: rpcClient.patch If a call is purged or canceled we try to skip the reply from the server, but we read the wrong number of bytes so we corrupt the tcp channel. It's hidden as it triggers retry and so on, but it's bad for performances obviously. It happens with cell blocks. [~ram_krish_86], [~saint@gmail.com], you know this part better than me, do you agree with the analysis and the patch? The changes in rpcServer are not fully related: as the client close the connections in such situation, I observed both ClosedChannelException and CancelledKeyException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12141) ClusterStatus message might exceed max datagram payload limits
[ https://issues.apache.org/jira/browse/HBASE-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156267#comment-14156267 ] Nicolas Liochon commented on HBASE-12141: - Yeah, the strategy was to keep each message small enough (if multiple servers fail simultaneously, we send multiple messages instead of one). As well, we send each message multiple times in case it gets lost somewhere. I had issues with Netty 3.x when I tried to add frames; I haven't tried very hard. We could make MAX_SERVER_PER_MESSAGE configurable for networks with a very small MTU? It's also possible to compress the message; once again, I had issues with Netty 3.x for this in the past. That said, I would be interested to understand the network config. ClusterStatus message might exceed max datagram payload limits -- Key: HBASE-12141 URL: https://issues.apache.org/jira/browse/HBASE-12141 Project: HBase Issue Type: Bug Affects Versions: 0.98.3 Reporter: Andrew Purtell The multicast ClusterStatusPublisher and its companion listener are using datagram channels without any framing. I think this is an issue because Netty's ProtobufDecoder expects a complete PB message to be available in the ChannelBuffer, yet ClusterStatus messages can be large and might exceed the maximum datagram payload size. As one user reported on the list: {noformat} org.apache.hadoop.hbase.client.ClusterStatusListener - ERROR - Unexpected exception, continuing. com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus.init(ClusterStatusProtos.java:7554)
at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus.init(ClusterStatusProtos.java:7512)
at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus$1.parsePartialFrom(ClusterStatusProtos.java:7689)
at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus$1.parsePartialFrom(ClusterStatusProtos.java:7684)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:182)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at org.jboss.netty.handler.codec.protobuf.ProtobufDecoder.decode(ProtobufDecoder.java:122)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.socket.oio.OioDatagramWorker.process(OioDatagramWorker.java:52)
at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:73)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat} The javadoc for ProtobufDecoder says: {quote} Decodes a received ChannelBuffer into a Google Protocol Buffers Message and MessageLite.
Please note that this decoder must be used with a proper FrameDecoder such as ProtobufVarint32FrameDecoder or LengthFieldBasedFrameDecoder if you are using a stream-based transport such as TCP/IP. {quote} and even though we are using a datagram transport we have related issues, depending on what the sending and receiving OS does with overly large datagrams:
- We may receive a datagram with a truncated message
- We may get an upcall when processing one fragment of a fragmented datagram, where the complete message is not available yet
- We may not be able to send the overly large ClusterStatus in the first place. Linux claims to do PMTU discovery and return EMSGSIZE if a datagram packet payload exceeds the MTU, but will send a fragmented datagram if PMTU discovery is disabled.
I'm surprised we got the above report, given that the default is to reject overly large datagram payloads, so perhaps the user is running a different server OS, or Netty datagram channels do their own fragmentation (I haven't checked). In any case, the server and client pipelines are definitely not doing any kind of framing. This is the multicast status listener from
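The framing the ProtobufDecoder javadoc asks for can be sketched without the Netty dependency. The following is a minimal, illustrative varint32 length-prefix framer (the same base-128 varint framing ProtobufVarint32FrameDecoder expects on a byte stream): each message is preceded by its length, so the receiver always knows where one message ends and the next begins, instead of hoping a whole protobuf arrives in one buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Minimal sketch of varint32 length-prefix framing (illustrative, not the
// HBase or Netty implementation). writeFrame emits the message length as a
// protobuf base-128 varint, then the message bytes; readFrame reverses it.
public class Varint32Framer {

    public static void writeFrame(ByteArrayOutputStream out, byte[] message) {
        int length = message.length;
        while ((length & ~0x7F) != 0) {      // emit 7 bits at a time, MSB set = "more"
            out.write((length & 0x7F) | 0x80);
            length >>>= 7;
        }
        out.write(length);                   // final byte has MSB clear
        out.write(message, 0, message.length);
    }

    public static byte[] readFrame(ByteArrayInputStream in) throws IOException {
        int length = 0, shift = 0, b;
        do {
            b = in.read();
            if (b < 0) throw new IOException("truncated varint length prefix");
            length |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        byte[] message = new byte[length];
        if (in.read(message) != length) throw new IOException("truncated frame");
        return message;
    }
}
```

With framing like this, a truncated or partially delivered buffer fails loudly at the length check instead of feeding garbage bytes into the protobuf parser, which is the "invalid wire type" failure shown in the stack trace above.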
[jira] [Updated] (HBASE-11835) Wrong management of unexpected calls in the client
[ https://issues.apache.org/jira/browse/HBASE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-11835: Attachment: 11835.rebase.patch Wrong management of unexpected calls in the client -- Key: HBASE-11835 URL: https://issues.apache.org/jira/browse/HBASE-11835 Project: HBase Issue Type: Bug Components: Client Affects Versions: 1.0.0, 2.0.0, 0.98.6 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 2.0.0, 0.99.1 Attachments: 11835.rebase.patch, rpcClient.patch If a call is purged or canceled we try to skip the reply from the server, but we read the wrong number of bytes, so we corrupt the TCP channel. It's hidden because it triggers retries and so on, but it's obviously bad for performance. It happens with cell blocks. [~ram_krish_86], [~saint@gmail.com], you know this part better than me; do you agree with the analysis and the patch? The changes in rpcServer are not fully related: as the client closes the connection in such situations, I observed both ClosedChannelException and CancelledKeyException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
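The corruption mode in HBASE-11835 is easy to demonstrate on a toy length-prefixed stream (illustrative framing only, not the real HBase wire format): if the client skips the wrong number of bytes for one reply, the next length read lands in the middle of a frame and returns garbage, which is why the bug surfaces only as mysterious retries.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Illustration of the desynchronization described above (toy framing):
// on a connection carrying 4-byte length-prefixed replies, skipping the
// wrong number of payload bytes for one reply leaves the stream pointing
// into the middle of the next frame.
public class StreamDesyncDemo {

    // Skips skipBytes bytes of the current reply's payload, then reads what
    // the stream now claims is the next reply's length prefix. The result is
    // correct only when skipBytes equals the current reply's actual length.
    public static int nextFrameLength(byte[] stream, int skipBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        int currentLength = in.readInt();   // actual payload length of reply 1
        in.skipBytes(skipBytes);            // the bug: skipBytes may != currentLength
        return in.readInt();                // supposed length prefix of reply 2
    }
}
```

Skipping the exact payload length yields the next frame's real length; skipping even one byte too few turns payload bytes into a bogus length, after which every subsequent read on the channel is misframed.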