[jira] [Commented] (HBASE-16172) Unify the retry logic in ScannerCallableWithReplicas and RpcRetryingCallerWithReadReplicas

2016-07-06 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363979#comment-15363979
 ] 

Nicolas Liochon commented on HBASE-16172:
-

bq.  is there any needs about 'synchronized' of 
RpcRetryingCallerWithReadReplicas.call() ?
It looks like the "synchronized" can be safely removed.


> Unify the retry logic in ScannerCallableWithReplicas and 
> RpcRetryingCallerWithReadReplicas
> --
>
> Key: HBASE-16172
> URL: https://issues.apache.org/jira/browse/HBASE-16172
> Project: HBase
>  Issue Type: Bug
>Reporter: Yu Li
>Assignee: Ted Yu
> Attachments: 16172.v1.txt, 16172.v2.txt
>
>
> The issue is pointed out by [~devaraj] in HBASE-16132 (Thanks D.D.), that in 
> {{RpcRetryingCallerWithReadReplicas#call}} we will call 
> {{ResultBoundedCompletionService#take}} instead of {{poll}} to dead-wait on 
> the second one if the first replica timed out, while in 
> {{ScannerCallableWithReplicas#call}} we still use 
> {{ResultBoundedCompletionService#poll}} with some timeout for the 2nd replica.
> This JIRA aims at discussing whether to unify the retry logic in these two 
> kinds of replica-aware callers, and at taking action if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15436) BufferedMutatorImpl.flush() appears to get stuck

2016-03-30 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217714#comment-15217714
 ] 

Nicolas Liochon commented on HBASE-15436:
-

bq.  There should be a cap size for the size above which we should block the 
writes. We should not take more than this limit. May be some thing like 1.5 
times of what is the flush size.
We definitely want to take more than this limit, but maybe not as much as 
we're taking today (or maybe we want to be clearer on what these settings 
mean).
There is a limit, given by the number of tasks executed in parallel 
(hbase.client.max.total.tasks). If I understand correctly, this setting is now 
per client (and not per htable).
Ideally these parameters should be hidden from the user (i.e. the defaults are 
ok for a standard client w/o too many memory constraints). 

bq. How long we should wait? Whether we should come out faster? 
iirc, a long time ago the buffer was attached to the Table object, so the 
policy (or at least the objective :-)) when one of the puts had failed (i.e. 
reached the max retry number) was simple: all the operations currently in the 
buffer were considered as failed as well, even if we had not even tried to send 
them. As a consequence, the buffer was empty after the failure of a single put. 
It was then up to the client to continue or not. Maybe we should do the same 
with the BufferedMutator, for all cases, close or not? I haven't looked at 
the BufferedMutator code, but I can have a look if you wish [~anoop.hbase]. 

bq.  What if we were doing multi Get to META table to know the region location 
for N mutations at a time.
It seems like a good idea. There are many possible optimisations in how we use 
meta, and this is one of them.




> BufferedMutatorImpl.flush() appears to get stuck
> 
>
> Key: HBASE-15436
> URL: https://issues.apache.org/jira/browse/HBASE-15436
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.0.2
>Reporter: Sangjin Lee
> Attachments: hbaseException.log, threaddump.log
>
>
> We noticed an instance where the thread that was executing a flush 
> ({{BufferedMutatorImpl.flush()}}) got stuck when the (local one-node) cluster 
> shut down and was unable to get out of that stuck state.
> The setup is a single node HBase cluster, and apparently the cluster went 
> away when the client was executing flush. The flush eventually logged a 
> failure after 30+ minutes of retrying. That is understandable.
> What is unexpected is that thread is stuck in this state (i.e. in the 
> {{flush()}} call). I would have expected the {{flush()}} call to return after 
> the complete failure.





[jira] [Commented] (HBASE-10605) Manage the call timeout in the server

2016-03-07 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183013#comment-15183013
 ] 

Nicolas Liochon commented on HBASE-10605:
-

> , I think client rpc should include the timeout parameter
Yes, we would need to forward the timeout (not the submit time, because we 
don't want to rely on having the server and client clocks in sync: the server 
can use its own clock).
There is already a check in the server: the request is cancelled if the 
client disconnects (i.e. the TCP connection is closed).

> Manage the call timeout in the server
> -
>
> Key: HBASE-10605
> URL: https://issues.apache.org/jira/browse/HBASE-10605
> Project: HBase
>  Issue Type: Improvement
>  Components: IPC/RPC, regionserver
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>
> Since HBASE-10566, we have an explicit call timeout available in the client.
> We could forward it to the server, and use this information for:
> - if the call is still in the queue, just cancel it
> - if the call is under execution, makes this information available in 
> RpcCallContext (actually change the RpcCallContext#disconnectSince to 
> something more generic), so it can be used by the query under execution to 
> stop its execution
> - in the future, interrupt it to manage the case 'stuck on a dead datanode' 
> or something similar
> - if the operation has finished, don't send the reply to the client, as by 
> definition the client is not interested anymore.
> From this, it will be easy to manage the cancellation: 
> disconnect/timeout/cancellation are similar from a service execution PoV
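A minimal sketch of the clock-skew-safe forwarding described above; the class name and fields are illustrative, not the actual HBase RPC types:

```java
import java.util.concurrent.TimeUnit;

// The client forwards a *relative* timeout; the server converts it into a
// deadline on its own monotonic clock, so client/server clock skew is
// irrelevant. Queue and handler code can then drop expired calls.
public class CallDeadline {
    final long deadlineNanos;

    CallDeadline(long clientTimeoutMillis) {
        // Server-local clock only; never compares against the client's clock.
        this.deadlineNanos = System.nanoTime()
            + TimeUnit.MILLISECONDS.toNanos(clientTimeoutMillis);
    }

    boolean expired() {
        return System.nanoTime() - deadlineNanos >= 0;
    }

    public static void main(String[] args) throws Exception {
        CallDeadline d = new CallDeadline(50);
        System.out.println(d.expired()); // still within budget
        Thread.sleep(80);
        System.out.println(d.expired()); // past the deadline: drop the call
    }
}
```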





[jira] [Commented] (HBASE-10605) Manage the call timeout in the server

2016-01-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105088#comment-15105088
 ] 

Nicolas Liochon commented on HBASE-10605:
-

Hi [~java8964], what do you need to know?
The point of this jira is that the server should not continue to handle a 
request if we know that the client has already stopped waiting for the result.


> Manage the call timeout in the server
> -
>
> Key: HBASE-10605
> URL: https://issues.apache.org/jira/browse/HBASE-10605
> Project: HBase
>  Issue Type: Improvement
>  Components: IPC/RPC, regionserver
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>
> Since HBASE-10566, we have an explicit call timeout available in the client.
> We could forward it to the server, and use this information for:
> - if the call is still in the queue, just cancel it
> - if the call is under execution, makes this information available in 
> RpcCallContext (actually change the RpcCallContext#disconnectSince to 
> something more generic), so it can be used by the query under execution to 
> stop its execution
> - in the future, interrupt it to manage the case 'stuck on a dead datanode' 
> or something similar
> - if the operation has finished, don't send the reply to the client, as by 
> definition the client is not interested anymore.
> From this, it will be easy to manage the cancellation: 
> disconnect/timeout/cancellation are similar from a service execution PoV





[jira] [Updated] (HBASE-10605) Manage the call timeout in the server

2016-01-18 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10605:

Assignee: (was: Nicolas Liochon)

> Manage the call timeout in the server
> -
>
> Key: HBASE-10605
> URL: https://issues.apache.org/jira/browse/HBASE-10605
> Project: HBase
>  Issue Type: Improvement
>  Components: IPC/RPC, regionserver
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>
> Since HBASE-10566, we have an explicit call timeout available in the client.
> We could forward it to the server, and use this information for:
> - if the call is still in the queue, just cancel it
> - if the call is under execution, makes this information available in 
> RpcCallContext (actually change the RpcCallContext#disconnectSince to 
> something more generic), so it can be used by the query under execution to 
> stop its execution
> - in the future, interrupt it to manage the case 'stuck on a dead datanode' 
> or something similar
> - if the operation has finished, don't send the reply to the client, as by 
> definition the client is not interested anymore.
> From this, it will be easy to manage the cancellation: 
> disconnect/timeout/cancellation are similar from a service execution PoV





[jira] [Commented] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-11-26 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029108#comment-15029108
 ] 

Nicolas Liochon commented on HBASE-14580:
-

It made it to the 0.98 branch but not to 1.1. [~ndimiduk], do you want it? I 
checked: the patch applies and works as expected. I can do the commit, just 
tell me the version number I should use (1.1.3? another one?)

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0, 1.3.0, 1.2.1, 0.98.16
>
> Attachments: hbase-14580.v2.patch, hbase-14580.v2.patch, 
> patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
>       final String differentiatingSuffix)
>       throws IOException {
>     // snip
>     String username = User.getCurrent().getName() +
>         differentiatingSuffix; // <-- problem here
>     User user = User.createUserForTesting(c, username,
>         new String[]{"supergroup"});
>     return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However, this user is not in a 
> group, so we get logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser'. I'm not sure of its impact. 
> [~apurtell], what do you think?
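The proposed fix can be sketched as follows (the method name, the example principal, and the boolean flag are illustrative, not the exact HBase patch):

```java
// Appending a differentiating suffix to a Kerberos principal produces an
// invalid name ("user/host@REALM.hfs.0"), so under Kerberos we fall back
// to the current user unchanged.
public class TestUserName {
    static String differentUser(String current, String suffix,
                                boolean kerberosEnabled) {
        // Kerberos: reuse the current principal as-is.
        // Simple auth: the suffix keeps test users distinct, as before.
        return kerberosEnabled ? current : current + suffix;
    }

    public static void main(String[] args) {
        String p = "securedUser/localhost@EXAMPLE.COM"; // hypothetical principal
        System.out.println(differentUser(p, ".hfs.0", false));
        System.out.println(differentUser(p, ".hfs.0", true));
    }
}
```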





[jira] [Commented] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-11-26 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029161#comment-15029161
 ] 

Nicolas Liochon commented on HBASE-14580:
-

I'm not sure I would call this a feature :-). If it's good enough for 0.98, 
it's good enough for 1.1 imho.
No problem for me anyway. In general, I don't really like it when there are 
holes like this (available in version x and x+2 but not x+1), but I agree that 
for this specific jira it's unlikely to be visible to anybody. 

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0, 1.3.0, 1.2.1, 0.98.16
>
> Attachments: hbase-14580.v2.patch, hbase-14580.v2.patch, 
> patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
>       final String differentiatingSuffix)
>       throws IOException {
>     // snip
>     String username = User.getCurrent().getName() +
>         differentiatingSuffix; // <-- problem here
>     User user = User.createUserForTesting(c, username,
>         new String[]{"supergroup"});
>     return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However, this user is not in a 
> group, so we get logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser'. I'm not sure of its impact. 
> [~apurtell], what do you think?





[jira] [Commented] (HBASE-14700) Support a "permissive" mode for secure clusters to allow "simple" auth clients

2015-10-30 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982573#comment-14982573
 ] 

Nicolas Liochon commented on HBASE-14700:
-

+1 from me as well. I'm closing HBASE-14579 as this jira includes a fix for it 
as well.

> Support a "permissive" mode for secure clusters to allow "simple" auth clients
> --
>
> Key: HBASE-14700
> URL: https://issues.apache.org/jira/browse/HBASE-14700
> Project: HBase
>  Issue Type: Improvement
>  Components: security
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 2.0.0
>
> Attachments: HBASE-14700-v2.patch, HBASE-14700-v3.patch, 
> HBASE-14700.patch
>
>
> When implementing HBase security for an existing cluster, it can be useful to 
> support mixed secure and insecure clients while all client configurations are 
> migrated over to secure authentication.  
> We currently have an option to allow secure clients to fallback to simple 
> auth against insecure clusters.  By providing an analogous setting for 
> servers, we would allow a phased rollout of security:
> # First, security can be enabled on the cluster servers, with the 
> "permissive" mode enabled
> # Clients can be converting to using secure authentication incrementally
> # The server audit logs allow identification of clients still using simple 
> auth to connect
> # Finally, when sufficient clients have been converted to secure operation, 
> the server-side "permissive" mode can be removed, allowing completely secure 
> operation.
> Obviously with this enabled, there is no effective access control, but this 
> would still be a useful tool to enable a smooth operational rollout of 
> security.  Permissive mode would of course be disabled by default.  Enabling 
> it should provide a big scary warning in the logs on startup, and possibly be 
> flagged on relevant UIs.





[jira] [Updated] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE

2015-10-30 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14579:

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> Users authenticated with KERBEROS are recorded as being authenticated with 
> SIMPLE
> -
>
> Key: HBASE-14579
> URL: https://issues.apache.org/jira/browse/HBASE-14579
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.0.0, 1.2.0, 0.98.15
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: hbase-14579.patch
>
>
> That's the HBase version of HADOOP-10683.
> We see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:SIMPLE)??
> while we would like to see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:KERBEROS)??
> The fix is simple, but it means we need hadoop 2.5+. 
> There are also a lot of cases where HBase calls "createUser" w/o specifying 
> the authentication method... I don't have a solution for those ones.





[jira] [Commented] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE

2015-10-30 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982571#comment-14982571
 ] 

Nicolas Liochon commented on HBASE-14579:
-

> Does this also happen for users authenticated with authentication tokens 
> ("auth:SIMPLE" instead of "auth:TOKEN" or "auth:DIGEST")? 
For digest, I think it's ok; the code in RpcServer is:

{code}
private UserGroupInformation getAuthorizedUgi(String authorizedId)
    throws IOException {
  if (this.authMethod == AuthMethod.DIGEST) {
    TokenIdentifier tokenId =
        HBaseSaslRpcServer.getIdentifier(authorizedId, RpcServer.this.secretManager);
    UserGroupInformation ugi = tokenId.getUser();
    if (ugi == null) {
      throw new AccessDeniedException("Can't retrieve username from tokenIdentifier.");
    } else {
      ugi.addTokenIdentifier(tokenId);
      return ugi;
    }
  } else {
    // <-- auth method replaced by "SIMPLE"
    return UserGroupInformation.createRemoteUser(authorizedId);
  }
}
{code}


> The latest patch (v3) for HBASE-14700 contains a fix for the UGI auth method 
> logged. Please take a look there if you have a chance.
Looking...

> Users authenticated with KERBEROS are recorded as being authenticated with 
> SIMPLE
> -
>
> Key: HBASE-14579
> URL: https://issues.apache.org/jira/browse/HBASE-14579
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.0.0, 1.2.0, 0.98.15
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: hbase-14579.patch
>
>
> That's the HBase version of HADOOP-10683.
> We see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:SIMPLE)??
> while we would like to see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:KERBEROS)??
> The fix is simple, but it means we need hadoop 2.5+. 
> There are also a lot of cases where HBase calls "createUser" w/o specifying 
> the authentication method... I don't have a solution for those ones.





[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor

2015-10-15 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958560#comment-14958560
 ] 

Nicolas Liochon commented on HBASE-11590:
-

The issue is that the ThreadPoolExecutor has leaked all over the place, often 
for monitoring reasons.
A lot of code depends on ThreadPoolExecutor rather than on the 
ExecutorService interface...

For example, see 
{code}
/**
 * This class will coalesce increments from a thrift server if
 * hbase.regionserver.thrift.coalesceIncrement is set to true. Turning this
 * config to true will cause the thrift server to queue increments into an
 * instance of this class. The thread pool associated with this class will
 * drain the coalesced increments as the thread is able. This can cause data
 * loss if the thrift server dies or is shut down before everything in the
 * queue is drained.
 *
 */
public class IncrementCoalescer implements IncrementCoalescerMBean {
// snip
  // MBean get/set methods
  public int getQueueSize() {
return pool.getQueue().size();
  }
  public int getMaxQueueSize() {
return this.maxQueueSize;
  }
  public void setMaxQueueSize(int newSize) {
this.maxQueueSize = newSize;
  }

  public long getPoolCompletedTaskCount() {
return pool.getCompletedTaskCount();
  }
  public long getPoolTaskCount() {
return pool.getTaskCount();
  }
  public int getPoolLargestPoolSize() {
return pool.getLargestPoolSize();
  }
  public int getCorePoolSize() {
return pool.getCorePoolSize();
  }
  public void setCorePoolSize(int newCoreSize) {
pool.setCorePoolSize(newCoreSize);
  }
  public int getMaxPoolSize() {
return pool.getMaximumPoolSize();
  }
  public void setMaxPoolSize(int newMaxSize) {
pool.setMaximumPoolSize(newMaxSize);
  }
{code}

I'm going to limit this patch to the easy/client stuff...

> use a specific ThreadPoolExecutor
> -
>
> Key: HBASE-11590
> URL: https://issues.apache.org/jira/browse/HBASE-11590
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Performance
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: ExecutorServiceTest.java, 
> LifoThreadPoolExecutorSQP.java, UnitQueueP.java, UnitQueuePU.java, tp.patch
>
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create 
> (by default) 256 threads even if we just need a few.
> The attached TPE creates threads only if we have something in the queue.
> On a PE test with replicas on, it improved the 99th latency percentile by 5%. 
> Warning: there are likely some race conditions, but I'm posting it here 
> because there may be an implementation available somewhere we can use, or 
> a good reason not to do this. So feedback welcome as usual. 
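The JDK behaviour referred to above can be observed in a small standalone test: a ThreadPoolExecutor keeps creating new threads until the core size is reached, even when existing workers are already idle (an illustration only, not the HBase client pool itself, and with a small core size for brevity):

```java
import java.util.concurrent.*;

// Demonstrates that the JDK pool grows to its full core size as tasks are
// submitted, regardless of whether earlier workers could have handled them.
public class CoreThreadsDemo {
    public static void main(String[] args) throws Exception {
        // corePoolSize = maxPoolSize = 8; imagine 256 as in the old default.
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(
            8, 8, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < 100; i++) {
            tpe.submit(() -> { });  // trivial, near-instant tasks
        }
        tpe.shutdown();
        tpe.awaitTermination(5, TimeUnit.SECONDS);
        // All 8 core threads were created even though one thread could
        // easily have drained every task.
        System.out.println(tpe.getLargestPoolSize());
    }
}
```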





[jira] [Updated] (HBASE-14521) Unify the semantic of hbase.client.retries.number

2015-10-15 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14521:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: (was: 1.3.0)
   Status: Resolved  (was: Patch Available)

Committed to master. I didn't commit to the 1.x branches because it's a 
behavior change...

Thanks for the patch, Yu!

> Unify the semantic of hbase.client.retries.number
> -
>
> Key: HBASE-14521
> URL: https://issues.apache.org/jira/browse/HBASE-14521
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0
>
> Attachments: HBASE-14521.patch, HBASE-14521_v2.patch, 
> HBASE-14521_v3.patch
>
>
> From the name of the _hbase.client.retries.number_ property, it should be 
> the maximum number of *retries*; that is, if we set the property to 1, there 
> should be 2 attempts in total. However, there are two different semantics in 
> use in the current code base.
> For example, in ConnectionImplementation#locateRegionInMeta:
> {code}
> int localNumRetries = (retry ? numTries : 1);
> for (int tries = 0; true; tries++) {
>   if (tries >= localNumRetries) {
> throw new NoServerForRegionException("Unable to find region for "
> + Bytes.toStringBinary(row) + " in " + tableName +
> " after " + numTries + " tries.");
>   }
> {code}
> the retries number is regarded as the maximum number of *tries*
> While in RpcRetryingCallerImpl#callWithRetries:
> {code}
> for (int tries = 0;; tries++) {
>   long expectedSleep;
>   try {
> callable.prepare(tries != 0); // if called with false, check table 
> status on ZK
> interceptor.intercept(context.prepare(callable, tries));
> return callable.call(getRemainingTime(callTimeout));
>   } catch (PreemptiveFastFailException e) {
> throw e;
>   } catch (Throwable t) {
> ...
> if (tries >= retries - 1) {
>   throw new RetriesExhaustedException(tries, exceptions);
> }
> {code}
> it's regarded as exactly *retry* (try a call first unconditionally and then 
> check whether to retry or whether the maximum retry number is exceeded)
> This inconsistency causes misunderstanding in usage; for example, one of our 
> customers set the property to zero expecting one single call but finally 
> received NoServerForRegionException.
> We should unify the semantic of the property, and I suggest keeping the 
> original one: retry rather than total tries.
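An illustrative model of the two quoted loops, assuming every attempt fails (the helper names are made up; only the loop shapes mirror the code above):

```java
// Counts attempts under the two conflicting interpretations of
// hbase.client.retries.number when all attempts fail.
public class RetrySemantics {
    // locateRegionInMeta style: the bound is checked *before* each attempt,
    // so the property counts total tries.
    static int attemptsWhenCountingTries(int n) {
        int attempts = 0;
        for (int tries = 0; ; tries++) {
            if (tries >= n) return attempts;  // bail before trying
            attempts++;
        }
    }

    // callWithRetries style: the first try is unconditional, and the
    // budget (tries >= retries - 1) is checked only after a failure.
    static int attemptsWhenCountingRetries(int n) {
        int attempts = 0;
        for (int tries = 0; ; tries++) {
            attempts++;                        // always try at least once
            if (tries >= n - 1) return attempts;
        }
    }

    public static void main(String[] args) {
        // Property set to 0: one convention never calls at all (hence the
        // NoServerForRegionException), the other still calls once.
        System.out.println(attemptsWhenCountingTries(0));
        System.out.println(attemptsWhenCountingRetries(0));
    }
}
```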





[jira] [Updated] (HBASE-11590) use a specific ThreadPoolExecutor

2015-10-15 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-11590:

Attachment: HBASE-11590.v1.patch

> use a specific ThreadPoolExecutor
> -
>
> Key: HBASE-11590
> URL: https://issues.apache.org/jira/browse/HBASE-11590
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Performance
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: ExecutorServiceTest.java, HBASE-11590.v1.patch, 
> LifoThreadPoolExecutorSQP.java, UnitQueueP.java, UnitQueuePU.java, tp.patch
>
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create 
> (by default) 256 threads even if we just need a few.
> The attached TPE creates threads only if we have something in the queue.
> On a PE test with replicas on, it improved the 99th latency percentile by 5%. 
> Warning: there are likely some race conditions, but I'm posting it here 
> because there may be an implementation available somewhere we can use, or 
> a good reason not to do this. So feedback welcome as usual. 





[jira] [Updated] (HBASE-11590) use a specific ThreadPoolExecutor

2015-10-15 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-11590:

Status: Patch Available  (was: Open)

> use a specific ThreadPoolExecutor
> -
>
> Key: HBASE-11590
> URL: https://issues.apache.org/jira/browse/HBASE-11590
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Performance
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: ExecutorServiceTest.java, HBASE-11590.v1.patch, 
> LifoThreadPoolExecutorSQP.java, UnitQueueP.java, UnitQueuePU.java, tp.patch
>
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create 
> (by default) 256 threads even if we just need a few.
> The attached TPE creates threads only if we have something in the queue.
> On a PE test with replicas on, it improved the 99th latency percentile by 5%. 
> Warning: there are likely some race conditions, but I'm posting it here 
> because there may be an implementation available somewhere we can use, or 
> a good reason not to do this. So feedback welcome as usual. 





[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor

2015-10-15 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958906#comment-14958906
 ] 

Nicolas Liochon commented on HBASE-11590:
-

The patch compiles locally, but that's all I checked.
Client side: use ForkJoinPool instead of ThreadPoolExecutor; remove the 
monitoring linked to ThreadPoolExecutor.
Server side: when possible, use the interface (ExecutorService) instead of the 
implementation (ThreadPoolExecutor).

> use a specific ThreadPoolExecutor
> -
>
> Key: HBASE-11590
> URL: https://issues.apache.org/jira/browse/HBASE-11590
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Performance
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: ExecutorServiceTest.java, HBASE-11590.v1.patch, 
> LifoThreadPoolExecutorSQP.java, UnitQueueP.java, UnitQueuePU.java, tp.patch
>
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create 
> (by default) 256 threads even if we just need a few.
> The attached TPE creates threads only if we have something in the queue.
> On a PE test with replicas on, it improved the 99th latency percentile by 5%. 
> Warning: there are likely some race conditions, but I'm posting it here 
> because there may be an implementation available somewhere we can use, or 
> a good reason not to do this. So feedback welcome as usual. 





[jira] [Commented] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE

2015-10-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954636#comment-14954636
 ] 

Nicolas Liochon commented on HBASE-14579:
-

Thanks Stack, yes, it would be great. I could change the script, but I can't 
easily test it right now.

> Users authenticated with KERBEROS are recorded as being authenticated with 
> SIMPLE
> -
>
> Key: HBASE-14579
> URL: https://issues.apache.org/jira/browse/HBASE-14579
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.0.0, 1.2.0, 0.98.15
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: hbase-14579.patch
>
>
> That's the HBase version of HADOOP-10683.
> We see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:SIMPLE)??
> while we would like to see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:KERBEROS)??
> The fix is simple, but it means we need hadoop 2.5+. 
> There are also a lot of cases where HBase calls "createUser" w/o specifying 
> the authentication method... I don't have a solution for those ones.





[jira] [Commented] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-10-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954583#comment-14954583
 ] 

Nicolas Liochon commented on HBASE-14580:
-

This second run makes more sense :-). I'm going to commit on the master branch. 
[~ndimiduk], do you want this for the 1.2 branch?

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0
>
> Attachments: hbase-14580.v2.patch, hbase-14580.v2.patch, 
> patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
>       final String differentiatingSuffix)
>       throws IOException {
>     // snip
>     String username = User.getCurrent().getName() +
>         differentiatingSuffix; // <-- problem here
>     User user = User.createUserForTesting(c, username,
>         new String[]{"supergroup"});
>     return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However this user is not in a 
> group so we have logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser' I'm not sure of its impact. 
> [~apurtell], what do you think?
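The clash described above is plain string handling: a Kerberos principal has the form primary[/instance]@REALM, with the realm as the last component, so appending a differentiating suffix to a full principal produces a malformed name. A minimal illustrative sketch (the class and helper names are made up for this example, not HBase code):

```java
class PrincipalSuffixDemo {
    // A valid Kerberos principal must end with "@" + realm.
    static boolean looksLikeValidPrincipal(String name, String realm) {
        return name.endsWith("@" + realm);
    }

    public static void main(String[] args) {
        String current = "securedUser/localhost@EXAMPLE.COM";
        // What getDifferentUser() does today: append a differentiating suffix.
        String suffixed = current + ".hfs.0";

        System.out.println(looksLikeValidPrincipal(current, "EXAMPLE.COM"));  // true
        System.out.println(looksLikeValidPrincipal(suffixed, "EXAMPLE.COM")); // false: realm no longer last
    }
}
```

This is why returning the current user (instead of a suffixed one) is the workable option once Kerberos is enabled.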





[jira] [Commented] (HBASE-14268) Improve KeyLocker

2015-10-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954690#comment-14954690
 ] 

Nicolas Liochon commented on HBASE-14268:
-

I just saw that, I'm having a look.

> Improve KeyLocker
> -
>
> Key: HBASE-14268
> URL: https://issues.apache.org/jira/browse/HBASE-14268
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Reporter: Hiroshi Ikeda
>Assignee: Hiroshi Ikeda
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14268-V5.patch, HBASE-14268-V2.patch, 
> HBASE-14268-V3.patch, HBASE-14268-V4.patch, HBASE-14268-V5.patch, 
> HBASE-14268-V5.patch, HBASE-14268-V6.patch, HBASE-14268-V7.patch, 
> HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, 
> HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, 
> HBASE-14268.patch, KeyLockerIncrKeysPerformance.java, 
> KeyLockerPerformance.java, ReferenceTestApp.java
>
>
> 1. In the implementation of {{KeyLocker}} it uses atomic variables inside a 
> synchronized block, which doesn't make sense. Moreover, the logic inside the 
> synchronized block is not trivial, so it reduces performance in heavy 
> multi-threaded environments.
> 2. {{KeyLocker}} gives an instance of {{ReentrantLock}} which is already 
> locked, but it doesn't follow the contract of {{ReentrantLock}} because you 
> are not allowed to freely invoke lock/unlock methods under that contract. 
> That introduces a potential risk; whenever you see a variable of the type 
> {{ReentrantLock}}, you should pay attention to where the included instance is 
> coming from.
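To make point 2 concrete, here is a minimal, simplified sketch of the usage pattern being criticized (not the real HBase KeyLocker: it omits reference counting and cleanup of unused locks): acquireLock() hands back a ReentrantLock that is already held, so the caller must treat it as unlock-only, which breaks the usual ReentrantLock contract.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Simplified KeyLocker-style sketch: one lock per key, returned pre-locked.
class KeyLockerSketch<K> {
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    ReentrantLock acquireLock(K key) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock(); // returned in the LOCKED state -- unusual for a ReentrantLock
        return lock;
    }

    public static void main(String[] args) {
        KeyLockerSketch<String> locker = new KeyLockerSketch<>();
        ReentrantLock l = locker.acquireLock("row-1");
        try {
            // critical section for "row-1"
        } finally {
            // Callers may ONLY unlock; calling lock() again here would
            // violate the intended (implicit) contract described above.
            l.unlock();
        }
        System.out.println(l.isLocked()); // false
    }
}
```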





[jira] [Commented] (HBASE-14268) Improve KeyLocker

2015-10-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954701#comment-14954701
 ] 

Nicolas Liochon commented on HBASE-14268:
-

[~sreenivasulureddy] It should be ok now. I added the two missing files.

> Improve KeyLocker
> -
>
> Key: HBASE-14268
> URL: https://issues.apache.org/jira/browse/HBASE-14268
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Reporter: Hiroshi Ikeda
>Assignee: Hiroshi Ikeda
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14268-V5.patch, HBASE-14268-V2.patch, 
> HBASE-14268-V3.patch, HBASE-14268-V4.patch, HBASE-14268-V5.patch, 
> HBASE-14268-V5.patch, HBASE-14268-V6.patch, HBASE-14268-V7.patch, 
> HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, 
> HBASE-14268-V7.patch, HBASE-14268-V7.patch, HBASE-14268-V7.patch, 
> HBASE-14268.patch, KeyLockerIncrKeysPerformance.java, 
> KeyLockerPerformance.java, ReferenceTestApp.java
>
>
> 1. In the implementation of {{KeyLocker}} it uses atomic variables inside a 
> synchronized block, which doesn't make sense. Moreover, the logic inside the 
> synchronized block is not trivial, so it reduces performance in heavy 
> multi-threaded environments.
> 2. {{KeyLocker}} gives an instance of {{ReentrantLock}} which is already 
> locked, but it doesn't follow the contract of {{ReentrantLock}} because you 
> are not allowed to freely invoke lock/unlock methods under that contract. 
> That introduces a potential risk; whenever you see a variable of the type 
> {{ReentrantLock}}, you should pay attention to where the included instance is 
> coming from.





[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-10-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14580:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed on master only

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0
>
> Attachments: hbase-14580.v2.patch, hbase-14580.v2.patch, 
> patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
> final String differentiatingSuffix)
>   throws IOException {
>// snip
> String username = User.getCurrent().getName() +
>   differentiatingSuffix; < problem here
> User user = User.createUserForTesting(c, username,
> new String[]{"supergroup"});
> return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However this user is not in a 
> group so we have logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser' I'm not sure of its impact. 
> [~apurtell], what do you think?





[jira] [Comment Edited] (HBASE-14521) Unify the semantic of hbase.client.retries.number

2015-10-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955354#comment-14955354
 ] 

Nicolas Liochon edited comment on HBASE-14521 at 10/13/15 5:52 PM:
---

Yep [~carp84], I think your analysis is correct: it was a workaround.

While looking again at the patch, I found a typo that I will fix on commit
> public RetriesExhaustedException(final int numReries,

I'm +1, I will commit on master tomorrow my time if nobody disagrees.



was (Author: nkeywal):
Yep [~carp84], I think your analysis is correct: it was a workaround.

While looking again at the patch, I found a typo that I will fix on commit
> public RetriesExhaustedException(final int numReries,

I'm +1, I will commit on branch2 tomorrow my time if nobody disagrees.


> Unify the semantic of hbase.client.retries.number
> -
>
> Key: HBASE-14521
> URL: https://issues.apache.org/jira/browse/HBASE-14521
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14521.patch, HBASE-14521_v2.patch, 
> HBASE-14521_v3.patch
>
>
> From the name of the _hbase.client.retries.number_ property, it should be the 
> maximum number of *retries*; that is, if we set the property to 1, there should 
> be 2 attempts in total. However, there are two different semantics when using 
> it in the current code base.
> For example, in ConnectionImplementation#locateRegionInMeta:
> {code}
> int localNumRetries = (retry ? numTries : 1);
> for (int tries = 0; true; tries++) {
>   if (tries >= localNumRetries) {
> throw new NoServerForRegionException("Unable to find region for "
> + Bytes.toStringBinary(row) + " in " + tableName +
> " after " + numTries + " tries.");
>   }
> {code}
> the retries number is regarded as max times for *tries*
> While in RpcRetryingCallerImpl#callWithRetries:
> {code}
> for (int tries = 0;; tries++) {
>   long expectedSleep;
>   try {
> callable.prepare(tries != 0); // if called with false, check table 
> status on ZK
> interceptor.intercept(context.prepare(callable, tries));
> return callable.call(getRemainingTime(callTimeout));
>   } catch (PreemptiveFastFailException e) {
> throw e;
>   } catch (Throwable t) {
> ...
> if (tries >= retries - 1) {
>   throw new RetriesExhaustedException(tries, exceptions);
> }
> {code}
> it's regarded exactly as *retry* (try a call first unconditionally, then 
> check whether to retry or whether the maximum retry number is exceeded).
> This inconsistency causes misunderstanding in usage: for example, one of our 
> customers set the property to zero expecting a single call but instead 
> received NoServerForRegionException.
> We should unify the semantics of the property, and I suggest keeping the 
> original meaning of retry rather than total tries.
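Under the proposed semantics, where the configured value counts retries rather than total tries, a unified caller would look roughly like the sketch below. The names (callWithRetries, the generic Callable) are placeholders for illustration, not the actual RpcRetryingCallerImpl code: the key point is that retries == 0 still performs exactly one attempt.

```java
import java.util.concurrent.Callable;

class RetryCaller {
    // 'retries' counts RE-tries: retries + 1 attempts are made in total,
    // so retries == 0 still performs exactly one call.
    static <T> T callWithRetries(Callable<T> callable, int retries) throws Exception {
        Exception lastFailure = null;
        for (int tries = 0; tries <= retries; tries++) { // 0..retries inclusive
            try {
                return callable.call();
            } catch (Exception e) {
                lastFailure = e; // remember the failure, maybe retry
            }
        }
        throw new Exception("Exhausted " + (retries + 1) + " attempts", lastFailure);
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        try {
            callWithRetries(() -> { attempts[0]++; throw new RuntimeException("boom"); }, 2);
        } catch (Exception e) {
            System.out.println(attempts[0]); // 3: one initial call plus two retries
        }
    }
}
```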





[jira] [Commented] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-10-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955296#comment-14955296
 ] 

Nicolas Liochon commented on HBASE-14580:
-

Committed to the 1.2 branch.

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0, 1.2.1
>
> Attachments: hbase-14580.v2.patch, hbase-14580.v2.patch, 
> patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
> final String differentiatingSuffix)
>   throws IOException {
>// snip
> String username = User.getCurrent().getName() +
>   differentiatingSuffix; < problem here
> User user = User.createUserForTesting(c, username,
> new String[]{"supergroup"});
> return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However this user is not in a 
> group so we have logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser' I'm not sure of its impact. 
> [~apurtell], what do you think?





[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-10-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14580:

Fix Version/s: 1.2.1

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0, 1.2.1
>
> Attachments: hbase-14580.v2.patch, hbase-14580.v2.patch, 
> patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
> final String differentiatingSuffix)
>   throws IOException {
>// snip
> String username = User.getCurrent().getName() +
>   differentiatingSuffix; < problem here
> User user = User.createUserForTesting(c, username,
> new String[]{"supergroup"});
> return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However this user is not in a 
> group so we have logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser' I'm not sure of its impact. 
> [~apurtell], what do you think?





[jira] [Commented] (HBASE-14521) Unify the semantic of hbase.client.retries.number

2015-10-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955354#comment-14955354
 ] 

Nicolas Liochon commented on HBASE-14521:
-

Yep [~carp84], I think your analysis is correct: it was a workaround.

While looking again at the patch, I found a typo that I will fix on commit
> public RetriesExhaustedException(final int numReries,

I'm +1, I will commit on branch2 tomorrow my time if nobody disagrees.


> Unify the semantic of hbase.client.retries.number
> -
>
> Key: HBASE-14521
> URL: https://issues.apache.org/jira/browse/HBASE-14521
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14521.patch, HBASE-14521_v2.patch, 
> HBASE-14521_v3.patch
>
>
> From the name of the _hbase.client.retries.number_ property, it should be the 
> maximum number of *retries*; that is, if we set the property to 1, there should 
> be 2 attempts in total. However, there are two different semantics when using 
> it in the current code base.
> For example, in ConnectionImplementation#locateRegionInMeta:
> {code}
> int localNumRetries = (retry ? numTries : 1);
> for (int tries = 0; true; tries++) {
>   if (tries >= localNumRetries) {
> throw new NoServerForRegionException("Unable to find region for "
> + Bytes.toStringBinary(row) + " in " + tableName +
> " after " + numTries + " tries.");
>   }
> {code}
> the retries number is regarded as max times for *tries*
> While in RpcRetryingCallerImpl#callWithRetries:
> {code}
> for (int tries = 0;; tries++) {
>   long expectedSleep;
>   try {
> callable.prepare(tries != 0); // if called with false, check table 
> status on ZK
> interceptor.intercept(context.prepare(callable, tries));
> return callable.call(getRemainingTime(callTimeout));
>   } catch (PreemptiveFastFailException e) {
> throw e;
>   } catch (Throwable t) {
> ...
> if (tries >= retries - 1) {
>   throw new RetriesExhaustedException(tries, exceptions);
> }
> {code}
> it's regarded exactly as *retry* (try a call first unconditionally, then 
> check whether to retry or whether the maximum retry number is exceeded).
> This inconsistency causes misunderstanding in usage: for example, one of our 
> customers set the property to zero expecting a single call but instead 
> received NoServerForRegionException.
> We should unify the semantics of the property, and I suggest keeping the 
> original meaning of retry rather than total tries.





[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor

2015-10-11 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952237#comment-14952237
 ] 

Nicolas Liochon commented on HBASE-11590:
-

> maybe just because it is more parsimonious in its thread use?
That's the magic part: even if there is a single thread in the pool it's faster 
than the others. I didn't check whether it consumes more CPU, however.

I will do the patch to use ForkJoin soon (hopefully today, if not next week).  

> use a specific ThreadPoolExecutor
> -
>
> Key: HBASE-11590
> URL: https://issues.apache.org/jira/browse/HBASE-11590
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Performance
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: ExecutorServiceTest.java, 
> LifoThreadPoolExecutorSQP.java, UnitQueueP.java, UnitQueuePU.java, tp.patch
>
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create 
> (by default) 256 threads even if we just need a few.
> The attached TPE create threads only if we have something in the queue.
> On a PE test with replica on, it improved the 99 latency percentile by 5%. 
> Warning: there are likely some race conditions, but I'm posting it here 
> because there is may be an implementation available somewhere we can use, or 
> a good reason not to do that. So feedback welcome as usual. 





[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-10-11 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14580:

Attachment: hbase-14580.v2.patch

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0
>
> Attachments: hbase-14580.v2.patch, patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
> final String differentiatingSuffix)
>   throws IOException {
>// snip
> String username = User.getCurrent().getName() +
>   differentiatingSuffix; < problem here
> User user = User.createUserForTesting(c, username,
> new String[]{"supergroup"});
> return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However this user is not in a 
> group so we have logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser' I'm not sure of its impact. 
> [~apurtell], what do you think?





[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-10-11 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14580:

Status: Patch Available  (was: Open)

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0
>
> Attachments: hbase-14580.v2.patch, patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
> final String differentiatingSuffix)
>   throws IOException {
>// snip
> String username = User.getCurrent().getName() +
>   differentiatingSuffix; < problem here
> User user = User.createUserForTesting(c, username,
> new String[]{"supergroup"});
> return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However this user is not in a 
> group so we have logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser' I'm not sure of its impact. 
> [~apurtell], what do you think?





[jira] [Commented] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE

2015-10-11 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952250#comment-14952250
 ] 

Nicolas Liochon commented on HBASE-14579:
-

I'm not sure I understand the test-patch script correctly. Can I just change 
the property 

{code}
# All supported Hadoop versions that we want to test the compilation with
HADOOP2_VERSIONS="2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1"
{code}

to 
{code}
HADOOP2_VERSIONS="2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1"
{code}

Or is there a risk of hiding a problem a patch could cause to the 0.98 release 
(or even the 0.94)?

We will need to update the matrix in the hbase book as well...

> Users authenticated with KERBEROS are recorded as being authenticated with 
> SIMPLE
> -
>
> Key: HBASE-14579
> URL: https://issues.apache.org/jira/browse/HBASE-14579
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.0.0, 1.2.0, 0.98.15
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: hbase-14579.patch
>
>
> That's the HBase version of HADOOP-10683.
> We see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:SIMPLE)??
> while we would like to see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:KERBEROS)??
> The fix is simple, but it means we need hadoop 2.5+. 
> There are also a lot of cases where HBase calls "createUser" without specifying 
> the authentication method... I don't have a solution for these ones.





[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-10-11 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14580:

Status: Open  (was: Patch Available)

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0
>
> Attachments: patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
> final String differentiatingSuffix)
>   throws IOException {
>// snip
> String username = User.getCurrent().getName() +
>   differentiatingSuffix; < problem here
> User user = User.createUserForTesting(c, username,
> new String[]{"supergroup"});
> return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However this user is not in a 
> group so we have logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser' I'm not sure of its impact. 
> [~apurtell], what do you think?





[jira] [Commented] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-10-11 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952241#comment-14952241
 ] 

Nicolas Liochon commented on HBASE-14580:
-

>  I think it's just a log spam issue – see HADOOP-12450.
Oh, yeah. Thanks for the pointer.

> Instead of hard-coding the config values, can you use 
> User.isHBaseSecurityEnabled(c)?
Yes, you're right. I updated the patch.

> The username suffixes were fed into the data dirs used by each DN/RS's for a 
> "distributed" minicluster setup
> So, as I understand it, that would not be an issue here are Kerberos would 
> only be supported with a single node setup?
I'm not sure here: I don't see why the user name is needed in the data dirs. 
But in any case, this patch does not break anything, as the suffix approach 
clashes with the kerberos realm...

> As Gary said, this is harmless unless your test depends on a valid group 
> membership.
Thanks.

Thanks for the reviews, all. I will commit the v2 if hadoop-qa passes with the 
v2.

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0
>
> Attachments: patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
> final String differentiatingSuffix)
>   throws IOException {
>// snip
> String username = User.getCurrent().getName() +
>   differentiatingSuffix; < problem here
> User user = User.createUserForTesting(c, username,
> new String[]{"supergroup"});
> return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However this user is not in a 
> group so we have logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser' I'm not sure of its impact. 
> [~apurtell], what do you think?





[jira] [Commented] (HBASE-14479) Apply the Leader/Followers pattern to RpcServer's Reader

2015-10-11 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952235#comment-14952235
 ] 

Nicolas Liochon commented on HBASE-14479:
-

Yeah, I tried to get rid of this array of readers a while back, but I didn't 
push the patch because I didn't get any significant result. Nice work, [~ikeda]

> Apply the Leader/Followers pattern to RpcServer's Reader
> 
>
> Key: HBASE-14479
> URL: https://issues.apache.org/jira/browse/HBASE-14479
> Project: HBase
>  Issue Type: Improvement
>  Components: IPC/RPC, Performance
>Reporter: Hiroshi Ikeda
>Assignee: Hiroshi Ikeda
>Priority: Minor
> Attachments: HBASE-14479-V2 (1).patch, HBASE-14479-V2.patch, 
> HBASE-14479-V2.patch, HBASE-14479.patch, gc.png, gets.png, io.png, median.png
>
>
> {{RpcServer}} uses multiple selectors to read data for load distribution, but 
> the distribution is just done by round-robin. It is uncertain, especially over 
> a long run, whether load is equally divided and resources are used without 
> being wasted.
> Moreover, multiple selectors may cause excessive context switches which give 
> priority to low latency (while we just add the requests to queues), and may 
> reduce the throughput of the whole server.





[jira] [Commented] (HBASE-14521) Unify the semantic of hbase.client.retries.number

2015-10-09 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950283#comment-14950283
 ] 

Nicolas Liochon commented on HBASE-14521:
-

It's a good point: the existing implementation is confusing.
The patch looks good. It contains a lot of cleanup that will make the code 
easier to read (thanks, Yu!)

I'm surprised by this:
{code}
@@ -137,7 +137,6 @@ public class TestAsyncProcess {
   AsyncRequestFutureImpl r = super.createAsyncRequestFuture(
   DUMMY_TABLE, actions, nonceGroup, pool, callback, results, 
needResults);
   allReqs.add(r);
-  callsCt.incrementAndGet();  <=== We should continue to count the 
calls, no?
   return r;
 }
{code}

Note that setting retries to zero is most of the time an error, as we can have a 
retry in many cases, for example if the client cache is not up to date 
(it contains the wrong region server for a region). 

> Unify the semantic of hbase.client.retries.number
> -
>
> Key: HBASE-14521
> URL: https://issues.apache.org/jira/browse/HBASE-14521
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14521.patch, HBASE-14521_v2.patch, 
> HBASE-14521_v3.patch
>
>
> From the name of the _hbase.client.retries.number_ property, it should be the 
> maximum number of *retries*; that is, if we set the property to 1, there should 
> be 2 attempts in total. However, there are two different semantics when using 
> it in the current code base.
> For example, in ConnectionImplementation#locateRegionInMeta:
> {code}
> int localNumRetries = (retry ? numTries : 1);
> for (int tries = 0; true; tries++) {
>   if (tries >= localNumRetries) {
> throw new NoServerForRegionException("Unable to find region for "
> + Bytes.toStringBinary(row) + " in " + tableName +
> " after " + numTries + " tries.");
>   }
> {code}
> the retries number is regarded as max times for *tries*
> While in RpcRetryingCallerImpl#callWithRetries:
> {code}
> for (int tries = 0;; tries++) {
>   long expectedSleep;
>   try {
> callable.prepare(tries != 0); // if called with false, check table 
> status on ZK
> interceptor.intercept(context.prepare(callable, tries));
> return callable.call(getRemainingTime(callTimeout));
>   } catch (PreemptiveFastFailException e) {
> throw e;
>   } catch (Throwable t) {
> ...
> if (tries >= retries - 1) {
>   throw new RetriesExhaustedException(tries, exceptions);
> }
> {code}
> it's regarded exactly as *retry* (try a call first unconditionally, then 
> check whether to retry or whether the maximum retry number is exceeded).
> This inconsistency causes misunderstanding in usage: for example, one of our 
> customers set the property to zero expecting a single call but instead 
> received NoServerForRegionException.
> We should unify the semantics of the property, and I suggest keeping the 
> original meaning of retry rather than total tries.





[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor

2015-10-08 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948255#comment-14948255
 ] 

Nicolas Liochon commented on HBASE-11590:
-

Hey [~saint@gmail.com]

Attached are some tests comparing ThreadPoolExecutor (the one we use currently), 
ForkJoinPool (available in JDK 1.7+) and LifoThreadPoolExecutorSQP (the one 
mentioned in the Stack Overflow discussion).

- the critical use case is:
   1) do a table.batch(puts) that needs a lot of threads
   2) then do a loop { table.get(get) }; this needs a single thread, but each 
call may use any of the threads in the pool, resetting the keepalive timeout => 
they may never expire.
ThreadPoolExecutor is actually worse: it tries to create a thread even if there 
are already enough threads available.

 See the code for the details, but here is the interesting case with a thread 
pool of 1000 threads while we need only 1 thread.
{quote}
   * ForkJoinPool maxThread=1000, immediateGet=true, LOOP=200
   * ForkJoinPool total=68942ms
   * ForkJoinPool step1=68657ms
   * ForkJoinPool step2=284ms
   * ForkJoinPool threads: 6, 1006, 456, 6  <=== we have 456 threads instead of 
the ideal 7

   * ThreadPoolExecutor maxThread=1000, immediateGet=true, LOOP=200
   * ThreadPoolExecutor total=107449ms <=== very slow
   * ThreadPoolExecutor step1=107145ms
   * ThreadPoolExecutor step2=304ms
   * ThreadPoolExecutor threads: 6, 1006, 889, 6 <== keeps nearly all the 
threads
 
   * LifoThreadPoolExecutorSQP maxThread=1000, immediateGet=true, LOOP=200
   * LifoThreadPoolExecutorSQP total=4805ms <=== quite fast
   * LifoThreadPoolExecutorSQP step1=4803ms
   * LifoThreadPoolExecutorSQP step2=1ms
   * LifoThreadPoolExecutorSQP threads: 6, 248, 8, 6 <== removes the threads 
quickly
{quote}

You may want to rerun the tests to see if you reproduce them. I included my 
results in the code.

- The root issue is that we need a LIFO poll/lock, but it does not exist.
- LifoThreadPoolExecutorSQP solves this with a LIFO queue for the threads 
waiting for work. But it comes with an LGPL license, and the code is not 
trivial: a bug there could be difficult to find. It is however impressive to 
see how much faster/better it is compared to the other pools.
- ForkJoinPool is better than TPE. It's not as good as 
LifoThreadPoolExecutorSQP, but it's much closer to what we need. It's available 
in JDK 1.7, so it looks like a safe bet for HBase 1.+.
 ForkJoinPool: threads are created only if there are waiting tasks. They expire 
after 2 seconds (it's hardcoded in the JDK code). They are not LIFO, and the 
task allocation is not as fast as the one in LifoThreadPoolExecutorSQP.

=> Proposition: Let's migrate to ForkJoinPool. If someone has time to try 
LifoThreadPoolExecutorSQP it can be interesting in the future (if the license 
can be changed)...
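A minimal sketch of the proposed migration (illustrative only, not HBase's actual pool wiring): ForkJoinPool implements ExecutorService, so code that submits Callables to a ThreadPoolExecutor can switch with few changes, while workers are spawned lazily and expire after the JDK's hardcoded idle timeout.

```java
// Illustrative sketch: a ForkJoinPool used through the plain
// ExecutorService interface, as a drop-in for ThreadPoolExecutor.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

public class PoolSwitch {
    static int answer() throws Exception {
        // The parallelism level caps the worker count, playing the role
        // that maxThreads plays for a ThreadPoolExecutor.
        ExecutorService pool = new ForkJoinPool(256);
        Future<Integer> f = pool.submit(() -> 6 * 7); // Callable<Integer>
        int v = f.get();
        pool.shutdown();
        return v;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(answer()); // 42
    }
}
```

Since no thread exists until a task is pending, an idle pool costs nothing, which is the property the benchmark above rewards.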

> use a specific ThreadPoolExecutor
> -
>
> Key: HBASE-11590
> URL: https://issues.apache.org/jira/browse/HBASE-11590
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Performance
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: tp.patch
>
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create 
> (by default) 256 threads even if we just need a few.
> The attached TPE creates threads only if we have something in the queue.
> On a PE test with replica on, it improved the 99 latency percentile by 5%. 
> Warning: there are likely some race conditions, but I'm posting it here 
> because there may be an implementation available somewhere we can use, or 
> a good reason not to do that. So feedback welcome as usual. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-11590) use a specific ThreadPoolExecutor

2015-10-08 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-11590:

Attachment: ExecutorServiceTest.java
UnitQueuePU.java
UnitQueueP.java
LifoThreadPoolExecutorSQP.java

> use a specific ThreadPoolExecutor
> -
>
> Key: HBASE-11590
> URL: https://issues.apache.org/jira/browse/HBASE-11590
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Performance
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: ExecutorServiceTest.java, 
> LifoThreadPoolExecutorSQP.java, UnitQueueP.java, UnitQueuePU.java, tp.patch
>
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create 
> (by default) 256 threads even if we just need a few.
> The attached TPE creates threads only if we have something in the queue.
> On a PE test with replica on, it improved the 99 latency percentile by 5%. 
> Warning: there are likely some race conditions, but I'm posting it here 
> because there may be an implementation available somewhere we can use, or 
> a good reason not to do that. So feedback welcome as usual. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE

2015-10-08 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14579:

Hadoop Flags: Incompatible change
  Status: Patch Available  (was: Open)

> Users authenticated with KERBEROS are recorded as being authenticated with 
> SIMPLE
> -
>
> Key: HBASE-14579
> URL: https://issues.apache.org/jira/browse/HBASE-14579
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.98.15, 1.0.0, 1.2.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: hbase-14579.patch
>
>
> That's the HBase version of HADOOP-10683.
> The fix is simple, but it means we need hadoop 2.5+. 
> There are also a lot of cases where HBase calls "createUser" w/o specifying 
> the authentication method... I don't have the solution for these ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE

2015-10-08 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14579:

Attachment: hbase-14579.patch

> Users authenticated with KERBEROS are recorded as being authenticated with 
> SIMPLE
> -
>
> Key: HBASE-14579
> URL: https://issues.apache.org/jira/browse/HBASE-14579
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.0.0, 1.2.0, 0.98.15
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: hbase-14579.patch
>
>
> That's the HBase version of HADOOP-10683.
> The fix is simple, but it means we need hadoop 2.5+. 
> There are also a lot of cases where HBase calls "createUser" w/o specifying 
> the authentication method... I don't have the solution for these ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-10-08 Thread Nicolas Liochon (JIRA)
Nicolas Liochon created HBASE-14580:
---

 Summary: Make the HBaseMiniCluster compliant with Kerberos
 Key: HBASE-14580
 URL: https://issues.apache.org/jira/browse/HBASE-14580
 Project: HBase
  Issue Type: Improvement
  Components: security, test
Affects Versions: 2.0.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 2.0.0


When using MiniKDC and the minicluster in a unit test, there is a conflict 
caused by HBaseTestingUtility:

{code}
  public static User getDifferentUser(final Configuration c,
final String differentiatingSuffix)
  throws IOException {
   // snip
String username = User.getCurrent().getName() +
  differentiatingSuffix; // <-- problem here
User user = User.createUserForTesting(c, username,
new String[]{"supergroup"});
return user;
  }
{code}

This creates users like securedUser/localh...@example.com.hfs.0, and this does 
not work.

My fix is to return the current user when Kerberos is set. I don't think that 
there is another option (any other opinion?). However this user is not in a 
group so we have logs like 'WARN  [IPC Server handler 9 on 61366] 
security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - 
No groups available for user securedUser' I'm not sure of its impact. 
[~apurtell], what do you think?
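A simplified, dependency-free sketch of the fix described above: keep the suffixing behavior for SIMPLE auth, but return the current user unchanged when Kerberos is enabled, since a suffixed principal such as securedUser/localhost@EXAMPLE.COM.hfs.0 cannot authenticate. The method name and parameters below are illustrative, not HBase's exact signature.

```java
// Illustrative sketch of the proposed getDifferentUser() guard, with the
// HBase/Hadoop types stripped so it stands alone.
public class DifferentUser {
    static String getDifferentUserName(boolean kerberosEnabled,
                                       String currentUser,
                                       String differentiatingSuffix) {
        if (kerberosEnabled) {
            // Suffixing a Kerberos principal produces an invalid user, so
            // fall back to the current (already authenticated) user.
            return currentUser;
        }
        return currentUser + differentiatingSuffix;
    }

    public static void main(String[] args) {
        System.out.println(getDifferentUserName(false, "hbase", ".hfs.0"));
        // hbase.hfs.0
        System.out.println(getDifferentUserName(true,
            "securedUser/localhost@EXAMPLE.COM", ".hfs.0"));
        // securedUser/localhost@EXAMPLE.COM
    }
}
```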




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE

2015-10-08 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14579:

Description: 
That's the HBase version of HADOOP-10683.

We see:
??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful for 
securedUser/localh...@example.com (auth:SIMPLE)??

while we would like to see:
??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful for 
securedUser/localh...@example.com (auth:KERBEROS)??

The fix is simple, but it means we need hadoop 2.5+. 
There are also a lot of cases where HBase calls "createUser" w/o specifying the 
authentication method... I don't have the solution for these ones.


  was:
That's the HBase version of HADOOP-10683.

The fix is simple, but it means we need hadoop 2.5+. 
There are also a lot of cases where HBase calls "createUser" w/o specifying the 
authentication method... I don't have the solution for these ones.



> Users authenticated with KERBEROS are recorded as being authenticated with 
> SIMPLE
> -
>
> Key: HBASE-14579
> URL: https://issues.apache.org/jira/browse/HBASE-14579
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.0.0, 1.2.0, 0.98.15
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: hbase-14579.patch
>
>
> That's the HBase version of HADOOP-10683.
> We see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:SIMPLE)??
> while we would like to see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:KERBEROS)??
> The fix is simple, but it means we need hadoop 2.5+. 
> There are also a lot of cases where HBase calls "createUser" w/o specifying 
> the authentication method... I don't have the solution for these ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE

2015-10-08 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948684#comment-14948684
 ] 

Nicolas Liochon commented on HBASE-14579:
-

> The patch appears to cause mvn compile goal to fail with Hadoop version 2.4.0.
Yes. Is that an issue for the 2.0 branch?

> Users authenticated with KERBEROS are recorded as being authenticated with 
> SIMPLE
> -
>
> Key: HBASE-14579
> URL: https://issues.apache.org/jira/browse/HBASE-14579
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.0.0, 1.2.0, 0.98.15
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: hbase-14579.patch
>
>
> That's the HBase version of HADOOP-10683.
> We see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:SIMPLE)??
> while we would like to see:
> ??hbase.Server (RpcServer.java:saslReadAndProcess(1446)) - Auth successful 
> for securedUser/localh...@example.com (auth:KERBEROS)??
> The fix is simple, but it means we need hadoop 2.5+. 
> There are also a lot of cases where HBase calls "createUser" w/o specifying 
> the authentication method... I don't have the solution for these ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14580) Make the HBaseMiniCluster compliant with Kerberos

2015-10-08 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-14580:

Status: Patch Available  (was: Open)

> Make the HBaseMiniCluster compliant with Kerberos
> -
>
> Key: HBASE-14580
> URL: https://issues.apache.org/jira/browse/HBASE-14580
> Project: HBase
>  Issue Type: Improvement
>  Components: security, test
>Affects Versions: 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 2.0.0
>
> Attachments: patch-14580.v1.patch
>
>
> When using MiniKDC and the minicluster in a unit test, there is a conflict 
> caused by HBaseTestingUtility:
> {code}
>   public static User getDifferentUser(final Configuration c,
> final String differentiatingSuffix)
>   throws IOException {
>// snip
> String username = User.getCurrent().getName() +
>   differentiatingSuffix; // <-- problem here
> User user = User.createUserForTesting(c, username,
> new String[]{"supergroup"});
> return user;
>   }
> {code}
> This creates users like securedUser/localh...@example.com.hfs.0, and this 
> does not work.
> My fix is to return the current user when Kerberos is set. I don't think that 
> there is another option (any other opinion?). However this user is not in a 
> group so we have logs like 'WARN  [IPC Server handler 9 on 61366] 
> security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) 
> - No groups available for user securedUser' I'm not sure of its impact. 
> [~apurtell], what do you think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14579) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE

2015-10-08 Thread Nicolas Liochon (JIRA)
Nicolas Liochon created HBASE-14579:
---

 Summary: Users authenticated with KERBEROS are recorded as being 
authenticated with SIMPLE
 Key: HBASE-14579
 URL: https://issues.apache.org/jira/browse/HBASE-14579
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.98.15, 1.0.0, 1.2.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 2.0.0


That's the HBase version of HADOOP-10683.

The fix is simple, but it means we need hadoop 2.5+. 
There are also a lot of cases where HBase calls "createUser" w/o specifying the 
authentication method... I don't have the solution for these ones.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor

2015-09-17 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804373#comment-14804373
 ] 

Nicolas Liochon commented on HBASE-11590:
-

If we cut down the timeout, it's more or less equivalent to not having a thread 
pool at all. 
One of the things I don't like in many solutions (the TPE I wrote myself 
included) is that we have a race condition: we may create a thread even if it's 
not needed.
I'm off for 3 days, but I will try to find a reasonable solution next week.

> use a specific ThreadPoolExecutor
> -
>
> Key: HBASE-11590
> URL: https://issues.apache.org/jira/browse/HBASE-11590
> Project: HBase
>  Issue Type: Bug
>  Components: Client, Performance
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: tp.patch
>
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create 
> (by default) 256 threads even if we just need a few.
> The attached TPE creates threads only if we have something in the queue.
> On a PE test with replica on, it improved the 99 latency percentile by 5%. 
> Warning: there are likely some race conditions, but I'm posting it here 
> because there may be an implementation available somewhere we can use, or 
> a good reason not to do that. So feedback welcome as usual. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14768844#comment-14768844
 ] 

Nicolas Liochon commented on HBASE-10449:
-

What's happening for the expire is:
- we have a 60s timeout and 256 threads.
- let's imagine we have 1 query per second. We will still have 60 threads, 
because each new request will create a new thread until we reach coreSize. As 
the timeout is 60s, the oldest threads will expire after 60s. 

I haven't double-checked, but I believe that the threads are needed because of 
the old i/o pattern. So we do need a max in the x00 range (it's like this since 
0.90 at least. In theory, it's good for a small cluster (100 nodes), but not as 
good if the cluster is composed of thousands of nodes).

I did actually spend some time on this a year ago, in HBASE-11590. @stack, what 
do you think of the approach? I can finish the work I started there. But I will 
need a review. There are also some ideas/hacks in 
http://stackoverflow.com/questions/19528304/how-to-get-the-threadpoolexecutor-to-increase-threads-to-max-before-queueing/19528305#19528305
 I haven't reviewed them yet.


> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790660#comment-14790660
 ] 

Nicolas Liochon commented on HBASE-10449:
-

> I was thinking that we'd go to core size – say # of cores – and then if one 
> request a second, we'd just stay at core size because there would be a free 
> thread when the request-per-second came in (assuming request took a good deal 
> < a second).

I expect that if we have more than coreSize calls within the timeout window 
(coreSize=256, timeout=60s in our case) then we always have coreSize threads.

> Didn't we have a mock server somewhere such that we could standup a client 
> with no friction and watch it in operation? I thought we'd make such a 
> beast
Yep, you built one; we used it when we looked at the perf issues in the client 
(the protobuf nightmare, if you remember :-)). 


> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791129#comment-14791129
 ] 

Nicolas Liochon commented on HBASE-10449:
-

The algo for the ThreadPoolExecutor is:

onNewTask(){
  if (currentSize < coreSize) createNewThread() else reuseThread()
}

And there is a timeout for each thread.

So if we do a coreSize of 2, a timeout of 20s, and a query every 15s, we have:
0s: query1: create thread1, poolSize=1
15s: query2: create thread2, poolSize=2
20s: close thread1, poolSize=1
30s: query3: create thread3, poolSize=2
35s: close thread2, poolSize=1
45s: query4: create thread4, poolSize=2

And so on. So even if we have one query every 15s, we have 2 threads in the pool 
nearly all the time.
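The churn in the timeline above can be reproduced with a few lines of plain JDK code (a hedged, self-contained demo, not HBase's pool wiring): a ThreadPoolExecutor prefers creating a new thread over reusing an idle one while it is below corePoolSize, and with allowCoreThreadTimeOut(true) even core threads expire after the keep-alive.

```java
// Self-contained demo of TPE thread creation/expiry below corePoolSize.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolChurn {
    // Returns {poolSize while 2 tasks run, poolSize after the keep-alive}.
    static int[] run() throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 2, 200, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // let even core threads expire
        CountDownLatch done = new CountDownLatch(1);
        Runnable blocker = () -> {
            try { done.await(); } catch (InterruptedException ignored) { }
        };
        pool.execute(blocker); // thread1 created
        pool.execute(blocker); // thread2 created: below corePoolSize, the
                               // pool always prefers a new thread to queueing
        int busy = pool.getPoolSize();
        done.countDown();      // let both tasks finish
        Thread.sleep(2000);    // well past the 200 ms keep-alive
        int idle = pool.getPoolSize();
        pool.shutdown();
        return new int[] { busy, idle };
    }

    public static void main(String[] args) throws Exception {
        int[] sizes = run();
        System.out.println(sizes[0]); // 2
        System.out.println(sizes[1]); // 0
    }
}
```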

> Yes. Smile. Need to revive it for here and for doing client timeouts
I found the code in TestClientNoCluster#run , ready to be reused!

I think we need to go for a hack like in Stackoverflow or for a different 
implementation for TPE like HBASE-11590...

> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-16 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791145#comment-14791145
 ] 

Nicolas Liochon commented on HBASE-10449:
-

It's the former: in this case, the queries are queued. A new thread will be 
created only when the queue is full. Then, if we reach maxThreads and the queue 
is full the new tasks are rejected. In our case the queue is nearly unbounded, 
so we stay with corePoolSize.

> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-15 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746919#comment-14746919
 ] 

Nicolas Liochon commented on HBASE-10449:
-

Actually I'm having two doubts:
- the core threads should already have this timeout, no? We should not see 256 
threads, because they should expire already.
- IIRC, this thread pool is used when connecting to the various regionservers, 
and they block until they have an answer. So with 4 core threads (for example), 
it means that if we do a multi we contact 4 servers simultaneously at most. The 
threads are not really using CPUs, they're waiting (old i/o style). But maybe 
it has changed?





> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-15 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746897#comment-14746897
 ] 

Nicolas Liochon commented on HBASE-10449:
-

As I understand the doc, if we do that we create maxThreads and then reject all 
the tasks. Not really useful.
But the patch in HBASE-14433 seems ok:
- we create up to coreSize threads (Runtime.getRuntime().availableProcessors()). 
If we have 10 tasks in parallel we still have 
Runtime.getRuntime().availableProcessors() threads.
- they expire quite quickly (because we do allowCoreThreadTimeOut(true))

Maybe we should set maxThreads to coreThreads as well and increase 
HConstants.DEFAULT_HBASE_CLIENT_MAX_TOTAL_TASKS.

But I'm +1 with HBASE-14433 as it is now.

> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-15 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746863#comment-14746863
 ] 

Nicolas Liochon commented on HBASE-10449:
-

Sorry for the delay, I'm seeing this now only.
Let me have a look.

> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10449) Wrong execution pool configuration in HConnectionManager

2015-09-15 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746875#comment-14746875
 ] 

Nicolas Liochon commented on HBASE-10449:
-

> Where does 'Create a single thread, queue all the tasks for this thread.' 
> come from?
This is what HBASE-9917 actually implemented: with the ThreadPoolExecutor if 
the task queue is unbounded, it does not create new threads:

From: 
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
If fewer than corePoolSize threads are running, the Executor always prefers 
adding a new thread rather than queuing.
If corePoolSize or more threads are running, the Executor always prefers 
queuing a request rather than adding a new thread.
If a request cannot be queued, a new thread is created unless this would exceed 
maximumPoolSize, in which case, the task will be rejected.

But having less than 256 threads is fine. This was just restoring the previous 
value.
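The queueing rule quoted above can be demonstrated with plain JDK code (a hedged, self-contained demo, not HBase's actual pool): with a (nearly) unbounded queue, a request can always be queued once corePoolSize threads exist, so maximumPoolSize is never reached, which is exactly how HBASE-9917 ended up with a single thread.

```java
// Demo: an unbounded queue keeps the pool at corePoolSize forever.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueue {
    // Returns the pool size observed with 10 pending tasks.
    static int run() throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 256, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> {
                try { release.await(); } catch (InterruptedException ignored) { }
            });
        }
        int size = pool.getPoolSize(); // the other 9 tasks just queue up
        release.countDown();
        pool.shutdown();
        return size;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // 1, despite maximumPoolSize = 256
    }
}
```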


> Wrong execution pool configuration in HConnectionManager
> 
>
> Key: HBASE-10449
> URL: https://issues.apache.org/jira/browse/HBASE-10449
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.98.0, 0.99.0, 0.96.1.1
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: HBASE-10449.v1.patch
>
>
> There is a confusion in the configuration of the pool. The attached patch 
> fixes this. This may change the client performances, as we were using a 
> single thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13865) Increase the default value for hbase.hregion.memstore.block.multipler from 2 to 4 (part 2)

2015-08-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-13865:

Release Note: Increase default hbase.hregion.memstore.block.multiplier from 
2 to 4 in the code to match the default value in the config files.  (was: 
Increase hbase.hregion.memstore.block.multiplier from 2 to 4)

 Increase the default value for hbase.hregion.memstore.block.multipler from 2 
 to 4 (part 2)
 --

 Key: HBASE-13865
 URL: https://issues.apache.org/jira/browse/HBASE-13865
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 2.0.0
Reporter: Vladimir Rodionov
Assignee: Gabor Liptak
Priority: Trivial
 Fix For: 2.0.0, 0.98.14, 1.3.0, 1.2.1, 1.0.3, 1.1.3

 Attachments: HBASE-13865.1.patch, HBASE-13865.2.patch, 
 HBASE-13865.2.patch


 Its 4 in the book and 2 in a current master. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13865) Increase the default value for hbase.hregion.memstore.block.multipler from 2 to 4 (part 2)

2015-08-06 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659909#comment-14659909
 ] 

Nicolas Liochon commented on HBASE-13865:
-

Hey Nick :-)

If I'm not mistaken (I'm always confused by the various config files...), the 
patch should not change the behavior for most common deployments, because the 
value is set to 4 in hbase-default.xml (and for the users who set it to 2: the 
xml config is used first, so it won't change for them either).

So:
- The patch is a good cleanup imho
- It's safe as it does not change the behavior.

+1

I updated the release notes.
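As an illustration of the precedence described above (a sketch, not taken from the patch): a deployment that wants to keep the old behavior can pin the value explicitly in hbase-site.xml, which wins over both hbase-default.xml and the compiled-in default.

```xml
<!-- hbase-site.xml: an explicit value overrides hbase-default.xml and the
     hard-coded fallback, so this deployment keeps multiplier = 2. -->
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>2</value>
</property>
```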

 Increase the default value for hbase.hregion.memstore.block.multipler from 2 
 to 4 (part 2)
 --

 Key: HBASE-13865
 URL: https://issues.apache.org/jira/browse/HBASE-13865
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.0.0
Reporter: Vladimir Rodionov
Assignee: Gabor Liptak
Priority: Trivial
 Fix For: 2.0.0, 0.98.14, 1.3.0, 1.2.1, 1.0.3, 1.1.3

 Attachments: HBASE-13865.1.patch, HBASE-13865.2.patch, 
 HBASE-13865.2.patch


 Its 4 in the book and 2 in a current master. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13865) Increase the default value for hbase.hregion.memstore.block.multipler from 2 to 4 (part 2)

2015-08-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-13865:

Component/s: (was: documentation)
 regionserver

 Increase the default value for hbase.hregion.memstore.block.multipler from 2 
 to 4 (part 2)
 --

 Key: HBASE-13865
 URL: https://issues.apache.org/jira/browse/HBASE-13865
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 2.0.0
Reporter: Vladimir Rodionov
Assignee: Gabor Liptak
Priority: Trivial
 Fix For: 2.0.0, 0.98.14, 1.3.0, 1.2.1, 1.0.3, 1.1.3

 Attachments: HBASE-13865.1.patch, HBASE-13865.2.patch, 
 HBASE-13865.2.patch


 Its 4 in the book and 2 in a current master. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-01 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610738#comment-14610738
 ] 

Nicolas Liochon commented on HBASE-13992:
-

+1 as well for me. How does it work for the binaries version, will we have to 
enter into the scala game, i.e. hbase-spark-2_10? What about the spark version? 
The spark-hadoop version?


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in production is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.





[jira] [Commented] (HBASE-13647) Default value for hbase.client.operation.timeout is too high

2015-05-31 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566396#comment-14566396
 ] 

Nicolas Liochon commented on HBASE-13647:
-

I would recommend 20 minutes. The idea is that if a machine fails and the 
recovery needs an hdfs timeout (10:30 mins), we have some extra time. As well, 
iirc, with the default retry count and pause, we are at around 15 minutes today. 
It seems better to default above that. 

I kept the operation timeout in the htable stuff (but it's not me who put it 
there :-) ), but now I wonder whether we should not just remove it from this code 
path: it overlaps with the number of retries, and does it add that much value?

 Default value for hbase.client.operation.timeout is too high
 

 Key: HBASE-13647
 URL: https://issues.apache.org/jira/browse/HBASE-13647
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 1.0.1, 0.98.13, 1.2.0, 1.1.1
Reporter: Andrey Stepachev
Assignee: Andrey Stepachev
Priority: Blocker
 Fix For: 2.0.0, 0.98.13, 1.0.2, 1.2.0, 1.1.1

 Attachments: HBASE-13647.patch, HBASE-13647.v2.patch


 Default value for hbase.client.operation.timeout is too high: it is Long.MAX_VALUE.
 That value will block any service calls to coprocessor endpoints indefinitely.
 Should we introduce better default value for that?





[jira] [Comment Edited] (HBASE-12116) Hot contention spots; writing

2015-04-15 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496031#comment-14496031
 ] 

Nicolas Liochon edited comment on HBASE-12116 at 4/15/15 10:50 AM:
---

I had a look at crc a while ago.
My understanding back then was that there are specific instructions in x86 
processors to calculate crc, unfortunately a little bit different from the 
standard crc32. When I was looking at the Intel/Hadoop roadmap 2 years ago, it 
looked like Intel was planning to make the changes in hadoop to use the hw one.

There is some info here: http://www.strchr.com/crc32_popcnt



was (Author: nkeywal):
I had a look at crc a while ago.
My understanding back then was that there are specific instruction in x86 
processors to calculate crc, unfortunately a little bit different than the 
standard crc32. When I was looking at Intel/Hadoop roadmap 2 years ago, it 
looked like Intel was planning to do the changes in hadoop to use the hw one.

There are some info here: http://www.strchr.com/crc32_popcnt
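The difference between the two polynomials is visible from the JDK itself: since Java 9, java.util.zip ships both CRC32 and CRC32C (the Castagnoli variant, which is what the x86 SSE4.2 crc32 instruction accelerates). A minimal comparison sketch, not HBase code:

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;

public class CrcCompare {
    // "Standard" CRC-32, polynomial 0x04C11DB7 (zlib, gzip, PNG).
    static long crc32(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    // CRC-32C, Castagnoli polynomial 0x1EDC6F41; Java 9+,
    // accelerated by the SSE4.2 crc32 instruction on x86.
    static long crc32c(byte[] data) {
        CRC32C c = new CRC32C();
        c.update(data);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes();
        // Same input, different polynomials, different checksums.
        System.out.printf("CRC32  = %08x%n", crc32(check));   // cbf43926
        System.out.printf("CRC32C = %08x%n", crc32c(check));  // e3069283
    }
}
```

The "123456789" input is the standard check string for both algorithms, so the two values printed above can be compared against any CRC reference table.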


 Hot contention spots; writing
 -

 Key: HBASE-12116
 URL: https://issues.apache.org/jira/browse/HBASE-12116
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Attachments: 12116.checkForReplicas.txt, 
 12116.stringify.and.cache.scanner.maxsize.txt, 12116.txt, Screen Shot 
 2014-09-29 at 5.12.51 PM.png, Screen Shot 2014-09-30 at 10.39.34 PM.png, 
 Screen Shot 2015-04-13 at 2.03.05 PM.png, perf.write3.svg, perf.write4.svg


 Playing with flight recorder, here are some write-time contentious 
 synchronizations/locks (picture coming)





[jira] [Commented] (HBASE-12116) Hot contention spots; writing

2015-04-15 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496031#comment-14496031
 ] 

Nicolas Liochon commented on HBASE-12116:
-

I had a look at crc a while ago.
My understanding back then was that there are specific instructions in x86 
processors to calculate crc, unfortunately a little bit different from the 
standard crc32. When I was looking at the Intel/Hadoop roadmap 2 years ago, it 
looked like Intel was planning to make the changes in hadoop to use the hw one.

There is some info here: http://www.strchr.com/crc32_popcnt


 Hot contention spots; writing
 -

 Key: HBASE-12116
 URL: https://issues.apache.org/jira/browse/HBASE-12116
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Attachments: 12116.checkForReplicas.txt, 
 12116.stringify.and.cache.scanner.maxsize.txt, 12116.txt, Screen Shot 
 2014-09-29 at 5.12.51 PM.png, Screen Shot 2014-09-30 at 10.39.34 PM.png, 
 Screen Shot 2015-04-13 at 2.03.05 PM.png, perf.write3.svg, perf.write4.svg


 Playing with flight recorder, here are some write-time contentious 
 synchronizations/locks (picture coming)





[jira] [Commented] (HBASE-13272) Get.setClosestRowBefore() breaks specific column Get

2015-03-20 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14371234#comment-14371234
 ] 

Nicolas Liochon commented on HBASE-13272:
-

HTable#getRowOrBefore does a get#setClosestRowBefore(true).
Yeah, I should have deprecated both. I think setClosestRowBefore is really old, 
but maybe I'm wrong.



From the code:
- It seems it's not used in HBase now.
- I have not found a test either.
- It seems it does not work if you're hitting a region boundary (i.e. the 
closest_row_before is in another region).
- It's limited to a single family as well (RSRpcServices.java):
"Get ClosestRowBefore supports one and only one family now, not "
  + get.getColumnCount() + " families");

I think this can be replaced by the reverseScanner; hopefully reverseScanner 
covers more usages.

My guess is that it leaked: getRowOrBefore was purely internal and got 
deprecated in 0.92:
   * @deprecated As of version 0.92 this method is deprecated without
   * replacement. Since version 0.96+, you can use reversed scan.
   * getRowOrBefore is used internally to find entries in hbase:meta and makes
   * various assumptions about the table (which are true for hbase:meta but not
   * in general) to be efficient.

My guess is that Get#setClosestRowBefore was there only for the meta table and 
has been forgotten on the deprecation path.

Now I'm not against a fix, we're open source :-), and anyway we can't remove the 
feature in less than two hbase releases.
But from the client code's point of view, using the reverse scanner seems safer. 
Imho setClosestRowBefore should be deprecated as soon as possible: it is very 
ad-hoc, not used in the internal code, not tested, fails on cross-boundary calls, 
fails on multiple families, and has this jira as a bonus. These are good reasons 
imho.
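The "closest row before" contract under discussion is essentially a floor lookup over the sorted row keys, which is also what a reversed scan starting at the row returns first. A toy sketch over a NavigableMap (not the HBase API; class and method names are illustrative), where each map stands in for one region, which also shows why the lookup cannot cross a region boundary:

```java
import java.util.TreeMap;

public class ClosestRowBefore {
    // Returns the requested row if present, otherwise the closest row
    // sorting before it -- the contract getRowOrBefore /
    // Get#setClosestRowBefore provided, and what a reversed scan
    // starting at `row` (inclusive) with limit 1 provides today.
    static String closestRowBefore(TreeMap<String, String> region, String row) {
        return region.floorKey(row); // greatest key <= row, or null
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        region.put("row-010", "a");
        region.put("row-020", "b");
        region.put("row-040", "c");
        System.out.println(closestRowBefore(region, "row-030")); // row-020
        System.out.println(closestRowBefore(region, "row-020")); // row-020
        // null: the answer would live in the previous "region" (map),
        // mirroring the cross-region limitation noted above.
        System.out.println(closestRowBefore(region, "row-001"));
    }
}
```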






 Get.setClosestRowBefore() breaks specific column Get
 

 Key: HBASE-13272
 URL: https://issues.apache.org/jira/browse/HBASE-13272
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Trivial

 Via [~larsgeorge]
 Get.setClosestRowBefore() is breaking a specific Get that specifies a column. 
 If you set the latter to true it will return the _entire_ row!





[jira] [Updated] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds

2015-03-20 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-13286:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

committed to master, thanks for the reviews!

 Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
 -

 Key: HBASE-13286
 URL: https://issues.apache.org/jira/browse/HBASE-13286
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 1.0.0, 0.98.12
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 1.1.0

 Attachments: 13286.patch


 There is a check in the client to be sure that we don't use a timeout of zero 
 (i.e. infinite). This includes setting the minimal time out for a rpc timeout 
 to 2 seconds. However, it makes sense for some calls (typically gets going to 
 the cache) to have much lower timeouts. So it's better to do the check vs. 
 zero but with a minimal timeout of 1. 
 I fixed a typo & a wrong comment in this patch as well. I don't understand 
 this code:
 {code}
   // t could be a RemoteException so go around again.
   translateException(t); // We don't use the result?
 {code}
 but may be it's good.





[jira] [Commented] (HBASE-13272) Get.setClosestRowBefore() breaks specific column Get

2015-03-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368872#comment-14368872
 ] 

Nicolas Liochon commented on HBASE-13272:
-

On the other hand, if it's broken it's not that useful to keep it :-)

 Get.setClosestRowBefore() breaks specific column Get
 

 Key: HBASE-13272
 URL: https://issues.apache.org/jira/browse/HBASE-13272
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Trivial

 Via [~larsgeorge]
 Get.setClosestRowBefore() is breaking a specific Get that specifies a column. 
 If you set the latter to true it will return the _entire_ row!





[jira] [Commented] (HBASE-13272) Get.setClosestRowBefore() breaks specific column Get

2015-03-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368871#comment-14368871
 ] 

Nicolas Liochon commented on HBASE-13272:
-

I'm +1 for the removal (I thought I had already deprecated it; maybe I'm wrong 
or I missed some of the interfaces), but it needs to be done carefully: we need 
to keep it on the server/protobuf side for a while, as we want old clients to be 
able to speak to new servers.

 Get.setClosestRowBefore() breaks specific column Get
 

 Key: HBASE-13272
 URL: https://issues.apache.org/jira/browse/HBASE-13272
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Trivial

 Via [~larsgeorge]
 Get.setClosestRowBefore() is breaking a specific Get that specifies a column. 
 If you set the latter to true it will return the _entire_ row!





[jira] [Commented] (HBASE-13188) java.lang.ArithmeticException issue in BoundedByteBufferPool.putBuffer

2015-03-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369261#comment-14369261
 ] 

Nicolas Liochon commented on HBASE-13188:
-

[~saint@gmail.com] if you're interested, there is a ByteBufferPool in 
HBASE-9535. If I understand correctly, 9535 is now irrelevant (please close it if 
that's the case), but maybe there is some code worth taking from there.

 java.lang.ArithmeticException issue in BoundedByteBufferPool.putBuffer
 --

 Key: HBASE-13188
 URL: https://issues.apache.org/jira/browse/HBASE-13188
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-13188.patch


 Running a range scan with PE tool with 25 threads getting this error
 {code}
 java.lang.ArithmeticException: / by zero
 at 
 org.apache.hadoop.hbase.io.BoundedByteBufferPool.putBuffer(BoundedByteBufferPool.java:104)
 at org.apache.hadoop.hbase.ipc.RpcServer$Call.done(RpcServer.java:325)
 at 
 org.apache.hadoop.hbase.ipc.RpcServer$Responder.processResponse(RpcServer.java:1078)
 at 
 org.apache.hadoop.hbase.ipc.RpcServer$Responder.processAllResponses(RpcServer.java:1103)
 at 
 org.apache.hadoop.hbase.ipc.RpcServer$Responder.doAsyncWrite(RpcServer.java:1036)
 at 
 org.apache.hadoop.hbase.ipc.RpcServer$Responder.doRunLoop(RpcServer.java:956)
 at 
 org.apache.hadoop.hbase.ipc.RpcServer$Responder.run(RpcServer.java:891)
 {code}
 I checked in the trunk code also.  I think the comment in the code suggests 
 that the size will not be exact so there is a chance that it could be even 0.





[jira] [Comment Edited] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds

2015-03-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369735#comment-14369735
 ] 

Nicolas Liochon edited comment on HBASE-13286 at 3/19/15 5:17 PM:
--

bq. Good catch.
Not really: it's me who put it there a while ago (to fix the infinite timeout), 
and I've been questioning myself about it for a while. :-)

bq. Why remove MIN_RPC_TIMEOUT though? Why not just set it to 1 instead of 2000?
I thought it would make the code simpler to read. As you like, I can change it.

For the stack trace during the tests:
org.apache.oozie.test.MiniHCatServer$1.run(MiniHCatServer.java:137)

It's because H10 is configured to run two builds in parallel, and this is 
looking for trouble. We ran alongside an oozie build.

From what I see the findbug does not come from this patch.

I will commit tomorrow my time if there is no objection.



was (Author: nkeywal):
bq. Good catch.
Not really, it's me who put it a while ago (to fix the infinite timeout), and 
I've been questioning myself about this for a while. :-)

bq. Why remove MIN_RPC_TIMEOUT though? Why not just set it to 1 instead of 2000?
I thought it would make the code simpler to read. As you like, I can change it.

For the stack during the tests:
org.apache.oozie.test.MiniHCatServer$1.run(MiniHCatServer.java:137)

It's because H10 is configured to run two builds in parallel, and this is 
looking for trouble. We ran with a oozie build.

From what I see the findbug is already not from me.

I will commit tomorrow my time if there is no objection.


 Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
 -

 Key: HBASE-13286
 URL: https://issues.apache.org/jira/browse/HBASE-13286
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 1.0.0, 0.98.12
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 1.1.0

 Attachments: 13286.patch


 There is a check in the client to be sure that we don't use a timeout of zero 
 (i.e. infinite). This includes setting the minimal time out for a rpc timeout 
 to 2 seconds. However, it makes sense for some calls (typically gets going to 
 the cache) to have much lower timeouts. So it's better to do the check vs. 
 zero but with a minimal timeout of 1. 
 I fixed a typo & a wrong comment in this patch as well. I don't understand 
 this code:
 {code}
   // t could be a RemoteException so go around again.
   translateException(t); // We don't use the result?
 {code}
 but may be it's good.





[jira] [Commented] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds

2015-03-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369735#comment-14369735
 ] 

Nicolas Liochon commented on HBASE-13286:
-

bq. Good catch.
Not really: it's me who put it there a while ago (to fix the infinite timeout), 
and I've been questioning myself about it for a while. :-)

bq. Why remove MIN_RPC_TIMEOUT though? Why not just set it to 1 instead of 2000?
I thought it would make the code simpler to read. As you like, I can change it.

For the stack trace during the tests:
org.apache.oozie.test.MiniHCatServer$1.run(MiniHCatServer.java:137)

It's because H10 is configured to run two builds in parallel, and this is 
looking for trouble. We ran alongside an oozie build.

From what I see the findbugs warning does not come from this patch.

I will commit tomorrow my time if there is no objection.


 Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
 -

 Key: HBASE-13286
 URL: https://issues.apache.org/jira/browse/HBASE-13286
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 1.0.0, 0.98.12
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 1.1.0

 Attachments: 13286.patch


 There is a check in the client to be sure that we don't use a timeout of zero 
 (i.e. infinite). This includes setting the minimal time out for a rpc timeout 
 to 2 seconds. However, it makes sense for some calls (typically gets going to 
 the cache) to have much lower timeouts. So it's better to do the check vs. 
 zero but with a minimal timeout of 1. 
 I fixed a typo & a wrong comment in this patch as well. I don't understand 
 this code:
 {code}
   // t could be a RemoteException so go around again.
   translateException(t); // We don't use the result?
 {code}
 but may be it's good.





[jira] [Updated] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds

2015-03-19 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-13286:

Attachment: 13286.patch

 Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
 -

 Key: HBASE-13286
 URL: https://issues.apache.org/jira/browse/HBASE-13286
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 1.0.0, 0.98.12
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 1.1.0

 Attachments: 13286.patch


 There is a check in the client to be sure that we don't use a timeout of zero 
 (i.e. infinite). This includes setting the minimal time out for a rpc timeout 
 to 2 seconds. However, it makes sense for some calls (typically gets going to 
 the cache) to have much lower timeouts. So it's better to do the check vs. 
 zero but with a minimal timeout of 0. 
 I fixed a typo & a wrong comment in this patch as well. I don't understand 
 this code:
 {code}
   // t could be a RemoteException so go around again.
   translateException(t); // We don't use the result?
 {code}
 but may be it's good.





[jira] [Updated] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds

2015-03-19 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-13286:

Description: 
There is a check in the client to be sure that we don't use a timeout of zero 
(i.e. infinite). This includes setting the minimal time out for a rpc timeout 
to 2 seconds. However, it makes sense for some calls (typically gets going to 
the cache) to have much lower timeouts. So it's better to do the check vs. zero 
but with a minimal timeout of 1. 

I fixed a typo & a wrong comment in this patch as well. I don't understand this 
code:
{code}
  // t could be a RemoteException so go around again.
  translateException(t); // We don't use the result?
{code}

but may be it's good.

  was:
There is a check in the client to be sure that we don't use a timeout of zero 
(i.e. infinite). This includes setting the minimal time out for a rpc timeout 
to 2 seconds. However, it makes sense for some calls (typically gets going to 
the cache) to have much lower timeouts. So it's better to do the check vs. zero 
but with a minimal timeout of 0. 

I fixed a typo & a wrong comment in this patch as well. I don't understand this 
code:
{code}
  // t could be a RemoteException so go around again.
  translateException(t); // We don't use the result?
{code}

but may be it's good.


 Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
 -

 Key: HBASE-13286
 URL: https://issues.apache.org/jira/browse/HBASE-13286
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 1.0.0, 0.98.12
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 1.1.0

 Attachments: 13286.patch


 There is a check in the client to be sure that we don't use a timeout of zero 
 (i.e. infinite). This includes setting the minimal time out for a rpc timeout 
 to 2 seconds. However, it makes sense for some calls (typically gets going to 
 the cache) to have much lower timeouts. So it's better to do the check vs. 
 zero but with a minimal timeout of 1. 
 I fixed a typo & a wrong comment in this patch as well. I don't understand 
 this code:
 {code}
   // t could be a RemoteException so go around again.
   translateException(t); // We don't use the result?
 {code}
 but may be it's good.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds

2015-03-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369451#comment-14369451
 ] 

Nicolas Liochon commented on HBASE-13286:
-

Thanks Ted, let's see what hadoop-qa says. I hope I won't discover an ocean of 
race conditions here :-).

 Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
 -

 Key: HBASE-13286
 URL: https://issues.apache.org/jira/browse/HBASE-13286
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 1.0.0, 0.98.12
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 1.1.0

 Attachments: 13286.patch


 There is a check in the client to be sure that we don't use a timeout of zero 
 (i.e. infinite). This includes setting the minimal time out for a rpc timeout 
 to 2 seconds. However, it makes sense for some calls (typically gets going to 
 the cache) to have much lower timeouts. So it's better to do the check vs. 
 zero but with a minimal timeout of 1. 
 I fixed a typo & a wrong comment in this patch as well. I don't understand 
 this code:
 {code}
   // t could be a RemoteException so go around again.
   translateException(t); // We don't use the result?
 {code}
 but may be it's good.





[jira] [Created] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds

2015-03-19 Thread Nicolas Liochon (JIRA)
Nicolas Liochon created HBASE-13286:
---

 Summary: Minimum timeout for a rpc call could be 1 ms instead of 2 
seconds
 Key: HBASE-13286
 URL: https://issues.apache.org/jira/browse/HBASE-13286
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 0.98.12, 1.0.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 1.1.0


There is a check in the client to be sure that we don't use a timeout of zero 
(i.e. infinite). This includes setting the minimal time out for a rpc timeout 
to 2 seconds. However, it makes sense for some calls (typically gets going to 
the cache) to have much lower timeouts. So it's better to do the check vs. zero 
but with a minimal timeout of 0. 

I fixed a typo & a wrong comment in this patch as well. I don't understand this 
code:
{code}
  // t could be a RemoteException so go around again.
  translateException(t); // We don't use the result?
{code}

but may be it's good.
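A minimal sketch of the guard described above (names like checkTimeout and MIN_RPC_TIMEOUT_MS are illustrative, not the actual HBase code): reject the zero "infinite" timeout, but clamp to a 1 ms floor instead of the old 2 second one:

```java
public class RpcTimeouts {
    // Was effectively 2000 ms before; 1 ms still rules out the
    // "zero means wait forever" case while allowing fast cache gets.
    static final int MIN_RPC_TIMEOUT_MS = 1;

    // Clamp a configured timeout: zero or negative would mean an
    // infinite wait, so raise it to the minimum instead.
    static int checkTimeout(int timeoutMs) {
        return Math.max(timeoutMs, MIN_RPC_TIMEOUT_MS);
    }

    public static void main(String[] args) {
        System.out.println(checkTimeout(0));    // 1
        System.out.println(checkTimeout(500));  // 500
    }
}
```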





[jira] [Updated] (HBASE-13286) Minimum timeout for a rpc call could be 1 ms instead of 2 seconds

2015-03-19 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-13286:

Status: Patch Available  (was: Open)

 Minimum timeout for a rpc call could be 1 ms instead of 2 seconds
 -

 Key: HBASE-13286
 URL: https://issues.apache.org/jira/browse/HBASE-13286
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 0.98.12, 1.0.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Minor
 Fix For: 1.1.0

 Attachments: 13286.patch


 There is a check in the client to be sure that we don't use a timeout of zero 
 (i.e. infinite). This includes setting the minimal time out for a rpc timeout 
 to 2 seconds. However, it makes sense for some calls (typically gets going to 
 the cache) to have much lower timeouts. So it's better to do the check vs. 
 zero but with a minimal timeout of 0. 
 I fixed a typo & a wrong comment in this patch as well. I don't understand 
 this code:
 {code}
   // t could be a RemoteException so go around again.
   translateException(t); // We don't use the result?
 {code}
 but may be it's good.





[jira] [Commented] (HBASE-13271) Table#puts(List<Put>) operation is indeterminate; remove!

2015-03-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368052#comment-14368052
 ] 

Nicolas Liochon commented on HBASE-13271:
-

Oh ok. Thanks for the explanation. Then the call to batch seems to be the 
perfect solution.

 Table#puts(List<Put>) operation is indeterminate; remove!
 -

 Key: HBASE-13271
 URL: https://issues.apache.org/jira/browse/HBASE-13271
 Project: HBase
  Issue Type: Improvement
  Components: API
Affects Versions: 1.0.0
Reporter: stack

 Another API issue found by [~larsgeorge]:
 Table.put(List<Put>) is questionable after the API change.
 {code}
 [Mar-17 9:21 AM] Lars George: Table.put(List<Put>) is weird since you cannot 
 flush partial lists
 [Mar-17 9:21 AM] Lars George: Say out of 5 the third is broken, then the 
 put() call returns with a local exception (say empty Put) and then you have 2 
 that are in the buffer
 [Mar-17 9:21 AM] Lars George: but how to you force commit them?
 [Mar-17 9:22 AM] Lars George: In the past you would call flushCache(), but 
 that is gone now
 [Mar-17 9:22 AM] Lars George: and flush() is not available on a Table
 [Mar-17 9:22 AM] Lars George: And you cannot access the underlying 
 BufferedMutation neither
 [Mar-17 9:23 AM] Lars George: You can *only* add more Puts if you can, or 
 call close()
 [Mar-17 9:23 AM] Lars George: that is just weird to explain
 {code}
 So, Table needs to get flush back or we deprecate this method or it flushes 
 immediately and does not return until complete in the implementation.





[jira] [Commented] (HBASE-13272) Get.setClosestRowBefore() breaks specific column Get

2015-03-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367886#comment-14367886
 ] 

Nicolas Liochon commented on HBASE-13272:
-

setClosestRowBefore is superseded by reverse scan, imho. IIRC we don't use it 
internally anymore (the region locator uses the reverse scan).


 Get.setClosestRowBefore() breaks specific column Get
 

 Key: HBASE-13272
 URL: https://issues.apache.org/jira/browse/HBASE-13272
 Project: HBase
  Issue Type: Bug
Reporter: stack

 Via [~larsgeorge]
 Get.setClosestRowBefore() is breaking a specific Get that specifies a column. 
 If you set the latter to true it will return the _entire_ row!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13271) Table#puts(List<Put>) operation is indeterminate; remove!

2015-03-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367934#comment-14367934
 ] 

Nicolas Liochon commented on HBASE-13271:
-

[Mar-17 9:21 AM] Lars George: Say out of 5 the third is broken, then the put() 
call returns with a local exception (say empty Put) and then you have 2 that 
are in the buffer
[Mar-17 9:21 AM] Lars George: but how to you force commit them?
[Mar-17 9:22 AM] Lars George: In the past you would call flushCache(), but that 
is gone now

If they failed the first time, why would they succeed the second time?
Why are they still in the buffer if it failed?
Why is flushCache not available? Flushing the buffer should be available to the 
end user, no?

 Table#puts(List<Put>) operation is indeterminate; remove!
 -

 Key: HBASE-13271
 URL: https://issues.apache.org/jira/browse/HBASE-13271
 Project: HBase
  Issue Type: Improvement
  Components: API
Affects Versions: 1.0.0
Reporter: stack

 Another API issue found by [~larsgeorge]:
 Table.put(List<Put>) is questionable after the API change.
 {code}
 [Mar-17 9:21 AM] Lars George: Table.put(List<Put>) is weird since you cannot 
 flush partial lists
 [Mar-17 9:21 AM] Lars George: Say out of 5 the third is broken, then the 
 put() call returns with a local exception (say empty Put) and then you have 2 
 that are in the buffer
 [Mar-17 9:21 AM] Lars George: but how to you force commit them?
 [Mar-17 9:22 AM] Lars George: In the past you would call flushCache(), but 
 that is gone now
 [Mar-17 9:22 AM] Lars George: and flush() is not available on a Table
 [Mar-17 9:22 AM] Lars George: And you cannot access the underlying 
 BufferedMutation neither
 [Mar-17 9:23 AM] Lars George: You can *only* add more Puts if you can, or 
 call close()
 [Mar-17 9:23 AM] Lars George: that is just weird to explain
 {code}
 So, Table needs to get flush back or we deprecate this method or it flushes 
 immediately and does not return until complete in the implementation.





[jira] [Commented] (HBASE-13271) Table#puts(List<Put>) operation is indeterminate; remove!

2015-03-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367989#comment-14367989
 ] 

Nicolas Liochon commented on HBASE-13271:
-

bq. There shouldn't be a need to flush any buffer in Table. Autoflush should 
always be true. The idea is that users who want autoflush=false should use the 
new BufferedMutator interface rather than a table. 
Ok (if we can flush the BufferedMutator on demand it's fine).

bq. I personally think that the put(List<Put>) method is useful
I agree. And I think a lot of people depend on it: they use it to get better 
performance than calling put(Put) multiple times.

bq. Maybe HTable.put(List<Put>) should use HTable.batch() rather than 
BufferedMutator.mutate for the autoflush=true case
From what I know of the code I like this idea. But it seems that Lars's issue is 
with autoflush=false?

Thanks Solomon.
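A toy model of the buffering behaviour under discussion (this is not the real HBase BufferedMutator API; the class and method names are illustrative): mutations sit in a client-side buffer until an explicit flush() or until the buffer fills, which is why an on-demand flush matters to the end user:

```java
import java.util.ArrayList;
import java.util.List;

public class ToyMutator {
    private final List<String> buffer = new ArrayList<>();    // not yet sent
    private final List<String> committed = new ArrayList<>(); // "server" side
    private final int capacity;

    ToyMutator(int capacity) { this.capacity = capacity; }

    // autoflush=false behaviour: buffer locally, flush only when full.
    void mutate(String put) {
        buffer.add(put);
        if (buffer.size() >= capacity) {
            flush(); // implicit flush once the write buffer is full
        }
    }

    // The on-demand flush Lars is asking for: without it, partial
    // batches would be stuck in the buffer until close() or capacity.
    void flush() {
        committed.addAll(buffer);
        buffer.clear();
    }

    int pending()        { return buffer.size(); }
    int committedCount() { return committed.size(); }

    public static void main(String[] args) {
        ToyMutator m = new ToyMutator(3);
        m.mutate("p1");
        m.mutate("p2");
        System.out.println(m.pending());        // 2 buffered, nothing sent
        m.flush();
        System.out.println(m.committedCount()); // 2, flushed on demand
    }
}
```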


 Table#puts(ListPut) operation is indeterminate; remove!
 -

 Key: HBASE-13271
 URL: https://issues.apache.org/jira/browse/HBASE-13271
 Project: HBase
  Issue Type: Improvement
  Components: API
Affects Versions: 1.0.0
Reporter: stack

 Another API issue found by [~larsgeorge]:
 Table.put(ListPut) is questionable after the API change.
 {code}
 [Mar-17 9:21 AM] Lars George: Table.put(ListPut) is weird since you cannot 
 flush partial lists
 [Mar-17 9:21 AM] Lars George: Say out of 5 the third is broken, then the 
 put() call returns with a local exception (say empty Put) and then you have 2 
 that are in the buffer
 [Mar-17 9:21 AM] Lars George: but how to you force commit them?
 [Mar-17 9:22 AM] Lars George: In the past you would call flushCache(), but 
 that is gone now
 [Mar-17 9:22 AM] Lars George: and flush() is not available on a Table
 [Mar-17 9:22 AM] Lars George: And you cannot access the underlying 
 BufferedMutation neither
 [Mar-17 9:23 AM] Lars George: You can *only* add more Puts if you can, or 
 call close()
 [Mar-17 9:23 AM] Lars George: that is just weird to explain
 {code}
 So, Table needs to get flush back or we deprecate this method or it flushes 
 immediately and does not return until complete in the implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13219) Issues with PE tool in trunk

2015-03-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360723#comment-14360723
 ] 

Nicolas Liochon commented on HBASE-13219:
-

bq. What was the behavior before HBASE-11390? Multiple connections?

Yeah, exactly. I kept it to make comparison between multiple versions possible.
It can help to find some bottlenecks (multiple connections means multiple tcp 
connections, multiple pools and so on).

But simplicity is good as well, so both options are ok to me.



 Issues with PE tool in trunk
 

 Key: HBASE-13219
 URL: https://issues.apache.org/jira/browse/HBASE-13219
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: t1


 - PE tool tries to create the TestTable and waits for it to be enabled and 
 just hangs there 
 Previously this was not happening and the PE tool used to run fine after the 
 table creation.
 - When we try to scan with 25 threads the PE tool fails after some time 
 saying "Unable to create native threads".
 I lost the Stack trace now. But I could get it easily.  It happens here 
 {code}
   public void submit(RetryingCallable<V> task, int callTimeout, int id) {
     QueueingFuture<V> newFuture = new QueueingFuture<V>(task, callTimeout);
     executor.execute(Trace.wrap(newFuture));
     tasks[id] = newFuture;
   }
 {code}
 in ResultBoundedCompletionService. This is also new.  Previously it used to 
 work with 25 threads without any issues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13099) Scans as in DynamoDB

2015-02-26 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338467#comment-14338467
 ] 

Nicolas Liochon commented on HBASE-13099:
-

The 1mb could be changed / made configurable.

The scan could finish if we are at the end of a row and one of these conditions 
is met:
 - we already have more than XX Mb
 - the scan has been running for more than YY seconds
 - the scan reached the end of a region

This could simplify some code, and make the server less sensitive to client 
issues.

This would also allow removing the small scan code in the client (and, for 
all the clients that are doing small scans w/o setting the small flag, it 
would be faster).





 Scans as in DynamoDB
 

 Key: HBASE-13099
 URL: https://issues.apache.org/jira/browse/HBASE-13099
 Project: HBase
  Issue Type: Brainstorming
  Components: Client, regionserver
Reporter: Nicolas Liochon

 cc: [~saint@gmail.com] - as discussed offline.
 DynamoDB has a very simple way to manage scans server side:
 ??citation??
 The data returned from a Query or Scan operation is limited to 1 MB; this 
 means that if you scan a table that has more than 1 MB of data, you'll need 
 to perform another Scan operation to continue to the next 1 MB of data in the 
 table.
 If you query or scan for specific attributes that match values that amount to 
 more than 1 MB of data, you'll need to perform another Query or Scan request 
 for the next 1 MB of data. To do this, take the LastEvaluatedKey value from 
 the previous request, and use that value as the ExclusiveStartKey in the next 
 request. This will let you progressively query or scan for new data in 1 MB 
 increments.
 When the entire result set from a Query or Scan has been processed, the 
 LastEvaluatedKey is null. This indicates that the result set is complete 
 (i.e. the operation processed the “last page” of data).
 ??citation??
 This means that there is no state server side: the work is done client side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-13099) Scans as in DynamoDB

2015-02-25 Thread Nicolas Liochon (JIRA)
Nicolas Liochon created HBASE-13099:
---

 Summary: Scans as in DynamoDB
 Key: HBASE-13099
 URL: https://issues.apache.org/jira/browse/HBASE-13099
 Project: HBase
  Issue Type: Brainstorming
  Components: Client, regionserver
Reporter: Nicolas Liochon


cc: [~saint@gmail.com] - as discussed offline.

DynamoDB has a very simple way to manage scans server side:
??citation??
The data returned from a Query or Scan operation is limited to 1 MB; this means 
that if you scan a table that has more than 1 MB of data, you'll need to 
perform another Scan operation to continue to the next 1 MB of data in the 
table.

If you query or scan for specific attributes that match values that amount to 
more than 1 MB of data, you'll need to perform another Query or Scan request 
for the next 1 MB of data. To do this, take the LastEvaluatedKey value from the 
previous request, and use that value as the ExclusiveStartKey in the next 
request. This will let you progressively query or scan for new data in 1 MB 
increments.

When the entire result set from a Query or Scan has been processed, the 
LastEvaluatedKey is null. This indicates that the result set is complete (i.e. 
the operation processed the “last page” of data).
??citation??

This means that there is no state server side: the work is done client side.
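The client-driven pagination described above can be sketched in a few lines. This is a toy Python simulation under the assumption of a small sorted table; `scan_page` and `LIMIT` are illustrative stand-ins for the real API and the 1 MB cap:

```python
# Sketch of the stateless scan protocol: the server returns at most a page of
# data plus the last key it evaluated; the client resumes from that key with
# the next request. A None last-evaluated key means the result set is complete.
LIMIT = 3                                     # stand-in for the 1 MB cap
TABLE = {k: "v%d" % k for k in range(1, 8)}   # toy sorted table, keys 1..7

def scan_page(exclusive_start_key=None):
    keys = sorted(k for k in TABLE
                  if exclusive_start_key is None or k > exclusive_start_key)
    page = keys[:LIMIT]
    # None plays the role of a null LastEvaluatedKey: result set complete
    last_evaluated_key = page[-1] if len(keys) > LIMIT else None
    return [(k, TABLE[k]) for k in page], last_evaluated_key

results, start_key = [], None
while True:
    page, start_key = scan_page(start_key)    # resume from LastEvaluatedKey
    results.extend(page)
    if start_key is None:                     # the "last page" was processed
        break
```

The server keeps no cursor between calls; all the resume state travels in the request, which is what makes the design attractive for the regionserver.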



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12995) Document that HConnection#getTable methods do not check table existence since 0.98.1

2015-02-09 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312892#comment-14312892
 ] 

Nicolas Liochon commented on HBASE-12995:
-

Yep, I confirm it's from HBASE-10080. +1 for the javadoc change, I should 
have done it in the original jira. 

 Document that HConnection#getTable methods do not check table existence since 
 0.98.1
 

 Key: HBASE-12995
 URL: https://issues.apache.org/jira/browse/HBASE-12995
 Project: HBase
  Issue Type: Task
Affects Versions: 0.98.1
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11


 [~jamestaylor] mentioned that recently Phoenix discovered at some point the 
 {{HConnection#getTable}} lightweight table reference methods stopped 
 throwing TableNotFoundExceptions. It used to be (in 0.94 and 0.96) that all 
 APIs that construct HTables would check if the table is locatable and throw 
 exceptions if not. Now, if using the {{HConnection#getTable}} APIs, such 
 exceptions will only be thrown at the time of the first operation submitted 
 using the table reference, should a problem be detected then. We did a bisect 
 and it seems this was changed in the 0.98.1 release by HBASE-10080. Since the 
 change has now shipped in 10 0.98 releases in total, we should just document 
 the change in the javadoc of the HConnection class (Connection in branch-1+). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12974) Opaque AsyncProcess failure: RetriesExhaustedWithDetailsException but no detail

2015-02-05 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307249#comment-14307249
 ] 

Nicolas Liochon commented on HBASE-12974:
-

bq. 1 time only
We don't keep the history of the exceptions; the time refers only to the last 
exception. So if you have 1 action that failed you will have '1 time'. If 10 
actions fail for the same reason you will have '10 times'. Yes, it's kind of 
useless. We used to start logging after 10 retries or so, so the log should 
contain more information (at the info level iirc).


 Opaque AsyncProcess failure: RetriesExhaustedWithDetailsException but no 
 detail
 ---

 Key: HBASE-12974
 URL: https://issues.apache.org/jira/browse/HBASE-12974
 Project: HBase
  Issue Type: Bug
  Components: integration tests
Affects Versions: 1.0.0
Reporter: stack
Assignee: stack

 I'm trying to do longer running tests but when I up the numbers for a task I 
 run into this:
 {code}
 2015-02-04 15:35:10,267 FATAL [IPC Server handler 17 on 43975] 
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
 attempt_1419986015214_0204_m_02_3 - exited : 
 org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 
 action: IOException: 1 time,
 at 
 org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227)
 at 
 org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207)
 at 
 org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1658)
 at 
 org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:208)
 at 
 org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:141)
 at 
 org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98)
 at 
 org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:449)
 at 
 org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:407)
 at 
 org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:355)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 {code}
 Its telling me an action failed but 1 time only with an empty IOE?
 I'm kinda stumped.
 Starting up this issue to see if I can get to the bottom of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12964) Add the ability for hbase-daemon.sh to start in the foreground

2015-02-03 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304093#comment-14304093
 ] 

Nicolas Liochon commented on HBASE-12964:
-

I read the patch, w/o actually testing it. It seems ok to me. +1 if it works 
for you, Elliott.

 Add the ability for hbase-daemon.sh to start in the foreground
 --

 Key: HBASE-12964
 URL: https://issues.apache.org/jira/browse/HBASE-12964
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 2.0.0, 0.98.10
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 2.0.0, 1.1.0, 0.98.11

 Attachments: HBASE-12964-v1.patch, HBASE-12964-v2.patch, 
 HBASE-12964.patch


 The znode cleaner is awesome and gives great benefits.
 As more and more deployments start using containers some of them will want to 
 run things in the foreground. hbase-daemon.sh should allow that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-10942) support parallel request cancellation for multi-get

2015-02-02 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302028#comment-14302028
 ] 

Nicolas Liochon commented on HBASE-10942:
-

Time goes by ;-)
LGTM, +1

 support parallel request cancellation for multi-get
 ---

 Key: HBASE-10942
 URL: https://issues.apache.org/jira/browse/HBASE-10942
 Project: HBase
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Nicolas Liochon
 Fix For: hbase-10070

 Attachments: 10942-1.1.txt, 10942-for-98.zip, 10942.patch, 
 HBASE-10942.01.patch, HBASE-10942.02.patch, HBASE-10942.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12684) Add new AsyncRpcClient

2015-01-26 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291974#comment-14291974
 ] 

Nicolas Liochon commented on HBASE-12684:
-

Sorry, I'm seeing this only now (I missed the message on the 15th), but yep, I 
like this. And I like the configurable RPC implementation, great as well.

 Add new AsyncRpcClient
 --

 Key: HBASE-12684
 URL: https://issues.apache.org/jira/browse/HBASE-12684
 Project: HBase
  Issue Type: Improvement
  Components: Client
Reporter: Jurriaan Mous
Assignee: Jurriaan Mous
 Fix For: 2.0.0, 1.1.0

 Attachments: HBASE-12684-DEBUG2.patch, HBASE-12684-DEBUG3.patch, 
 HBASE-12684-v1.patch, HBASE-12684-v10.patch, HBASE-12684-v11.patch, 
 HBASE-12684-v12.patch, HBASE-12684-v13.patch, HBASE-12684-v14.patch, 
 HBASE-12684-v15.patch, HBASE-12684-v16.patch, HBASE-12684-v17.patch, 
 HBASE-12684-v17.patch, HBASE-12684-v18.patch, HBASE-12684-v19.1.patch, 
 HBASE-12684-v19.patch, HBASE-12684-v19.patch, HBASE-12684-v2.patch, 
 HBASE-12684-v20-heapBuffer.patch, HBASE-12684-v20.patch, 
 HBASE-12684-v21-heapBuffer.1.patch, HBASE-12684-v21-heapBuffer.patch, 
 HBASE-12684-v21.patch, HBASE-12684-v22.patch, HBASE-12684-v23-epoll.patch, 
 HBASE-12684-v24.patch, HBASE-12684-v24.patch, HBASE-12684-v24.patch, 
 HBASE-12684-v24.patch, HBASE-12684-v24.patch, HBASE-12684-v25.patch, 
 HBASE-12684-v26.patch, HBASE-12684-v27.patch, HBASE-12684-v27.patch, 
 HBASE-12684-v28.patch, HBASE-12684-v29.patch, HBASE-12684-v3.patch, 
 HBASE-12684-v30.patch, HBASE-12684-v30.patch, HBASE-12684-v30.patch, 
 HBASE-12684-v31.patch, HBASE-12684-v31.patch, HBASE-12684-v31.patch, 
 HBASE-12684-v4.patch, HBASE-12684-v5.patch, HBASE-12684-v6.patch, 
 HBASE-12684-v7.patch, HBASE-12684-v8.patch, HBASE-12684-v9.patch, 
 HBASE-12684.patch, Screen Shot 2015-01-11 at 11.55.32 PM.png, 
 myrecording.jfr, q.png, requests.png


 With the changes in HBASE-12597 it is possible to add new RpcClients. This 
 issue is about adding a new Async RpcClient which would enable HBase to do 
 non-blocking protobuf service communication.
 Besides delivering a new AsyncRpcClient, I would also like to ask what it 
 would take to replace the current RpcClient. That would enable simplifying 
 async code in some next issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12611) Create autoCommit() method and remove clearBufferOnFail

2014-12-04 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234055#comment-14234055
 ] 

Nicolas Liochon commented on HBASE-12611:
-

bq.  stack and Nick Dimiduk came to the conclusion that the flush method should 
be called autoCommit() similar to the SQL APIs.
Sorry for being late to the game. The meaning in SQL is slightly different. In 
jdbc, whatever the value of autoCommit, the query will be sent to the server 
and executed. autoCommit is set to false if the client application wants to 
send multiple queries within a single transaction (and then it will do a 
begin/commit explicitly). I haven't double checked whether it's the standard or 
an implementation detail (the docs are not very clear), but it's unlikely to 
change anyway: there is another set of methods for batches in jdbc.
Our old autoFlush is different as it impacts the client behavior. I think we're 
creating confusion here. Moreover, if we add transactions between rows in the 
future, then maybe we will want to use autoCommit for what it really is. As 
I'm very late here I leave the decision to you, but we should at least be clear 
in the javadoc imho.

bq. Do we also want to change the default with this patch?
I like the fact that HBase is secure by default, and changing it would be very 
confusing for the users as well imho.

{code}
-  public boolean isAutoFlush() {
-return autoFlush;
+  public boolean getAutoCommit() {
+return autoCommit;
{code}
If I'm not wrong we use 'is' for getters on boolean?



 Create autoCommit() method and remove clearBufferOnFail
 ---

 Key: HBASE-12611
 URL: https://issues.apache.org/jira/browse/HBASE-12611
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.99.2
Reporter: Solomon Duskis
Assignee: Solomon Duskis
 Fix For: 1.0.0

 Attachments: HBASE-12611.patch


 There was quite a bit of good discussion on HBASE-12490 about this topic.  
 [~stack] and [~ndimiduk] came to the conclusion that the flush method should 
 be called autoCommit() similar to the SQL APIs.   [~ndimiduk]  also suggested 
 that clearBufferOnFail should be removed from HTable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12557) Introduce timeout mechanism for IP to rack resolution

2014-11-27 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227721#comment-14227721
 ] 

Nicolas Liochon commented on HBASE-12557:
-

bq.  Still looking for a way to make lengthy DNS related call.
Suspending the dns process (kill -STOP) should do it?

 Introduce timeout mechanism for IP to rack resolution
 -

 Key: HBASE-12557
 URL: https://issues.apache.org/jira/browse/HBASE-12557
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 12557-v1.txt


 Config parameter, hbase.util.ip.to.rack.determiner, determines the class 
 which does IP to rack resolution.
 The actual resolution may be lengthy.
 This JIRA is continuation of HBASE-12554 where a mock DNSToSwitchMapping is 
 used for rack resolution.
 A timeout parameter, hbase.ip.to.rack.determiner.timeout, is proposed whose 
 value governs the duration which RackManager waits before rack resolution is 
 stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12557) Introduce timeout mechanism for IP to rack resolution

2014-11-27 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227785#comment-14227785
 ] 

Nicolas Liochon commented on HBASE-12557:
-

Agreed (or you can add a hook to ease tests; this saves you from using mockito). 
If you want to test that we don't leak resources (i.e. that the dns client 
implementation correctly supports an interruption), then you can't do that this 
way; it will have to be an integration test.
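For illustration, bounding a possibly-hung resolution call with a timeout might be sketched as below. This is a Python toy under stated assumptions (function names, the default rack, and the sleep-based "hung lookup" are all made up), not the proposed patch:

```python
# Minimal sketch in the spirit of hbase.ip.to.rack.determiner.timeout: run the
# resolver in a worker thread and fall back to a default rack if it exceeds
# the budget. Note the caveat from the comment above: giving up on the future
# does not stop a lookup that is already running, which is why resource-leak
# behavior can only really be checked by an integration test.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

DEFAULT_RACK = "/default-rack"

def resolve_with_timeout(resolver, host, timeout_s):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, host)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        return DEFAULT_RACK          # give up; the lookup may still be running
    finally:
        pool.shutdown(wait=False)    # don't block on a hung lookup

def slow_resolver(host):             # stands in for a DNS lookup that hangs
    time.sleep(0.5)
    return "/rack1"

rack = resolve_with_timeout(slow_resolver, "10.0.0.1", timeout_s=0.05)
fast = resolve_with_timeout(lambda h: "/rack1", "10.0.0.2", timeout_s=1.0)
```

A unit test can only assert the fallback path; whether the abandoned lookup thread leaks is exactly the part that needs the integration test mentioned above.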

 Introduce timeout mechanism for IP to rack resolution
 -

 Key: HBASE-12557
 URL: https://issues.apache.org/jira/browse/HBASE-12557
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 12557-v1.txt


 Config parameter, hbase.util.ip.to.rack.determiner, determines the class 
 which does IP to rack resolution.
 The actual resolution may be lengthy.
 This JIRA is continuation of HBASE-12554 where a mock DNSToSwitchMapping is 
 used for rack resolution.
 A timeout parameter, hbase.ip.to.rack.determiner.timeout, is proposed whose 
 value governs the duration which RackManager waits before rack resolution is 
 stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12490) Replace uses of setAutoFlush(boolean, boolean)

2014-11-27 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227815#comment-14227815
 ] 

Nicolas Liochon commented on HBASE-12490:
-

bq.  It seems reasonable to me to remove it, but that's a decision beyond my 
paygrade
Well if you do that patch you get some decision power :-)
[~ndimiduk], any opinion?

 Replace uses of setAutoFlush(boolean, boolean)
 --

 Key: HBASE-12490
 URL: https://issues.apache.org/jira/browse/HBASE-12490
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.99.2
Reporter: Solomon Duskis
Assignee: Solomon Duskis
 Attachments: HBASE-12490.patch, HBASE-12490B.patch, 
 HBASE-12490B.patch, HBASE-12490B.patch, HBASE-12490C.patch


 The various uses of setAutoFlush() seem to need some tlc.  There's a note in 
 HTableInterface: @deprecated in 0.99 since setting clearBufferOnFail is 
 deprecated. Use setAutoFlushTo(boolean) instead.  It would be ideal to 
 change all internal uses of setAutoFlush(boolean, boolean) to use 
 setAutoFlushTo, if possible.
 HTable.setAutoFlush(boolean, boolean) is used in a handful of places.  
 setAutoFlush(false, false) has the same results as 
 HTable.setAutoFlush(false).  Calling HTable.setAutoFlush(false, true) has the 
 same effect as Table.setAutoFlushTo(false), assuming 
 HTable.setAutoFlush(false) was not called previously (by default, the second 
 parameter, clearBufferOnFail, is true and should remain true according to the 
 comments). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12490) Replace uses of setAutoFlush(boolean, boolean)

2014-11-25 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224866#comment-14224866
 ] 

Nicolas Liochon commented on HBASE-12490:
-

For stuff like:
{code}
-ht.setAutoFlush(false, false);
+ht.setAutoFlush(false);
{code}

It's not a big deal, but I don't really like the 'setAutoFlush(boolean)', 
because it looks like a setter while actually it's not. I do prefer 
'setAutoFlush(boolean, boolean)' because there is no confusion with a setter, 
so it's easier for the reader. The implicit setting of the clearBufferOnFail on 
something named like a setter is really confusing imho.  I'm not -1, but I'm 
-0, if I'm the only one confused here... :-)

 Replace uses of setAutoFlush(boolean, boolean)
 --

 Key: HBASE-12490
 URL: https://issues.apache.org/jira/browse/HBASE-12490
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.99.2
Reporter: Solomon Duskis
Assignee: Solomon Duskis
 Attachments: HBASE-12490.patch, HBASE-12490B.patch, 
 HBASE-12490B.patch, HBASE-12490B.patch, HBASE-12490C.patch


 The various uses of setAutoFlush() seem to need some tlc.  There's a note in 
 HTableInterface: @deprecated in 0.99 since setting clearBufferOnFail is 
 deprecated. Use setAutoFlushTo(boolean) instead.  It would be ideal to 
 change all internal uses of setAutoFlush(boolean, boolean) to use 
 setAutoFlushTo, if possible.
 HTable.setAutoFlush(boolean, boolean) is used in a handful of places.  
 setAutoFlush(false, false) has the same results as 
 HTable.setAutoFlush(false).  Calling HTable.setAutoFlush(false, true) has the 
 same effect as Table.setAutoFlushTo(false), assuming 
 HTable.setAutoFlush(false) was not called previously (by default, the second 
 parameter, clearBufferOnFail, is true and should remain true according to the 
 comments). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12490) Replace uses of setAutoFlush(boolean, boolean)

2014-11-25 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224939#comment-14224939
 ] 

Nicolas Liochon commented on HBASE-12490:
-

Yeah, I saw it but I was ok with you answer so I didn't comment :-)
Let's try to decide in this jira (Nick should see it).
My point of view is:
 - we should not change the meaning of setAutoFlush(boolean), as it would be 
confusing during the upgrade (i.e. someone upgrading from 0.98 to 1.0 would 
have their code compiling but with a hidden behavior change)
 - we should not use setAutoFlush(boolean); maybe we should remove it in 1.0, 
because of the confusion around a setter-like method that is not a setter. 
 - I don't think that we need to keep clearBufferOnFail (i.e. we could remove 
it in 1.0), but maybe I'm wrong here. If we do that then we can keep 
setAutoFlush(boolean), as it will become a real setter (and then the points 
above are not an issue anymore).

 Replace uses of setAutoFlush(boolean, boolean)
 --

 Key: HBASE-12490
 URL: https://issues.apache.org/jira/browse/HBASE-12490
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0, 0.99.2
Reporter: Solomon Duskis
Assignee: Solomon Duskis
 Attachments: HBASE-12490.patch, HBASE-12490B.patch, 
 HBASE-12490B.patch, HBASE-12490B.patch, HBASE-12490C.patch


 The various uses of setAutoFlush() seem to need some tlc.  There's a note in 
 HTableInterface: @deprecated in 0.99 since setting clearBufferOnFail is 
 deprecated. Use setAutoFlushTo(boolean) instead.  It would be ideal to 
 change all internal uses of setAutoFlush(boolean, boolean) to use 
 setAutoFlushTo, if possible.
 HTable.setAutoFlush(boolean, boolean) is used in a handful of places.  
 setAutoFlush(false, false) has the same results as 
 HTable.setAutoFlush(false).  Calling HTable.setAutoFlush(false, true) has the 
 same effect as Table.setAutoFlushTo(false), assuming 
 HTable.setAutoFlush(false) was not called previously (by default, the second 
 parameter, clearBufferOnFail, is true and should remain true according to the 
 comments). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12534) Wrong region location cache in client after regions are moved

2014-11-25 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224951#comment-14224951
 ] 

Nicolas Liochon commented on HBASE-12534:
-

bq. it seems we can simply get rid of MIN_RPC_TIMEOUT
I'm not against removing it (maybe it's too much of a corner case), but it 
solves more than a configuration issue. 
With the settings above
{code}
hbase.rpc.timeout=1000
hbase.client.operation.timeout=1200
{code}
if the first try fails after 1180ms, then the second try will have an 
rpc.timeout of 20ms (hbase.client.pause put aside). The MIN_RPC_TIMEOUT will 
say 'that's too low, let's set it to something more reasonable'. 

We can remove it. What we need to detect however is a setting of 0 (if not, it 
will be an infinite timeout).
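To make the clamping concrete, here is a toy version of the computation in Python. The floor value, the function name, and the elapsed numbers are illustrative assumptions, not the real client code:

```python
# Toy arithmetic for the scenario above: a first try eats most of the
# operation budget, so the naive per-try timeout for the retry is a sliver;
# a MIN_RPC_TIMEOUT floor clamps it, and a configured 0 must be rejected
# separately because 0 means "infinite" at the RPC layer.
RPC_TIMEOUT = 1000        # hbase.rpc.timeout (ms)
OPERATION_TIMEOUT = 1200  # hbase.client.operation.timeout (ms)
MIN_RPC_TIMEOUT = 100     # hypothetical floor (ms)

def next_try_timeout(elapsed_ms, pause_ms=0):
    if RPC_TIMEOUT == 0 or OPERATION_TIMEOUT == 0:
        raise ValueError("0 would mean an infinite timeout")
    remaining = OPERATION_TIMEOUT - elapsed_ms - pause_ms
    if remaining <= 0:
        raise TimeoutError("operation timeout exceeded")
    # cap by the per-call timeout, then clamp to the floor
    return max(min(remaining, RPC_TIMEOUT), MIN_RPC_TIMEOUT)

raw_remaining = OPERATION_TIMEOUT - 1180       # the 20 ms sliver
clamped = next_try_timeout(elapsed_ms=1180)    # the floor kicks in instead
```

Removing the floor makes the sliver the actual per-try timeout, which is why the discussion weighs the corner case against configuration simplicity.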

 Wrong region location cache in client after regions are moved
 -

 Key: HBASE-12534
 URL: https://issues.apache.org/jira/browse/HBASE-12534
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Critical
  Labels: client
 Attachments: HBASE-12534-0.94-v1.diff, HBASE-12534-v1.diff


 In our 0.94 hbase cluster, we found that client got wrong region location 
 cache and did not update it after a region is moved to another regionserver.
 The reason is wrong client config and bug in RpcRetryingCaller  of hbase 
 client.
 The rpc configs are following:
 {code}
 hbase.rpc.timeout=1000
 hbase.client.pause=200
 hbase.client.operation.timeout=1200
 {code}
 But the client retry number is 3
 {code}
 hbase.client.retries.number=3
 {code}
 Assume that a region was at regionserver A before, and then it is moved to 
 regionserver B. The client tries to make a call to regionserver A and gets a 
 NotServingRegionException. Because the retry number is not 1, the region server 
 location cache is not cleaned. See: RpcRetryingCaller.java#141 and 
 RegionServerCallable.java#127
 {code}
   @Override
   public void throwable(Throwable t, boolean retrying) {
     if (t instanceof SocketTimeoutException ||
         ...
     } else if (t instanceof NotServingRegionException && !retrying) {
       // Purge cache entries for this specific region from hbase:meta cache
       // since we don't call connect(true) when number of retries is 1.
       getConnection().deleteCachedRegionLocation(location);
     }
   }
 {code}
 But the call did not retry; it threw a SocketTimeoutException because the time 
 the call would take is larger than the operation timeout. See 
 RpcRetryingCaller.java#152
 {code}
 expectedSleep = callable.sleep(pause, tries + 1);
 // If, after the planned sleep, there won't be enough time left, we stop now.
 long duration = singleCallDuration(expectedSleep);
 if (duration > callTimeout) {
   String msg = "callTimeout=" + callTimeout + ", callDuration=" + duration +
       ": " + callable.getExceptionMessageAdditionalDetail();
   throw (SocketTimeoutException)(new SocketTimeoutException(msg).initCause(t));
 }
 {code}
 In the end, the wrong region location will never be cleaned up. 
 [~lhofhansl]
 In hbase 0.94, the MIN_RPC_TIMEOUT in singleCallDuration is 2000 by default, 
 which triggers this bug. 
 {code}
   private long singleCallDuration(final long expectedSleep) {
 return (EnvironmentEdgeManager.currentTimeMillis() - this.globalStartTime)
   + MIN_RPC_TIMEOUT + expectedSleep;
   }
 {code}
 But there is risk in master code too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12534) Wrong region location cache in client after regions are moved

2014-11-24 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222873#comment-14222873
 ] 

Nicolas Liochon commented on HBASE-12534:
-

MIN_RPC_TIMEOUT is linked to the operation timeout: w/o it we could send a 
request w/o giving enough time to the server. As well, until recently the rpc 
timeout was not multithread safe: it was set for all calls. So maybe this min 
timeout saves us in the .94 & .96 versions (not sure about .98). Maybe this 
min timeout should be configurable (cf. hbase.rpc.timeout=1000, which is lower 
than the min timeout).

 Wrong region location cache in client after regions are moved
 -

 Key: HBASE-12534
 URL: https://issues.apache.org/jira/browse/HBASE-12534
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Critical
  Labels: client
 Attachments: HBASE-12534-0.94-v1.diff, HBASE-12534-v1.diff


 In our 0.94 hbase cluster, we found that client got wrong region location 
 cache and did not update it after a region is moved to another regionserver.
 The reason is wrong client config and bug in RpcRetryingCaller  of hbase 
 client.
 The rpc configs are following:
 {code}
 hbase.rpc.timeout=1000
 hbase.client.pause=200
 hbase.client.operation.timeout=1200
 {code}
 But the client retry number is 3
 {code}
 hbase.client.retries.number=3
 {code}
 Assume that a region was at regionserver A before, and then it is moved to 
 regionserver B. The client tries to make a call to regionserver A and gets a 
 NotServingRegionException. Because the retry number is not 1, the region server 
 location cache is not cleaned. See: RpcRetryingCaller.java#141 and 
 RegionServerCallable.java#127
 {code}
   @Override
   public void throwable(Throwable t, boolean retrying) {
     if (t instanceof SocketTimeoutException ||
         ...
     } else if (t instanceof NotServingRegionException && !retrying) {
       // Purge cache entries for this specific region from hbase:meta cache
       // since we don't call connect(true) when number of retries is 1.
       getConnection().deleteCachedRegionLocation(location);
     }
   }
 {code}
 But the call did not retry; it threw a SocketTimeoutException because the time 
 the call would take is larger than the operation timeout. See 
 RpcRetryingCaller.java#152
 {code}
 expectedSleep = callable.sleep(pause, tries + 1);
 // If, after the planned sleep, there won't be enough time left, we stop now.
 long duration = singleCallDuration(expectedSleep);
 if (duration > callTimeout) {
   String msg = "callTimeout=" + callTimeout + ", callDuration=" + duration +
       ": " + callable.getExceptionMessageAdditionalDetail();
   throw (SocketTimeoutException)(new SocketTimeoutException(msg).initCause(t));
 }
 {code}
 At last, the wrong region location will never be cleaned up. 
 [~lhofhansl]
 In hbase 0.94, the MIN_RPC_TIMEOUT in singleCallDuration is 2000 by default, 
 which triggers this bug. 
 {code}
   private long singleCallDuration(final long expectedSleep) {
 return (EnvironmentEdgeManager.currentTimeMillis() - this.globalStartTime)
   + MIN_RPC_TIMEOUT + expectedSleep;
   }
 {code}
 But there is risk in master code too.
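The arithmetic behind the description above can be sketched numerically. This is an illustrative standalone demo (class and variable names are not the actual HBase client classes, and the backoff estimate is a rough approximation): with the quoted config, the planned retry duration already exceeds the operation timeout, so the caller throws a SocketTimeoutException while the stale location is still cached, because the NotServingRegionException branch skipped the purge on a non-final retry.

```java
// Hypothetical sketch of the bug in the description: the stale region
// location is never purged with the quoted client configuration.
public class StaleLocationCacheDemo {
    static final long MIN_RPC_TIMEOUT = 2000;  // hard-coded floor in 0.94

    // Mirrors RpcRetryingCaller#singleCallDuration from the description
    // (elapsed time so far + min timeout + planned sleep).
    static long singleCallDuration(long elapsedMs, long expectedSleepMs) {
        return elapsedMs + MIN_RPC_TIMEOUT + expectedSleepMs;
    }

    public static void main(String[] args) {
        long callTimeout = 1200;  // hbase.client.operation.timeout
        long pause = 200;         // hbase.client.pause
        int tries = 0;            // first attempt just failed with NSRE

        // hbase.client.retries.number is 3, so "retrying" is true on the
        // first failure and the NotServingRegionException branch does NOT
        // call deleteCachedRegionLocation.
        boolean retrying = true;
        boolean cachePurged = !retrying;

        long expectedSleep = pause * (tries + 2);  // rough backoff estimate
        long duration = singleCallDuration(0, expectedSleep);

        // The planned retry would already blow the operation timeout, so a
        // SocketTimeoutException is thrown instead of retrying -- and the
        // purge-on-last-retry path is never reached either.
        boolean retryAborted = duration > callTimeout;

        System.out.println("cachePurged=" + cachePurged
                + " retryAborted=" + retryAborted);
    }
}
```

With these numbers, duration is 0 + 2000 + 400 = 2400 ms against a 1200 ms operation timeout, so the call aborts on the first failure with the cache entry still in place.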



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12354) Update dependencies in time for 1.0 release

2014-10-29 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188316#comment-14188316
 ] 

Nicolas Liochon commented on HBASE-12354:
-

+1, there is a +1 from Enis above as well.

 Update dependencies in time for 1.0 release
 ---

 Key: HBASE-12354
 URL: https://issues.apache.org/jira/browse/HBASE-12354
 Project: HBase
  Issue Type: Sub-task
  Components: dependencies
Reporter: stack
Assignee: stack
 Fix For: 2.0.0, 0.99.2

 Attachments: 12354.txt, 12354v2.txt


 Going through and updating egregiously old dependencies for 1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12293) Tests are logging too much

2014-10-21 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178404#comment-14178404
 ] 

Nicolas Liochon commented on HBASE-12293:
-

Tests should be at info level at the minimum, as in production: if not, we will 
discover in production/integration tests that we log too much (or worse, trigger 
an NPE or something like that). For the same reason, I prefer to use the debug level 
in tests, to be sure that I won't have surprises (NPE) if I try to use them.

What I did in the past was to reuse the info from the apache build (run time and 
logs), and look at both the log size and the log rate per test to 
prioritize the tests I was looking at. Then I was just improving the logs 
around these areas.

 Tests are logging too much
 --

 Key: HBASE-12293
 URL: https://issues.apache.org/jira/browse/HBASE-12293
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Dima Spivak
Assignee: Dima Spivak
Priority: Minor

 In trying to solve HBASE-12285, it was pointed out that tests are writing too 
 much to output again. At best, this is a sloppy practice and, at worst, it 
 leaves us open to builds breaking when our test tools can't handle the flood. 
 If [~nkeywal] would be willing to give me a little bit of mentoring on how he 
 dealt with this problem a few years back, I'd be happy to add it to my plate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091

2014-10-21 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178414#comment-14178414
 ] 

Nicolas Liochon commented on HBASE-12285:
-

I think changing the log level is not a good idea (I added a comment in the 
related jira): it's very common to discover an NPE when you activate logs, and it's 
a very bad user experience: something does not work as expected, you activate 
the debug logs to understand, and then you get an NPE.
If we don't want to pay the testing cost of the debug logs, then I'm +1 for 
removing them (seriously: they are becoming useless as we now run info by 
default). But if we keep them in the code we must keep them in the tests.

 Builds are failing, possibly because of SUREFIRE-1091
 -

 Key: HBASE-12285
 URL: https://issues.apache.org/jira/browse/HBASE-12285
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dima Spivak
Assignee: Dima Spivak
Priority: Blocker
 Attachments: HBASE-12285_branch-1_v1.patch


 Our branch-1 builds on builds.apache.org have been failing in recent days 
 after we switched over to an official version of Surefire a few days back 
 (HBASE-4955). The version we're using, 2.17, is hit by a bug 
 ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results 
 in an IOException, which looks like what we're seeing on Jenkins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091

2014-10-21 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179414#comment-14179414
 ] 

Nicolas Liochon commented on HBASE-12285:
-

Sure we can try. But when will we go back to the good setting?

 Builds are failing, possibly because of SUREFIRE-1091
 -

 Key: HBASE-12285
 URL: https://issues.apache.org/jira/browse/HBASE-12285
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dima Spivak
Assignee: Dima Spivak
Priority: Blocker
 Attachments: HBASE-12285_branch-1_v1.patch


 Our branch-1 builds on builds.apache.org have been failing in recent days 
 after we switched over to an official version of Surefire a few days back 
 (HBASE-4955). The version we're using, 2.17, is hit by a bug 
 ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results 
 in an IOException, which looks like what we're seeing on Jenkins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12285) Builds are failing, possibly because of SUREFIRE-1091

2014-10-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175960#comment-14175960
 ] 

Nicolas Liochon commented on HBASE-12285:
-

surefire-1091 is a very good suspect, because in our private surefire version 
the implementation for this was different.
This said, maybe we just log too much in the test(s)? I've done some cleanup 
there nearly 3 years ago, but this belongs to the never-ending story 
category...

 Builds are failing, possibly because of SUREFIRE-1091
 -

 Key: HBASE-12285
 URL: https://issues.apache.org/jira/browse/HBASE-12285
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dima Spivak
Assignee: Dima Spivak
Priority: Blocker

 Our branch-1 builds on builds.apache.org have been failing in recent days 
 after we switched over to an official version of Surefire a few days back 
 (HBASE-4955). The version we're using, 2.17, is hit by a bug 
 ([SUREFIRE-1091|https://jira.codehaus.org/browse/SUREFIRE-1091]) that results 
 in an IOException, which looks like what we're seeing on Jenkins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11835) Wrong managenement of non expected calls in the client

2014-10-06 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160384#comment-14160384
 ] 

Nicolas Liochon commented on HBASE-11835:
-

The failures are very likely unrelated. I plan to commit this this week if 
nobody disagrees.

 Wrong managenement of non expected calls in the client
 --

 Key: HBASE-11835
 URL: https://issues.apache.org/jira/browse/HBASE-11835
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 1.0.0, 2.0.0, 0.98.6
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 2.0.0, 0.99.1

 Attachments: 11835.rebase.patch, rpcClient.patch


 If a call is purged or canceled we try to skip the reply from the server, but 
 we read the wrong number of bytes so we corrupt the tcp channel. It's hidden 
 as it triggers retry and so on, but it's bad for performances obviously.
 It happens with cell blocks.
 [~ram_krish_86], [~saint@gmail.com], you know this part better than me, 
 do you agree with the analysis and the patch?
 The changes in rpcServer are not fully related: as the client closes the 
 connections in such situations, I observed both ClosedChannelException and 
 CancelledKeyException. 
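The skip logic described above can be illustrated with a minimal sketch (not the actual RpcClient code; class and method names are mine). When a purged or canceled call's reply arrives, the client must skip exactly the number of bytes declared in the length prefix; `skipBytes` may skip fewer bytes than asked, so it has to be looped, otherwise the next read starts mid-message and the TCP stream is effectively corrupted.

```java
// Illustrative sketch: skipping a length-prefixed reply for a purged call.
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class SkipReplyDemo {
    // Skips exactly len bytes; DataInputStream#skipBytes may skip fewer,
    // so we loop until done or EOF.
    static void skipFully(DataInputStream in, int len) throws IOException {
        int remaining = len;
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) throw new IOException("EOF while skipping");
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        // Two back-to-back length-prefixed replies: [len][payload][len][payload]
        byte[] stream = {0, 0, 0, 3, 9, 9, 9, 0, 0, 0, 1, 42};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));

        int len1 = in.readInt();  // reply for a purged call: skip it whole
        skipFully(in, len1);

        int len2 = in.readInt();  // next reply now starts at the right offset
        System.out.println("len2=" + len2 + " payload=" + in.readByte());
    }
}
```

Skipping any other byte count (for example forgetting the cell block portion of the reply) would leave the stream misaligned, which matches the retries-and-reconnects symptom reported in the issue.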



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12148) Remove TimeRangeTracker as point of contention when many threads writing a Store

2014-10-06 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160412#comment-14160412
 ] 

Nicolas Liochon commented on HBASE-12148:
-

bq. I almost feel we should doc this and move on, if anyone is running the 
server side on a 32 bit JVM they shouldn't. But yeah the potential for torn 
reads isn't good.
+1. As well, IIRC, there are other parts of the code where we rely on atomic ops for 
64-bit stuff (as we don't test on 32 bits, what I said is likely true, given the 
usual pattern "not tested means not working").
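For context on the torn-read concern discussed above: the Java Language Specification (§17.7) allows a non-volatile `long` to be written as two separate 32-bit halves, so a concurrent reader on a 32-bit JVM can observe a value that was never written. The sketch below (field and class names are illustrative, not the actual TimeRangeTracker code) shows the hazardous field next to the two standard fixes, `volatile` and `AtomicLong`.

```java
// Sketch of the 64-bit atomicity hazard and its usual remedies.
import java.util.concurrent.atomic.AtomicLong;

public class TimeRangeSketch {
    long minTorn = Long.MAX_VALUE;           // hazardous: torn reads possible on 32-bit
    volatile long minSafe = Long.MAX_VALUE;  // volatile long: reads/writes are atomic
    final AtomicLong minAtomic = new AtomicLong(Long.MAX_VALUE);

    void include(long ts) {
        if (ts < minTorn) minTorn = ts;            // may be seen half-written
        if (ts < minSafe) minSafe = ts;            // atomic, single-writer update
        minAtomic.accumulateAndGet(ts, Math::min); // safe even with many writers
    }

    public static void main(String[] args) {
        TimeRangeSketch t = new TimeRangeSketch();
        t.include(100L);
        t.include(50L);
        System.out.println(t.minSafe + " " + t.minAtomic.get());
    }
}
```

Note that `volatile` only fixes atomicity of the individual read/write; the check-then-set in `include` still races with multiple writers, which is why the `AtomicLong` variant is the one that scales to concurrent updaters.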

 Remove TimeRangeTracker as point of contention when many threads writing a 
 Store
 

 Key: HBASE-12148
 URL: https://issues.apache.org/jira/browse/HBASE-12148
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Affects Versions: 2.0.0, 0.99.1
Reporter: stack
Assignee: stack
 Fix For: 2.0.0, 0.98.7, 0.99.1

 Attachments: 12148.txt, 12148.txt, 12148v2.txt, 12148v2.txt, Screen 
 Shot 2014-10-01 at 3.39.46 PM.png, Screen Shot 2014-10-01 at 3.41.07 PM.png






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12153) Fixing TestReplicaWithCluster

2014-10-02 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156252#comment-14156252
 ] 

Nicolas Liochon commented on HBASE-12153:
-

Yeah, I actually don't much like timeouts in tests because they have to be 
removed during debugging sessions (the test is from me, but the timeouts are 
from Stack ;-) ). It's a workaround for surefire zombies...  +1 for the patch.

 Fixing TestReplicaWithCluster
 -

 Key: HBASE-12153
 URL: https://issues.apache.org/jira/browse/HBASE-12153
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 1.0.0
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Trivial
 Fix For: 1.0.0

 Attachments: 0001-FixTestReplicaWithCluster.patch


 This test takes about 30 ~ 40 seconds depending upon the resources available. 
 Doesn't make sense to have such a tight bound (30s) on the unit test. 
 [~nkeywal], what do you think? Did you intend to have such a tight bound 
 while adding the test here?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11835) Wrong managenement of non expected calls in the client

2014-10-02 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156258#comment-14156258
 ] 

Nicolas Liochon commented on HBASE-11835:
-

it got lost somewhere in a todo list. Let me have a look again.

 Wrong managenement of non expected calls in the client
 --

 Key: HBASE-11835
 URL: https://issues.apache.org/jira/browse/HBASE-11835
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 1.0.0, 2.0.0, 0.98.6
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 2.0.0, 0.98.7, 0.99.1

 Attachments: rpcClient.patch


 If a call is purged or canceled we try to skip the reply from the server, but 
 we read the wrong number of bytes so we corrupt the tcp channel. It's hidden 
 as it triggers retry and so on, but it's bad for performances obviously.
 It happens with cell blocks.
 [~ram_krish_86], [~saint@gmail.com], you know this part better than me, 
 do you agree with the analysis and the patch?
 The changes in rpcServer are not fully related: as the client closes the 
 connections in such situations, I observed both ClosedChannelException and 
 CancelledKeyException. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12141) ClusterStatus message might exceed max datagram payload limits

2014-10-02 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156267#comment-14156267
 ] 

Nicolas Liochon commented on HBASE-12141:
-

Yeah, the strategy was to keep the message small enough (if multiple servers 
fail simultaneously, we send multiple messages instead of one). As well, we 
send the message multiple times in case it gets lost somewhere. I had issues with 
Netty 3.x when I tried to add frames. I haven't tried very hard. We could make 
MAX_SERVER_PER_MESSAGE configurable for networks with a very small MTU? It's 
also possible to compress the message. Once again, I had issues with Netty 3.x 
for this in the past.

This said, I would be interested to understand the network config. 
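The chunking strategy described above (several small messages rather than one large, fragmented datagram) can be sketched as follows. This is a hypothetical illustration: MAX_SERVER_PER_MESSAGE, the string-based server names, and the class name are mine, not the actual ClusterStatusPublisher code.

```java
// Sketch: split the dead-server list into messages small enough to fit in
// a single datagram, instead of sending one oversized ClusterStatus.
import java.util.ArrayList;
import java.util.List;

public class StatusChunker {
    static final int MAX_SERVER_PER_MESSAGE = 10;  // illustrative cap

    // Partitions the dead-server list into messages of bounded size.
    static List<List<String>> chunk(List<String> deadServers) {
        List<List<String>> messages = new ArrayList<>();
        for (int i = 0; i < deadServers.size(); i += MAX_SERVER_PER_MESSAGE) {
            int end = Math.min(i + MAX_SERVER_PER_MESSAGE, deadServers.size());
            messages.add(new ArrayList<>(deadServers.subList(i, end)));
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> dead = new ArrayList<>();
        for (int i = 0; i < 23; i++) dead.add("rs" + i + ":16020");
        List<List<String>> msgs = chunk(dead);
        System.out.println(msgs.size() + " messages, last has "
                + msgs.get(msgs.size() - 1).size());
    }
}
```

Each chunk is then serialized and sent as its own datagram, so no single payload approaches the MTU; making the cap configurable would be the knob suggested above for networks with unusually small MTUs.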

 ClusterStatus message might exceed max datagram payload limits
 --

 Key: HBASE-12141
 URL: https://issues.apache.org/jira/browse/HBASE-12141
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.3
Reporter: Andrew Purtell

 The multicast ClusterStatusPublisher and its companion listener are using 
 datagram channels without any framing. I think this is an issue because 
 Netty's ProtobufDecoder expects a complete PB message to be available in the 
 ChannelBuffer yet ClusterStatus messages can be large and might exceed the 
 maximum datagram payload size. As one user reported on list:
 {noformat}
 org.apache.hadoop.hbase.client.ClusterStatusListener - ERROR - Unexpected 
 exception, continuing.
 com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had 
 invalid wire type.
 at 
 com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
 at 
 com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
 at 
 com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus.<init>(ClusterStatusProtos.java:7554)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus.<init>(ClusterStatusProtos.java:7512)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus$1.parsePartialFrom(ClusterStatusProtos.java:7689)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus$1.parsePartialFrom(ClusterStatusProtos.java:7684)
 at 
 com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
 at 
 com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
 at 
 com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:182)
 at 
 com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
 at 
 org.jboss.netty.handler.codec.protobuf.ProtobufDecoder.decode(ProtobufDecoder.java:122)
 at 
 org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
 at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
 at 
 org.jboss.netty.channel.socket.oio.OioDatagramWorker.process(OioDatagramWorker.java:52)
 at 
 org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:73)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 The javadoc for ProtobufDecoder says:
 {quote}
 Decodes a received ChannelBuffer into a Google Protocol Buffers Message and 
 MessageLite. Please note that this decoder must be used with a proper 
 FrameDecoder such as ProtobufVarint32FrameDecoder or 
 LengthFieldBasedFrameDecoder if you are using a stream-based transport such 
 as TCP/IP.
 {quote}
 and even though we are using a datagram transport we have related issues, 
 depending on what the sending and receiving OS does with overly large 
 datagrams:
 - We may receive a datagram with a truncated message
 - We may get an upcall when processing one fragment of a fragmented datagram, 
 where the complete message is not available yet
 - We may not be able to send the overly large ClusterStatus in the first 
 place. Linux claims to do PMTU and return EMSGSIZE if a datagram packet 
 payload exceeds the MTU, but will send a fragmented datagram if PMTU is 
 disabled. I'm surprised we have the above report given the default is to 
 reject overly large datagram payloads, so perhaps the user is using a 
 different server OS or Netty datagram channels do their own fragmentation (I 
 haven't checked).
 In any case, the server and client pipelines are definitely not doing any 
 kind of framing. This is the multicast status listener from 

[jira] [Updated] (HBASE-11835) Wrong managenement of non expected calls in the client

2014-10-02 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-11835:

Attachment: 11835.rebase.patch

 Wrong managenement of non expected calls in the client
 --

 Key: HBASE-11835
 URL: https://issues.apache.org/jira/browse/HBASE-11835
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 1.0.0, 2.0.0, 0.98.6
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 2.0.0, 0.99.1

 Attachments: 11835.rebase.patch, rpcClient.patch


 If a call is purged or canceled we try to skip the reply from the server, but 
 we read the wrong number of bytes so we corrupt the tcp channel. It's hidden 
 as it triggers retry and so on, but it's bad for performances obviously.
 It happens with cell blocks.
 [~ram_krish_86], [~saint@gmail.com], you know this part better than me, 
 do you agree with the analysis and the patch?
 The changes in rpcServer are not fully related: as the client closes the 
 connections in such situations, I observed both ClosedChannelException and 
 CancelledKeyException. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

