[jira] [Comment Edited] (HBASE-18359) CoprocessorHConnection#getConnectionForEnvironment should read config from CoprocessorEnvironment
[ https://issues.apache.org/jira/browse/HBASE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258239#comment-16258239 ] Samarth Jain edited comment on HBASE-18359 at 11/18/17 10:34 PM:
-
We need our connections to have a different configuration than the one in hbase-site.xml. The current custom configs are mostly related to remote RPCs, so in the short-circuit case we can probably use the same connection as the RS one. That may change in the future. So to avoid any such hard-to-catch regressions, I think the best approach would be to just let the clients manage the life cycle of the connection. Explicit documentation along the lines of "You know what you are doing, and creating too many connections will make you run out of ZK resources, etc." would help. In Phoenix we guard against that by caching the connection.

> CoprocessorHConnection#getConnectionForEnvironment should read config from
> CoprocessorEnvironment
> -
>
> Key: HBASE-18359
> URL: https://issues.apache.org/jira/browse/HBASE-18359
> Project: HBase
> Issue Type: Bug
> Reporter: Samarth Jain
> Fix For: 2.0.0
>
> It seems like the method getConnectionForEnvironment isn't doing the right
> thing when it is creating a CoprocessorHConnection: it reads the config from
> HRegionServer and not from the env passed in.
> If coprocessors want to use a CoprocessorHConnection with some custom config
> settings, then they have no option but to configure it in the hbase-site.xml
> of the region servers. This isn't ideal, as a lot of times these "global"
> level configs can have side effects. See PHOENIX-3974 as an example where
> configuring ServerRpcControllerFactory (a Phoenix implementation of
> RpcControllerFactory) could result in deadlocks. Or PHOENIX-3983, where the
> presence of this global config causes our index rebuild code to incorrectly
> use handlers it shouldn't.
> If the CoprocessorHConnection created through the getConnectionForEnvironment
> API used the CoprocessorEnvironment config, it would allow coprocessors to
> pass in their own config without needing to configure it in hbase-site.xml.
> The change would be simple. Basically change the below
> {code}
> if (services instanceof HRegionServer) {
>   return new CoprocessorHConnection((HRegionServer) services);
> }
> {code}
> to
> {code}
> if (services instanceof HRegionServer) {
>   return new CoprocessorHConnection(env.getConfiguration(), (HRegionServer) services);
> }
> {code}
--
This message was sent by Atlassian JIRA (v6.4.14#64029)
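The caching guard mentioned above ("In Phoenix we guard against that by caching the connection") can be sketched in isolation. This is a minimal, self-contained illustration of the pattern, not Phoenix code: the Connection class, the OPENED counter, and ConnectionCache are hypothetical stand-ins for HConnection and Phoenix's table factory.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of caching a client-managed connection so that repeated
// lookups do not open a new (ZK-resource-consuming) connection each time.
public class ConnectionCache {
    // Counts how many real connections were ever opened.
    static final AtomicInteger OPENED = new AtomicInteger();

    // Hypothetical stand-in for an HConnection-like resource.
    static class Connection {
        private volatile boolean closed;
        Connection() { OPENED.incrementAndGet(); }
        boolean isClosed() { return closed; }
        void close() { closed = true; }
    }

    private Connection connection;

    // Mirrors the shape of the getConnection method quoted later in this thread:
    // reuse the cached connection unless it is missing or already closed.
    synchronized Connection getConnection() {
        if (connection == null || connection.isClosed()) {
            connection = new Connection();
        }
        return connection;
    }

    public static void main(String[] args) {
        ConnectionCache cache = new ConnectionCache();
        Connection c1 = cache.getConnection();
        Connection c2 = cache.getConnection();
        System.out.println(c1 == c2);          // same cached instance, no new ZK session
        c1.close();
        cache.getConnection();                 // reopened only after an explicit close
        System.out.println(OPENED.get());
    }
}
```

Because the cache, not HBase, owns the connection, the caller keeps the documented responsibility of closing it when done.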
[jira] [Commented] (HBASE-18359) CoprocessorHConnection#getConnectionForEnvironment should read config from CoprocessorEnvironment
[ https://issues.apache.org/jira/browse/HBASE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258239#comment-16258239 ] Samarth Jain commented on HBASE-18359:
--
We need our connections to have a different configuration than the one in hbase-site.xml. The current custom configs are mostly related to remote RPCs, so in the short-circuit case we can probably use the same connection as the RS one. That may change in the future. So to avoid any such hard-to-catch regressions, I think the best approach would be to just let the clients manage the life cycle of the connection. Explicit documentation along the lines of "You know what you are doing, and creating too many connections will make you run out of ZK resources, etc." would help. In Phoenix we guard against that by caching the connection.
[jira] [Comment Edited] (HBASE-18359) CoprocessorHConnection#getConnectionForEnvironment should read config from CoprocessorEnvironment
[ https://issues.apache.org/jira/browse/HBASE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257476#comment-16257476 ] Samarth Jain edited comment on HBASE-18359 at 11/17/17 7:50 PM:
-
bq. On older versions also it is not as the CoprocessorHConnection#getConnectionForEnvironment was taking a CoprocessorEnvironment instance not config
The issue is present in older versions. Instead of using the env passed in, the code was using the environment of HRegionServer, which is what the description also talks about.
{code}
if (env instanceof RegionCoprocessorEnvironment) {
  RegionCoprocessorEnvironment e = (RegionCoprocessorEnvironment) env;
  RegionServerServices services = e.getRegionServerServices();
  if (services instanceof HRegionServer) {
    return new CoprocessorHConnection((HRegionServer) services);
  }
}
{code}
bq. What we need in HBase is an API CoprocessorEnvironment#getConnection(Config). This call will always make new connection (which is short circuit enabled). So caching of this and reuse the callee has to take care. Is that ok?
I think that should work, [~anoop.hbase]. We already cache the HConnection in CoprocessorHConnectionTableFactory by doing this:
{code}
private synchronized HConnection getConnection(Configuration conf) throws IOException {
  if (connection == null || connection.isClosed()) {
    connection = new CoprocessorHConnection(conf, server);
  }
  return connection;
}
{code}
It would be good if the API had explicit documentation saying it is the caller's responsibility to make sure the connection returned by the getConnection(config) API is appropriately closed.
[jira] [Commented] (HBASE-18359) CoprocessorHConnection#getConnectionForEnvironment should read config from CoprocessorEnvironment
[ https://issues.apache.org/jira/browse/HBASE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257476#comment-16257476 ] Samarth Jain commented on HBASE-18359:
--
bq. On older versions also it is not as the CoprocessorHConnection#getConnectionForEnvironment was taking a CoprocessorEnvironment instance not config
The issue is present in older versions. Instead of using the env passed in, the code was using the environment of HRegionServer, which is what the description also talks about.
{code}
if (env instanceof RegionCoprocessorEnvironment) {
  RegionCoprocessorEnvironment e = (RegionCoprocessorEnvironment) env;
  RegionServerServices services = e.getRegionServerServices();
  if (services instanceof HRegionServer) {
    return new CoprocessorHConnection((HRegionServer) services);
  }
}
{code}
bq. What we need in HBase is an API CoprocessorEnvironment#getConnection(Config). This call will always make new connection (which is short circuit enabled). So caching of this and reuse the callee has to take care. Is that ok?
I think that should work, [~anoop.hbase]. We already cache the HConnection in CoprocessorHConnectionTableFactory by doing this:
{code}
private synchronized HConnection getConnection(Configuration conf) throws IOException {
  if (connection == null || connection.isClosed()) {
    connection = new CoprocessorHConnection(conf, server);
  }
  return connection;
}
{code}
It would be good if the API had explicit documentation saying it is the caller's responsibility to make sure the connection returned by the getConnection(config) API is appropriately closed.
[jira] [Commented] (HBASE-18359) CoprocessorHConnection#getConnectionForEnvironment should read config from CoprocessorEnvironment
[ https://issues.apache.org/jira/browse/HBASE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256704#comment-16256704 ] Samarth Jain commented on HBASE-18359:
--
We have been cloning the config to make sure the changes we are making do not introduce side effects on other regions of the region server. Something like this:
{code}
/*
 * We need to create a copy of region's configuration since we don't want any side effect of
 * setting the RpcControllerFactory.
 */
clonedConfig = PropertiesUtil.cloneConfig(e.getConfiguration());
{code}
On this clonedConfig we also set the various timeout and retry related configs that [~jamestaylor] mentioned. This clonedConfig is then passed to our CoprocessorHConnectionTableFactory, which makes sure that the HConnection used by the HTables generated from this factory uses this cloned config. The HConnection is today created by doing this:
{code}
new CoprocessorHConnection(clonedConfig, server)
{code}
So we would need a way of creating this "short circuit optimized" HConnection by having it take a configuration object.
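The clone-then-override pattern described above can be sketched with java.util.Properties standing in for org.apache.hadoop.conf.Configuration (an assumption made to keep the example self-contained; PropertiesUtil.cloneConfig is Phoenix's helper and is not reproduced here):

```java
import java.util.Properties;

// Sketch of the clone-then-override pattern: copy the shared config, set
// connection-specific keys on the copy, and leave the original untouched so
// other regions on the same region server see no side effects.
public class CloneConfigExample {
    // Simplified analogue of a cloneConfig helper: a key/value copy.
    static Properties cloneConfig(Properties original) {
        Properties copy = new Properties();
        copy.putAll(original);   // string key/value copy is enough here
        return copy;
    }

    public static void main(String[] args) {
        Properties serverConf = new Properties();
        serverConf.setProperty("hbase.rpc.timeout", "60000");

        Properties clonedConfig = cloneConfig(serverConf);
        // Connection-specific overrides go on the clone only.
        clonedConfig.setProperty("hbase.rpc.timeout", "5000");
        clonedConfig.setProperty("hbase.client.retries.number", "3");

        // The server-wide config is unaffected by the overrides.
        System.out.println(serverConf.getProperty("hbase.rpc.timeout"));   // 60000
        System.out.println(clonedConfig.getProperty("hbase.rpc.timeout")); // 5000
    }
}
```

The cloned object is what would then be handed to the connection factory, exactly as the comment describes for CoprocessorHConnectionTableFactory.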
[jira] [Created] (HBASE-18735) Provide a fast mechanism for shutting down mini cluster
Samarth Jain created HBASE-18735:
Summary: Provide a fast mechanism for shutting down mini cluster
Key: HBASE-18735
URL: https://issues.apache.org/jira/browse/HBASE-18735
Project: HBase
Issue Type: Wish
Reporter: Samarth Jain

The current mechanism of shutting down a mini cluster through HBaseTestingUtility.shutdownMiniCluster can take a lot of time when the mini cluster has a lot of tables. A lot of this time is spent in closing all the user regions. It would be nice to have a mechanism where this shutdown can happen quickly without having to worry about closing these user regions. At the same time, this mechanism would need to make sure that all the critical system resources like file handles and network ports are still released, so that subsequently initialized mini clusters on the same JVM or system won't run into resource issues. This would make testing using HBase mini clusters much faster and immensely help out the test frameworks of dependent projects like Phoenix.
[jira] [Created] (HBASE-18734) Possible memory leak when running mini cluster
Samarth Jain created HBASE-18734:
Summary: Possible memory leak when running mini cluster
Key: HBASE-18734
URL: https://issues.apache.org/jira/browse/HBASE-18734
Project: HBase
Issue Type: Bug
Reporter: Samarth Jain

As part of improving the stability of Phoenix tests, I recently did some analysis and found that when the mini cluster is not able to close all the regions properly, or if there is some other cruft left behind by a mini cluster after it has been shut down, it can result in a memory leak. The region server adds its thread to the JVM shutdown hook in HRegionServer:
{code}
ShutdownHook.install(conf, fs, this, Thread.currentThread());
{code}
So, even if the region server thread terminates when a mini cluster is shut down, the terminated thread's object stays around. If there is any remaining cruft (regions, configuration, etc.) enclosed within a region server, the GC isn't able to collect it, since it is still referenced by this terminated thread object in the shutdown hook. A possible/likely fix for this would be to call ShutdownHookManager.removeShutdownHook(regionServerThread) when the mini cluster is shut down.
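The leak mechanism and the proposed removeShutdownHook fix can be demonstrated with the JDK's own shutdown hooks, which HBase's ShutdownHook/ShutdownHookManager wrap. This is an illustrative sketch, not HBase code:

```java
// Demonstrates why a registered shutdown hook pins its Thread object (and
// everything that object references) until the JVM exits, and how removing
// the hook on mini cluster shutdown releases that reference.
public class ShutdownHookLeakDemo {
    public static void main(String[] args) {
        // Stand-in for region server state captured by the hook's Runnable.
        final byte[] heavyState = new byte[1024 * 1024];
        Thread hook = new Thread(() -> {
            // On a real shutdown this would flush/close the region server.
            System.out.println("hook ran, state size=" + heavyState.length);
        });

        Runtime.getRuntime().addShutdownHook(hook);
        // While the hook stays registered, 'hook' (and heavyState, captured
        // by its Runnable) is strongly reachable from the JVM's hook table,
        // so the GC can never reclaim it, even after the thread terminates.

        // The proposed fix: deregister when the mini cluster shuts down.
        boolean removed = Runtime.getRuntime().removeShutdownHook(hook);
        System.out.println(removed);   // true: the hook and its state can now be GC'd
    }
}
```

removeShutdownHook returns true only if the hook was still registered, so calling it is safe even if shutdown already began elsewhere.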
[jira] [Commented] (HBASE-18378) Cloning configuration contained in CoprocessorEnvironment doesn't work
[ https://issues.apache.org/jira/browse/HBASE-18378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086615#comment-16086615 ] Samarth Jain commented on HBASE-18378:
--
Using the HBaseConfiguration method could have unintended side effects. For example, the HBaseConfiguration#create() method adds the HBaseConfiguration.class.getClassLoader():
{code}
/**
 * Creates a Configuration with HBase resources
 * @return a Configuration with HBase resources
 */
public static Configuration create() {
  Configuration conf = new Configuration();
  // In case HBaseConfiguration is loaded from a different classloader than
  // Configuration, conf needs to be set with appropriate class loader to resolve
  // HBase resources.
  conf.setClassLoader(HBaseConfiguration.class.getClassLoader());
  return addHbaseResources(conf);
}
{code}
So if I used
{code}
public static Configuration create(final Configuration that)
{code}
then the config returned by the above method would have the class loader set.

> Cloning configuration contained in CoprocessorEnvironment doesn't work
> --
>
> Key: HBASE-18378
> URL: https://issues.apache.org/jira/browse/HBASE-18378
> Project: HBase
> Issue Type: Bug
> Reporter: Samarth Jain
>
> In our Phoenix coprocessors, we need to clone the configuration passed in
> CoprocessorEnvironment.
> However, using the copy constructor declared in its parent class,
> Configuration, doesn't copy over anything.
> For example:
> {code}
> CoprocessorEnvironment e;
> Configuration original = e.getConfiguration();
> Configuration clone = new Configuration(original);
> clone.get(HConstants.ZK_SESSION_TIMEOUT) -> returns null
> e.configuration.get(HConstants.ZK_SESSION_TIMEOUT) -> returns
> HConstants.DEFAULT_ZK_SESSION_TIMEOUT
> {code}
[jira] [Commented] (HBASE-18378) Cloning configuration contained in CoprocessorEnvironment doesn't work
[ https://issues.apache.org/jira/browse/HBASE-18378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086606#comment-16086606 ] Samarth Jain commented on HBASE-18378:
--
Thanks for the comment, [~tedyu]. While it might just work out in our case, it seems a bit odd to me that for copying a CompoundConfiguration I need to use a method of its sibling class, HBaseConfiguration.
[jira] [Created] (HBASE-18378) Cloning configuration contained in CoprocessorEnvironment doesn't work
Samarth Jain created HBASE-18378:
Summary: Cloning configuration contained in CoprocessorEnvironment doesn't work
Key: HBASE-18378
URL: https://issues.apache.org/jira/browse/HBASE-18378
Project: HBase
Issue Type: Bug
Reporter: Samarth Jain

In our Phoenix coprocessors, we need to clone the configuration passed in CoprocessorEnvironment. However, using the copy constructor declared in its parent class, Configuration, doesn't copy over anything. For example:
{code}
CoprocessorEnvironment e;
Configuration original = e.getConfiguration();
Configuration clone = new Configuration(original);
clone.get(HConstants.ZK_SESSION_TIMEOUT) -> returns null
e.configuration.get(HConstants.ZK_SESSION_TIMEOUT) -> returns HConstants.DEFAULT_ZK_SESSION_TIMEOUT
{code}
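One plausible explanation for this behavior (an assumption for illustration, not verified against the HBase source here) is that the copy constructor is defined on the base Configuration class, while a subclass such as CompoundConfiguration answers get() from extra layered state that the base-class copy never receives. A self-contained analogue with hypothetical class names:

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained analogue (not HBase code) of why copying a subclass through
// its base class's copy constructor can silently drop values: the copy
// constructor only transfers base-class state, while the subclass answers
// get() from extra state the copy never receives.
public class CopyConstructorPitfall {
    static class BaseConf {
        final Map<String, String> props = new HashMap<>();
        BaseConf() {}
        BaseConf(BaseConf other) { props.putAll(other.props); } // base state only
        String get(String key) { return props.get(key); }
    }

    // Stand-in for a CompoundConfiguration-like subclass that resolves keys
    // through an extra overlay rather than the base property map.
    static class CompoundConf extends BaseConf {
        final Map<String, String> overlay = new HashMap<>();
        @Override
        String get(String key) {
            String v = overlay.get(key);
            return v != null ? v : super.get(key);
        }
    }

    public static void main(String[] args) {
        CompoundConf original = new CompoundConf();
        original.overlay.put("zookeeper.session.timeout", "90000");

        BaseConf clone = new BaseConf(original); // overlay is not copied
        System.out.println(original.get("zookeeper.session.timeout")); // 90000
        System.out.println(clone.get("zookeeper.session.timeout"));    // null
    }
}
```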
[jira] [Created] (HBASE-18359) CoprocessorHConnection#getConnectionForEnvironment should read config from CoprocessorEnvironment from CoporcessorEnvironment
Samarth Jain created HBASE-18359:
Summary: CoprocessorHConnection#getConnectionForEnvironment should read config from CoprocessorEnvironment from CoporcessorEnvironment
Key: HBASE-18359
URL: https://issues.apache.org/jira/browse/HBASE-18359
Project: HBase
Issue Type: Bug
Reporter: Samarth Jain

It seems like the method getConnectionForEnvironment isn't doing the right thing when it is creating a CoprocessorHConnection: it reads the config from HRegionServer and not from the env passed in. If coprocessors want to use a CoprocessorHConnection with some custom config settings, then they have no option but to configure it in the hbase-site.xml of the region servers. This isn't ideal, as a lot of times these "global" level configs can have side effects. See PHOENIX-3974 as an example where configuring ServerRpcControllerFactory (a Phoenix implementation of RpcControllerFactory) could result in deadlocks. Or PHOENIX-3983, where the presence of this global config causes our index rebuild code to incorrectly use handlers it shouldn't. If the CoprocessorHConnection created through the getConnectionForEnvironment API used the CoprocessorEnvironment config, it would allow coprocessors to pass in their own config without needing to configure it in hbase-site.xml. The change would be simple. Basically change the below
{code}
if (services instanceof HRegionServer) {
  return new CoprocessorHConnection((HRegionServer) services);
}
{code}
to
{code}
if (services instanceof HRegionServer) {
  return new CoprocessorHConnection(env.getConfiguration(), (HRegionServer) services);
}
{code}
[jira] [Updated] (HBASE-18359) CoprocessorHConnection#getConnectionForEnvironment should read config from CoprocessorEnvironment
[ https://issues.apache.org/jira/browse/HBASE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated HBASE-18359:
-
Summary: CoprocessorHConnection#getConnectionForEnvironment should read config from CoprocessorEnvironment (was: CoprocessorHConnection#getConnectionForEnvironment should read config from CoprocessorEnvironment from CoporcessorEnvironment)
[jira] [Commented] (HBASE-17886) Fix compatibility of ServerSideScanMetrics
[ https://issues.apache.org/jira/browse/HBASE-17886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959143#comment-15959143 ] Samarth Jain commented on HBASE-17886:
--
We don't have to have HBASE-17716 in 1.3. We can use the workaround of using the metric names directly, with the caveat that we would have to look into the HBase source code to get hold of them.

> Fix compatibility of ServerSideScanMetrics
> --
>
> Key: HBASE-17886
> URL: https://issues.apache.org/jira/browse/HBASE-17886
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0, 1.4.0, 1.3.1
> Reporter: Yu Li
> Assignee: Yu Li
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: compatibility_check_1.3.1RC0.png, HBASE-17886.patch,
> HBASE-17886.v2.patch
>
> In HBASE-17716 we have changed the public field name in
> {{ServerSideScanMetrics}}, which is IA.Public; this causes a source
> compatibility issue, and we propose to fix it in this JIRA.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-12790) Support fairness across parallelized scans
[ https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900259#comment-15900259 ]

Samarth Jain edited comment on HBASE-12790 at 3/7/17 10:07 PM:
---

Thanks for the pointer [~anoop.hbase]. I see that there is already an executor, RWQueueRpcExecutor, that gets hold of the ScanRequest in the dispatch method.

{code}
private boolean isScanRequest(final RequestHeader header, final Message param) {
  if (param instanceof ScanRequest) {
    // The first scan request will be executed as a "short read"
    ScanRequest request = (ScanRequest) param;
    return request.hasScannerId();
  }
  return false;
}

@Override
public boolean dispatch(final CallRunner callTask) throws InterruptedException {
  ...
  if (numScanQueues > 0 && isScanRequest(call.getHeader(), call.param)) {
    queueIndex = numWriteQueues + numReadQueues + scanBalancer.getNextQueue();
  }
  ...
{code}

So yes, there is a way forward by utilizing the scan attribute for this purpose without having to add an API to Operation. Having said that, looking at the isWriteRequest method in the same class, I see that things can get gnarly/brittle/inefficient.

{code}
private boolean isWriteRequest(final RequestHeader header, final Message param) {
  // TODO: Is there a better way to do this?
  if (param instanceof MultiRequest) {
    MultiRequest multi = (MultiRequest) param;
    for (RegionAction regionAction : multi.getRegionActionList()) {
      for (Action action : regionAction.getActionList()) {
        if (action.hasMutation()) {
          return true;
        }
      }
    }
  }
{code}

So the ideal would be an API generic enough to let clients mark read/write requests for whatever they would want to do with them on the server side.

was (Author: samarthjain):
Thanks for the pointer [~anoop.hbase]. I see that there is already an executor, RWQueueRpcExecutor, that gets hold of the ScanRequest in the dispatch method.

{code}
private boolean isScanRequest(final RequestHeader header, final Message param) {
  if (param instanceof ScanRequest) {
    // The first scan request will be executed as a "short read"
    ScanRequest request = (ScanRequest) param;
    return request.hasScannerId();
  }
  return false;
}

@Override
public boolean dispatch(final CallRunner callTask) throws InterruptedException {
  ...
  if (numScanQueues > 0 && isScanRequest(call.getHeader(), call.param)) {
    queueIndex = numWriteQueues + numReadQueues + scanBalancer.getNextQueue();
  }
  ...
{code}

So yes, there is a way forward by utilizing the scan attribute for this purpose without having to add an API to Operation. Having said that, looking at the isWriteRequest method in the same class, I see that things can get gnarly/brittle/inefficient.

{code}
private boolean isWriteRequest(final RequestHeader header, final Message param) {
  // TODO: Is there a better way to do this?
  if (param instanceof MultiRequest) {
    MultiRequest multi = (MultiRequest) param;
    for (RegionAction regionAction : multi.getRegionActionList()) {
      for (Action action : regionAction.getActionList()) {
        if (action.hasMutation()) {
          return true;
        }
      }
    }
  }
{code}

> Support fairness across parallelized scans
> ------------------------------------------
>
>                 Key: HBASE-12790
>                 URL: https://issues.apache.org/jira/browse/HBASE-12790
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>              Labels: Phoenix
>         Attachments: AbstractRoundRobinQueue.java, HBASE-12790_1.patch, HBASE-12790_5.patch, HBASE-12790_callwrapper.patch, HBASE-12790.patch, HBASE-12790_trunk_1.patch, PHOENIX_4.5.3-HBase-0.98-2317-SNAPSHOT.zip
>
> Some HBase clients parallelize the execution of a scan to reduce latency in getting back results. This can lead to starvation with a loaded cluster and interleaved scans, since the RPC queue will be ordered and processed on a FIFO basis. For example, consider two clients, A & B, that submit largish scans at the same time. Say each scan is broken down into 100 scans by the client (broken down into equal-depth chunks along the row key), and the 100 scans of client A are queued first, followed immediately by the 100 scans of client B. In this case, client B will be starved out of getting any results back until the scans for client A complete.
> One solution to this is to use the attached AbstractRoundRobinQueue instead of the standard FIFO queue. The queue to be used could be (maybe it already is) configurable based on a new config parameter. Using this queue would require the client to have the same identifier for all of the 100 parallel scans that represent a single logical scan from the client's point of view. With this information, the round robin queue would pick off a task from the queue in a round robin fashion (instead of a strictly FIFO manner) to prevent starvation over interleaved parallelized scans.
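The round-robin idea in the description above can be sketched as follows. This is a minimal, illustrative model, not the attached AbstractRoundRobinQueue: tasks are bucketed by a caller-supplied group id (e.g. "all chunks of one logical scan"), and poll() rotates across groups so one client's 100 queued scans cannot starve another client's.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Round-robin-by-group queue sketch (illustrative; not HBase code).
public class RoundRobinQueueSketch<E> {
    // One FIFO sub-queue per group, plus the rotation order of group ids.
    private final Map<String, ArrayDeque<E>> groups = new HashMap<>();
    private final ArrayDeque<String> order = new ArrayDeque<>();

    public synchronized void offer(String groupId, E element) {
        ArrayDeque<E> q = groups.get(groupId);
        if (q == null) {
            q = new ArrayDeque<>();
            groups.put(groupId, q);
            order.addLast(groupId);   // new group joins the back of the rotation
        }
        q.addLast(element);
    }

    public synchronized E poll() {
        String g = order.pollFirst();
        if (g == null) {
            return null;              // nothing queued
        }
        ArrayDeque<E> q = groups.get(g);
        E e = q.pollFirst();
        if (q.isEmpty()) {
            groups.remove(g);         // drained group leaves the rotation
        } else {
            order.addLast(g);         // otherwise it yields to the other groups
        }
        return e;
    }
}
```

With tasks a1, a2 from client A queued before b1, b2 from client B, poll() interleaves a1, b1, a2, b2 — exactly the fairness property the description asks for, where a plain FIFO queue would drain all of A first.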
[jira] [Commented] (HBASE-12790) Support fairness across parallelized scans
[ https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900259#comment-15900259 ]

Samarth Jain commented on HBASE-12790:
--

Thanks for the pointer [~anoop.hbase]. I see that there is already an executor, RWQueueRpcExecutor, that gets hold of the ScanRequest in the dispatch method.

{code}
private boolean isScanRequest(final RequestHeader header, final Message param) {
  if (param instanceof ScanRequest) {
    // The first scan request will be executed as a "short read"
    ScanRequest request = (ScanRequest) param;
    return request.hasScannerId();
  }
  return false;
}

@Override
public boolean dispatch(final CallRunner callTask) throws InterruptedException {
  ...
  if (numScanQueues > 0 && isScanRequest(call.getHeader(), call.param)) {
    queueIndex = numWriteQueues + numReadQueues + scanBalancer.getNextQueue();
  }
  ...
{code}

So yes, there is a way forward by utilizing the scan attribute for this purpose without having to add an API to Operation. Having said that, looking at the isWriteRequest method in the same class, I see that things can get gnarly/brittle/inefficient.

{code}
private boolean isWriteRequest(final RequestHeader header, final Message param) {
  // TODO: Is there a better way to do this?
  if (param instanceof MultiRequest) {
    MultiRequest multi = (MultiRequest) param;
    for (RegionAction regionAction : multi.getRegionActionList()) {
      for (Action action : regionAction.getActionList()) {
        if (action.hasMutation()) {
          return true;
        }
      }
    }
  }
{code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-12790) Support fairness across parallelized scans
[ https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899725#comment-15899725 ]

Samarth Jain commented on HBASE-12790:
--

[~anoop.hbase] - I don't see an obvious way of getting hold of the scan attribute at the RpcScheduler level. Maybe I am missing something?
[jira] [Commented] (HBASE-12790) Support fairness across parallelized scans
[ https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898836#comment-15898836 ]

Samarth Jain commented on HBASE-12790:
--

[~lhofhansl] - just having a pluggable RpcScheduler won't do it. Assuming we are targeting only queries for now, we need to be able to tag/mark the scans belonging to a query with the same identifier. This information will then be used by the round robin queue to round-robin across the groups.

{code}
@Override
public boolean dispatch(final CallRunner callTask) throws InterruptedException {
  return roundRobinQueue.offer(callTask);
}
{code}
[jira] [Comment Edited] (HBASE-17716) Formalize Scan Metric names
[ https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897909#comment-15897909 ]

Samarth Jain edited comment on HBASE-17716 at 3/6/17 7:47 PM:
--

Question though - should we be addressing all the metrics that are exposed by HBase in this patch and see if they can be annotated with a @Metric annotation on a case-by-case basis? What about metrics in classes annotated as @Private? Or should we just address scan metrics in this patch and file a subsequent JIRA for the rest of the metrics?

was (Author: samarthjain):
Question though - should we be addressing all the metrics that are exposed by HBase in this patch and see if they can be annotated with a @Metric annotation on a case-by-case basis? What about metrics in classes annotated as @Private?

> Formalize Scan Metric names
> ---------------------------
>
>                 Key: HBASE-17716
>                 URL: https://issues.apache.org/jira/browse/HBASE-17716
>             Project: HBase
>          Issue Type: Bug
>          Components: metrics
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>            Priority: Minor
>         Attachments: HBASE-17716.patch
>
> HBase provides various metrics through the APIs exposed by the ScanMetrics class. The JIRA PHOENIX-3248 requires them to be surfaced through the Phoenix Metrics API. Currently these metrics are referred to via hard-coded strings, which are not formal and can break the Phoenix API. Hence we need to refactor the code to assign enums for these metrics.
[jira] [Commented] (HBASE-17716) Formalize Scan Metric names
[ https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897909#comment-15897909 ] Samarth Jain commented on HBASE-17716: -- Question though - should we be addressing all the metrics that are exposed by HBase in this patch and see if they can be annotated with a @Metric annotation on a case by case basis? What about metrics in classes annotated as @Private ?
[jira] [Commented] (HBASE-17716) Formalize Scan Metric names
[ https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897904#comment-15897904 ] Samarth Jain commented on HBASE-17716: -- +1 to the approach you have adopted, [~saint@gmail.com]. This is enough for us - metric names exposed as constants, with the _METRIC_NAME suffix as convention and having them annotated as @Metric to denote that they shouldn't be changed.
[jira] [Commented] (HBASE-17716) Formalize Scan Metric names
[ https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895423#comment-15895423 ] Samarth Jain commented on HBASE-17716: -- +1 for @InterfaceAudience.Public(HBaseInterfaceAudience.METRIC)
[jira] [Commented] (HBASE-17716) Formalize Scan Metric names
[ https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894766#comment-15894766 ]

Samarth Jain commented on HBASE-17716:
--

bq. Phoenix could read the MBeans.

Are the scan metrics in MBeans aggregated for all the scans or are they at a per-scan level?

bq. So, enums. Is the above the only advantage you see to enum'ing all of our metrics? Its minor, no? You have some perf stats to go along w/ above?

It's not a big deal really. We could have metric names defined as public static final String too, with their declarations annotated with the @Metric annotation to tell HBase developers not to change these names. No perf stats as such. I was going more by the javadoc of EnumMap:

{code}
 * Enum maps are represented internally as arrays. This representation
 * is extremely compact and efficient.
{code}

I was thinking more in terms of map collisions and having to expand the map when more metrics get added. Not sure if the argument would apply here though, since the number of metrics is limited.

bq. And would this be just for the way phoenix accesses the metrics or could hbase benefit too?

One benefit I can think of is that in the protobuf serialization, if we have metrics as enums, we can get away with just serializing the ordinal position of the enum instead of sending the entire metric string. It will save us a few bytes on the network.
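To make the EnumMap point above concrete, here is a small sketch of enum-keyed counters. The Metric constant names below are illustrative placeholders, not HBase's actual ScanMetrics fields: an EnumMap is backed by a plain array indexed by enum ordinal, so there is no hashing, no collisions, and no resizing, and pre-creating a counter per constant removes any need for a setCounter()-style grab-bag of arbitrary names.

```java
import java.util.EnumMap;
import java.util.concurrent.atomic.AtomicLong;

// Enum-keyed metric counters (illustrative names, not HBase's ScanMetrics).
public class ScanMetricsSketch {
    public enum Metric { ROWS_SCANNED, ROWS_FILTERED, BYTES_IN_RESULTS }

    // Array-backed map keyed by Metric ordinal: compact and collision-free.
    private final EnumMap<Metric, AtomicLong> counters = new EnumMap<>(Metric.class);

    public ScanMetricsSketch() {
        // Pre-create every counter up front, so callers can only touch
        // well-known metrics; there is no grab-bag of arbitrary strings.
        for (Metric m : Metric.values()) {
            counters.put(m, new AtomicLong());
        }
    }

    public void add(Metric m, long delta) {
        counters.get(m).addAndGet(delta);
    }

    public long get(Metric m) {
        return counters.get(m).get();
    }
}
```

One caveat on the ordinal-serialization idea floated in the comment: shipping ordinals across the wire only stays safe if new enum constants are ever appended at the end, never inserted or reordered.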
[jira] [Commented] (HBASE-17716) Formalize Scan Metric names
[ https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893600#comment-15893600 ]

Samarth Jain commented on HBASE-17716:
--

In Phoenix, we have a framework where we collect metrics for every SQL statement. The idea of PHOENIX-3248 is to include the scan metrics collected by HBase to provide information on how much work the scans are doing for a SQL statement. I am guessing the dump in JMX or the metrics page is probably just some aggregated info, so it may not be useful for us.

bq. Is there a history of our randomly changing metric names out from under phoenix (other than at say key junctions such as a major release?).

Well, this is the first time that we are exposing HBase scan metrics via Phoenix. We would like to have these metric names as constants to give users the capability of looking up metrics of their choice via static metric names.

bq. And if enum'ing has a value, should we do it for all metrics rather than just a subset as here?

Enums are just convenient. It could very well be strings defined as public static final :). With enums, though, we can use an EnumMap, which is more compact and performant than a HashMap.
[jira] [Commented] (HBASE-17716) Formalize Scan Metric names
[ https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892816#comment-15892816 ]

Samarth Jain commented on HBASE-17716:
--

[~saint@gmail.com] - the general idea behind this JIRA was to make sure that users of scan metrics like Phoenix can guard against a metric name getting changed behind the scenes. The metric keys aren't exposed today unless someone goes and looks at the HBase source code. Having a metric enum formalizes the contract of the API instead of relying on plain strings.

On a side note, I am not sure of the motivation behind exposing the setCounter() like methods in the ServerSideScanMetrics class. Was it intended to be like a grab-bag where clients can add and update whatever metrics they would like to? If not, then we should really get rid of such methods and simply initialize the backing map by creating counters for all the constants in the Metric enum.
[jira] [Resolved] (HBASE-17714) Client heartbeats seems to be broken
[ https://issues.apache.org/jira/browse/HBASE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain resolved HBASE-17714.
--
    Resolution: Not A Bug

> Client heartbeats seems to be broken
> ------------------------------------
>
>                 Key: HBASE-17714
>                 URL: https://issues.apache.org/jira/browse/HBASE-17714
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Samarth Jain
>
> We have a test in Phoenix where we introduce an artificial sleep of 2 times the RPC timeout in the preScannerNext() hook of a coprocessor.
> {code}
> public static class SleepingRegionObserver extends SimpleRegionObserver {
>     public SleepingRegionObserver() {}
>
>     @Override
>     public boolean preScannerNext(final ObserverContext<RegionCoprocessorEnvironment> c,
>             final InternalScanner s, final List<Result> results,
>             final int limit, final boolean hasMore) throws IOException {
>         try {
>             if (SLEEP_NOW && c.getEnvironment().getRegion().getRegionInfo().getTable().getNameAsString().equals(TABLE_NAME)) {
>                 Thread.sleep(RPC_TIMEOUT * 2);
>             }
>         } catch (InterruptedException e) {
>             throw new IOException(e);
>         }
>         return super.preScannerNext(c, s, results, limit, hasMore);
>     }
> }
> {code}
> This test was passing fine till 1.1.3 but started failing sometime before 1.1.9 with an OutOfOrderScannerException. See PHOENIX-3702. [~lhofhansl] mentioned that we have client heartbeats enabled and that should prevent us from running into issues like this. FYI, this test fails with the 1.2.3 version of HBase too.
> CC [~apurtell], [~jamestaylor]
[jira] [Commented] (HBASE-17714) Client heartbeats seems to be broken
[ https://issues.apache.org/jira/browse/HBASE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892681#comment-15892681 ]

Samarth Jain commented on HBASE-17714:
--

This eventually turned out to be an issue in the test. With HBase 1.1.4 or before, the test was passing because the RPC timeout wasn't honored, which was fixed in 1.1.5. With 1.1.5 and beyond, this test started acting up because the setting it should actually have been overriding was the server-side scanner timeout (hbase.client.scanner.timeout).
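For reference, the two timeouts in play can be overridden together. Below is an illustrative hbase-site.xml fragment, assuming HBase 1.x key names (the full key for the scanner timeout is hbase.client.scanner.timeout.period; the values shown are arbitrary):

```xml
<!-- Illustrative fragment; key names assume HBase 1.x, values are arbitrary -->
<property>
  <!-- RPC timeout the test's artificial sleep was tripping -->
  <name>hbase.rpc.timeout</name>
  <value>60000</value>
</property>
<property>
  <!-- scanner timeout that actually needed overriding on the server side -->
  <name>hbase.client.scanner.timeout.period</name>
  <value>120000</value>
</property>
```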
[jira] [Comment Edited] (HBASE-17716) Formalize Scan Metric names
[ https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891425#comment-15891425 ]

Samarth Jain edited comment on HBASE-17716 at 3/2/17 1:35 AM:
--

[~tedyu] - I see that the ServerSideScanMetrics class is marked as @InterfaceStability.Evolving. Would it be a bad thing if we broke compatibility in the next minor release? Today, by exposing the setCounter() API the way we have, we are letting users supply arbitrary counter names. Such random metrics wouldn't really be of use since the code would never update them. So IMHO it is better to enforce metric types via an enum. If we want this for older versions of HBase, I guess we can just have constant strings defined in the ScanMetrics or ServerSideScanMetrics classes.

was (Author: samarthjain):
[~tedyu] - I see that the ServerSideScanMetrics class is marked as @InterfaceStability.Evolving. Would it be a bad thing if we break compatibility for the next minor release? Today, the way we have exposed the setCounter() api, we are letting users supply random counter names. Such random metrics wouldn't really be of use since the code would never update them. So IMHO it is better to enforce metric types via enum.
[jira] [Commented] (HBASE-17716) Formalize Scan Metric names
[ https://issues.apache.org/jira/browse/HBASE-17716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891425#comment-15891425 ] Samarth Jain commented on HBASE-17716: -- [~tedyu] - I see that the ServerSideScanMetrics class is marked as @InterfaceStability.Evolving. Would it be a bad thing if we break compatibility for the next minor release? Today, the way we have exposed the setCounter() api, we are letting users supply random counter names. Such random metrics wouldn't really be of use since the code would never update them. So IMHO it is better to enforce metric types via enum.
[jira] [Commented] (HBASE-17714) Client heartbeats seems to be broken
[ https://issues.apache.org/jira/browse/HBASE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891301#comment-15891301 ] Samarth Jain commented on HBASE-17714: -- Thanks for the investigation, [~apurtell]. I will make config changes in the test to increase the frequency of heartbeat checks and to see if enabling renewing leases would help. For the latter, my guess is that it wouldn't help because the call to renew lease is synchronized from the client side and would be blocked till scanner.next() returns.
[jira] [Created] (HBASE-17714) Client heartbeats seems to be broken
Samarth Jain created HBASE-17714:
--
             Summary: Client heartbeats seems to be broken
                 Key: HBASE-17714
                 URL: https://issues.apache.org/jira/browse/HBASE-17714
             Project: HBase
          Issue Type: Bug
            Reporter: Samarth Jain

We have a test in Phoenix where we introduce an artificial sleep of 2 times the RPC timeout in the preScannerNext() hook of a coprocessor.

{code}
public static class SleepingRegionObserver extends SimpleRegionObserver {
    public SleepingRegionObserver() {}

    @Override
    public boolean preScannerNext(final ObserverContext<RegionCoprocessorEnvironment> c,
            final InternalScanner s, final List<Result> results,
            final int limit, final boolean hasMore) throws IOException {
        try {
            if (SLEEP_NOW && c.getEnvironment().getRegion().getRegionInfo().getTable().getNameAsString().equals(TABLE_NAME)) {
                Thread.sleep(RPC_TIMEOUT * 2);
            }
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
        return super.preScannerNext(c, s, results, limit, hasMore);
    }
}
{code}

This test was passing fine till 1.1.3 but started failing sometime before 1.1.9 with an OutOfOrderScannerException. See PHOENIX-3702. [~lhofhansl] mentioned that we have client heartbeats enabled and that should prevent us from running into issues like this. FYI, this test fails with the 1.2.3 version of HBase too.

CC [~apurtell], [~jamestaylor]
[jira] [Commented] (HBASE-17300) Concurrently calling checkAndPut with expected value as null returns true unexpectedly
[ https://issues.apache.org/jira/browse/HBASE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746291#comment-15746291 ]

Samarth Jain commented on HBASE-17300:
--

Hmm. Sounds like the documentation needs to be updated?

{code}
/**
 * Creates a new table.
 * Synchronous operation.
 *
 * @param desc table descriptor for table
 *
 * @throws IllegalArgumentException if the table name is reserved
 * @throws MasterNotRunningException if master is not running
 * @throws TableExistsException if table already exists (If concurrent
 * threads, the table may have been created between test-for-existence
 * and attempt-at-creation).
 * @throws IOException if a remote or network exception occurs
 */
public void createTable(HTableDescriptor desc)
{code}

> Concurrently calling checkAndPut with expected value as null returns true unexpectedly
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-17300
>                 URL: https://issues.apache.org/jira/browse/HBASE-17300
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.23, 1.2.4
>            Reporter: Samarth Jain
>         Attachments: HBASE-17300.patch
>
> Attached is the test case. I have added some comments so hopefully the test makes sense. It actually is causing test failures on the Phoenix branches. The test fails consistently using HBase-0.98.23. It exhibits flappy behavior with the 1.2 branch (failed twice in 5 tries).
> {code}
> @Test
> public void testNullCheckAndPut() throws Exception {
>     try (HBaseAdmin admin = TEST_UTIL.getHBaseAdmin()) {
>         Callable<Boolean> c1 = new CheckAndPutCallable();
>         Callable<Boolean> c2 = new CheckAndPutCallable();
>         ExecutorService e = Executors.newFixedThreadPool(5);
>         Future<Boolean> f1 = e.submit(c1);
>         Future<Boolean> f2 = e.submit(c2);
>         assertTrue(f1.get() || f2.get());
>         assertFalse(f1.get() && f2.get());
>     }
> }
>
> private static final class CheckAndPutCallable implements Callable<Boolean> {
>     @Override
>     public Boolean call() throws Exception {
>         byte[] rowToLock = "ROW".getBytes();
>         byte[] colFamily = "COLUMN_FAMILY".getBytes();
>         byte[] column = "COLUMN".getBytes();
>         byte[] newValue = "NEW_VALUE".getBytes();
>         byte[] oldValue = "OLD_VALUE".getBytes();
>         byte[] tableName = "table".getBytes();
>         boolean acquired = false;
>         try (HBaseAdmin admin = TEST_UTIL.getHBaseAdmin()) {
>             HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf(tableName));
>             HColumnDescriptor columnDesc = new HColumnDescriptor(colFamily);
>             columnDesc.setTimeToLive(600);
>             tableDesc.addFamily(columnDesc);
>             try {
>                 admin.createTable(tableDesc);
>             } catch (TableExistsException e) {
>                 // ignore
>             }
>             try (HTableInterface table = admin.getConnection().getTable(tableName)) {
>                 Put put = new Put(rowToLock);
>                 put.add(colFamily, column, oldValue); // add a row with column set to oldValue
>                 table.put(put);
>                 put = new Put(rowToLock);
>                 put.add(colFamily, column, newValue);
>                 // only one of the threads should be able to get a return value of true
>                 // for the expected value of oldValue
>                 acquired = table.checkAndPut(rowToLock, colFamily, column, oldValue, put);
>                 if (!acquired) {
>                     // if a thread didn't get true before, then it shouldn't get true this
>                     // time either, because the column DOES exist
>                     acquired = table.checkAndPut(rowToLock, colFamily, column, null, put);
>                 }
>             }
>         }
>         return acquired;
>     }
> }
> {code}
> cc [~apurtell], [~jamestaylor].
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
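The race the test probes is easier to see against a plain-Java model of the checkAndPut contract. The sketch below is not HBase code: a ConcurrentHashMap stands in for a single column cell, and the class and method names are invented for illustration. With a null expected value, the put must succeed only while the cell is absent; once the cell exists, it must always fail.

```java
import java.util.concurrent.ConcurrentHashMap;

// Model of the checkAndPut contract (illustrative, not the HBase API).
// One map entry stands in for one column cell.
public class CheckAndPutSemantics {
    private final ConcurrentHashMap<String, String> cells = new ConcurrentHashMap<>();

    // expected == null: the put may only win if the cell does not exist yet.
    // Otherwise it is an atomic compare-and-set against the current value.
    public boolean checkAndPut(String row, String expected, String value) {
        if (expected == null) {
            return cells.putIfAbsent(row, value) == null;
        }
        return cells.replace(row, expected, value);
    }

    public static void main(String[] args) {
        CheckAndPutSemantics table = new CheckAndPutSemantics();
        boolean seeded = table.checkAndPut("ROW", null, "OLD_VALUE");        // cell absent: true
        boolean cas    = table.checkAndPut("ROW", "OLD_VALUE", "NEW_VALUE"); // matches: true
        boolean again  = table.checkAndPut("ROW", "OLD_VALUE", "NEW_VALUE"); // stale expected: false
        // The call the bug report is about: the cell now exists, so a null
        // expected value must never return true.
        boolean onNull = table.checkAndPut("ROW", null, "NEW_VALUE");
        System.out.println(seeded + " " + cas + " " + again + " " + onNull); // true true false false
    }
}
```

The test's final assertFalse corresponds to `onNull` here; under the reported bug, that null-expected checkAndPut intermittently returns true on a cell that already exists.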
[jira] [Commented] (HBASE-17300) Concurrently calling checkAndPut with expected value as null returns true unexpectedly
[ https://issues.apache.org/jira/browse/HBASE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746258#comment-15746258 ] Samarth Jain commented on HBASE-17300: -- Thanks for taking a look, [~apurtell]. You are right: the test passes consistently if I remove the table creation code from the Callable. So it does look like there is a race between table creation and other operations on it. I can try and see if I can adjust the Phoenix upgrade code to not do the table creation. But it does sound like there is some bug lurking here, since table creation is a synchronous operation?
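Independent of any server-side bug, the test itself can avoid racing createTable against the operations that follow it. Below is a minimal plain-Java sketch of a "create once, everyone waits until ready" pattern, with invented names (a map plus a latch stand in for the master's table registry; this is not the HBase admin API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// "Create once, then wait until usable" (illustrative, not the HBase API).
// Catching TableExistsException and charging ahead lets the losing thread
// operate on a table the winner has not finished creating; gating every
// caller on a readiness latch does not.
public class CreateIfAbsent {
    private static final ConcurrentHashMap<String, CountDownLatch> TABLES =
            new ConcurrentHashMap<>();

    static boolean exists(String name) {
        return TABLES.containsKey(name);
    }

    static void ensureTable(String name) throws InterruptedException {
        CountDownLatch ready = new CountDownLatch(1);
        CountDownLatch existing = TABLES.putIfAbsent(name, ready);
        if (existing == null) {
            // Won the race: perform the (simulated) creation, then publish it.
            ready.countDown();
        } else {
            // Lost the race: block until the creator has finished.
            existing.await();
        }
    }

    public static void main(String[] args) throws Exception {
        Runnable r = () -> {
            try {
                ensureTable("T");
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        };
        Thread t1 = new Thread(r);
        Thread t2 = new Thread(r);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("table ready for both callers: " + exists("T"));
    }
}
```

In the real test this would amount to creating the table once in a @BeforeClass hook rather than inside each Callable, so both threads only ever see a fully created table.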
[jira] [Commented] (HBASE-17300) Concurrently calling checkAndPut with expected value as null returns true unexpectedly
[ https://issues.apache.org/jira/browse/HBASE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744453#comment-15744453 ] Samarth Jain commented on HBASE-17300: -- I changed code in Phoenix land to pass null for the existing value, which is when I found this bug. The test fails with 0.98.17 too. I went back to 0.98.15 and the test fails there as well, so it looks like this issue has been around for a while.
[jira] [Commented] (HBASE-17300) Concurrently calling checkAndPut with expected value as null returns true unexpectedly
[ https://issues.apache.org/jira/browse/HBASE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744438#comment-15744438 ] Samarth Jain commented on HBASE-17300: -- Thanks! Updated the patch to use TEST_UTIL.getHBaseAdmin(). Hopefully this will make it easy to add as an IT test.
[jira] [Updated] (HBASE-17300) Concurrently calling checkAndPut with expected value as null returns true unexpectedly
[ https://issues.apache.org/jira/browse/HBASE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated HBASE-17300: - Description: Attached is the test case. I have added some comments so hopefully the test makes sense. It actually is causing test failures on the Phoenix branches. The test fails consistently using HBase-0.98.23. It exhibits flappy behavior with the 1.2 branch (failed twice in 5 tries). {code} @Test public void testNullCheckAndPut() throws Exception { try (HBaseAdmin admin = TEST_UTIL.getHBaseAdmin()) { Callable c1 = new CheckAndPutCallable(); Callable c2 = new CheckAndPutCallable(); ExecutorService e = Executors.newFixedThreadPool(5); Future f1 = e.submit(c1); Future f2 = e.submit(c2); assertTrue(f1.get() || f2.get()); assertFalse(f1.get() && f2.get()); } } } private static final class CheckAndPutCallable implements Callable { @Override public Boolean call() throws Exception { byte[] rowToLock = "ROW".getBytes(); byte[] colFamily = "COLUMN_FAMILY".getBytes(); byte[] column = "COLUMN".getBytes(); byte[] newValue = "NEW_VALUE".getBytes(); byte[] oldValue = "OLD_VALUE".getBytes(); byte[] tableName = "table".getBytes(); boolean acquired = false; try (HBaseAdmin admin = TEST_UTIL.getHBaseAdmin()) { HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf(tableName)); HColumnDescriptor columnDesc = new HColumnDescriptor(colFamily); columnDesc.setTimeToLive(600); tableDesc.addFamily(columnDesc); try { admin.createTable(tableDesc); } catch (TableExistsException e) { // ignore } try (HTableInterface table = admin.getConnection().getTable(tableName)) { Put put = new Put(rowToLock); put.add(colFamily, column, oldValue); // add a row with column set to oldValue table.put(put); put = new Put(rowToLock); put.add(colFamily, column, newValue); // only one of the threads should be able to get return value of true for the expected value of oldValue acquired = table.checkAndPut(rowToLock, colFamily, column, oldValue, 
put); if (!acquired) { // if a thread didn't get true before, then it shouldn't get true this time either // because the column DOES exist acquired = table.checkAndPut(rowToLock, colFamily, column, null, put); } } } } return acquired; } } {code} cc [~apurtell], [~jamestaylor], [~lhofhansl]. was: Attached is the test case. I have added some comments so hopefully the test makes sense. It actually is causing test failures on the Phoenix branches. PS - I am using a bit of Phoenix API to get hold of HBaseAdmin. But it should be fairly straightforward to adopt it for HBase IT tests. The test fails consistently using HBase-0.98.23. It exhibits flappy behavior with the 1.2 branch (failed twice in 5 tries). {code} @Test public void testNullCheckAndPut() throws Exception { try (Connection conn = DriverManager.getConnection(getUrl())) { try (HBaseAdmin admin = conn.unwrap(PhoenixConnection.class).getQueryServices().getAdmin()) { Callable c1 = new CheckAndPutCallable(); Callable c2 = new CheckAndPutCallable(); ExecutorService e = Executors.newFixedThreadPool(5); Future f1 = e.submit(c1); Future f2 = e.submit(c2); assertTrue(f1.get() || f2.get()); assertFalse(f1.get() && f2.get()); } } } private static final class CheckAndPutCallable implements Callable { @Override public Boolean call() throws Exception { byte[] rowToLock = "ROW".getBytes(); byte[] colFamily = "COLUMN_FAMILY".getBytes(); byte[] column = "COLUMN".getBytes(); byte[] newValue = "NEW_VALUE".getBytes(); byte[] oldValue = "OLD_VALUE".getBytes(); byte[] tableName = "table".getBytes(); boolean acquired = false; try (Connection conn = DriverManager.getConnection(getUrl())) { try (HBaseAdmin admin = conn.unwrap(PhoenixConnection.class).getQueryServices().getAdmin()) {
[jira] [Created] (HBASE-17300) Concurrently calling checkAndPut with expected value as null returns true unexpectedly
Samarth Jain created HBASE-17300: Summary: Concurrently calling checkAndPut with expected value as null returns true unexpectedly Key: HBASE-17300 URL: https://issues.apache.org/jira/browse/HBASE-17300 Project: HBase Issue Type: Bug Reporter: Samarth Jain Attached is the test case. I have added some comments so hopefully the test makes sense. It actually is causing test failures on the Phoenix branches. PS - I am using a bit of Phoenix API to get hold of HBaseAdmin. But it should be fairly straightforward to adopt it for HBase IT tests. The test fails consistently using HBase-0.98.23. It exhibits flappy behavior with the 1.2 branch (failed twice in 5 tries). {code} @Test public void testNullCheckAndPut() throws Exception { try (Connection conn = DriverManager.getConnection(getUrl())) { try (HBaseAdmin admin = conn.unwrap(PhoenixConnection.class).getQueryServices().getAdmin()) { Callable c1 = new CheckAndPutCallable(); Callable c2 = new CheckAndPutCallable(); ExecutorService e = Executors.newFixedThreadPool(5); Future f1 = e.submit(c1); Future f2 = e.submit(c2); assertTrue(f1.get() || f2.get()); assertFalse(f1.get() && f2.get()); } } } private static final class CheckAndPutCallable implements Callable { @Override public Boolean call() throws Exception { byte[] rowToLock = "ROW".getBytes(); byte[] colFamily = "COLUMN_FAMILY".getBytes(); byte[] column = "COLUMN".getBytes(); byte[] newValue = "NEW_VALUE".getBytes(); byte[] oldValue = "OLD_VALUE".getBytes(); byte[] tableName = "table".getBytes(); boolean acquired = false; try (Connection conn = DriverManager.getConnection(getUrl())) { try (HBaseAdmin admin = conn.unwrap(PhoenixConnection.class).getQueryServices().getAdmin()) { HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf(tableName)); HColumnDescriptor columnDesc = new HColumnDescriptor(colFamily); columnDesc.setTimeToLive(600); tableDesc.addFamily(columnDesc); try { admin.createTable(tableDesc); } catch (TableExistsException e) { // ignore } try 
(HTableInterface table = admin.getConnection().getTable(tableName)) { Put put = new Put(rowToLock); put.add(colFamily, column, oldValue); // add a row with column set to oldValue table.put(put); put = new Put(rowToLock); put.add(colFamily, column, newValue); // only one of the threads should be able to get return value of true for the expected value of oldValue acquired = table.checkAndPut(rowToLock, colFamily, column, oldValue, put); if (!acquired) { // if a thread didn't get true before, then it shouldn't get true this time either // because the column DOES exist acquired = table.checkAndPut(rowToLock, colFamily, column, null, put); } } } } return acquired; } } {code} cc [~apurtell], [~jamestaylor], [~lhofhansl]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17122) Change in behavior when creating a scanner for a disabled table
[ https://issues.apache.org/jira/browse/HBASE-17122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684239#comment-15684239 ] Samarth Jain commented on HBASE-17122: -- Yes, the problem is with just 0.98. [~stack] - the behavior changed starting with 0.98.21. [~apurtell], I had to add a really hacky workaround in Phoenix to deal with this bug. But if the plan is to freeze the 0.98 branch soon, then I am OK with not having to fix this. > Change in behavior when creating a scanner for a disabled table > --- > > Key: HBASE-17122 > URL: https://issues.apache.org/jira/browse/HBASE-17122 > Project: HBase > Issue Type: Bug >Reporter: Samarth Jain > > {code} > @Test > public void testQueryingDisabledTable() throws Exception { > try (Connection conn = DriverManager.getConnection(getUrl())) { > String tableName = generateUniqueName(); > conn.createStatement().execute( > "CREATE TABLE " + tableName > + " (k1 VARCHAR NOT NULL, k2 VARCHAR, CONSTRAINT PK > PRIMARY KEY(K1,K2)) "); > try (HBaseAdmin admin = > conn.unwrap(PhoenixConnection.class).getQueryServices().getAdmin()) { > admin.disableTable(Bytes.toBytes(tableName)); > } > String query = "SELECT * FROM " + tableName + " WHERE 1=1"; > try (Connection conn2 = DriverManager.getConnection(getUrl())) { > try (ResultSet rs = > conn2.createStatement().executeQuery(query)) { > assertFalse(rs.next()); > } > } > } > } > {code} > This is a Phoenix-specific test case. I will try and come up with something > using the HBase API. But the gist is that with HBase 0.98.21 and beyond, we > are seeing that creating a scanner is throwing a NotServingRegionException. 
> Stacktrace for NotServingRegionException > {code} > org.apache.phoenix.exception.PhoenixIOException: > org.apache.phoenix.exception.PhoenixIOException: callTimeout=120, > callDuration=9000104: row '' on table 'T01' at > region=T01,,1479429739864.643dde31cc19b549192576eea7791a6f., > hostname=localhost,60022,1479429692090, seqNum=1 > at > org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:113) > at > org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:752) > at > org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:696) > at > org.apache.phoenix.iterate.ConcatResultIterator.getIterators(ConcatResultIterator.java:50) > at > org.apache.phoenix.iterate.ConcatResultIterator.currentIterator(ConcatResultIterator.java:97) > at > org.apache.phoenix.iterate.ConcatResultIterator.next(ConcatResultIterator.java:117) > at > org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:778) > at > org.apache.phoenix.end2end.PhoenixRuntimeIT.testQueryingDisabledTable(PhoenixRuntimeIT.java:167) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) > at >
[jira] [Commented] (HBASE-17122) Change in behavior when creating a scanner for a disabled table
[ https://issues.apache.org/jira/browse/HBASE-17122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675309#comment-15675309 ] Samarth Jain commented on HBASE-17122: -- [~gjacoby]
[jira] [Created] (HBASE-17122) Change in behavior when creating a scanner for a disabled table
Samarth Jain created HBASE-17122: Summary: Change in behavior when creating a scanner for a disabled table Key: HBASE-17122 URL: https://issues.apache.org/jira/browse/HBASE-17122 Project: HBase Issue Type: Bug Reporter: Samarth Jain {code} @Test public void testQueryingDisabledTable() throws Exception { try (Connection conn = DriverManager.getConnection(getUrl())) { String tableName = generateUniqueName(); conn.createStatement().execute( "CREATE TABLE " + tableName + " (k1 VARCHAR NOT NULL, k2 VARCHAR, CONSTRAINT PK PRIMARY KEY(K1,K2)) "); try (HBaseAdmin admin = conn.unwrap(PhoenixConnection.class).getQueryServices().getAdmin()) { admin.disableTable(Bytes.toBytes(tableName)); } String query = "SELECT * FROM " + tableName + " WHERE 1=1"; try (Connection conn2 = DriverManager.getConnection(getUrl())) { try (ResultSet rs = conn2.createStatement().executeQuery(query)) { assertFalse(rs.next()); } } } } {code} This is a Phoenix-specific test case. I will try and come up with something using the HBase API. But the gist is that with HBase 0.98.21 and beyond, we are seeing that creating a scanner is throwing a NotServingRegionException. 
Stacktrace for NotServingRegionException {code} org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: callTimeout=120, callDuration=9000104: row '' on table 'T01' at region=T01,,1479429739864.643dde31cc19b549192576eea7791a6f., hostname=localhost,60022,1479429692090, seqNum=1 at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:113) at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:752) at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:696) at org.apache.phoenix.iterate.ConcatResultIterator.getIterators(ConcatResultIterator.java:50) at org.apache.phoenix.iterate.ConcatResultIterator.currentIterator(ConcatResultIterator.java:97) at org.apache.phoenix.iterate.ConcatResultIterator.next(ConcatResultIterator.java:117) at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:778) at org.apache.phoenix.end2end.PhoenixRuntimeIT.testQueryingDisabledTable(PhoenixRuntimeIT.java:167) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Caused by: java.util.concurrent.ExecutionException: org.apache.phoenix.exception.PhoenixIOException: callTimeout=120,
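One way to insulate a client from this behavior difference is to consult table state explicitly instead of depending on which exception the scanner path throws (a client-side TableNotEnabledException on older 0.98, versus retries that eventually surface a NotServingRegionException on 0.98.21+). Below is a plain-Java sketch of that defensive pattern with invented names (an enum map stands in for cluster state; this is not the HBase client API):

```java
import java.util.concurrent.ConcurrentHashMap;

// Fail fast on a disabled table instead of relying on the scanner path's
// exception behavior. Illustrative only, not the HBase API.
public class ScannerGuard {
    enum State { ENABLED, DISABLED }

    private final ConcurrentHashMap<String, State> states = new ConcurrentHashMap<>();

    void setState(String table, State s) {
        states.put(table, s);
    }

    // Check state up front so the caller gets one predictable error,
    // regardless of how the underlying scanner call would have failed.
    String openScanner(String table) {
        if (states.get(table) != State.ENABLED) {
            throw new IllegalStateException("table " + table + " is not enabled");
        }
        return "scanner:" + table;
    }

    public static void main(String[] args) {
        ScannerGuard cluster = new ScannerGuard();
        cluster.setState("T01", State.ENABLED);
        System.out.println(cluster.openScanner("T01")); // scanner:T01
        cluster.setState("T01", State.DISABLED);
        try {
            cluster.openScanner("T01");
        } catch (IllegalStateException e) {
            System.out.println("fast failure: " + e.getMessage());
        }
    }
}
```

The trade-off is an extra state check per scan and a window where the state can change between check and scan, so the scanner call still needs its own error handling; the check just makes the common disabled-table case fail quickly and uniformly across versions.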
[jira] [Updated] (HBASE-17096) checkAndMutateApi doesn't work correctly on 0.98.19+
[ https://issues.apache.org/jira/browse/HBASE-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated HBASE-17096: - Summary: checkAndMutateApi doesn't work correctly on 0.98.19+ (was: checkAndMutateApi doesn't work correctly on 0.98.23) > checkAndMutateApi doesn't work correctly on 0.98.19+ > > > Key: HBASE-17096 > URL: https://issues.apache.org/jira/browse/HBASE-17096 > Project: HBase > Issue Type: Bug >Reporter: Samarth Jain > > Below is the test case. It uses some Phoenix APIs for getting hold of admin > and HConnection but should be easily adopted for an HBase IT test. The second > checkAndMutate should return false but it is returning true. This test fails > with HBase-0.98.23 and works fine with HBase-0.98.17 > {code} > @Test > public void testCheckAndMutateApi() throws Exception { > byte[] row = Bytes.toBytes("ROW"); > byte[] tableNameBytes = Bytes.toBytes(generateUniqueName()); > byte[] family = Bytes.toBytes(generateUniqueName()); > byte[] qualifier = Bytes.toBytes("QUALIFIER"); > byte[] oldValue = null; > byte[] newValue = Bytes.toBytes("VALUE"); > Put put = new Put(row); > put.add(family, qualifier, newValue); > try (Connection conn = DriverManager.getConnection(getUrl())) { > PhoenixConnection phxConn = conn.unwrap(PhoenixConnection.class); > try (HBaseAdmin admin = phxConn.getQueryServices().getAdmin()) { > HTableDescriptor tableDesc = new HTableDescriptor( > TableName.valueOf(tableNameBytes)); > HColumnDescriptor columnDesc = new HColumnDescriptor(family); > columnDesc.setTimeToLive(120); > tableDesc.addFamily(columnDesc); > admin.createTable(tableDesc); > HTableInterface tableDescriptor = > admin.getConnection().getTable(tableNameBytes); > assertTrue(tableDescriptor.checkAndPut(row, family, > qualifier, oldValue, put)); > Delete delete = new Delete(row); > RowMutations mutations = new RowMutations(row); > mutations.add(delete); > assertTrue(tableDescriptor.checkAndMutate(row, family, > qualifier, 
CompareOp.EQUAL, newValue, mutations)); > assertFalse(tableDescriptor.checkAndMutate(row, family, > qualifier, CompareOp.EQUAL, newValue, mutations)); > } > } > } > {code} > FYI, [~apurtell], [~jamestaylor], [~lhofhansl]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
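The contract the test's final assertFalse relies on can be modeled in a few lines of plain Java (a map stands in for one cell; the class and method names are invented, not the HBase API): once a successful checkAndMutate has deleted the row, an identical second call must fail, because the compared cell no longer exists.

```java
import java.util.concurrent.ConcurrentHashMap;

// Model of checkAndMutate-with-delete (illustrative, not the HBase API).
public class CheckAndMutateSemantics {
    private final ConcurrentHashMap<String, String> cells = new ConcurrentHashMap<>();

    void put(String row, String value) {
        cells.put(row, value);
    }

    // Delete the row iff its current value equals the expected value;
    // ConcurrentHashMap#remove(key, value) is exactly that compare-and-delete.
    boolean checkAndDelete(String row, String expected) {
        return cells.remove(row, expected);
    }

    public static void main(String[] args) {
        CheckAndMutateSemantics table = new CheckAndMutateSemantics();
        table.put("ROW", "VALUE");
        boolean first  = table.checkAndDelete("ROW", "VALUE"); // cell matches: true
        boolean second = table.checkAndDelete("ROW", "VALUE"); // cell gone: must be false
        System.out.println(first + " " + second); // true false
    }
}
```

In the report, the second checkAndMutate(..., CompareOp.EQUAL, newValue, mutations) corresponds to `second` here: on 0.98.19+ it unexpectedly returns true even though the preceding mutation deleted the row being compared.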
[jira] [Commented] (HBASE-17096) checkAndMutateApi doesn't work correctly on 0.98.23
[ https://issues.apache.org/jira/browse/HBASE-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666087#comment-15666087 ]

Samarth Jain commented on HBASE-17096:
--------------------------------------
If it helps, the test fails starting with HBase 0.98.19. I will update the title accordingly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-17096) checkAndMutateApi doesn't work correctly on 0.98.23
Samarth Jain created HBASE-17096:
------------------------------------

             Summary: checkAndMutateApi doesn't work correctly on 0.98.23
                 Key: HBASE-17096
                 URL: https://issues.apache.org/jira/browse/HBASE-17096
             Project: HBase
          Issue Type: Bug
            Reporter: Samarth Jain

Below is the test case. It uses some Phoenix APIs for getting hold of admin and HConnection but should be easily adapted for an HBase IT test. The second checkAndMutate should return false but it is returning true. This test fails with HBase 0.98.23 and works fine with HBase 0.98.17.

{code}
@Test
public void testCheckAndMutateApi() throws Exception {
    byte[] row = Bytes.toBytes("ROW");
    byte[] tableNameBytes = Bytes.toBytes(generateUniqueName());
    byte[] family = Bytes.toBytes(generateUniqueName());
    byte[] qualifier = Bytes.toBytes("QUALIFIER");
    byte[] oldValue = null;
    byte[] newValue = Bytes.toBytes("VALUE");
    Put put = new Put(row);
    put.add(family, qualifier, newValue);
    try (Connection conn = DriverManager.getConnection(getUrl())) {
        PhoenixConnection phxConn = conn.unwrap(PhoenixConnection.class);
        try (HBaseAdmin admin = phxConn.getQueryServices().getAdmin()) {
            HTableDescriptor tableDesc = new HTableDescriptor(
                TableName.valueOf(tableNameBytes));
            HColumnDescriptor columnDesc = new HColumnDescriptor(family);
            columnDesc.setTimeToLive(120);
            tableDesc.addFamily(columnDesc);
            admin.createTable(tableDesc);
            HTableInterface table =
                admin.getConnection().getTable(tableNameBytes);
            assertTrue(table.checkAndPut(row, family, qualifier,
                oldValue, put));
            Delete delete = new Delete(row);
            RowMutations mutations = new RowMutations(row);
            mutations.add(delete);
            assertTrue(table.checkAndMutate(row, family, qualifier,
                CompareOp.EQUAL, newValue, mutations));
            assertFalse(table.checkAndMutate(row, family, qualifier,
                CompareOp.EQUAL, newValue, mutations));
        }
    }
}
{code}

FYI, [~apurtell], [~jamestaylor], [~lhofhansl].

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071147#comment-15071147 ]

Samarth Jain commented on HBASE-14822:
--------------------------------------
[~lhofhansl] - to be clear, I did test the patch. Phoenix queries which would have failed with lease timeout exceptions are now passing, so functionally your patch works. However, the patch ended up causing an inadvertent performance regression. Calling renewLease() ends up increasing the nextCallSeq too. The subsequent OutOfOrderScannerNextException thrown is handled silently (once) by the ClientScanner#loadCache code, which ends up setting the callable object to null:

{code}
if (e instanceof OutOfOrderScannerNextException) {
    if (retryAfterOutOfOrderException) {
        retryAfterOutOfOrderException = false;
    } else {
        // TODO: Why wrap this in a DNRIOE when it already is a DNRIOE?
        throw new DoNotRetryIOException("Failed after retry of "
            + "OutOfOrderScannerNextException: was there a rpc timeout?", e);
    }
}
// Clear region.
this.currentRegion = null;
{code}

I am calling renewLease() and scanner.next() on the same ClientScanner in different threads. However, I have proper synchronization in place that makes sure I am not calling both at the same time. It doesn't seem like a concurrency issue, as I can reproduce this behavior consistently.

> Renewing leases of scanners doesn't work
> ----------------------------------------
>
>                 Key: HBASE-14822
>                 URL: https://issues.apache.org/jira/browse/HBASE-14822
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14
>            Reporter: Samarth Jain
>            Assignee: Lars Hofhansl
>             Fix For: 2.0.0, 1.3.0, 1.2.1, 1.1.3, 0.98.17, 1.0.4
>
>         Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, 14822-v3-0.98.txt, 14822-v4-0.98.txt, 14822-v4.txt, 14822-v5-0.98.txt, 14822-v5-1.3.txt, 14822-v5.txt, 14822.txt, HBASE-14822_98_nextseq.diff
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
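The usage pattern described in the comment — one thread draining the scanner, another periodically renewing its lease, with a lock guaranteeing the two calls never overlap — can be sketched roughly as below. `ToyScanner` is an illustrative stand-in for HBase's scanner/renewLease pair, not the real API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for a scanner with a renewable lease (not the HBase API).
class ToyScanner {
    final AtomicInteger nexts = new AtomicInteger();
    final AtomicInteger renewals = new AtomicInteger();
    void next() { nexts.incrementAndGet(); }
    void renewLease() { renewals.incrementAndGet(); }
}

public class RenewWhileScanning {
    // Returns {number of next() calls, number of renewals} once both threads finish.
    static int[] run() throws InterruptedException {
        final ToyScanner scanner = new ToyScanner();
        final Object lock = new Object(); // serializes next() and renewLease()

        // Renewal thread: renews the lease a few times, never concurrently with next().
        Thread renewer = new Thread(() -> {
            for (int i = 0; i < 8; i++) {
                synchronized (lock) { scanner.renewLease(); }
            }
        });
        renewer.start();

        // Scanning thread (here: the caller) drains the scanner under the same lock.
        for (int i = 0; i < 100; i++) {
            synchronized (lock) { scanner.next(); }
        }
        renewer.join();
        return new int[] { scanner.nexts.get(), scanner.renewals.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counts = run();
        System.out.println("nexts=" + counts[0] + " renewals=" + counts[1]);
    }
}
```

With this serialization in place, the failure the comment reports cannot be a data race between the two calls — which is why the regression points at shared sequencing state (nextCallSeq) rather than at concurrency.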
[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated HBASE-14822:
---------------------------------
    Attachment: HBASE-14822_98_nextseq.diff

[~lhofhansl] - I tried out the latest on 0.98, and it looks like there are some more issues lurking with lease renewal. I noticed that on the region server side I was still getting the following message even though I made sure Phoenix was calling renewLease() for the scanners:

INFO [RS:0;localhost:55383.leaseChecker] org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener(2633): Scanner 59 lease expired on region

After a bit of digging around, it turns out that the lease renewal is actually causing the regular scan() to fail and vice-versa. This is because renewLease ends up also increasing the nextCallSeq member variable in the ScannerCallable object. There are checks in place in the HRegionServer class that cause an OutOfOrderScannerNextException to be thrown because the nextSeq didn't match.
See this stacktrace:

{code}
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 2 But the nextCallSeq got from client: 10; request=scanner_id: 56 number_of_rows: 2 close_scanner: false next_call_seq: 10 renew: false
	at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3277)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31190)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2149)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
	at java.lang.Thread.run(Thread.java:745)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:298)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:216)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:58)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:115)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:91)
	at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:387)
	at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:340)
	at org.apache.phoenix.iterate.ScanningResultIterator.next(ScanningResultIterator.java:57)
	at org.apache.phoenix.iterate.TableResultIterator.next(TableResultIterator.java:112)
	at ...
{code}

In this case the number of times renewLease() was called was 8, which also happens to be the difference between the expected nextCallSeq (2) and the actual nextCallSeq (10). This error isn't surfaced to the clients though, because the HBase client ends up creating a new scanner altogether behind the scenes. One possible simple fix (in the attached patch) would be to not increment the nextCallSeq when renewing the lease. FWIW, after this change, I no longer see the OutOfOrderScannerNextException, and the INFO message about scanner lease expiration is also gone.

> Renewing leases of scanners doesn't work
> ----------------------------------------
>
>                 Key: HBASE-14822
>                 URL: https://issues.apache.org/jira/browse/HBASE-14822
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14
>            Reporter: Samarth Jain
>            Assignee: Lars Hofhansl
>             Fix For: 2.0.0, 1.3.0, 1.2.1, 1.1.3, 0.98.17, 1.0.4
>
>         Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, 14822-v3-0.98.txt, 14822-v4-0.98.txt, 14822-v4.txt, 14822-v5-0.98.txt, 14822-v5-1.3.txt, 14822-v5.txt, 14822.txt, HBASE-14822_98_nextseq.diff
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
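The sequencing bug behind that stack trace can be modeled in a few lines: the server rejects a next() whose call sequence doesn't match its own counter, and if the client-side callable also bumps its counter on every lease renewal, the client's sequence runs ahead by exactly the number of renewals (here 8, turning an expected 2 into 10). The class names below are invented for illustration, not the real HRegionServer/ScannerCallable code:

```java
// Toy model of scanner call sequencing (illustrative, not the real HBase classes).
public class NextCallSeqModel {
    static class Server {
        long expectedSeq = 0;
        void next(long clientSeq) {
            if (clientSeq != expectedSeq) {
                throw new IllegalStateException("Expected nextCallSeq: " + expectedSeq
                        + " But the nextCallSeq got from client: " + clientSeq);
            }
            expectedSeq++; // only real next() calls advance the server's counter
        }
        void renewLease() { /* renewals do not advance the server's counter */ }
    }

    // 2 next() calls, then 8 lease renewals, then a 3rd next().
    static boolean scanSucceeds(boolean renewBumpsClientSeq) {
        Server server = new Server();
        long clientSeq = 0;
        for (int i = 0; i < 2; i++) { server.next(clientSeq); clientSeq++; }
        for (int i = 0; i < 8; i++) {
            server.renewLease();
            if (renewBumpsClientSeq) clientSeq++; // the bug: renew also bumps the client's counter
        }
        try {
            server.next(clientSeq); // buggy client sends 10; server still expects 2
            return true;
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("renew bumps seq (buggy): scan ok? " + scanSucceeds(true));    // false
        System.out.println("renew leaves seq alone (patched): scan ok? " + scanSucceeds(false)); // true
    }
}
```

Leaving the counter untouched during renewals, as the attached patch proposes, keeps the two sides in sync.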
[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014275#comment-15014275 ] Samarth Jain commented on HBASE-14822: -- The latest patch looks good [~lhofhansl]. I no longer see UnknownScannerException in the logs. > Renewing leases of scanners doesn't work > > > Key: HBASE-14822 > URL: https://issues.apache.org/jira/browse/HBASE-14822 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14 >Reporter: Samarth Jain >Assignee: Lars Hofhansl > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated HBASE-14822:
---------------------------------
    Attachment: 14822-0.98-v4.txt

Patch with checkstyle errors fixed. The diff looks more complex than it really is because of indentation changes. I have added the below check on top of Lars's patch in HRegionServer.java:

{code}
+          if (rows > 0) {
            // Limit the initial allocation of the result array to the minimum
            // of 'rows' or 100
{code}

> Renewing leases of scanners doesn't work
> ----------------------------------------
>
>                 Key: HBASE-14822
>                 URL: https://issues.apache.org/jira/browse/HBASE-14822
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14
>            Reporter: Samarth Jain
>            Assignee: Lars Hofhansl
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
>         Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98-v4.txt, 14822-0.98.txt
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated HBASE-14822:
---------------------------------
    Attachment: 14822-0.98-v3.txt

Looks like with the changes made in the v2 patch, scanners were getting removed because they were not returning any rows (which they are not supposed to). The attached patch fixes that.

> Renewing leases of scanners doesn't work
> ----------------------------------------
>
>                 Key: HBASE-14822
>                 URL: https://issues.apache.org/jira/browse/HBASE-14822
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14
>            Reporter: Samarth Jain
>            Assignee: Lars Hofhansl
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
>         Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010544#comment-15010544 ]

Samarth Jain commented on HBASE-14822:
--------------------------------------
Getting the same issue with the patch applied on the latest 0.98 branch. Relevant part of the stacktrace:

{code}
Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException: 60073ms passed since the last invocation, timeout is currently set to 6
	at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
	at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:338)
	at org.apache.phoenix.iterate.ScanningResultIterator.next(ScanningResultIterator.java:55)
	... 12 more
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 37176, already closed?
	at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3222)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31068)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
	at java.lang.Thread.run(Thread.java:745)
	at sun.reflect.GeneratedConstructorAccessor59.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:298)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:214)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:58)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:115)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:91)
	at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:385)
	... 14 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.UnknownScannerException): org.apache.hadoop.hbase.UnknownScannerException: Name: 37176, already closed?
	at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3222)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31068)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
	at java.lang.Thread.run(Thread.java:745)
	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1489)
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1691)
	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1750)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:31514)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:173)
	... 18 more
{code}

> Renewing leases of scanners doesn't work
> ----------------------------------------
>
>                 Key: HBASE-14822
>                 URL: https://issues.apache.org/jira/browse/HBASE-14822
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14
>            Reporter: Samarth Jain
>            Assignee: Lars Hofhansl
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
>         Attachments: 14822-0.98-v2.txt, 14822-0.98.txt
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012082#comment-15012082 ] Samarth Jain commented on HBASE-14822: -- Just to be clear, I ran into UnknownScannerException *without* the client calling renewLease on the scanners. So it was the original phoenix code with the patched HBase 0.98. > Renewing leases of scanners doesn't work > > > Key: HBASE-14822 > URL: https://issues.apache.org/jira/browse/HBASE-14822 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14 >Reporter: Samarth Jain >Assignee: Lars Hofhansl > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98-v4.txt, > 14822-0.98.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work
[ https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010426#comment-15010426 ]

Samarth Jain commented on HBASE-14822:
--------------------------------------
After applying the patch that [~lhofhansl] uploaded on 0.98.14 (and not 0.98.17), I am seeing a lot of "org.apache.hadoop.hbase.regionserver.HRegionServer(3220): Client tried to access missing scanner" messages on the console when Phoenix is trying to create the SYSTEM.CATALOG table. The table creation eventually fails with the error: "org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: 60001ms passed since the last invocation, timeout is currently set to 6". Things go back to normal after I revert the patch. I will recheck with 0.98.17 too just to be sure.

> Renewing leases of scanners doesn't work
> ----------------------------------------
>
>                 Key: HBASE-14822
>                 URL: https://issues.apache.org/jira/browse/HBASE-14822
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14
>            Reporter: Samarth Jain
>            Assignee: Lars Hofhansl
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
>         Attachments: 14822-0.98-v2.txt, 14822-0.98.txt
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-14822) Renewing leases of scanners doesn't work
Samarth Jain created HBASE-14822: Summary: Renewing leases of scanners doesn't work Key: HBASE-14822 URL: https://issues.apache.org/jira/browse/HBASE-14822 Project: HBase Issue Type: Bug Affects Versions: 0.98.14 Reporter: Samarth Jain Assignee: Lars Hofhansl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12911) Client-side metrics
[ https://issues.apache.org/jira/browse/HBASE-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735612#comment-14735612 ]

Samarth Jain commented on HBASE-12911:
--------------------------------------
Thanks for pointing to the Phoenix jira, [~ndimiduk]! Summary of what we discussed via email: As part of its metrics collection, Phoenix provides some metrics around the scans/puts it executes - for example, the number of bytes read/sent over to HBase. If we could surface these additional metrics that reflect the work done by client-initiated scans and puts, it would be a really nice addition to the metrics that Phoenix reports at a statement and global client level. The ScanMetrics class looks like a perfect fit for this, but it doesn't look like ScanMetrics are exposed via a client API currently. Maybe this work fits HBASE-14381 better?

> Client-side metrics
> -------------------
>
>                 Key: HBASE-12911
>                 URL: https://issues.apache.org/jira/browse/HBASE-12911
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, Operability, Performance
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: 0001-HBASE-12911-Client-side-metrics.patch, 0001-HBASE-12911-Client-side-metrics.patch, 0001-HBASE-12911-Client-side-metrics.patch, 0001-HBASE-12911-Client-side-metrics.patch, am.jpg, client metrics RS-Master.jpg, client metrics client.jpg, conn_agg.jpg, connection attributes.jpg, ltt.jpg, standalone.jpg
>
>
> There's very little visibility into the hbase client. Folks who care to add some kind of metrics collection end up wrapping Table method invocations with {{System.currentTimeMillis()}}. For a crude example of this, have a look at what I did in {{PerformanceEvaluation}} for exposing requests latencies up to {{IntegrationTestRegionReplicaPerf}}. The client is quite complex, there's a lot going on under the hood that is impossible to see right now without a profiler. Being a crucial part of the performance of this distributed system, we should have deeper visibility into the client's function.
> I'm not sure that wiring into the hadoop metrics system is the right choice because the client is often embedded as a library in a user's application. We should have integration with our metrics tools so that, i.e., a client embedded in a coprocessor can report metrics through the usual RS channels, or a client used in a MR job can do the same.
> I would propose an interface-based system with pluggable implementations. Out of the box we'd include a hadoop-metrics implementation and one other, possibly [dropwizard/metrics|https://github.com/dropwizard/metrics]. Thoughts?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
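The "interface-based system with pluggable implementations" proposed in the issue could look roughly like the sketch below. All names here are invented for illustration (this is not the API that eventually shipped); the in-memory implementation stands in for a dropwizard- or hadoop-metrics-backed one, which would plug in through the same interface:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical pluggable client-metrics SPI (illustrative only).
interface ClientMetrics {
    void incrementCounter(String name, long delta);
    void updateTimer(String name, long nanos);

    // Default sink for clients that don't configure a backend.
    ClientMetrics NOOP = new ClientMetrics() {
        public void incrementCounter(String name, long delta) {}
        public void updateTimer(String name, long nanos) {}
    };
}

// Simple in-memory backend; other backends would implement the same interface.
class InMemoryClientMetrics implements ClientMetrics {
    final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    public void incrementCounter(String name, long delta) {
        counters.computeIfAbsent(name, k -> new LongAdder()).add(delta);
    }
    public void updateTimer(String name, long nanos) {
        incrementCounter(name + ".calls", 1);
        incrementCounter(name + ".totalNanos", nanos);
    }
    long counter(String name) {
        LongAdder a = counters.get(name);
        return a == null ? 0 : a.sum();
    }
}

public class MetricsSketch {
    public static void main(String[] args) {
        InMemoryClientMetrics metrics = new InMemoryClientMetrics();
        // A client wrapper would time each RPC and report through the interface:
        long start = System.nanoTime();
        // ... issue the scan/put here ...
        metrics.updateTimer("scan.rpc", System.nanoTime() - start);
        metrics.incrementCounter("scan.bytesRead", 1024);
        System.out.println("scan RPCs: " + metrics.counter("scan.rpc.calls"));
        System.out.println("bytes read: " + metrics.counter("scan.bytesRead"));
    }
}
```

Because the client only sees the interface, an embedded client (e.g. inside a coprocessor or an MR job) can be handed whichever implementation reports through the channels available in that environment, which is the portability concern the issue raises about hardwiring hadoop-metrics.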