[jira] [Created] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
Istvan Fajth created HIVE-23726:
-----------------------------------

             Summary: Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
                 Key: HIVE-23726
                 URL: https://issues.apache.org/jira/browse/HIVE-23726
             Project: Hive
          Issue Type: Bug
            Reporter: Istvan Fajth

- Given:
  metastore.warehouse.tenant.colocation is set to true,
  and a test database was created as {{create database test location '/data'}}
- When: I try to create a table as {{create table t1 (a int) location '/data/t1'}}
- Then: The create table fails with the following exception:
{code}
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143)
	at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148)
	at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98)
	at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
	at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
	at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
	at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
	at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: java.lang.IllegalArgumentException: Can not create a Path from a null string
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63219)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1780)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1767)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3518)
	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:145)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:1052)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:1037)
	at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at
{code}
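For context, the exception message matches the null-argument check performed by the constructor of Hadoop's {{org.apache.hadoop.fs.Path}}: somewhere in the metastore's create-table path a location string ends up null and is passed to {{new Path(...)}}. A minimal stand-in illustrating that check (this is an illustrative sketch, not the actual Hadoop source):

```java
// Stand-in for the argument validation done by the
// org.apache.hadoop.fs.Path(String) constructor; illustrative only.
public class PathCheck {
    static void checkPathArg(String path) {
        // A null path string is rejected with exactly the message
        // seen in the stack trace above.
        if (path == null) {
            throw new IllegalArgumentException(
                "Can not create a Path from a null string");
        }
    }

    public static void main(String[] args) {
        try {
            checkPathArg(null); // simulates the metastore passing a null location
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
            // prints "Can not create a Path from a null string"
        }
    }
}
```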
[jira] [Created] (HIVE-21265) Hive misuses HBase HConnection object and that puts high load on Zookeeper
Istvan Fajth created HIVE-21265:
-----------------------------------

             Summary: Hive misuses HBase HConnection object and that puts high load on Zookeeper
                 Key: HIVE-21265
                 URL: https://issues.apache.org/jira/browse/HIVE-21265
             Project: Hive
          Issue Type: Bug
          Components: HBase Handler
            Reporter: Istvan Fajth

When there is a table in Hive backed by an HBase table, the following access pattern shows up multiple times in Zookeeper, even for a simple query like "SELECT * FROM table":
- A client connects to Zookeeper
- Checks whether the /hbase ZNode exists
- Reads /hbase/hbaseid
- The client closes the connection.

The number of these accesses depends on the amount of data; most likely it correlates with the number of HBase regions.

The same access pattern can be seen in ZK when one runs the following Java code:
{code}
import org.apache.hadoop.hbase.client.*;

public class Test {
  public static void main(String args[]) throws Exception {
    Connection c = ConnectionFactory.createConnection();
    c.close();
  }
}
{code}

The problem is that for large tables this creates an enormous number of session creations, which is expensive in ZK. If the table is queried often, the ZK transaction log is heavily written, and far more snapshots are created than otherwise due to the volume of createSession/closeSession transactions in Zookeeper. In this particular case about 24GB of data was written and nearly filled the device under the Zookeeper data directory; ~90% of the data written was createSession and closeSession transactions.

I am not sure what logs I should provide, but reproducing the behaviour is easy enough. If one enables DEBUG level logging in Zookeeper, the logs show what is being read by each session. These sessions live for 1-5 ms tops.
I imagine the solution is to share the connection object between the mappers if possible: use one connection, per the suggestion in the API documentation of ConnectionFactory, and request table/admin/any other object from that single connection; or at least use only one connection object per map/reduce task, and make it a longer-living connection that stays open for the whole map/reduce lifetime.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
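The suggestion above can be sketched as a lazily initialized, process-wide connection holder, so that every task in the JVM reuses one connection instead of opening and closing its own. This is a hypothetical sketch, not existing Hive code: the {{SharedHBaseConnection}} holder is invented for illustration, and {{Connection}} here is a tiny stand-in for HBase's {{org.apache.hadoop.hbase.client.Connection}} (real code would call {{ConnectionFactory.createConnection(conf)}} where the stand-in is constructed).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for org.apache.hadoop.hbase.client.Connection so the sketch is
// self-contained; OPENED counts how many connections were actually created.
class Connection implements AutoCloseable {
    static final AtomicInteger OPENED = new AtomicInteger();
    Connection() { OPENED.incrementAndGet(); }
    @Override public void close() { }
}

// Hypothetical holder: one shared connection per JVM, created lazily with
// double-checked locking and closed once via a JVM shutdown hook.
public final class SharedHBaseConnection {
    private static volatile Connection instance;

    private SharedHBaseConnection() { }

    public static Connection get() {
        Connection c = instance;
        if (c == null) {
            synchronized (SharedHBaseConnection.class) {
                if (instance == null) {
                    // Real code: ConnectionFactory.createConnection(conf)
                    final Connection created = new Connection();
                    instance = created;
                    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                        try { created.close(); } catch (Exception ignored) { }
                    }));
                }
                c = instance;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // Two callers (e.g. two mappers in one JVM) get the same connection,
        // so Zookeeper sees one session instead of one per caller.
        Connection a = get();
        Connection b = get();
        System.out.println("same instance: " + (a == b)
                + ", opened: " + Connection.OPENED.get());
        // prints "same instance: true, opened: 1"
    }
}
```

With such a holder, each mapper would request Table/Admin objects from {{SharedHBaseConnection.get()}} rather than creating its own connection, collapsing the per-region createSession/closeSession churn described above into a single long-lived ZK session per JVM.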