[jira] [Created] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)

2020-06-19 Thread Istvan Fajth (Jira)
Istvan Fajth created HIVE-23726:
---

 Summary: Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
 Key: HIVE-23726
 URL: https://issues.apache.org/jira/browse/HIVE-23726
 Project: Hive
  Issue Type: Bug
Reporter: Istvan Fajth


- Given:
 metastore.warehouse.tenant.colocation is set to true
 a test database was created with {{create database test location '/data'}}
 - When:
 I try to create a table with {{create table t1 (a int) location '/data/t1'}}
 - Then:
The create table fails with the following exception:
{code}
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143)
	at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148)
	at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98)
	at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
	at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
	at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
	at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
	at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: java.lang.IllegalArgumentException: Can not create a Path from a null string
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63219)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1780)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1767)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3518)
	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:145)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:1052)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:1037)
	at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
{code}
[jira] [Created] (HIVE-21265) Hive miss-uses HBase HConnection object and that puts high load on Zookeeper

2019-02-13 Thread Istvan Fajth (JIRA)
Istvan Fajth created HIVE-21265:
---

 Summary: Hive miss-uses HBase HConnection object and that puts high load on Zookeeper
 Key: HIVE-21265
 URL: https://issues.apache.org/jira/browse/HIVE-21265
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Reporter: Istvan Fajth


When a Hive table is backed by an HBase table, the following access pattern shows up many times in Zookeeper, even for a simple query like "SELECT * FROM table":
- A client connects to Zookeeper
- It checks whether the /hbase ZNode exists
- It reads /hbase/hbaseid
- It closes the connection.

The number of these accesses depends on the amount of data; most likely it correlates with the number of HBase regions.

The same access pattern can be observed in ZK when running the following Java code:
{code}
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class Test {
    public static void main(String[] args) throws Exception {
        Connection c = ConnectionFactory.createConnection();
        c.close();
    }
}
{code}

The problem is that for large tables this creates an enormous number of sessions, and session creation is expensive in ZK. If the table is queried frequently, the ZK transaction log is heavily written, and far more snapshots are created than otherwise because of the volume of createSession/closeSession transactions. In this particular case the Zookeeper data directory was filled with about 24GB of data and almost filled the device under it; ~90% of the data written was createSession and closeSession transactions.

I am not sure what logs I should provide, but reproducing the behaviour is easy enough. With DEBUG level logging enabled in Zookeeper, the logs show what each session reads; these sessions live for 1-5ms at most.

I imagine that the solution is to share the connection object between the mappers if possible: use one connection, as the API documentation of ConnectionFactory suggests, and request table/admin/any other objects from that one connection. Failing that, at least use only one connection object per map/reduce task, and keep it alive for the whole task lifetime.
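The suggested sharing pattern could look roughly like the sketch below. This is an illustration, not code from Hive; the class name, the lazy-initialization scheme, and the table name "t" are assumptions. The HBase API calls (ConnectionFactory.createConnection, Connection.getTable) are the ones the ConnectionFactory documentation recommends reusing:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class SharedConnectionSketch {
    // One heavyweight Connection (and thus one ZK session) per task,
    // created lazily and reused, instead of one short-lived Connection
    // per region access.
    private static volatile Connection connection;

    static Connection getConnection(Configuration conf) throws Exception {
        if (connection == null) {
            synchronized (SharedConnectionSketch.class) {
                if (connection == null) {
                    connection = ConnectionFactory.createConnection(conf);
                }
            }
        }
        return connection;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Connection c = getConnection(conf);
        // Table instances are lightweight; obtain one per use from the
        // shared Connection and close it when done.
        try (Table table = c.getTable(TableName.valueOf("t"))) {
            // ... all scans/gets for the task's lifetime go through here ...
        }
        // Close the shared Connection only once, at task shutdown.
        c.close();
    }
}
{code}

This keeps a single Zookeeper session open for the duration of the task, so the createSession/closeSession churn described above disappears.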



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)