vansimonsen opened a new issue #2797:
URL: https://github.com/apache/hudi/issues/2797


   **Describe the problem you faced**
   
   * Creating unpartitioned tables in the Hive metastore (AWS Glue Data Catalog) with Hudi fails during Hive sync (tested on `0.6.0`, `0.7.0` and `0.8.0`)
   * Running Hudi on AWS EMR with PySpark
   
    * Hudi config for unpartitioned tables
    ```
    hudiConfig = {
        "hoodie.datasource.write.precombine.field": <column>,
        "hoodie.datasource.write.recordkey.field": _PRIMARY_KEY_COLUMN,
        "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
        "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.NonPartitionedExtractor",
        "hoodie.datasource.write.hive_style_partitioning": "true",
        "className": "org.apache.hudi",
        "hoodie.datasource.hive_sync.use_jdbc": "false",
        "hoodie.consistency.check.enabled": "true",
        "hoodie.datasource.hive_sync.database": DB_NAME,
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.support_timestamp": "true",
    }
    ```
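
   For context, this config is passed through a standard PySpark DataFrame write. A minimal sketch of the kind of write call that triggers the failure (the table name, S3 path and `df` are placeholders, not the actual job):

    ```python
    # Minimal sketch, assuming a SparkSession with the Hudi bundle on the classpath.
    # Table name and S3 base path below are hypothetical placeholders.
    tableName = "my_unpartitioned_table"
    basePath = "s3://my-bucket/hudi/my_unpartitioned_table"

    writeConfig = dict(hudiConfig)
    writeConfig["hoodie.table.name"] = tableName
    writeConfig["hoodie.datasource.hive_sync.table"] = tableName

    (df.write
        .format("hudi")          # or "org.apache.hudi" with older bundles
        .options(**writeConfig)
        .mode("overwrite")
        .save(basePath))
    ```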
    
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run Hudi with Hive integration (AWS Glue Data Catalog) enabled
   2. Try to create an unpartitioned table with the config specified above
   
   **Expected behavior**
   
   The table should be created without throwing the exception, and without any partition or a `default` partition path
   
   **Environment Description**
   
   * Hudi version : `0.6.0`, `0.7.0` and `0.8.0`
   
   * Spark version :  `2.4.7` 
   
   * Hive version : AWS Glue Data Catalog integration on EMR
   
   * Hadoop version : Amazon Hadoop distribution
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   
    ```
    org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20210407181606
        at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:496)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:150)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
        at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:355)
        at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:403)
        at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:399)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
        at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
        at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:217)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
        at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
        at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
        at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:168)
        at org.apache.hadoop.fs.Path.<init>(Path.java:180)
        at org.apache.hadoop.hive.metastore.Warehouse.getDatabasePath(Warehouse.java:172)
        at org.apache.hadoop.hive.metastore.Warehouse.getTablePath(Warehouse.java:184)
        at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:520)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateUnpartitionedTableStatsFast(MetaStoreUtils.java:180)
        at com.amazonaws.glue.shims.AwsGlueSparkHiveShims.updateTableStatsFast(AwsGlueSparkHiveShims.java:62)
        at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:552)
        at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:400)
        at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:385)
        at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:494)
        ... 46 more
    ```
   
   

