vansimonsen opened a new issue #2797:
URL: https://github.com/apache/hudi/issues/2797


   **Describe the problem you faced**
   
   * Creating unpartitioned tables in the Hive metastore (AWS Glue Data Catalog) with Hudi fails during Hive sync (tested on `0.6.0`, `0.7.0` and `0.8.0`)
   * Running Hudi on AWS EMR with PySpark
   
    * Hudi config for unpartitioned tables
    ```
    hudiConfig = {
        "hoodie.datasource.write.precombine.field": <column>,
        "hoodie.datasource.write.recordkey.field": _PRIMARY_KEY_COLUMN,
        "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
        "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.NonPartitionedExtractor",
        "hoodie.datasource.write.hive_style_partitioning": "true",
        "className": "org.apache.hudi",
        "hoodie.datasource.hive_sync.use_jdbc": "false",
        "hoodie.consistency.check.enabled": "true",
        "hoodie.datasource.hive_sync.database": DB_NAME,
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.support_timestamp": "true",
    }
    ```
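
   For context, this config is passed through a standard PySpark DataFrame write. A minimal sketch of the kind of write call that triggers the failure (the table name, S3 path and `df` are placeholders, not the actual job):

    ```python
    # Minimal sketch, assuming a SparkSession with the Hudi bundle on the classpath.
    # Table name and S3 base path below are hypothetical placeholders.
    tableName = "my_unpartitioned_table"
    basePath = "s3://my-bucket/hudi/my_unpartitioned_table"

    writeConfig = dict(hudiConfig)
    writeConfig["hoodie.table.name"] = tableName
    writeConfig["hoodie.datasource.hive_sync.table"] = tableName

    (df.write
        .format("hudi")          # or "org.apache.hudi" with older bundles
        .options(**writeConfig)
        .mode("overwrite")
        .save(basePath))
    ```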
    
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run Hudi with Hive integration (AWS Glue Data Catalog) enabled
   2. Try to create an unpartitioned table with the config specified above
   
   **Expected behavior**
   
   The table should be created without throwing the exception, and without any partition or a `default` partition path
   
   **Environment Description**
   
   * Hudi version : `0.6.0`, `0.7.0` and `0.8.0`
   
   * Spark version :  `2.4.7` 
   
   * Hive version : AWS Glue Data Catalog integration on EMR
   
   * Hadoop version : Amazon Hadoop distribution
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   
    ```
    org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20210407181606
        at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:496)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:150)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
        at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:355)
        at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:403)
        at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:399)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
        at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
        at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:217)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
        at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
        at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
        at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:168)
        at org.apache.hadoop.fs.Path.<init>(Path.java:180)
        at org.apache.hadoop.hive.metastore.Warehouse.getDatabasePath(Warehouse.java:172)
        at org.apache.hadoop.hive.metastore.Warehouse.getTablePath(Warehouse.java:184)
        at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:520)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateUnpartitionedTableStatsFast(MetaStoreUtils.java:180)
        at com.amazonaws.glue.shims.AwsGlueSparkHiveShims.updateTableStatsFast(AwsGlueSparkHiveShims.java:62)
        at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:552)
        at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:400)
        at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:385)
        at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:494)
        ... 46 more
    ```
   
   

