[jira] [Created] (HUDI-3132) Minor fixes for HoodieCatalog

2021-12-29 Thread Danny Chen (Jira)
Danny Chen created HUDI-3132:


 Summary: Minor fixes for HoodieCatalog
 Key: HUDI-3132
 URL: https://issues.apache.org/jira/browse/HUDI-3132
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
 Fix For: 0.11.0, 0.10.1






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-2661) java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy

2021-12-29 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron reassigned HUDI-2661:


Assignee: Forward Xu  (was: Yann Byron)

> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.catalyst.catalog.CatalogTable.copy
> 
>
> Key: HUDI-2661
> URL: https://issues.apache.org/jira/browse/HUDI-2661
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.10.0
>Reporter: Changjun Zhang
>Assignee: Forward Xu
>Priority: Critical
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-11-01-21-47-44-538.png, 
> image-2021-11-01-21-48-22-765.png
>
>
> Hudi integrated with Spark SQL:
> When I add:
> {code:sh}
> // Some comments here
> spark-sql --conf 
> 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
> --conf 
> 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
> {code}
> to create a table on an existing hudi table: 
> {code:sql}
> create table testdb.tb_hudi_operation_test using hudi 
> location '/tmp/flinkdb/datas/tb_hudi_operation';
> {code}
> then this exception is thrown:
>  !image-2021-11-01-21-47-44-538.png|thumbnail! 
>  !image-2021-11-01-21-48-22-765.png|thumbnail! 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-1850) Read on table fails if the first write to table failed

2021-12-29 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron reassigned HUDI-1850:


Assignee: sivabalan narayanan  (was: Yann Byron)

> Read on table fails if the first write to table failed
> --
>
> Key: HUDI-1850
> URL: https://issues.apache.org/jira/browse/HUDI-1850
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.8.0
>Reporter: Vaibhav Sinha
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, release-blocker, 
> sev:high, spark
> Fix For: 0.11.0, 0.10.1
>
> Attachments: Screenshot 2021-04-24 at 7.53.22 PM.png
>
>
> {code:java}
> java.util.NoSuchElementException: No value present in Option
>   at org.apache.hudi.common.util.Option.get(Option.java:88) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromCommitMetadata(TableSchemaResolver.java:215)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:166)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:155)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.MergeOnReadSnapshotRelation.(MergeOnReadSnapshotRelation.scala:65)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:99) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:63) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
>  ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
>  ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at scala.Option.getOrElse(Option.scala:189) 
> ~[scala-library-2.12.10.jar:?]
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
> {code}
> The screenshot shows the files that got created before the write had failed.
>  
> !Screenshot 2021-04-24 at 7.53.22 PM.png!
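
(A minimal illustrative guard, not the actual patch: the bare `Option.get()` above blows up because the first write never produced a completed commit, so one way to fail more gracefully is to check the commit timeline before resolving the schema. `hadoopConf` and `basePath` are assumed to be in scope.)

{code:java}
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.exception.HoodieException;

HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
    .setConf(hadoopConf)      // Hadoop Configuration assumed in scope
    .setBasePath(basePath)    // table base path assumed in scope
    .build();
// Bail out with a clear message when no commit has ever completed, instead of
// letting TableSchemaResolver call Option.get() on an empty Option.
if (metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()) {
  throw new HoodieException("No completed commits under " + basePath
      + "; table schema cannot be resolved yet");
}
{code}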



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] vinothchandar commented on pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-12-29 Thread GitBox


vinothchandar commented on pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#issuecomment-1002904375


   ```
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
125.714 s - in org.apache.hudi.integ.command.ITTestHoodieSyncCommand
   
   [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 1, Time elapsed: 
29.812 s <<< FAILURE! - in org.apache.hudi.integ.ITTestHoodieDemo
   [ERROR] org.apache.hudi.integ.ITTestHoodieDemo.testParquetDemo  Time 
elapsed: 29.622 s  <<< FAILURE!
   org.opentest4j.AssertionFailedError: Command ([hdfs, dfsadmin, -safemode, 
wait]) expected to succeed. Exit (255) ==> expected: <0> but was: <255>
at 
org.apache.hudi.integ.ITTestHoodieDemo.setupDemo(ITTestHoodieDemo.java:167)
at 
org.apache.hudi.integ.ITTestHoodieDemo.testParquetDemo(ITTestHoodieDemo.java:107)
   
   ```
   
   This keeps failing. Could you rebase again with the latest master? I want to try running the tests again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2901) Fixed the bug clustering jobs are not running in parallel

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2901:
-
Sprint: Hudi-Sprint-0.10.1

> Fixed the bug clustering jobs are not running in parallel
> -
>
> Key: HUDI-2901
> URL: https://issues.apache.org/jira/browse/HUDI-2901
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.9.0
> Environment: spark2.4.5
>Reporter: tao meng
>Assignee: tao meng
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> Fixed the bug where clustering jobs are not running in parallel.
> [https://github.com/apache/hudi/issues/4135]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2901) Fixed the bug clustering jobs are not running in parallel

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2901:
-
Story Points: 1

> Fixed the bug clustering jobs are not running in parallel
> -
>
> Key: HUDI-2901
> URL: https://issues.apache.org/jira/browse/HUDI-2901
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.9.0
> Environment: spark2.4.5
>Reporter: tao meng
>Assignee: tao meng
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> Fixed the bug where clustering jobs are not running in parallel.
> [https://github.com/apache/hudi/issues/4135]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2938) Code Refactor: Metadata util to get latest file slices for readers and writers

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2938:
-
Issue Type: Improvement  (was: Task)

> Code Refactor: Metadata util to get latest file slices for readers and writers
> --
>
> Key: HUDI-2938
> URL: https://issues.apache.org/jira/browse/HUDI-2938
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> Need to address review comments for 
> https://issues.apache.org/jira/browse/HUDI-2923
> https://github.com/apache/hudi/pull/4206/files



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-281) HiveSync failure through Spark when useJdbc is set to false

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-281:
---

Assignee: (was: Raymond Xu)

> HiveSync failure through Spark when useJdbc is set to false
> ---
>
> Key: HUDI-281
> URL: https://issues.apache.org/jira/browse/HUDI-281
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Spark Integration, Usability
>Reporter: Udit Mehrotra
>Priority: Major
>  Labels: query-eng, user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>
> Table creation with Hive sync through Spark fails when I set *useJdbc* to 
> *false*. For now I had to modify the code to set *useJdbc* to *false*, as 
> there is no *DataSourceOption* through which I can specify this field when 
> running Hudi code.
> Here is the failure:
> {noformat}
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hive.ql.session.SessionState.start(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/session/SessionState;
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:527)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:517)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:507)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:272)
>   at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:132)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:96)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229){noformat}
> I was expecting this to fail through Spark, because *hive-exec* is not shaded 
> inside *hudi-spark-bundle*, while *HiveConf* is shaded and relocated. This 
> *SessionState* is coming from the spark-hive jar and obviously it does not 
> accept the relocated *HiveConf*.
> We in *EMR* are running into the same problem when trying to integrate with Glue 
> Catalog. For this we have to create the Hive metastore client through 
> *Hive.get(conf).getMsc()* instead of how it is being done now, so that 
> alternate implementations of the metastore can get created. However, because 
> hive-exec is not shaded but HiveConf is relocated, we run into the same issues 
> there.
> It would not be recommended to shade *hive-exec* either because it itself is 
> an Uber jar that shades a lot of things, and all of them would end up in 
> *hudi-spark-bundle* jar. We would not want to head down that route.

[GitHub] [hudi] harsh1231 commented on a change in pull request #4404: [HUDI-2558] Fixing Clustering w/ sort columns with null values fails

2021-12-29 Thread GitBox


harsh1231 commented on a change in pull request #4404:
URL: https://github.com/apache/hudi/pull/4404#discussion_r776596935



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java
##
@@ -55,8 +55,17 @@ public RDDCustomColumnsSortPartitioner(String[] columnNames, 
Schema schema) {
 final String[] sortColumns = this.sortColumnNames;
 final SerializableSchema schema = this.serializableSchema;
 return records.sortBy(
-record -> HoodieAvroUtils.getRecordColumnValues(record, sortColumns, 
schema),
+record -> {
+  Object recordValue = HoodieAvroUtils.getRecordColumnValues(record, 
sortColumns, schema);
+  // null values are replaced with empty string for null_first order
+  if (recordValue == null) {
+return "";

Review comment:
   Will update using `StringUtils.EMPTY_STRING`
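   A rough sketch of that suggestion (not the merged diff; the partition count argument is a stand-in here):
   ```java
   return records.sortBy(
       record -> {
         Object recordValue = HoodieAvroUtils.getRecordColumnValues(record, sortColumns, schema);
         // map null sort keys to the shared empty-string constant so they sort first
         return recordValue == null ? StringUtils.EMPTY_STRING : recordValue.toString();
       },
       true, records.getNumPartitions());
   ```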




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] harsh1231 commented on a change in pull request #4404: [HUDI-2558] Fixing Clustering w/ sort columns with null values fails

2021-12-29 Thread GitBox


harsh1231 commented on a change in pull request #4404:
URL: https://github.com/apache/hudi/pull/4404#discussion_r776596753



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java
##
@@ -55,8 +55,17 @@ public RDDCustomColumnsSortPartitioner(String[] columnNames, 
Schema schema) {
 final String[] sortColumns = this.sortColumnNames;
 final SerializableSchema schema = this.serializableSchema;
 return records.sortBy(
-record -> HoodieAvroUtils.getRecordColumnValues(record, sortColumns, 
schema),
+record -> {
+  Object recordValue = HoodieAvroUtils.getRecordColumnValues(record, 
sortColumns, schema);
+  // null values are replaced with empty string for null_first order
+  if (recordValue == null) {
+return "";

Review comment:
   `if (columns.length == 1) {
     return HoodieAvroUtils.getNestedFieldVal(genericRecord, columns[0], true);
   }`
   This nested call can return a data type other than String, coming from 
   `convertValueForAvroLogicalTypes`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2946) Upgrade maven plugin to make Hudi be compatible with higher Java versions

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2946:
-
Fix Version/s: (was: 0.10.1)

> Upgrade maven plugin to make Hudi be compatible with higher Java versions
> -
>
> Key: HUDI-2946
> URL: https://issues.apache.org/jira/browse/HUDI-2946
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Wenning Ding
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I saw several issues while building Hudi w/ Java 11:
>  
> {{[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar (default) on project 
> hudi-common: Execution default of goal 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar failed: An API 
> incompatibility was encountered while executing 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar: 
> java.lang.ExceptionInInitializerError: null[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade (default) on project 
> hudi-hadoop-mr-bundle: Error creating shaded jar: Problem shading JAR 
> /workspace/workspace/rchertar.bigtop.hudi-rpm-mainline-6.x-0.9.0/build/hudi/rpm/BUILD/hudi-0.9.0-amzn-1-SNAPSHOT/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.9.0-amzn-1-SNAPSHOT.jar
>  entry org/apache/hudi/hadoop/bundle/Main.class: 
> java.lang.IllegalArgumentException -> [Help 1]}}
>  
> We need to upgrade the Maven plugin versions to make it compatible with Java 
> 11.
> Also upgrade dockerfile-maven-plugin to the latest version to support Java 11: 
> [https://github.com/spotify/dockerfile-maven/pull/230]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2426) spark sql extensions breaks read.table from metastore

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2426:
-
Story Points: 1

> spark sql extensions breaks read.table from metastore
> -
>
> Key: HUDI-2426
> URL: https://issues.apache.org/jira/browse/HUDI-2426
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: nicolas paris
>Assignee: Yann Byron
>Priority: Critical
>  Labels: sev:critical, user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>
> When adding the Hudi Spark SQL support, this breaks the ability to read a 
> Hudi table from the metastore in Spark:
>  bash-4.2$ ./spark3.0.2/bin/spark-shell --packages 
> org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0,org.apache.spark:spark-avro_2.12:3.1.2
>  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf 
> 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
>  
> scala> spark.table("default.test_hudi_table").show
> java.lang.UnsupportedOperationException: Unsupported parseMultipartIdentifier 
> method
>  at 
> org.apache.spark.sql.parser.HoodieCommonSqlParser.parseMultipartIdentifier(HoodieCommonSqlParser.scala:65)
>  at org.apache.spark.sql.SparkSession.table(SparkSession.scala:581)
>  ... 47 elided
>  
> Removing the config makes the Hive table readable again from Spark.
> This affects at least Spark 3.0.x and 3.1.x.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2611) `create table if not exists` should print message instead of throwing error

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2611:
-
Story Points: 1

> `create table if not exists` should print message instead of throwing error
> ---
>
> Key: HUDI-2611
> URL: https://issues.apache.org/jira/browse/HUDI-2611
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Yann Byron
>Priority: Critical
>  Labels: user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>
> See details in
> https://github.com/apache/hudi/issues/3845#issue-1033218877



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2661) java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2661:
-
Story Points: 1

> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.catalyst.catalog.CatalogTable.copy
> 
>
> Key: HUDI-2661
> URL: https://issues.apache.org/jira/browse/HUDI-2661
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.10.0
>Reporter: Changjun Zhang
>Assignee: Yann Byron
>Priority: Critical
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-11-01-21-47-44-538.png, 
> image-2021-11-01-21-48-22-765.png
>
>
> Hudi integrated with Spark SQL:
> When I add:
> {code:sh}
> // Some comments here
> spark-sql --conf 
> 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
> --conf 
> 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
> {code}
> to create a table on an existing hudi table: 
> {code:sql}
> create table testdb.tb_hudi_operation_test using hudi 
> location '/tmp/flinkdb/datas/tb_hudi_operation';
> {code}
> then this exception is thrown:
>  !image-2021-11-01-21-47-44-538.png|thumbnail! 
>  !image-2021-11-01-21-48-22-765.png|thumbnail! 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2915) Fix field not found in record error for spark-sql

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2915:
-
Story Points: 1  (was: 2)

> Fix field not found in record error for spark-sql
> -
>
> Key: HUDI-2915
> URL: https://issues.apache.org/jira/browse/HUDI-2915
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-12-02-19-37-10-346.png
>
>
> !image-2021-12-02-19-37-10-346.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1850) Read on table fails if the first write to table failed

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1850:
-
Story Points: 1

> Read on table fails if the first write to table failed
> --
>
> Key: HUDI-1850
> URL: https://issues.apache.org/jira/browse/HUDI-1850
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.8.0
>Reporter: Vaibhav Sinha
>Assignee: Yann Byron
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, release-blocker, 
> sev:high, spark
> Fix For: 0.11.0, 0.10.1
>
> Attachments: Screenshot 2021-04-24 at 7.53.22 PM.png
>
>
> {code:java}
> java.util.NoSuchElementException: No value present in Option
>   at org.apache.hudi.common.util.Option.get(Option.java:88) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromCommitMetadata(TableSchemaResolver.java:215)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:166)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:155)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.MergeOnReadSnapshotRelation.(MergeOnReadSnapshotRelation.scala:65)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:99) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:63) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
>  ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
>  ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at scala.Option.getOrElse(Option.scala:189) 
> ~[scala-library-2.12.10.jar:?]
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
> {code}
> The screenshot shows the files that got created before the write had failed.
>  
> !Screenshot 2021-04-24 at 7.53.22 PM.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3100) Hive Conditional sync cannot be set from deltastreamer

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3100:
-
Story Points: 1

> Hive Conditional sync cannot be set from deltastreamer
> --
>
> Key: HUDI-3100
> URL: https://issues.apache.org/jira/browse/HUDI-3100
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Hive Integration
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2915) Fix field not found in record error for spark-sql

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2915:
-
Story Points: 2

> Fix field not found in record error for spark-sql
> -
>
> Key: HUDI-2915
> URL: https://issues.apache.org/jira/browse/HUDI-2915
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-12-02-19-37-10-346.png
>
>
> !image-2021-12-02-19-37-10-346.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2966) Add TaskCompletionListener for HoodieMergeOnReadRDD to close logScanner

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2966:
-
Story Points: 1

> Add TaskCompletionListener for HoodieMergeOnReadRDD to close logScanner
> ---
>
> Key: HUDI-2966
> URL: https://issues.apache.org/jira/browse/HUDI-2966
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, sev:high
> Fix For: 0.11.0, 0.10.1
>
>
> Add a TaskCompletionListener for HoodieMergeOnReadRDD to close the logScanner when 
> the query is completed.
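
(A minimal Java-flavoured sketch of the idea; the RDD itself is Scala, and `logScanner` below is just a placeholder for the reader the iterator keeps open.)

{code:java}
import org.apache.spark.TaskContext;
import org.apache.spark.util.TaskCompletionListener;

// Close the log scanner when the Spark task finishes, instead of relying on
// the iterator being fully consumed by the query.
TaskContext.get().addTaskCompletionListener(new TaskCompletionListener() {
  @Override
  public void onTaskCompletion(TaskContext context) {
    try {
      if (logScanner != null) {
        logScanner.close();
      }
    } catch (Exception e) {
      // nothing useful to do at task completion; just avoid masking the task result
    }
  }
});
{code}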



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3107) Fix HiveSyncTool drop partitions using JDBC

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3107:
-
Story Points: 1

> Fix HiveSyncTool drop partitions using JDBC
> ---
>
> Key: HUDI-3107
> URL: https://issues.apache.org/jira/browse/HUDI-3107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Yue Zhang
>Assignee: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> ```
>  org.apache.hudi.exception.HoodieException: Unable to delete table partitions 
> in /Users/yuezhang/tmp/hudiAfTable/forecast_agg
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:240)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.main(HoodieDropPartitionsTool.java:212)
>   at HoodieDropPartitionsToolTest.main(HoodieDropPartitionsToolTest.java:31)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
> when hive syncing forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:119)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncHive(HoodieDropPartitionsTool.java:404)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncToHiveIfNecessary(HoodieDropPartitionsTool.java:270)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.doDeleteTablePartitionsEager(HoodieDropPartitionsTool.java:252)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:230)
>   ... 2 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
> partitions for table forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:368)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:202)
>   at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:116)
>   ... 6 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing 
> SQL ALTER TABLE `forecast_agg` DROP PARTITION (20210623/0/20210623)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:64)
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>   at 
> org.apache.hudi.hive.ddl.JDBCExecutor.dropPartitionsToTable(JDBCExecutor.java:149)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.dropPartitionsToTable(HoodieHiveClient.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:363)
>   ... 9 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:62)
>   ... 21 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>   at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>   at 
> 
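
(For reference: Hive's parser only accepts a key=value partition spec in DROP PARTITION, so the slash-joined partition path has to be translated before the statement is built. The column names below are purely hypothetical; the point is the spec format.)

{code:java}
// Rejected by Hive: ALTER TABLE `forecast_agg` DROP PARTITION (20210623/0/20210623)
// Accepted form, with made-up partition columns dt/hh/dd for illustration:
String sql = "ALTER TABLE `forecast_agg` DROP IF EXISTS PARTITION (dt='20210623', hh='0', dd='20210623')";
{code}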

[jira] [Updated] (HUDI-281) HiveSync failure through Spark when useJdbc is set to false

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-281:

Story Points: 1

> HiveSync failure through Spark when useJdbc is set to false
> ---
>
> Key: HUDI-281
> URL: https://issues.apache.org/jira/browse/HUDI-281
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Spark Integration, Usability
>Reporter: Udit Mehrotra
>Assignee: Raymond Xu
>Priority: Major
>  Labels: query-eng, user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>
> Table creation with Hive sync through Spark fails when I set *useJdbc* to 
> *false*. For now I had to modify the code to set *useJdbc* to *false*, as 
> there is no *DataSourceOption* through which I can specify this field when 
> running Hudi code.
> Here is the failure:
> {noformat}
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hive.ql.session.SessionState.start(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/session/SessionState;
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:527)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:517)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:507)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:272)
>   at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:132)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:96)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229){noformat}
> I was expecting this to fail through Spark, because *hive-exec* is not shaded 
> inside *hudi-spark-bundle*, while *HiveConf* is shaded and relocated. This 
> *SessionState* is coming from the spark-hive jar and obviously it does not 
> accept the relocated *HiveConf*.
> We in *EMR* are running into the same problem when trying to integrate with Glue 
> Catalog. For this we have to create the Hive metastore client through 
> *Hive.get(conf).getMsc()* instead of how it is being done now, so that 
> alternate implementations of the metastore can get created. However, because 
> hive-exec is not shaded but HiveConf is relocated, we run into the same issues 
> there.
> It would not be recommended to shade *hive-exec* either because it itself is 
> an Uber jar that shades a lot of things, and all of them would end up in 
> *hudi-spark-bundle* jar. We would not want to head 

[jira] [Assigned] (HUDI-281) HiveSync failure through Spark when useJdbc is set to false

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-281:
---

Assignee: Raymond Xu

> HiveSync failure through Spark when useJdbc is set to false
> ---
>
> Key: HUDI-281
> URL: https://issues.apache.org/jira/browse/HUDI-281
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Spark Integration, Usability
>Reporter: Udit Mehrotra
>Assignee: Raymond Xu
>Priority: Major
>  Labels: query-eng, user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>
> Table creation with Hive sync through Spark fails when I set *useJdbc* to 
> *false*. For now I had to modify the code to set *useJdbc* to *false*, as 
> there is no *DataSourceOption* through which I can specify this field when 
> running Hudi code.
> Here is the failure:
> {noformat}
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hive.ql.session.SessionState.start(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/session/SessionState;
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:527)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:517)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:507)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:272)
>   at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:132)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:96)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229){noformat}
> I was expecting this to fail through Spark, because *hive-exec* is not shaded 
> inside *hudi-spark-bundle*, while *HiveConf* is shaded and relocated. This 
> *SessionState* is coming from the spark-hive jar and obviously it does not 
> accept the relocated *HiveConf*.
> We in *EMR* are running into the same problem when trying to integrate with Glue 
> Catalog. For this we have to create the Hive metastore client through 
> *Hive.get(conf).getMsc()* instead of how it is being done now, so that 
> alternate implementations of the metastore can get created. However, because 
> hive-exec is not shaded but HiveConf is relocated, we run into the same issues 
> there.
> It would not be recommended to shade *hive-exec* either because it itself is 
> an Uber jar that shades a lot of things, and all of them would end up in 
> *hudi-spark-bundle* jar. We would not 

[jira] [Updated] (HUDI-3104) Hudi-kafka-connect can not scan hadoop config files by HADOOP_CONF_DIR

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3104:
-
Status: In Progress  (was: Open)

> Hudi-kafka-connect can not scan hadoop config files by HADOOP_CONF_DIR
> --
>
> Key: HUDI-3104
> URL: https://issues.apache.org/jira/browse/HUDI-3104
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: configs
>Reporter: cdmikechen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> I used hudi-kafka-connect to test pulling Kafka topic data into Hudi. I've built 
> a Kafka Connect docker image with this Dockerfile:
> {code}
> FROM confluentinc/cp-kafka-connect:6.1.1
> RUN confluent-hub install --no-prompt confluentinc/kafka-connect-hdfs:10.1.3
> COPY hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar 
> /usr/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib
> {code}
> When I started this docker container and submitted a task, Hudi reported this 
> error:
> {code}
> [2021-12-27 15:04:55,214] INFO Setting record key volume and partition fields 
> date for table 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topichudi-test-topic
>  (org.apache.hudi.connect.writers.KafkaConnectTransactionServices)
> [2021-12-27 15:04:55,224] INFO Initializing 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topic as hoodie 
> table hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topic 
> (org.apache.hudi.common.table.HoodieTableMetaClient)
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by 
> org.apache.hadoop.security.authentication.util.KerberosUtil 
> (file:/usr/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib/hadoop-auth-2.10.1.jar)
>  to method sun.security.krb5.Config.getInstance()
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.hadoop.security.authentication.util.KerberosUtil
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> [2021-12-27 15:04:55,571] WARN Unable to load native-hadoop library for your 
> platform... using builtin-java classes where applicable 
> (org.apache.hadoop.util.NativeCodeLoader)
> [2021-12-27 15:04:56,154] ERROR Fatal error initializing task null for 
> partition 0 (org.apache.hudi.connect.HoodieSinkTask)
> org.apache.hudi.exception.HoodieException: Fatal error instantiating Hudi 
> Transaction Services 
>   at 
> org.apache.hudi.connect.writers.KafkaConnectTransactionServices.(KafkaConnectTransactionServices.java:113)
>  ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at 
> org.apache.hudi.connect.transaction.ConnectTransactionCoordinator.(ConnectTransactionCoordinator.java:88)
>  ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at 
> org.apache.hudi.connect.HoodieSinkTask.bootstrap(HoodieSinkTask.java:191) 
> [hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at org.apache.hudi.connect.HoodieSinkTask.open(HoodieSinkTask.java:151) 
> [hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:640)
>  [connect-runtime-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.access$1100(WorkerSinkTask.java:71)
>  [connect-runtime-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:705)
>  [connect-runtime-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:293)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:430)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:449)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:365)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1257)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1226) 
> [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> 
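
(A minimal sketch of what the connector presumably needs to do, not Hudi's actual patch: fold the *-site.xml files from HADOOP_CONF_DIR into the Hadoop Configuration it builds, so the HA nameservice `hdp-syzh-cluster` can be resolved.)

{code:java}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Configuration hadoopConf = new Configuration();
String confDir = System.getenv("HADOOP_CONF_DIR");
if (confDir != null) {
  for (String name : new String[] {"core-site.xml", "hdfs-site.xml"}) {
    File f = new File(confDir, name);
    if (f.exists()) {
      // make defaultFS / nameservice settings visible to the Hudi writer
      hadoopConf.addResource(new Path(f.getAbsolutePath()));
    }
  }
}
{code}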

[jira] [Assigned] (HUDI-3104) Hudi-kafka-connect can not scan hadoop config files by HADOOP_CONF_DIR

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3104:


Assignee: cdmikechen

> Hudi-kafka-connect can not scan hadoop config files by HADOOP_CONF_DIR
> --
>
> Key: HUDI-3104
> URL: https://issues.apache.org/jira/browse/HUDI-3104
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: configs
>Reporter: cdmikechen
>Assignee: cdmikechen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> I used hudi-kafka-connect to test pulling Kafka topic data into Hudi. I've built 
> a Kafka Connect docker image with this Dockerfile:
> {code}
> FROM confluentinc/cp-kafka-connect:6.1.1
> RUN confluent-hub install --no-prompt confluentinc/kafka-connect-hdfs:10.1.3
> COPY hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar 
> /usr/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib
> {code}
> When I started this docker container and submitted a task, Hudi reported this 
> error:
> {code}
> [2021-12-27 15:04:55,214] INFO Setting record key volume and partition fields 
> date for table 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topichudi-test-topic
>  (org.apache.hudi.connect.writers.KafkaConnectTransactionServices)
> [2021-12-27 15:04:55,224] INFO Initializing 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topic as hoodie 
> table hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topic 
> (org.apache.hudi.common.table.HoodieTableMetaClient)
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by 
> org.apache.hadoop.security.authentication.util.KerberosUtil 
> (file:/usr/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib/hadoop-auth-2.10.1.jar)
>  to method sun.security.krb5.Config.getInstance()
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.hadoop.security.authentication.util.KerberosUtil
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> [2021-12-27 15:04:55,571] WARN Unable to load native-hadoop library for your 
> platform... using builtin-java classes where applicable 
> (org.apache.hadoop.util.NativeCodeLoader)
> [2021-12-27 15:04:56,154] ERROR Fatal error initializing task null for 
> partition 0 (org.apache.hudi.connect.HoodieSinkTask)
> org.apache.hudi.exception.HoodieException: Fatal error instantiating Hudi 
> Transaction Services 
>   at 
> org.apache.hudi.connect.writers.KafkaConnectTransactionServices.(KafkaConnectTransactionServices.java:113)
>  ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at 
> org.apache.hudi.connect.transaction.ConnectTransactionCoordinator.(ConnectTransactionCoordinator.java:88)
>  ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at 
> org.apache.hudi.connect.HoodieSinkTask.bootstrap(HoodieSinkTask.java:191) 
> [hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at org.apache.hudi.connect.HoodieSinkTask.open(HoodieSinkTask.java:151) 
> [hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:640)
>  [connect-runtime-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.access$1100(WorkerSinkTask.java:71)
>  [connect-runtime-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:705)
>  [connect-runtime-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:293)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:430)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:449)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:365)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1257)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1226) 
> [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> 

[jira] [Updated] (HUDI-3125) Spark SQL writing timestamp type don't need to disable `spark.sql.datetime.java8API.enabled` manually

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3125:
-
Story Points: 1

> Spark SQL writing timestamp type don't need to disable 
> `spark.sql.datetime.java8API.enabled` manually
> -
>
> Key: HUDI-3125
> URL: https://issues.apache.org/jira/browse/HUDI-3125
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> {code:java}
> create table h0_p(id int, name string, price double, dt timestamp) using hudi 
> partitioned by(dt) options(type = 'cow', primaryKey = 'id');
> insert into h0_p values (3, 'a1', 10, cast('2021-05-08 00:00:00' as 
> timestamp)); {code}
> By default, running the SQL above will throw an exception:
> {code:java}
> Caused by: java.lang.ClassCastException: java.time.Instant cannot be cast to 
> java.sql.Timestamp
>     at 
> org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8(AvroConversionHelper.scala:306)
>     at 
> org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8$adapted(AvroConversionHelper.scala:306)
>     at scala.Option.map(Option.scala:230) {code}
> We need to disable `spark.sql.datetime.java8API.enabled` manually to make it 
> work:
> {code:java}
> set spark.sql.datetime.java8API.enabled=false; {code}
> And the command must be executed at runtime. It can't work if you provide 
> this via the spark-sql command: `spark-sql --conf 
> spark.sql.datetime.java8API.enabled=false`. That's because this config is 
> forcibly enabled when spark-sql is launched.
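
(The mismatch is that with `spark.sql.datetime.java8API.enabled=true` Spark hands the Avro converter a `java.time.Instant` where a `java.sql.Timestamp` is expected. A tolerant conversion looks roughly like this; illustrative only, not the actual change in `AvroConversionHelper`.)

{code:java}
import java.sql.Timestamp;
import java.time.Instant;

// Accept either representation instead of blindly casting to java.sql.Timestamp.
static Timestamp toSqlTimestamp(Object value) {
  return value instanceof Instant ? Timestamp.from((Instant) value) : (Timestamp) value;
}
{code}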



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3131) Spark3.1.1 CTAS error

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3131:
-
Story Points: 1

> Spark3.1.1 CTAS error
> -
>
> Key: HUDI-3131
> URL: https://issues.apache.org/jira/browse/HUDI-3131
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> Fails to run CTAS with Hudi 0.10.0 and Spark 3.1.1.
>  
> SQL:
> {code:java}
> create table h1_p using hudi partitioned by(dt) options(type = 'cow', 
> primaryKey = 'id') as select '2021-05-07' as dt, 1 as id, 'a1' as name, 10 as 
> price; {code}
> Error:
> {code:java}
> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.catalyst.plans.logical.Command.producedAttributes$(Lorg/apache/spark/sql/catalyst/plans/logical/Command;)Lorg/apache/spark/sql/catalyst/expressions/AttributeSet;
>     at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.producedAttributes(CreateHoodieTableAsSelectCommand.scala:39)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3125) Spark SQL writing timestamp type don't need to disable `spark.sql.datetime.java8API.enabled` manually

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3125:
-
Issue Type: Bug  (was: Improvement)

> Spark SQL writing timestamp type don't need to disable 
> `spark.sql.datetime.java8API.enabled` manually
> -
>
> Key: HUDI-3125
> URL: https://issues.apache.org/jira/browse/HUDI-3125
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> {code:java}
> create table h0_p(id int, name string, price double, dt timestamp) using hudi 
> partitioned by(dt) options(type = 'cow', primaryKey = 'id');
> insert into h0_p values (3, 'a1', 10, cast('2021-05-08 00:00:00' as 
> timestamp)); {code}
> By default, running the SQL above will throw an exception:
> {code:java}
> Caused by: java.lang.ClassCastException: java.time.Instant cannot be cast to 
> java.sql.Timestamp
>     at 
> org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8(AvroConversionHelper.scala:306)
>     at 
> org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8$adapted(AvroConversionHelper.scala:306)
>     at scala.Option.map(Option.scala:230) {code}
> We need to disable `spark.sql.datetime.java8API.enabled` manually to make it 
> work:
> {code:java}
> set spark.sql.datetime.java8API.enabled=false; {code}
> And the command must be executed at runtime. It does not work when passed on 
> the spark-sql command line: `spark-sql --conf 
> spark.sql.datetime.java8API.enabled=false`. That's because this config is 
> forcibly re-enabled when spark-sql is launched.
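
A minimal sketch of the runtime workaround described above, for reference only. It 
assumes a programmatically built SparkSession with the same Hudi session extension 
and reuses the `h0_p` table from the issue description; it only illustrates the 
manual `set` step and is not the fix tracked by this issue.

{code:java}
import org.apache.spark.sql.SparkSession;

public class Java8ApiWorkaround {
  public static void main(String[] args) {
    // Assumes a Spark 3.1.x runtime with the Hudi Spark bundle on the classpath.
    SparkSession spark = SparkSession.builder()
        .appName("hudi-java8api-workaround")
        .master("local[*]")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.sql.extensions",
            "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
        .getOrCreate();

    // The setting has to be flipped at runtime, after the session is up,
    // because the spark-sql CLI forces it on at launch (per the description above).
    spark.sql("set spark.sql.datetime.java8API.enabled=false");

    // With Java 8 datetime types disabled, the timestamp literal is converted to
    // java.sql.Timestamp and the insert no longer hits the ClassCastException.
    spark.sql("insert into h0_p values (3, 'a1', 10, "
        + "cast('2021-05-08 00:00:00' as timestamp))");
  }
}
{code}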



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3100) Hive Conditional sync cannot be set from deltastreamer

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3100:
-
Status: In Progress  (was: Open)

> Hive Conditional sync cannot be set from deltastreamer
> --
>
> Key: HUDI-3100
> URL: https://issues.apache.org/jira/browse/HUDI-3100
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Hive Integration
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2987) event time not recorded in commit metadata when insert or bulk insert

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2987:
-
Fix Version/s: (was: 0.10.1)

> event time not recorded in commit metadata when insert or bulk insert
> -
>
> Key: HUDI-2987
> URL: https://issues.apache.org/jira/browse/HUDI-2987
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Critical
>  Labels: pull-request-available, sev:high
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] LuPan2015 commented on issue #4475: [SUPPORT] Hudi and aws S3 integration exception

2021-12-29 Thread GitBox


LuPan2015 commented on issue #4475:
URL: https://github.com/apache/hudi/issues/4475#issuecomment-1002895931


   solved #4474 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] LuPan2015 closed issue #4475: [SUPPORT] Hudi and aws S3 integration exception

2021-12-29 Thread GitBox


LuPan2015 closed issue #4475:
URL: https://github.com/apache/hudi/issues/4475


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on pull request #4065: [HUDI-2817] Sync the configuration inference for HoodieFlinkStreamer

2021-12-29 Thread GitBox


danny0405 commented on pull request #4065:
URL: https://github.com/apache/hudi/pull/4065#issuecomment-1002894948


   Can we sync up all the inference logic?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on pull request #4189: [HUDI-2913] Disable auto clean in writer task

2021-12-29 Thread GitBox


danny0405 commented on pull request #4189:
URL: https://github.com/apache/hudi/pull/4189#issuecomment-1002894548


   Close because it is not necessary ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 closed pull request #4189: [HUDI-2913] Disable auto clean in writer task

2021-12-29 Thread GitBox


danny0405 closed pull request #4189:
URL: https://github.com/apache/hudi/pull/4189


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 closed pull request #3386: [HUDI-2270] Remove corrupted clean action

2021-12-29 Thread GitBox


danny0405 closed pull request #3386:
URL: https://github.com/apache/hudi/pull/3386


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on pull request #3386: [HUDI-2270] Remove corrupted clean action

2021-12-29 Thread GitBox


danny0405 commented on pull request #3386:
URL: https://github.com/apache/hudi/pull/3386#issuecomment-1002893737


   Close because #4016 solves the problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] LuPan2015 commented on issue #4474: [SUPPORT] Should we shade all aws dependencies to avoid class conflicts?

2021-12-29 Thread GitBox


LuPan2015 commented on issue #4474:
URL: https://github.com/apache/hudi/issues/4474#issuecomment-1002893633


   Yes, but it works fine.
   Next I need to store the metadata in Glue.
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] boneanxs commented on issue #4474: [SUPPORT] Should we shade all aws dependencies to avoid class conflicts?

2021-12-29 Thread GitBox


boneanxs commented on issue #4474:
URL: https://github.com/apache/hudi/issues/4474#issuecomment-1002890635


   > Error in query: Specified schema in create table statement is not equal to 
the table schema.You should not specify the schema for an exist table: 
`default`.`hudi_mor_s32`
   
   Not the same exception?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] YannByron commented on issue #4429: [SUPPORT] Spark SQL CTAS command doesn't work with 0.10.0 version and Spark 3.1.1

2021-12-29 Thread GitBox


YannByron commented on issue #4429:
URL: https://github.com/apache/hudi/issues/4429#issuecomment-1002887429


   @vingov
   As the picture I mentioned above shows, you need to `set 
spark.sql.datetime.java8API.enabled=false;` manually at this time.
   I am also trying to improve this in #4471 so the user does not need to set it. 
For now that only works for `insert`, not for `CTAS`, but I'm still working on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] LuPan2015 edited a comment on issue #4474: [SUPPORT] Should we shade all aws dependencies to avoid class conflicts?

2021-12-29 Thread GitBox


LuPan2015 edited a comment on issue #4474:
URL: https://github.com/apache/hudi/issues/4474#issuecomment-1002882980


   I tried it, but the following exception was still thrown.
   ```
   spark/bin/spark-sql --packages 
org.apache.hadoop:hadoop-aws:3.2.0,com.amazonaws:aws-java-sdk:1.12.22 --jars 
hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   ```
   ```
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
   21/12/30 13:46:32 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout 
does not exist
   21/12/30 13:46:32 WARN HiveConf: HiveConf of name hive.stats.retries.wait 
does not exist
   21/12/30 13:46:34 WARN ObjectStore: Version information not found in 
metastore. hive.metastore.schema.verification is not enabled so recording the 
schema version 2.3.0
   21/12/30 13:46:34 WARN ObjectStore: setMetaStoreSchemaVersion called but 
recording version is disabled: version = 2.3.0, comment = Set by MetaStore 
lupan@127.0.1.1
   Spark master: local[*], Application Id: local-1640843189215
   spark-sql> create table default.hudi_mor_s32 (
>   id bigint,
>   name string,
>   dt string
> ) using hudi
> tblproperties (
>   type = 'mor',
>   primaryKey = 'id'
>  )
> partitioned by (dt)
> location 's3a://iceberg-bucket/hudi-warehouse/';
   ANTLR Tool version 4.7 used for code generation does not match the current 
runtime version 4.8
   ANTLR Tool version 4.7 used for code generation does not match the current 
runtime version 4.8
   21/12/30 13:46:45 WARN MetricsConfig: Cannot locate configuration: tried 
hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
   Error in query: Specified schema in create table statement is not equal to 
the table schema.You should not specify the schema for an exist table: 
`default`.`hudi_mor_s32`
   ```
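
   A hedged sketch of the workaround implied by that error message: register the 
existing Hudi table without re-declaring its schema, so it is read from the table 
location instead. The bucket path and table name are copied from the snippet above 
and are illustrative only; this is not a confirmed resolution of the issue.
   ```
   import org.apache.spark.sql.SparkSession;

   public class RegisterExistingHudiTable {
     public static void main(String[] args) {
       // Assumes the same setup as the spark-sql invocation above
       // (Kryo serializer, Hudi session extension, hadoop-aws on the classpath).
       SparkSession spark = SparkSession.builder()
           .appName("register-existing-hudi-table")
           .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
           .config("spark.sql.extensions",
               "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
           .getOrCreate();

       // The error says the schema must not be specified for an existing table,
       // so omit the column list and partition spec and let them be read from
       // the Hudi metadata at the given location.
       spark.sql("create table default.hudi_mor_s32 using hudi "
           + "location 's3a://iceberg-bucket/hudi-warehouse/'");
     }
   }
   ```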


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] LuPan2015 commented on issue #4474: [SUPPORT] Should we shade all aws dependencies to avoid class conflicts?

2021-12-29 Thread GitBox


LuPan2015 commented on issue #4474:
URL: https://github.com/apache/hudi/issues/4474#issuecomment-1002882980


   I tried it, but the following exception was still thrown.
   ```
   spark/bin/spark-sql --packages 
org.apache.hadoop:hadoop-aws:3.2.0,com.amazonaws:aws-java-sdk:1.12.22 --jars 
hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   ```
   ```
   spark-sql> create table default.hudi_mor_s32 (
>   id bigint,
>   name string,
>   dt string
> ) using hudi
> tblproperties (
>   type = 'mor',
>   primaryKey = 'id'
>  )
> partitioned by (dt)
> location 's3a://iceberg-bucket/hudi-warehouse/';
   ANTLR Tool version 4.7 used for code generation does not match the current 
runtime version 4.8
   ANTLR Tool version 4.7 used for code generation does not match the current 
runtime version 4.8
   21/12/30 13:46:45 WARN MetricsConfig: Cannot locate configuration: tried 
hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
   Error in query: Specified schema in create table statement is not equal to 
the table schema.You should not specify the schema for an exist table: 
`default`.`hudi_mor_s32`
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vingov edited a comment on issue #4429: [SUPPORT] Spark SQL CTAS command doesn't work with 0.10.0 version and Spark 3.1.1

2021-12-29 Thread GitBox


vingov edited a comment on issue #4429:
URL: https://github.com/apache/hudi/issues/4429#issuecomment-1002881175


   @YannByron - Thanks for the quick turnaround, I appreciate it!
   
   @xushiyan - There are more errors with Spark 3.1.2 as well, see below:
   
   ```
   spark-sql> create table h0_p using hudi partitioned by(dt)
> tblproperties(type = 'cow', primaryKey = 'id')
> as select cast('2021-05-07 00:00:00' as timestamp) as dt,
>   1 as id, 'a1' as name, 10 as price;
   21/12/30 05:28:02 WARN DFSPropertiesConfiguration: Cannot find 
HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
   21/12/30 05:28:02 WARN DFSPropertiesConfiguration: Properties file 
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
   21/12/30 05:28:07 WARN package: Truncated the string representation of a 
plan since it was too large. This behavior can be adjusted by setting 
'spark.sql.debug.maxToStringFields'.
   21/12/30 05:28:14 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
   org.apache.spark.SparkException: Failed to execute user defined 
function(UDFRegistration$$Lambda$3034/1190042877: 
(struct) => string)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at 
org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41)
at 
org.apache.spark.RangePartitioner$.$anonfun$sketch$1(Partitioner.scala:306)
at 
org.apache.spark.RangePartitioner$.$anonfun$sketch$1$adapted(Partitioner.scala:304)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.IllegalArgumentException: Invalid format: 
"2021-05-07T00:00:00Z" is malformed at "T00:00:00Z"
at 
org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at 
org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
at 
org.apache.spark.sql.hudi.command.SqlKeyGenerator.$anonfun$convertPartitionPathToSqlType$1(SqlKeyGenerator.scala:97)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at 
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at 
scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at 
org.apache.spark.sql.hudi.command.SqlKeyGenerator.convertPartitionPathToSqlType(SqlKeyGenerator.scala:88)
at 
org.apache.spark.sql.hudi.command.SqlKeyGenerator.getPartitionPath(SqlKeyGenerator.scala:118)
at 
org.apache.spark.sql.UDFRegistration.$anonfun$register$352(UDFRegistration.scala:777)
... 22 more
   ```
   
   Another error with 0.10.0, though these statements work fine with the 0.9.0 
version:
   
   ```
   spark-sql> use analytics;
   Time taken: 0.103 seconds
   spark-sql> desc insert_overwrite_table;
   _hoodie_commit_time  string  NULL
   _hoodie_commit_seqno string  NULL
   _hoodie_record_key   string  NULL
   _hoodie_partition_path   string  NULL
   _hoodie_file_namestring  NULL
   id   string  NULL
   name string  NULL
   ts   timestamp   NULL
   Time taken: 0.391 

[jira] [Updated] (HUDI-3120) Cache compactionPlan in buffer

2021-12-29 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-3120:
-
Fix Version/s: 0.11.0
   0.10.1

> Cache compactionPlan in buffer
> --
>
> Key: HUDI-3120
> URL: https://issues.apache.org/jira/browse/HUDI-3120
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] danny0405 commented on a change in pull request #4463: [HUDI-3120] Cache compactionPlan in buffer

2021-12-29 Thread GitBox


danny0405 commented on a change in pull request #4463:
URL: https://github.com/apache/hudi/pull/4463#discussion_r776575452



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionCommitSink.java
##
@@ -108,8 +124,15 @@ public void invoke(CompactionCommitEvent event, Context 
context) throws Exceptio
* @param events  Commit events ever received for the instant
*/
   private void commitIfNecessary(String instant, 
Collection events) throws IOException {
-HoodieCompactionPlan compactionPlan = CompactionUtils.getCompactionPlan(
-this.writeClient.getHoodieTable().getMetaClient(), instant);
+HoodieCompactionPlan compactionPlan;
+if (compactionPlanCache.containsKey(instant)) {
+  compactionPlan = compactionPlanCache.get(instant);
+} else {
+  compactionPlan = CompactionUtils.getCompactionPlan(
+  this.table.getMetaClient(), instant);
+  compactionPlanCache.put(instant, compactionPlan);

Review comment:
   Got your idea. Can we use `computeIfAbsent` instead? And can we remove 
the refresh logic in `notifyCheckpointComplete`? Creating a fresh meta client for 
each compaction instant is more reasonable.
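
   A minimal sketch of the `computeIfAbsent` variant, assuming `compactionPlanCache` 
is a `Map<String, HoodieCompactionPlan>` as in the diff above. Since 
`Map.computeIfAbsent` cannot take a mapping function that throws checked exceptions, 
the sketch re-wraps into Hudi's unchecked `HoodieException`; that wrapping is only 
one possible choice, not necessarily what the PR will do.
   ```
   private void commitIfNecessary(String instant,
       Collection<CompactionCommitEvent> events) throws IOException {
     // Resolve the compaction plan once per instant and cache it for later calls.
     HoodieCompactionPlan compactionPlan = compactionPlanCache.computeIfAbsent(instant, k -> {
       try {
         return CompactionUtils.getCompactionPlan(this.table.getMetaClient(), k);
       } catch (Exception e) {
         // The mapping function cannot throw checked exceptions, so re-wrap as
         // an unchecked HoodieException (illustrative handling only).
         throw new org.apache.hudi.exception.HoodieException(e);
       }
     });
     // ... remaining commit logic for the received events stays unchanged ...
   }
   ```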




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on a change in pull request #4463: [HUDI-3120] Cache compactionPlan in buffer

2021-12-29 Thread GitBox


danny0405 commented on a change in pull request #4463:
URL: https://github.com/apache/hudi/pull/4463#discussion_r776575452



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionCommitSink.java
##
@@ -108,8 +124,15 @@ public void invoke(CompactionCommitEvent event, Context 
context) throws Exceptio
* @param events  Commit events ever received for the instant
*/
   private void commitIfNecessary(String instant, 
Collection events) throws IOException {
-HoodieCompactionPlan compactionPlan = CompactionUtils.getCompactionPlan(
-this.writeClient.getHoodieTable().getMetaClient(), instant);
+HoodieCompactionPlan compactionPlan;
+if (compactionPlanCache.containsKey(instant)) {
+  compactionPlan = compactionPlanCache.get(instant);
+} else {
+  compactionPlan = CompactionUtils.getCompactionPlan(
+  this.table.getMetaClient(), instant);
+  compactionPlanCache.put(instant, compactionPlan);

Review comment:
   Got your idea. Can we use `computeIfAbsent` instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-3107) Fix HiveSyncTool drop partitions using JDBC

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3107:


Assignee: Yue Zhang  (was: Yue Zhang)

> Fix HiveSyncTool drop partitions using JDBC
> ---
>
> Key: HUDI-3107
> URL: https://issues.apache.org/jira/browse/HUDI-3107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Yue Zhang
>Assignee: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> ```
>  org.apache.hudi.exception.HoodieException: Unable to delete table partitions 
> in /Users/yuezhang/tmp/hudiAfTable/forecast_agg
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:240)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.main(HoodieDropPartitionsTool.java:212)
>   at HoodieDropPartitionsToolTest.main(HoodieDropPartitionsToolTest.java:31)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
> when hive syncing forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:119)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncHive(HoodieDropPartitionsTool.java:404)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncToHiveIfNecessary(HoodieDropPartitionsTool.java:270)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.doDeleteTablePartitionsEager(HoodieDropPartitionsTool.java:252)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:230)
>   ... 2 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
> partitions for table forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:368)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:202)
>   at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:116)
>   ... 6 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing 
> SQL ALTER TABLE `forecast_agg` DROP PARTITION (20210623/0/20210623)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:64)
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>   at 
> org.apache.hudi.hive.ddl.JDBCExecutor.dropPartitionsToTable(JDBCExecutor.java:149)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.dropPartitionsToTable(HoodieHiveClient.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:363)
>   ... 9 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:62)
>   ... 21 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>   at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>   at 
> 

[jira] [Commented] (HUDI-3107) Fix HiveSyncTool drop partitions using JDBC

2021-12-29 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466690#comment-17466690
 ] 

Raymond Xu commented on HUDI-3107:
--

[~danielzhang] 

> Fix HiveSyncTool drop partitions using JDBC
> ---
>
> Key: HUDI-3107
> URL: https://issues.apache.org/jira/browse/HUDI-3107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Yue Zhang
>Assignee: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> ```
>  org.apache.hudi.exception.HoodieException: Unable to delete table partitions 
> in /Users/yuezhang/tmp/hudiAfTable/forecast_agg
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:240)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.main(HoodieDropPartitionsTool.java:212)
>   at HoodieDropPartitionsToolTest.main(HoodieDropPartitionsToolTest.java:31)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
> when hive syncing forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:119)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncHive(HoodieDropPartitionsTool.java:404)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncToHiveIfNecessary(HoodieDropPartitionsTool.java:270)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.doDeleteTablePartitionsEager(HoodieDropPartitionsTool.java:252)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:230)
>   ... 2 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
> partitions for table forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:368)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:202)
>   at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:116)
>   ... 6 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing 
> SQL ALTER TABLE `forecast_agg` DROP PARTITION (20210623/0/20210623)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:64)
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>   at 
> org.apache.hudi.hive.ddl.JDBCExecutor.dropPartitionsToTable(JDBCExecutor.java:149)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.dropPartitionsToTable(HoodieHiveClient.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:363)
>   ... 9 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:62)
>   ... 21 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>   at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>   at 
> 

[jira] [Comment Edited] (HUDI-3107) Fix HiveSyncTool drop partitions using JDBC

2021-12-29 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466690#comment-17466690
 ] 

Raymond Xu edited comment on HUDI-3107 at 12/30/21, 5:41 AM:
-

[~danielzhang] fixed!


was (Author: xushiyan):
[~danielzhang] 

> Fix HiveSyncTool drop partitions using JDBC
> ---
>
> Key: HUDI-3107
> URL: https://issues.apache.org/jira/browse/HUDI-3107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Yue Zhang
>Assignee: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> ```
>  org.apache.hudi.exception.HoodieException: Unable to delete table partitions 
> in /Users/yuezhang/tmp/hudiAfTable/forecast_agg
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:240)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.main(HoodieDropPartitionsTool.java:212)
>   at HoodieDropPartitionsToolTest.main(HoodieDropPartitionsToolTest.java:31)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
> when hive syncing forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:119)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncHive(HoodieDropPartitionsTool.java:404)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncToHiveIfNecessary(HoodieDropPartitionsTool.java:270)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.doDeleteTablePartitionsEager(HoodieDropPartitionsTool.java:252)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:230)
>   ... 2 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
> partitions for table forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:368)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:202)
>   at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:116)
>   ... 6 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing 
> SQL ALTER TABLE `forecast_agg` DROP PARTITION (20210623/0/20210623)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:64)
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>   at 
> org.apache.hudi.hive.ddl.JDBCExecutor.dropPartitionsToTable(JDBCExecutor.java:149)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.dropPartitionsToTable(HoodieHiveClient.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:363)
>   ... 9 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:62)
>   ... 21 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>   at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>   at 
> 

[GitHub] [hudi] vingov commented on issue #4429: [SUPPORT] Spark SQL CTAS command doesn't work with 0.10.0 version and Spark 3.1.1

2021-12-29 Thread GitBox


vingov commented on issue #4429:
URL: https://github.com/apache/hudi/issues/4429#issuecomment-1002881175


   @YannByron - Thanks for the quick turnaround, I appreciate it!
   
   @xushiyan - There are more errors with Spark 3.1.2 as well, see below:
   
   ```
   spark-sql> create table h0_p using hudi partitioned by(dt)
> tblproperties(type = 'cow', primaryKey = 'id')
> as select cast('2021-05-07 00:00:00' as timestamp) as dt,
>   1 as id, 'a1' as name, 10 as price;
   21/12/30 05:28:02 WARN DFSPropertiesConfiguration: Cannot find 
HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
   21/12/30 05:28:02 WARN DFSPropertiesConfiguration: Properties file 
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
   21/12/30 05:28:07 WARN package: Truncated the string representation of a 
plan since it was too large. This behavior can be adjusted by setting 
'spark.sql.debug.maxToStringFields'.
   21/12/30 05:28:14 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
   org.apache.spark.SparkException: Failed to execute user defined 
function(UDFRegistration$$Lambda$3034/1190042877: 
(struct) => string)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at 
org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41)
at 
org.apache.spark.RangePartitioner$.$anonfun$sketch$1(Partitioner.scala:306)
at 
org.apache.spark.RangePartitioner$.$anonfun$sketch$1$adapted(Partitioner.scala:304)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.IllegalArgumentException: Invalid format: 
"2021-05-07T00:00:00Z" is malformed at "T00:00:00Z"
at 
org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at 
org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826)
at 
org.apache.spark.sql.hudi.command.SqlKeyGenerator.$anonfun$convertPartitionPathToSqlType$1(SqlKeyGenerator.scala:97)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at 
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at 
scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at 
org.apache.spark.sql.hudi.command.SqlKeyGenerator.convertPartitionPathToSqlType(SqlKeyGenerator.scala:88)
at 
org.apache.spark.sql.hudi.command.SqlKeyGenerator.getPartitionPath(SqlKeyGenerator.scala:118)
at 
org.apache.spark.sql.UDFRegistration.$anonfun$register$352(UDFRegistration.scala:777)
... 22 more
   21/12/30 05:28:14 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3) 
(1d19b4f5cd46 executor driver): org.apache.spark.SparkException: Failed to 
execute user defined function(UDFRegistration$$Lambda$3034/1190042877: 
(struct) => string)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown
 Source)
at 

[jira] [Commented] (HUDI-3107) Fix HiveSyncTool drop partitions using JDBC

2021-12-29 Thread Yue Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466689#comment-17466689
 ] 

Yue Zhang commented on HUDI-3107:
-

Hi Raymond, this is the wrong GitHub id.


-- 
Yue (Daniel) Zhang, Ph.D.
Department of Computer Science Engineering, University of Notre Dame
765-714-9689
dyzhang.net


> Fix HiveSyncTool drop partitions using JDBC
> ---
>
> Key: HUDI-3107
> URL: https://issues.apache.org/jira/browse/HUDI-3107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Yue Zhang
>Assignee: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> ```
>  org.apache.hudi.exception.HoodieException: Unable to delete table partitions 
> in /Users/yuezhang/tmp/hudiAfTable/forecast_agg
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:240)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.main(HoodieDropPartitionsTool.java:212)
>   at HoodieDropPartitionsToolTest.main(HoodieDropPartitionsToolTest.java:31)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
> when hive syncing forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:119)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncHive(HoodieDropPartitionsTool.java:404)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncToHiveIfNecessary(HoodieDropPartitionsTool.java:270)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.doDeleteTablePartitionsEager(HoodieDropPartitionsTool.java:252)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:230)
>   ... 2 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
> partitions for table forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:368)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:202)
>   at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:116)
>   ... 6 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing 
> SQL ALTER TABLE `forecast_agg` DROP PARTITION (20210623/0/20210623)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:64)
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>   at 
> org.apache.hudi.hive.ddl.JDBCExecutor.dropPartitionsToTable(JDBCExecutor.java:149)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.dropPartitionsToTable(HoodieHiveClient.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:363)
>   ... 9 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:62)
>   ... 21 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>   at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>   at 
> 

[jira] [Updated] (HUDI-3106) Fix HiveSyncTool not sync schema

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3106:
-
Reviewers: Raymond Xu

> Fix HiveSyncTool not sync schema
> 
>
> Key: HUDI-3106
> URL: https://issues.apache.org/jira/browse/HUDI-3106
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Raymond Xu
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> Envisioned scenario:
> t1  commit -> a write that adds a new column
> t2  commit -> drop partition
> t3  run the hive sync tool manually
> The schema will not be updated at this time.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2990) Sync to HMS when deleting partitions

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2990:
-
Reviewers: Raymond Xu

> Sync to HMS when deleting partitions
> 
>
> Key: HUDI-2990
> URL: https://issues.apache.org/jira/browse/HUDI-2990
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.11.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3107) Fix HiveSyncTool drop partitions using JDBC

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3107:
-
Reviewers: Raymond Xu

> Fix HiveSyncTool drop partitions using JDBC
> ---
>
> Key: HUDI-3107
> URL: https://issues.apache.org/jira/browse/HUDI-3107
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Yue Zhang
>Assignee: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> ```
>  org.apache.hudi.exception.HoodieException: Unable to delete table partitions 
> in /Users/yuezhang/tmp/hudiAfTable/forecast_agg
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:240)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.main(HoodieDropPartitionsTool.java:212)
>   at HoodieDropPartitionsToolTest.main(HoodieDropPartitionsToolTest.java:31)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
> when hive syncing forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:119)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncHive(HoodieDropPartitionsTool.java:404)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.syncToHiveIfNecessary(HoodieDropPartitionsTool.java:270)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.doDeleteTablePartitionsEager(HoodieDropPartitionsTool.java:252)
>   at 
> org.apache.hudi.utilities.HoodieDropPartitionsTool.run(HoodieDropPartitionsTool.java:230)
>   ... 2 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
> partitions for table forecast_agg
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:368)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:202)
>   at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:116)
>   ... 6 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing 
> SQL ALTER TABLE `forecast_agg` DROP PARTITION (20210623/0/20210623)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:64)
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
>   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>   at 
> org.apache.hudi.hive.ddl.JDBCExecutor.dropPartitionsToTable(JDBCExecutor.java:149)
>   at 
> org.apache.hudi.hive.HoodieHiveClient.dropPartitionsToTable(HoodieHiveClient.java:130)
>   at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:363)
>   ... 9 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
>   at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:62)
>   ... 21 more
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: ParseException line 1:43 cannot recognize input 
> near '20210623' '/' '0' in drop partition statement
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>   at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>   at 
> 

[jira] [Updated] (HUDI-2426) spark sql extensions breaks read.table from metastore

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2426:
-
Reviewers: Raymond Xu

> spark sql extensions breaks read.table from metastore
> -
>
> Key: HUDI-2426
> URL: https://issues.apache.org/jira/browse/HUDI-2426
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: nicolas paris
>Assignee: Yann Byron
>Priority: Critical
>  Labels: sev:critical, user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>
> When the hudi spark sql support is added, it breaks the ability to read a 
> hudi table registered in the metastore from spark:
>  bash-4.2$ ./spark3.0.2/bin/spark-shell --packages 
> org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0,org.apache.spark:spark-avro_2.12:3.1.2
>  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf 
> 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
>  
> scala> spark.table("default.test_hudi_table").show
> java.lang.UnsupportedOperationException: Unsupported parseMultipartIdentifier 
> method
>  at 
> org.apache.spark.sql.parser.HoodieCommonSqlParser.parseMultipartIdentifier(HoodieCommonSqlParser.scala:65)
>  at org.apache.spark.sql.SparkSession.table(SparkSession.scala:581)
>  ... 47 elided
>  
> Removing the config makes the hive table readable again from spark.
> This affects at least spark 3.0.x and 3.1.x.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2611) `create table if not exists` should print message instead of throwing error

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2611:
-
Reviewers: Raymond Xu

> `create table if not exists` should print message instead of throwing error
> ---
>
> Key: HUDI-2611
> URL: https://issues.apache.org/jira/browse/HUDI-2611
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Yann Byron
>Priority: Critical
>  Labels: user-support-issues
> Fix For: 0.11.0, 0.10.1
>
>
> See details in
> https://github.com/apache/hudi/issues/3845#issue-1033218877



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2915) Fix field not found in record error for spark-sql

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2915:
-
Reviewers: Raymond Xu

> Fix field not found in record error for spark-sql
> -
>
> Key: HUDI-2915
> URL: https://issues.apache.org/jira/browse/HUDI-2915
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-12-02-19-37-10-346.png
>
>
> !image-2021-12-02-19-37-10-346.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1850) Read on table fails if the first write to table failed

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1850:
-
Reviewers: Raymond Xu

> Read on table fails if the first write to table failed
> --
>
> Key: HUDI-1850
> URL: https://issues.apache.org/jira/browse/HUDI-1850
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.8.0
>Reporter: Vaibhav Sinha
>Assignee: Yann Byron
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, release-blocker, 
> sev:high, spark
> Fix For: 0.11.0, 0.10.1
>
> Attachments: Screenshot 2021-04-24 at 7.53.22 PM.png
>
>
> {code:java}
> ava.util.NoSuchElementException: No value present in Option
>   at org.apache.hudi.common.util.Option.get(Option.java:88) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromCommitMetadata(TableSchemaResolver.java:215)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:166)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:155)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.MergeOnReadSnapshotRelation.(MergeOnReadSnapshotRelation.scala:65)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:99) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:63) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
>  ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
>  ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at scala.Option.getOrElse(Option.scala:189) 
> ~[scala-library-2.12.10.jar:?]
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
> {code}
> The screenshot shows the files that got created before the write had failed.
>  
> !Screenshot 2021-04-24 at 7.53.22 PM.png!
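
Until the read path handles tables whose only commit failed or was rolled back, a user-side guard can at least surface the failure mode clearly. This is a sketch only (not the upstream fix), assuming a spark-shell session where `spark` is in scope and using a placeholder base path:

{code:scala}
import scala.util.{Failure, Success, Try}

val basePath = "/tmp/hudi/trips_cow"  // placeholder path for illustration

// Wrap the read so a table with no completed commit produces a clear message
// instead of the bare "No value present in Option" thrown from Option.get.
Try(spark.read.format("hudi").load(basePath)) match {
  case Success(df) => df.show(false)
  case Failure(_: java.util.NoSuchElementException) =>
    println(s"No completed commit found under $basePath; the first write may have failed.")
  case Failure(other) => throw other
}
{code}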



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2661) java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2661:
-
Reviewers: Raymond Xu

> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.catalyst.catalog.CatalogTable.copy
> 
>
> Key: HUDI-2661
> URL: https://issues.apache.org/jira/browse/HUDI-2661
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.10.0
>Reporter: Changjun Zhang
>Assignee: Yann Byron
>Priority: Critical
> Fix For: 0.11.0, 0.10.1
>
> Attachments: image-2021-11-01-21-47-44-538.png, 
> image-2021-11-01-21-48-22-765.png
>
>
> Integrating Hudi with Spark SQL:
> when I launch spark-sql with:
> {code:sh}
> // Some comments here
> spark-sql --conf 
> 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
> --conf 
> 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
> {code}
> and then create a table on an existing Hudi table: 
> {code:sql}
> create table testdb.tb_hudi_operation_test using hudi 
> location '/tmp/flinkdb/datas/tb_hudi_operation';
> {code}
> the following exception is thrown:
>  !image-2021-11-01-21-47-44-538.png|thumbnail! 
>  !image-2021-11-01-21-48-22-765.png|thumbnail! 
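
A NoSuchMethodError on a Spark-internal class such as CatalogTable usually points at a Hudi bundle built against a different Spark line than the one running. A hedged sanity check (the artifact coordinates below are an example, not a confirmed fix for this report) is to launch with a bundle that matches the installed Spark 3 version:

{code:sh}
# Example only: make sure the bundle's Spark line matches the running Spark version.
spark-sql \
  --packages org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
{code}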



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3125) Spark SQL writing timestamp type don't need to disable `spark.sql.datetime.java8API.enabled` manually

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3125:


Assignee: Yann Byron

> Spark SQL writing timestamp type don't need to disable 
> `spark.sql.datetime.java8API.enabled` manually
> -
>
> Key: HUDI-3125
> URL: https://issues.apache.org/jira/browse/HUDI-3125
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> {code:java}
> create table h0_p(id int, name string, price double, dt timestamp) using hudi 
> partitioned by(dt) options(type = 'cow', primaryKey = 'id');
> insert into h0_p values (3, 'a1', 10, cast('2021-05-08 00:00:00' as 
> timestamp)); {code}
> By default, running the SQL above throws an exception:
> {code:java}
> Caused by: java.lang.ClassCastException: java.time.Instant cannot be cast to 
> java.sql.Timestamp
>     at 
> org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8(AvroConversionHelper.scala:306)
>     at 
> org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8$adapted(AvroConversionHelper.scala:306)
>     at scala.Option.map(Option.scala:230) {code}
> We need to disable `spark.sql.datetime.java8API.enabled` manually to make it 
> work:
> {code:java}
> set spark.sql.datetime.java8API.enabled=false; {code}
> And the command must be executed at runtime. Providing it on the spark-sql 
> command line (`spark-sql --conf 
> spark.sql.datetime.java8API.enabled=false`) does not work, because this 
> config is forcibly enabled when spark-sql launches.
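
For users who hit the same java.time.Instant cast outside the spark-sql CLI, a session-level equivalent is a one-liner. This is a sketch only; it assumes the conf remains mutable at runtime, as the `set` workaround above suggests, and that `spark` is an active SparkSession:

{code:scala}
// Disable the Java 8 datetime API for this session before writing timestamp columns.
spark.conf.set("spark.sql.datetime.java8API.enabled", "false")
spark.sql("insert into h0_p values (3, 'a1', 10, cast('2021-05-08 00:00:00' as timestamp))")
{code}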



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3125) Spark SQL writing timestamp type don't need to disable `spark.sql.datetime.java8API.enabled` manually

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3125:
-
Reviewers: Raymond Xu

> Spark SQL writing timestamp type don't need to disable 
> `spark.sql.datetime.java8API.enabled` manually
> -
>
> Key: HUDI-3125
> URL: https://issues.apache.org/jira/browse/HUDI-3125
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> {code:java}
> create table h0_p(id int, name string, price double, dt timestamp) using hudi 
> partitioned by(dt) options(type = 'cow', primaryKey = 'id');
> insert into h0_p values (3, 'a1', 10, cast('2021-05-08 00:00:00' as 
> timestamp)); {code}
> By default, running the SQL above throws an exception:
> {code:java}
> Caused by: java.lang.ClassCastException: java.time.Instant cannot be cast to 
> java.sql.Timestamp
>     at 
> org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8(AvroConversionHelper.scala:306)
>     at 
> org.apache.hudi.AvroConversionHelper$.$anonfun$createConverterToAvro$8$adapted(AvroConversionHelper.scala:306)
>     at scala.Option.map(Option.scala:230) {code}
> We need to disable `spark.sql.datetime.java8API.enabled` manually to make it 
> work:
> {code:java}
> set spark.sql.datetime.java8API.enabled=false; {code}
> And the command must be executed at runtime. Providing it on the spark-sql 
> command line (`spark-sql --conf 
> spark.sql.datetime.java8API.enabled=false`) does not work, because this 
> config is forcibly enabled when spark-sql launches.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3104) Hudi-kafka-connect can not scan hadoop config files by HADOOP_CONF_DIR

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3104:
-
Fix Version/s: 0.11.0

> Hudi-kafka-connect can not scan hadoop config files by HADOOP_CONF_DIR
> --
>
> Key: HUDI-3104
> URL: https://issues.apache.org/jira/browse/HUDI-3104
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: configs
>Reporter: cdmikechen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> I used hudi-kafka-connect to test pulling Kafka topic data into Hudi. I built 
> a Kafka Connect Docker image with this Dockerfile:
> {code}
> FROM confluentinc/cp-kafka-connect:6.1.1
> RUN confluent-hub install --no-prompt confluentinc/kafka-connect-hdfs:10.1.3
> COPY hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar 
> /usr/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib
> {code}
> When I started this Docker container and submitted a task, Hudi reported this 
> error:
> {code}
> [2021-12-27 15:04:55,214] INFO Setting record key volume and partition fields 
> date for table 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topichudi-test-topic
>  (org.apache.hudi.connect.writers.KafkaConnectTransactionServices)
> [2021-12-27 15:04:55,224] INFO Initializing 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topic as hoodie 
> table hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topic 
> (org.apache.hudi.common.table.HoodieTableMetaClient)
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by 
> org.apache.hadoop.security.authentication.util.KerberosUtil 
> (file:/usr/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib/hadoop-auth-2.10.1.jar)
>  to method sun.security.krb5.Config.getInstance()
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.hadoop.security.authentication.util.KerberosUtil
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> [2021-12-27 15:04:55,571] WARN Unable to load native-hadoop library for your 
> platform... using builtin-java classes where applicable 
> (org.apache.hadoop.util.NativeCodeLoader)
> [2021-12-27 15:04:56,154] ERROR Fatal error initializing task null for 
> partition 0 (org.apache.hudi.connect.HoodieSinkTask)
> org.apache.hudi.exception.HoodieException: Fatal error instantiating Hudi 
> Transaction Services 
>   at 
> org.apache.hudi.connect.writers.KafkaConnectTransactionServices.<init>(KafkaConnectTransactionServices.java:113)
>  ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at 
> org.apache.hudi.connect.transaction.ConnectTransactionCoordinator.<init>(ConnectTransactionCoordinator.java:88)
>  ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at 
> org.apache.hudi.connect.HoodieSinkTask.bootstrap(HoodieSinkTask.java:191) 
> [hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at org.apache.hudi.connect.HoodieSinkTask.open(HoodieSinkTask.java:151) 
> [hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:640)
>  [connect-runtime-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.access$1100(WorkerSinkTask.java:71)
>  [connect-runtime-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:705)
>  [connect-runtime-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:293)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:430)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:449)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:365)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1257)
>  [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1226) 
> [kafka-clients-6.1.1-ccs.jar:?]
>   at 
> 
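
The underlying need is for the connector to pick up core-site.xml/hdfs-site.xml from HADOOP_CONF_DIR so HDFS-HA nameservices such as hdfs://hdp-syzh-cluster resolve. A minimal sketch of that idea (an assumed approach, not necessarily the shape of the actual patch):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class HadoopConfLoader {
  /** Builds a Hadoop Configuration that also reads the files under HADOOP_CONF_DIR, if set. */
  public static Configuration loadWithHadoopConfDir() {
    Configuration conf = new Configuration();
    String confDir = System.getenv("HADOOP_CONF_DIR");
    if (confDir != null && !confDir.isEmpty()) {
      conf.addResource(new Path(confDir, "core-site.xml"));
      conf.addResource(new Path(confDir, "hdfs-site.xml"));
    }
    return conf;
  }
}
{code}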

[jira] [Updated] (HUDI-3112) KafkaConnect can not sync to Hive

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3112:
-
Fix Version/s: 0.11.0

> KafkaConnect can not sync to Hive
> -
>
> Key: HUDI-3112
> URL: https://issues.apache.org/jira/browse/HUDI-3112
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: cdmikechen
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> Right now Kafka Connect cannot sync to Hive.
> Kafka Connect currently uses *org.apache.hudi.DataSourceUtils* to build HiveSyncConfig, 
> but the *DataSourceUtils* class imports some Spark dependencies, so 
> Kafka Connect fails when those classes are loaded.
> {code}
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.spark.sql.types.DataType
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 66 more
> {code}
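
Illustration only (the class and helper names below are made up): the decoupling the ticket implies is to build the Hive-sync settings in a class with no Spark imports, so the Connect classloader never needs org.apache.spark.sql.types.DataType.

{code:java}
import java.util.Properties;

// Hypothetical Spark-free helper: plain properties only, no Spark types on the classpath.
final class ConnectHiveSyncProps {
  static Properties build(String basePath, String database, String table) {
    Properties props = new Properties();
    props.setProperty("hoodie.base.path", basePath);
    props.setProperty("hoodie.datasource.hive_sync.database", database);
    props.setProperty("hoodie.datasource.hive_sync.table", table);
    return props;
  }
}
{code}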



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3131) Spark3.1.1 CTAS error

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3131:
-
Sprint: Hudi-Sprint-0.10.1

> Spark3.1.1 CTAS error
> -
>
> Key: HUDI-3131
> URL: https://issues.apache.org/jira/browse/HUDI-3131
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> CTAS fails with Hudi 0.10.0 and Spark 3.1.1.
>  
> SQL:
> {code:java}
> create table h1_p using hudi partitioned by(dt) options(type = 'cow', 
> primaryKey = 'id') as select '2021-05-07' as dt, 1 as id, 'a1' as name, 10 as 
> price; {code}
> Error:
> {code:java}
> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.catalyst.plans.logical.Command.producedAttributes$(Lorg/apache/spark/sql/catalyst/plans/logical/Command;)Lorg/apache/spark/sql/catalyst/expressions/AttributeSet;
>     at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.producedAttributes(CreateHoodieTableAsSelectCommand.scala:39)
>  {code}
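
The error comes from the Scala trait forwarder Command.producedAttributes$, which differs across Spark 3.1.x builds. A sketch of the kind of workaround one might apply inside CreateHoodieTableAsSelectCommand follows; this is an assumption about the direction of a fix, not the content of PR #4476:

{code:scala}
import org.apache.spark.sql.catalyst.expressions.AttributeSet

// Inside CreateHoodieTableAsSelectCommand: define the member directly instead of
// relying on the compiled trait forwarder, so the command no longer needs
// Command.producedAttributes$ at runtime.
override def producedAttributes: AttributeSet = AttributeSet.empty
{code}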



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3131) Spark3.1.1 CTAS error

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3131:
-
Reviewers: Raymond Xu

> Spark3.1.1 CTAS error
> -
>
> Key: HUDI-3131
> URL: https://issues.apache.org/jira/browse/HUDI-3131
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> CTAS fails with Hudi 0.10.0 and Spark 3.1.1.
>  
> SQL:
> {code:java}
> create table h1_p using hudi partitioned by(dt) options(type = 'cow', 
> primaryKey = 'id') as select '2021-05-07' as dt, 1 as id, 'a1' as name, 10 as 
> price; {code}
> Error:
> {code:java}
> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.catalyst.plans.logical.Command.producedAttributes$(Lorg/apache/spark/sql/catalyst/plans/logical/Command;)Lorg/apache/spark/sql/catalyst/expressions/AttributeSet;
>     at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.producedAttributes(CreateHoodieTableAsSelectCommand.scala:39)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3131) Spark3.1.1 CTAS error

2021-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3131:


Assignee: Yann Byron

> Spark3.1.1 CTAS error
> -
>
> Key: HUDI-3131
> URL: https://issues.apache.org/jira/browse/HUDI-3131
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> CTAS fails with Hudi 0.10.0 and Spark 3.1.1.
>  
> SQL:
> {code:java}
> create table h1_p using hudi partitioned by(dt) options(type = 'cow', 
> primaryKey = 'id') as select '2021-05-07' as dt, 1 as id, 'a1' as name, 10 as 
> price; {code}
> Error:
> {code:java}
> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.catalyst.plans.logical.Command.producedAttributes$(Lorg/apache/spark/sql/catalyst/plans/logical/Command;)Lorg/apache/spark/sql/catalyst/expressions/AttributeSet;
>     at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.producedAttributes(CreateHoodieTableAsSelectCommand.scala:39)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HUDI-1079) Cannot upsert on schema with Array of Record with single field

2021-12-29 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466687#comment-17466687
 ] 

Raymond Xu commented on HUDI-1079:
--

This should be resolved by having parquet 1.12, which will be the main version 
built with Hudi in 0.11.0. Linked the issue above.
 

> Cannot upsert on schema with Array of Record with single field
> --
>
> Key: HUDI-1079
> URL: https://issues.apache.org/jira/browse/HUDI-1079
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.9.0
> Environment: spark 2.4.4, local 
>Reporter: Adrian Tanase
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: schema, sev:critical, user-support-issues
> Fix For: 0.11.0
>
>
> I am trying to trigger upserts on a table that has an array field with 
> records of just one field.
>  Here is the code to reproduce:
> {code:scala}
>   val spark = SparkSession.builder()
>   .master("local[1]")
>   .appName("SparkByExamples.com")
>   .config("spark.serializer", 
> "org.apache.spark.serializer.KryoSerializer")
>   .getOrCreate();
>   // https://sparkbyexamples.com/spark/spark-dataframe-array-of-struct/
>   val arrayStructData = Seq(
> Row("James",List(Row("Java","XX",120),Row("Scala","XA",300))),
> Row("Michael",List(Row("Java","XY",200),Row("Scala","XB",500))),
> Row("Robert",List(Row("Java","XZ",400),Row("Scala","XC",250))),
> Row("Washington",null)
>   )
>   val arrayStructSchema = new StructType()
>   .add("name",StringType)
>   .add("booksIntersted",ArrayType(
> new StructType()
>   .add("bookName",StringType)
> //  .add("author",StringType)
> //  .add("pages",IntegerType)
>   ))
> val df = 
> spark.createDataFrame(spark.sparkContext.parallelize(arrayStructData),arrayStructSchema)
> {code}
> Running insert following by upsert will fail:
> {code:scala}
>   df.write
>   .format("hudi")
>   .options(getQuickstartWriteConfigs)
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "name")
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "name")
>   .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, "COPY_ON_WRITE")
>   .option(HoodieWriteConfig.TABLE_NAME, tableName)
>   .mode(Overwrite)
>   .save(basePath)
>   df.write
>   .format("hudi")
>   .options(getQuickstartWriteConfigs)
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "name")
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "name")
>   .option(HoodieWriteConfig.TABLE_NAME, tableName)
>   .mode(Append)
>   .save(basePath)
> {code}
> If I create the books record with all the fields (at least 2), it works as 
> expected.
> The relevant part of the exception is this:
> {noformat}
> Caused by: java.lang.ClassCastException: required binary bookName (UTF8) is 
> not a groupCaused by: java.lang.ClassCastException: required binary bookName 
> (UTF8) is not a group at 
> org.apache.parquet.schema.Type.asGroupType(Type.java:207) at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:279)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:232)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:78)
>  at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.(AvroRecordConverter.java:536)
>  at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:486)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:289)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:141)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:95)
>  at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>  at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
>  at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156) at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135) at 
> org.apache.hudi.client.utils.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>  at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ... 4 

[jira] [Commented] (HUDI-2323) Upsert of Case Class with single field causes SchemaParseException

2021-12-29 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466686#comment-17466686
 ] 

Raymond Xu commented on HUDI-2323:
--

This should be resolved by having parquet 1.12, which will be the main version 
built with Hudi in 0.11.0. Linked the issue above.

> Upsert of Case Class with single field causes SchemaParseException
> --
>
> Key: HUDI-2323
> URL: https://issues.apache.org/jira/browse/HUDI-2323
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.8.0
>Reporter: Tyler Jackson
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: schema, sev:critical
> Fix For: 0.11.0
>
> Attachments: HudiSchemaGenerationTest.scala
>
>
> Additional background information:
> Spark version 3.1.1
>  Scala version 2.12
>  Hudi version 0.8.0 (hudi-spark-bundle_2.12 artifact)
>  
> While testing a spark job in EMR of inserting and then upserting data for a 
> fairly complex nested case class structure, I ran into an issue that I was 
> having a hard time tracking down. It seems when part of the case class in the 
> dataframe to be written has a single field in it, the avro schema generation 
> fails with the following stacktrace, but only on the upsert:
> {{21/08/19 15:08:34 ERROR BoundedInMemoryExecutor: error producing records}}
>  {{org.apache.avro.SchemaParseException: Can't redefine: array}}
>  \{{ at org.apache.avro.Schema$Names.put(Schema.java:1128) }}
>  \{{ at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562) }}
>  \{{ at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:690) }}
>  \{{ at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:805) }}
>  \{{ at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882) }}
>  \{{ at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716) }}
>  \{{ at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)}}
>  \{{ at org.apache.avro.Schema.toString(Schema.java:324)}}
>  \{{ at 
> org.apache.avro.SchemaCompatibility.checkReaderWriterCompatibility(SchemaCompatibility.java:68)}}
>  \{{ at 
> org.apache.parquet.avro.AvroRecordConverter.isElementType(AvroRecordConverter.java:866)}}
>  \{{ at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.(AvroRecordConverter.java:475)}}
>  \{{ at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:289)}}
>  \{{ at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:141)}}
>  \{{ at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:279)}}
>  \{{ at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:141)}}
>  \{{ at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:95)}}
>  \{{ at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)}}
>  \{{ at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)}}
>  \{{ at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)}}
>  \{{ at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)}}
>  \{{ at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)}}
>  \{{ at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)}}
>  \{{ at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)}}
>  \{{ at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)}}
>  \{{ at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
>  \{{ at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
>  \{{ at java.util.concurrent.FutureTask.run(FutureTask.java:266) }}
>  \{{ at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  }}
>  \{{ at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  }}
>  \{{ at java.lang.Thread.run(Thread.java:748) }}
>  
> I am able to replicate the problem in my local IntelliJ setup using the test 
> that has been attached to this issue. The problem can be observed in the 
> DummyStepParent case class. Simply adding an additional field to the case 
> class eliminates the problem altogether (which is an acceptable workaround 
> for our purposes, but shouldn't ultimately be necessary).
> {{case class DummyObject (}}
>  {{ fieldOne: String,}}
>  {{ listTwo: Seq[String],}}
>  {{ listThree: Seq[DummyChild],}}
>  {{ listFour: Seq[DummyStepChild],}}
>  {{ fieldFive: Boolean,}}
>  {{ listSix: Seq[DummyParent],}}
>  {{ listSeven: Seq[DummyCousin],}}
>  {{ 

[GitHub] [hudi] hudi-bot removed a comment on pull request #4476: [HUDI-3131] fix ctas error in spark3.1.1

2021-12-29 Thread GitBox


hudi-bot removed a comment on pull request #4476:
URL: https://github.com/apache/hudi/pull/4476#issuecomment-1002862870


   
   ## CI report:
   
   * 037786516e6622a5e60d1543bef8ad77ed39b490 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4811)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4476: [HUDI-3131] fix ctas error in spark3.1.1

2021-12-29 Thread GitBox


hudi-bot commented on pull request #4476:
URL: https://github.com/apache/hudi/pull/4476#issuecomment-1002873623


   
   ## CI report:
   
   * 037786516e6622a5e60d1543bef8ad77ed39b490 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4811)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2735) Fix archival of commits in Java client for Kafka Connect

2021-12-29 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-2735:

Story Points: 2

> Fix archival of commits in Java client for Kafka Connect
> 
>
> Key: HUDI-2735
> URL: https://issues.apache.org/jira/browse/HUDI-2735
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] zuyanton commented on issue #4457: [SUPPORT] Hudi archive stopped working

2021-12-29 Thread GitBox


zuyanton commented on issue #4457:
URL: https://github.com/apache/hudi/issues/4457#issuecomment-1002870461


   @nsivabalan do you mean the contents of the .hoodie folder? It's a bunch of old 
commit files plus the following subfolders: ".aux", ".temp", "metadata", "archived"
   
![image](https://user-images.githubusercontent.com/67354813/147721123-4227e385-76cb-4120-8510-a472fbb9643c.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yuzhaojing commented on a change in pull request #4463: [HUDI-3120] Cache compactionPlan in buffer

2021-12-29 Thread GitBox


yuzhaojing commented on a change in pull request #4463:
URL: https://github.com/apache/hudi/pull/4463#discussion_r776565278



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionCommitSink.java
##
@@ -108,8 +124,15 @@ public void invoke(CompactionCommitEvent event, Context 
context) throws Exceptio
* @param events  Commit events ever received for the instant
*/
   private void commitIfNecessary(String instant, 
Collection events) throws IOException {
-HoodieCompactionPlan compactionPlan = CompactionUtils.getCompactionPlan(
-this.writeClient.getHoodieTable().getMetaClient(), instant);
+HoodieCompactionPlan compactionPlan;
+if (compactionPlanCache.containsKey(instant)) {
+  compactionPlan = compactionPlanCache.get(instant);
+} else {
+  compactionPlan = CompactionUtils.getCompactionPlan(
+  this.table.getMetaClient(), instant);
+  compactionPlanCache.put(instant, compactionPlan);

Review comment:
   When the number of file groups contained in the compaction plan is 
relatively large, it is very expensive to call createTable and read the 
compaction plan for each event received.
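
If the cache stays, the lookup-or-load in the diff above could be collapsed with computeIfAbsent. Sketch only: it assumes compactionPlanCache is a Map<String, HoodieCompactionPlan> and that getCompactionPlan declares no checked exception in this code path (otherwise wrap the call):

{code:java}
// Equivalent to the explicit containsKey/get/put sequence in the diff above.
HoodieCompactionPlan compactionPlan = compactionPlanCache.computeIfAbsent(
    instant, i -> CompactionUtils.getCompactionPlan(this.table.getMetaClient(), i));
{code}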




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on a change in pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-12-29 Thread GitBox


danny0405 commented on a change in pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#discussion_r776562829



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java
##
@@ -190,12 +190,36 @@ public static void deleteInstantFile(FileSystem fs, 
String metaPath, HoodieInsta
 }
   }
 
+  public void deleteEmptyInstantIfExists(HoodieInstant instant) {
+ValidationUtils.checkArgument(isEmpty(instant));
+deleteInstantFileIfExists(instant);
+  }
+
   public void deleteCompactionRequested(HoodieInstant instant) {
 ValidationUtils.checkArgument(instant.isRequested());
 ValidationUtils.checkArgument(Objects.equals(instant.getAction(), 
HoodieTimeline.COMPACTION_ACTION));
 deleteInstantFile(instant);
   }
 
+  private void deleteInstantFileIfExists(HoodieInstant instant) {
+LOG.info("Deleting instant " + instant);
+Path inFlightCommitFilePath = new Path(metaClient.getMetaPath(), 
instant.getFileName());
+try {

Review comment:
   nit: judging from the usage, the deleted instant may or may not be in an 
inflight state.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-2658) When disable auto clean, do not check if MIN_COMMITS_TO_KEEP was larger CLEANER_COMMITS_RETAINED

2021-12-29 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-2658.
-
Resolution: Invalid

> When disable auto clean, do not check if MIN_COMMITS_TO_KEEP was larger 
> CLEANER_COMMITS_RETAINED
> 
>
> Key: HUDI-2658
> URL: https://issues.apache.org/jira/browse/HUDI-2658
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available, sev:normal
>
> The exception mentioned below is thrown even when auto clean is disabled.
> {code:java}
> 21/10/18 05:54:20,149 ERROR Misc: Streaming batch fail, shutting down whole 
> application immediately.21/10/18 05:54:20,149 ERROR Misc: Streaming batch 
> fail, shutting down whole application 
> immediately.java.lang.IllegalArgumentException: Increase 
> hoodie.keep.min.commits=3 to be greater than 
> hoodie.cleaner.commits.retained=10. Otherwise, there is risk of incremental 
> pull missing data from few instants. at 
> org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
>  at 
> org.apache.hudi.config.HoodieCompactionConfig$Builder.build(HoodieCompactionConfig.java:355)
>  at 
> org.apache.hudi.config.HoodieWriteConfig$Builder.setDefaults(HoodieWriteConfig.java:1396)
>  at 
> org.apache.hudi.config.HoodieWriteConfig$Builder.build(HoodieWriteConfig.java:1436)
>  at 
> org.apache.hudi.DataSourceUtils.createHoodieConfig(DataSourceUtils.java:188) 
> at 
> org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:193) 
> at 
> org.apache.hudi.HoodieSparkSqlWriter$$anonfun$3.apply(HoodieSparkSqlWriter.scala:166)
>  at 
> org.apache.hudi.HoodieSparkSqlWriter$$anonfun$3.apply(HoodieSparkSqlWriter.scala:166)
>  at scala.Option.getOrElse(Option.scala:121) at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:166) 
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145) at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) 
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229) at 
> tv.freewheel.reporting.ssql.sinkers.HudiSinker.sink(HudiSinker.scala:20) at 
> tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler$$anonfun$execSink$1$$anonfun$apply$1.apply$mcV$sp(RuleScheduler.scala:73)
>  at tv.freewheel.reporting.realtime.utils.Misc$.failFast(Misc.scala:72) at 
> tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler$$anonfun$execSink$1.apply(RuleScheduler.scala:73)
>  at 
> tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler$$anonfun$execSink$1.apply(RuleScheduler.scala:71)
>  at scala.Option.foreach(Option.scala:257) at 
> tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler.execSink(RuleScheduler.scala:71)
>  at 
> tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler$$anonfun$submitRecursively$3$$anonfun$1.apply$mcV$sp(RuleScheduler.scala:35)
>  at 

[jira] [Commented] (HUDI-2658) When disable auto clean, do not check if MIN_COMMITS_TO_KEEP was larger CLEANER_COMMITS_RETAINED

2021-12-29 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466674#comment-17466674
 ] 

sivabalan narayanan commented on HUDI-2658:
---

Closing as invalid. 

Comment from the PR:

I think making this validation conditional could compromise the integrity of 
min-instants, since a user can toggle auto clean at any time. What if the same table 
has a writer and a compactor with different auto-clean settings: the 
writer could disable auto clean, trigger archival, and end up with fewer 
commits, and then the compactor runs and sees fewer actual instants than min-instants? I 
find having consistency in this logic important.
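
For reference, retention settings that satisfy the validation quoted below look like this (example values only; the keys are the ones named in the exception message):

{code}
hoodie.cleaner.commits.retained=10
hoodie.keep.min.commits=20
hoodie.keep.max.commits=30
{code}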

 

> When disable auto clean, do not check if MIN_COMMITS_TO_KEEP was larger 
> CLEANER_COMMITS_RETAINED
> 
>
> Key: HUDI-2658
> URL: https://issues.apache.org/jira/browse/HUDI-2658
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available, sev:normal
>
> The exception mentioned below is thrown even when auto clean is disabled.
> {code:java}
> 21/10/18 05:54:20,149 ERROR Misc: Streaming batch fail, shutting down whole 
> application immediately.21/10/18 05:54:20,149 ERROR Misc: Streaming batch 
> fail, shutting down whole application 
> immediately.java.lang.IllegalArgumentException: Increase 
> hoodie.keep.min.commits=3 to be greater than 
> hoodie.cleaner.commits.retained=10. Otherwise, there is risk of incremental 
> pull missing data from few instants. at 
> org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
>  at 
> org.apache.hudi.config.HoodieCompactionConfig$Builder.build(HoodieCompactionConfig.java:355)
>  at 
> org.apache.hudi.config.HoodieWriteConfig$Builder.setDefaults(HoodieWriteConfig.java:1396)
>  at 
> org.apache.hudi.config.HoodieWriteConfig$Builder.build(HoodieWriteConfig.java:1436)
>  at 
> org.apache.hudi.DataSourceUtils.createHoodieConfig(DataSourceUtils.java:188) 
> at 
> org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:193) 
> at 
> org.apache.hudi.HoodieSparkSqlWriter$$anonfun$3.apply(HoodieSparkSqlWriter.scala:166)
>  at 
> org.apache.hudi.HoodieSparkSqlWriter$$anonfun$3.apply(HoodieSparkSqlWriter.scala:166)
>  at scala.Option.getOrElse(Option.scala:121) at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:166) 
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145) at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) 
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229) at 
> tv.freewheel.reporting.ssql.sinkers.HudiSinker.sink(HudiSinker.scala:20) at 
> tv.freewheel.reporting.realtime.core.schedulers.RuleScheduler$$anonfun$execSink$1$$anonfun$apply$1.apply$mcV$sp(RuleScheduler.scala:73)
>  at tv.freewheel.reporting.realtime.utils.Misc$.failFast(Misc.scala:72) at 
> 

[GitHub] [hudi] danny0405 commented on a change in pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-12-29 Thread GitBox


danny0405 commented on a change in pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#discussion_r776561623



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##
@@ -321,10 +321,19 @@ public void archive(HoodieEngineContext context, 
List instants) t
   List records = new ArrayList<>();
   for (HoodieInstant hoodieInstant : instants) {
 try {
-  deleteAnyLeftOverMarkers(context, hoodieInstant);
-  records.add(convertToAvroRecord(hoodieInstant));
-  if (records.size() >= this.config.getCommitArchivalBatchSize()) {
-writeToFile(wrapperSchema, records);
+  if (table.getActiveTimeline().isEmpty(hoodieInstant)
+  && (
+  
hoodieInstant.getAction().equals(HoodieTimeline.CLEAN_ACTION)
+  || 
(hoodieInstant.getAction().equals(HoodieTimeline.ROLLBACK_ACTION) && 
hoodieInstant.isCompleted())
+ )

Review comment:
   We'd better add a uniform util to decide what kind of instant file (action 
type and state) should be non-empty.
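
One possible shape for such a helper, using only the actions already referenced in the diff above (illustrative only; the real utility may live elsewhere and cover more cases):

{code:java}
// Hypothetical helper: a single place to decide whether an instant's meta file may be empty.
private static boolean isInstantFileAllowedToBeEmpty(HoodieInstant instant) {
  return HoodieTimeline.CLEAN_ACTION.equals(instant.getAction())
      || (HoodieTimeline.ROLLBACK_ACTION.equals(instant.getAction()) && instant.isCompleted());
}
{code}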




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4476: [HUDI-3131] fix ctas error in spark3.1.1

2021-12-29 Thread GitBox


hudi-bot removed a comment on pull request #4476:
URL: https://github.com/apache/hudi/pull/4476#issuecomment-1002862236


   
   ## CI report:
   
   * 037786516e6622a5e60d1543bef8ad77ed39b490 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4476: [HUDI-3131] fix ctas error in spark3.1.1

2021-12-29 Thread GitBox


hudi-bot commented on pull request #4476:
URL: https://github.com/apache/hudi/pull/4476#issuecomment-1002862870


   
   ## CI report:
   
   * 037786516e6622a5e60d1543bef8ad77ed39b490 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4811)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #3879: [SUPPORT] Incomplete Table Migration

2021-12-29 Thread GitBox


nsivabalan commented on issue #3879:
URL: https://github.com/apache/hudi/issues/3879#issuecomment-1002862574


   Sure, let us know once you have the dataset available to share. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4476: [HUDI-3131] fix ctas error in spark3.1.1

2021-12-29 Thread GitBox


hudi-bot commented on pull request #4476:
URL: https://github.com/apache/hudi/pull/4476#issuecomment-1002862236


   
   ## CI report:
   
   * 037786516e6622a5e60d1543bef8ad77ed39b490 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on a change in pull request #4463: [HUDI-3120] Cache compactionPlan in buffer

2021-12-29 Thread GitBox


danny0405 commented on a change in pull request #4463:
URL: https://github.com/apache/hudi/pull/4463#discussion_r776560447



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionCommitSink.java
##
@@ -108,8 +124,15 @@ public void invoke(CompactionCommitEvent event, Context 
context) throws Exceptio
* @param events  Commit events ever received for the instant
*/
   private void commitIfNecessary(String instant, 
Collection events) throws IOException {
-HoodieCompactionPlan compactionPlan = CompactionUtils.getCompactionPlan(
-this.writeClient.getHoodieTable().getMetaClient(), instant);
+HoodieCompactionPlan compactionPlan;
+if (compactionPlanCache.containsKey(instant)) {
+  compactionPlan = compactionPlanCache.get(instant);
+} else {
+  compactionPlan = CompactionUtils.getCompactionPlan(
+  this.table.getMetaClient(), instant);
+  compactionPlanCache.put(instant, compactionPlan);

Review comment:
   What is the benefit of caching the compaction plan? Personally I prefer 
fewer cached items; the less state, the better.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] YannByron commented on issue #4429: [SUPPORT] Spark SQL CTAS command doesn't work with 0.10.0 version and Spark 3.1.1

2021-12-29 Thread GitBox


YannByron commented on issue #4429:
URL: https://github.com/apache/hudi/issues/4429#issuecomment-1002861916


   @vingov @xushiyan 
   I can reproduce this in Spark 3.1.1. It's a real bug, and I've opened a 
ticket and submitted a PR for this: #4476.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-3131) Spark3.1.1 CTAS error

2021-12-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3131:
-
Labels: pull-request-available  (was: )

> Spark3.1.1 CTAS error
> -
>
> Key: HUDI-3131
> URL: https://issues.apache.org/jira/browse/HUDI-3131
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>
> Fail to run CTAS with Hudi0.10.0 and Spark3.1.1.
>  
> Sql:
> {code:java}
> create table h1_p using hudi partitioned by(dt) options(type = 'cow', 
> primaryKey = 'id') as select '2021-05-07' as dt, 1 as id, 'a1' as name, 10 as 
> price; {code}
> Error:
> {code:java}
> java.lang.NoSuchMethodError: 
> org.apache.spark.sql.catalyst.plans.logical.Command.producedAttributes$(Lorg/apache/spark/sql/catalyst/plans/logical/Command;)Lorg/apache/spark/sql/catalyst/expressions/AttributeSet;
>     at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.producedAttributes(CreateHoodieTableAsSelectCommand.scala:39)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HUDI-3124) Bootstrap when timeline have completed instant

2021-12-29 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466672#comment-17466672
 ] 

Danny Chen commented on HUDI-3124:
--

Fixed via master branch: 0f0088fe4b740c4acec0cb25988250db8fb483b6

> Bootstrap when timeline have completed instant
> --
>
> Key: HUDI-3124
> URL: https://issues.apache.org/jira/browse/HUDI-3124
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] nsivabalan closed issue #4432: [SUPPORT] The parquet file size exceeds the configured value

2021-12-29 Thread GitBox


nsivabalan closed issue #4432:
URL: https://github.com/apache/hudi/issues/4432


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #4432: [SUPPORT] The parquet file size exceeds the configured value

2021-12-29 Thread GitBox


nsivabalan commented on issue #4432:
URL: https://github.com/apache/hudi/issues/4432#issuecomment-1002861656


   Please re-open if the proposed solution does not work. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] YannByron opened a new pull request #4476: [HUDI-3131] fix ctas error in spark3.1.1

2021-12-29 Thread GitBox


YannByron opened a new pull request #4476:
URL: https://github.com/apache/hudi/pull/4476


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-3124) Bootstrap when timeline have completed instant

2021-12-29 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-3124:
-
Fix Version/s: 0.11.0

> Bootstrap when timeline have completed instant
> --
>
> Key: HUDI-3124
> URL: https://issues.apache.org/jira/browse/HUDI-3124
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3131) Spark3.1.1 CTAS error

2021-12-29 Thread Yann Byron (Jira)
Yann Byron created HUDI-3131:


 Summary: Spark3.1.1 CTAS error
 Key: HUDI-3131
 URL: https://issues.apache.org/jira/browse/HUDI-3131
 Project: Apache Hudi
  Issue Type: Bug
  Components: Spark Integration
Reporter: Yann Byron
 Fix For: 0.11.0, 0.10.1


CTAS fails with Hudi 0.10.0 and Spark 3.1.1.

 

SQL:
{code:java}
create table h1_p using hudi partitioned by(dt) options(type = 'cow', 
primaryKey = 'id') as select '2021-05-07' as dt, 1 as id, 'a1' as name, 10 as 
price; {code}
Error:
{code:java}
java.lang.NoSuchMethodError: 
org.apache.spark.sql.catalyst.plans.logical.Command.producedAttributes$(Lorg/apache/spark/sql/catalyst/plans/logical/Command;)Lorg/apache/spark/sql/catalyst/expressions/AttributeSet;
    at 
org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.producedAttributes(CreateHoodieTableAsSelectCommand.scala:39)
 {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HUDI-3124) Bootstrap when timeline have completed instant

2021-12-29 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-3124.
--

> Bootstrap when timeline have completed instant
> --
>
> Key: HUDI-3124
> URL: https://issues.apache.org/jira/browse/HUDI-3124
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[hudi] branch master updated (436becf -> 0f0088f)

2021-12-29 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 436becf  [HUDI-2675] Fix the exception 'Not an Avro data file' when 
archive and clean (#4016)
 add 0f0088f  [HUDI-3124] Bootstrap when timeline have completed instant 
(#4467)

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


[GitHub] [hudi] danny0405 merged pull request #4467: [HUDI-3124] Bootstrap when timeline have completed instant

2021-12-29 Thread GitBox


danny0405 merged pull request #4467:
URL: https://github.com/apache/hudi/pull/4467


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #4466: [SUPPORT]ERROR table.HoodieTimelineArchiveLog: Failed to archive commits,Not an Avro data file

2021-12-29 Thread GitBox


nsivabalan commented on issue #4466:
URL: https://github.com/apache/hudi/issues/4466#issuecomment-1002861348


   We have a fix [here](https://github.com/apache/hudi/pull/4016). Can you try 
out the patch?
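
   For reference, a rough sketch of trying out an unreleased fix from a PR; the 
branch name is arbitrary and the bundle module is only an example, pick whichever 
bundle your deployment actually uses:

   ```sh
# Fetch the PR as a local branch (GitHub exposes pull requests under refs/pull/<id>/head).
git clone https://github.com/apache/hudi.git && cd hudi
git fetch origin pull/4016/head:hudi-2675-test
git checkout hudi-2675-test

# Rebuild only the bundle you deploy (Spark bundle shown here) and swap the jar
# into the job that hit the archival error.
mvn clean package -DskipTests -pl packaging/hudi-spark-bundle -am
   ```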


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (674c149 -> 436becf)

2021-12-29 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 674c149  [HUDI-3083] Support component data types for flink 
bulk_insert (#4470)
 add 436becf  [HUDI-2675] Fix the exception 'Not an Avro data file' when 
archive and clean (#4016)

No new revisions were added by this update.

Summary of changes:
 .../hudi/table/HoodieTimelineArchiveLog.java   | 37 ++-
 .../table/action/clean/CleanActionExecutor.java| 14 +++--
 .../hudi/table/action/clean/CleanPlanner.java  | 14 +++--
 .../hudi/io/TestHoodieTimelineArchiveLog.java  | 45 --
 .../java/org/apache/hudi/table/TestCleaner.java| 71 +++---
 .../hudi/testutils/HoodieClientTestHarness.java| 34 ++-
 .../table/timeline/HoodieActiveTimeline.java   | 24 
 .../table/timeline/HoodieDefaultTimeline.java  |  7 ++-
 .../hudi/common/table/timeline/HoodieTimeline.java |  2 +
 .../hudi/common/testutils/FileCreateUtils.java | 17 +-
 .../hudi/common/testutils/HoodieTestTable.java | 18 --
 11 files changed, 207 insertions(+), 76 deletions(-)


[GitHub] [hudi] nsivabalan merged pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-12-29 Thread GitBox


nsivabalan merged pull request #4016:
URL: https://github.com/apache/hudi/pull/4016


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4473: [HUDI-2590] Adding tests to validate different key generators

2021-12-29 Thread GitBox


hudi-bot commented on pull request #4473:
URL: https://github.com/apache/hudi/pull/4473#issuecomment-1002860504


   
   ## CI report:
   
   * 35841cdbffb0edd8d7e1f114147b12ee3daf0872 UNKNOWN
   * b91d9c4a42a05e01ee5a75449e861d9bf88b69c7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4807)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4473: [HUDI-2590] Adding tests to validate different key generators

2021-12-29 Thread GitBox


hudi-bot removed a comment on pull request #4473:
URL: https://github.com/apache/hudi/pull/4473#issuecomment-1002849419


   
   ## CI report:
   
   * 35841cdbffb0edd8d7e1f114147b12ee3daf0872 UNKNOWN
   * b91d9c4a42a05e01ee5a75449e861d9bf88b69c7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4807)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] boneanxs commented on issue #4474: [SUPPORT] Should we shade all aws dependencies to avoid class conflicts?

2021-12-29 Thread GitBox


boneanxs commented on issue #4474:
URL: https://github.com/apache/hudi/issues/4474#issuecomment-1002859639


   In our internal Hudi version we shade the AWS dependencies; you can add a new 
relocation and build a new bundle package.
   
   For example, to shade the AWS dependencies in the Spark bundle, add the 
following to **packaging/hudi-spark-bundle/pom.xml**:
   
   ```xml
   <!-- maven-shade-plugin relocation: remap the AWS SDK packages under the bundle's shade prefix -->
   <relocation>
     <pattern>com.amazonaws.</pattern>
     <shadedPattern>${spark.bundle.spark.shade.prefix}com.amazonaws.</shadedPattern>
   </relocation>
   ```
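
   After adding the relocation, rebuild the bundle so the shaded classes are 
actually repackaged; a rough sketch using plain Maven (flags and paths are 
generic, adjust to your build):

   ```sh
# Rebuild just the Spark bundle and the modules it depends on.
mvn clean package -DskipTests -pl packaging/hudi-spark-bundle -am

# Sanity-check that AWS SDK classes (if any ended up in the bundle) now live
# under the shade prefix; replace the jar name with the one your build produced.
jar tf packaging/hudi-spark-bundle/target/hudi-spark3-bundle_2.12-0.10.0.jar | grep -i amazonaws | head
   ```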


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] BruceKellan closed issue #4247: [SUPPORT] Unsupport operation exception occur when using flink+hudi in bulk_insert mode

2021-12-29 Thread GitBox


BruceKellan closed issue #4247:
URL: https://github.com/apache/hudi/issues/4247


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] BruceKellan commented on issue #4247: [SUPPORT] Unsupport operation exception occur when using flink+hudi in bulk_insert mode

2021-12-29 Thread GitBox


BruceKellan commented on issue #4247:
URL: https://github.com/apache/hudi/issues/4247#issuecomment-1002858083


   Thanks, I have seen the PR: [link](https://github.com/apache/hudi/pull/4470). 
   I will close this issue. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



