[jira] [Commented] (HUDI-1741) Row Level TTL Support for records stored in Hudi
[ https://issues.apache.org/jira/browse/HUDI-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625407#comment-17625407 ] leesf commented on HUDI-1741: - [~nicholasjiang] agree with the solution > Row Level TTL Support for records stored in Hudi > > > Key: HUDI-1741 > URL: https://issues.apache.org/jira/browse/HUDI-1741 > Project: Apache Hudi > Issue Type: New Feature > Components: Utilities >Reporter: Balaji Varadarajan >Priority: Major > > For e.g.: have records only updated last month > > GH: https://github.com/apache/hudi/issues/2743 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-4546) Optimize catalog cast logic in HoodieSpark3Analysis
leesf created HUDI-4546: --- Summary: Optimize catalog cast logic in HoodieSpark3Analysis Key: HUDI-4546 URL: https://issues.apache.org/jira/browse/HUDI-4546 Project: Apache Hudi Issue Type: Improvement Reporter: leesf Assignee: leesf In HoodieSpark3Analysis, if the plan node is CreateV2Table, there is no need to cast to HoodieCatalog, since CreateV2Table already carries a TableCatalog that we can use directly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
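The idea in this ticket can be illustrated with a minimal sketch. The types below (CreateV2TableNode and the catalog interface) are hypothetical stand-ins, not Hudi's or Spark's actual classes; the point is only that a plan node which already holds its TableCatalog needs no cast:

```java
// Minimal sketch of the idea in HUDI-4546, using hypothetical stand-in types:
// a CreateV2Table-style plan node already holds a TableCatalog, so the
// analysis rule can use it directly instead of casting to HoodieCatalog.
public class CatalogCastSketch {

    interface TableCatalog {
        String name();
    }

    static final class HoodieCatalog implements TableCatalog {
        @Override public String name() { return "hoodie_catalog"; }
    }

    // Stand-in for Spark's CreateV2Table logical plan node.
    static final class CreateV2TableNode {
        private final TableCatalog catalog;
        CreateV2TableNode(TableCatalog catalog) { this.catalog = catalog; }
        TableCatalog catalog() { return catalog; }
    }

    // No instanceof check or cast to HoodieCatalog: the node's own catalog is used.
    static String resolveCatalogName(CreateV2TableNode node) {
        return node.catalog().name();
    }
}
```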
[jira] [Assigned] (HUDI-4433) Hudi-CLI repair deduplicate not working with non-partitioned dataset
[ https://issues.apache.org/jira/browse/HUDI-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-4433: --- Assignee: brightwon > Hudi-CLI repair deduplicate not working with non-partitioned dataset > > > Key: HUDI-4433 > URL: https://issues.apache.org/jira/browse/HUDI-4433 > Project: Apache Hudi > Issue Type: Improvement > Components: cli >Reporter: brightwon >Assignee: brightwon >Priority: Minor > > hudi-cli's *repair deduplicate* command does not work with a non-partitioned > dataset, because an *empty value* cannot be passed for the *--duplicatedPartitionPath* parameter. > For example, this command > repair deduplicate --duplicatedPartitionPath "" --repairedOutputPath > "s3://myBucket/table/" --sparkMaster yarn --sparkMemory 4G --dryrun true > --dedupeType "upsert_type" > results in: +_You should specify value for option 'duplicatedPartitionPath' > for this command_+ > > My slack message link in the #general channel: > https://apache-hudi.slack.com/archives/C4D716NPQ/p1657854371469139 -- This message was sent by Atlassian Jira (v8.20.10#820010)
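One possible fix direction is to accept an empty --duplicatedPartitionPath and interpret it as "non-partitioned". The sketch below uses hypothetical method and class names (it is not hudi-cli's actual option handling) to show that shape:

```java
// Hypothetical sketch for HUDI-4433: accept an empty (or blank)
// --duplicatedPartitionPath and treat it as a non-partitioned table,
// instead of rejecting it as a missing option value.
public class DedupePartitionOptionSketch {

    static String normalizePartitionPath(String duplicatedPartitionPath) {
        if (duplicatedPartitionPath == null || duplicatedPartitionPath.trim().isEmpty()) {
            // Empty relative partition path: files live directly under the table base path.
            return "";
        }
        return duplicatedPartitionPath;
    }

    static boolean isNonPartitioned(String duplicatedPartitionPath) {
        return normalizePartitionPath(duplicatedPartitionPath).isEmpty();
    }
}
```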
[jira] [Updated] (HUDI-4315) Do not throw exception in BaseSpark3Adapter#isHoodieTable
[ https://issues.apache.org/jira/browse/HUDI-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-4315: Summary: Do not throw exception in BaseSpark3Adapter#isHoodieTable (was: Do not throw exception in BaseSpark3Adapter#toTableIdentifier ) > Do not throw exception in BaseSpark3Adapter#isHoodieTable > - > > Key: HUDI-4315 > URL: https://issues.apache.org/jira/browse/HUDI-4315 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > > When using another CatalogPlugin named leesf along with HoodieCatalog, the > following sql, > `insert into leesf.db.table select id, name from hudi_db.hudi_table`, causes the > BaseSpark3Adapter#toTableIdentifier method to throw the following exception > > ``` > org.apache.spark.sql.AnalysisException: leesf.db.table is not a valid > TableIdentifier as it has more than 2 name parts. > at > org.apache.spark.sql.errors.QueryCompilationErrors$.identifierHavingMoreThanTwoNamePartsError(QueryCompilationErrors.scala:1394) > ``` -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HUDI-4315) Do not throw exception in BaseSpark3Adapter#toTableIdentifier
[ https://issues.apache.org/jira/browse/HUDI-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-4315: Summary: Do not throw exception in BaseSpark3Adapter#toTableIdentifier (was: Do not throw exception when using BaseSpark3Adapter#toTableIdentifier ) > Do not throw exception in BaseSpark3Adapter#toTableIdentifier > -- > > Key: HUDI-4315 > URL: https://issues.apache.org/jira/browse/HUDI-4315 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > > When using another CatalogPlugin named leesf along with HoodieCatalog, the > following sql, > `insert into leesf.db.table select id, name from hudi_db.hudi_table`, causes the > BaseSpark3Adapter#toTableIdentifier method to throw the following exception > > ``` > org.apache.spark.sql.AnalysisException: leesf.db.table is not a valid > TableIdentifier as it has more than 2 name parts. > at > org.apache.spark.sql.errors.QueryCompilationErrors$.identifierHavingMoreThanTwoNamePartsError(QueryCompilationErrors.scala:1394) > ``` -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HUDI-4315) Do not throw exception when using BaseSpark3Adapter#toTableIdentifier
[ https://issues.apache.org/jira/browse/HUDI-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-4315: Summary: Do not throw exception when using BaseSpark3Adapter#toTableIdentifier (was: Do not throw exception when using toTableIdentifier ) > Do not throw exception when using BaseSpark3Adapter#toTableIdentifier > -- > > Key: HUDI-4315 > URL: https://issues.apache.org/jira/browse/HUDI-4315 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > > When using another CatalogPlugin named leesf along with HoodieCatalog, the > following sql, > `insert into leesf.db.table select id, name from hudi_db.hudi_table`, causes the > BaseSpark3Adapter#toTableIdentifier method to throw the following exception > > ``` > org.apache.spark.sql.AnalysisException: leesf.db.table is not a valid > TableIdentifier as it has more than 2 name parts. > at > org.apache.spark.sql.errors.QueryCompilationErrors$.identifierHavingMoreThanTwoNamePartsError(QueryCompilationErrors.scala:1394) > ``` -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HUDI-4315) Do not throw exception when using toTableIdentifier
leesf created HUDI-4315: --- Summary: Do not throw exception when using toTableIdentifier Key: HUDI-4315 URL: https://issues.apache.org/jira/browse/HUDI-4315 Project: Apache Hudi Issue Type: Improvement Reporter: leesf Assignee: leesf When using another CatalogPlugin named leesf along with HoodieCatalog, the following sql, `insert into leesf.db.table select id, name from hudi_db.hudi_table`, causes the BaseSpark3Adapter#toTableIdentifier method to throw the following exception ``` org.apache.spark.sql.AnalysisException: leesf.db.table is not a valid TableIdentifier as it has more than 2 name parts. at org.apache.spark.sql.errors.QueryCompilationErrors$.identifierHavingMoreThanTwoNamePartsError(QueryCompilationErrors.scala:1394) ``` -- This message was sent by Atlassian Jira (v8.20.7#820007)
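One non-throwing shape for this check is to return an empty Optional for identifiers with more than two name parts, so the caller can simply conclude "not a Hudi table". The sketch below uses hypothetical names and is not the actual BaseSpark3Adapter code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class TableIdentifierSketch {

    // Hypothetical sketch of the fix direction in HUDI-4315: a three-part name
    // like "leesf.db.table" belongs to another catalog, so instead of throwing
    // an AnalysisException we return Optional.empty() and let the caller treat
    // it as "not a Hudi table".
    static Optional<List<String>> toTableIdentifier(String multiPartName) {
        List<String> parts = Arrays.asList(multiPartName.split("\\."));
        return parts.size() <= 2 ? Optional.of(parts) : Optional.empty();
    }

    static boolean isHoodieTableCandidate(String multiPartName) {
        // Only one-part or two-part (db.table) names can resolve in this sketch.
        return toTableIdentifier(multiPartName).isPresent();
    }
}
```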
[jira] [Updated] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables
[ https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-4183: Description: For now, when users specify `HoodieCatalog` in 0.11.0, they cannot create non-hudi tables, since HoodieCatalog#createTable does not handle non-hudi tables; the logic is simply missing from the #createTable method, and we should fix it. > Fix using HoodieCatalog to create non-hudi tables > - > > Key: HUDI-4183 > URL: https://issues.apache.org/jira/browse/HUDI-4183 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > For now, when users specify `HoodieCatalog` in 0.11.0, they cannot create > non-hudi tables, since HoodieCatalog#createTable does not handle non-hudi > tables; the logic is simply missing from the #createTable method, and we > should fix it. -- This message was sent by Atlassian Jira (v8.20.7#820007)
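The missing branch can be pictured with a small dispatch sketch. The names here are hypothetical; the real fix would delegate non-hudi tables to Spark's underlying session catalog rather than return strings:

```java
// Hypothetical sketch of HUDI-4183's fix: HoodieCatalog#createTable should
// branch on the table's provider and delegate non-hudi tables (parquet,
// csv, ...) to the underlying session catalog, instead of only handling
// the hudi path.
public class CreateTableDispatchSketch {

    static String createTable(String provider) {
        if ("hudi".equalsIgnoreCase(provider)) {
            return "created as hudi table";
        }
        // Previously missing branch: fall back for non-hudi providers.
        return "delegated to session catalog";
    }
}
```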
[jira] [Comment Edited] (HUDI-4178) HoodieSpark3Analysis does not pass schema from Spark Catalog
[ https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545899#comment-17545899 ] leesf edited comment on HUDI-4178 at 6/3/22 2:46 PM: - [~alexey.kudinkin] does `making them fetch the schema from storage (either from commit's metadata or data file) every time` mean the schema is fetched only once per write operation, or many times per write operation? And how much does fetching from storage affect performance? was (Author: xleesf): [~alexey.kudinkin] `making them fetch the schema from storage (either from commit's metadata or data file) every time.` means fetch only once schema for a write operation or fetch many times for a write operation? > HoodieSpark3Analysis does not pass schema from Spark Catalog > > > Key: HUDI-4178 > URL: https://issues.apache.org/jira/browse/HUDI-4178 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.11.0 >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.1 > > > Currently, HoodieSpark3Analysis rule does not pass table's schema from the > Spark Catalog to Hudi's relations making them fetch the schema from storage > (either from commit's metadata or data file) every time. > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (HUDI-4178) HoodieSpark3Analysis does not pass schema from Spark Catalog
[ https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545899#comment-17545899 ] leesf edited comment on HUDI-4178 at 6/3/22 2:46 PM: - [~alexey.kudinkin] hi, does `making them fetch the schema from storage (either from commit's metadata or data file) every time` mean the schema is fetched only once per write operation, or many times per write operation? And how much does fetching from storage affect performance? was (Author: xleesf): [~alexey.kudinkin] `making them fetch the schema from storage (either from commit's metadata or data file) every time.` means fetch only once schema for a write operation or fetch many times for a write operation? and how much performance it affects while fetching from storage? > HoodieSpark3Analysis does not pass schema from Spark Catalog > > > Key: HUDI-4178 > URL: https://issues.apache.org/jira/browse/HUDI-4178 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.11.0 >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.1 > > > Currently, HoodieSpark3Analysis rule does not pass table's schema from the > Spark Catalog to Hudi's relations making them fetch the schema from storage > (either from commit's metadata or data file) every time. > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HUDI-4178) HoodieSpark3Analysis does not pass schema from Spark Catalog
[ https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545899#comment-17545899 ] leesf commented on HUDI-4178: - [~alexey.kudinkin] does `making them fetch the schema from storage (either from commit's metadata or data file) every time` mean the schema is fetched only once per write operation, or many times per write operation? > HoodieSpark3Analysis does not pass schema from Spark Catalog > > > Key: HUDI-4178 > URL: https://issues.apache.org/jira/browse/HUDI-4178 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.11.0 >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.1 > > > Currently, HoodieSpark3Analysis rule does not pass table's schema from the > Spark Catalog to Hudi's relations making them fetch the schema from storage > (either from commit's metadata or data file) every time. > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Closed] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables
[ https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-4183. --- Resolution: Fixed > Fix using HoodieCatalog to create non-hudi tables > - > > Key: HUDI-4183 > URL: https://issues.apache.org/jira/browse/HUDI-4183 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables
[ https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-4183: Fix Version/s: 0.12.0 > Fix using HoodieCatalog to create non-hudi tables > - > > Key: HUDI-4183 > URL: https://issues.apache.org/jira/browse/HUDI-4183 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables
[ https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-4183. - > Fix using HoodieCatalog to create non-hudi tables > - > > Key: HUDI-4183 > URL: https://issues.apache.org/jira/browse/HUDI-4183 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables
[ https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-4183: Summary: Fix using HoodieCatalog to create non-hudi tables (was: Fix using HoodieCatalog to create non hudi tables) > Fix using HoodieCatalog to create non-hudi tables > - > > Key: HUDI-4183 > URL: https://issues.apache.org/jira/browse/HUDI-4183 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HUDI-4183) Fix using HoodieCatalog to create non hudi tables
leesf created HUDI-4183: --- Summary: Fix using HoodieCatalog to create non hudi tables Key: HUDI-4183 URL: https://issues.apache.org/jira/browse/HUDI-4183 Project: Apache Hudi Issue Type: Improvement Reporter: leesf Assignee: leesf -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HUDI-3861) 'path' in CatalogTable#properties failed to be updated when renaming table
[ https://issues.apache.org/jira/browse/HUDI-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521087#comment-17521087 ] leesf commented on HUDI-3861: - [~KnightChess] yeah, if the real table path is updated, the tblproperties should also be updated. Would you mind opening a PR to fix it? > 'path' in CatalogTable#properties failed to be updated when renaming table > -- > > Key: HUDI-3861 > URL: https://issues.apache.org/jira/browse/HUDI-3861 > Project: Apache Hudi > Issue Type: Bug >Reporter: Jin Xing >Priority: Minor > > Reproduce the issue as below > {code:java} > 1. Create a MOR table > create table mor_simple( > id int, > name string, > price double > ) > using hudi > options ( > type = 'cow', > primaryKey = 'id' > ) > 2. Renaming > alter table mor_simple rename to mor_simple0 > 3. Show create table mor_simple0 > Output as > CREATE TABLE hudi.mor_simple0 ( > `_hoodie_commit_time` STRING, > `_hoodie_commit_seqno` STRING, > `_hoodie_record_key` STRING, > `_hoodie_partition_path` STRING, > `_hoodie_file_name` STRING, > `id` INT, > `name` STRING, > `price` DOUBLE) > USING hudi > OPTIONS( > 'primaryKey' = 'id', > 'type' = 'cow') > TBLPROPERTIES( > 'path' = '/user/hive/warehous/hudi.db/mor_simple'){code} > As we can see, the 'path' property is > '/user/hive/warehous/hudi.db/mor_simple', rather than > '/user/hive/warehous/hudi.db/mor_simple0'. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
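The suggested fix direction, rewriting the 'path' table property when the table directory moves on rename, can be sketched as follows. The helper and its name are hypothetical, not Hudi's actual rename code:

```java
// Hypothetical sketch for HUDI-3861: when a managed table is renamed and its
// directory moves with it, the 'path' entry in TBLPROPERTIES should be
// rewritten to point at the new location.
public class RenamePathPropertySketch {

    static String updatedPathProperty(String oldPath, String newTableName) {
        // Replace the last path segment (the old table name) with the new one.
        int slash = oldPath.lastIndexOf('/');
        return oldPath.substring(0, slash + 1) + newTableName;
    }
}
```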
[jira] [Commented] (HUDI-3861) 'path' in CatalogTable#properties failed to be updated when renaming table
[ https://issues.apache.org/jira/browse/HUDI-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521054#comment-17521054 ] leesf commented on HUDI-3861: - [~jinxing6...@126.com] Thanks for reporting this, but I think renaming should not change the path; it should only change the table name in hoodie.properties. > 'path' in CatalogTable#properties failed to be updated when renaming table > -- > > Key: HUDI-3861 > URL: https://issues.apache.org/jira/browse/HUDI-3861 > Project: Apache Hudi > Issue Type: Bug >Reporter: Jin Xing >Priority: Minor > > Reproduce the issue as below > {code:java} > 1. Create a MOR table > create table mor_simple( > id int, > name string, > price double > ) > using hudi > options ( > type = 'cow', > primaryKey = 'id' > ) > 2. Renaming > alter table mor_simple rename to mor_simple0 > 3. Show create table mor_simple0 > Output as > CREATE TABLE hudi.mor_simple0 ( > `_hoodie_commit_time` STRING, > `_hoodie_commit_seqno` STRING, > `_hoodie_record_key` STRING, > `_hoodie_partition_path` STRING, > `_hoodie_file_name` STRING, > `id` INT, > `name` STRING, > `price` DOUBLE) > USING hudi > OPTIONS( > 'primaryKey' = 'id', > 'type' = 'cow') > TBLPROPERTIES( > 'path' = '/user/hive/warehous/hudi.db/mor_simple'){code} > As we can see, the 'path' property is > '/user/hive/warehous/hudi.db/mor_simple', rather than > '/user/hive/warehous/hudi.db/mor_simple0'. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HUDI-2520) Certify sync with Hive 3
[ https://issues.apache.org/jira/browse/HUDI-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512797#comment-17512797 ] leesf commented on HUDI-2520: - [~rex_xiong] hi, are you working on a fix? > Certify sync with Hive 3 > > > Key: HUDI-2520 > URL: https://issues.apache.org/jira/browse/HUDI-2520 > Project: Apache Hudi > Issue Type: Task > Components: hive, meta-sync >Reporter: Sagar Sumit >Assignee: rex xiong >Priority: Blocker > Fix For: 0.11.0 > > Attachments: image-2022-03-14-15-52-02-021.png > > > # when executing a CTAS statement, the query fails due to a double meta-sync problem: > HoodieSparkSqlWriter syncs meta the first time, and > HoodieCatalog.createHoodieTable syncs a second time during > HoodieStagedTable.commitStagedChanges > {code:java} > create table if not exists h3_cow using hudi partitioned by (dt) options > (type = 'cow', primaryKey = 'id,name') as select 1 as id, 'a1' as name, 20 as > price, '2021-01-03' as dt; > 22/03/14 14:26:21 ERROR [main] Utils: Aborting task > org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: Table or > view 'h3_cow' already exists in database 'default' > at > org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createHiveDataSourceTable(CreateHoodieTableCommand.scala:172) > at > org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createTableInCatalog(CreateHoodieTableCommand.scala:148) > at > org.apache.spark.sql.hudi.catalog.HoodieCatalog.createHoodieTable(HoodieCatalog.scala:254) > at > org.apache.spark.sql.hudi.catalog.HoodieStagedTable.commitStagedChanges(HoodieStagedTable.scala:62) > at > org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:484) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496) > at > org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:468) > at > 
org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:463) > at > org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:106) > at > org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:127) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at > org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481){code} > 2. 
when truncating a partition table, neither metadata nor data is truncated, and > truncating a partition table with partition specs fails > {code:java} > // truncate partition table without partition spec, the query succeeds but > never deletes data > spark-sql> truncate table mor_partition_table_0314; > Time taken: 0.256 seconds > // truncate partition table with partition spec, > spark-sql> truncate table mor_partition_table_0314 partition(dt=3); > Error in query: Table spark_catalog.default.mor_partition_table_0314 does not > support partition management.; > 'TruncatePartition unresolvedpartitionspec((dt,3), None) > +- ResolvedTable org.apache.spark.sql.hudi.catalog.HoodieCatalog@63f609a4, > default.mor_partition_table_0314, > {code} > 3. re-drop existing partition
[jira] [Created] (HUDI-3489) Unify config to avoid duplicate code
leesf created HUDI-3489: --- Summary: Unify config to avoid duplicate code Key: HUDI-3489 URL: https://issues.apache.org/jira/browse/HUDI-3489 Project: Apache Hudi Issue Type: Improvement Reporter: leesf Assignee: leesf -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HUDI-2732) Spark Datasource V2 integration RFC
[ https://issues.apache.org/jira/browse/HUDI-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496453#comment-17496453 ] leesf commented on HUDI-2732: - [~shivnarayan] yes, we can close the Jira. > Spark Datasource V2 integration RFC > > > Key: HUDI-2732 > URL: https://issues.apache.org/jira/browse/HUDI-2732 > Project: Apache Hudi > Issue Type: Task > Components: spark >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3416) Incremental read using v2 datasource
[ https://issues.apache.org/jira/browse/HUDI-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-3416: Issue Type: Improvement (was: Bug) > Incremental read using v2 datasource > > > Key: HUDI-3416 > URL: https://issues.apache.org/jira/browse/HUDI-3416 > Project: Apache Hudi > Issue Type: Improvement > Components: spark >Reporter: leesf >Assignee: leesf >Priority: Major > Fix For: 0.12.0 > > > currently, we still use the v1 format for incremental reads, and we need to > support the v2 format as well. > see comment: https://github.com/apache/hudi/pull/4611#discussion_r795089099 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3416) Incremental read using v2 datasource
leesf created HUDI-3416: --- Summary: Incremental read using v2 datasource Key: HUDI-3416 URL: https://issues.apache.org/jira/browse/HUDI-3416 Project: Apache Hudi Issue Type: Bug Reporter: leesf Assignee: leesf currently, we still use the v1 format for incremental reads, and we need to support the v2 format as well. see comment: https://github.com/apache/hudi/pull/4611#discussion_r795089099 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-2645) Rewrite Zoptimize and other files in scala into Java
[ https://issues.apache.org/jira/browse/HUDI-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-2645: --- Assignee: shibei > Rewrite Zoptimize and other files in scala into Java > > > Key: HUDI-2645 > URL: https://issues.apache.org/jira/browse/HUDI-2645 > Project: Apache Hudi > Issue Type: Task >Reporter: Vinoth Chandar >Assignee: shibei >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-2873) Support optimize data layout by sql and make the build more fast
[ https://issues.apache.org/jira/browse/HUDI-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-2873: --- Assignee: shibei > Support optimize data layout by sql and make the build more fast > > > Key: HUDI-2873 > URL: https://issues.apache.org/jira/browse/HUDI-2873 > Project: Apache Hudi > Issue Type: Task > Components: Performance, spark >Reporter: tao meng >Assignee: shibei >Priority: Critical > Labels: sev:high > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation
[ https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-3172. --- > Refactor hudi existing modules to make more code reuse in V2 implementation > --- > > Key: HUDI-3172 > URL: https://issues.apache.org/jira/browse/HUDI-3172 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Reopened] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation
[ https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reopened HUDI-3172: - > Refactor hudi existing modules to make more code reuse in V2 implementation > --- > > Key: HUDI-3172 > URL: https://issues.apache.org/jira/browse/HUDI-3172 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation
[ https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-3172. - > Refactor hudi existing modules to make more code reuse in V2 implementation > --- > > Key: HUDI-3172 > URL: https://issues.apache.org/jira/browse/HUDI-3172 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3254) Introduce HoodieCatalog to manage tables for Spark Datasource V2
leesf created HUDI-3254: --- Summary: Introduce HoodieCatalog to manage tables for Spark Datasource V2 Key: HUDI-3254 URL: https://issues.apache.org/jira/browse/HUDI-3254 Project: Apache Hudi Issue Type: New Feature Reporter: leesf Assignee: leesf -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation
[ https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-3172: Issue Type: Improvement (was: Bug) > Refactor hudi existing modules to make more code reuse in V2 implementation > --- > > Key: HUDI-3172 > URL: https://issues.apache.org/jira/browse/HUDI-3172 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation
[ https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-3172: --- Assignee: leesf > Refactor hudi existing modules to make more code reuse in V2 implementation > --- > > Key: HUDI-3172 > URL: https://issues.apache.org/jira/browse/HUDI-3172 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3140) Fix bulk_insert failure on Spark 3.2.0
[ https://issues.apache.org/jira/browse/HUDI-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-3140: Fix Version/s: 0.11.0 > Fix bulk_insert failure on Spark 3.2.0 > -- > > Key: HUDI-3140 > URL: https://issues.apache.org/jira/browse/HUDI-3140 > Project: Apache Hudi > Issue Type: Task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HUDI-3140) Fix bulk_insert failure on Spark 3.2.0
[ https://issues.apache.org/jira/browse/HUDI-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-3140. - > Fix bulk_insert failure on Spark 3.2.0 > -- > > Key: HUDI-3140 > URL: https://issues.apache.org/jira/browse/HUDI-3140 > Project: Apache Hudi > Issue Type: Task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-3140) Fix bulk_insert failure on Spark 3.2.0
[ https://issues.apache.org/jira/browse/HUDI-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-3140. --- > Fix bulk_insert failure on Spark 3.2.0 > -- > > Key: HUDI-3140 > URL: https://issues.apache.org/jira/browse/HUDI-3140 > Project: Apache Hudi > Issue Type: Task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-3140) Fix bulk_insert failure on Spark 3.2.0
[ https://issues.apache.org/jira/browse/HUDI-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-3140: --- Assignee: leesf > Fix bulk_insert failure on Spark 3.2.0 > -- > > Key: HUDI-3140 > URL: https://issues.apache.org/jira/browse/HUDI-3140 > Project: Apache Hudi > Issue Type: Task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation
leesf created HUDI-3172: --- Summary: Refactor hudi existing modules to make more code reuse in V2 implementation Key: HUDI-3172 URL: https://issues.apache.org/jira/browse/HUDI-3172 Project: Apache Hudi Issue Type: Bug Reporter: leesf -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3140) Fix bulk_insert failure on Spark 3.2.0
leesf created HUDI-3140: --- Summary: Fix bulk_insert failure on Spark 3.2.0 Key: HUDI-3140 URL: https://issues.apache.org/jira/browse/HUDI-3140 Project: Apache Hudi Issue Type: Sub-task Reporter: leesf -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0
[ https://issues.apache.org/jira/browse/HUDI-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-3134. --- > Fix Insert error after adding columns on Spark 3.2.0 > > > Key: HUDI-3134 > URL: https://issues.apache.org/jira/browse/HUDI-3134 > Project: Apache Hudi > Issue Type: Sub-task > Components: Spark Integration >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > On Spark 3.2.0, after altering table to add columns, the insert statement > will fail with the following exception. > Caused by: org.apache.hudi.exception.HoodieException: > java.util.concurrent.ExecutionException: > org.apache.hudi.exception.HoodieException: operation has failed > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147) > at > org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100) > ... 31 more > Caused by: java.util.concurrent.ExecutionException: > org.apache.hudi.exception.HoodieException: operation has failed > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141) > ... 
32 more > Caused by: org.apache.hudi.exception.HoodieException: operation has failed > at > org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248) > at > org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226) > at > org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52) > at > org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278) > at > org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ... 3 more > Caused by: java.lang.NoSuchMethodError: > org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode; > at > org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:168) > at > org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:95) > at > org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33) > at > org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138) > at > org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185) > at > org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156) > at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135) > at > org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49) > at > org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45) > at > org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ... 4 more -- This message was sent by Atlassian Jira (v8.20.1#820001)
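The root cause in the trace above is a NoSuchMethodError on org.apache.avro.Schema$Field.defaultValue() returning an org.codehaus.jackson.JsonNode — a signature that exists only in older Avro releases; Avro 1.9+ migrated to com.fasterxml.jackson and exposes defaultVal() instead, so a parquet-avro compiled against the old API fails at runtime against the Avro that Spark 3.2 brings in. A minimal, self-contained sketch of one way to surface such a mismatch at startup instead of mid-write (the probe is demonstrated on java.lang.String so it compiles without Avro; the Avro class and method names in the comments are the ones from the stack trace):

```java
import java.util.Arrays;

// Sketch (illustrative, not Hudi code): probe the classpath for a method
// signature before running a job, so an API-generation mismatch like
// HUDI-3134's missing Schema.Field.defaultValue() is reported up front.
public final class ApiProbe {
    /** True if clazz exposes a public no-arg method with the given name. */
    public static boolean hasNoArgMethod(Class<?> clazz, String name) {
        return Arrays.stream(clazz.getMethods())
                .anyMatch(m -> m.getName().equals(name) && m.getParameterCount() == 0);
    }

    public static void main(String[] args) {
        // Against Avro one would probe Class.forName("org.apache.avro.Schema$Field"):
        //   hasNoArgMethod(field, "defaultValue") -> pre-1.9 (org.codehaus.jackson) API
        //   hasNoArgMethod(field, "defaultVal")   -> Avro 1.9+ API
        // Demonstrated here on java.lang.String so the sketch is runnable as-is:
        System.out.println(hasNoArgMethod(String.class, "isEmpty"));   // true
        System.out.println(hasNoArgMethod(String.class, "noSuchOne")); // false
    }
}
```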
[jira] [Updated] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0
[ https://issues.apache.org/jira/browse/HUDI-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-3134: Component/s: Spark Integration > Fix Insert error after adding columns on Spark 3.2.0 > > > Key: HUDI-3134 > URL: https://issues.apache.org/jira/browse/HUDI-3134 > Project: Apache Hudi > Issue Type: Sub-task > Components: Spark Integration >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > On Spark 3.2.0, after altering table to add columns, the insert statement > will fail with the following exception. > (Stack trace identical to the one quoted in the HUDI-3134 message above; root cause: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0
[ https://issues.apache.org/jira/browse/HUDI-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-3134: Fix Version/s: 0.11.0 > Fix Insert error after adding columns on Spark 3.2.0 > > > Key: HUDI-3134 > URL: https://issues.apache.org/jira/browse/HUDI-3134 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > On Spark 3.2.0, after altering table to add columns, the insert statement > will fail with the following exception. > (Stack trace identical to the one quoted in the HUDI-3134 message above; root cause: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0
[ https://issues.apache.org/jira/browse/HUDI-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-3134. - > Fix Insert error after adding columns on Spark 3.2.0 > > > Key: HUDI-3134 > URL: https://issues.apache.org/jira/browse/HUDI-3134 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > > On Spark 3.2.0, after altering table to add columns, the insert statement > will fail with the following exception. > (Stack trace identical to the one quoted in the HUDI-3134 message above; root cause: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0
leesf created HUDI-3134: --- Summary: Fix Insert error after adding columns on Spark 3.2.0 Key: HUDI-3134 URL: https://issues.apache.org/jira/browse/HUDI-3134 Project: Apache Hudi Issue Type: Sub-task Reporter: leesf Assignee: leesf On Spark 3.2.0, after altering table to add columns, the insert statement will fail with the following exception. (Stack trace identical to the one quoted in the HUDI-3134 messages above; root cause: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3047) Basic Implementation of Spark Datasource V2
[ https://issues.apache.org/jira/browse/HUDI-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-3047: Fix Version/s: 0.11.0 > Basic Implementation of Spark Datasource V2 > --- > > Key: HUDI-3047 > URL: https://issues.apache.org/jira/browse/HUDI-3047 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > Introduce HoodieCatalog and HoodieInternalTableV2 to implement read and write > path -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3047) Basic Implementation of Spark Datasource V2
[ https://issues.apache.org/jira/browse/HUDI-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-3047: Priority: Blocker (was: Major) > Basic Implementation of Spark Datasource V2 > --- > > Key: HUDI-3047 > URL: https://issues.apache.org/jira/browse/HUDI-3047 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > Introduce HoodieCatalog and HoodieInternalTableV2 to implement read and write > path -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3047) Basic Implementation of Spark Datasource V2
[ https://issues.apache.org/jira/browse/HUDI-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-3047: Summary: Basic Implementation of Spark Datasource V2 (was: Basic Implement of Spark Datasource V2) > Basic Implementation of Spark Datasource V2 > --- > > Key: HUDI-3047 > URL: https://issues.apache.org/jira/browse/HUDI-3047 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Major > > Introduce HoodieCatalog and HoodieInternalTableV2 to implement read and write > path -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3047) Basic Implement of Spark Datasource V2
leesf created HUDI-3047: --- Summary: Basic Implement of Spark Datasource V2 Key: HUDI-3047 URL: https://issues.apache.org/jira/browse/HUDI-3047 Project: Apache Hudi Issue Type: Sub-task Reporter: leesf Assignee: leesf Introduce HoodieCatalog and HoodieInternalTableV2 to implement read and write path -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-2813) Claim RFC number for RFC for spark datasource V2 Integration
[ https://issues.apache.org/jira/browse/HUDI-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-2813. --- > Claim RFC number for RFC for spark datasource V2 Integration > - > > Key: HUDI-2813 > URL: https://issues.apache.org/jira/browse/HUDI-2813 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2813) Claim RFC number for RFC for spark datasource V2 Integration
[ https://issues.apache.org/jira/browse/HUDI-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-2813: Fix Version/s: 0.11.0 > Claim RFC number for RFC for spark datasource V2 Integration > - > > Key: HUDI-2813 > URL: https://issues.apache.org/jira/browse/HUDI-2813 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HUDI-2813) Claim RFC number for RFC for spark datasource V2 Integration
[ https://issues.apache.org/jira/browse/HUDI-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-2813. - > Claim RFC number for RFC for spark datasource V2 Integration > - > > Key: HUDI-2813 > URL: https://issues.apache.org/jira/browse/HUDI-2813 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2916) Add IssueNavigationLink for IDEA
[ https://issues.apache.org/jira/browse/HUDI-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-2916: Summary: Add IssueNavigationLink for IDEA (was: Add issue and jira navigation link for IDEA) > Add IssueNavigationLink for IDEA > > > Key: HUDI-2916 > URL: https://issues.apache.org/jira/browse/HUDI-2916 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2916) Add issue and jira navigation link for IDEA
[ https://issues.apache.org/jira/browse/HUDI-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-2916: Summary: Add issue and jira navigation link for IDEA (was: Add IssueNavigationLink for IDEA git log) > Add issue and jira navigation link for IDEA > > > Key: HUDI-2916 > URL: https://issues.apache.org/jira/browse/HUDI-2916 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-2916) Add IssueNavigationLink for IDEA git log
leesf created HUDI-2916: --- Summary: Add IssueNavigationLink for IDEA git log Key: HUDI-2916 URL: https://issues.apache.org/jira/browse/HUDI-2916 Project: Apache Hudi Issue Type: Improvement Reporter: leesf Assignee: leesf -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2100) [UMBRELLA] Support Space curve for hudi
[ https://issues.apache.org/jira/browse/HUDI-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-2100: Fix Version/s: 0.11.0 > [UMBRELLA] Support Space curve for hudi > --- > > Key: HUDI-2100 > URL: https://issues.apache.org/jira/browse/HUDI-2100 > Project: Apache Hudi > Issue Type: New Feature > Components: Spark Integration >Reporter: tao meng >Assignee: tao meng >Priority: Blocker > Labels: hudi-umbrellas > Fix For: 0.11.0 > > > support space-filling curves to optimize the clustering of Hudi files and improve query > performance. -- This message was sent by Atlassian Jira (v8.20.1#820001)
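The space-curve clustering idea in HUDI-2100 can be illustrated with a Z-order (Morton) code: interleaving the bits of two column values yields a single sort key under which rows that are close in both dimensions land in nearby files, improving data skipping for multi-column predicates. A minimal sketch of the bit interleaving, not Hudi's actual implementation:

```java
// Sketch (illustrative): compute a Z-order/Morton code for two int columns
// by interleaving their bits; sorting rows by this code clusters them along
// a space-filling curve before they are written out to files.
public final class ZOrder {
    /** Interleave the 32 bits of x (even positions) and y (odd positions). */
    public static long interleave(int x, int y) {
        long z = 0L;
        for (int i = 0; i < 32; i++) {
            z |= (((long) x >> i) & 1L) << (2 * i);      // bit i of x -> bit 2i
            z |= (((long) y >> i) & 1L) << (2 * i + 1);  // bit i of y -> bit 2i+1
        }
        return z;
    }

    public static void main(String[] args) {
        // (x=1, y=1): bits interleave to 0b11 = 3
        System.out.println(interleave(1, 1)); // 3
        // (x=3, y=0): x bits 1,1 land at positions 0 and 2 -> 0b101 = 5
        System.out.println(interleave(3, 0)); // 5
    }
}
```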
[jira] [Created] (HUDI-2813) Claim RFC number for RFC for spark datasource V2 Integration
leesf created HUDI-2813: --- Summary: Claim RFC number for RFC for spark datasource V2 Integration Key: HUDI-2813 URL: https://issues.apache.org/jira/browse/HUDI-2813 Project: Apache Hudi Issue Type: Sub-task Reporter: leesf Assignee: leesf -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-2732) Spark Datasource V2 integration RFC
[ https://issues.apache.org/jira/browse/HUDI-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-2732: --- Assignee: leesf > Spark Datasource V2 integration RFC > > > Key: HUDI-2732 > URL: https://issues.apache.org/jira/browse/HUDI-2732 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-2732) Spark Datasource V2 integration RFC
leesf created HUDI-2732: --- Summary: Spark Datasource V2 integration RFC Key: HUDI-2732 URL: https://issues.apache.org/jira/browse/HUDI-2732 Project: Apache Hudi Issue Type: Sub-task Reporter: leesf -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-2413) Sql source in delta streamer does not work
[ https://issues.apache.org/jira/browse/HUDI-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-2413: --- Assignee: Jian Feng > Sql source in delta streamer does not work > -- > > Key: HUDI-2413 > URL: https://issues.apache.org/jira/browse/HUDI-2413 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Jian Feng >Assignee: Jian Feng >Priority: Major > > The SQL source returns a null checkpoint; in DeltaSync, a null checkpoint is > judged as no new data, so it should return an empty string instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
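The fix described in HUDI-2413 amounts to normalizing the checkpoint before the sync loop inspects it: a null from a one-shot source like the SQL source is indistinguishable from "no new data", while an empty string lets the commit proceed without carrying an offset. A hedged sketch (the class and method names here are hypothetical, not Hudi's actual API):

```java
// Sketch (hypothetical names): normalize a source checkpoint so that the
// DeltaSync-style "null means no new data" short-circuit is not triggered
// by sources that legitimately have no offset to report.
public final class CheckpointUtil {
    /** Map a null checkpoint to an empty string; pass real offsets through. */
    public static String normalize(String checkpoint) {
        return checkpoint == null ? "" : checkpoint;
    }

    public static void main(String[] args) {
        System.out.println(normalize(null).isEmpty());  // true: commit proceeds
        System.out.println(normalize("offset-42"));     // real offsets unchanged
    }
}
```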
[jira] [Created] (HUDI-2064) Fix TestHoodieBackedMetadata#testOnlyValidPartitionsAdded
leesf created HUDI-2064: --- Summary: Fix TestHoodieBackedMetadata#testOnlyValidPartitionsAdded Key: HUDI-2064 URL: https://issues.apache.org/jira/browse/HUDI-2064 Project: Apache Hudi Issue Type: Bug Reporter: leesf Assignee: leesf -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1922) bulk insert with row writer supports mor table
[ https://issues.apache.org/jira/browse/HUDI-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1922: Affects Version/s: 0.8.0 > bulk insert with row writer supports mor table > --- > > Key: HUDI-1922 > URL: https://issues.apache.org/jira/browse/HUDI-1922 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: leesf >Assignee: leesf >Priority: Major > > Currently, when using bulk insert with the row writer and the table type set to > MOR, the bulk insert fails. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1922) bulk insert with row writer supports mor table
[ https://issues.apache.org/jira/browse/HUDI-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1922: Fix Version/s: 0.9.0 > bulk insert with row writer supports mor table > --- > > Key: HUDI-1922 > URL: https://issues.apache.org/jira/browse/HUDI-1922 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: leesf >Assignee: leesf >Priority: Major > Fix For: 0.9.0 > > > Currently, when using bulk insert with the row writer and the table type set to > MOR, the bulk insert fails. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1922) bulk insert with row writer supports mor table
leesf created HUDI-1922: --- Summary: bulk insert with row writer supports mor table Key: HUDI-1922 URL: https://issues.apache.org/jira/browse/HUDI-1922 Project: Apache Hudi Issue Type: Bug Reporter: leesf Assignee: leesf Currently, when using bulk insert with the row writer and the table type set to MOR, the bulk insert fails. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1460) Time Travel (querying the historical versions of data) ability for Hudi Table
[ https://issues.apache.org/jira/browse/HUDI-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249079#comment-17249079 ] leesf commented on HUDI-1460: - [~qian heng] sorry, I cannot access the Google doc you provided; it would be better to send a discussion email to the dev mailing list. > Time Travel (querying the historical versions of data) ability for Hudi Table > - > > Key: HUDI-1460 > URL: https://issues.apache.org/jira/browse/HUDI-1460 > Project: Apache Hudi > Issue Type: New Feature > Components: Common Core >Reporter: qian heng >Priority: Major > > Hi, all: > We plan to use Hudi to sync mysql binlog data. There will be a flink ETL task > to consume binlog records from kafka and save data to hudi every hour. > The binlog records are also grouped hourly, and all records of one > hour will be saved in one commit. The data transmission pipeline should be > like binlog -> kafka -> flink -> parquet. > After the data is synced to hudi, we want to query the historical hourly > versions of the Hudi table in Hive SQL. > Here is a more detailed description of our issue along with a simple design > of Time Travel for Hudi; the design is under development and testing: > [https://docs.google.com/document/d/1r0iwUsklw9aKSDMzZaiq43dy57cSJSAqT9KCvgjbtUo/edit#] > We need to support the Time Travel ability soon for our business needs. We > have also seen the [RFC > 07|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table]. > We would be glad to receive any suggestions or discussion. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1161) Support update partial fields for MoR table
[ https://issues.apache.org/jira/browse/HUDI-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-1161: --- Assignee: Nicholas Jiang (was: leesf) > Support update partial fields for MoR table > --- > > Key: HUDI-1161 > URL: https://issues.apache.org/jira/browse/HUDI-1161 > Project: Apache Hudi > Issue Type: Sub-task > Components: Writer Core >Reporter: leesf >Assignee: Nicholas Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1123) Document the usage of user define metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1123. --- > Document the usage of user define metrics reporter > -- > > Key: HUDI-1123 > URL: https://issues.apache.org/jira/browse/HUDI-1123 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: Zheren Yu >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1123) Document the usage of user define metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1123. - Fix Version/s: 0.6.0 Resolution: Fixed > Document the usage of user define metrics reporter > -- > > Key: HUDI-1123 > URL: https://issues.apache.org/jira/browse/HUDI-1123 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: Zheren Yu >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1124) Document the usage of Tencent COSN
[ https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1124: Status: Open (was: New) > Document the usage of Tencent COSN > -- > > Key: HUDI-1124 > URL: https://issues.apache.org/jira/browse/HUDI-1124 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: deyzhong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1124) Document the usage of Tencent COSN
[ https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1124. --- > Document the usage of Tencent COSN > -- > > Key: HUDI-1124 > URL: https://issues.apache.org/jira/browse/HUDI-1124 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: deyzhong >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1124) Document the usage of Tencent COSN
[ https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1124. - Fix Version/s: 0.6.0 Resolution: Fixed > Document the usage of Tencent COSN > -- > > Key: HUDI-1124 > URL: https://issues.apache.org/jira/browse/HUDI-1124 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: deyzhong >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1123) Document the usage of user define metrics reporter
[ https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1123: Status: Open (was: New) > Document the usage of user define metrics reporter > -- > > Key: HUDI-1123 > URL: https://issues.apache.org/jira/browse/HUDI-1123 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: Zheren Yu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1287) Make DeltaStreamer support custom ETL transformers
[ https://issues.apache.org/jira/browse/HUDI-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198994#comment-17198994 ] leesf commented on HUDI-1287: - [~liujinhui] DeltaStreamer already supports user-defined Transformers: you can just write your own class that implements the Transformer interface. > Make DeltaStreamer support custom ETL transformers > > > Key: HUDI-1287 > URL: https://issues.apache.org/jira/browse/HUDI-1287 > Project: Apache Hudi > Issue Type: Improvement > Components: DeltaStreamer >Reporter: liujinhui >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
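The comment above can be made concrete: a custom ETL step is a class implementing Hudi's Transformer interface, whose name is passed to DeltaStreamer (via its `--transformer-class` option). The real interface, `org.apache.hudi.utilities.transform.Transformer`, operates on a Spark `Dataset<Row>`; the sketch below substitutes a plain list of maps so it stays self-contained without Spark, and the `DROP_HEARTBEATS` transformer with its "op" column is an illustrative example, not part of Hudi.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified stand-in for org.apache.hudi.utilities.transform.Transformer:
// the real apply() takes a Spark Dataset<Row> (plus a JavaSparkContext,
// SparkSession, and TypedProperties); here a List of Maps plays that role
// so the sketch runs without Spark on the classpath.
class TransformerSketch {
    interface Transformer {
        List<Map<String, Object>> apply(List<Map<String, Object>> rows);
    }

    // Example ETL step: drop rows whose "op" column marks a heartbeat.
    static final Transformer DROP_HEARTBEATS =
        rows -> rows.stream()
                    .filter(r -> !"heartbeat".equals(r.get("op")))
                    .collect(Collectors.toList());

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            Map.of("op", "insert", "id", 1),
            Map.of("op", "heartbeat", "id", 2));
        System.out.println(DROP_HEARTBEATS.apply(rows).size());  // 1
    }
}
```

The real implementation would apply the same kind of filter or projection to the incoming `Dataset<Row>` and return the transformed dataset.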
[jira] [Commented] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null
[ https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198992#comment-17198992 ] leesf commented on HUDI-1288: - [~soltar] I found that some users still face the issue: https://github.com/apache/avro/pull/290#issuecomment-625731714. Does 0.5.2-incubating work well for you? > DeltaSync:writeToSink fails with Unknown datum type > org.apache.avro.JsonProperties$Null > --- > > Key: HUDI-1288 > URL: https://issues.apache.org/jira/browse/HUDI-1288 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Michal Swiatowy >Priority: Major > > After updating to Hudi version 0.5.3 (prev. 0.5.2-incubating) I ran into the > following error message on write to HDFS: > {code:java} > 2020-09-18 12:54:38,651 [Driver] INFO > HoodieTableMetaClient:initTableAndGetMetaClient:379 - Finished initializing > Table of type MERGE_ON_READ from > /master_data/6FQS/hudi_test/S_INCOMINGMESSAGEDETAIL_CDC > 2020-09-18 12:54:38,663 [Driver] INFO DeltaSync:setupWriteClient:470 - > Setting up Hoodie Write Client > 2020-09-18 12:54:38,695 [Driver] INFO DeltaSync:registerAvroSchemas:522 - > Registering Schema >
[jira] [Resolved] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-802. Resolution: Fixed > AWSDmsTransformer does not handle insert -> delete of a row in a single batch > correctly > --- > > Key: HUDI-802 > URL: https://issues.apache.org/jira/browse/HUDI-802 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Christopher Weaver >Assignee: Balaji Varadarajan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.1 > > > The provided AWSDmsAvroPayload class > ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java]) > currently handles cases where the "Op" column is a "D" for updates, and > successfully removes the row from the resulting table. > However, when an insert is quickly followed by a delete on the row (e.g. DMS > processes them together and puts the update records together in the same > parquet file), the row incorrectly appears in the resulting table. In this > case, the record is not in the table and getInsertValue is called rather than > combineAndGetUpdateValue. Since the logic to check for a delete is in > combineAndGetUpdateValue, it is skipped and the delete is missed. Something > like this could fix this issue: > [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java]. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-802. -- > AWSDmsTransformer does not handle insert -> delete of a row in a single batch > correctly > --- > > Key: HUDI-802 > URL: https://issues.apache.org/jira/browse/HUDI-802 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Christopher Weaver >Assignee: Balaji Varadarajan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.1 > > > The provided AWSDmsAvroPayload class > ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java]) > currently handles cases where the "Op" column is a "D" for updates, and > successfully removes the row from the resulting table. > However, when an insert is quickly followed by a delete on the row (e.g. DMS > processes them together and puts the update records together in the same > parquet file), the row incorrectly appears in the resulting table. In this > case, the record is not in the table and getInsertValue is called rather than > combineAndGetUpdateValue. Since the logic to check for a delete is in > combineAndGetUpdateValue, it is skipped and the delete is missed. Something > like this could fix this issue: > [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java]. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1255) Combine and get updateValue in multiFields
[ https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1255. --- > Combine and get updateValue in multiFields > -- > > Key: HUDI-1255 > URL: https://issues.apache.org/jira/browse/HUDI-1255 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: karl wang >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > Update the current value for only the fields you want to change. > The default payload, OverwriteWithLatestAvroPayload, overwrites the whole record > when comparing orderingVal. This doesn't meet our needs when we just want to change > specified fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1254) TypedProperties can not get values by initializing an existing properties
[ https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1254. --- > TypedProperties can not get values by initializing an existing properties > - > > Key: HUDI-1254 > URL: https://issues.apache.org/jira/browse/HUDI-1254 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: cdmikechen >Assignee: linshan-ma >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > If I create a test that constructs a new TypedProperties from an existing Properties, like > below: > {code:java} > public class TestTypedProperties { > @Test > public void testNewTypedProperties() { > Properties properties = new Properties(); > properties.put("test_key1", "test_value1"); > TypedProperties typedProperties = new TypedProperties(properties); > assertEquals("test_value1", typedProperties.getString("test_key1")); > } > } > {code} > the test does not pass and fails with this error: *java.lang.IllegalArgumentException: > Property test_key1 not found* > I think this is a bug that needs to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1254) TypedProperties can not get values by initializing an existing properties
[ https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1254. - Resolution: Fixed > TypedProperties can not get values by initializing an existing properties > - > > Key: HUDI-1254 > URL: https://issues.apache.org/jira/browse/HUDI-1254 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: cdmikechen >Assignee: linshan-ma >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > If I create a test that constructs a new TypedProperties from an existing Properties, like > below: > {code:java} > public class TestTypedProperties { > @Test > public void testNewTypedProperties() { > Properties properties = new Properties(); > properties.put("test_key1", "test_value1"); > TypedProperties typedProperties = new TypedProperties(properties); > assertEquals("test_value1", typedProperties.getString("test_key1")); > } > } > {code} > the test does not pass and fails with this error: *java.lang.IllegalArgumentException: > Property test_key1 not found* > I think this is a bug that needs to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1254) TypedProperties can not get values by initializing an existing properties
[ https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1254: Status: Open (was: New) > TypedProperties can not get values by initializing an existing properties > - > > Key: HUDI-1254 > URL: https://issues.apache.org/jira/browse/HUDI-1254 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: cdmikechen >Assignee: linshan-ma >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > If I create a test that constructs a new TypedProperties from an existing Properties, like > below: > {code:java} > public class TestTypedProperties { > @Test > public void testNewTypedProperties() { > Properties properties = new Properties(); > properties.put("test_key1", "test_value1"); > TypedProperties typedProperties = new TypedProperties(properties); > assertEquals("test_value1", typedProperties.getString("test_key1")); > } > } > {code} > the test does not pass and fails with this error: *java.lang.IllegalArgumentException: > Property test_key1 not found* > I think this is a bug that needs to be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
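A minimal stand-alone reproduction of the HUDI-1254 failure mode and the usual fix: if a constructor passes the given Properties as *defaults* (the `Properties(Properties)` chaining constructor), the entries are invisible to `containsKey()`; copying them with `putAll()` makes the quoted test pass. `TypedPropsSketch` below is a simplified stand-in for Hudi's TypedProperties, not its actual code.

```java
import java.util.Properties;

// Simplified stand-in for Hudi's TypedProperties. The bug pattern: chaining
// the source Properties as defaults makes containsKey() return false for
// its keys; copying the entries with putAll() fixes the lookup.
class TypedPropsSketch extends Properties {
    TypedPropsSketch(Properties source) {
        if (source != null) {
            putAll(source);  // copy entries instead of chaining them as defaults
        }
    }

    String getString(String key) {
        if (!containsKey(key)) {
            throw new IllegalArgumentException("Property " + key + " not found");
        }
        return getProperty(key);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("test_key1", "test_value1");
        // With putAll() this prints the value; with the defaults-chaining
        // constructor it would throw "Property test_key1 not found".
        System.out.println(new TypedPropsSketch(props).getString("test_key1"));
    }
}
```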
[jira] [Closed] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite
[ https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1130. --- > Allow for schema evolution within DAG for hudi test suite > - > > Key: HUDI-1130 > URL: https://issues.apache.org/jira/browse/HUDI-1130 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite
[ https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1130. - Fix Version/s: 0.6.1 Resolution: Fixed > Allow for schema evolution within DAG for hudi test suite > - > > Key: HUDI-1130 > URL: https://issues.apache.org/jira/browse/HUDI-1130 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite
[ https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1130: Status: Open (was: New) > Allow for schema evolution within DAG for hudi test suite > - > > Key: HUDI-1130 > URL: https://issues.apache.org/jira/browse/HUDI-1130 > Project: Apache Hudi > Issue Type: Improvement > Components: Testing >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1181) Decimal type display issue for record key field
[ https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1181. - Fix Version/s: 0.6.1 Resolution: Fixed > Decimal type display issue for record key field > --- > > Key: HUDI-1181 > URL: https://issues.apache.org/jira/browse/HUDI-1181 > Project: Apache Hudi > Issue Type: Bug >Reporter: Wenning Ding >Assignee: Wenning Ding >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > When using *fixed_len_byte_array* decimal type as Hudi record key, Hudi would > not correctly display the decimal value, instead, Hudi would display it as a > byte array. > During the Hudi writing phase, Hudi would save the parquet source data into > Avro Generic Record. For example, the source parquet data has a column with > decimal type: > > {code:java} > optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code} > > Then Hudi will convert it into the following avro decimal type: > {code:java} > { > "name" : "OBJ_ID", > "type" : [ { > "type" : "fixed", > "name" : "fixed", > "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID", > "size" : 16, > "logicalType" : "decimal", > "precision" : 38, > "scale" : 0 > }, "null" ] > } > {code} > This decimal field would be stored as a fixed length bytes array. And in the > reading phase, Hudi will convert this bytes array back to a readable decimal > value through this > [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58]. > However, the problem is, when setting decimal type as record keys, Hudi would > read the value from Avro Generic Record and then directly convert it into > String type(See > [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]). 
> As a result, what shows in the _hoodie_record_key field would be something > like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So > we need to handle this special case and convert the byte array back before > converting it to String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1181) Decimal type display issue for record key field
[ https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1181. --- > Decimal type display issue for record key field > --- > > Key: HUDI-1181 > URL: https://issues.apache.org/jira/browse/HUDI-1181 > Project: Apache Hudi > Issue Type: Bug >Reporter: Wenning Ding >Assignee: Wenning Ding >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > When using *fixed_len_byte_array* decimal type as Hudi record key, Hudi would > not correctly display the decimal value, instead, Hudi would display it as a > byte array. > During the Hudi writing phase, Hudi would save the parquet source data into > Avro Generic Record. For example, the source parquet data has a column with > decimal type: > > {code:java} > optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code} > > Then Hudi will convert it into the following avro decimal type: > {code:java} > { > "name" : "OBJ_ID", > "type" : [ { > "type" : "fixed", > "name" : "fixed", > "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID", > "size" : 16, > "logicalType" : "decimal", > "precision" : 38, > "scale" : 0 > }, "null" ] > } > {code} > This decimal field would be stored as a fixed length bytes array. And in the > reading phase, Hudi will convert this bytes array back to a readable decimal > value through this > [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58]. > However, the problem is, when setting decimal type as record keys, Hudi would > read the value from Avro Generic Record and then directly convert it into > String type(See > [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]). 
> As a result, what shows in the _hoodie_record_key field would be something > like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So > we need to handle this special case and convert the byte array back before > converting it to String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1181) Decimal type display issue for record key field
[ https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1181: Status: Open (was: New) > Decimal type display issue for record key field > --- > > Key: HUDI-1181 > URL: https://issues.apache.org/jira/browse/HUDI-1181 > Project: Apache Hudi > Issue Type: Bug >Reporter: Wenning Ding >Assignee: Wenning Ding >Priority: Major > Labels: pull-request-available > > When using *fixed_len_byte_array* decimal type as Hudi record key, Hudi would > not correctly display the decimal value, instead, Hudi would display it as a > byte array. > During the Hudi writing phase, Hudi would save the parquet source data into > Avro Generic Record. For example, the source parquet data has a column with > decimal type: > > {code:java} > optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code} > > Then Hudi will convert it into the following avro decimal type: > {code:java} > { > "name" : "OBJ_ID", > "type" : [ { > "type" : "fixed", > "name" : "fixed", > "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID", > "size" : 16, > "logicalType" : "decimal", > "precision" : 38, > "scale" : 0 > }, "null" ] > } > {code} > This decimal field would be stored as a fixed length bytes array. And in the > reading phase, Hudi will convert this bytes array back to a readable decimal > value through this > [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58]. > However, the problem is, when setting decimal type as record keys, Hudi would > read the value from Avro Generic Record and then directly convert it into > String type(See > [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]). 
> As a result, what shows in the _hoodie_record_key field would be something > like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So > we need to handle this special case and convert the byte array back before > converting it to String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
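The conversion HUDI-1181 asks for is mechanical: an Avro fixed decimal is a big-endian two's-complement integer plus the schema's scale, so the byte array shown above can be turned back into a readable value with BigInteger/BigDecimal. A sketch, independent of Hudi's own AvroConversionHelper:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// An Avro fixed/bytes decimal stores the unscaled value as a big-endian
// two's-complement byte array; the schema's "scale" restores the decimal
// point. This is the conversion the record-key path should apply before
// calling toString().
class FixedDecimalSketch {
    static BigDecimal fromFixedBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        // The byte array from the report above, with scale 0 per DECIMAL(38,0).
        byte[] bytes = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71};
        System.out.println(fromFixedBytes(bytes, 0));  // 422076345
    }
}
```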
[jira] [Created] (HUDI-1268) Fix UpgradeDowngrade Rename Exception in aliyun OSS
leesf created HUDI-1268: --- Summary: Fix UpgradeDowngrade Rename Exception in aliyun OSS Key: HUDI-1268 URL: https://issues.apache.org/jira/browse/HUDI-1268 Project: Apache Hudi Issue Type: Bug Components: Writer Core Reporter: leesf Fix For: 0.6.1 When using the HoodieWriteClient API to write data to Hudi with the following config: ``` Properties properties = new Properties(); properties.setProperty(HoodieTableConfig.HOODIE_TABLE_NAME_PROP_NAME, tableName); properties.setProperty(HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME, tableType.name()); properties.setProperty(HoodieTableConfig.HOODIE_PAYLOAD_CLASS_PROP_NAME, OverwriteWithLatestAvroPayload.class.getName()); properties.setProperty(HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived"); return HoodieTableMetaClient.initTableAndGetMetaClient(hadoopConf, basePath, properties); ``` a FileAlreadyExistsException is thrown on Aliyun OSS. After debugging, the following code is what throws the exception: ``` // Rename the .updated file to hoodie.properties. This is atomic in hdfs, but not in cloud stores. // But as long as this does not leave a partial hoodie.properties file, we are okay. fs.rename(updatedPropsFilePath, propsFilePath); ``` We should ignore the FileAlreadyExistsException here, since hoodie.properties already exists. -- This message was sent by Atlassian Jira (v8.3.4#803005)
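The handling HUDI-1268 proposes can be sketched with java.nio standing in for the Hadoop FileSystem API (the real fix would catch the Hadoop/OSS variant of the exception around fs.rename): treat FileAlreadyExistsException on the rename as success, since an existing hoodie.properties means a previous writer already completed the step.

```java
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the proposed handling: on object stores like Aliyun OSS, where
// rename is not atomic, renaming .updated -> hoodie.properties can fail
// with FileAlreadyExistsException; the target's existence means the step
// already succeeded, so we swallow the exception and clean up the source.
class RenameSketch {
    static void renameIgnoringExisting(Path from, Path to) throws Exception {
        try {
            Files.move(from, to);  // throws FileAlreadyExistsException if `to` exists
        } catch (FileAlreadyExistsException e) {
            // hoodie.properties already exists: safe to ignore; just drop
            // the leftover .updated file.
            Files.deleteIfExists(from);
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("rename-sketch");
        Path updated = Files.writeString(dir.resolve("hoodie.properties.updated"), "v2");
        Path props = Files.writeString(dir.resolve("hoodie.properties"), "v1");
        renameIgnoringExisting(updated, props);  // does not throw
        System.out.println(Files.exists(props));
    }
}
```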
[jira] [Closed] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row
[ https://issues.apache.org/jira/browse/HUDI-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1225. --- > Avro Date logical type not handled correctly when converting to Spark Row > - > > Key: HUDI-1225 > URL: https://issues.apache.org/jira/browse/HUDI-1225 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: Balaji Varadarajan >Assignee: Balaji Varadarajan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > [https://github.com/apache/hudi/issues/2034] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row
[ https://issues.apache.org/jira/browse/HUDI-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1225. - Resolution: Fixed > Avro Date logical type not handled correctly when converting to Spark Row > - > > Key: HUDI-1225 > URL: https://issues.apache.org/jira/browse/HUDI-1225 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: Balaji Varadarajan >Assignee: Balaji Varadarajan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > [https://github.com/apache/hudi/issues/2034] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1231) Duplicate record while querying from hive synced table
[ https://issues.apache.org/jira/browse/HUDI-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186874#comment-17186874 ] leesf commented on HUDI-1231: - [~vbalaji] would you please take a look > Duplicate record while querying from hive synced table > -- > > Key: HUDI-1231 > URL: https://issues.apache.org/jira/browse/HUDI-1231 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ashok Kumar >Priority: Major > > I am writting in upsert mode with precombine flag enabled. Still when i query > i see same record available 3 times in same parquet file > > spark.sql("select > _hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name > from hudi5_mor_ro where id1=1086187 and timestamp=1598461500 and > _hoodie_record_key='timestamp:1598461500,id1:1086187,id2:1872725,flowId:23'").show(10,false) > > +--+ > |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name| > +--+ > |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet| > |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet| > |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet| > +--+ > > This issue i am getting with both kind of table i.e COW and MOR. > I have tried it 0.6.3 version but i had tried 0.5.3 and in that also this bug > was coming. > This issue is not coming with small data set. 
> > The strange thing is that when I query only the parquet file, it gives only one record (i.e. > correct): > df.filter(col("_hoodie_record_key")==="timestamp:1598461500,id1:1086187,id2:1872725,flowId:23").count > res13: Long = 1 > > Note: > When I query the filesystem, it is fine. > This issue appears only when I query from the Hive-synced table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1234) Insert new records regardless of small file when using insert operation
leesf created HUDI-1234: --- Summary: Insert new records regardless of small file when using insert operation Key: HUDI-1234 URL: https://issues.apache.org/jira/browse/HUDI-1234 Project: Apache Hudi Issue Type: Bug Components: Writer Core Reporter: leesf context here [https://github.com/apache/hudi/issues/2051] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1227) Document the usage of CLI
[ https://issues.apache.org/jira/browse/HUDI-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1227: Issue Type: Improvement (was: Bug) > Document the usage of CLI > - > > Key: HUDI-1227 > URL: https://issues.apache.org/jira/browse/HUDI-1227 > Project: Apache Hudi > Issue Type: Improvement > Components: CLI >Reporter: leesf >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1227) Document the usage of CLI
leesf created HUDI-1227: --- Summary: Document the usage of CLI Key: HUDI-1227 URL: https://issues.apache.org/jira/browse/HUDI-1227 Project: Apache Hudi Issue Type: Bug Components: CLI Reporter: leesf -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1083: Fix Version/s: 0.6.1 > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > As of now, this is how bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > for eg: 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > means, 20 will go into B0, 30 will go into B1 and 50 will go into B2. > within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. for instance if mod (hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead we could calculate cumulative weights upfront and do a binary search > within getPartition() > so, 0.2, 0.5, 1 > so with mod(hash value), we could do binary search and find the right bucket > and would cut cost from O(N) to log N. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1083. - Resolution: Fixed > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > As of now, this is how bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > for eg: 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > means, 20 will go into B0, 30 will go into B1 and 50 will go into B2. > within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. for instance if mod (hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead we could calculate cumulative weights upfront and do a binary search > within getPartition() > so, 0.2, 0.5, 1 > so with mod(hash value), we could do binary search and find the right bucket > and would cut cost from O(N) to log N. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1083: Status: Open (was: New) > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > > As of now, this is how bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > for eg: 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > means, 20 will go into B0, 30 will go into B1 and 50 will go into B2. > within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. for instance if mod (hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead we could calculate cumulative weights upfront and do a binary search > within getPartition() > so, 0.2, 0.5, 1 > so with mod(hash value), we could do binary search and find the right bucket > and would cut cost from O(N) to log N. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key
[ https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1083. --- > Minor optimization in Determining insert bucket location for a given key > > > Key: HUDI-1083 > URL: https://issues.apache.org/jira/browse/HUDI-1083 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: sivabalan narayanan >Assignee: shenh062326 >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > As of now, this is how bucket for a given key is determined. > In every partition, we find all insert buckets and assign weights. > for eg: 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted > means, 20 will go into B0, 30 will go into B1 and 50 will go into B2. > within getPartition(Object key), we linearly walk through the bucket weights > and find the right bucket for a given key. for instance if mod (hash value) > is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds > 0.9. > Instead we could calculate cumulative weights upfront and do a binary search > within getPartition() > so, 0.2, 0.5, 1 > so with mod(hash value), we could do binary search and find the right bucket > and would cut cost from O(N) to log N. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
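The HUDI-1083 optimization described above can be sketched directly: precompute the cumulative bucket weights once (0.2, 0.5, 1.0 in the example), then locate the bucket for a key's normalized hash with binary search in O(log N) instead of the linear O(N) scan. This is an illustrative stand-alone version, not Hudi's actual partitioner code.

```java
import java.util.Arrays;

// Bucket lookup over precomputed cumulative weights. For weights
// 0.2, 0.3, 0.5 the cumulative array is {0.2, 0.5, 1.0}; a normalized
// hash r in [0, 1) falls into the first bucket whose cumulative weight
// exceeds r.
class BucketSearchSketch {
    static int findBucket(double[] cumulativeWeights, double r) {
        int idx = Arrays.binarySearch(cumulativeWeights, r);
        // On a miss, binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is exactly the index of the first cumulative
        // weight greater than r, i.e. the target bucket.
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        double[] cumulative = {0.2, 0.5, 1.0};
        System.out.println(findBucket(cumulative, 0.9));  // 2
        System.out.println(findBucket(cumulative, 0.1));  // 0
    }
}
```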
[jira] [Closed] (HUDI-1177) fix TimestampBasedKeyGenerator Task not serializableException
[ https://issues.apache.org/jira/browse/HUDI-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1177. --- > fix TimestampBasedKeyGenerator Task not serializableException > -- > > Key: HUDI-1177 > URL: https://issues.apache.org/jira/browse/HUDI-1177 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: liujinhui >Assignee: Pratyaksh Sharma >Priority: Blocker > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
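The issue title points at Spark's classic Task-not-serializable failure mode: a class shipped to executors eagerly holds a non-serializable helper. A generic, hypothetical illustration of the usual fix follows; `KeyGeneratorSketch` is invented for this sketch and is not the actual HUDI-1177 patch. `java.time.format.DateTimeFormatter` does not implement `Serializable`, so the sketch keeps it behind a `transient` field and rebuilds it lazily on each JVM.

```java
import java.io.Serializable;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class KeyGeneratorSketch implements Serializable {
    // Only the serializable configuration travels with the task.
    private final String pattern;
    // DateTimeFormatter is not Serializable; transient keeps it out of the
    // serialized form, and the getter rebuilds it lazily per JVM.
    private transient DateTimeFormatter formatter;

    public KeyGeneratorSketch(String pattern) {
        this.pattern = pattern;
    }

    private DateTimeFormatter formatter() {
        if (formatter == null) {
            formatter = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        }
        return formatter;
    }

    public String format(long epochMillis) {
        return formatter().format(Instant.ofEpochMilli(epochMillis));
    }
}
```

Had the formatter been a plain eagerly-initialized field, serializing the generator into a Spark task closure would fail with the exception named in the title.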
[jira] [Closed] (HUDI-1188) MOR hbase index tables not deduplicating records
[ https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1188. --- > MOR hbase index tables not deduplicating records > > > Key: HUDI-1188 > URL: https://issues.apache.org/jira/browse/HUDI-1188 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ryan Pifer >Assignee: Ryan Pifer >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > After fetching the hbase index entry for a record, Hudi validates that the commit timestamp stored in hbase for that record is a commit on the timeline. > Because a deltacommit (upsert on a MOR table) is not such a commit, any record stored to the hbase index during a deltacommit is considered invalid and treated as a new record. > This causes the hbase index to be updated on every write, which allows records to end up in multiple partitions and even in different file groups within the same partition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1188) MOR hbase index tables not deduplicating records
[ https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1188: Status: Open (was: New) > MOR hbase index tables not deduplicating records > > > Key: HUDI-1188 > URL: https://issues.apache.org/jira/browse/HUDI-1188 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ryan Pifer >Assignee: Ryan Pifer >Priority: Major > Labels: pull-request-available > > After fetching the hbase index entry for a record, Hudi validates that the commit timestamp stored in hbase for that record is a commit on the timeline. > Because a deltacommit (upsert on a MOR table) is not such a commit, any record stored to the hbase index during a deltacommit is considered invalid and treated as a new record. > This causes the hbase index to be updated on every write, which allows records to end up in multiple partitions and even in different file groups within the same partition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1188) MOR hbase index tables not deduplicating records
[ https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1188. - Fix Version/s: 0.6.1 Resolution: Fixed > MOR hbase index tables not deduplicating records > > > Key: HUDI-1188 > URL: https://issues.apache.org/jira/browse/HUDI-1188 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ryan Pifer >Assignee: Ryan Pifer >Priority: Major > Labels: pull-request-available > Fix For: 0.6.1 > > > After fetching the hbase index entry for a record, Hudi validates that the commit timestamp stored in hbase for that record is a commit on the timeline. > Because a deltacommit (upsert on a MOR table) is not such a commit, any record stored to the hbase index during a deltacommit is considered invalid and treated as a new record. > This causes the hbase index to be updated on every write, which allows records to end up in multiple partitions and even in different file groups within the same partition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
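The faulty validation HUDI-1188 describes can be sketched abstractly. This is a hypothetical simplification, not Hudi's actual timeline API: `IndexValidationSketch` and its method names are invented, and the timeline is reduced to plain sets of instant timestamps grouped by action type.

```java
import java.util.Set;

public class IndexValidationSketch {
    // Buggy check, per the issue: only "commit" instants count as valid, so an
    // hbase index entry written during a "deltacommit" (MOR upsert) fails the
    // check and the record is treated as new, re-inserting it elsewhere.
    static boolean isValidBuggy(Set<String> commitInstants, String entryInstant) {
        return commitInstants.contains(entryInstant);
    }

    // Repaired check: instants from deltacommits are also accepted, so an
    // entry written by a MOR upsert keeps pointing at the record's existing
    // file group instead of being overwritten.
    static boolean isValidFixed(Set<String> commitInstants,
                                Set<String> deltaCommitInstants,
                                String entryInstant) {
        return commitInstants.contains(entryInstant)
            || deltaCommitInstants.contains(entryInstant);
    }
}
```

Under the buggy check, an entry stamped with a deltacommit instant is always "invalid", which is exactly why the index gets rewritten on every upsert and duplicates appear across partitions and file groups.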