[jira] [Commented] (HUDI-1741) Row Level TTL Support for records stored in Hudi

2022-10-27 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625407#comment-17625407
 ] 

leesf commented on HUDI-1741:
-

[~nicholasjiang] agree with the solution

> Row Level TTL Support for records stored in Hudi
> 
>
> Key: HUDI-1741
> URL: https://issues.apache.org/jira/browse/HUDI-1741
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Utilities
>Reporter: Balaji Varadarajan
>Priority: Major
>
> For example: keep only records that were updated within the last month (see 
> the sketch after this quote).
>  
> GH: https://github.com/apache/hudi/issues/2743
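
Since the request itself is terse, here is a hedged sketch of what row-level 
TTL would automate: a periodic SQL DELETE of rows whose ordering field falls 
outside the retention window. The table and column names are placeholders, and 
this is an approximation of the feature, not its design.

```
import org.apache.spark.sql.SparkSession

object TtlSweepDemo extends App {
  val spark = SparkSession.builder()
    .master("local[*]").appName("ttl-sweep").getOrCreate()

  // Placeholder table/column. Hudi tables support SQL DELETE; a row-level
  // TTL feature would effectively run a sweep like this automatically.
  spark.sql(
    """DELETE FROM hudi_db.events
      |WHERE update_ts < date_sub(current_date(), 30)""".stripMargin)

  spark.stop()
}
```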



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-4546) Optimize catalog cast logic in HoodieSpark3Analysis

2022-08-04 Thread leesf (Jira)
leesf created HUDI-4546:
---

 Summary: Optimize catalog cast logic in HoodieSpark3Analysis
 Key: HUDI-4546
 URL: https://issues.apache.org/jira/browse/HUDI-4546
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: leesf
Assignee: leesf


In HoodieSpark3Analysis, when the plan is a CreateV2Table there is no need to 
cast to HoodieCatalog, since CreateV2Table already carries a TableCatalog that 
can be used directly.
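
A minimal, hedged sketch of the simplification (the TableCatalog and 
CreateV2Table shapes below are simplified stand-ins, not Spark's or Hudi's 
actual classes):

```
// Simplified stand-ins for Spark's TableCatalog and the CreateV2Table plan.
trait TableCatalog { def name: String }
class HoodieCatalog extends TableCatalog { override def name = "hoodie" }

case class CreateV2Table(catalog: TableCatalog, table: String)

object CatalogCastDemo extends App {
  def handle(plan: Any): String = plan match {
    // Before: catalog.asInstanceOf[HoodieCatalog] -- an unnecessary downcast.
    // After: CreateV2Table already carries a TableCatalog; use it directly.
    case CreateV2Table(catalog, table) => s"create $table via ${catalog.name}"
    case other                         => s"unhandled plan: $other"
  }

  println(handle(CreateV2Table(new HoodieCatalog, "db.tbl")))
}
```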



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-4433) Hudi-CLI repair deduplicate not working with non-partitioned dataset

2022-07-26 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-4433:
---

Assignee: brightwon

> Hudi-CLI repair deduplicate not working with non-partitioned dataset
> 
>
> Key: HUDI-4433
> URL: https://issues.apache.org/jira/browse/HUDI-4433
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: brightwon
>Assignee: brightwon
>Priority: Minor
>
> hudi-cli's *repair deduplicate* command does not work with a non-partitioned 
> dataset, because an *empty value* cannot be passed for the 
> *--duplicatedPartitionPath* parameter.
> For example, this command
> repair deduplicate --duplicatedPartitionPath "" --repairedOutputPath 
> "s3://myBucket/table/" --sparkMaster yarn --sparkMemory 4G --dryrun true 
> --dedupeType "upsert_type"
> results in +_You should specify value for option 'duplicatedPartitionPath' 
> for this command_+
>  
> My slack message link in #general channel
> https://apache-hudi.slack.com/archives/C4D716NPQ/p1657854371469139



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4315) Do not throw exception in BaseSpark3Adapter#isHoodieTable

2022-06-24 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-4315:

Summary: Do not throw exception in BaseSpark3Adapter#isHoodieTable  (was: 
Do not throw exception in BaseSpark3Adapter#toTableIdentifier )

> Do not throw exception in BaseSpark3Adapter#isHoodieTable
> -
>
> Key: HUDI-4315
> URL: https://issues.apache.org/jira/browse/HUDI-4315
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>
> When another CatalogPlugin named leesf is used alongside HoodieCatalog, the 
> following SQL
> `insert into leesf.db.table select id, name from hudi_db.hudi_table`
> causes the BaseSpark3Adapter#toTableIdentifier method to throw the exception below
>  
> ```
> org.apache.spark.sql.AnalysisException: leesf.db.table is not a valid 
> TableIdentifier as it has more than 2 name parts.
>     at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.identifierHavingMoreThanTwoNamePartsError(QueryCompilationErrors.scala:1394)
> ```



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HUDI-4315) Do not throw exception in BaseSpark3Adapter#toTableIdentifier

2022-06-24 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-4315:

Summary: Do not throw exception in BaseSpark3Adapter#toTableIdentifier   
(was: Do not throw exception when using BaseSpark3Adapter#toTableIdentifier )

> Do not throw exception in BaseSpark3Adapter#toTableIdentifier 
> --
>
> Key: HUDI-4315
> URL: https://issues.apache.org/jira/browse/HUDI-4315
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>
> When another CatalogPlugin named leesf is used alongside HoodieCatalog, the 
> following SQL
> `insert into leesf.db.table select id, name from hudi_db.hudi_table`
> causes the BaseSpark3Adapter#toTableIdentifier method to throw the exception below
>  
> ```
> org.apache.spark.sql.AnalysisException: leesf.db.table is not a valid 
> TableIdentifier as it has more than 2 name parts.
>     at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.identifierHavingMoreThanTwoNamePartsError(QueryCompilationErrors.scala:1394)
> ```



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HUDI-4315) Do not throw exception when using BaseSpark3Adapter#toTableIdentifier

2022-06-24 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-4315:

Summary: Do not throw exception when using 
BaseSpark3Adapter#toTableIdentifier   (was: Do not throw exception when using 
toTableIdentifier )

> Do not throw exception when using BaseSpark3Adapter#toTableIdentifier 
> --
>
> Key: HUDI-4315
> URL: https://issues.apache.org/jira/browse/HUDI-4315
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>
> When another CatalogPlugin named leesf is used alongside HoodieCatalog, the 
> following SQL
> `insert into leesf.db.table select id, name from hudi_db.hudi_table`
> causes the BaseSpark3Adapter#toTableIdentifier method to throw the exception below
>  
> ```
> org.apache.spark.sql.AnalysisException: leesf.db.table is not a valid 
> TableIdentifier as it has more than 2 name parts.
>     at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.identifierHavingMoreThanTwoNamePartsError(QueryCompilationErrors.scala:1394)
> ```



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HUDI-4315) Do not throw exception when using toTableIdentifier

2022-06-24 Thread leesf (Jira)
leesf created HUDI-4315:
---

 Summary: Do not throw exception when using toTableIdentifier 
 Key: HUDI-4315
 URL: https://issues.apache.org/jira/browse/HUDI-4315
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: leesf
Assignee: leesf


When another CatalogPlugin named leesf is used alongside HoodieCatalog, the 
following SQL

`insert into leesf.db.table select id, name from hudi_db.hudi_table`

causes the BaseSpark3Adapter#toTableIdentifier method to throw the exception below

 

```

org.apache.spark.sql.AnalysisException: leesf.db.table is not a valid 
TableIdentifier as it has more than 2 name parts.
    at 
org.apache.spark.sql.errors.QueryCompilationErrors$.identifierHavingMoreThanTwoNamePartsError(QueryCompilationErrors.scala:1394)

```
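
A hedged sketch of the fix direction; the method shape below is assumed, not 
BaseSpark3Adapter's real signature. The idea is to return an empty result for 
identifiers with more than two name parts instead of throwing, so callers such 
as isHoodieTable can simply answer false:

```
// Simplified stand-in for Spark's catalyst TableIdentifier.
case class TableIdentifier(table: String, database: Option[String])

object TableIdentifierDemo extends App {
  // Assumed shape: None instead of an AnalysisException for >2 name parts.
  def toTableIdentifier(nameParts: Seq[String]): Option[TableIdentifier] =
    nameParts match {
      case Seq(tbl)     => Some(TableIdentifier(tbl, None))
      case Seq(db, tbl) => Some(TableIdentifier(tbl, Some(db)))
      case _            => None // e.g. leesf.db.table: more than 2 name parts
    }

  // isHoodieTable can now return false instead of failing the whole query.
  def isHoodieTable(nameParts: Seq[String]): Boolean =
    toTableIdentifier(nameParts).exists(lookedUpAsHudi)

  private def lookedUpAsHudi(id: TableIdentifier): Boolean =
    false // placeholder for the actual catalog lookup

  println(toTableIdentifier(Seq("leesf", "db", "table"))) // None, no exception
}
```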



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables

2022-06-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-4183:

Description: For now, when users specify `HoodieCatalog` in 0.11.0, they 
cannot create non-hudi tables, since HoodieCatalog#createTable does not handle 
non-hudi tables; the logic is missing from the #createTable method and should 
be fixed.
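
A hedged sketch of the missing branch; the signatures are simplified (the real 
HoodieCatalog#createTable takes Spark's Identifier, schema, partitioning, and 
properties), and the delegate stands in for the session catalog:

```
import java.util.{Map => JMap}

trait Table
trait Catalog { def createTable(ident: String, props: JMap[String, String]): Table }

// Assumed shape: HoodieCatalog delegates non-hudi providers instead of
// treating every table as a hudi table.
class HoodieCatalogSketch(delegate: Catalog) extends Catalog {
  override def createTable(ident: String, props: JMap[String, String]): Table =
    if ("hudi".equalsIgnoreCase(props.get("provider"))) {
      createHoodieTable(ident, props)
    } else {
      // The branch this issue adds: parquet/orc/... go to the delegate.
      delegate.createTable(ident, props)
    }

  private def createHoodieTable(ident: String, props: JMap[String, String]): Table =
    new Table {} // placeholder for the real hudi table creation
}
```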

> Fix using HoodieCatalog to create non-hudi tables
> -
>
> Key: HUDI-4183
> URL: https://issues.apache.org/jira/browse/HUDI-4183
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> For now, when users specify `HoodieCatalog` in 0.11.0, they cannot create 
> non-hudi tables, since HoodieCatalog#createTable does not handle non-hudi 
> tables; the logic is missing from the #createTable method and should be 
> fixed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (HUDI-4178) HoodieSpark3Analysis does not pass schema from Spark Catalog

2022-06-03 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545899#comment-17545899
 ] 

leesf edited comment on HUDI-4178 at 6/3/22 2:46 PM:
-

[~alexey.kudinkin] regarding `making them fetch the schema from storage 
(either from commit's metadata or data file) every time.`: does this mean the 
schema is fetched only once per write operation or many times per write 
operation? And how much does fetching from storage affect performance?


was (Author: xleesf):
[~alexey.kudinkin] `making them fetch the schema from storage (either from 
commit's metadata or data file) every time.` means fetch only once schema for a 
write operation or fetch many times for a write operation?

> HoodieSpark3Analysis does not pass schema from Spark Catalog
> 
>
> Key: HUDI-4178
> URL: https://issues.apache.org/jira/browse/HUDI-4178
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.1
>
>
> Currently, HoodieSpark3Analysis rule does not pass table's schema from the 
> Spark Catalog to Hudi's relations making them fetch the schema from storage 
> (either from commit's metadata or data file) every time.
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (HUDI-4178) HoodieSpark3Analysis does not pass schema from Spark Catalog

2022-06-03 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545899#comment-17545899
 ] 

leesf edited comment on HUDI-4178 at 6/3/22 2:46 PM:
-

[~alexey.kudinkin] hi, regarding `making them fetch the schema from storage 
(either from commit's metadata or data file) every time.`: does this mean the 
schema is fetched only once per write operation or many times per write 
operation? And how much does fetching from storage affect performance?


was (Author: xleesf):
[~alexey.kudinkin] `making them fetch the schema from storage (either from 
commit's metadata or data file) every time.` means fetch only once schema for a 
write operation or fetch many times for a write operation? and how much 
performance it affects while fetching from storage?

> HoodieSpark3Analysis does not pass schema from Spark Catalog
> 
>
> Key: HUDI-4178
> URL: https://issues.apache.org/jira/browse/HUDI-4178
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.1
>
>
> Currently, HoodieSpark3Analysis rule does not pass table's schema from the 
> Spark Catalog to Hudi's relations making them fetch the schema from storage 
> (either from commit's metadata or data file) every time.
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HUDI-4178) HoodieSpark3Analysis does not pass schema from Spark Catalog

2022-06-03 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545899#comment-17545899
 ] 

leesf commented on HUDI-4178:
-

[~alexey.kudinkin] regarding `making them fetch the schema from storage 
(either from commit's metadata or data file) every time.`: does this mean the 
schema is fetched only once per write operation or many times per write 
operation?
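
For context, a hedged sketch of the fix direction described in the quoted 
issue (all names here are illustrative): the analysis rule hands the catalog's 
schema to the relation, and storage is consulted only when no schema was 
supplied:

```
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Illustrative relation: prefer the schema passed in from the Spark Catalog;
// fall back to storage (commit metadata / data file) only when it is absent.
class HoodieRelationSketch(catalogSchema: Option[StructType]) {
  def resolvedSchema(fetchFromStorage: () => StructType): StructType =
    catalogSchema.getOrElse(fetchFromStorage())
}

object SchemaPassingDemo extends App {
  val fromCatalog = StructType(Seq(StructField("id", StringType)))
  val rel = new HoodieRelationSketch(Some(fromCatalog))
  // The storage fetch is never invoked when the catalog supplied a schema.
  println(rel.resolvedSchema(() => sys.error("storage read not expected")))
}
```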

> HoodieSpark3Analysis does not pass schema from Spark Catalog
> 
>
> Key: HUDI-4178
> URL: https://issues.apache.org/jira/browse/HUDI-4178
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.1
>
>
> Currently, HoodieSpark3Analysis rule does not pass table's schema from the 
> Spark Catalog to Hudi's relations making them fetch the schema from storage 
> (either from commit's metadata or data file) every time.
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Closed] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables

2022-06-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-4183.
---
Resolution: Fixed

> Fix using HoodieCatalog to create non-hudi tables
> -
>
> Key: HUDI-4183
> URL: https://issues.apache.org/jira/browse/HUDI-4183
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables

2022-06-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-4183:

Fix Version/s: 0.12.0

> Fix using HoodieCatalog to create non-hudi tables
> -
>
> Key: HUDI-4183
> URL: https://issues.apache.org/jira/browse/HUDI-4183
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables

2022-06-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-4183.
-

> Fix using HoodieCatalog to create non-hudi tables
> -
>
> Key: HUDI-4183
> URL: https://issues.apache.org/jira/browse/HUDI-4183
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables

2022-06-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-4183:

Summary: Fix using HoodieCatalog to create non-hudi tables  (was: Fix using 
HoodieCatalog to create non hudi tables)

> Fix using HoodieCatalog to create non-hudi tables
> -
>
> Key: HUDI-4183
> URL: https://issues.apache.org/jira/browse/HUDI-4183
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HUDI-4183) Fix using HoodieCatalog to create non hudi tables

2022-06-02 Thread leesf (Jira)
leesf created HUDI-4183:
---

 Summary: Fix using HoodieCatalog to create non hudi tables
 Key: HUDI-4183
 URL: https://issues.apache.org/jira/browse/HUDI-4183
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: leesf
Assignee: leesf






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HUDI-3861) 'path' in CatalogTable#properties failed to be updated when renaming table

2022-04-12 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521087#comment-17521087
 ] 

leesf commented on HUDI-3861:
-

[~KnightChess] yeah, if the real table path is updated, the tblproperties 
should also be updated. Would you mind opening a PR to fix it?
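
A hedged, self-contained sketch of the agreed behavior (a pure helper over 
the TBLPROPERTIES map, not Hudi's actual rename code): if the table directory 
moves on rename, the 'path' property is rewritten to match:

```
object RenamePathDemo extends App {
  // Rewrite 'path' only when it still points at the old directory name.
  def renameTable(props: Map[String, String],
                  oldName: String,
                  newName: String): Map[String, String] =
    props.get("path") match {
      case Some(p) if p.endsWith("/" + oldName) =>
        props.updated("path", p.stripSuffix(oldName) + newName)
      case _ => props
    }

  println(renameTable(
    Map("path" -> "/user/hive/warehous/hudi.db/mor_simple"),
    "mor_simple", "mor_simple0"))
  // Map(path -> /user/hive/warehous/hudi.db/mor_simple0)
}
```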

> 'path' in CatalogTable#properties failed to be updated when renaming table
> --
>
> Key: HUDI-3861
> URL: https://issues.apache.org/jira/browse/HUDI-3861
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jin Xing
>Priority: Minor
>
> Reproduce the issue as below
> {code:java}
> 1. Create a MOR table 
> create table mor_simple(
>   id int,
>   name string,
>   price double
> )
> using hudi
> options (
>   type = 'cow',
>   primaryKey = 'id'
> )
> 2. Renaming
> alter table mor_simple rename to mor_simple0
> 3. Show create table mor_simple0
> Output as
> CREATE TABLE hudi.mor_simple0 (
>   `_hoodie_commit_time` STRING,
>   `_hoodie_commit_seqno` STRING,
>   `_hoodie_record_key` STRING,
>   `_hoodie_partition_path` STRING,
>   `_hoodie_file_name` STRING,
>   `id` INT,
>   `name` STRING,
>   `price` DOUBLE)
> USING hudi
> OPTIONS(
>   'primaryKey' = 'id',
>   'type' = 'cow')
> TBLPROPERTIES(
>   'path' = '/user/hive/warehous/hudi.db/mor_simple'){code}
> As we can see, the 'path' property is 
> '/user/hive/warehous/hudi.db/mor_simple', rather than 
> '/user/hive/warehous/hudi.db/mor_simple0'.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HUDI-3861) 'path' in CatalogTable#properties failed to be updated when renaming table

2022-04-12 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521054#comment-17521054
 ] 

leesf commented on HUDI-3861:
-

[~jinxing6...@126.com] Thanks for reporting this, but I think renaming should 
not change the path; it should only change the table name in hoodie.properties.

> 'path' in CatalogTable#properties failed to be updated when renaming table
> --
>
> Key: HUDI-3861
> URL: https://issues.apache.org/jira/browse/HUDI-3861
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jin Xing
>Priority: Minor
>
> Reproduce the issue as below
> {code:java}
> 1. Create a MOR table 
> create table mor_simple(
>   id int,
>   name string,
>   price double
> )
> using hudi
> options (
>   type = 'cow',
>   primaryKey = 'id'
> )
> 2. Renaming
> alter table mor_simple rename to mor_simple0
> 3. Show create table mor_simple0
> Output as
> CREATE TABLE hudi.mor_simple0 (
>   `_hoodie_commit_time` STRING,
>   `_hoodie_commit_seqno` STRING,
>   `_hoodie_record_key` STRING,
>   `_hoodie_partition_path` STRING,
>   `_hoodie_file_name` STRING,
>   `id` INT,
>   `name` STRING,
>   `price` DOUBLE)
> USING hudi
> OPTIONS(
>   'primaryKey' = 'id',
>   'type' = 'cow')
> TBLPROPERTIES(
>   'path' = '/user/hive/warehous/hudi.db/mor_simple'){code}
> As we can see, the 'path' property is 
> '/user/hive/warehous/hudi.db/mor_simple', rather than 
> '/user/hive/warehous/hudi.db/mor_simple0'.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HUDI-2520) Certify sync with Hive 3

2022-03-26 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512797#comment-17512797
 ] 

leesf commented on HUDI-2520:
-

[~rex_xiong] hi, are you working on a fix?

> Certify sync with Hive 3
> 
>
> Key: HUDI-2520
> URL: https://issues.apache.org/jira/browse/HUDI-2520
> Project: Apache Hudi
>  Issue Type: Task
>  Components: hive, meta-sync
>Reporter: Sagar Sumit
>Assignee: rex xiong
>Priority: Blocker
> Fix For: 0.11.0
>
> Attachments: image-2022-03-14-15-52-02-021.png
>
>
> # when executing a CTAS statement, the query fails due to a double meta-sync 
> problem: HoodieSparkSqlWriter syncs the meta a first time, and then 
> HoodieCatalog.createHoodieTable syncs it a second time via 
> HoodieStagedTable.commitStagedChanges
> {code:java}
> create table if not exists h3_cow using hudi partitioned by (dt) options 
> (type = 'cow', primaryKey = 'id,name') as select 1 as id, 'a1' as name, 20 as 
> price, '2021-01-03' as dt;
> 22/03/14 14:26:21 ERROR [main] Utils: Aborting task
> org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: Table or 
> view 'h3_cow' already exists in database 'default'
>         at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createHiveDataSourceTable(CreateHoodieTableCommand.scala:172)
>         at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableCommand$.createTableInCatalog(CreateHoodieTableCommand.scala:148)
>         at 
> org.apache.spark.sql.hudi.catalog.HoodieCatalog.createHoodieTable(HoodieCatalog.scala:254)
>         at 
> org.apache.spark.sql.hudi.catalog.HoodieStagedTable.commitStagedChanges(HoodieStagedTable.scala:62)
>         at 
> org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:484)
>         at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
>         at 
> org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:468)
>         at 
> org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:463)
>         at 
> org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:106)
>         at 
> org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:127)
>         at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
>         at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
>         at 
> org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
>         at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
>         at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>         at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>         at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>         at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>         at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>         at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
>         at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
>         at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
>         at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>         at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481){code}
> 2. when truncating a partitioned table, neither metadata nor data is 
> truncated, and truncating a partitioned table with partition specs fails 
> {code:java}
> // truncating a partitioned table without a partition spec succeeds but 
> never deletes data
> spark-sql> truncate table mor_partition_table_0314;
> Time taken: 0.256 seconds
> // truncating a partitioned table with a partition spec fails:
> spark-sql> truncate table mor_partition_table_0314 partition(dt=3);
> Error in query: Table spark_catalog.default.mor_partition_table_0314 does not 
> support partition management.;
> 'TruncatePartition unresolvedpartitionspec((dt,3), None)
> +- ResolvedTable org.apache.spark.sql.hudi.catalog.HoodieCatalog@63f609a4, 
> default.mor_partition_table_0314,
> {code}
> 3. re-dropping an existing partition 

[jira] [Created] (HUDI-3489) Unify config to avoid duplicate code

2022-02-23 Thread leesf (Jira)
leesf created HUDI-3489:
---

 Summary: Unify config to avoid duplicate code
 Key: HUDI-3489
 URL: https://issues.apache.org/jira/browse/HUDI-3489
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: leesf
Assignee: leesf






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HUDI-2732) Spark Datasource V2 integration RFC

2022-02-22 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496453#comment-17496453
 ] 

leesf commented on HUDI-2732:
-

[~shivnarayan] yes, we can close the Jira.

> Spark Datasource V2 integration RFC 
> 
>
> Key: HUDI-2732
> URL: https://issues.apache.org/jira/browse/HUDI-2732
> Project: Apache Hudi
>  Issue Type: Task
>  Components: spark
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3416) Incremental read using v2 datasource

2022-02-13 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-3416:

Issue Type: Improvement  (was: Bug)

> Incremental read using v2 datasource
> 
>
> Key: HUDI-3416
> URL: https://issues.apache.org/jira/browse/HUDI-3416
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.12.0
>
>
> currently, we still use the v1 format for incremental reads, and need to 
> support the v2 format as well.
> see comment: https://github.com/apache/hudi/pull/4611#discussion_r795089099



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3416) Incremental read using v2 datasource

2022-02-13 Thread leesf (Jira)
leesf created HUDI-3416:
---

 Summary: Incremental read using v2 datasource
 Key: HUDI-3416
 URL: https://issues.apache.org/jira/browse/HUDI-3416
 Project: Apache Hudi
  Issue Type: Bug
Reporter: leesf
Assignee: leesf


currently, we still use the v1 format for incremental reads, and need to 
support the v2 format as well.

see comment: https://github.com/apache/hudi/pull/4611#discussion_r795089099
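
For reference, a hedged illustration of the incremental read path this issue 
routes through the v2 format; the option keys are Hudi's documented ones, and 
basePath is a placeholder:

```
import org.apache.spark.sql.SparkSession

object IncrementalReadDemo extends App {
  val spark = SparkSession.builder()
    .master("local[*]").appName("incr-read").getOrCreate()
  val basePath = "/tmp/hudi_table" // placeholder

  // Today this resolves through the v1 relation; the goal is the v2 path.
  val df = spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "incremental")
    .option("hoodie.datasource.read.begin.instanttime", "20220101000000")
    .load(basePath)
  df.show()

  spark.stop()
}
```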



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-2645) Rewrite Zoptimize and other files in scala into Java

2022-01-17 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-2645:
---

Assignee: shibei

> Rewrite Zoptimize and other files in scala into Java
> 
>
> Key: HUDI-2645
> URL: https://issues.apache.org/jira/browse/HUDI-2645
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vinoth Chandar
>Assignee: shibei
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-2873) Support optimize data layout by sql and make the build more fast

2022-01-17 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-2873:
---

Assignee: shibei

> Support optimize data layout by sql and make the build more fast
> 
>
> Key: HUDI-2873
> URL: https://issues.apache.org/jira/browse/HUDI-2873
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Performance, spark
>Reporter: tao meng
>Assignee: shibei
>Priority: Critical
>  Labels: sev:high
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation

2022-01-16 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-3172.
---

> Refactor hudi existing modules to make more code reuse in V2 implementation
> ---
>
> Key: HUDI-3172
> URL: https://issues.apache.org/jira/browse/HUDI-3172
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Reopened] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation

2022-01-16 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reopened HUDI-3172:
-

> Refactor hudi existing modules to make more code reuse in V2 implementation
> ---
>
> Key: HUDI-3172
> URL: https://issues.apache.org/jira/browse/HUDI-3172
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation

2022-01-16 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-3172.
-

> Refactor hudi existing modules to make more code reuse in V2 implementation
> ---
>
> Key: HUDI-3172
> URL: https://issues.apache.org/jira/browse/HUDI-3172
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3254) Introduce HoodieCatalog to manage tables for Spark Datasource V2

2022-01-16 Thread leesf (Jira)
leesf created HUDI-3254:
---

 Summary: Introduce HoodieCatalog to manage tables for Spark 
Datasource V2
 Key: HUDI-3254
 URL: https://issues.apache.org/jira/browse/HUDI-3254
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: leesf
Assignee: leesf






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation

2022-01-16 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-3172.
-

> Refactor hudi existing modules to make more code reuse in V2 implementation
> ---
>
> Key: HUDI-3172
> URL: https://issues.apache.org/jira/browse/HUDI-3172
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation

2022-01-16 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-3172.
---

> Refactor hudi existing modules to make more code reuse in V2 implementation
> ---
>
> Key: HUDI-3172
> URL: https://issues.apache.org/jira/browse/HUDI-3172
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation

2022-01-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-3172:

Issue Type: Improvement  (was: Bug)

> Refactor hudi existing modules to make more code reuse in V2 implementation
> ---
>
> Key: HUDI-3172
> URL: https://issues.apache.org/jira/browse/HUDI-3172
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation

2022-01-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-3172:
---

Assignee: leesf

> Refactor hudi existing modules to make more code reuse in V2 implementation
> ---
>
> Key: HUDI-3172
> URL: https://issues.apache.org/jira/browse/HUDI-3172
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3140) Fix bulk_insert failure on Spark 3.2.0

2022-01-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-3140:

Fix Version/s: 0.11.0

> Fix bulk_insert failure on Spark 3.2.0
> --
>
> Key: HUDI-3140
> URL: https://issues.apache.org/jira/browse/HUDI-3140
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HUDI-3140) Fix bulk_insert failure on Spark 3.2.0

2022-01-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-3140.
-

> Fix bulk_insert failure on Spark 3.2.0
> --
>
> Key: HUDI-3140
> URL: https://issues.apache.org/jira/browse/HUDI-3140
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-3140) Fix bulk_insert failure on Spark 3.2.0

2022-01-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-3140.
---

> Fix bulk_insert failure on Spark 3.2.0
> --
>
> Key: HUDI-3140
> URL: https://issues.apache.org/jira/browse/HUDI-3140
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3140) Fix bulk_insert failure on Spark 3.2.0

2022-01-05 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-3140:
---

Assignee: leesf

> Fix bulk_insert failure on Spark 3.2.0
> --
>
> Key: HUDI-3140
> URL: https://issues.apache.org/jira/browse/HUDI-3140
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3172) Refactor hudi existing modules to make more code reuse in V2 implementation

2022-01-05 Thread leesf (Jira)
leesf created HUDI-3172:
---

 Summary: Refactor hudi existing modules to make more code reuse in 
V2 implementation
 Key: HUDI-3172
 URL: https://issues.apache.org/jira/browse/HUDI-3172
 Project: Apache Hudi
  Issue Type: Bug
Reporter: leesf






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3140) Fix bulk_insert failure on Spark 3.2.0

2022-01-03 Thread leesf (Jira)
leesf created HUDI-3140:
---

 Summary: Fix bulk_insert failure on Spark 3.2.0
 Key: HUDI-3140
 URL: https://issues.apache.org/jira/browse/HUDI-3140
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: leesf






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0

2022-01-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-3134.
---

> Fix Insert error after adding columns on Spark 3.2.0
> 
>
> Key: HUDI-3134
> URL: https://issues.apache.org/jira/browse/HUDI-3134
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> On Spark 3.2.0, after altering a table to add columns, the insert statement 
> will fail with the following exception.
> Caused by: org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
>   at 
> org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
>   ... 31 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
>   ... 32 more
> Caused by: org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   ... 3 more
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;
>   at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:168)
>   at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
>   at 
> org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
>   at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
>   at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>   at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>   at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   ... 4 more



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0

2022-01-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-3134:

Component/s: Spark Integration

> Fix Insert error after adding columns on Spark 3.2.0
> 
>
> Key: HUDI-3134
> URL: https://issues.apache.org/jira/browse/HUDI-3134
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> On Spark 3.2.0, after altering a table to add columns, the insert statement 
> will fail with the following exception.
> Caused by: org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
>   at 
> org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
>   ... 31 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
>   ... 32 more
> Caused by: org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   ... 3 more
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;
>   at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:168)
>   at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
>   at 
> org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
>   at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
>   at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>   at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>   at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   ... 4 more



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0

2022-01-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-3134:

Fix Version/s: 0.11.0

> Fix Insert error after adding columns on Spark 3.2.0
> 
>
> Key: HUDI-3134
> URL: https://issues.apache.org/jira/browse/HUDI-3134
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> On Spark 3.2.0, after altering a table to add columns, the insert statement 
> will fail with the following exception.
> Caused by: org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
>   at 
> org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
>   ... 31 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
>   ... 32 more
> Caused by: org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   ... 3 more
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;
>   at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:168)
>   at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
>   at 
> org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
>   at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
>   at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>   at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>   at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   ... 4 more



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0

2022-01-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-3134.
-

> Fix Insert error after adding columns on Spark 3.2.0
> 
>
> Key: HUDI-3134
> URL: https://issues.apache.org/jira/browse/HUDI-3134
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>
> On Spark 3.2.0, after altering a table to add columns, the insert statement 
> will fail with the following exception.
> Caused by: org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
>   at 
> org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
>   ... 31 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
>   ... 32 more
> Caused by: org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   ... 3 more
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;
>   at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:168)
>   at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
>   at 
> org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
>   at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
>   at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>   at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>   at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   ... 4 more



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0

2021-12-31 Thread leesf (Jira)
leesf created HUDI-3134:
---

 Summary: Fix Insert error after adding columns on Spark 3.2.0
 Key: HUDI-3134
 URL: https://issues.apache.org/jira/browse/HUDI-3134
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: leesf
Assignee: leesf


On Spark 3.2.0, after altering a table to add columns, the insert statement 
will fail with the following exception.

Caused by: org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
  at 
org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
  ... 31 more
Caused by: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
  ... 32 more
Caused by: org.apache.hudi.exception.HoodieException: operation has failed
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  ... 3 more
Caused by: java.lang.NoSuchMethodError: 
org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;
  at 
org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:168)
  at 
org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
  at 
org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
  at 
org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
  at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
  at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
  at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
  at 
org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
  at 
org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  ... 4 more



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3047) Basic Implementation of Spark Datasource V2

2021-12-31 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-3047:

Fix Version/s: 0.11.0

> Basic Implementation of Spark Datasource V2
> ---
>
> Key: HUDI-3047
> URL: https://issues.apache.org/jira/browse/HUDI-3047
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Introduce HoodieCatalog and HoodieInternalTableV2 to implement the read and 
> write paths 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3047) Basic Implementation of Spark Datasource V2

2021-12-31 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-3047:

Priority: Blocker  (was: Major)

> Basic Implementation of Spark Datasource V2
> ---
>
> Key: HUDI-3047
> URL: https://issues.apache.org/jira/browse/HUDI-3047
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Introduce HoodieCatalog and HoodieInternalTableV2 to implement the read and 
> write paths 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3047) Basic Implementation of Spark Datasource V2

2021-12-16 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-3047:

Summary: Basic Implementation of Spark Datasource V2  (was: Basic Implement 
of Spark Datasource V2)

> Basic Implementation of Spark Datasource V2
> ---
>
> Key: HUDI-3047
> URL: https://issues.apache.org/jira/browse/HUDI-3047
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>
> Introduce HoodieCatalog and HoodieInternalTableV2 to implement the read and 
> write paths 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3047) Basic Implement of Spark Datasource V2

2021-12-16 Thread leesf (Jira)
leesf created HUDI-3047:
---

 Summary: Basic Implement of Spark Datasource V2
 Key: HUDI-3047
 URL: https://issues.apache.org/jira/browse/HUDI-3047
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: leesf
Assignee: leesf


Introduce HoodieCatalog and HoodieInternalTableV2 to implement the read and 
write paths 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-2813) Claim RFC number for RFC for spark datasource V2 Integration

2021-12-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-2813.
---

> Claim RFC number for RFC for spark datasource V2 Integration 
> -
>
> Key: HUDI-2813
> URL: https://issues.apache.org/jira/browse/HUDI-2813
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2813) Claim RFC number for RFC for spark datasource V2 Integration

2021-12-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-2813:

Fix Version/s: 0.11.0

> Claim RFC number for RFC for spark datasource V2 Integration 
> -
>
> Key: HUDI-2813
> URL: https://issues.apache.org/jira/browse/HUDI-2813
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HUDI-2813) Claim RFC number for RFC for spark datasource V2 Integration

2021-12-03 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-2813.
-

> Claim RFC number for RFC for spark datasource V2 Integration 
> -
>
> Key: HUDI-2813
> URL: https://issues.apache.org/jira/browse/HUDI-2813
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2916) Add IssueNavigationLink for IDEA

2021-12-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-2916:

Summary: Add IssueNavigationLink for IDEA  (was:  Add issue and jira 
navigation link for IDEA)

> Add IssueNavigationLink for IDEA
> 
>
> Key: HUDI-2916
> URL: https://issues.apache.org/jira/browse/HUDI-2916
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2916) Add issue and jira navigation link for IDEA

2021-12-02 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-2916:

Summary:  Add issue and jira navigation link for IDEA  (was:  Add 
IssueNavigationLink for IDEA git log)

>  Add issue and jira navigation link for IDEA
> 
>
> Key: HUDI-2916
> URL: https://issues.apache.org/jira/browse/HUDI-2916
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: leesf
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-2916) Add IssueNavigationLink for IDEA git log

2021-12-02 Thread leesf (Jira)
leesf created HUDI-2916:
---

 Summary:  Add IssueNavigationLink for IDEA git log
 Key: HUDI-2916
 URL: https://issues.apache.org/jira/browse/HUDI-2916
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: leesf
Assignee: leesf






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2100) [UMBRELLA] Support Space curve for hudi

2021-11-27 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-2100:

Fix Version/s: 0.11.0

> [UMBRELLA] Support Space curve for hudi
> ---
>
> Key: HUDI-2100
> URL: https://issues.apache.org/jira/browse/HUDI-2100
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: tao meng
>Assignee: tao meng
>Priority: Blocker
>  Labels: hudi-umbrellas
> Fix For: 0.11.0
>
>
> support space curves to optimize the clustering of hudi files to improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-2813) Claim RFC number for RFC for spark datasource V2 Integration

2021-11-20 Thread leesf (Jira)
leesf created HUDI-2813:
---

 Summary: Claim RFC number for RFC for spark datasource V2 
Integration 
 Key: HUDI-2813
 URL: https://issues.apache.org/jira/browse/HUDI-2813
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: leesf
Assignee: leesf






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-2732) Spark Datasource V2 integration RFC

2021-11-10 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-2732:
---

Assignee: leesf

> Spark Datasource V2 integration RFC 
> 
>
> Key: HUDI-2732
> URL: https://issues.apache.org/jira/browse/HUDI-2732
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-2732) Spark Datasource V2 integration RFC

2021-11-10 Thread leesf (Jira)
leesf created HUDI-2732:
---

 Summary: Spark Datasource V2 integration RFC 
 Key: HUDI-2732
 URL: https://issues.apache.org/jira/browse/HUDI-2732
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: leesf






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-2413) Sql source in delta streamer does not work

2021-09-11 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-2413:
---

Assignee: Jian Feng

> Sql source in delta streamer does not work
> --
>
> Key: HUDI-2413
> URL: https://issues.apache.org/jira/browse/HUDI-2413
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Jian Feng
>Assignee: Jian Feng
>Priority: Major
>
> the sql source returns a null checkpoint; in DeltaSync a null checkpoint will be 
> judged as no new data, so it should return an empty string instead



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2064) Fix TestHoodieBackedMetadata#testOnlyValidPartitionsAdded

2021-06-23 Thread leesf (Jira)
leesf created HUDI-2064:
---

 Summary: Fix TestHoodieBackedMetadata#testOnlyValidPartitionsAdded
 Key: HUDI-2064
 URL: https://issues.apache.org/jira/browse/HUDI-2064
 Project: Apache Hudi
  Issue Type: Bug
Reporter: leesf
Assignee: leesf






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1922) bulk insert with row writer supports mor table

2021-05-23 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1922:

Affects Version/s: 0.8.0

> bulk insert with row writer supports mor table 
> ---
>
> Key: HUDI-1922
> URL: https://issues.apache.org/jira/browse/HUDI-1922
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>
> currently, when using bulk insert mode with the row writer and setting the table 
> type to MOR, the bulk insert fails



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1922) bulk insert with row writer supports mor table

2021-05-23 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1922:

Fix Version/s: 0.9.0

> bulk insert with row writer supports mor table 
> ---
>
> Key: HUDI-1922
> URL: https://issues.apache.org/jira/browse/HUDI-1922
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 0.9.0
>
>
> currently, when using bulk insert mode with the row writer and setting the table 
> type to MOR, the bulk insert fails



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1922) bulk insert with row writer supports mor table

2021-05-23 Thread leesf (Jira)
leesf created HUDI-1922:
---

 Summary: bulk insert with row writer supports mor table 
 Key: HUDI-1922
 URL: https://issues.apache.org/jira/browse/HUDI-1922
 Project: Apache Hudi
  Issue Type: Bug
Reporter: leesf
Assignee: leesf


currently, when using bulk insert mode with the row writer and setting the table 
type to MOR, the bulk insert fails



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1460) Time Travel (querying the historical versions of data) ability for Hudi Table

2020-12-14 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249079#comment-17249079
 ] 

leesf commented on HUDI-1460:
-

[~qian heng] sorry, I could not access the google doc you provided; it would be 
better if you send a discussion email to the dev ML. 

> Time Travel (querying the historical versions of data) ability for Hudi Table
> -
>
> Key: HUDI-1460
> URL: https://issues.apache.org/jira/browse/HUDI-1460
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: qian heng
>Priority: Major
>
> Hi, all:
> We plan to use Hudi to sync mysql binlog data. There will be a flink ETL task 
> to consume binlog records from kafka and save data to hudi every one hour. 
> The binlog records are also grouped every one hour and all records of one 
> hour will be saved in one commit. The data transmission pipeline should be 
> like -- binlog -> kafka -> flink -> parquet.
> After the data is synced to hudi, we want to query the historical hourly 
> versions of the Hudi table in hive SQL.
> Here is a more detailed description of our issue along with a simple design 
> for Time Travel in Hudi; the design is under development and testing:
> [https://docs.google.com/document/d/1r0iwUsklw9aKSDMzZaiq43dy57cSJSAqT9KCvgjbtUo/edit#]
> We need to support the Time Travel ability soon for our business needs. We 
> have also seen the [RFC 
> 07|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table].
> We'd be glad to receive any suggestions or discussion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1161) Support update partial fields for MoR table

2020-09-25 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-1161:
---

Assignee: Nicholas Jiang  (was: leesf)

> Support update partial fields for MoR table
> ---
>
> Key: HUDI-1161
> URL: https://issues.apache.org/jira/browse/HUDI-1161
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: leesf
>Assignee: Nicholas Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1123) Document the usage of user define metrics reporter

2020-09-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-1123.
---

> Document the usage of user define metrics reporter
> --
>
> Key: HUDI-1123
> URL: https://issues.apache.org/jira/browse/HUDI-1123
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: Zheren Yu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1123) Document the usage of user define metrics reporter

2020-09-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-1123.
-
Fix Version/s: 0.6.0
   Resolution: Fixed

> Document the usage of user define metrics reporter
> --
>
> Key: HUDI-1123
> URL: https://issues.apache.org/jira/browse/HUDI-1123
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: Zheren Yu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1124) Document the usage of Tencent COSN

2020-09-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1124:

Status: Open  (was: New)

> Document the usage of Tencent COSN
> --
>
> Key: HUDI-1124
> URL: https://issues.apache.org/jira/browse/HUDI-1124
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: deyzhong
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1124) Document the usage of Tencent COSN

2020-09-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-1124.
---

> Document the usage of Tencent COSN
> --
>
> Key: HUDI-1124
> URL: https://issues.apache.org/jira/browse/HUDI-1124
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: deyzhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1124) Document the usage of Tencent COSN

2020-09-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-1124.
-
Fix Version/s: 0.6.0
   Resolution: Fixed

> Document the usage of Tencent COSN
> --
>
> Key: HUDI-1124
> URL: https://issues.apache.org/jira/browse/HUDI-1124
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: deyzhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1123) Document the usage of user define metrics reporter

2020-09-20 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1123:

Status: Open  (was: New)

> Document the usage of user define metrics reporter
> --
>
> Key: HUDI-1123
> URL: https://issues.apache.org/jira/browse/HUDI-1123
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: Zheren Yu
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1287) Make deltastreamer support custom ETL transformer

2020-09-20 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198994#comment-17198994
 ] 

leesf commented on HUDI-1287:
-

[~liujinhui] DeltaStreamer already supports user-defined custom Transformers: you 
would just implement the Transformer interface in your own transformer class.
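
A minimal sketch of such a custom transformer, assuming the Transformer interface 
from org.apache.hudi.utilities.transform (the class name and the column 
transformation are made up for illustration, and the exact TypedProperties package 
may differ across Hudi versions):

{code:java}
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.utilities.transform.Transformer;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.upper;

// Hypothetical ETL step: upper-case a "name" column before the data is written to Hudi.
public class UppercaseNameTransformer implements Transformer {
  @Override
  public Dataset<Row> apply(JavaSparkContext jsc, SparkSession sparkSession,
                            Dataset<Row> rowDataset, TypedProperties properties) {
    return rowDataset.withColumn("name", upper(rowDataset.col("name")));
  }
}
{code}

The class can then be passed to DeltaStreamer via the --transformer-class option.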

> Make deltastreamer support custom ETL transformer
> 
>
> Key: HUDI-1287
> URL: https://issues.apache.org/jira/browse/HUDI-1287
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: liujinhui
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null

2020-09-20 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198992#comment-17198992
 ] 

leesf commented on HUDI-1288:
-

[~soltar] I found there are still some users facing this issue: 
https://github.com/apache/avro/pull/290#issuecomment-625731714. Does 
0.5.2-incubating work well for you?

> DeltaSync:writeToSink fails with Unknown datum type 
> org.apache.avro.JsonProperties$Null
> ---
>
> Key: HUDI-1288
> URL: https://issues.apache.org/jira/browse/HUDI-1288
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Michal Swiatowy
>Priority: Major
>
> After updating to Hudi version 0.5.3 (prev. 0.5.2-incubating) I run into 
> following error message on write to HDFS:
> {code:java}
> 2020-09-18 12:54:38,651 [Driver] INFO  
> HoodieTableMetaClient:initTableAndGetMetaClient:379 - Finished initializing 
> Table of type MERGE_ON_READ from 
> /master_data/6FQS/hudi_test/S_INCOMINGMESSAGEDETAIL_CDC
> 2020-09-18 12:54:38,663 [Driver] INFO  DeltaSync:setupWriteClient:470 - 
> Setting up Hoodie Write Client
> 2020-09-18 12:54:38,695 [Driver] INFO  DeltaSync:registerAvroSchemas:522 - 
> Registering Schema 
> 

[jira] [Resolved] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-802.

Resolution: Fixed

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch 
> correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Christopher Weaver
>Assignee: Balaji Varadarajan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> The provided AWSDmsAvroPayload class 
> ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java])
>  currently handles cases where the "Op" column is a "D" for updates, and 
> successfully removes the row from the resulting table. 
> However, when an insert is quickly followed by a delete on the row (e.g. DMS 
> processes them together and puts the update records together in the same 
> parquet file), the row incorrectly appears in the resulting table. In this 
> case, the record is not in the table and getInsertValue is called rather than 
> combineAndGetUpdateValue. Since the logic to check for a delete is in 
> combineAndGetUpdateValue, it is skipped and the delete is missed. Something 
> like this could fix this issue: 
> [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java].
>  
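
A minimal sketch of the kind of fix the linked CustomAWSDmsAvroPayload applies: 
perform the "Op" check in getInsertValue as well, so a delete arriving in the same 
batch as the insert is honored (the class below is illustrative, not the merged 
patch):

{code:java}
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.payload.AWSDmsAvroPayload;

public class CustomAWSDmsAvroPayload extends AWSDmsAvroPayload {

  public CustomAWSDmsAvroPayload(GenericRecord record, Comparable orderingVal) {
    super(record, orderingVal);
  }

  @Override
  public Option<IndexedRecord> getInsertValue(Schema schema) throws IOException {
    Option<IndexedRecord> insertValue = super.getInsertValue(schema);
    if (insertValue.isPresent()) {
      Object op = ((GenericRecord) insertValue.get()).get("Op");
      // DMS marks deletes with "D" in the "Op" column; emit nothing for such rows.
      if (op != null && "D".equalsIgnoreCase(op.toString())) {
        return Option.empty();
      }
    }
    return insertValue;
  }
}
{code}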



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-802.
--

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch 
> correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Christopher Weaver
>Assignee: Balaji Varadarajan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> The provided AWSDmsAvroPayload class 
> ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java])
>  currently handles cases where the "Op" column is a "D" for updates, and 
> successfully removes the row from the resulting table. 
> However, when an insert is quickly followed by a delete on the row (e.g. DMS 
> processes them together and puts the update records together in the same 
> parquet file), the row incorrectly appears in the resulting table. In this 
> case, the record is not in the table and getInsertValue is called rather than 
> combineAndGetUpdateValue. Since the logic to check for a delete is in 
> combineAndGetUpdateValue, it is skipped and the delete is missed. Something 
> like this could fix this issue: 
> [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java].
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1255) Combine and get updateValue in multiFields

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-1255.
---

> Combine and get updateValue in multiFields
> --
>
> Key: HUDI-1255
> URL: https://issues.apache.org/jira/browse/HUDI-1255
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: karl wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> Update the current value only for the specific fields that you want to change.
> The default payload OverwriteWithLatestAvroPayload overwrites the whole record 
> when comparing against orderingVal. This doesn't meet our need when we just want 
> to change specified fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1254) TypedProperties can not get values by initializing an existing properties

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-1254.
---

> TypedProperties can not get values by initializing an existing properties
> -
>
> Key: HUDI-1254
> URL: https://issues.apache.org/jira/browse/HUDI-1254
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: cdmikechen
>Assignee: linshan-ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> If I create a test to construct a TypedProperties from an existing Properties, like 
> below:
> {code:java}
> public class TestTypedProperties {
>   @Test
>   public void testNewTypedProperties() {
>     Properties properties = new Properties();
>     properties.put("test_key1", "test_value1");
>     TypedProperties typedProperties = new TypedProperties(properties);
>     assertEquals("test_value1", typedProperties.getString("test_key1"));
>   }
> }
> {code}
> The test does not pass and fails with this error: *java.lang.IllegalArgumentException: 
> Property test_key1 not found*
> I think this is a bug and it needs to be fixed.
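
A minimal sketch of one possible fix (not necessarily the merged patch), assuming 
TypedProperties extends java.util.Properties: copy the given entries eagerly in 
the constructor instead of relying on the Properties defaults mechanism, which 
lookups via Hashtable#get / #containsKey do not consult:

{code:java}
import java.util.Properties;

public class TypedProperties extends Properties {

  public TypedProperties() {
    super();
  }

  // Copy the entries into this instance so lookups that bypass the
  // java.util.Properties defaults chain still find the keys.
  public TypedProperties(Properties props) {
    if (props != null) {
      for (String key : props.stringPropertyNames()) {
        setProperty(key, props.getProperty(key));
      }
    }
  }
}
{code}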



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1254) TypedProperties can not get values by initializing an existing properties

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-1254.
-
Resolution: Fixed

> TypedProperties can not get values by initializing an existing properties
> -
>
> Key: HUDI-1254
> URL: https://issues.apache.org/jira/browse/HUDI-1254
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: cdmikechen
>Assignee: linshan-ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> If I create a test to construct a TypedProperties from an existing Properties, like 
> below:
> {code:java}
> public class TestTypedProperties {
>   @Test
>   public void testNewTypedProperties() {
>     Properties properties = new Properties();
>     properties.put("test_key1", "test_value1");
>     TypedProperties typedProperties = new TypedProperties(properties);
>     assertEquals("test_value1", typedProperties.getString("test_key1"));
>   }
> }
> {code}
> The test does not pass and fails with this error: *java.lang.IllegalArgumentException: 
> Property test_key1 not found*
> I think this is a bug and it needs to be fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1254) TypedProperties can not get values by initializing an existing properties

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1254:

Status: Open  (was: New)

> TypedProperties can not get values by initializing an existing properties
> -
>
> Key: HUDI-1254
> URL: https://issues.apache.org/jira/browse/HUDI-1254
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: cdmikechen
>Assignee: linshan-ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> If I create a test to construct a TypedProperties from an existing Properties, like 
> below:
> {code:java}
> public class TestTypedProperties {
>   @Test
>   public void testNewTypedProperties() {
>     Properties properties = new Properties();
>     properties.put("test_key1", "test_value1");
>     TypedProperties typedProperties = new TypedProperties(properties);
>     assertEquals("test_value1", typedProperties.getString("test_key1"));
>   }
> }
> {code}
> The test does not pass and fails with this error: *java.lang.IllegalArgumentException: 
> Property test_key1 not found*
> I think this is a bug and it needs to be fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-1130.
---

> Allow for schema evolution within DAG for hudi test suite
> -
>
> Key: HUDI-1130
> URL: https://issues.apache.org/jira/browse/HUDI-1130
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-1130.
-
Fix Version/s: 0.6.1
   Resolution: Fixed

> Allow for schema evolution within DAG for hudi test suite
> -
>
> Key: HUDI-1130
> URL: https://issues.apache.org/jira/browse/HUDI-1130
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1130) Allow for schema evolution within DAG for hudi test suite

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1130:

Status: Open  (was: New)

> Allow for schema evolution within DAG for hudi test suite
> -
>
> Key: HUDI-1130
> URL: https://issues.apache.org/jira/browse/HUDI-1130
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1181) Decimal type display issue for record key field

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-1181.
-
Fix Version/s: 0.6.1
   Resolution: Fixed

> Decimal type display issue for record key field
> ---
>
> Key: HUDI-1181
> URL: https://issues.apache.org/jira/browse/HUDI-1181
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Wenning Ding
>Assignee: Wenning Ding
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> When using *fixed_len_byte_array* decimal type as Hudi record key, Hudi would 
> not correctly display the decimal value, instead, Hudi would display it as a 
> byte array.
> During the Hudi writing phase, Hudi would save the parquet source data into 
> Avro Generic Record. For example, the source parquet data has a column with 
> decimal type:
>  
> {code:java}
> optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code}
>  
> Then Hudi will convert it into the following avro decimal type:
> {code:java}
> {
> "name" : "OBJ_ID",
> "type" : [ {
>   "type" : "fixed",
>   "name" : "fixed",
>   "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID",
>   "size" : 16,
>   "logicalType" : "decimal",
>   "precision" : 38,
>   "scale" : 0
> }, "null" ]
> }
> {code}
> This decimal field would be stored as a fixed length bytes array. And in the 
> reading phase, Hudi will convert this bytes array back to a readable decimal 
> value through this 
> [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58].
> However, the problem is, when setting decimal type as record keys, Hudi would 
> read the value from Avro Generic Record and then directly convert it into 
> String type(See 
> [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]).
> As a result, what shows in the _hoodie_record_key field would be something 
> like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So 
> we need to handle this special case and convert the bytes array back before 
> converting to String.
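
A minimal sketch of the special-case handling described above, using Avro's 
built-in DecimalConversion to turn the fixed bytes back into a readable value 
before stringifying (the helper name and wiring are illustrative):

{code:java}
import java.math.BigDecimal;
import org.apache.avro.Conversions;
import org.apache.avro.LogicalType;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericFixed;

public class DecimalKeyUtil {
  // Convert an Avro fixed decimal field to its string form instead of
  // printing the raw byte array.
  public static String fieldValueToString(Object fieldVal, Schema fieldSchema) {
    LogicalType logicalType = fieldSchema.getLogicalType();
    if (fieldVal instanceof GenericFixed && logicalType instanceof LogicalTypes.Decimal) {
      BigDecimal dec = new Conversions.DecimalConversion()
          .fromFixed((GenericFixed) fieldVal, fieldSchema, logicalType);
      return dec.toPlainString();
    }
    return String.valueOf(fieldVal);
  }
}
{code}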



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1181) Decimal type display issue for record key field

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-1181.
---

> Decimal type display issue for record key field
> ---
>
> Key: HUDI-1181
> URL: https://issues.apache.org/jira/browse/HUDI-1181
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Wenning Ding
>Assignee: Wenning Ding
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> When using *fixed_len_byte_array* decimal type as Hudi record key, Hudi would 
> not correctly display the decimal value, instead, Hudi would display it as a 
> byte array.
> During the Hudi writing phase, Hudi would save the parquet source data into 
> Avro Generic Record. For example, the source parquet data has a column with 
> decimal type:
>  
> {code:java}
> optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code}
>  
> Then Hudi will convert it into the following avro decimal type:
> {code:java}
> {
> "name" : "OBJ_ID",
> "type" : [ {
>   "type" : "fixed",
>   "name" : "fixed",
>   "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID",
>   "size" : 16,
>   "logicalType" : "decimal",
>   "precision" : 38,
>   "scale" : 0
> }, "null" ]
> }
> {code}
> This decimal field would be stored as a fixed length bytes array. And in the 
> reading phase, Hudi will convert this bytes array back to a readable decimal 
> value through this 
> [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58].
> However, the problem is, when setting decimal type as record keys, Hudi would 
> read the value from Avro Generic Record and then directly convert it into 
> String type(See 
> [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]).
> As a result, what shows in the _hoodie_record_key field would be something 
> like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So 
> we need to handle this special case and convert the bytes array back before 
> converting to String.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1181) Decimal type display issue for record key field

2020-09-12 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1181:

Status: Open  (was: New)

> Decimal type display issue for record key field
> ---
>
> Key: HUDI-1181
> URL: https://issues.apache.org/jira/browse/HUDI-1181
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Wenning Ding
>Assignee: Wenning Ding
>Priority: Major
>  Labels: pull-request-available
>
> When using *fixed_len_byte_array* decimal type as Hudi record key, Hudi would 
> not correctly display the decimal value, instead, Hudi would display it as a 
> byte array.
> During the Hudi writing phase, Hudi would save the parquet source data into 
> Avro Generic Record. For example, the source parquet data has a column with 
> decimal type:
>  
> {code:java}
> optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code}
>  
> Then Hudi will convert it into the following avro decimal type:
> {code:java}
> {
> "name" : "OBJ_ID",
> "type" : [ {
>   "type" : "fixed",
>   "name" : "fixed",
>   "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID",
>   "size" : 16,
>   "logicalType" : "decimal",
>   "precision" : 38,
>   "scale" : 0
> }, "null" ]
> }
> {code}
> This decimal field would be stored as a fixed length bytes array. And in the 
> reading phase, Hudi will convert this bytes array back to a readable decimal 
> value through this 
> [converter|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala#L58].
> However, the problem is, when setting decimal type as record keys, Hudi would 
> read the value from Avro Generic Record and then directly convert it into 
> String type(See 
> [here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]).
> As a result, what shows in the _hoodie_record_key field would be something 
> like: LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. So 
> we need to handle this special case and convert the bytes array back before 
> converting to String.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1268) Fix UpgradeDowngrade Rename Exception in aliyun OSS

2020-09-03 Thread leesf (Jira)
leesf created HUDI-1268:
---

 Summary: Fix UpgradeDowngrade Rename Exception in aliyun  OSS
 Key: HUDI-1268
 URL: https://issues.apache.org/jira/browse/HUDI-1268
 Project: Apache Hudi
  Issue Type: Bug
  Components: Writer Core
Reporter: leesf
 Fix For: 0.6.1


when using the HoodieWriteClient API to write data to hudi with the following 
config:

```
Properties properties = new Properties();
properties.setProperty(HoodieTableConfig.HOODIE_TABLE_NAME_PROP_NAME, tableName);
properties.setProperty(HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME, tableType.name());
properties.setProperty(HoodieTableConfig.HOODIE_PAYLOAD_CLASS_PROP_NAME, OverwriteWithLatestAvroPayload.class.getName());
properties.setProperty(HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived");
return HoodieTableMetaClient.initTableAndGetMetaClient(hadoopConf, basePath, properties);
```

a FileAlreadyExistsException will be thrown in aliyun OSS. After debugging, it is 
the following code that throws the exception:

```
// Rename the .updated file to hoodie.properties. This is atomic in hdfs, but not in cloud stores.
// But as long as this does not leave a partial hoodie.properties file, we are okay.
fs.rename(updatedPropsFilePath, propsFilePath);
```

however, we should ignore the FileAlreadyExistsException since hoodie.properties 
already exists.
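
A minimal sketch of the workaround described above (the method wrapper and error 
handling are illustrative and assume the fs / path variables from the snippet; 
this is not necessarily the final patch):

```
import java.io.IOException;
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PropsFileRenamer {
  // Tolerate the non-atomic rename on object stores such as aliyun OSS.
  static void renamePropsFile(FileSystem fs, Path updatedPropsFilePath, Path propsFilePath) {
    try {
      fs.rename(updatedPropsFilePath, propsFilePath);
    } catch (FileAlreadyExistsException e) {
      // hoodie.properties is already in place, which is the desired end state,
      // so the exception can safely be ignored.
    } catch (IOException e) {
      throw new RuntimeException("Failed to rename " + updatedPropsFilePath, e);
    }
  }
}
```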

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row

2020-08-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-1225.
---

> Avro Date logical type not handled correctly when converting to Spark Row
> -
>
> Key: HUDI-1225
> URL: https://issues.apache.org/jira/browse/HUDI-1225
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> [https://github.com/apache/hudi/issues/2034]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row

2020-08-29 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-1225.
-
Resolution: Fixed

> Avro Date logical type not handled correctly when converting to Spark Row
> -
>
> Key: HUDI-1225
> URL: https://issues.apache.org/jira/browse/HUDI-1225
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> [https://github.com/apache/hudi/issues/2034]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1231) Duplicate record while querying from hive synced table

2020-08-28 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186874#comment-17186874
 ] 

leesf commented on HUDI-1231:
-

[~vbalaji] would you please take a look

> Duplicate record while querying from hive synced table
> --
>
> Key: HUDI-1231
> URL: https://issues.apache.org/jira/browse/HUDI-1231
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ashok Kumar
>Priority: Major
>
> I am writing in upsert mode with the precombine flag enabled. Still, when I query, 
> I see the same record appearing 3 times in the same parquet file
>  
> spark.sql("select 
> _hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name
>  from hudi5_mor_ro where id1=1086187 and timestamp=1598461500 and 
> _hoodie_record_key='timestamp:1598461500,id1:1086187,id2:1872725,flowId:23'").show(10,false)
>  
> +--+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name|
> +--+
> |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet|
> |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet|
> |20200826171813|20200826171813_13856_855766|timestamp:1598461500,id1:1086187,id2:1872725,flowId:23|1086187/2020082617|5ecb020f-29be-4eed-b130-8c02ae819603-0_13856-104-296775_20200826171813.parquet|
> +--+
>  
> I am getting this issue with both kinds of tables, i.e. COW and MOR. 
> I have tried version 0.6.3, but I had also tried 0.5.3 and this bug was 
> occurring there as well.
> This issue does not occur with a small data set. 
>  
> The strange thing is that when I query only the parquet file, it returns just one 
> record (i.e. the correct result):
> df.filter(col("_hoodie_record_key")==="timestamp:1598461500,id1:1086187,id2:1872725,flowId:23").count
>  res13: Long = 1
>  
> Note:
> When I query the filesystem, it is fine.
> This issue appears only when I query from the hive synced table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1234) Insert new records regardless of small file when using insert operation

2020-08-28 Thread leesf (Jira)
leesf created HUDI-1234:
---

 Summary: Insert new records regardless of small file when using 
insert operation
 Key: HUDI-1234
 URL: https://issues.apache.org/jira/browse/HUDI-1234
 Project: Apache Hudi
  Issue Type: Bug
  Components: Writer Core
Reporter: leesf


context here [https://github.com/apache/hudi/issues/2051]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1227) Document the usage of CLI

2020-08-26 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1227:

Issue Type: Improvement  (was: Bug)

> Document the usage of CLI
> -
>
> Key: HUDI-1227
> URL: https://issues.apache.org/jira/browse/HUDI-1227
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: CLI
>Reporter: leesf
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1227) Document the usage of CLI

2020-08-26 Thread leesf (Jira)
leesf created HUDI-1227:
---

 Summary: Document the usage of CLI
 Key: HUDI-1227
 URL: https://issues.apache.org/jira/browse/HUDI-1227
 Project: Apache Hudi
  Issue Type: Bug
  Components: CLI
Reporter: leesf






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key

2020-08-22 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1083:

Fix Version/s: 0.6.1

> Minor optimization in Determining insert bucket location for a given key
> 
>
> Key: HUDI-1083
> URL: https://issues.apache.org/jira/browse/HUDI-1083
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: shenh062326
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> As of now, this is how the bucket for a given key is determined.
> In every partition, we find all insert buckets and assign weights; 
> e.g. weights 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted 
> mean 20 will go into B0, 30 will go into B1 and 50 will go into B2.
> Within getPartition(Object key), we linearly walk through the bucket weights 
> to find the right bucket for a given key; for instance, if mod(hash value) 
> is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds 
> 0.9.
> Instead, we could calculate the cumulative weights upfront, i.e. 0.2, 0.5, 1, 
> and do a binary search within getPartition().
> With mod(hash value), we could then binary-search for the right bucket, 
> cutting the cost from O(N) to O(log N). 
>  
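
A minimal, self-contained sketch of the cumulative-weight plus binary-search idea 
described above (the class name and the example weights are illustrative):

{code:java}
import java.util.Arrays;

public class BucketLookup {
  public static void main(String[] args) {
    // Per-bucket insert weights 0.2, 0.3, 0.5 are turned into cumulative weights upfront.
    double[] cumulativeWeights = {0.2, 0.5, 1.0};

    // mod(hash value) normalized into [0, 1) for a given key, e.g. 90/100.
    double r = 0.9;

    // Binary search instead of a linear walk: O(log N) per key instead of O(N).
    int idx = Arrays.binarySearch(cumulativeWeights, r);
    int bucket = (idx >= 0) ? idx : -idx - 1; // insertion point on an inexact match

    System.out.println("key goes to bucket B" + bucket); // -> B2
  }
}
{code}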



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key

2020-08-22 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-1083.
-
Resolution: Fixed

> Minor optimization in Determining insert bucket location for a given key
> 
>
> Key: HUDI-1083
> URL: https://issues.apache.org/jira/browse/HUDI-1083
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: shenh062326
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> As of now, this is how the bucket for a given key is determined.
> In every partition, we find all insert buckets and assign weights; 
> e.g. weights 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted 
> mean 20 will go into B0, 30 will go into B1 and 50 will go into B2.
> Within getPartition(Object key), we linearly walk through the bucket weights 
> to find the right bucket for a given key; for instance, if mod(hash value) 
> is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds 
> 0.9.
> Instead, we could calculate the cumulative weights upfront, i.e. 0.2, 0.5, 1, 
> and do a binary search within getPartition().
> With mod(hash value), we could then binary-search for the right bucket, 
> cutting the cost from O(N) to O(log N). 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key

2020-08-22 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1083:

Status: Open  (was: New)

> Minor optimization in Determining insert bucket location for a given key
> 
>
> Key: HUDI-1083
> URL: https://issues.apache.org/jira/browse/HUDI-1083
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: shenh062326
>Priority: Major
>  Labels: pull-request-available
>
> As of now, this is how the bucket for a given key is determined.
> In every partition, we find all insert buckets and assign weights; 
> e.g. weights 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted 
> mean 20 will go into B0, 30 will go into B1 and 50 will go into B2.
> Within getPartition(Object key), we linearly walk through the bucket weights 
> to find the right bucket for a given key; for instance, if mod(hash value) 
> is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds 
> 0.9.
> Instead, we could calculate the cumulative weights upfront, i.e. 0.2, 0.5, 1, 
> and do a binary search within getPartition().
> With mod(hash value), we could then binary-search for the right bucket, 
> cutting the cost from O(N) to O(log N). 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1083) Minor optimization in Determining insert bucket location for a given key

2020-08-22 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-1083.
---

> Minor optimization in Determining insert bucket location for a given key
> 
>
> Key: HUDI-1083
> URL: https://issues.apache.org/jira/browse/HUDI-1083
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: shenh062326
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> As of now, this is how the bucket for a given key is determined.
> In every partition, we find all insert buckets and assign weights; 
> e.g. weights 0.2, 0.3, 0.5 for a given partition with 100 records to be inserted 
> mean 20 will go into B0, 30 will go into B1 and 50 will go into B2.
> Within getPartition(Object key), we linearly walk through the bucket weights 
> to find the right bucket for a given key; for instance, if mod(hash value) 
> is 90/100 = 0.9, we keep adding the bucket weights until the value exceeds 
> 0.9.
> Instead, we could calculate the cumulative weights upfront, i.e. 0.2, 0.5, 1, 
> and do a binary search within getPartition().
> With mod(hash value), we could then binary-search for the right bucket, 
> cutting the cost from O(N) to O(log N). 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1177) fix TimestampBasedKeyGenerator Task not serializableException

2020-08-22 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-1177.
---

> fix TimestampBasedKeyGenerator  Task not serializableException
> --
>
> Key: HUDI-1177
> URL: https://issues.apache.org/jira/browse/HUDI-1177
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: liujinhui
>Assignee: Pratyaksh Sharma
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1188) MOR hbase index tables not deduplicating records

2020-08-22 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-1188.
---

> MOR hbase index tables not deduplicating records
> 
>
> Key: HUDI-1188
> URL: https://issues.apache.org/jira/browse/HUDI-1188
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ryan Pifer
>Assignee: Ryan Pifer
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> After fetching the hbase index entry for a record, Hudi performs a validation that 
> the commit timestamp stored in hbase for that record is a commit on the timeline. 
> This makes any record that was stored to the hbase index during a deltacommit 
> (upsert on a MOR table) be considered an invalid commit and treated as a new 
> record. This causes the hbase index to be updated every time, which lets 
> records end up in multiple partitions and even in different file 
> groups within the same partition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1188) MOR hbase index tables not deduplicating records

2020-08-22 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-1188:

Status: Open  (was: New)

> MOR hbase index tables not deduplicating records
> 
>
> Key: HUDI-1188
> URL: https://issues.apache.org/jira/browse/HUDI-1188
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ryan Pifer
>Assignee: Ryan Pifer
>Priority: Major
>  Labels: pull-request-available
>
> After fetching the hbase index entry for a record, Hudi performs a validation that 
> the commit timestamp stored in hbase for that record is a commit on the timeline. 
> This makes any record that was stored to the hbase index during a deltacommit 
> (upsert on a MOR table) be considered an invalid commit and treated as a new 
> record. This causes the hbase index to be updated every time, which lets 
> records end up in multiple partitions and even in different file 
> groups within the same partition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1188) MOR hbase index tables not deduplicating records

2020-08-22 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-1188.
-
Fix Version/s: 0.6.1
   Resolution: Fixed

> MOR hbase index tables not deduplicating records
> 
>
> Key: HUDI-1188
> URL: https://issues.apache.org/jira/browse/HUDI-1188
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ryan Pifer
>Assignee: Ryan Pifer
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.1
>
>
> After fetching the hbase index entry for a record, Hudi performs a validation that 
> the commit timestamp stored in hbase for that record is a commit on the timeline. 
> This makes any record that was stored to the hbase index during a deltacommit 
> (upsert on a MOR table) be considered an invalid commit and treated as a new 
> record. This causes the hbase index to be updated every time, which lets 
> records end up in multiple partitions and even in different file 
> groups within the same partition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

