[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2023-10-04 Thread Prashant Wason (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-1779:
-
Fix Version/s: 0.14.1
   (was: 0.14.0)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.14.1
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>
> Currently, when Hudi bootstraps a parquet file, or upserts into a parquet file
> that contains a timestamp column, it fails because of the following issues:
> 1) During the bootstrap operation, if the original parquet file was written by a
> Spark application, Spark by default saves timestamps as INT96 (see
> spark.sql.parquet.int96AsTimestamp), and bootstrap fails because Hudi cannot
> currently read the INT96 type. (This issue can be solved by upgrading
> parquet to 1.12.0 and setting parquet.avro.readInt96AsFixed=true; please check
> https://github.com/apache/parquet-mr/pull/831/files)
> 2) After bootstrap, an upsert fails because we use the Hoodie schema to
> read the original parquet file. The schemas do not match because the Hoodie
> schema treats timestamp as long, while in the original file it is INT96.
> 3) After bootstrap, a partial update of a parquet file fails because
> we copy the old record and save it with the Hoodie schema (we are missing a
> convertFixedToLong operation like the one Spark has).
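
Item 3 above mentions a missing fixed-to-long conversion. As a rough illustration only, and assuming the commonly used INT96 timestamp layout (8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number), such a conversion could be sketched in Python as follows. The function names are hypothetical; they are not Hudi, Spark, or Parquet APIs.

```python
import struct

# Assumption: INT96 timestamps use the Julian-day layout described above.
JULIAN_DAY_OF_EPOCH = 2_440_588          # Julian day number of 1970-01-01
MICROS_PER_DAY = 86_400 * 1_000_000


def int96_to_epoch_micros(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp into microseconds since the Unix epoch."""
    if len(raw) != 12:
        raise ValueError(f"INT96 value must be 12 bytes, got {len(raw)}")
    # "<qi" = little-endian signed 64-bit nanos-of-day, then signed 32-bit Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_EPOCH
    # Sub-microsecond precision is dropped by the integer division.
    return days_since_epoch * MICROS_PER_DAY + nanos_of_day // 1_000


def epoch_micros_to_int96(micros: int) -> bytes:
    """Inverse of int96_to_epoch_micros (hypothetical helper, for round-trip checks)."""
    days, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack("<qi", micros_of_day * 1_000, days + JULIAN_DAY_OF_EPOCH)
```

For example, the 12 bytes that encode Julian day 2440588 with zero nanoseconds decode to 0, the Unix epoch. A long produced this way matches a schema that treats the timestamp column as long, which is the mismatch described in item 2.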



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2023-05-22 Thread Yue Zhang (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yue Zhang updated HUDI-1779:

Fix Version/s: 0.14.0
   (was: 0.13.1)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2023-04-23 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1779:
--
Fix Version/s: (was: 0.12.3)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.1
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2023-03-09 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1779:
-
Fix Version/s: 0.12.3

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.1, 0.12.3
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-12-20 Thread Alexey Kudinkin (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-1779:
--
Fix Version/s: 0.13.1
   (was: 0.13.0)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.1
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-09-19 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1779:
-
Sprint: 2022/08/22, 2022/09/05  (was: 2022/08/22, 2022/09/05, 2022/09/19)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-09-19 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1779:
-
Sprint: 2022/08/22, 2022/09/05, 2022/09/19  (was: 2022/08/22, 2022/09/05)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-09-07 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1779:
-
Sprint: 2022/08/22, 2022/09/05  (was: 2022/08/22)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-08-22 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-1779:

Fix Version/s: (was: 0.12.1)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-08-22 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-1779:

Fix Version/s: 0.13.0
 Priority: Blocker  (was: Critical)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.12.1, 0.13.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-08-19 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-1779:

Story Points: 2

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Alexey Kudinkin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.12.1
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-08-19 Thread Ethan Guo (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-1779:

Epic Link: HUDI-1265

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Alexey Kudinkin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.12.1
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-08-17 Thread Sagar Sumit (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1779:
--
Sprint: 2022/08/22

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Alexey Kudinkin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.12.1
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-08-16 Thread Sagar Sumit (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1779:
--
Fix Version/s: 0.12.1
   (was: 0.12.0)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Alexey Kudinkin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.12.1
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-07-04 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1779:
-
Priority: Critical  (was: Major)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Alexey Kudinkin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.12.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-07-04 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1779:
-
Component/s: dependencies

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Alexey Kudinkin
> Priority: Major
> Labels: pull-request-available, query-eng, sev:high
> Fix For: 0.12.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-07-04 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1779:
-
Labels: pull-request-available  (was: pull-request-available query-eng sev:high)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: dependencies, spark
> Reporter: lrz
> Assignee: Alexey Kudinkin
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-07-04 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1779:
-
Fix Version/s: (was: 0.11.0)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark
> Reporter: lrz
> Assignee: Alexey Kudinkin
> Priority: Major
> Labels: pull-request-available, query-eng, sev:high
> Fix For: 0.12.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-03-27 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1779:
-
Component/s: spark

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: lrz
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available, query-eng, sev:high
> Fix For: 0.11.0, 0.12.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2022-03-27 Thread Raymond Xu (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1779:
-
Fix Version/s: 0.12.0

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available, query-eng, sev:high
> Fix For: 0.11.0, 0.12.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-12-13 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1779:
--
Labels: pull-request-available query-eng sev:high  (was: 
pull-request-available sev:high)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available, query-eng, sev:high
> Fix For: 0.11.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-11-26 Thread Danny Chen (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-1779:
-
Fix Version/s: (was: 0.10.0)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available, sev:high
> Fix For: 0.11.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-08-25 Thread Udit Mehrotra (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-1779:

Fix Version/s: 0.10.0
   (was: 0.9.0)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available, sev:high
> Fix For: 0.10.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-05-11 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1779:
--
Labels: pull-request-available sev:high  (was: pull-request-available)

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available, sev:high
> Fix For: 0.9.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-04-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1779:
-
Labels: pull-request-available  (was: )

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-04-08 Thread lrz (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1779:
--
Attachment: upsertFail.png

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-04-08 Thread lrz (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1779:
--
Attachment: unsupportInt96.png

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column

2021-04-08 Thread lrz (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-1779:
--
Attachment: upsertFail2.png

> Fail to bootstrap/upsert a table which contains timestamp column
> 
>
> Key: HUDI-1779
> URL: https://issues.apache.org/jira/browse/HUDI-1779
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
> Fix For: 0.9.0
>
> Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
