[jira] [Commented] (HUDI-2023) Validate Schema evolution in hudi

2021-06-28 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370692#comment-17370692
 ] 

Sagar Sumit commented on HUDI-2023:
---

Validated with DeltaStreamer; the results are summarized below:
|| ||COW||MOR||
|Adding a new nullable column at root level at the end|succeeds|succeeds|
|Adding a new nullable column to inner struct (at the end)|succeeds|succeeds|
|Adding a new non-nullable column at root level at the end|fails|fails|
|Adding a new non-nullable column to inner struct (at the end)|fails|fails|

The failure after adding a new non-nullable column in case of MOR is:
{code:java}
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key impression_598 from old file file:/tmp/hudi-deltastreamer-op/impressions_mor/user_86/08046c02-14e3-4629-899a-614518dfc545-0_53-6-148_20210628211956.parquet to new file file:/tmp/hudi-deltastreamer-op/impressions_mor/user_86/08046c02-14e3-4629-899a-614518dfc545-0_8-22-301_20210628212147.parquet
...
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:320)
    at org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122)
    at org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.RuntimeException: Null-value for required field: evolvedField
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:194)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:89)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:315)
{code}
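
The condition reported in the trace can be reproduced outside Hudi with a small model. The following is only a sketch in plain Python, not Hudi's or parquet-avro's actual code: `rewrite_record` and `is_nullable` are hypothetical helpers, the `impression_id` field name is assumed for illustration, and treating a missing nullable field as null is a simplification made by the sketch. It shows why an old record, rewritten against a schema with a new required field, has nothing to put in `evolvedField`.

```python
# Illustrative model only; not the Hudi or parquet-avro implementation.

def is_nullable(field):
    """Treat an Avro field as nullable when its type is a union containing "null"."""
    avro_type = field["type"]
    return isinstance(avro_type, list) and "null" in avro_type


def rewrite_record(old_record, new_schema):
    """Project a record written with the old schema onto new_schema (hypothetical helper)."""
    out = {}
    for field in new_schema["fields"]:
        name = field["name"]
        if name in old_record:
            out[name] = old_record[name]      # value already present in the old file
        elif "default" in field:
            out[name] = field["default"]      # schema supplies a default
        elif is_nullable(field):
            out[name] = None                  # sketch assumption: missing nullable field becomes null
        else:
            # Same condition the trace reports for the non-nullable column.
            raise RuntimeError(f"Null-value for required field: {name}")
    return out


# Assumed example schemas; only "evolvedField" and the key value come from the trace.
nullable_schema = {
    "type": "record",
    "name": "impression",
    "fields": [
        {"name": "impression_id", "type": "string"},
        {"name": "evolvedField", "type": ["null", "string"]},
    ],
}
required_schema = {
    "type": "record",
    "name": "impression",
    "fields": [
        {"name": "impression_id", "type": "string"},
        {"name": "evolvedField", "type": "string"},
    ],
}
old_record = {"impression_id": "impression_598"}
```

Under these assumptions, `rewrite_record(old_record, nullable_schema)` fills `evolvedField` with null, while `rewrite_record(old_record, required_schema)` raises, which matches the nullable versus non-nullable rows in the table.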

> Validate Schema evolution in hudi
> -
>
> Key: HUDI-2023
> URL: https://issues.apache.org/jira/browse/HUDI-2023
> Project: Apache Hudi
> Issue Type: Test
> Components: Testing
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
>
> Test schema evolution in hudi and document the same



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2023) Validate Schema evolution in hudi

2021-06-15 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363845#comment-17363845
 ] 

sivabalan narayanan commented on HUDI-2023:
---

Dump of steps: https://gist.github.com/nsivabalan/33147072fabf5afa9cf2dfee1734e57a



[jira] [Commented] (HUDI-2023) Validate Schema evolution in hudi

2021-06-15 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363761#comment-17363761
 ] 

sivabalan narayanan commented on HUDI-2023:
---

I tested both COW and MOR for simple schema evolution of adding a new column. 
Here are my findings. 

 

// "succeeds" means the write succeeded and a subsequent read returned the entire dataset.

 
|| ||COW||MOR||
|Adding a new nullable column at root level at the end|succeeds|succeeds|
|Adding a new nullable column to inner struct (at the end)|succeeds|succeeds|
|Adding a new non-nullable column at root level at the end|fails|write succeeds, but read fails as expected|
|Adding a new non-nullable column to inner struct (at the end)|fails|write succeeds, but read fails as expected|

 

Validated so far with the Spark datasource. Will update once I have results with DeltaStreamer.

 

 
