[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-04-19 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1716:
--
Status: Patch Available  (was: In Progress)

> rt view w/ MOR tables fails after schema evolution
> --
>
> Key: HUDI-1716
> URL: https://issues.apache.org/jira/browse/HUDI-1716
> Project: Apache Hudi
> Issue Type: Bug
> Components: Storage Management
> Reporter: sivabalan narayanan
> Assignee: Aditya Tiwari
> Priority: Major
> Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> Looks like the realtime view of a MOR table fails if the schema present in an 
> existing log file is evolved to add a new field. No issues w/ writing, but 
> reading fails.
> More info: [https://github.com/apache/hudi/issues/2675]
>  
> gist of the stack trace:
> Caused by: org.apache.avro.AvroTypeException: Found hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field evolvedField
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
>     at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.deserializeRecords(HoodieAvroDataBlock.java:165)
>     at org.apache.hudi.common.table.log.block.HoodieDataBlock.createRecordsFromContentBytes(HoodieDataBlock.java:128)
>     at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecords(HoodieDataBlock.java:106)
>     at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:289)
>     at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:324)
>     at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:252)
>     ... 24 more
> 21/03/25 11:27:03 WARN TaskSetManager: Lost task 0.0 in stage 83.0 (TID 667, sivabala-c02xg219jgh6.attlocal.net, executor driver): org.apache.hudi.exception.HoodieException: Exception when reading log file
>     at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:261)
>     at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:100)
>     at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:93)
>     at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:75)
>     at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:230)
>     at org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:328)
>     at org.apache.hudi.HoodieMergeOnReadRDD$$anon$3.<init>(HoodieMergeOnReadRDD.scala:210)
>     at org.apache.hudi.HoodieMergeOnReadRDD.payloadCombineFileIterator(HoodieMergeOnReadRDD.scala:200)
>     at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:77)
>  
> Logs from local run: 
> [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198]
> diff with which above logs were generated: 
> [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec]
>  
> Steps to reproduce in spark shell:
>  # Create a MOR table w/ schema1.
>  # Ingest (with schema1) until log files are created; verify via hudi-cli. 
> It took me 2 batches of updates to see a log file.
>  # Create a new schema2 with one additional field. Ingest a batch with 
> schema2 that updates existing records.
>  # Read the entire dataset.
>  
>  
>  
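
The AvroTypeException in the trace above is consistent with Avro's schema resolution rule: a field that exists in the reader schema, is absent from the writer schema, and declares no default cannot be resolved. The sketch below is a simplified pure-Python model of that one rule, not Avro's or Hudi's actual implementation. `resolve_record`, `SchemaResolutionError`, `NO_DEFAULT`, and the `uuid`/`fare` fields are hypothetical names chosen for illustration; only `evolvedField` comes from the trace.

```python
class SchemaResolutionError(Exception):
    """A reader field is absent from the writer schema and has no default."""


# Sentinel: distinguishes "no default declared" from a default of None (null).
NO_DEFAULT = object()


def resolve_record(record, writer_fields, reader_fields):
    """Project a record written with writer_fields onto reader_fields.

    reader_fields maps each field name to its default value, or to
    NO_DEFAULT when the reader schema declares no default for it.
    """
    resolved = {}
    for name, default in reader_fields.items():
        if name in writer_fields:
            # Field exists on both sides: take the written value.
            resolved[name] = record[name]
        elif default is not NO_DEFAULT:
            # Field is new in the reader schema but has a default.
            resolved[name] = default
        else:
            # Field is new in the reader schema and has no default.
            raise SchemaResolutionError(f"missing required field {name}")
    return resolved


# schema1: what the older log block was written with (hypothetical fields).
schema1 = ["uuid", "fare"]
old_log_record = {"uuid": "a1", "fare": 12.5}

# schema2 variants: the evolved reader schema, without and with a default.
schema2_required = {"uuid": NO_DEFAULT, "fare": NO_DEFAULT,
                    "evolvedField": NO_DEFAULT}
schema2_with_default = {"uuid": NO_DEFAULT, "fare": NO_DEFAULT,
                        "evolvedField": None}

try:
    resolve_record(old_log_record, schema1, schema2_required)
except SchemaResolutionError as err:
    print(err)

print(resolve_record(old_log_record, schema1, schema2_with_default))
```

In this model, old log records resolve once the new field carries a default (for example a nullable field defaulting to null). Whether that matches the fix that went into 0.9.0 is not stated in this thread.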



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-04-19 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1716:
--
Status: Closed  (was: Patch Available)



[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-04-08 Thread Aditya Tiwari (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Tiwari updated HUDI-1716:

Status: In Progress  (was: Open)



[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-04-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1716:
-
Labels: pull-request-available sev:critical user-support-issues  (was: 
sev:critical user-support-issues)



[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-03-25 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1716:
--
Description: 
Looks like realtime view w/ MOR table fails if schema present in existing log 
file is evolved to add a new field. no issues w/ writing. but reading fails

More info: [https://github.com/apache/hudi/issues/2675]

 

gist of the stack trace:

Caused by: org.apache.avro.AvroTypeException: Found hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field evolvedField
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
    at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.deserializeRecords(HoodieAvroDataBlock.java:165)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.createRecordsFromContentBytes(HoodieDataBlock.java:128)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecords(HoodieDataBlock.java:106)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:289)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:324)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:252)
    ... 24 more
21/03/25 11:27:03 WARN TaskSetManager: Lost task 0.0 in stage 83.0 (TID 667, sivabala-c02xg219jgh6.attlocal.net, executor driver): org.apache.hudi.exception.HoodieException: Exception when reading log file
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:261)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:100)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:93)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:75)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:230)
    at org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:328)
    at org.apache.hudi.HoodieMergeOnReadRDD$$anon$3.<init>(HoodieMergeOnReadRDD.scala:210)
    at org.apache.hudi.HoodieMergeOnReadRDD.payloadCombineFileIterator(HoodieMergeOnReadRDD.scala:200)
    at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:77)

 

Logs from local run: 

[https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198]

diff with which above logs were generated: 
[https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec]

 

Steps to reproduce in spark shell:
 # create MOR table w/ schema1. 
 # Ingest (with schema1) until log files are created. // verify via hudi-cli. I 
didn't see log files w/ just 1 batch of updates. If not, do multiple rounds 
until you see log files.
 # create a new schema2 with one new additional field. ingest a batch with 
schema2 that updates existing records. 
 # read entire dataset. 

 

 

 

  was:
Looks like realtime view w/ MOR table fails if schema present in existing log 
file is evolved to add a new field. no issues w/ writing. but reading fails

More info: [https://github.com/apache/hudi/issues/2675]

 

Logs from local run: 

[https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198]

diff with which above logs were generated: 
[https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec]

 

Steps to reproduce in spark shell:
 # create MOR table w/ schema1. 
 # Ingest (with schema1) until log files are created. // verify via hudi-cli. I 
didn't see log files w/ just 1 batch of updates. If not, do multiple rounds 
until you see log files.
 # create a new schema2 with one new additional field. ingest a batch with 
schema2 that updates existing records. 
 # read entire dataset. 

 

 

 



[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-03-25 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1716:
--
Description: 
Looks like realtime view w/ MOR table fails if schema present in existing log 
file is evolved to add a new field. no issues w/ writing. but reading fails

More info: [https://github.com/apache/hudi/issues/2675]

 

gist of the stack trace:

Caused by: org.apache.avro.AvroTypeException: Found hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field evolvedField
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
    at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.deserializeRecords(HoodieAvroDataBlock.java:165)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.createRecordsFromContentBytes(HoodieDataBlock.java:128)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecords(HoodieDataBlock.java:106)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:289)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:324)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:252)
    ... 24 more
21/03/25 11:27:03 WARN TaskSetManager: Lost task 0.0 in stage 83.0 (TID 667, sivabala-c02xg219jgh6.attlocal.net, executor driver): org.apache.hudi.exception.HoodieException: Exception when reading log file
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:261)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:100)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:93)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:75)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:230)
    at org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:328)
    at org.apache.hudi.HoodieMergeOnReadRDD$$anon$3.<init>(HoodieMergeOnReadRDD.scala:210)
    at org.apache.hudi.HoodieMergeOnReadRDD.payloadCombineFileIterator(HoodieMergeOnReadRDD.scala:200)
    at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:77)

 

Logs from local run: 

[https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198]

diff with which above logs were generated: 
[https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec]

 

Steps to reproduce in spark shell:
 # Create a MOR table w/ schema1.
 # Ingest (with schema1) until log files are created; verify via hudi-cli. 
It took me 2 batches of updates to see a log file.
 # Create a new schema2 with one additional field. Ingest a batch with 
schema2 that updates existing records.
 # Read the entire dataset.

 

 

 

  was:
Looks like realtime view w/ MOR table fails if schema present in existing log 
file is evolved to add a new field. no issues w/ writing. but reading fails

More info: [https://github.com/apache/hudi/issues/2675]

 

gist of the stack trace:

Caused by: org.apache.avro.AvroTypeException: Found hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field evolvedField
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)

[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-03-24 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1716:
--
Description: 
Looks like realtime view w/ MOR table fails if schema present in existing log 
file is evolved to add a new field. no issues w/ writing. but reading fails

More info: [https://github.com/apache/hudi/issues/2675]

 

Logs from local run: 

[https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198]

diff with which above logs were generated: 
[https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec]

 

Steps to reproduce in spark shell:
 # create MOR table w/ schema1. 
 # Ingest (with schema1) until log files are created. // verify via hudi-cli. I 
didn't see log files w/ just 1 batch of updates. If not, do multiple rounds 
until you see log files.
 # create a new schema2 with one new additional field. ingest a batch with 
schema2 that updates existing records. 
 # read entire dataset. 

 

 

 

  was:
Looks like realtime view w/ MOR table fails if schema present in existing log 
file is evolved to add a new field. no issues w/ writing. but reading fails

More info: [https://github.com/apache/hudi/issues/2675]

 

Logs from local run: 

[https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198]

diff with which above logs were generated: 
[https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec]

 

 




[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-03-24 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1716:
--
Description: 
Looks like realtime view w/ MOR table fails if schema present in existing log 
file is evolved to add a new field. no issues w/ writing. but reading fails

More info: [https://github.com/apache/hudi/issues/2675]

 

Logs from local run: 

[https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198]

diff with which above logs were generated: 
[https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec]

 

 

  was:
Looks like realtime view w/ MOR table fails if schema present in existing log 
file is evolved to add a new field. no issues w/ writing. but reading fails.

 

More info: https://github.com/apache/hudi/issues/2675




[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-03-24 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1716:
--
Fix Version/s: 0.9.0



[jira] [Updated] (HUDI-1716) rt view w/ MOR tables fails after schema evolution

2021-03-24 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1716:
--
Labels: sev:critical user-support-issues  (was: )
