[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-05-25 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17115972#comment-17115972
 ] 

sivabalan narayanan commented on HUDI-721:
--

Merged with 
[https://github.com/apache/hudi/commit/ce0a4c64d07d6eea926d1bfb92b69ae387b88f50]

 

> AvroConversionUtils is broken for complex types in 0.6
> --
>
> Key: HUDI-721
> URL: https://issues.apache.org/jira/browse/HUDI-721
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Alexander Filipchik
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: bug-bash-0.6.0
> Fix For: 0.6.0
>
>
> Hi,
> I was working on the upgrade from 0.5 to 0.6 and hit a bug in 
> AvroConversionUtils. I originally blamed it on the Spark parquet-to-Avro schema 
> generator (the convertStructTypeToAvroSchema method), but after some debugging 
> I'm pretty sure the issue is somewhere in AvroConversionHelper.
> What happens: when a complex type is extracted using SqlTransformer (using 
> select bla from ), where bla is a complex type with arrays of structs, Kryo 
> serialization breaks with:
>  
> {code:java}
> 28701 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - ResultStage 1 (isEmpty at DeltaSync.java:337) failed in 12.146 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.avro.UnresolvedUnionException: Not in union 
>   at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
>   at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
>   at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>   at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>   at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>   at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>   at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:192)
>   at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:120)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>   at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>   at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>   at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>   at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>   at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>   at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>   at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>   at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>   at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>   at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>   at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>   at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
>   at org.apache.spark.serializer.GenericAvroSerializer.serializeDatum(GenericAvroSerializer.scala:125)
>   at org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:159)
>   at org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:47)
>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
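
For readers unfamiliar with the exception at the top of the trace: when Avro's generic writer serializes a record into a union, it looks up the union branch by the record schema's full name (namespace plus name) and reports "Not in union" when no branch carries that name. The sketch below is a simplified stand-alone model of that lookup, not Avro's code and not the Hudi fix. The class `UnionLookupSketch`, its methods, and the example names such as `hoodie.source.bla.items` are hypothetical, and whether a name mismatch between the generated schema and the converted records is what happened in this ticket is an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class UnionLookupSketch {

    /**
     * Returns the position of the union branch whose full name equals the
     * datum's schema name, or -1 when no branch matches.
     */
    public static int indexNamed(List<String> unionBranchNames, String datumFullName) {
        return unionBranchNames.indexOf(datumFullName);
    }

    /**
     * Models the failure mode: a record whose schema name is not among the
     * union's branches cannot be written. IllegalStateException stands in
     * for Avro's UnresolvedUnionException here.
     */
    public static int resolveUnion(List<String> unionBranchNames, String datumFullName) {
        int index = indexNamed(unionBranchNames, datumFullName);
        if (index < 0) {
            throw new IllegalStateException(
                "Not in union " + unionBranchNames + ": " + datumFullName);
        }
        return index;
    }

    public static void main(String[] args) {
        // A nullable struct field: union of "null" and one named record (hypothetical name).
        List<String> union = Arrays.asList("null", "hoodie.source.bla.items");

        // Same full name on both sides: the branch is found.
        System.out.println(resolveUnion(union, "hoodie.source.bla.items"));

        // Same short name but a different namespace: the lookup fails.
        try {
            resolveUnion(union, "items");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the model holds for a given job, comparing the full names in the writer schema against the schemas attached to the converted records is a cheap first check before digging into the serializer itself.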

[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-05-25 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17115971#comment-17115971
 ] 

sivabalan narayanan commented on HUDI-721:
--

thanks. 


[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-05-24 Thread Udit Mehrotra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17115685#comment-17115685
 ] 

Udit Mehrotra commented on HUDI-721:


[~shivnarayan] this particular issue has already been fixed by 
[https://github.com/apache/hudi/pull/1406]. [~afilipchik] has opened another 
ticket for a follow-up issue. This can be resolved.


[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-05-23 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114954#comment-17114954
 ] 

sivabalan narayanan commented on HUDI-721:
--

[~uditme]: did you get a chance to follow up on this ticket?


[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-03-20 Thread Alexander Filipchik (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063471#comment-17063471
 ] 

Alexander Filipchik commented on HUDI-721:
--

Serialization works on staging with the fix, but the job can't complete due to 
HUDI-722.

> AvroConversionUtils is broken for complex types in 0.6
> --
>
> Key: HUDI-721
> URL: https://issues.apache.org/jira/browse/HUDI-721
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Alexander Filipchik
>Priority: Major
> Fix For: 0.6.0
>
>

[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-03-20 Thread Alexander Filipchik (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063208#comment-17063208
 ] 

Alexander Filipchik commented on HUDI-721:
--

Looks like it fixed the issue.


[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-03-20 Thread Alexander Filipchik (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063138#comment-17063138
 ] 

Alexander Filipchik commented on HUDI-721:
--

Will try. [~uditme] could you please also take a look at HUDI-722?


[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-03-18 Thread Udit Mehrotra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062247#comment-17062247
 ] 

Udit Mehrotra commented on HUDI-721:


This could be related to [https://github.com/apache/incubator-hudi/pull/1406], 
where we are fixing an issue with arrays of structs as well as Map types. So if 
you are using complex types, that could well be the cause. That said, I have 
only ever seen this error thrown from Hudi code, never from Spark. You may 
want to pull in that patch and retry. If it still doesn't work, short 
reproduction steps would help. I would be happy to take a look.
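
For context on the exception itself: as far as I understand Avro's generic writer, it picks a union branch for a record by the record schema's full name (namespace plus name) and throws UnresolvedUnionException when no branch matches. The sketch below is a toy Python model of that lookup only. It is not Avro or Hudi code, the schema names and namespaces are hypothetical, and nothing in this thread confirms that this is the mechanism behind the failure reported here.

```python
# Toy model of name-based union resolution; this is NOT Avro's actual code.

class UnresolvedUnionError(Exception):
    """Stands in for org.apache.avro.UnresolvedUnionException."""


def full_name(namespace, name):
    """Join namespace and name the way a record's full name is formed."""
    return f"{namespace}.{name}" if namespace else name


def resolve_union(branches, datum_schema_name):
    """Return the index of the union branch whose name matches the datum."""
    for index, branch in enumerate(branches):
        if branch == datum_schema_name:
            return index
    raise UnresolvedUnionError(f"Not in union {branches}: {datum_schema_name}")


# Hypothetical writer schema: a nullable struct used as an array element.
union = ["null", full_name("hoodie.source.items", "item")]

# A record built against the same schema resolves to branch 1.
matching = resolve_union(union, full_name("hoodie.source.items", "item"))

# A record with the same fields but a different namespace does not resolve.
try:
    resolve_union(union, full_name("other.ns", "item"))
    mismatch_raised = False
except UnresolvedUnionError:
    mismatch_raised = True
```

If something like this is in play, comparing the full names of the nested record schemas on the writer side and on the record side would be a cheap first check before digging further.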

> AvroConversionUtils is broken for complex types in 0.6
> --
>
> Key: HUDI-721
> URL: https://issues.apache.org/jira/browse/HUDI-721
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Alexander Filipchik
>Priority: Major
> Fix For: 0.6.0
>
>
> hi,
> I was working on the upgrade from 0.5 to 0.6 and hit a bug in 
> AvroConversionUtils. I originally blamed it on the Spark parquet to avro schema 
> generator (convertStructTypeToAvroSchema method), but after some debugging 
> I'm pretty sure the issue is somewhere in AvroConversionHelper.
> What happens: when a complex type is extracted using SqlTransformer (using 
> select bla fro ) where bla is a complex type with arrays of structs, Kryo 
> serialization breaks with:
>  
> {code:java}
> 28701 [dag-scheduler-event-loop] INFO  
> org.apache.spark.scheduler.DAGScheduler  - ResultStage 1 (isEmpty at 
> DeltaSync.java:337) failed in 12.146 s due to Job aborted due to stage 
> failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 
> 0.0 in stage 1.0 (TID 1, localhost, executor driver): 
> org.apache.avro.UnresolvedUnionException: Not in union 

[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-03-18 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062237#comment-17062237
 ] 

Vinoth Chandar commented on HUDI-721:
-

cc [~uditme] Would love to get your thoughts on this as well.

> AvroConversionUtils is broken for complex types in 0.6
> --
>
> Key: HUDI-721
> URL: https://issues.apache.org/jira/browse/HUDI-721
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Alexander Filipchik
>Priority: Major
> Fix For: 0.6.0
>
>
> hi,
> I was working on the upgrade from 0.5 to 0.6 and hit a bug in 
> AvroConversionUtils. I originally blamed it on the Spark parquet to avro schema 
> generator (convertStructTypeToAvroSchema method), but after some debugging 
> I'm pretty sure the issue is somewhere in AvroConversionHelper.
> What happens: when a complex type is extracted using SqlTransformer (using 
> select bla fro ) where bla is a complex type with arrays of structs, Kryo 
> serialization breaks with:
>  
> {code:java}
> 28701 [dag-scheduler-event-loop] INFO  
> org.apache.spark.scheduler.DAGScheduler  - ResultStage 1 (isEmpty at 
> DeltaSync.java:337) failed in 12.146 s due to Job aborted due to stage 
> failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 
> 0.0 in stage 1.0 (TID 1, localhost, executor driver): 
> org.apache.avro.UnresolvedUnionException: Not in union 

[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-03-18 Thread Alexander Filipchik (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062144#comment-17062144
 ] 

Alexander Filipchik commented on HUDI-721:
--

[~vbalaji] ^^^

> AvroConversionUtils is broken for complex types in 0.6
> --
>
> Key: HUDI-721
> URL: https://issues.apache.org/jira/browse/HUDI-721
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Alexander Filipchik
>Priority: Major
>
> hi,
> I was working on the upgrade from 0.5 to 0.6 and hit a bug in 
> AvroConversionUtils. I originally blamed it on the Spark parquet to avro schema 
> generator (convertStructTypeToAvroSchema method), but after some debugging 
> I'm pretty sure the issue is somewhere in AvroConversionHelper.
> What happens: when a complex type is extracted using SqlTransformer (using 
> select bla fro ) where bla is a complex type with arrays of structs, Kryo 
> serialization breaks with:
>  
> {code:java}
> 28701 [dag-scheduler-event-loop] INFO  
> org.apache.spark.scheduler.DAGScheduler  - ResultStage 1 (isEmpty at 
> DeltaSync.java:337) failed in 12.146 s due to Job aborted due to stage 
> failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 
> 0.0 in stage 1.0 (TID 1, localhost, executor driver): 
> org.apache.avro.UnresolvedUnionException: Not in union 

[jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

2020-03-18 Thread Alexander Filipchik (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062025#comment-17062025
 ] 

Alexander Filipchik commented on HUDI-721:
--

Also, putting the old code back in AvroConversionHelper solves the issue.

> AvroConversionUtils is broken for complex types in 0.6
> --
>
> Key: HUDI-721
> URL: https://issues.apache.org/jira/browse/HUDI-721
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Alexander Filipchik
>Priority: Major
>
> hi,
> I was working on the upgrade from 0.5 to 0.6 and hit a bug in 
> AvroConversionUtils. I originally blamed it on the Spark parquet to avro schema 
> generator (convertStructTypeToAvroSchema method), but after some debugging 
> I'm pretty sure the issue is somewhere in AvroConversionHelper.
>  
> What happens: when a complex type is extracted using SqlTransformer (using 
> select bla fro ) where bla is a complex type with arrays of structs, Kryo 
> serialization breaks with:
>  
> {code:java}
> 28701 [dag-scheduler-event-loop] INFO  
> org.apache.spark.scheduler.DAGScheduler  - ResultStage 1 (isEmpty at 
> DeltaSync.java:337) failed in 12.146 s due to Job aborted due to stage 
> failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 
> 0.0 in stage 1.0 (TID 1, localhost, executor driver): 
> org.apache.avro.UnresolvedUnionException: Not in union 