[jira] [Commented] (PARQUET-1976) Use net.alchim31.maven:scala-maven-plugin instead of org.scala-tools:maven-scala-plugin
[ https://issues.apache.org/jira/browse/PARQUET-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281920#comment-17281920 ]

Michael Heuer commented on PARQUET-1976:
----------------------------------------

Re: Scala 2.12.12, note the comment at https://issues.apache.org/jira/browse/SPARK-33921?focusedCommentId=17255394&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17255394

It might be better to stop at Scala 2.12.10, as Spark 3.1.x does, or to jump ahead to Scala 2.12.13.

> Use net.alchim31.maven:scala-maven-plugin instead of org.scala-tools:maven-scala-plugin
> ---------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1976
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1976
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Martin Tzvetanov Grigorov
>            Priority: Minor
>
> org.scala-tools:maven-scala-plugin has not been maintained for a long time.
> [net.alchim31.maven:scala-maven-plugin|https://github.com/davidB/scala-maven-plugin] is the replacement.
> Also, the Scala version could be upgraded from 2.12.8 to 2.12.12.
> A few other Maven plugins could also be upgraded.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
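For reference, the swap could look roughly like the following in pom.xml. This is a sketch, not taken from the Parquet build: the version number and the goal bindings are illustrative and should be adapted to the project.

```xml
<!-- Before: the unmaintained plugin
<plugin>
  <groupId>org.scala-tools</groupId>
  <artifactId>maven-scala-plugin</artifactId>
</plugin>
-->
<!-- After: the maintained replacement, binding the standard Scala goals -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>4.4.0</version> <!-- illustrative version -->
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```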
[jira] [Commented] (PARQUET-1894) Please fix the related Shaded Jackson Databind CVEs
[ https://issues.apache.org/jira/browse/PARQUET-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169321#comment-17169321 ]

Michael Heuer commented on PARQUET-1894:
----------------------------------------

I would love to hear otherwise, but I believe Spark is blocked from upgrading Parquet due to the incompatible transitive Avro upgrade.

> Please fix the related Shaded Jackson Databind CVEs
> ---------------------------------------------------
>
>                 Key: PARQUET-1894
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1894
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.11.0
>            Reporter: Rodney Aaron Stainback
>            Priority: Major
>
> The following CVEs are all related to version 2.9.10 of Jackson databind, which you shade:
> ||cve||severity||cvss||
> |CVE-2019-16942|critical|9.8|
> |CVE-2019-16943|critical|9.8|
> |CVE-2019-17531|critical|9.8|
> |CVE-2019-20330|critical|9.8|
> |CVE-2020-10672|high|8.8|
> |CVE-2020-10673|high|8.8|
> |CVE-2020-10968|high|8.8|
> |CVE-2020-10969|high|8.8|
> |CVE-2020-1|high|8.8|
> |CVE-2020-2|high|8.8|
> |CVE-2020-3|high|8.8|
> |CVE-2020-11619|critical|9.8|
> |CVE-2020-11620|critical|9.8|
> |CVE-2020-14060|high|8.1|
> |CVE-2020-14061|high|8.1|
> |CVE-2020-14062|high|8.1|
> |CVE-2020-14195|high|8.1|
> |CVE-2020-8840|critical|9.8|
> |CVE-2020-9546|critical|9.8|
> |CVE-2020-9547|critical|9.8|
> |CVE-2020-9548|critical|9.8|
>
> Our security team is trying to block us from using parquet files because of this issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (PARQUET-1758) InternalParquetRecordReader Logging is Too Verbose
[ https://issues.apache.org/jira/browse/PARQUET-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013869#comment-17013869 ]

Michael Heuer commented on PARQUET-1758:
----------------------------------------

+1, excessive logging from Parquet has been a pain for us downstream for many years.

> InternalParquetRecordReader Logging is Too Verbose
> --------------------------------------------------
>
>                 Key: PARQUET-1758
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1758
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>              Labels: pull-request-available
>
> A low-level library like Parquet should be pretty quiet: it should just do its work and keep quiet. Most issues should be addressed by throwing Exceptions, with the occasional warning message; anything more clutters the logging of the top-level application. If debugging is required, administrators can enable it for the specific workload.
> *Warning:* This is my opinion. No stats to back it up.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
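As a downstream stopgap (not a substitute for fixing the log levels in Parquet itself), the chatty loggers can be silenced in the logging backend. A sketch assuming an SLF4J-to-log4j 1.x binding, as Spark has commonly shipped; the property-file syntax differs for other backends:

```properties
# Quiet Parquet's INFO-level chatter; warnings and errors still come through.
log4j.logger.org.apache.parquet=WARN
# Older releases also logged under the pre-Apache package name.
log4j.logger.parquet=WARN
```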
[jira] [Commented] (PARQUET-1645) Bump Apache Avro to 1.9.1
[ https://issues.apache.org/jira/browse/PARQUET-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969351#comment-16969351 ]

Michael Heuer commented on PARQUET-1645:
----------------------------------------

I am very curious about this – Parquet vs. Avro version incompatibilities have been a source of major headaches for us downstream of Apache Spark. Will Spark be able to accept the Avro 1.9.1 and Parquet 1.11.0 upgrades simultaneously?

> Bump Apache Avro to 1.9.1
> -------------------------
>
>                 Key: PARQUET-1645
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1645
>             Project: Parquet
>          Issue Type: Task
>          Components: parquet-avro
>    Affects Versions: 1.10.1
>            Reporter: Fokko Driesprong
>            Assignee: Fokko Driesprong
>            Priority: Major
>             Fix For: 1.11.0
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (PARQUET-1241) [C++] Use LZ4 frame format
[ https://issues.apache.org/jira/browse/PARQUET-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965675#comment-16965675 ]

Michael Heuer commented on PARQUET-1241:
----------------------------------------

For JVM implementations, note that Apache Commons Compress has support for both block and frame compression:
https://github.com/apache/commons-compress/tree/master/src/main/java/org/apache/commons/compress/compressors/lz4

It appears that it can detect framed LZ4 from an input stream, but not the block format:
https://github.com/apache/commons-compress/blob/master/src/main/java/org/apache/commons/compress/compressors/CompressorStreamFactory.java#L466

> [C++] Use LZ4 frame format
> --------------------------
>
>                 Key: PARQUET-1241
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1241
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp, parquet-format
>            Reporter: Lawrence Chan
>            Priority: Major
>
> The parquet-format spec doesn't currently specify whether lz4-compressed data should be framed or not. We should choose one and make it explicit in the spec, as the two formats are not interoperable. After some discussions with others [1], we think it would be beneficial to use the framed format, which adds a small header in exchange for more self-contained decompression as well as a richer feature set (checksums, parallel decompression, etc.).
> The current arrow implementation compresses using the lz4 block format, and this would need to be updated when we add the spec clarification.
> If backwards compatibility is a concern, I would suggest adding an additional LZ4_FRAMED compression type, but that may be more noise than anything.
> [1] https://github.com/dask/fastparquet/issues/314

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
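The detection asymmetry mentioned above follows from the formats themselves: a framed LZ4 stream begins with the magic number 0x184D2204, while the block format has no header at all, so there is nothing to sniff. A minimal, dependency-free sketch of the check; the class and method names are made up for illustration:

```java
import java.util.Arrays;

public class Lz4FrameDetect {

    // LZ4 frame format magic number 0x184D2204, stored little-endian on disk.
    private static final byte[] LZ4_FRAME_MAGIC = {0x04, 0x22, 0x4D, 0x18};

    // A framed LZ4 stream is recognizable by its leading magic bytes;
    // the raw block format carries no such marker and cannot be detected this way.
    static boolean isLz4Frame(byte[] header) {
        return header.length >= 4
            && Arrays.equals(Arrays.copyOf(header, 4), LZ4_FRAME_MAGIC);
    }

    public static void main(String[] args) {
        byte[] framed = {0x04, 0x22, 0x4D, 0x18, 0x64, 0x40, (byte) 0xA7};
        byte[] block  = {0x10, 0x68, 0x65, 0x6C, 0x6C, 0x6F};
        System.out.println(isLz4Frame(framed)); // true
        System.out.println(isLz4Frame(block));  // false
    }
}
```

This is essentially what Commons Compress's CompressorStreamFactory does when autodetecting a framed stream, which is why block-format data cannot be autodetected.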
[jira] [Commented] (PARQUET-1441) SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
[ https://issues.apache.org/jira/browse/PARQUET-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700834#comment-16700834 ]

Michael Heuer commented on PARQUET-1441:
----------------------------------------

Sorry, which compatibility check and commit? I'm also confused by the version numbers in your comment; both Parquet and Avro have made 1.8.2 releases.

The regression is complicated and perhaps not worth discussing here: with Spark moving to Parquet 1.10 and Avro 1.8.2, our [previous workaround of pinning parquet-avro to 1.8.1|https://github.com/bigdatagenomics/adam/blob/master/pom.xml#L520] no longer works. That workaround was necessary because Spark depended on Parquet 1.8.2 and Avro 1.7.x, which were incompatible with each other.

> SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
> ------------------------------------------------------------------------
>
>                 Key: PARQUET-1441
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1441
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>            Reporter: Michael Heuer
>            Priority: Major
>              Labels: pull-request-available
>
> The following unit test added to TestAvroSchemaConverter fails
> {code:java}
> @Test
> public void testConvertedSchemaToStringCantRedefineList() throws Exception {
>   String parquet = "message spark_schema {\n" +
>       "  optional group annotation {\n" +
>       "    optional group transcriptEffects (LIST) {\n" +
>       "      repeated group list {\n" +
>       "        optional group element {\n" +
>       "          optional group effects (LIST) {\n" +
>       "            repeated group list {\n" +
>       "              optional binary element (UTF8);\n" +
>       "            }\n" +
>       "          }\n" +
>       "        }\n" +
>       "      }\n" +
>       "    }\n" +
>       "  }\n" +
>       "}\n";
>   Configuration conf = new Configuration(false);
>   AvroSchemaConverter avroSchemaConverter = new AvroSchemaConverter(conf);
>   Schema schema = avroSchemaConverter.convert(MessageTypeParser.parseMessageType(parquet));
>   schema.toString();
> }
> {code}
> while this one succeeds
> {code:java}
> @Test
> public void testConvertedSchemaToStringCantRedefineList() throws Exception {
>   String parquet = "message spark_schema {\n" +
>       "  optional group annotation {\n" +
>       "    optional group transcriptEffects (LIST) {\n" +
>       "      repeated group list {\n" +
>       "        optional group element {\n" +
>       "          optional group effects (LIST) {\n" +
>       "            repeated group list {\n" +
>       "              optional binary element (UTF8);\n" +
>       "            }\n" +
>       "          }\n" +
>       "        }\n" +
>       "      }\n" +
>       "    }\n" +
>       "  }\n" +
>       "}\n";
>
>   Configuration conf = new Configuration(false);
>   conf.setBoolean("parquet.avro.add-list-element-records", false);
>   AvroSchemaConverter avroSchemaConverter = new AvroSchemaConverter(conf);
>   Schema schema = avroSchemaConverter.convert(MessageTypeParser.parseMessageType(parquet));
>   schema.toString();
> }
> {code}
> I don't see a way to influence the code path in AvroIndexedRecordConverter to respect this configuration, resulting in the following stack trace downstream
> {noformat}
> Cause: org.apache.avro.SchemaParseException: Can't redefine: list
>   at org.apache.avro.Schema$Names.put(Schema.java:1128)
>   at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
>   at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:690)
>   at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:805)
>   at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
>   at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
>   at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
>   at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
>   at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
>   at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
>   at org.apache.avro.Schema.toString(Schema.java:324)
>   at org.apache.avro.SchemaCompatibility.checkReaderWriterCompatibility(SchemaCompatibility.java:68)
>   at org.apache.parquet.avro.AvroRecordConverter.isElementType(AvroRecordConverter.java:866)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter$AvroArrayConverter.<init>(AvroIndexedRecordConverter.java:333)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter.newConverter(AvroIndexedRecordConverter.java:172)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:94)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter.newConverter(AvroIndexedRecordConverter.java:168)
>   at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:94)
>   at ...
> {noformat}
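For context, the pinning workaround linked above amounts to overriding the parquet-avro version that Spark pulls in. A rough pom.xml sketch; the exclusion is illustrative, and the exact coordinates should be checked against your own dependency tree:

```xml
<dependencyManagement>
  <dependencies>
    <!-- Pin parquet-avro to a version compatible with the Avro on Spark's classpath. -->
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-avro</artifactId>
      <version>1.8.1</version>
      <exclusions>
        <!-- Avoid dragging in a second, incompatible Avro. -->
        <exclusion>
          <groupId>org.apache.avro</groupId>
          <artifactId>avro</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</dependencyManagement>
```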
[jira] [Commented] (PARQUET-1441) SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
[ https://issues.apache.org/jira/browse/PARQUET-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699567#comment-16699567 ]

Michael Heuer commented on PARQUET-1441:
----------------------------------------

Note as mentioned above that while {{parquet.avro.add-list-element-records=false}} works in the unit tests, it does not appear to work with AvroIndexedRecordConverter, which is what we hit downstream in Spark.

As far as workarounds, I'm afraid we're so far downstream that I'm not sure we would be able to use one. We use Avro AVDL to generate Java objects for persisting Spark RDDs to Parquet, and separately to generate Scala products for persisting Spark Datasets to Parquet. Spark generates the schema for these Datasets-as-Parquet. Up until Spark version 2.4.0, which bumped Parquet to version 1.10 and Avro to 1.8.2, we could write out Datasets-as-Parquet and read in RDDs-as-Parquet without trouble (the two different schemas were considered compatible).

> SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
> ------------------------------------------------------------------------
>
>                 Key: PARQUET-1441
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1441
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>            Reporter: Michael Heuer
>            Priority: Major
>              Labels: pull-request-available
>
[jira] [Commented] (PARQUET-1441) SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
[ https://issues.apache.org/jira/browse/PARQUET-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645480#comment-16645480 ]

Michael Heuer commented on PARQUET-1441:
----------------------------------------

I've found I can get a similar stack trace going through AvroRecordConverter instead of AvroIndexedRecordConverter, by setting {{parquet.avro.compatible}} to false:

{code:scala}
val job = HadoopUtil.newJob(sc)
val conf = ContextUtil.getConfiguration(job)
conf.setBoolean("parquet.avro.compatible", false)
{code}

{noformat}
Cause: org.apache.avro.SchemaParseException: Can't redefine: list
  at org.apache.avro.Schema$Names.put(Schema.java:1128)
  at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
  at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:690)
  at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:805)
  at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
  at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
  at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
  at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
  at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
  at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
  at org.apache.avro.Schema.toString(Schema.java:324)
  at org.apache.avro.SchemaCompatibility.checkReaderWriterCompatibility(SchemaCompatibility.java:68)
  at org.apache.parquet.avro.AvroRecordConverter.isElementType(AvroRecordConverter.java:866)
  at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:475)
  at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:289)
  at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:141)
  at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:279)
  at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:141)
  at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
  at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
  at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
  at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:204)
  at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
  at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  ...
{noformat}

> SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
> ------------------------------------------------------------------------
>
>                 Key: PARQUET-1441
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1441
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>            Reporter: Michael Heuer
>            Priority: Major
>
[jira] [Created] (PARQUET-1441) SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
Michael Heuer created PARQUET-1441:
--------------------------------------

             Summary: SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
                 Key: PARQUET-1441
                 URL: https://issues.apache.org/jira/browse/PARQUET-1441
             Project: Parquet
          Issue Type: Bug
          Components: parquet-avro
            Reporter: Michael Heuer


The following unit test added to TestAvroSchemaConverter fails

{code:java}
@Test
public void testConvertedSchemaToStringCantRedefineList() throws Exception {
  String parquet = "message spark_schema {\n" +
      "  optional group annotation {\n" +
      "    optional group transcriptEffects (LIST) {\n" +
      "      repeated group list {\n" +
      "        optional group element {\n" +
      "          optional group effects (LIST) {\n" +
      "            repeated group list {\n" +
      "              optional binary element (UTF8);\n" +
      "            }\n" +
      "          }\n" +
      "        }\n" +
      "      }\n" +
      "    }\n" +
      "  }\n" +
      "}\n";
  Configuration conf = new Configuration(false);
  AvroSchemaConverter avroSchemaConverter = new AvroSchemaConverter(conf);
  Schema schema = avroSchemaConverter.convert(MessageTypeParser.parseMessageType(parquet));
  schema.toString();
}
{code}

while this one succeeds

{code:java}
@Test
public void testConvertedSchemaToStringCantRedefineList() throws Exception {
  String parquet = "message spark_schema {\n" +
      "  optional group annotation {\n" +
      "    optional group transcriptEffects (LIST) {\n" +
      "      repeated group list {\n" +
      "        optional group element {\n" +
      "          optional group effects (LIST) {\n" +
      "            repeated group list {\n" +
      "              optional binary element (UTF8);\n" +
      "            }\n" +
      "          }\n" +
      "        }\n" +
      "      }\n" +
      "    }\n" +
      "  }\n" +
      "}\n";

  Configuration conf = new Configuration(false);
  conf.setBoolean("parquet.avro.add-list-element-records", false);
  AvroSchemaConverter avroSchemaConverter = new AvroSchemaConverter(conf);
  Schema schema = avroSchemaConverter.convert(MessageTypeParser.parseMessageType(parquet));
  schema.toString();
}
{code}

I don't see a way to influence the code path in AvroIndexedRecordConverter to respect this configuration, resulting in the following stack trace downstream

{noformat}
Cause: org.apache.avro.SchemaParseException: Can't redefine: list
  at org.apache.avro.Schema$Names.put(Schema.java:1128)
  at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
  at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:690)
  at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:805)
  at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
  at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
  at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
  at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
  at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
  at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
  at org.apache.avro.Schema.toString(Schema.java:324)
  at org.apache.avro.SchemaCompatibility.checkReaderWriterCompatibility(SchemaCompatibility.java:68)
  at org.apache.parquet.avro.AvroRecordConverter.isElementType(AvroRecordConverter.java:866)
  at org.apache.parquet.avro.AvroIndexedRecordConverter$AvroArrayConverter.<init>(AvroIndexedRecordConverter.java:333)
  at org.apache.parquet.avro.AvroIndexedRecordConverter.newConverter(AvroIndexedRecordConverter.java:172)
  at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:94)
  at org.apache.parquet.avro.AvroIndexedRecordConverter.newConverter(AvroIndexedRecordConverter.java:168)
  at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:94)
  at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:66)
  at org.apache.parquet.avro.AvroCompatRecordMaterializer.<init>(AvroCompatRecordMaterializer.java:34)
  at org.apache.parquet.avro.AvroReadSupport.newCompatMaterializer(AvroReadSupport.java:144)
  at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:136)
  at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:204)
  at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
  at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  ...
{noformat}

See also downstream issues:
https://issues.apache.org/jira/browse/SPARK-25588
https://github.com/bigdatagenomics/adam/issues/2058

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)