[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161639#comment-16161639 ]

Anthony Dotterer commented on SPARK-20958:
------------------------------------------

For those not well versed in sbt shading, the following would pin your project at parquet-avro 1.8.1.

In your project/plugins.sbt:
{code}
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
{code}

In your build.sbt:
{code}
// Depend on parquet-avro 1.8.1 without pulling in its transitive dependencies.
libraryDependencies += ("org.apache.parquet" % "parquet-avro" % "1.8.1").intransitive()

// Relocate the parquet-avro classes, both in that library and in the project's own classes.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.parquet.avro.**" -> "shade.parquet.avro.@1")
    .inLibrary("org.apache.parquet" % "parquet-avro" % "1.8.1")
    .inProject
)
{code}

> Roll back parquet-mr 1.8.2 to parquet-1.8.1
> -------------------------------------------
>
>                 Key: SPARK-20958
>                 URL: https://issues.apache.org/jira/browse/SPARK-20958
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>              Labels: release-notes, release_notes, releasenotes
>
> We recently realized that parquet-mr 1.8.2 used by Spark 2.2.0-rc2 depends on
> avro 1.8.1, which is incompatible with avro 1.7.6 used by parquet-mr 1.8.1
> and avro 1.7.7 used by spark-core 2.2.0-rc2.
> Basically, Spark 2.2.0-rc2 introduced two incompatible versions of avro
> (1.7.7 and 1.8.1). Upgrading avro 1.7.7 to 1.8.1 is not preferable due to the
> reasons mentioned in [PR
> #17163|https://github.com/apache/spark/pull/17163#issuecomment-286563131].
> Therefore, we don't really have many choices here and have to roll back
> parquet-mr 1.8.2 to 1.8.1 to resolve this dependency conflict.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158924#comment-16158924 ]

Ryan Blue commented on SPARK-20958:
-----------------------------------

[~spiricalsalsaz], you only need to pin parquet-avro, not the other Parquet libs. This is caused by a bug in Parquet that has been fixed in 1.8.2, so you want the 1.8.2 version of parquet-hadoop, but the 1.8.1 version of parquet-avro.

Alternatively, you can shade and relocate the version of Avro you want and use parquet-avro 1.8.2. That's what I'd recommend.
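
The shade-and-relocate alternative described above might look roughly like the following in an sbt build. This is only a sketch under assumptions that are not stated in the thread: the sbt-assembly plugin is already enabled, Spark itself is a "provided" dependency and so stays out of the assembly jar, and the target package name shaded.org.apache.avro is an arbitrary choice made for illustration.

```scala
// build.sbt: a sketch, not a tested build.
// Assumes sbt-assembly is enabled in project/plugins.sbt and that Spark
// is declared as a "provided" dependency elsewhere in this build.

libraryDependencies ++= Seq(
  // parquet-avro 1.8.2 expects Avro 1.8.1 (see the issue description).
  "org.apache.parquet" % "parquet-avro" % "1.8.2",
  "org.apache.avro"    % "avro"         % "1.8.1"
)

// Relocate Avro inside the assembly jar so that it cannot clash with the
// Avro 1.7.x that Spark puts on the classpath. The target package name
// "shaded.org.apache.avro" is a hypothetical choice.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.avro.**" -> "shaded.org.apache.avro.@1").inAll
)
```

With .inAll the rename is applied to every jar in the assembly as well as to the project's own classes, so references to Avro from parquet-avro and from application code should end up pointing at the relocated copy.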
[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155387#comment-16155387 ]

Anthony Dotterer commented on SPARK-20958:
------------------------------------------

As a user of Spark 2.2.0 that mixes usage of parquet-avro and avro, here are some exceptions that I had. This will hopefully make search engines find this library conflict more quickly for others.

{code}
java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
  at org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:144)
  at org.apache.parquet.avro.AvroParquetWriter.access$100(AvroParquetWriter.java:35)
  at org.apache.parquet.avro.AvroParquetWriter$Builder.getWriteSupport(AvroParquetWriter.java:173)
  ...
Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
{code}
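
To confirm that an exception like the one above comes from an older Avro on the classpath, a small check along the following lines may help. It is a sketch only: the object name AvroClasspathCheck and the helper locate are made up for illustration, and the check reports on whichever classloader it runs under, so it would need to run inside the same application or job that hits the error.

```scala
object AvroClasspathCheck {
  /** Returns where a class was loaded from, or None if it cannot be loaded. */
  def locate(className: String): Option[String] =
    try {
      val cls = Class.forName(className)
      // Any of these can be null (for example for JDK classes), hence Option.
      val location = for {
        domain <- Option(cls.getProtectionDomain)
        source <- Option(domain.getCodeSource)
        url    <- Option(source.getLocation)
      } yield url.toString
      Some(location.getOrElse("<unknown location>"))
    } catch {
      case _: ClassNotFoundException | _: LinkageError => None
    }

  def main(args: Array[String]): Unit =
    // Schema exists in both Avro 1.7.x and 1.8.x; LogicalType only in 1.8.x.
    Seq("org.apache.avro.Schema", "org.apache.avro.LogicalType").foreach { name =>
      println(s"$name -> ${locate(name).getOrElse("NOT FOUND")}")
    }
}
```

If org.apache.avro.Schema resolves to an Avro 1.7.x jar while org.apache.avro.LogicalType is reported as not found, the application is most likely seeing the Avro version that Spark ships rather than the one parquet-avro 1.8.2 needs.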
[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043478#comment-16043478 ]

Cheng Lian commented on SPARK-20958:
------------------------------------

[~marmbrus], here is the draft release note entry:
{quote}
SPARK-20958: For users who use parquet-avro together with Spark 2.2, please use parquet-avro 1.8.1 instead of parquet-avro 1.8.2. This is because parquet-avro 1.8.2 upgrades avro from 1.7.6 to 1.8.1, which is backward incompatible with 1.7.6.
{quote}
[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035149#comment-16035149 ]

Cheng Lian commented on SPARK-20958:
------------------------------------

Thanks [~rdblue]! I'm also reluctant to roll it back considering those fixes we wanted so badly... We decided to give this a try because, from the perspective of release management, we'd like to avoid cutting a release with known conflicting dependencies, even transitive ones. For a Spark 2.2 user, it's quite natural to choose parquet-avro 1.8.2, which is part of parquet-mr 1.8.2, which in turn, is a direct dependency of Spark 2.2.0.

However, due to PARQUET-389, rolling back is already not an option. Two options I can see here are:

# Release Spark 2.2.0 as is with a statement in the release notes saying that users should use parquet-avro 1.8.1 instead of 1.8.2 to avoid the Avro compatibility issue.
# Wait for parquet-mr 1.8.3, which hopefully resolves this dependency issue (e.g., by reverting PARQUET-358).
[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034961#comment-16034961 ]

Dongjoon Hyun commented on SPARK-20958:
---------------------------------------

+1 for [~rdblue].
[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034950#comment-16034950 ]

Ryan Blue commented on SPARK-20958:
-----------------------------------

I don't think it is a good idea to roll back. Spark doesn't depend on parquet-avro, where the update to Avro 1.8.1 was made, except for tests where it is fine. The backports for Spark in 1.8.2 are worth keeping since there are reasonable work-arounds in user projects.

The problem that I've seen on the dev list is when users add parquet-avro to their dependencies and the version gets managed to 1.8.2. That will require Avro 1.8.1 because parquet-avro calls {{getSchema}} on avro-specific objects. But there are a couple reasonable ways to deal with this:

1. Specify a dependency on parquet-avro 1.8.1 that still uses Avro 1.7.x. Parquet is backward-compatible with older binaries, so parquet-avro 1.8.1 works fine with parquet-hadoop 1.8.2. (This is the recommended work-around.)
2. Shade and relocate Avro 1.8.1 in application Jars, so that Spark can use 1.7.x and parquet-avro can use 1.8.1.

This was brought up on the dev list, but the user dismissed these work-arounds without trying them.

Long-term, we can do a 1.8.3 release to solve this problem, though I think the best solution there would be to stop using {{getSchema}} instead of downgrading the dependency.
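
For sbt users, the first work-around above (depending on parquet-avro 1.8.1 while leaving the other Parquet modules at 1.8.2) could be sketched as below. This is an assumption-laden illustration rather than a tested build: it relies on sbt's standard dependencyOverrides setting to keep the resolved version at 1.8.1 even if another dependency requests 1.8.2.

```scala
// build.sbt: a sketch, not a tested build.

// Ask for parquet-avro 1.8.1, which still uses Avro 1.7.x.
libraryDependencies += "org.apache.parquet" % "parquet-avro" % "1.8.1"

// Force the resolved version to stay at 1.8.1 even if some other
// dependency pulls in parquet-avro 1.8.2 transitively.
dependencyOverrides += "org.apache.parquet" % "parquet-avro" % "1.8.1"
```

Only parquet-avro is overridden here; parquet-hadoop and the other Parquet artifacts are left at whatever version Spark brings in.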
[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034310#comment-16034310 ]

Cheng Lian commented on SPARK-20958:
------------------------------------

[~rdblue] I think the root cause here is that we cherry-picked parquet-mr [PR #318|https://github.com/apache/parquet-mr/pull/318] into parquet-mr 1.8.2, which introduced this avro upgrade. I tried to roll parquet-mr back to 1.8.1, but that doesn't work well because it brings back [PARQUET-389|https://issues.apache.org/jira/browse/PARQUET-389] and breaks some test cases involving schema evolution.

Would it be possible to have a parquet-mr 1.8.3 or 1.8.2.1 release with [PR #318|https://github.com/apache/parquet-mr/pull/318] reverted? I think cherry-picking that PR is also problematic for parquet-mr because it introduces a backward-incompatible dependency change in a maintenance release.
[jira] [Commented] (SPARK-20958) Roll back parquet-mr 1.8.2 to parquet-1.8.1
[ https://issues.apache.org/jira/browse/SPARK-20958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033987#comment-16033987 ]

Apache Spark commented on SPARK-20958:
--------------------------------------

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/18181