Ryan Skraba created BEAM-7979:
---------------------------------

             Summary: Avro incompatibilities with Spark 2.2 and Spark 2.3
                 Key: BEAM-7979
                 URL: https://issues.apache.org/jira/browse/BEAM-7979
             Project: Beam
          Issue Type: Bug
          Components: io-java-gcp, io-java-parquet, sdk-java-core
            Reporter: Ryan Skraba

Much of the code that depends on Avro (notably the wrappers built with [BeamSQL|https://github.com/apache/beam/blob/ae83448597f64474c3f5754d7b8e3f6b02347a6b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L34] but also [some|https://github.com/apache/beam/blob/ae83448597f64474c3f5754d7b8e3f6b02347a6b/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java] [connectors|https://github.com/apache/beam/blob/ae83448597f64474c3f5754d7b8e3f6b02347a6b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java#L42]) requires Avro 1.8.x or later.

That version of the library is not present in Spark 2.2 and Spark 2.3 clusters, which are meant to be supported. Pipelines that use this code on those clusters will fail with ClassNotFoundException or NoSuchMethodError. Spark 2.4+ should be unaffected.

Relocating or vendoring is probably not appropriate, since Avro is frequently exposed in the API through parameters and potentially in generated specific records.
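
To confirm which Avro a given cluster actually puts on the classpath, a small reflective probe along the following lines might help. This is only a sketch, not part of Beam: the class name AvroClasspathCheck and its methods are hypothetical, and it assumes that org.apache.avro.LogicalType first shipped in Avro 1.8.0, so its absence would suggest an older Avro. It uses reflection only, so it compiles without Avro as a dependency.

```java
import java.security.CodeSource;

public class AvroClasspathCheck {

    // Assumption: this class was introduced in Avro 1.8.0, so it serves
    // as a marker for "Avro 1.8 or later". Verify against your Avro jars.
    static final String AVRO_18_MARKER = "org.apache.avro.LogicalType";
    // Present in every Avro release; used to detect Avro at all.
    static final String AVRO_BASE = "org.apache.avro.Schema";

    /** Returns true if the class can be loaded (without initializing it). */
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false,
                    AvroClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the jar or directory a class was loaded from, if known. */
    static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className, false,
                    AvroClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "unknown (bootstrap)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    /** Summarizes what the probe found on the current classpath. */
    static String describe() {
        if (!isClassPresent(AVRO_BASE)) {
            return "Avro is not on the classpath";
        }
        String where = locationOf(AVRO_BASE);
        if (isClassPresent(AVRO_18_MARKER)) {
            return "Avro 1.8+ marker class found; Avro loaded from " + where;
        }
        return "Avro found but the 1.8+ marker class is missing; loaded from "
                + where;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this inside a Spark job rather than on the driver alone would show which jar wins on the executors, which is where the failures described above would surface.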