Hi, I have a problem getting a fairly simple app working that uses the native avro libraries. The app runs fine on my local machine and in yarn-cluster mode, but when I try to run it on EMR in yarn-client mode I get the error below. I'm aware this is a version problem: EMR ships an earlier version of avro, and I am trying to use avro-1.7.7.
What's confusing me a great deal is that this runs fine in yarn-cluster mode. What is it about yarn-cluster mode that gives the application access to the correct version of the avro library? I need to run in yarn-client mode because I will be caching data on the driver machine between batches; in yarn-cluster mode the driver can run on any machine in the cluster, so that wouldn't work.

Grateful for any advice as I'm really stuck on this. AWS support are trying, but they don't seem to know why this is happening either! Just to note, I'm aware of the Databricks spark-avro project and have used it; this is an investigation to see if I can use RDDs instead of DataFrames.

java.lang.NoSuchMethodError: org.apache.avro.Schema$Parser.parse(Ljava/lang/String;[Ljava/lang/String;)Lorg/apache/avro/Schema;
    at ophan.thrift.event.Event.<clinit>(Event.java:10)
    at SimpleApp$.main(SimpleApp.scala:25)
    at SimpleApp.main(SimpleApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Thanks,
Tom
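For completeness, one workaround I've been experimenting with (untested in my exact setup, and the jar path, app jar, and class name below are placeholders for my actual ones) is forcing my own avro-1.7.7 jar ahead of the EMR-bundled version with Spark's userClassPathFirst settings:

```shell
# Sketch only: prefer the user-supplied avro-1.7.7 jar over the cluster's
# older avro on both the driver and the executors. The jar location and
# application details are placeholders, not my real config.
spark-submit \
  --master yarn-client \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars /home/hadoop/lib/avro-1.7.7.jar \
  --class SimpleApp \
  simple-app.jar
```

In yarn-client mode the driver JVM is already running by the time spark-submit's conf is applied, which is my guess as to why the driver picks up EMR's older avro there but not in yarn-cluster mode; spark.driver.extraClassPath (or SPARK_CLASSPATH) may be needed instead of the driver-side userClassPathFirst flag in that case.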