Hi: I have a 2-node Spark cluster that I built with Hadoop 2.2.0 compatibility, and I also have HDFS running on both machines. Everything works great; I can read files from HDFS through the spark-shell. My question is about what is required to connect to this cluster from a machine outside it. So, if I am writing a Java + Maven application:
a) Can I run the make-distribution.sh script bundled in the Spark directory and give the produced JAR to someone else to include as a Maven dependency, and then have them point to the Spark master in my cluster when they want to run their jobs?

b) Or do I need to connect in a different way, perhaps without that JAR? Has anyone successfully used Java + Maven to connect to a remote cluster?

I get the following error; it might just be an issue with how I set things up in Maven, because this class is included in the JAR file produced by make-distribution.sh:

java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/Function

I tried looking at http://spark.incubator.apache.org/docs/latest/quick-start.html, but if I follow the sample pom.xml listed there, I get the following error:

java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AppendRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;

Thanks,
Gui
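P.S. In case it helps, this is roughly the kind of driver I am trying to run from the outside machine; it is only a sketch, and the master URL, HDFS path, Spark home, and application jar path are placeholders for my actual values:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RemoteCountExample {
    public static void main(String[] args) {
        // Placeholder values: substitute the real Spark master, Spark home, and app jar.
        JavaSparkContext sc = new JavaSparkContext(
                "spark://spark-master-host:7077",          // Spark master URL (placeholder)
                "RemoteCountExample",                       // application name
                "/path/to/spark",                           // Spark home on the cluster (placeholder)
                new String[] { "target/my-app-1.0.jar" }); // application jar shipped to the workers (placeholder)

        // Read a file from the cluster's HDFS and count its lines.
        JavaRDD<String> lines = sc.textFile("hdfs://namenode-host:9000/some/input.txt");
        System.out.println("Line count: " + lines.count());

        sc.stop();
    }
}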
