Re: Submitting Spark Applications using Spark Submit

2015-06-22 Thread Andrew Or
Did you restart your master / workers? On the master node, run
`sbin/stop-all.sh` followed by `sbin/start-all.sh`

2015-06-20 17:59 GMT-07:00 Raghav Shankar raghav0110...@gmail.com:


Re: Submitting Spark Applications using Spark Submit

2015-06-20 Thread Raghav Shankar
Hey Andrew, 

 I tried the following approach: I modified my Spark build on my local machine.
I downloaded the Spark 1.4.0 source code and then made a change to
ResultTask.scala (a simple change to see whether it would work: I added a print
statement). I then built Spark using

mvn -Dhadoop.version=1.0.4 -Phadoop-1 -DskipTests -Dscala-2.10 clean package

Now, the new assembly jar was built. I started my EC2 Cluster using this 
command:

./ec2/spark-ec2 -k key -i ../aggr/key.pem --instance-type=m3.medium 
--zone=us-east-1b -s 9 launch spark-cluster

I initially launched my application jar and it worked fine. After that I scp’d 
the new assembly jar to the spark lib directory of all my ec2 nodes. When I ran 
the jar again I got the following error:

15/06/21 00:42:51 INFO AppClient$ClientActor: Connecting to master 
akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077/user/Master...
15/06/21 00:42:52 WARN Remoting: Tried to associate with unreachable remote 
address [akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077]. Address 
is now gated for 5000 ms, all messages to this address will be delivered to 
dead letters. Reason: Connection refused: 
ec2-XXX.compute-1.amazonaws.com/10.165.103.16:7077
15/06/21 00:42:52 WARN AppClient$ClientActor: Could not connect to 
akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077: 
akka.remote.InvalidAssociation: Invalid address: 
akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077
15/06/21 00:43:11 INFO AppClient$ClientActor: Connecting to master 
akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077/user/Master...
15/06/21 00:43:11 WARN AppClient$ClientActor: Could not connect to 
akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077: 
akka.remote.InvalidAssociation: Invalid address: 
akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077
15/06/21 00:43:11 WARN Remoting: Tried to associate with unreachable remote 
address [akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077]. Address 
is now gated for 5000 ms, all messages to this address will be delivered to 
dead letters. Reason: Connection refused: 
ec2-XXX.compute-1.amazonaws.com/10.165.103.16:7077
15/06/21 00:43:31 INFO AppClient$ClientActor: Connecting to master 
akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077/user/Master...
15/06/21 00:43:31 WARN AppClient$ClientActor: Could not connect to 
akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077: 
akka.remote.InvalidAssociation: Invalid address: 
akka.tcp://sparkmas...@xxx.compute-1.amazonaws.com:7077
15/06/21 00:43:31 WARN Remoting: Tried to associate with unreachable remote 
address [akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077]. Address 
is now gated for 5000 ms, all messages to this address will be delivered to 
dead letters. Reason: Connection refused: 
XXX.compute-1.amazonaws.com/10.165.103.16:7077
15/06/21 00:43:51 ERROR SparkDeploySchedulerBackend: Application has been 
killed. Reason: All masters are unresponsive! Giving up.
15/06/21 00:43:51 WARN SparkDeploySchedulerBackend: Application ID is not 
initialized yet.
15/06/21 00:43:51 INFO SparkUI: Stopped Spark web UI at 
http://XXX.compute-1.amazonaws.com:4040
15/06/21 00:43:51 INFO DAGScheduler: Stopping DAGScheduler
15/06/21 00:43:51 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/06/21 00:43:51 INFO SparkDeploySchedulerBackend: Asking each executor to 
shut down
15/06/21 00:43:51 ERROR OneForOneStrategy: 
java.lang.NullPointerException
at 
org.apache.spark.deploy.client.AppClient$ClientActor$$anonfun$receiveWithLogging$1.applyOrElse(AppClient.scala:160)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at 
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at 
org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.spark.deploy.client.AppClient$ClientActor.aroundReceive(AppClient.scala:61)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 

Re: Submitting Spark Applications using Spark Submit

2015-06-19 Thread Andrew Or
Hi Raghav,

If you want to make changes to Spark and run your application with it, you
may follow these steps.

1. git clone g...@github.com:apache/spark
2. cd spark; build/mvn clean package -DskipTests [...]
3. make local changes
4. build/mvn package -DskipTests [...] (no need to clean again here)
5. bin/spark-submit --master spark://[...] --class your.main.class your.jar

No need to pass in extra --driver-java-options or --driver-extra-classpath
as others have suggested. When using spark-submit, the main jar comes from
assembly/target/scala-2.10, which is prepared through mvn package. You
just have to make sure that you re-package the assembly jar after each
modification.
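
For reference, here is a minimal sketch of the kind of driver class that the
--class flag in step 5 points at (the object name mirrors the SimpleApp example
from the original question; the toy job is only illustrative, there to confirm
that the rebuilt assembly is actually the one being picked up):

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SimpleApp")
    val sc = new SparkContext(conf)
    // Any small action that exercises the code you modified will do;
    // swap in your custom method here to verify the new build is used.
    val top3 = sc.parallelize(1 to 1000).top(3)
    println(top3.mkString(", "))
    sc.stop()
  }
}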

-Andrew

2015-06-18 16:35 GMT-07:00 maxdml max...@cs.duke.edu:





Re: Submitting Spark Applications using Spark Submit

2015-06-19 Thread Andrew Or
Hi Raghav,

I'm assuming you're using standalone mode. When using the Spark EC2 scripts
you need to make sure that every machine has the most up-to-date jars. Once
you have built on one of the nodes, you must *rsync* the Spark directory to
the rest of the nodes (see /root/spark-ec2/copy-dir).

That said, I usually build it locally on my laptop and *scp* the assembly
jar to the cluster instead of building it there. The EC2 machines often
take much longer to build for some reason. Also, it's cumbersome to set up a
proper IDE there.

-Andrew


2015-06-19 19:11 GMT-07:00 Raghav Shankar raghav0110...@gmail.com:



Re: Submitting Spark Applications using Spark Submit

2015-06-19 Thread Raghav Shankar
Thanks Andrew! Is this all I have to do when using the spark-ec2 script to
set up a Spark cluster? It seems to be getting an assembly jar that is not
from my project (perhaps from a Maven repo). Is there a way to make the
spark-ec2 script use the assembly jar that I created?

Thanks,
Raghav

On Friday, June 19, 2015, Andrew Or and...@databricks.com wrote:



Re: Submitting Spark Applications using Spark Submit

2015-06-18 Thread lovelylavs
Hi,

To include the jar files as part of the jar you would like to use, you
should create an uber jar. Please refer to the following:

https://maven.apache.org/plugins/maven-shade-plugin/examples/includes-excludes.html




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-Applications-using-Spark-Submit-tp23352p23395.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Submitting Spark Applications using Spark Submit

2015-06-18 Thread maxdml
You can specify the jars of your application to be included with spark-submit
using the --jars switch.

Otherwise, are you sure that your newly compiled Spark assembly jar is in
assembly/target/scala-2.10/?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-Applications-using-Spark-Submit-tp23352p23400.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Yanbo Liang
If you run Spark on YARN, the simplest way is to replace
$SPARK_HOME/lib/spark-.jar with your own version of the Spark jar file and run
your application.
The spark-submit script will upload this jar to the YARN cluster automatically,
and then you can run your application as usual.
It does not matter which version of Spark is installed on your YARN cluster.

2015-06-17 10:42 GMT+08:00 Raghav Shankar raghav0110...@gmail.com:



Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Raghav Shankar
To clarify, I am using the spark standalone cluster.

On Tuesday, June 16, 2015, Yanbo Liang yblia...@gmail.com wrote:


Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Will Briggs
In general, you should avoid making direct changes to the Spark source code. If 
you are using Scala, you can seamlessly blend your own methods on top of the 
base RDDs using implicit conversions.
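
As a rough sketch of what that could look like, assuming the goal (mentioned
elsewhere in this thread) of a treeReduce-based top(); the method name treeTop
and its implementation here are purely illustrative:

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object RDDExtensions {
  // Bring into scope with `import RDDExtensions._`; every RDD then gains the
  // extra method without any change to Spark's own source.
  implicit class RichRDD[T: ClassTag](rdd: RDD[T]) {
    // Illustrative top-k via treeReduce (assumes a non-empty RDD): each partition
    // keeps its local top `num`, then the partial results are merged in a tree.
    def treeTop(num: Int)(implicit ord: Ordering[T]): Array[T] =
      rdd.mapPartitions { it =>
        Iterator.single(it.toArray.sorted(ord.reverse).take(num))
      }.treeReduce { (a, b) =>
        (a ++ b).sorted(ord.reverse).take(num)
      }
  }
}

With the import in place, rdd.treeTop(10) compiles against a stock Spark
assembly, which sidesteps the custom-jar distribution issues discussed above.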

Regards,
Will

On June 16, 2015, at 7:53 PM, raggy raghav0110...@gmail.com wrote:

I am trying to submit a Spark application using the command line, via the
spark-submit command. I initially set up my Spark application in Eclipse and
have been making changes there. I recently obtained my own version of the Spark
source code and added a new method to RDD.scala. I created a new spark-core jar
using mvn and added it to my Eclipse build path. My application ran perfectly
fine.

Now, I would like to submit it through the command line. I submitted my
application like this:

bin/spark-submit --master local[2] --class SimpleApp
/Users/XXX/Desktop/spark2.jar

The spark-submit command is within the spark project that I modified by
adding new methods.
When I do so, I get this error:

java.lang.NoSuchMethodError:
org.apache.spark.rdd.RDD.treeTop(ILscala/math/Ordering;)Ljava/lang/Object;
at SimpleApp$.main(SimpleApp.scala:12)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

When I use spark-submit, where does the jar come from? How do I make sure it
uses the jars that I have built?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-Applications-using-Spark-Submit-tp23352.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Raghav Shankar
I made the change so that I could implement top() using treeReduce(). A member 
on here suggested I make the change in RDD.scala to accomplish that. Also, this 
is for a research project, and not for commercial use. 

So, any advice on how I can get spark-submit to use my custom-built jars
would be very useful.

Thanks,
Raghav

 On Jun 16, 2015, at 6:57 PM, Will Briggs wrbri...@gmail.com wrote:
 


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Will Briggs
If this is research-only, and you don't want to have to worry about updating 
the jars installed by default on the cluster, you can add your custom Spark jar 
using the spark.driver.extraLibraryPath configuration property when running 
spark-submit, and then use the experimental  spark.driver.userClassPathFirst 
config to force it to use yours.

See here for more details and options: 
https://spark.apache.org/docs/1.4.0/configuration.html
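
If it helps, here is a hedged sketch of how those two properties could be
supplied from application code before the context is created (they can equally
be passed as --conf flags to spark-submit); the object name and paths below are
placeholders, and as the follow-up further down the thread notes,
userClassPathFirst comes with deploy-mode caveats:

import org.apache.spark.{SparkConf, SparkContext}

object CustomBuildApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CustomBuildApp")                                 // illustrative name
      .set("spark.driver.extraLibraryPath", "/path/to/custom/lib")  // placeholder path
      .set("spark.driver.userClassPathFirst", "true")               // experimental in 1.4
    val sc = new SparkContext(conf)
    // ... your job ...
    sc.stop()
  }
}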

On June 16, 2015, at 10:12 PM, Raghav Shankar raghav0110...@gmail.com wrote:




Re: Submitting Spark Applications using Spark Submit

2015-06-16 Thread Raghav Shankar
The documentation says spark.driver.userClassPathFirst can only be used in
cluster mode. Does this mean I have to set the --deploy-mode option for
spark-submit to cluster? Or can I still use the default client? My
understanding is that even in the default deploy mode, Spark still uses the
slave machines I have on EC2.

Also, the spark.driver.extraLibraryPath property mentions that I can provide a
path for special libraries on the spark-submit command line options. Do my jar
files in this path have to have the same name as the jar used by Spark, or is it
intelligent enough to identify that two jars are supposed to be the same thing?
If they are supposed to have the same name, how can I find out the name I should
use for my jar? E.g., if I just name my modified spark-core jar spark.jar, put it
in a lib folder, and provide the path of the folder to spark-submit, would that
be enough to tell Spark to use that spark-core jar instead of the default?

Thanks,
Raghav
