Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

2014-03-25 Thread Andrew Lee
Hi All,
I'm getting the following error when I execute start-master.sh, which also 
invokes spark-class at the end.

Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/
You need to build Spark with 'sbt/sbt assembly' before running this program.
After digging into the code, I see the CLASSPATH check is hardcoded to look for 
spark-assembly.*hadoop.*.jar. In bin/spark-class:
if [ ! -f "$FWDIR/RELEASE" ]; then
  # Exit if the user hasn't compiled Spark
  num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar" | wc -l)
  jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
  if [ "$num_jars" -eq "0" ]; then
    echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
    echo "You need to build Spark with 'sbt/sbt assembly' before running this program." >&2
    exit 1
  fi
  if [ "$num_jars" -gt "1" ]; then
    echo "Found multiple Spark assembly jars in $FWDIR/assembly/target/scala-$SCALA_VERSION:" >&2
    echo "$jars_list"
    echo "Please remove all but one jar."
    exit 1
  fi
fi
Is there any reason why this is only grabbing spark-assembly.*hadoop.*.jar? I 
am trying to run Spark linked against my own version of Hadoop under 
/opt/hadoop23/, and I use 'sbt/sbt clean package' to build the package without 
the Hadoop jar. What is the correct way to link to my own Hadoop jar?
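For reference, the check above expects exactly one jar whose name embeds a 
Hadoop version, something like the following (illustrative name only, the exact 
version string depends on how the assembly was built):

  assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar

'sbt/sbt clean package' never drops anything matching that 
spark-assembly.*hadoop.*.jar pattern into the directory, hence the error.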

RE: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

2014-03-25 Thread Andrew Lee
Hi Paul,
I got it sorted out.
The problem is that the dependency JARs get built into the assembly JAR when you run:
sbt/sbt clean assembly
What I did instead was:
sbt/sbt clean package
This only gives you the small per-module JARs. The next step is to update the 
CLASSPATH in the bin/compute-classpath.sh script manually, appending all the 
JARs.
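Roughly along these lines, added near the end of bin/compute-classpath.sh 
before it echoes CLASSPATH back to spark-class (the paths assume a stock 
Hadoop 2.x layout under /opt/hadoop23; adjust the globs to wherever your jars 
and conf actually live):

  # append my own Hadoop conf dir and jars instead of the ones baked into an assembly
  CLASSPATH="$CLASSPATH:/opt/hadoop23/etc/hadoop"
  for jar in /opt/hadoop23/share/hadoop/common/*.jar \
             /opt/hadoop23/share/hadoop/common/lib/*.jar \
             /opt/hadoop23/share/hadoop/hdfs/*.jar \
             /opt/hadoop23/share/hadoop/yarn/*.jar \
             /opt/hadoop23/share/hadoop/mapreduce/*.jar; do
    CLASSPATH="$CLASSPATH:$jar"
  done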
With:
sbt/sbt assembly
we can't introduce our own Hadoop patch, since it always pulls Hadoop from the 
Maven repo, unless we hijack the repository path or do a 'mvn install' locally. 
That is more of a hack, I think.
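If you do go the 'mvn install' route, the rough shape is something like this 
(the 2.3.0 version number is only an example, and Spark's sbt build may need 
the local Maven repository added as a resolver in project/SparkBuild.scala 
before it picks the patched artifacts up):

  # publish the patched Hadoop artifacts to the local ~/.m2/repository
  cd /path/to/patched-hadoop && mvn install -DskipTests
  # then build the Spark assembly against that Hadoop version
  cd /path/to/spark
  SPARK_HADOOP_VERSION=2.3.0 sbt/sbt clean assembly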


Date: Tue, 25 Mar 2014 15:23:08 -0700
Subject: Re: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar 
files?
From: paulmscho...@gmail.com
To: user@spark.apache.org

Andrew, 
I ran into the same problem and eventually settled on just running the jars 
directly with java. Since we use sbt to build our jars, we have all the 
dependencies built into the jar itself, so there is no need for random classpaths.
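(For what it's worth, that setup looks roughly like this; the jar and class 
names below are made up:)

  # build an application fat jar with sbt-assembly so every dependency rides along inside it
  sbt assembly
  # then launch the driver class straight from java, no extra classpath wrangling needed
  java -cp target/scala-2.10/myapp-assembly-1.0.jar com.example.MyDriver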

