Hi, Gary,

Thank you very much

This afternoon I tried to compile Spark against my customized Hadoop, and it 
finally works.

For anyone who runs into the same problem, here are the steps:

1. Add the following line to project/SparkBuild.scala:

resolvers ++= Seq("Local Hadoop Repo" at "file:///Users/nanzhu/.m2/repository"),
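(The file:// URL must point at the same local repository the jars are installed 
into in step 2. For context, a sketch of how the line sits among the existing 
resolvers in project/SparkBuild.scala; the surrounding entries are illustrative:)

resolvers ++= Seq(
  // ... the resolvers that are already there stay as they are ...
  // new: local repo holding the customized hadoop jars (path is machine-specific)
  "Local Hadoop Repo" at "file:///Users/nanzhu/.m2/repository"
),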

2. Install your customized jars into the local Maven repo:

mvn install:install-file 
-Dfile=/Users/nanzhu/code/hadoop-1.2.1/build/hadoop-client-1.2.2-SNAPSHOT.jar 
-DgroupId=org.apache.hadoop -DartifactId=hadoop-client -Dversion=1.2.2-SNAPSHOT 
-Dpackaging=jar -DgeneratePom=true

mvn install:install-file 
-Dfile=/Users/nanzhu/code/hadoop-1.2.1/build/hadoop-core-1.2.2-SNAPSHOT.jar 
-DgroupId=org.apache.hadoop -DartifactId=hadoop-core -Dversion=1.2.2-SNAPSHOT 
-Dpackaging=jar -DgeneratePom=true
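A quick sanity check that the jars landed where sbt will look (assuming the 
default local repo location):

ls ~/.m2/repository/org/apache/hadoop/hadoop-core/1.2.2-SNAPSHOT/

should list hadoop-core-1.2.2-SNAPSHOT.jar together with the generated pom, and 
likewise for hadoop-client.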

3. Set SPARK_HADOOP_VERSION to 1.2.2-SNAPSHOT.

4. Add an explicit dependency on hadoop-core.

search org.apache.hadoop" % "hadoop-client" % hadoopVersion 
excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib) in 
project/SparkBuild.scala

add "org.apache.hadoop" % "hadoop-core" % hadoopVersion 
excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib), below it
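So the relevant part of the dependency list ends up roughly like the following 
(the libraryDependencies wrapper and the other entries are illustrative; the 
exclude* helpers are already defined in SparkBuild.scala):

libraryDependencies ++= Seq(
  // ... other dependencies unchanged ...
  "org.apache.hadoop" % "hadoop-client" % hadoopVersion
    excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
  // new: explicit dependency on the customized hadoop-core
  "org.apache.hadoop" % "hadoop-core" % hadoopVersion
    excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib)
)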

5. Compile Spark.
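Steps 3 and 5 together amount to something like the following from the Spark 
root directory (sbt/sbt assembly is the invocation the 0.8-era docs suggest; 
adjust to however you usually build):

SPARK_HADOOP_VERSION=1.2.2-SNAPSHOT sbt/sbt assembly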

Notes:

a. In step 1, I don't know why resolvers ++= Seq(Resolver.file("Local Maven Repo", 
file(Path.userHome + "/.m2/repository"))) cannot resolve my directory, so I had 
to manually add resolvers ++= Seq("Local Hadoop Repo" at 
"file:///Users/nanzhu/.m2/repository"). It is still weird that Seq("Local 
Hadoop Repo", file("Users/nanzhu/.m2/repository")) doesn't work...
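For reference, the failing and the working form side by side:

// could not resolve the locally installed artifacts for me:
resolvers ++= Seq(Resolver.file("Local Maven Repo", file(Path.userHome + "/.m2/repository"))),

// works:
resolvers ++= Seq("Local Hadoop Repo" at "file:///Users/nanzhu/.m2/repository"),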

b. In step 4, the hadoop-client dependency does not pull in hadoop-core 
automatically (why?), so I have to add an explicit dependency on hadoop-core.
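(A possible cause, just my guess: install:install-file with -DgeneratePom=true 
writes a minimal pom containing only the coordinates and no <dependencies> 
section, so the locally installed hadoop-client declares no dependency on 
hadoop-core and nothing is pulled in transitively. Installing with the real pom 
from the Hadoop build instead, roughly

mvn install:install-file -Dfile=... -DpomFile=<path to the hadoop-client pom> ...

should restore the transitive link, though I have not tried it.)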


Best,  

--  
Nan Zhu


On Monday, December 16, 2013 at 2:41 PM, Gary Malouf wrote:

> Check out the dependencies for the version of hadoop-client you are using - I 
> think you will find that hadoop-core is present there.
>  
>  
>  
>  
> On Mon, Dec 16, 2013 at 1:28 PM, Nan Zhu <[email protected]> wrote:
> > Hi, Gary,   
> >  
> > The page says Spark uses hadoop-client.jar to interact with HDFS, but why 
> > does it also download hadoop-core?
> >  
> > Do I just need to point the hadoop-client dependency at my local repo?  
> >  
> > Best,  
> >  
> > --  
> > Nan Zhu
> > School of Computer Science,
> > McGill University
> >  
> >  
> >  
> >  
> > On Monday, December 16, 2013 at 9:05 AM, Gary Malouf wrote:
> >  
> > > Hi Nan, check out the 'Note about Hadoop Versions' on 
> > > http://spark.incubator.apache.org/docs/latest/
> > >  
> > > Let us know if this does not solve your problem.  
> > >  
> > > Gary
> > >  
> > >  
> > > On Mon, Dec 16, 2013 at 8:19 AM, Nan Zhu <[email protected]> wrote:
> > > > Hi, Azuryy  
> > > >  
> > > > Thank you for the reply
> > > >  
> > > > So you compiled Spark with mvn?
> > > >  
> > > > I'm looking at the pom.xml; I think it does the same work as 
> > > > SparkBuild.scala.  
> > > >  
> > > > I'm still confused, though: some Spark classes use Hadoop classes like 
> > > > InputFormat, which I assume are included in hadoop-core.jar, but I 
> > > > didn't find any line specifying hadoop-core-1.0.4.jar in pom.xml or 
> > > > SparkBuild.scala.  
> > > >  
> > > > Can you explain a bit to me?
> > > >  
> > > > Best,  
> > > >  
> > > > --  
> > > > Nan Zhu
> > > > School of Computer Science,
> > > > McGill University
> > > >  
> > > >  
> > > >  
> > > > On Monday, December 16, 2013 at 3:58 AM, Azuryy Yu wrote:
> > > >  
> > > > > Hi Nan,
> > > > > I am also using a customized Hadoop, so you need to modify the 
> > > > > pom.xml, but before this change you should install your customized 
> > > > > hadoop-* jars into the local Maven repo.
> > > > >  
> > > > >  
> > > > >  
> > > > >  
> > > > > On Sun, Dec 15, 2013 at 2:45 AM, Nan Zhu <[email protected]> wrote:
> > > > > > Hi, all  
> > > > > >  
> > > > > > I'm trying to compile Spark with a customized version of Hadoop, 
> > > > > > where I modify the implementation of DFSInputStream.  
> > > > > >  
> > > > > > I would like to modify SparkBuild.scala so that Spark compiles with 
> > > > > > my hadoop-core-xxx.jar instead of downloading the original one.  
> > > > > >  
> > > > > > I only found hadoop-client-xxx.jar and some lines about YARN jars 
> > > > > > in SparkBuild.scala,  
> > > > > >  
> > > > > > Can you tell me which line I should modify to achieve the goal?
> > > > > >  
> > > > > > Best,  
> > > > > >  
> > > > > > --  
> > > > > > Nan Zhu
> > > > > > School of Computer Science,
> > > > > > McGill University
> > > > > >  
> > > > > >  
> > > > >  
> > > >  
> > >  
> >  
>  
