From the caller / application perspective, you don't care which version
of Hadoop Spark is running against on the cluster. The Spark API you
compile against is the same either way. When you spark-submit the app,
Spark uses the Hadoop libraries from the cluster at runtime, and those
are the right version.

So when you build your app, you mark Spark as a 'provided' dependency.
In general, then, no: you do not build Spark yourself if you are
writing a Spark app.

(Of course, your app does care if it also uses Hadoop libraries
directly. In that case, depend on hadoop-client at the version that
matches your cluster, but still mark it as provided.)
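For example, here is a minimal sbt sketch (not your actual build; the
artifact coordinates and versions are placeholders to adjust to
whatever your cluster runs):

  // build.sbt (sketch) -- Spark and Hadoop come from the cluster at
  // runtime, so both are compile-time-only "provided" dependencies
  libraryDependencies ++= Seq(
    "org.apache.spark"  %% "spark-core"    % "1.1.0"          % "provided",
    // only if your code calls Hadoop APIs directly; match the cluster,
    // e.g. a CDH4 cluster would use a *-cdh4.* version
    "org.apache.hadoop" %  "hadoop-client" % "2.0.0-cdh4.2.0" % "provided"
  )

The same idea in Maven is scope 'provided' on those artifacts.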

The Hadoop version that Spark is built against only matters when you
deploy Spark's own artifacts to the cluster to set it up.

Your error suggests there is still a version mismatch. Either you
deployed a build that is not compatible with your cluster, or you are
packaging an incompatible version of Spark with your app and it is
interfering.

For example, the Spark artifacts you get via Maven depend on Hadoop
1.0.4. I suspect that's what's happening: you're packaging Spark (and
with it Hadoop 1.0.4) into your app, when Spark shouldn't be packaged
at all.

Spark works out of the box with just about any modern combo of HDFS and YARN.
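FWIW, the read code itself doesn't change with the Hadoop version.
Something along these lines (a Scala sketch with a made-up path) works
against a modern cluster as long as no old Hadoop client is bundled in
your jar:

  import org.apache.hadoop.io.{BytesWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
  import org.apache.spark.{SparkConf, SparkContext}

  // the Hadoop client actually used here is whatever the cluster
  // provides at runtime; nothing in this code pins a Hadoop version
  val sc = new SparkContext(new SparkConf().setAppName("seqfile-read"))
  val rdd = sc.newAPIHadoopFile(
    "hdfs:///path/to/sequencefile",   // placeholder URI
    classOf[SequenceFileInputFormat[Text, BytesWritable]],
    classOf[Text],
    classOf[BytesWritable],
    sc.hadoopConfiguration)
  println(rdd.count())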

On Tue, Sep 16, 2014 at 2:28 AM, Paul Wais <pw...@yelp.com> wrote:
> Dear List,
>
> I'm having trouble getting Spark 1.1 to use the Hadoop 2 API for
> reading SequenceFiles.  In particular, I'm seeing:
>
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException:
> Server IPC version 7 cannot communicate with client version 4
>         at org.apache.hadoop.ipc.Client.call(Client.java:1070)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at com.sun.proxy.$Proxy7.getProtocolVersion(Unknown Source)
>         ...
>
> When invoking JavaSparkContext#newAPIHadoopFile().  (With args
> validSequenceFileURI, SequenceFileInputFormat.class, Text.class,
> BytesWritable.class, new Job().getConfiguration() -- Pretty close to
> the unit test here:
> https://github.com/apache/spark/blob/f0f1ba09b195f23f0c89af6fa040c9e01dfa8951/core/src/test/java/org/apache/spark/JavaAPISuite.java#L916
> )
>
>
> This error indicates to me that Spark is using an old hadoop client to
> do reads.  Oddly I'm able to do /writes/ ok, i.e. I'm able to write
> via JavaPairRdd#saveAsNewAPIHadoopFile() to my hdfs cluster.
>
>
> Do I need to explicitly build spark for modern hadoop??  I previously
> had an hdfs cluster running hadoop 2.3.0 and I was getting a similar
> error (server is using version 9, client is using version 4).
>
>
> I'm using Spark 1.1 cdh4 as well as hadoop cdh4 from the links posted
> on spark's site:
>  * http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-cdh4.tgz
>  * http://d3kbcqa49mib13.cloudfront.net/hadoop-2.0.0-cdh4.2.0.tar.gz
>
>
> What distro of hadoop is used at Databricks?  Are there distros of
> Spark 1.1 and hadoop that should work together out-of-the-box?
> (Previously I had Spark 1.0.0 and Hadoop 2.3 working fine..)
>
> Thanks for any help anybody can give me here!
> -Paul
>
