marcelo,
i was not aware of those fixes. it's a full-time job to keep up with spark...
i will take another look. it would be great if that works on spark
standalone also and resolves the issues i experienced before.

about putting stuff on the classpath before spark or yarn... yeah, you can
shoot yourself in the foot with it, but since the container is isolated it
should be ok, no? we have been using HADOOP_USER_CLASSPATH_FIRST forever
with great success.
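a toy python model of the two approaches (not spark or hadoop code; the jar contents, class names and versions are made up). a flat classpath is searched in order, so whatever comes first wins for every class in that loader, spark's included. a child-first loader, like the userClassPathFirst option, only changes what the user's code sees:

```python
from typing import Dict, List, Optional

# Toy model only: a "jar" is a dict from class name to the library version
# that jar ships. All names and versions below are hypothetical.
Jar = Dict[str, str]


def find_on_classpath(name: str, classpath: List[Jar]) -> Optional[str]:
    """Search a flat classpath in order; the first jar containing `name` wins."""
    for jar in classpath:
        if name in jar:
            return jar[name]
    return None


def parent_first(name: str, parent_cp: List[Jar], child_cp: List[Jar]) -> Optional[str]:
    """Default JVM-style delegation: the parent (Spark/Hadoop) loader is asked first."""
    found = find_on_classpath(name, parent_cp)
    return found if found is not None else find_on_classpath(name, child_cp)


def child_first(name: str, parent_cp: List[Jar], child_cp: List[Jar]) -> Optional[str]:
    """userClassPathFirst-style delegation: the user's jars are asked first,
    falling back to the parent only when the class is not found there."""
    found = find_on_classpath(name, child_cp)
    return found if found is not None else find_on_classpath(name, parent_cp)


SPARK_HADOOP_JARS: List[Jar] = [{"com.example.Lib": "1.0"}]  # hypothetical bundled copy
USER_JARS: List[Jar] = [{"com.example.Lib": "2.0", "com.example.App": "app"}]

if __name__ == "__main__":
    lib = "com.example.Lib"
    # one system classpath, user jars appended last: everyone gets 1.0
    print(find_on_classpath(lib, SPARK_HADOOP_JARS + USER_JARS))  # → 1.0
    # user jars first on the system classpath (the yarn setting, or
    # HADOOP_USER_CLASSPATH_FIRST): spark/hadoop code also gets 2.0
    print(find_on_classpath(lib, USER_JARS + SPARK_HADOOP_JARS))  # → 2.0
    # separate child-first loader: user code gets 2.0 ...
    print(child_first(lib, SPARK_HADOOP_JARS, USER_JARS))  # → 2.0
    # ... while spark/hadoop code, loaded by the parent, still sees 1.0
    print(find_on_classpath(lib, SPARK_HADOOP_JARS))  # → 1.0
```

the second case is the foot-gun: spark itself now runs against 2.0. a real child-first JVM loader also has to keep delegating core classes like java.* to the parent, which this model ignores.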

with the ability to put our own classes first, plus support for security,
yarn now seems more attractive to me than standalone for many
applications/situations. never thought i would say that.

best

On Wed, Feb 4, 2015 at 4:01 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> Hi Koert,
>
> On Wed, Feb 4, 2015 at 11:35 AM, Koert Kuipers <ko...@tresata.com> wrote:
> > do i understand it correctly that on yarn the customer jars are truly
> > placed before the yarn and spark jars on the classpath? meaning at
> > container construction time, on the same classloader? that would be
> > great news for me. it would open up the possibility of using newer
> > versions of many libraries.
>
> That's correct, the Yarn setting places the user's jars in the system
> classpath before Spark/Hadoop jars, so they can override classes
> needed by Spark/Hadoop.
>
> That's the main reason why it's not documented and not suggested
> unless there's no other workaround. Because you're potentially
> overriding classes that might break Spark, Hadoop or something else
> that's packaged with those. But if it works for your case, that's
> great.
>
> As for the "userClassPathFirst" thing, I've made some changes to the
> class loaders as part of implementing that option for Yarn [1], and
> someone also made similar changes in isolation [2]. So maybe the
> issues you were running into are fixed by either of those? In the
> future, it would be great to be able to declare that feature stable,
> since I believe it's a better alternative to overriding libraries that
> Spark or Hadoop depend on.
>
> [1] https://github.com/apache/spark/pull/3233
> [2] https://github.com/apache/spark/pull/3725
>
> --
> Marcelo
>
