GitHub user skonto commented on the issue:
https://github.com/apache/spark/pull/18587
@vanzin
> args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
My understanding:
Files intended for the application master are copied to HDFS:
https://github.com/apache/spark/blob/1cad31f00644d899d8e74d58c6eb4e9f72065473/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L365
On the executor side, the ClientDistributedCacheManager makes the jars
available through the Hadoop distributed cache:
https://github.com/apache/spark/blob/ab9872db1f9c0f289541ec5756d1a142d85545ce/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala
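To make the quoted line concrete, here is a rough sketch of what I understand the merge to do: it joins the comma-separated `--jars` value with the paths resolved from the Maven coordinates and skips unset or blank inputs. This is an approximation written for illustration, not the code in SparkSubmit; the object name and the sample paths are hypothetical.

```scala
object MergeFileListsSketch {
  // Approximation of the helper in the quoted line; not copied from Spark.
  // Joins several comma-separated lists into one, ignoring null/blank inputs.
  def mergeFileLists(lists: String*): String = {
    val merged = lists
      .filter(l => l != null && l.trim.nonEmpty) // skip unset or blank lists
      .flatMap(_.split(","))                     // break each list into entries
      .map(_.trim)
      .filter(_.nonEmpty)                        // drop empty entries such as "a,,b"
      .mkString(",")
    // Assumption: null stands for "no jars", the same way an unset args.jars would.
    if (merged.isEmpty) null else merged
  }

  def main(args: Array[String]): Unit = {
    val userJars = "hdfs:///libs/a.jar,local.jar"        // hypothetical --jars value
    val resolved = "/home/user/.ivy2/jars/org_x-1.0.jar" // hypothetical resolved path
    println(mergeFileLists(userJars, resolved))
    println(mergeFileLists(null, resolved))
  }
}
```

The point for this discussion is that after the merge the resolved paths are indistinguishable from user-supplied jars, so whatever distributes `args.jars` afterwards also has to handle them.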
> As for the change, it seems to work because the Mesos backend starts the
> driver using spark-submit, right?
That is the idea. When spark-submit is called a second time to launch the
driver, it runs in client mode, which is the default. The call that launches
the driver in client mode is here:
https://github.com/apache/spark/blob/8da3f7041aafa71d7596b531625edb899970fec2/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L411-L442
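As a simplified illustration of that second submit, the sketch below builds a spark-submit command line for the driver. It is not the code in MesosClusterScheduler: the object, the function and its parameters are hypothetical, and forwarding the coordinates with `--packages` is my assumption about how they would reach the driver. What it shows is that no `--deploy-mode` is passed, so the second submit falls back to client mode and the Maven coordinates are resolved on the machine that runs the driver.

```scala
object DriverSubmitSketch {
  // Hypothetical helper, for illustration only.
  def driverCommand(
      sparkHome: String,
      master: String,
      mainClass: String,
      primaryResource: String,
      packages: Option[String]): Seq[String] = {
    val base = Seq(
      s"$sparkHome/bin/spark-submit",
      "--master", master,
      "--class", mainClass)
    // No --deploy-mode here: spark-submit then defaults to client mode.
    // Assumption: coordinates are forwarded unresolved, so the driver host resolves them.
    val pkgs = packages.toSeq.flatMap(p => Seq("--packages", p))
    (base ++ pkgs) :+ primaryResource
  }

  def main(args: Array[String]): Unit = {
    val cmd = driverCommand(
      "/opt/spark",                      // hypothetical install path
      "mesos://master:5050",             // hypothetical master URL
      "com.example.Main",                // hypothetical main class
      "http://repo.example.com/app.jar", // hypothetical application jar
      Some("org.example:lib:1.0"))       // hypothetical Maven coordinate
    println(cmd.mkString(" "))
  }
}
```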
> If that's the case it seems fine, although it kinda loses the ability to
> use the ivy cache on the machine launching the job...
That is correct. If you launch jobs from a machine within the same network,
the cache makes sense; otherwise the jars are just being copied over the
internet. A cache also matters on restarts, when whatever is re-launched
should be able to fetch its dependencies from it. From what I see, Spark has
no functionality that lets the Mesos code exploit any type of cache. I would
prefer a unified cluster layer for certain things, as other frameworks have:
https://issues.apache.org/jira/browse/FLINK-6177
I guess some refactoring would make the YARN code accessible from the Mesos
part of Spark. Another option is the Mesos fetcher cache mechanism, but I am
not sure it is distributed; I think it is only local per agent. I need to
discuss it further with @susanxhuynh @ArtRand