[GitHub] spark issue #18587: [SPARK-12559][Mesos] fix --packages for mesos

skonto Wed, 12 Jul 2017 03:14:26 -0700

Github user skonto commented on the issue:

    https://github.com/apache/spark/pull/18587
  
    @vanzin 
    > args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
    
    My understanding:
    Files intended for the application master are handled by copying them to HDFS:
    
https://github.com/apache/spark/blob/1cad31f00644d899d8e74d58c6eb4e9f72065473/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L365
    The distributed cache manager then does the work on the executor side, 
using the Hadoop distributed cache to make the jars available there. 
    
https://github.com/apache/spark/blob/ab9872db1f9c0f289541ec5756d1a142d85545ce/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala
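
    To make the quoted merge step concrete, here is a minimal sketch of 
combining a user-supplied jar list with the jars resolved from --packages. 
The helper below is a simplified stand-in written for this comment, not a copy 
of Spark's code, and the jar paths are made up. It assumes both inputs are 
comma-separated strings.

```scala
// Simplified stand-in for the merge step quoted above (an assumption, not Spark's source).
object MergeFileListsSketch {
  // Join several comma-separated lists into one, skipping null or blank inputs.
  // Returns null when nothing is left, so callers can treat "no jars" as unset.
  def mergeFileLists(lists: String*): String = {
    val merged = lists
      .filter(l => l != null && l.trim.nonEmpty)
      .flatMap(_.split(","))
      .mkString(",")
    if (merged.isEmpty) null else merged
  }

  def main(args: Array[String]): Unit = {
    val userJars = "app-deps.jar"                    // hypothetical --jars value
    val resolved = "/ivy/jars/a.jar,/ivy/jars/b.jar" // hypothetical jars resolved from --packages
    // prints app-deps.jar,/ivy/jars/a.jar,/ivy/jars/b.jar
    println(mergeFileLists(userJars, resolved))
  }
}
```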
    
    > As for the change, it seems to work because the Mesos backend starts the 
driver using spark-submit, right?
    That is the idea. When submit is called a second time to launch the 
driver, it runs in client mode, which is the default. The call that launches 
the driver in client mode is here:
    
https://github.com/apache/spark/blob/8da3f7041aafa71d7596b531625edb899970fec2/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L411-L442
    
    > If that's the case it seems fine, although it kinda loses the ability to 
use the ivy cache on the machine launching the job...
    
    That is correct. If you launch jobs from a machine within the same 
network, the cache makes sense. Otherwise you are just copying over the 
internet. On restarts it is also important that a re-launched job can read 
from the cache. From what I see, Spark has no functionality that lets the 
Mesos code exploit any type of cache. I would prefer a unified cluster layer 
for certain things, as in the case of other frameworks: 
https://issues.apache.org/jira/browse/FLINK-6177
    I guess some refactoring would make the YARN code accessible from the 
Mesos part of Spark. Another option is the Mesos fetcher cache mechanism, but 
I am not sure it is distributed; as far as I know it is only local per agent. 
I need to discuss it further with @susanxhuynh @ArtRand



