Hi John,

Sorry I haven't had time to respond to your questions over the weekend.
If you're running in client mode, the minimal step to use the Docker/Mesos integration is to set the image configuration 'spark.mesos.executor.docker.image' as described in the documentation; Spark will use that image to run each Spark executor. So if you want to include your Python dependencies, you can pre-install them in that image, and the executors should be able to find them if you set the Python environment variables (e.g. PYTHONPATH) to point at them (a rough example is sketched below the quoted message).

I'm not very familiar with Python, but I recently got Mesos cluster mode with Python working, and that change is merged into master.

Tim

On Mon, Sep 21, 2015 at 8:34 AM, John Omernik <j...@omernik.com> wrote:
> Hey all -
>
> Curious about the best way to include Python packages in my Spark
> installation (such as NLTK). Basically I am running on Mesos and would
> like to find a way to include the package in the binary distribution, in
> that I don't want to install packages on all nodes. We should be able to
> include it in the distribution, right?
>
> I thought of using the Docker Mesos integration, but I have been unable to
> find information on this (see my other question on Docker/Mesos/Spark).
> Any other thoughts on the best way to include packages in Spark WITHOUT
> installing on each node would be appreciated!
>
> John
>
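
P.S. As a rough sketch of what that could look like (the base image, repository/image name, Python path, and script name here are hypothetical placeholders, not anything from this thread; check the exact config keys against your Spark version):

    # Dockerfile for the executor image, with the Python deps baked in:
    FROM <your-spark-base-image>
    RUN pip install nltk

    # Submit against Mesos, pointing Spark at that image and, if needed,
    # at where the packages live inside it:
    spark-submit \
      --master mesos://<mesos-master>:5050 \
      --conf spark.mesos.executor.docker.image=myrepo/spark-nltk:latest \
      --conf spark.executorEnv.PYTHONPATH=/usr/lib/python2.7/site-packages \
      my_job.py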