Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13599#discussion_r160071888

    --- Diff: python/pyspark/context.py ---
    @@ -980,6 +996,33 @@ def getConf(self):
             conf.setAll(self._conf.getAll())
             return conf

    +    def install_packages(self, packages, install_driver=True):
    +        """
    +        install python packages on all executors and driver through pip
    +        :param packages: string for single package or a list of string for multiple packages
    +        :param install_driver: whether to install packages in client
    +        """
    +        if self._conf.get("spark.pyspark.virtualenv.enabled") != "true":
    +            raise Exception("install_packages can only use called when "
    +                            "spark.pyspark.virtualenv.enabled set as true")
    +        if isinstance(packages, basestring):
    +            packages = [packages]
    +        num_executors = int(self._conf.get("spark.executor.instances"))
    +        dummyRDD = self.parallelize(range(num_executors), num_executors)
    --- End diff --

Right, even without dynamic execution this depends on us continuing to do a uniform distribution of data with `parallelize`, which I don't think is guaranteed (and we have no test that would catch this breaking).
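The concern can be illustrated with a small sketch. The function below (`slice_evenly`, a hypothetical simplification for illustration, not Spark's actual implementation) mimics the even-slicing behavior that `parallelize` currently exhibits: `num_executors` elements split into `num_executors` slices yields one element per partition, hence one install task per partition. The catch the comment raises is that even then, nothing pins each task to a distinct executor.

```python
# Hypothetical sketch of parallelize-style even slicing -- NOT Spark's actual
# code, just an illustration of the assumption install_packages relies on.

def slice_evenly(seq, num_slices):
    """Split seq into num_slices roughly equal contiguous slices."""
    n = len(seq)
    return [seq[(i * n) // num_slices:((i + 1) * n) // num_slices]
            for i in range(num_slices)]

# With num_executors elements and num_executors slices, each partition holds
# exactly one element, so there is exactly one pip-install task per partition.
partitions = slice_evenly(list(range(4)), 4)
print(partitions)  # [[0], [1], [2], [3]]

# But one-task-per-partition is not one-task-per-executor: a fast executor can
# pick up several of these tasks while another runs none, so some executors
# may never run the install. Task placement is a scheduler detail, not a
# contract of the parallelize API -- and no test would catch it changing.
```

This is why relying on a dummy RDD of `num_executors` elements is fragile: the code would silently under-install if either the slicing or the task-placement behavior shifted.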