Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13599#discussion_r122558038
  
    --- Diff: python/pyspark/context.py ---
    @@ -980,6 +996,33 @@ def getConf(self):
             conf.setAll(self._conf.getAll())
             return conf
     
    +    def install_packages(self, packages, install_driver=True):
    +        """
    +        Install Python packages on all executors and the driver through pip.
    +
    +        :param packages: a string for a single package, or a list of strings
    +            for multiple packages
    +        :param install_driver: whether to also install the packages on the driver
    +        """
    +        if self._conf.get("spark.pyspark.virtualenv.enabled") != "true":
    +            raise Exception("install_packages can only be called when "
    +                            "spark.pyspark.virtualenv.enabled is set to true")
    +        if isinstance(packages, basestring):
    +            packages = [packages]
    +        num_executors = int(self._conf.get("spark.executor.instances"))
    +        dummyRDD = self.parallelize(range(num_executors), num_executors)
    --- End diff ---
    
    This is not guaranteed to work: `parallelize(range(num_executors), num_executors)` does not ensure that one task is scheduled on every executor, and it overlooks the situation where executors are added later (e.g., through dynamic allocation).
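
    Purely as an illustration of one alternative, not a drop-in fix (the helpers below, `_ensure_package` and `process_partition`, are hypothetical and not part of this PR): installing lazily from inside the task removes the dependency on one partition landing on each executor, and executors that join later still install the package on their first task:

    ```python
    import importlib
    import subprocess
    import sys

    def _ensure_package(package, module_name):
        # Hypothetical helper: runs inside a task on the executor, so any
        # executor added later installs the package on first use instead
        # of being missed by a one-shot dummy RDD.
        try:
            importlib.import_module(module_name)
        except ImportError:
            subprocess.check_call(
                [sys.executable, "-m", "pip", "install", package])

    def process_partition(rows):
        _ensure_package("numpy", "numpy")
        import numpy as np
        for row in rows:
            yield float(np.sqrt(row))
    ```

    A job would then use `rdd.mapPartitions(process_partition)`, and the install runs wherever the tasks actually execute.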


---
