Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13599#discussion_r160572606

    --- Diff: python/pyspark/context.py ---
    @@ -1023,6 +1032,35 @@ def getConf(self):
             conf.setAll(self._conf.getAll())
             return conf

    +    def install_packages(self, packages, install_driver=True):
    +        """
    +        Install python packages on all executors and the driver through pip. pip will be
    +        installed by default no matter whether native virtualenv or conda is used, so it
    +        is guaranteed that pip is available if virtualenv is enabled.
    +        :param packages: string for a single package or a list of strings for multiple packages
    +        :param install_driver: whether to also install the packages on the driver
    +        """
    +        if self._conf.get("spark.pyspark.virtualenv.enabled") != "true":
    +            raise RuntimeError("install_packages can only be called when "
    +                               "spark.pyspark.virtualenv.enabled is set to true")
    +        if isinstance(packages, basestring):
    +            packages = [packages]
    +        # statusTracker.getExecutorInfos() seems to return driver + executors, so -1 here.
    +        num_executors = len(self._jsc.sc().statusTracker().getExecutorInfos()) - 1
    +        dummyRDD = self.parallelize(range(num_executors), num_executors)
    +
    +        def _run_pip(packages, iterator):
    +            import pip
    +            pip.main(["install"] + packages)
    +
    +        # Run it in the main thread. Will do it in a separate thread after
    +        # https://github.com/pypa/pip/issues/2553 is fixed.
    +        if install_driver:
    +            _run_pip(packages, None)
    +
    +        import functools
    +        dummyRDD.foreachPartition(functools.partial(_run_pip, packages))

    --- End diff --

It makes sense to mark this feature as experimental. Although it is not reliable in some cases, it is still quite useful in interactive mode. For example, in a notebook it is not possible to pin down all dependent packages before launching the Spark app, so installing packages at runtime is very valuable. And a notebook is usually part of an experimental phase, not a production phase.
These corner cases should be acceptable IMHO as long as we document them and make users aware.
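The distribution pattern in the diff can be sketched without a cluster: `foreachPartition` passes only the partition iterator to its function, so the package list has to be bound in beforehand, here via `functools.partial`. A minimal standalone sketch (the `pip.main` call is replaced by returning the would-be command, since actually invoking pip here would be a side effect):

```python
import functools

# Standalone sketch of the foreachPartition + functools.partial pattern from
# the diff. Spark hands the function only a partition iterator, so the package
# list must be pre-bound. The real code calls pip.main(["install"] + packages)
# on each executor; this sketch returns the command instead.
def _run_pip(packages, iterator):
    return ["install"] + packages

# Bind the packages; Spark would invoke the result once per partition.
run = functools.partial(_run_pip, ["numpy", "pandas"])
print(run(iter([])))  # -> ['install', 'numpy', 'pandas']
```

Note that calling `pip.main` in-process like the diff does is exactly why pip issue 2553 matters: pip's in-process entry point is not guaranteed to be re-entrant, which is why the diff keeps the driver-side install on the main thread for now.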