I am not sure what's wrong. Maybe you can SSH to that machine and run the R script manually first to verify what's happening.
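If SSH access to the driver node is awkward, a similar check can be made from the same pyspark session. Here is a minimal diagnostic sketch (mine, not from the thread): it asks the driver's Rscript which library paths it searches and whether changepoint loads from them.

%livy2.pyspark
import subprocess

# Print the R library search paths on whichever node hosts the driver,
# then try to load changepoint there. If this errors, the package is
# missing (or installed under a different .libPaths()) on that node.
print(subprocess.getoutput('Rscript -e ".libPaths(); library(changepoint)"'))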
Lian Jiang <jiangok2...@gmail.com> wrote on Thu, Aug 30, 2018 at 10:34 AM:

> Jeff,
>
> R is installed on the namenode and on all data nodes. The R packages have
> been copied to all of them too. I am not sure whether an R script launched
> by pyspark's subprocess can access the Spark context or not. If not, using
> addFile to add R packages to the Spark context will not help test.r
> install the packages. Thanks for the clue.
>
> On Wed, Aug 29, 2018 at 7:24 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> You need to make sure the Spark driver machine has this package
>> installed. And since you are using yarn-cluster mode via Livy, you have
>> to install the package on all nodes, because the Spark driver could be
>> launched on any node of the cluster.
>>
>> Lian Jiang <jiangok2...@gmail.com> wrote on Thu, Aug 30, 2018 at 1:46 AM:
>>
>>> After calling a sample R script, we found another issue when running a
>>> real R script. This R script failed to load the changepoint library.
>>>
>>> I tried:
>>>
>>> %livy2.sparkr
>>> install.packages("changepoint", repos="file:///mnt/data/tmp/r")
>>> library(changepoint)  # I see "Successfully loaded changepoint package version 2.2.2"
>>>
>>> %livy2.pyspark
>>> from pyspark import SparkFiles
>>> import subprocess
>>>
>>> sc.addFile("hdfs:///user/zeppelin/test.r")
>>> testpath = SparkFiles.get('test.r')
>>> stdoutdata = subprocess.getoutput("Rscript " + testpath)
>>> print(stdoutdata)
>>>
>>> The error: Error in library(changepoint) : there is no package called
>>> ‘changepoint’
>>>
>>> test.r is simply:
>>>
>>> library(changepoint)
>>>
>>> Any idea how to make changepoint available for the R script? Thanks.
>>>
>>> On Tue, Aug 28, 2018 at 10:07 PM Lian Jiang <jiangok2...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Jeff.
>>>>
>>>> This worked:
>>>>
>>>> %livy2.pyspark
>>>> from pyspark import SparkFiles
>>>> import subprocess
>>>>
>>>> sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
>>>> testpath = SparkFiles.get('test.r')
>>>> stdoutdata = subprocess.getoutput("Rscript " + testpath)
>>>> print(stdoutdata)
>>>>
>>>> Cheers!
>>>>
>>>> On Tue, Aug 28, 2018 at 6:09 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> Do you run it under yarn-cluster mode? Then you must ensure your R
>>>>> script is shipped to that driver (via sc.addFile or by setting
>>>>> livy.spark.files).
>>>>>
>>>>> You also need to make sure you have R installed on all hosts of the
>>>>> yarn cluster, because the driver may run on any node of the cluster.
>>>>>
>>>>> Lian Jiang <jiangok2...@gmail.com> wrote on Wed, Aug 29, 2018 at
>>>>> 1:35 AM:
>>>>>
>>>>>> Thanks Lucas. We tried and got the same error. Below is the code:
>>>>>>
>>>>>> %livy2.pyspark
>>>>>> import subprocess
>>>>>> sc.addFile("hdfs:///user/zeppelin/test.r")
>>>>>> stdoutdata = subprocess.getoutput("Rscript test.r")
>>>>>> print(stdoutdata)
>>>>>>
>>>>>> Fatal error: cannot open file 'test.r': No such file or directory
>>>>>>
>>>>>> sc.addFile adds test.r to the Spark context. However, the subprocess
>>>>>> does not use the Spark context.
>>>>>>
>>>>>> An HDFS path does not work either:
>>>>>> subprocess.getoutput("Rscript hdfs:///user/zeppelin/test.r")
>>>>>>
>>>>>> Any idea how to make Python call an R script? Appreciate it!
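For context on why the bare "Rscript test.r" fails: files shipped with sc.addFile() are staged into a per-application temporary directory on the driver, not into the working directory the subprocess inherits. A minimal sketch of resolving the staged path (the same pattern as the working snippet quoted above; SparkFiles.getRootDirectory() is pyspark's accessor for that staging directory):

%livy2.pyspark
from pyspark import SparkFiles
import subprocess

sc.addFile("hdfs:///user/zeppelin/test.r")
# The local staging directory, and the absolute path of the staged copy.
print(SparkFiles.getRootDirectory())
testpath = SparkFiles.get("test.r")
print(subprocess.getoutput("Rscript " + testpath))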
>>>>>> On Tue, Aug 28, 2018 at 1:13 AM Partridge, Lucas (GE Aviation)
>>>>>> <lucas.partri...@ge.com> wrote:
>>>>>>
>>>>>>> Have you tried SparkContext.addFile() (not addPyFile()) to add your
>>>>>>> R script?
>>>>>>>
>>>>>>> https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
>>>>>>>
>>>>>>> From: Lian Jiang <jiangok2...@gmail.com>
>>>>>>> Sent: 27 August 2018 22:42
>>>>>>> To: users@zeppelin.apache.org
>>>>>>> Subject: EXT: Python script calls R script in Zeppelin on Hadoop
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
>>>>>>> notebooks to Zeppelin. One issue we came across is that a Python
>>>>>>> script calling an R script does not work in Zeppelin.
>>>>>>>
>>>>>>> %livy2.pyspark
>>>>>>> import os
>>>>>>> sc.addPyFile("hdfs:///user/zeppelin/my.py")
>>>>>>> import my
>>>>>>> my.test()
>>>>>>>
>>>>>>> my.test() calls the R script like: ['Rscript', 'myR.r']
>>>>>>>
>>>>>>> Fatal error: cannot open file 'myR.r': No such file or directory
>>>>>>>
>>>>>>> When running this notebook in Jupyter, both my.py and myR.r exist in
>>>>>>> the same folder. I understand the story changes on Hadoop because
>>>>>>> the scripts run in containers.
>>>>>>>
>>>>>>> My question:
>>>>>>>
>>>>>>> Is this scenario supported in Zeppelin? How do I add an R script to
>>>>>>> a Python Spark context so that the Python script can find it?
>>>>>>> Appreciate it!
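Putting the pieces of the thread together, a minimal end-to-end sketch: ship both scripts up front and have my.test() resolve the R script through SparkFiles rather than a relative path. The body of test() below is an assumption; the thread only shows its call site.

%livy2.pyspark
from pyspark import SparkFiles
import subprocess

sc.addPyFile("hdfs:///user/zeppelin/my.py")  # makes "import my" work on the driver
sc.addFile("hdfs:///user/zeppelin/myR.r")    # stages the R script alongside it

import my
my.test()

And my.py would look something like this (a sketch, assuming test() shells out to Rscript):

from pyspark import SparkFiles
import subprocess

def test():
    # Resolve the staged copy of myR.r instead of assuming it sits in the
    # container's current working directory.
    rscript_path = SparkFiles.get("myR.r")
    print(subprocess.getoutput("Rscript " + rscript_path))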