Thanks Shivaram. Your suggestion on Stack Overflow regarding this did work. Thanks again.
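For the archives, here is roughly what ended up working for me from RStudio, based on that answer. Untested as pasted here; the paths and the spark-csv version string are from my setup, so adjust as needed:

------
Sys.setenv(SPARK_HOME = "C:\\spark-1.4.0-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

# Pass --packages through to the spark-submit call that sparkR.init() makes.
# The trailing "sparkr-shell" token is needed so the arguments are picked up.
Sys.setenv(SPARKR_SUBMIT_ARGS =
  '"--packages" "com.databricks:spark-csv_2.11:1.1.0" "sparkr-shell"')

library(SparkR)

sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

plutoMN <- read.df(sqlContext,
                   "C:\\Users\\Sourav\\Work\\SparkDataScience\\PlutoMN.csv",
                   source = "com.databricks.spark.csv")
------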
Regards,
Sourav

On Wed, Jul 1, 2015 at 10:21 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:

You can check my comment below the answer at http://stackoverflow.com/a/30959388/4577954. BTW, we added a new option to sparkR.init to pass in packages, and that should be part of 1.5.

Shivaram

On Wed, Jul 1, 2015 at 10:03 AM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:

Hi,

Piggybacking on this discussion.

I'm trying to achieve the same thing, reading a CSV file, from RStudio. Where I'm stuck is how to supply an additional package from RStudio to sparkR.init(), as sparkR.init() does not provide an option to specify additional packages.

I tried the following code from RStudio. It gives me the error "Error in callJMethod(sqlContext, "load", source, options) : Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed."

------
Sys.setenv(SPARK_HOME = "C:\\spark-1.4.0-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sparkR.stop()

# I have downloaded this spark-csv jar and kept it in the lib folder of Spark
sc <- sparkR.init(master = "local[2]",
                  sparkEnvir = list(spark.executor.memory = "1G"),
                  sparkJars = "C:\\spark-1.4.0-bin-hadoop2.6\\lib\\spark-csv_2.11-1.1.0.jar")

sqlContext <- sparkRSQL.init(sc)

plutoMN <- read.df(sqlContext,
                   "C:\\Users\\Sourav\\Work\\SparkDataScience\\PlutoMN.csv",
                   source = "com.databricks.spark.csv")
------

However, I also tried this from the shell, launched as 'sparkR --packages com.databricks:spark-csv_2.11:1.1.0'. This time I used the following code, and it all works fine.

sqlContext <- sparkRSQL.init(sc)

plutoMN <- read.df(sqlContext,
                   "C:\\Users\\Sourav\\Work\\SparkDataScience\\PlutoMN.csv",
                   source = "com.databricks.spark.csv")

Any idea how to achieve the same from RStudio?

Regards,

On Thu, Jun 25, 2015 at 2:38 PM, Wei Zhou <zhweisop...@gmail.com> wrote:

I tried out the solution using the spark-csv package, and it works fine now :) Thanks. Yes, I'm playing with a file with all columns as String, but the real data I want to process are all doubles. I'm just exploring what SparkR can do versus regular Scala Spark, as I am at heart an R person.

2015-06-25 14:26 GMT-07:00 Eskilson, Aleksander <alek.eskil...@cerner.com>:

Sure, I had a similar question that Shivaram was able to answer quickly for me; the solution is implemented using a separate Databricks library. Check out this thread from the email archives [1], and the read.df() command [2]. CSV files can be a bit tricky, especially with inferring their schemas. Are you using just strings as your column types right now?
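If the schema is the sticking point, spark-csv can try to infer one for you. A quick sketch (untested; header and inferSchema are options supported by recent versions of that source, and the path here is just a placeholder):

------
# Assumes the session already has the spark-csv package on its classpath,
# e.g. launched as: sparkR --packages com.databricks:spark-csv_2.11:1.1.0
sqlContext <- sparkRSQL.init(sc)

# Without inferSchema every column comes back as string; with it,
# numeric columns should come back as doubles.
df <- read.df(sqlContext, "path/to/data.csv",
              source = "com.databricks.spark.csv",
              header = "true", inferSchema = "true")
printSchema(df)
------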
Alek

[1] -- http://apache-spark-developers-list.1001551.n3.nabble.com/CSV-Support-in-SparkR-td12559.html
[2] -- https://spark.apache.org/docs/latest/api/R/read.df.html

From: Wei Zhou <zhweisop...@gmail.com>
Date: Thursday, June 25, 2015 at 4:15 PM
To: "shiva...@eecs.berkeley.edu" <shiva...@eecs.berkeley.edu>
Cc: Aleksander Eskilson <alek.eskil...@cerner.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: sparkR could not find function "textFile"

Thanks to both Shivaram and Alek. Then if I want to create a DataFrame from comma-separated flat files, what would you recommend me to do? One way I can think of is to first read the data as you would in R, using read.table(), and then create a Spark DataFrame out of that R data frame, but that is obviously not scalable.

2015-06-25 13:59 GMT-07:00 Shivaram Venkataraman <shiva...@eecs.berkeley.edu>:

The `head` function is not supported for the RRDD that is returned by `textFile`. You can run `take(lines, 5L)` instead. I should add a warning here that the RDD API in SparkR is private because we might not support it in the upcoming releases. So if you can use the DataFrame API for your application, you should try that out.

Thanks
Shivaram

On Thu, Jun 25, 2015 at 1:49 PM, Wei Zhou <zhweisop...@gmail.com> wrote:

Hi Alek,

Just a follow-up question. This is what I did in the sparkR shell:

lines <- SparkR:::textFile(sc, "./README.md")
head(lines)

And I am getting the error:

"Error in x[seq_len(n)] : object of type 'S4' is not subsettable"

I'm wondering what I did wrong. Thanks in advance.

Wei

2015-06-25 13:44 GMT-07:00 Wei Zhou <zhweisop...@gmail.com>:

Hi Alek,

Thanks for the explanation, it is very helpful.

Cheers,
Wei

2015-06-25 13:40 GMT-07:00 Eskilson, Aleksander <alek.eskil...@cerner.com>:

Hi there,

The tutorial you're reading was written before the merge of SparkR for Spark 1.4.0. For the merge, the RDD API (which includes the textFile() function) was made private, as the devs felt many of its functions were too low level. They focused instead on finishing the DataFrame API, which supports local, HDFS, and Hive/HBase file reads. In the meantime, the devs are trying to determine which functions of the RDD API, if any, should be made public again. You can see the rationale behind this decision on the issue's JIRA [1].

You can still make use of those now-private RDD functions by prepending the function call with the SparkR private namespace; for example, you'd use SparkR:::textFile(…).
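Since the word count example is the goal here, a rough sketch against those private functions might look like this (untested, and these ::: calls may change or go away in a future release):

------
lines  <- SparkR:::textFile(sc, "./README.md")
words  <- SparkR:::flatMap(lines,
                           function(line) strsplit(line, " ")[[1]])
pairs  <- SparkR:::lapply(words, function(word) list(word, 1L))
counts <- SparkR:::reduceByKey(pairs, "+", 2L)

# head() is not supported on these RDDs; use take() instead
SparkR:::take(counts, 10L)
------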
Hope that helps,
Alek

[1] -- https://issues.apache.org/jira/browse/SPARK-7230

From: Wei Zhou <zhweisop...@gmail.com>
Date: Thursday, June 25, 2015 at 3:33 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: sparkR could not find function "textFile"

Hi all,

I am exploring sparkR by activating the shell and following the tutorial here: https://amplab-extras.github.io/SparkR-pkg/

And when I tried to read in a local file with textFile(sc, "file_location"), it gives the error: could not find function "textFile".

Reading through the SparkR docs for 1.4, it seems that we need a sqlContext to import data, for example:

people <- read.df(sqlContext, "./examples/src/main/resources/people.json", "json")

And we need to specify the file type.

My question is: has SparkR stopped supporting general-purpose file importing? If not, I would appreciate any help on how to do this.

PS, I am trying to recreate the word count example in sparkR, and want to import the README.md file, or just any file, into sparkR.

Thanks in advance.

Best,
Wei