Thanks to both Shivaram and Alek. If I want to create a DataFrame from
comma-separated flat files, what would you recommend? One way I can think
of is to first read the data as you would in R, using read.table(), and
then create a Spark DataFrame from that R data frame, but that is
obviously not scalable.


2015-06-25 13:59 GMT-07:00 Shivaram Venkataraman <shiva...@eecs.berkeley.edu>:

> The `head` function is not supported for the RRDD that is returned by
> `textFile`. You can run `take(lines, 5L)`. I should add a warning here that
> the RDD API in SparkR is private because we might not support it in the
> upcoming releases. So if you can use the DataFrame API for your application
> you should try that out.
>
> Thanks
> Shivaram
>
> On Thu, Jun 25, 2015 at 1:49 PM, Wei Zhou <zhweisop...@gmail.com> wrote:
>
>> Hi Alek,
>>
>> Just a follow-up question. This is what I did in the sparkR shell:
>>
>> lines <- SparkR:::textFile(sc, "./README.md")
>> head(lines)
>>
>> And I am getting this error:
>>
>> "Error in x[seq_len(n)] : object of type 'S4' is not subsettable"
>>
>> I'm wondering what I did wrong. Thanks in advance.
>>
>> Wei
>>
>> 2015-06-25 13:44 GMT-07:00 Wei Zhou <zhweisop...@gmail.com>:
>>
>>> Hi Alek,
>>>
>>> Thanks for the explanation, it is very helpful.
>>>
>>> Cheers,
>>> Wei
>>>
>>> 2015-06-25 13:40 GMT-07:00 Eskilson,Aleksander <alek.eskil...@cerner.com>:
>>>
>>>>  Hi there,
>>>>
>>>>  The tutorial you’re reading there was written before SparkR was
>>>> merged into Spark for the 1.4.0 release.
>>>> For the merge, the RDD API (which includes the textFile() function) was
>>>> made private, as the devs felt many of its functions were too low level.
>>>> They focused instead on finishing the DataFrame API which supports local,
>>>> HDFS, and Hive/HBase file reads. In the meantime, the devs are trying to
>>>> determine which functions of the RDD API, if any, should be made public
>>>> again. You can see the rationale behind this decision on the issue’s JIRA
>>>> [1].
>>>>
>>>>  You can still make use of those now-private RDD functions by
>>>> prepending the function call with the SparkR private namespace. For
>>>> example, you’d use
>>>> SparkR:::textFile(…).
>>>>
>>>>  Hope that helps,
>>>> Alek
>>>>
>>>>  [1] -- https://issues.apache.org/jira/browse/SPARK-7230
>>>>
>>>>   From: Wei Zhou <zhweisop...@gmail.com>
>>>> Date: Thursday, June 25, 2015 at 3:33 PM
>>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>>> Subject: sparkR could not find function "textFile"
>>>>
>>>>   Hi all,
>>>>
>>>>  I am exploring sparkR by activating the shell and following the
>>>> tutorial here https://amplab-extras.github.io/SparkR-pkg/
>>>>
>>>>  When I tried to read in a local file with textFile(sc,
>>>> "file_location"), it gave the error: could not find function "textFile".
>>>>
>>>>  From reading through the sparkR doc for 1.4, it seems that we need
>>>> sqlContext to import data, for example:
>>>>
>>>> people <- read.df(sqlContext, "./examples/src/main/resources/people.json", "json")
>>>>
>>>> And we need to specify the file type.
>>>>
>>>>  My question is: has sparkR stopped supporting importing files of a
>>>> general type? If not, I would appreciate any help on how to do this.
>>>>
>>>>  PS: I am trying to recreate the word count example in sparkR, and
>>>> want to import the README.md file, or just any file, into sparkR.
>>>>
>>>>  Thanks in advance.
>>>>
>>>>  Best,
>>>> Wei
>>>>
>>>
>>>
>>
>