Re: SparkR - calling as.vector() with rdd dataframe causes error

Ellen Kraffmiller Mon, 21 Sep 2015 08:48:48 -0700

Thank you for the link! I was using
http://apache-spark-user-list.1001560.n3.nabble.com/, and I didn't see
replies there.


Regarding your code example, I'm doing the same thing and successfully
creating the rdd, but the problem is that when I call a clustering
algorithm like amap::hcluster(), I get an error from as.vector() that the
rdd cannot be coerced into a vector.
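
One workaround that may apply here, offered as a minimal sketch rather than
a confirmed fix: pull the SparkR DataFrame back to the driver with collect()
before handing it to a local R algorithm. The sketch assumes the SparkR 1.5
interactive shell (where sqlContext is already defined) and a data set small
enough to fit in driver memory. The toy data and the variable names are made
up for illustration, and stats::hclust() stands in for the galileo() and
amap::hcluster() calls, which are not reproduced here.

```r
# Sketch only: assumes the SparkR 1.5 interactive shell, where `sqlContext`
# is already defined, and data small enough to fit in driver memory.

# Toy numeric data standing in for the term document matrix (made-up values).
localDF <- data.frame(t1 = c(1, 0, 3, 2),
                      t2 = c(0, 2, 1, 4),
                      t3 = c(5, 1, 0, 1))

# Distributed SparkR DataFrame: an S4 object that as.vector() cannot coerce.
sparkDF <- createDataFrame(sqlContext, localDF)

# collect() brings every row back to the driver as a local R data.frame.
collectedDF <- collect(sparkDF)

# Local algorithms accept the collected copy. stats::hclust is a stand-in
# for the clustering call in the original post, not the same function.
tree <- hclust(dist(as.matrix(collectedDF)), method = "ward.D")
clusters <- cutree(tree, k = 2)
```

Note that after collect() the clustering runs entirely in the local R
process, so nothing is distributed; this only sidesteps the coercion error.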

On Fri, Sep 18, 2015 at 12:33 PM, Luciano Resende <luckbr1...@gmail.com>
wrote:

> I see the thread with all the responses at the bottom on mail-archive:
>
> https://www.mail-archive.com/user%40spark.apache.org/msg36882.html
>
> On Fri, Sep 18, 2015 at 7:58 AM, Ellen Kraffmiller <
> ellen.kraffmil...@gmail.com> wrote:
>
>> Thanks for your response.  Is there a reason why this thread isn't
>> appearing on the mailing list?  So far, I only see my post, with no
>> answers, although I have received 2 answers via email.  It would be nice if
>> other people could see these answers as well.
>>
>> On Thu, Sep 17, 2015 at 2:22 AM, Sun, Rui <rui....@intel.com> wrote:
>>
>>> Existing algorithms that operate on an R data.frame can't simply operate
>>> on a SparkR DataFrame. They have to be re-implemented on top of the SparkR
>>> DataFrame API.
>>>
>>> -----Original Message-----
>>> From: ekraffmiller [mailto:ellen.kraffmil...@gmail.com]
>>> Sent: Thursday, September 17, 2015 3:30 AM
>>> To: user@spark.apache.org
>>> Subject: SparkR - calling as.vector() with rdd dataframe causes error
>>>
>>> Hi,
>>> I have a library of clustering algorithms that I'm trying to run in the
>>> SparkR interactive shell. (I am working on a proof of concept for a
>>> document classification tool.) Each algorithm takes a term document matrix
>>> in the form of a dataframe.  When I pass the method a local dataframe, the
>>> clustering algorithm works correctly, but when I pass it a spark rdd, it
>>> gives an error trying to coerce the data into a vector.  Here is the code
>>> that I'm calling within SparkR:
>>>
>>> # get matrix from a file
>>> file <- "/Applications/spark-1.5.0-bin-hadoop2.6/examples/src/main/resources/matrix.csv"
>>>
>>> # read it into variable
>>> raw_data <- read.csv(file, sep=',', header=FALSE)
>>>
>>> # convert to a local dataframe
>>> localDF <- data.frame(raw_data)
>>>
>>> # create the rdd
>>> rdd <- createDataFrame(sqlContext, localDF)
>>>
>>> # call the algorithm with the localDF - this works
>>> result <- galileo(localDF, model='hclust', dist='euclidean', link='ward', K=5)
>>>
>>> # call with the rdd - this produces error
>>> result <- galileo(rdd, model='hclust', dist='euclidean', link='ward', K=5)
>>>
>>> Error in as.vector(data) :
>>>   no method for coercing this S4 class to a vector
>>>
>>>
>>> I get the same error if I try to directly call as.vector(rdd) as well.
>>>
>>> Is there a reason why this works for localDF and not rdd?  Should I be
>>> doing something else to coerce the object into a vector?
>>>
>>> Thanks,
>>> Ellen
>>>
>>>
>>
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
