Update:
The job can finish, but it takes a long time on a 10M-row dataset. Is there a better solution?
From: xing_ma...@hotmail.com
To: user@spark.apache.org
Subject: as.Date can't be applied to Spark data frame in SparkR
Date: Tue, 20 Sep 2016 10:22:17 +0800




Hi, all
I've noticed that as.Date can't be applied to a Spark DataFrame. I've created
the following UDF and used dapply to change an integer column "aa" to a date
with origin 1960-01-01.
change_date <- function(df) {
  df <- as.POSIXlt(as.Date(df$aa, origin = "1960-01-01", tz = "UTC"))
}

customSchema <- structType(structField("rc", "integer"),
                           ....
                           structField("change_date(x)", "timestamp"))

rollup_1_t <- dapply(rollup_1,
                     function(x) { x <- cbind(x, change_date(x)) },
                     schema = customSchema)
It works with a small dataset, but it takes forever to finish on a big dataset,
and head(rollup_1_t) does not return a result. I guess this is because the
"change_date" function forces the Spark DataFrame to be converted back to an R
data frame, which is slow and could potentially fail. Is there a better
solution?
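
One possible way to avoid that round-trip is to build the date with Spark's
built-in column functions instead of an R UDF, so the rows never leave the JVM.
A minimal sketch, not from the thread, assuming Spark 2.x SparkR, the rollup_1
DataFrame and integer column "aa" from above, and a made-up output column name
"aa_ts":

library(SparkR)

# Shift the 1960-01-01 origin by "aa" days and cast the result to a
# timestamp, entirely with Spark SQL expressions; no per-row conversion
# to an R data.frame is needed.
rollup_1_t <- withColumn(
  rollup_1, "aa_ts",
  expr("cast(date_add(to_date('1960-01-01'), aa) as timestamp)"))

head(rollup_1_t)

Because the whole expression is evaluated by Spark, head() only has to collect
a few rows instead of shipping every partition through an R worker process.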
Thanks,
Ye
