Update: the job does finish, but it takes a long time on 10M rows of data. Is there a better solution?

From: xing_ma...@hotmail.com
To: user@spark.apache.org
Subject: as.Date can't be applied to Spark data frame in SparkR
Date: Tue, 20 Sep 2016 10:22:17 +0800
Hi all,

I've noticed that as.Date can't be applied to a Spark data frame. I created the following UDF and used dapply to convert an integer column "aa" to a date with origin 1960-01-01:

    change_date <- function(df) {
      as.POSIXlt(as.Date(df$aa, origin = "1960-01-01", tz = "UTC"))
    }

    customSchema <- structType(structField("rc", "integer"),
                               ....
                               structField("change_date(x)", "timestamp"))

    rollup_1_t <- dapply(rollup_1,
                         function(x) { x <- cbind(x, change_date(x)) },
                         schema = customSchema)

It works on a small dataset, but it takes forever to finish on a big one, and head(rollup_1_t) never returns a result. I suspect this is because dapply converts each partition of the Spark data frame back into an R data frame to run change_date, which is slow and can fail. Is there a better solution?

Thanks,
Ye
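One way to avoid the dapply round-trip is to do the date arithmetic with Spark SQL column expressions, so the rows never leave the JVM. Below is a sketch (untested here, since it needs a live Spark session); it assumes `rollup_1` is a SparkDataFrame and that `aa` holds the number of days since 1960-01-01, which is what the origin argument above suggests:

```r
library(SparkR)

# Assumption: `aa` counts days since 1960-01-01 (a SAS-style date).
# date_add() and the cast are evaluated by Spark's SQL engine, so no
# partition is serialized to an R worker process as dapply would do.
rollup_1_t <- withColumn(
  rollup_1,
  "change_date",
  expr("cast(date_add(to_date('1960-01-01'), aa) as timestamp)")
)

head(rollup_1_t)
```

If you only need a date rather than a timestamp, you can drop the cast and keep the result of date_add() as a date column.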