Hi,

I have written some code to create dummy variables in SparkR:

df <- createDataFrame(sqlContext, iris)
class(dtypes(df))

cat.column<-vector(mode="character",length=nrow(df))
cat.column<-collect(select(df,df$Species))
lev<-length(levels(as.factor(unlist(cat.column))))
for (j in 1:lev){
  dummy.df.new<-withColumn(df,paste0(colnames(cat.column),j),ifelse(df$Species==levels(as.factor(unlist(cat.column)))[j],1,0))
  df<-dummy.df.new
}

head(df) gives me the desired output:

  Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1 Species2 Species3
1          5.1         3.5          1.4         0.2  setosa        1        0        0
2          4.9         3.0          1.4         0.2  setosa        1        0        0
3          4.7         3.2          1.3         0.2  setosa        1        0        0
4          4.6         3.1          1.5         0.2  setosa        1        0        0
5          5.0         3.6          1.4         0.2  setosa        1        0        0
6          5.4         3.9          1.7         0.4  setosa        1        0        0

But when I try to do the same thing by wrapping it in a function:

# x = dataframe$x, the categorical column within the dataframe
# dataframe = SparkR dataframe
dummyhandle<-function(dataframe,x){
  cat.column<-vector(mode="character",length=nrow(dataframe))
  cat.column<-collect(select(dataframe,x))
  lev<-length(levels(as.factor(unlist(cat.column))))
  for (j in 1:lev){
    dummy.df<-withColumn(dataframe,paste0(colnames(cat.column),j),ifelse(x==levels(as.factor(unlist(cat.column)))[j],1,0))
    dataframe<-dummy.df
  }
  return(dataframe)
}

it throws the following error:

Error in withColumn(dataframe, paste0(colnames(cat.column), j), ifelse(x == :
  error in evaluating the argument 'col' in selecting a method for function 'withColumn': Error in if (le > 0) paste0("[1:", paste(le), "]") else "(0)" :
  argument is not interpretable as logical

--
Warm regards,
Devesh.