I had problems doing this as well. I ended up using 'withColumn'; it's not particularly graceful, but it worked (1.5.2 on AWS EMR).
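A minimal sketch of one way to do this with withColumn (not necessarily the code that was run on EMR). It assumes the SparkR 1.x entry points sparkR.init and sparkRSQL.init; the names lev and dummy.df are only illustrative. The main difference from the loop quoted below is that each iteration extends the previous result instead of starting again from df.

```r
library(SparkR)

# SparkR 1.x style initialisation (an assumption; adjust for your setup).
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, iris)

# Collect the distinct category values to the driver.
lev <- sort(collect(distinct(select(df, "Species")))$Species)

# Start from df and keep reassigning the same variable, so every
# iteration adds a column to the result of the previous one.
dummy.df <- df
for (j in seq_along(lev)) {
  dummy.df <- withColumn(dummy.df, paste0("Species", j),
                         cast(dummy.df$Species == lev[j], "integer"))
}

head(dummy.df)
```

Casting the boolean comparison to integer is used here in place of ifelse; either should give a 0/1 column.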
cheers

On 3 February 2016 at 22:06, Devesh Raj Singh <raj.deves...@gmail.com> wrote:

> Hi,
>
> i am trying to create dummy variables in sparkR by creating new columns
> for categorical variables. But it is not appending the columns
>
>
> df <- createDataFrame(sqlContext, iris)
> class(dtypes(df))
>
> cat.column<-vector(mode="character",length=nrow(df))
> cat.column<-collect(select(df,df$Species))
> lev<-length(levels(as.factor(unlist(cat.column))))
> varb.names<-vector(mode="character",length=lev)
> for (i in 1:lev){
>
>   varb.names[i]<-paste0(colnames(cat.column),i)
>
> }
>
> for (j in 1:lev)
>
> {
>
>   dummy.df.new<-withColumn(df,paste0(colnames(cat.column),j),
>     ifelse(df$Species==levels(as.factor(unlist(cat.column)))[j],1,0) )
>
> }
>
> I am getting the below output for
>
> head(dummy.df.new)
>
> output:
>
>   Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1
> 1          5.1         3.5          1.4         0.2  setosa        1
> 2          4.9         3.0          1.4         0.2  setosa        1
> 3          4.7         3.2          1.3         0.2  setosa        1
> 4          4.6         3.1          1.5         0.2  setosa        1
> 5          5.0         3.6          1.4         0.2  setosa        1
> 6          5.4         3.9          1.7         0.4  setosa        1
>
> Problem: Species2 and Species3 column are not getting added to the
> dataframe
>
> --
> Warm regards,
> Devesh.
>

--
Franc