Hi,
I am trying to create dummy variables in SparkR by adding a new column for
each level of a categorical variable, but the columns are not being appended.
Here is my code:
df <- createDataFrame(sqlContext, iris)
class(dtypes(df))
cat.column<-vector(mode="character",length=nrow(df))
cat.column<-collect(select(df,df$Species))
lev<-length(levels(as.factor(unlist(cat.column))))
varb.names<-vector(mode="character",length=lev)
for (i in 1:lev){
varb.names[i]<-paste0(colnames(cat.column),i)
}
for (j in 1:lev)
{
dummy.df.new<-withColumn(df,paste0(colnames(cat.column),j),
ifelse(df$Species==levels(as.factor(unlist(cat.column)))[j],1,0))
}
This is the output I get from
head(dummy.df.new)
Output:
  Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1
1          5.1         3.5          1.4         0.2  setosa        1
2          4.9         3.0          1.4         0.2  setosa        1
3          4.7         3.2          1.3         0.2  setosa        1
4          4.6         3.1          1.5         0.2  setosa        1
5          5.0         3.6          1.4         0.2  setosa        1
6          5.4         3.9          1.7         0.4  setosa        1
Problem: the Species2 and Species3 columns are not being added to the DataFrame.
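My current guess at the cause: withColumn returns a new DataFrame and leaves df unchanged, and my loop passes the original df on every iteration, so each pass discards the column added by the previous one. Below is a rough sketch of a version that accumulates the columns instead. It assumes a Spark 1.x-style session (sparkR.init and sparkRSQL.init, matching the sqlContext used above); the names species and dummy.df are only illustrative.

```r
library(SparkR)

# Assumption: Spark 1.x-style setup, matching the sqlContext used in the post.
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, iris)

# Bring the distinct category values back to the driver.
# This is cheap here because Species has only a few levels.
species <- sort(as.character(collect(distinct(select(df, "Species")))$Species))

# Start from df and reassign on every pass, so each new column is
# added on top of the columns created by earlier iterations.
dummy.df <- df
for (j in seq_along(species)) {
  dummy.df <- withColumn(
    dummy.df,
    paste0("Species", j),
    ifelse(dummy.df$Species == species[j], 1, 0)
  )
}

head(dummy.df)
```

The only substantive change from my loop is that withColumn is applied to the accumulated DataFrame rather than to df each time.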
--
Warm regards,
Devesh.