I had problems doing this as well. I ended up using withColumn; it's not
particularly graceful, but it worked (Spark 1.5.2 on AWS EMR).
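
The quoted loop calls withColumn(df, ...) on the original df in every
iteration, so each pass throws away the column added by the previous one.
Assigning each result back to the same variable makes the columns
accumulate. Below is a minimal sketch of that approach (not necessarily the
exact code used on EMR). It assumes Spark 1.5.x and the SparkR 1.5 API,
where sparkR.init()/sparkRSQL.init() create the contexts and ifelse()
accepts a Column as its test. The names lev and dummy.df.new follow the
quoted code.

```r
# Sketch assuming Spark 1.5.x SparkR. In the sparkR shell, sc and
# sqlContext already exist and the two init calls can be skipped.
library(SparkR)

sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, iris)

# Distinct category values, computed in Spark and collected as a small
# local character vector; sorted to match R's factor level order.
lev <- sort(collect(distinct(select(df, "Species")))$Species)

# Start from the original DataFrame and reassign on every iteration, so
# each withColumn() call builds on the columns added by the previous one.
dummy.df.new <- df
for (j in seq_along(lev)) {
  dummy.df.new <- withColumn(dummy.df.new,
                             paste0("Species", j),
                             ifelse(dummy.df.new$Species == lev[j], 1, 0))
}

head(dummy.df.new)
```

With this version, head(dummy.df.new) should show Species1, Species2 and
Species3. On Spark 2.x the setup changes: sparkR.session() replaces the two
init calls and createDataFrame(iris) no longer takes sqlContext. The same
loop could also be written as a fold with Reduce(); the explicit loop is
kept here because it mirrors the quoted code.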

cheers

On 3 February 2016 at 22:06, Devesh Raj Singh <raj.deves...@gmail.com>
wrote:

> Hi,
>
> I am trying to create dummy variables in SparkR by creating new columns
> for categorical variables, but the columns are not being appended.
>
>
> df <- createDataFrame(sqlContext, iris)
> class(dtypes(df))
>
> cat.column<-vector(mode="character",length=nrow(df))
> cat.column<-collect(select(df,df$Species))
> lev<-length(levels(as.factor(unlist(cat.column))))
> varb.names<-vector(mode="character",length=lev)
> for (i in 1:lev) {
>   varb.names[i]<-paste0(colnames(cat.column),i)
> }
>
> for (j in 1:lev) {
>   dummy.df.new<-withColumn(df,paste0(colnames(cat.column),j),
>     ifelse(df$Species==levels(as.factor(unlist(cat.column)))[j],1,0))
> }
>
> I am getting the following output for head(dummy.df.new):
>
>   Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1
> 1          5.1         3.5          1.4         0.2  setosa        1
> 2          4.9         3.0          1.4         0.2  setosa        1
> 3          4.7         3.2          1.3         0.2  setosa        1
> 4          4.6         3.1          1.5         0.2  setosa        1
> 5          5.0         3.6          1.4         0.2  setosa        1
> 6          5.4         3.9          1.7         0.4  setosa        1
>
> Problem: the Species2 and Species3 columns are not being added to the
> DataFrame.
>
> --
> Warm regards,
> Devesh.
>



-- 
Franc
