Hi,

But "withColumn" adds only one column per call. If I call it in a loop on
the same DataFrame, each iteration overwrites the column added by the
previous one, so in the end only the column from the last iteration
remains, as in my code above.
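
A possible way around this, as a rough sketch rather than a confirmed fix:
the loop in my code always passes the original df to "withColumn", so
reassigning the result to the same variable on every iteration should let
the columns accumulate. The sketch assumes the SparkR 1.x API (sparkR.init,
sparkRSQL.init) and a local master; the names species.levels and dummy.df
are made up for illustration.

```r
library(SparkR)

# Assumption: SparkR 1.x style initialisation with a local master.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, iris)

# Collect the categorical column once and derive its levels locally.
species.levels <- levels(as.factor(collect(select(df, "Species"))$Species))

# Start from the original DataFrame and reassign on every iteration,
# so each new column is added on top of the ones added before it.
dummy.df <- df
for (j in seq_along(species.levels)) {
  dummy.df <- withColumn(
    dummy.df,
    paste0("Species", j),
    ifelse(dummy.df$Species == species.levels[j], 1, 0)
  )
}

head(dummy.df)
```

The only real change from my loop is that "withColumn" receives dummy.df
(the accumulated result) instead of df.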

On Wed, Feb 3, 2016 at 5:05 PM, Franc Carter <franc.car...@gmail.com> wrote:

>
> I had problems doing this as well - I ended up using 'withColumn', it's
> not particularly graceful but it worked (1.5.2 on AWS EMR)
>
> cheers
>
> On 3 February 2016 at 22:06, Devesh Raj Singh <raj.deves...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am trying to create dummy variables in SparkR by creating new columns
>> for categorical variables, but the columns are not being appended.
>>
>>
>> df <- createDataFrame(sqlContext, iris)
>> class(dtypes(df))
>>
>> cat.column<-vector(mode="character",length=nrow(df))
>> cat.column<-collect(select(df,df$Species))
>> lev<-length(levels(as.factor(unlist(cat.column))))
>> varb.names<-vector(mode="character",length=lev)
>> for (i in 1:lev){
>>
>>   varb.names[i]<-paste0(colnames(cat.column),i)
>>
>> }
>>
>> for (j in 1:lev)
>>
>> {
>>
>>    dummy.df.new<-withColumn(df,paste0(colnames(cat.column),j),
>>      ifelse(df$Species==levels(as.factor(unlist(cat.column)))[j],1,0))
>>
>> }
>>
>> I am getting the below output for
>>
>> head(dummy.df.new)
>>
>> output:
>>
>>   Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1
>> 1          5.1         3.5          1.4         0.2  setosa        1
>> 2          4.9         3.0          1.4         0.2  setosa        1
>> 3          4.7         3.2          1.3         0.2  setosa        1
>> 4          4.6         3.1          1.5         0.2  setosa        1
>> 5          5.0         3.6          1.4         0.2  setosa        1
>> 6          5.4         3.9          1.7         0.4  setosa        1
>>
>> Problem: the Species2 and Species3 columns are not getting added to the
>> dataframe
>>
>> --
>> Warm regards,
>> Devesh.
>>
>
>
>
> --
> Franc
>



-- 
Warm regards,
Devesh.
