Devesh,
Note that a DataFrame is immutable: withColumn returns a new DataFrame instead of
adding a column in place to the DataFrame being operated on.
So you can modify the for loop like this:
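for (j in 1:lev)
{
  # withColumn does not modify df; it returns a new DataFrame with one extra column
  dummy.df.new <- withColumn(df,
                             paste0(colnames(cat.column), j),
                             ifelse(df$Species == levels(as.factor(unlist(cat.column)))[j], 1, 0))
  # reassign so the next iteration builds on the column just added
  df <- dummy.df.new
}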
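To see why the reassignment matters without a Spark cluster, here is a minimal local sketch in base R. It assumes a hypothetical helper `with_column()` that, like SparkR's withColumn, returns a new data.frame rather than modifying its input; a plain data.frame built from `iris` stands in for the SparkR DataFrame. The first loop mirrors the original code and the second mirrors the reassigning version.

```r
# Hypothetical stand-in for SparkR's withColumn: returns a new data.frame
# with one extra column; the input data.frame is left unchanged.
with_column <- function(d, name, values) {
  d[[name]] <- values  # modifies a local copy only (R copy-on-modify)
  d
}

lev <- levels(iris$Species)

# Loop without reassignment (as in the original question): every call
# starts again from the unchanged df, so only the last result is kept.
df <- iris
for (j in seq_along(lev)) {
  dummy.df.new <- with_column(df, paste0("Species", j),
                              ifelse(df$Species == lev[j], 1, 0))
}
buggy <- dummy.df.new

# Loop with reassignment: each iteration builds on the previous result.
df <- iris
for (j in seq_along(lev)) {
  df <- with_column(df, paste0("Species", j),
                    ifelse(df$Species == lev[j], 1, 0))
}
fixed <- df
```

Here `buggy` gains only `Species3`, the column from the final iteration, while `fixed` carries `Species1`, `Species2` and `Species3`.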
As you can see, withColumn adds only one column at a time; it would be more
convenient if it could add several columns at once. There is a JIRA
requesting such a feature
(https://issues.apache.org/jira/browse/SPARK-12225), which is still under
discussion. If you want this feature, you could comment on it.
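Until something like SPARK-12225 is available, one way to avoid the manual reassignment is to fold the one-column-at-a-time call over the levels with base R's `Reduce()`. The sketch below again uses a hypothetical `with_column()` stand-in on a local data.frame; swapping in SparkR's withColumn and a SparkR DataFrame should follow the same pattern, but that variant is an untested assumption here.

```r
# Hypothetical stand-in for SparkR's withColumn: returns a new data.frame
# with one extra column and leaves its input unchanged.
with_column <- function(d, name, values) {
  d[[name]] <- values
  d
}

lev <- levels(iris$Species)

# Reduce(f, x, init) calls f(init, x[1]), then f(result, x[2]), and so on,
# so each step receives the data.frame produced by the previous step.
dummy.df <- Reduce(
  function(acc, j) {
    with_column(acc, paste0("Species", j),
                ifelse(acc$Species == lev[j], 1, 0))
  },
  seq_along(lev),
  iris
)

head(dummy.df)
```

The fold makes the "pass the previous result forward" step explicit, which is exactly what the plain for loop must do by hand with `df <- dummy.df.new`.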
From: Franc Carter [mailto:[email protected]]
Sent: Wednesday, February 3, 2016 7:40 PM
To: Devesh Raj Singh
Cc: [email protected]
Subject: Re: sparkR not able to create /append new columns
Yes, I didn't work out how to solve that - sorry
On 3 February 2016 at 22:37, Devesh Raj Singh
<[email protected]> wrote:
Hi,
But "withColumn" only adds one column. If I want to add columns to the same
DataFrame in a loop, it keeps overwriting the added column, and in the end
only the last column added in the loop remains, as in my code above.
On Wed, Feb 3, 2016 at 5:05 PM, Franc Carter
<[email protected]> wrote:
I had problems doing this as well. I ended up using 'withColumn'; it's not
particularly graceful, but it worked (Spark 1.5.2 on AWS EMR).
cheers
On 3 February 2016 at 22:06, Devesh Raj Singh
<[email protected]> wrote:
Hi,
I am trying to create dummy variables in SparkR by creating new columns for
categorical variables, but the columns are not being appended:
df <- createDataFrame(sqlContext, iris)
class(dtypes(df))
cat.column<-vector(mode="character",length=nrow(df))
cat.column<-collect(select(df,df$Species))
lev<-length(levels(as.factor(unlist(cat.column))))
varb.names<-vector(mode="character",length=lev)
for (i in 1:lev){
varb.names[i]<-paste0(colnames(cat.column),i)
}
for (j in 1:lev)
{
dummy.df.new<-withColumn(df,
paste0(colnames(cat.column),j),
ifelse(df$Species==levels(as.factor(unlist(cat.column)))[j],1,0) )
}
I am getting the following output for head(dummy.df.new):
  Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1
1          5.1         3.5          1.4         0.2  setosa        1
2          4.9         3.0          1.4         0.2  setosa        1
3          4.7         3.2          1.3         0.2  setosa        1
4          4.6         3.1          1.5         0.2  setosa        1
5          5.0         3.6          1.4         0.2  setosa        1
6          5.4         3.9          1.7         0.4  setosa        1
Problem: the Species2 and Species3 columns are not getting added to the DataFrame.
--
Warm regards,
Devesh.
--
Franc