different behavior while using createDataFrame and read.df in SparkR

Devesh Raj Singh Thu, 04 Feb 2016 22:45:46 -0800

Hi,

I am using Spark 1.5.1


When I run the following:

df <- createDataFrame(sqlContext, iris)

# Create a new column flagging the "setosa" species

df$Species1 <- ifelse(df[[5]] == "setosa", 1, 0)

head(df)

Output (the new column is created):

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

*But when I save the iris dataset as a CSV file, read it back, and convert
it to a SparkR DataFrame:*

df <- read.df(sqlContext, "/Users/devesh/Github/deveshgit2/bdaml/data/iris/",
              source = "com.databricks.spark.csv", header = "true",
              inferSchema = "true")

and then try to create the new column in the same way:

df$Species1 <- ifelse(df[[5]] == "setosa", 1, 0)

I get the following error:

16/02/05 12:11:01 ERROR RBackendHandler: col on 922 failed
Error in select(x, x$"*", alias(col, colName)) :
  error in evaluating the argument 'col' in selecting a method for function 'select': Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  org.apache.spark.sql.AnalysisException: Cannot resolve column name "Sepal.Length" among (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species);
at org.apache.spark.s
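One possible cause, offered as an assumption rather than a confirmed diagnosis, is that Spark SQL parses a dot in a column name as access to a nested field. In that case "Sepal.Length" is read as field "Length" of a column "Sepal" and fails to resolve, even though the name appears in the list. If that is what is happening, a workaround is to replace the dots in the column names before writing the CSV. The sketch below is plain base R; the helper name `sanitize_names` and the temporary output path are hypothetical and not part of the original setup.

```r
# Hypothetical helper: replace every "." in the column names of a local
# data.frame with "_" so the names contain no dots when Spark reads them.
sanitize_names <- function(d) {
  names(d) <- gsub(".", "_", names(d), fixed = TRUE)
  d
}

iris2 <- sanitize_names(iris)
# Expected names: Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, Species
print(names(iris2))

# Write the renamed data to a temporary CSV (hypothetical path; the original
# data directory is not reused here). row.names = FALSE avoids an extra column.
out <- file.path(tempdir(), "iris.csv")
write.csv(iris2, out, row.names = FALSE)
```

Reading such a file with the same read.df call should then give column names without dots; this has not been verified against Spark 1.5.1.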
-- 
Warm regards,
Devesh.
