Hi Spark experts,
I am using Spark 1.4.1 and trying out the Spark SQL/DataFrame API
with a text file in the following format:
id gender height
1 M 180
2 F 167
... ...
However, I have run into the two issues described below:
1. In my test program, I specify the schema programmatically.
When I use "|" as the separator in the schema string, the code runs into the
exception below when executed on the cluster (Standalone).
When I use "," as the separator, everything works fine.
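A possible cause for issue 1, sketched under an assumption: the schema string is turned into field names with String.split, as in the programmatic-schema example of the Spark SQL programming guide. String.split takes a regular expression, and "|" is the regex alternation operator, so it does not split on the literal pipe. The object name SplitDemo and the value schemaString are hypothetical names for illustration, not taken from my program.

```scala
object SplitDemo {
  // Hypothetical schema string using "|" as the separator.
  val schemaString = "id|gender|height"

  // String.split interprets its argument as a regex; "|" is alternation,
  // so this does NOT produce the three expected field names.
  val wrongFields: Array[String] = schemaString.split("|")

  // Escaping the pipe makes the regex match the literal character.
  val escapedFields: Array[String] = schemaString.split("\\|")

  // Scala's Char overload also treats the separator literally.
  val charFields: Array[String] = schemaString.split('|')

  def main(args: Array[String]): Unit = {
    println(wrongFields.length)            // not 3
    println(escapedFields.mkString(", "))  // prints "id, gender, height"
    println(charFields.mkString(", "))     // prints "id, gender, height"
  }
}
```

If this assumption holds, "," works because it has no special meaning in a regular expression, while "|" yields a schema whose field count no longer matches the rows.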
2. In the code, I call the DataFrame.agg() function with the
same column name used for different statistics functions (max, min, avg):
val peopleDF = sqlCtx.createDataFrame(rowRDD, schema)
peopleDF.filter(peopleDF("gender").equalTo("M")).agg(Map("height" ->
"avg","height" -> "max","height" -> "min")).show()
I find that only the last function's result is
shown (as below). Does this work as designed in Spark?
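A likely explanation for issue 2 that does not depend on Spark at all: a Scala Map keeps one value per key, so the three "height" entries collapse into one before agg() ever receives them. The sketch below shows only that collapsing; the object name AggMapDemo is a hypothetical name for illustration.

```scala
object AggMapDemo {
  // Duplicate keys in a Map literal overwrite each other; the last pair wins.
  val collapsed: Map[String, String] =
    Map("height" -> "avg", "height" -> "max", "height" -> "min")

  // A sequence of pairs keeps every (column, function) entry.
  val pairs: Seq[(String, String)] =
    Seq("height" -> "avg", "height" -> "max", "height" -> "min")

  def main(args: Array[String]): Unit = {
    println(collapsed)   // prints Map(height -> min)
    println(pairs.size)  // prints 3
  }
}
```

If I read the DataFrame API correctly, agg() also has an overload that takes the pairs directly, agg("height" -> "avg", "height" -> "max", "height" -> "min"), and one that takes Column expressions built with avg, max and min from org.apache.spark.sql.functions. Either form should avoid the Map and keep all three statistics.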
Hopefully I have described the issues clearly. Please
feel free to correct me if I have done something wrong. Thanks a lot.