I can't say anything specific about the separator, because it's not clear how
you create the schema from schemaString.
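One thing worth checking, purely as a guess since your code isn't shown: if the schema is built with something like schemaString.split("|"), keep in mind that String.split takes a regular expression and "|" is the alternation metacharacter there, so it does not split on the pipe itself. A minimal sketch of the difference (the object name and the sample schemaString are made up for illustration):

```scala
object SeparatorCheck {
  // Hypothetical schema string; the real one is not shown in the thread.
  val schemaString = "id|gender|height"

  // "|" is a regex here (empty alternation), so the split happens between
  // characters instead of on the pipe and yields many more than 3 pieces.
  val unescaped: Array[String] = schemaString.split("|")

  // Escaping the pipe makes the regex match the literal character.
  val escaped: Array[String] = schemaString.split("\\|")

  // Scala also offers a Char overload that involves no regex at all.
  val byChar: Array[String] = schemaString.split('|')

  def main(args: Array[String]): Unit = {
    println(unescaped.length)
    println(escaped.mkString(", "))
    println(byChar.mkString(", "))
  }
}
```

If that is what happens in your code, it would also explain why "," works fine, since a comma has no special meaning in a regex.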
The second issue is expected: agg() takes a Map there, and a Map can hold
only one value per key. Each "height" entry overwrites the previous one, so you
see only the last one, "min".
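You can see this without Spark at all. The sketch below builds the same Map as in your agg() call in plain Scala (the object and value names are just for illustration):

```scala
object DuplicateKeys {
  // Same literal as in the agg() call: three entries, all with the key "height".
  val aggSpec: Map[String, String] =
    Map("height" -> "avg", "height" -> "max", "height" -> "min")

  def main(args: Array[String]): Unit = {
    // Later entries replace earlier ones with the same key.
    println(aggSpec.size)      // prints 1
    println(aggSpec("height")) // prints min
  }
}
```

So by the time Spark sees the argument, the "avg" and "max" entries are already gone.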
Here is the Scaladoc for the agg() function; you can try it this way:
/**
 * Aggregates on the entire [[DataFrame]] without groups.
 * {{{
 *   // df.agg(...) is a shorthand for df.groupBy().agg(...)
 *   df.agg(max($"age"), avg($"salary"))
 *   df.groupBy().agg(max($"age"), avg($"salary"))
 * }}}
 * @group dfops
 */
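Applied to your case, that would look roughly like the sketch below. It assumes the Spark 1.4 API (org.apache.spark.sql.functions) and a local master. The Person case class, the third sample row, and the object and method names are made up for illustration, so adapt them to your own rowRDD and schema.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{avg, max, min}

object HeightStats {
  // Hypothetical row type standing in for the programmatic schema.
  case class Person(id: Int, gender: String, height: Int)

  // Column-based agg(): each function call is a separate expression,
  // so using the same column name three times is fine.
  def maleHeightStats(peopleDF: DataFrame): DataFrame =
    peopleDF
      .filter(peopleDF("gender") === "M")
      .agg(avg("height"), max("height"), min("height"))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("height-stats").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlCtx = new SQLContext(sc)
    import sqlCtx.implicits._

    // Sample data; the first two rows are from the original message.
    val peopleDF = sc.parallelize(Seq(
      Person(1, "M", 180),
      Person(2, "F", 167),
      Person(3, "M", 175)
    )).toDF()

    maleHeightStats(peopleDF).show()
    sc.stop()
  }
}
```

The result should have one row with three columns, one per aggregate, instead of the single column you get from the Map variant.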
On 10 Aug 2015, at 09:36, Netwaver <[email protected]> wrote:
> Hi Spark experts,
> I am now using Spark 1.4.1 and trying the Spark SQL/DataFrame
> API with a text file in the format below
> id gender height
> 1 M 180
> 2 F 167
> ... ...
> But I ran into the issues described below:
> 1. In my test program, I specify the schema
> programmatically, but when I use "|" as the separator in the schema string, the
> code runs into the exception below when executed on the cluster (Standalone)
>
> When I use "," as the separator, everything works fine.
> 2. In the code, when I use the DataFrame.agg() function with
> the same column name for different statistics functions (max, min, avg):
> val peopleDF = sqlCtx.createDataFrame(rowRDD, schema)
>
> peopleDF.filter(peopleDF("gender").equalTo("M")).agg(Map("height" ->
> "avg","height" -> "max","height" -> "min")).show()
> I find that only the last function's result
> is shown (as below). Does this work as designed in Spark?
>
> Hopefully I have described the "issue" clearly; please
> feel free to correct me if I have done something wrong. Thanks a lot.
Eugene Morozov
[email protected]