Re: Possible issue for Spark SQL/DataFrame

Eugene Morozov Wed, 12 Aug 2015 04:51:01 -0700

I can't say anything specific about the separator issue, as it’s not clear how 
you create the schema from schemaString.


Regarding the second issue: that’s expected, because the argument is a Map, and 
a Map cannot hold more than one value per key. That’s why you see only the 
last value, “min”.
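
As a minimal sketch of why that happens (plain Scala, no Spark needed; the 
object name DuplicateKeys is only for illustration), the Map literal from your 
call already collapses to a single entry before agg() ever sees it:

```scala
object DuplicateKeys {
  // Same literal as in the quoted agg() call: three pairs, one distinct key.
  // Each later pair overwrites the earlier one for "height".
  val spec: Map[String, String] =
    Map("height" -> "avg", "height" -> "max", "height" -> "min")

  def main(args: Array[String]): Unit = {
    println(spec.size)      // prints 1
    println(spec("height")) // prints min
  }
}
```

So by the time Spark builds the aggregation, only "height" -> "min" is left.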

This is the Scaladoc for the agg() function; you can try it this way:
/**
 * Aggregates on the entire [[DataFrame]] without groups.
 * {{{
 *   // df.agg(...) is a shorthand for df.groupBy().agg(...)
 *   df.agg(max($"age"), avg($"salary"))
 *   df.groupBy().agg(max($"age"), avg($"salary"))
 * }}}
 * @group dfops
 */
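
Applied to your example, that could look roughly like the untested sketch 
below. It assumes peopleDF has the "gender" and "height" columns from your 
message; the names HeightStats and maleHeightStats are made up for 
illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, max, min}

object HeightStats {
  // Hypothetical helper: takes the peopleDF from the quoted message and
  // returns one row with avg, max and min of "height" for rows where
  // gender is "M".
  def maleHeightStats(peopleDF: DataFrame): DataFrame =
    peopleDF
      .filter(peopleDF("gender").equalTo("M"))
      // One Column expression per statistic, so nothing is keyed by column
      // name and nothing gets overwritten.
      .agg(avg("height"), max("height"), min("height"))
}
```

Calling HeightStats.maleHeightStats(peopleDF).show() should then print all 
three statistics instead of only the last one.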


On 10 Aug 2015, at 09:36, Netwaver <[email protected]> wrote:

> Hi Spark experts,
> I am using Spark 1.4.1 and trying the Spark SQL/DataFrame API with a text 
> file in the format below:
>     id gender height
>     1  M      180
>     2  F      167
>     ... ...
> But I have run into the issues described below:
> 1. In my test program I specify the schema programmatically, but when I use 
> "|" as the separator in the schema string, the code runs into the exception 
> below when executed on the cluster (Standalone):
>
> When I use "," as the separator, everything works fine.
> 2. In the code, I call DataFrame.agg() with the same column name mapped to 
> different statistics functions (max, min, avg):
>     val peopleDF = sqlCtx.createDataFrame(rowRDD, schema)
>     peopleDF.filter(peopleDF("gender").equalTo("M"))
>       .agg(Map("height" -> "avg", "height" -> "max", "height" -> "min"))
>       .show()
> I find that only the last function's result is shown (as below). Is this 
> the intended behavior in Spark?
>                                    
> Hopefully I have described the "issue" clearly. Please feel free to correct 
> me if I have done something wrong. Thanks a lot.
> 
> 

Eugene Morozov
[email protected]
