Possible issue for Spark SQL/DataFrame

Netwaver Sun, 09 Aug 2015 23:45:44 -0700
Hi Spark experts,
                 I am using Spark 1.4.1 and trying out the Spark SQL/DataFrame API 
with a text file in the format below:
                        id gender height
                        1  M  180
                        2  F   167
                        ... ...
                 But I have run into the issues described below:
                 1.  In my test program, I specify the schema programmatically, 
but when I use "|" as the separator in the schema string, the code throws an 
exception when executed on the cluster (Standalone).
                  
                   When I use "," as the separator, everything works fine.
                  2.  In the code, I call DataFrame.agg() with the same column 
name used for several statistics functions (max, min, avg):
                      val peopleDF = sqlCtx.createDataFrame(rowRDD, schema)
                      peopleDF.filter(peopleDF("gender").equalTo("M"))
                        .agg(Map("height" -> "avg", "height" -> "max", "height" -> "min"))
                        .show()
                    I find that only the last function's result is shown. 
Is this the intended behavior in Spark?
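
A note on the two items above, offered as a sketch under assumptions rather than a confirmed diagnosis. For item 1, assuming the schema string is turned into field names with String.split (the post does not show that code), the "|" argument is treated as a regular expression in which the pipe means alternation, so the string is not split on the literal character. For item 2, a Scala Map literal keeps one value per key, so the three "height" pairs collapse into a single entry before agg() receives them. The object name and the schema string below are hypothetical.

```scala
import java.util.regex.Pattern

object IssueSketch {
  // Hypothetical schema string; the original post does not show the real one.
  val schemaString = "id|gender|height"

  // Item 1: String.split(String) takes a regex. An unescaped "|" is the
  // alternation operator, so the string is cut between individual characters
  // instead of at the pipe.
  val naive: Array[String] = schemaString.split("|")

  // Escaping the pipe splits on the literal character.
  val escaped: Array[String] = schemaString.split("\\|")

  // Pattern.quote builds a literal pattern for an arbitrary separator.
  val quoted: Array[String] = schemaString.split(Pattern.quote("|"))

  // Scala's Char overload of split does not interpret a regex at all.
  val byChar: Array[String] = schemaString.split('|')

  // Item 2: same literal as in the post. Later pairs overwrite earlier
  // ones because all three share the key "height".
  val exprs: Map[String, String] =
    Map("height" -> "avg", "height" -> "max", "height" -> "min")

  // A Seq of pairs keeps every entry, for comparison.
  val pairs: Seq[(String, String)] =
    Seq("height" -> "avg", "height" -> "max", "height" -> "min")

  def main(args: Array[String]): Unit = {
    println(naive.length)                     // more than the 3 expected fields
    println(escaped.mkString("[", ", ", "]")) // [id, gender, height]
    println(exprs)                            // Map(height -> min)
    println(pairs.size)                       // 3
  }
}
```

If all three statistics are wanted, the Column-based overload of agg() avoids the Map entirely, for example agg(avg("height"), max("height"), min("height")) after import org.apache.spark.sql.functions._.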
                                 
                 Hopefully I have described the "issue" clearly. Please 
feel free to correct me if I have done something wrong. Thanks a lot.