If you have already loaded the CSV data into a DataFrame, why not register it
as a table and use Spark SQL to find the max/min or any other aggregates?
SELECT MAX(column_name) FROM dftable_name ... seems natural.
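To make the suggestion concrete: in the Spark 1.x API current at the time of this thread, you would register the DataFrame with registerTempTable and run the query through the SQLContext. Since Spark itself isn't available here, the sketch below uses Python's built-in sqlite3 purely as a stand-in SQL engine to show the query shape; the table name dftable_name and columns col_a/col_b are made up for illustration.

```python
import sqlite3

# The Spark-side equivalent (Spark 1.x era) would look roughly like:
#   df.registerTempTable("dftable_name")
#   sqlContext.sql("SELECT MAX(col_a), MIN(col_a) FROM dftable_name").show()
# Below, sqlite3 stands in for Spark SQL just to demonstrate the aggregate query.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dftable_name (col_a REAL, col_b REAL)")
conn.executemany("INSERT INTO dftable_name VALUES (?, ?)",
                 [(1.0, 10.0), (5.0, 2.0), (3.0, 7.0)])

# One pass computes max and min for every column at once.
row = conn.execute(
    "SELECT MAX(col_a), MIN(col_a), MAX(col_b), MIN(col_b) FROM dftable_name"
).fetchone()
print(row)  # (5.0, 1.0, 10.0, 2.0)
```

The same single SELECT can carry MAX/MIN pairs for all columns, so one query replaces a per-column loop.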

JESSE CHEN
Big Data Performance | IBM Analytics
Office: 408 463 2296
Mobile: 408 828 9068
Email: jfc...@us.ibm.com


From:   ashensw <as...@wso2.com>
To:     user@spark.apache.org
Date:   08/28/2015 05:40 AM
Subject:        Calculating Min and Max Values using Spark Transformations?



Hi all,

I have a dataset that consists of a large number of features (columns), in
CSV format, so I loaded it into a Spark DataFrame. I then converted it into a
JavaRDD<Row>, used a Spark transformation to turn that into a
JavaRDD<String[]>, and converted that in turn into a JavaRDD<double[]>. Is
there any method to calculate the max and min values of each column in this
JavaRDD<double[]>?

Or is there any way to access the array if I store the max and min values in
an array inside the Spark transformation class?
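One idiomatic way to get per-column min and max from a JavaRDD<double[]> is a single reduce that combines rows element-wise, rather than writing into a shared array from inside a transformation (which does not work across executors). Spark isn't runnable here, so the sketch below uses plain Python with functools.reduce standing in for rdd.map(...).reduce(...); the sample rows are made up for illustration.

```python
from functools import reduce

# Each element of the JavaRDD<double[]> becomes a list of floats here.
rows = [
    [1.0, 10.0, -3.0],
    [5.0,  2.0,  8.0],
    [3.0,  7.0,  0.5],
]

def combine(a, b):
    # a and b are (mins, maxs) pairs; merge them element-wise.
    return ([min(x, y) for x, y in zip(a[0], b[0])],
            [max(x, y) for x, y in zip(a[1], b[1])])

# Seed each row as (row-as-mins, row-as-maxs), then reduce pairwise --
# the same shape as rdd.map(row -> pair).reduce(combine) in Spark,
# which is associative and so safe to run across partitions.
mins, maxs = reduce(combine, [(r, r) for r in rows])
print(mins)  # [1.0, 2.0, -3.0]
print(maxs)  # [5.0, 10.0, 8.0]
```

Because the combine step is associative and commutative, Spark can apply it within each partition first and then merge partition results, so the whole computation is one pass over the data.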

Thanks.



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Calculating-Min-and-Max-Values-using-Spark-Transformations-tp24491.html

Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org