If you're going to do it this way, I would output dayOfdate.substring(0,7), i.e. the month part, and instead of weatherCond you can use (month, (minDeg, maxDeg, meanDeg)), i.e. a PairRDD, so weathersRDD: RDD[(String, (Double, Double, Double))]. Then use a reduceByKey as shown in many Spark examples. You'd end up with the sum for each metric, and at the end divide by the count to get the average of each column. If you want to use Algebird, you can output (month, (Avg(minDeg), Avg(maxDeg), Avg(meanDeg))) and then all your reduce operations would just be _+_.
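Roughly, something like this (just a sketch, reusing the weathersRDD from the snippet in your message below; I carry a count along in the value tuple so the division can happen at the end, and monthlyAvg is just a name I picked):

val monthlyAvg = weathersRDD
  .map(w => (w.dayOfdate.substring(0, 7),                      // "YYYY-MM" as the key
             (w.minDeg.toDouble, w.maxDeg.toDouble, w.meanDeg.toDouble, 1L)))
  .reduceByKey { case ((min1, max1, mean1, n1), (min2, max2, mean2, n2)) =>
    (min1 + min2, max1 + max2, mean1 + mean2, n1 + n2)          // sum each metric and the row count
  }
  .mapValues { case (minSum, maxSum, meanSum, n) =>
    (minSum / n, maxSum / n, meanSum / n)                       // divide the sums by the count
  }

monthlyAvg.collect().foreach { case (month, (avgMin, avgMax, avgMean)) =>
  println(s"$month,$avgMin,$avgMax,$avgMean")
}

The Algebird variant would look the same shape-wise, just with AveragedValue-style wrappers in the tuple so the reduce is a plain sum.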
With that said, if you're using Spark 1.3, check out https://github.com/databricks/spark-csv (you should likely use the CSV package anyway, even with a lower version of Spark) and https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.DataFrame (esp. the example at the top of the page). You'd just need .groupBy and .agg if you set up the DataFrame column you're grouping by to contain just the YYYY-MM portion of your date string (rough sketch below, after your quoted message).

On Mon, Apr 6, 2015 at 10:50 AM, barisak <baris.akg...@gmail.com> wrote:

> Hi
>
> I have a class in above desc.
>
> case class weatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int, meanDeg: Int)
>
> I am reading the data from csv file and I put this data into weatherCond
> class with this code
>
> val weathersRDD = sc.textFile("weather.csv").map { line =>
>   val Array(dayOfdate, minDeg, maxDeg, meanDeg) = line.replaceAll("\"", "").trim.split(",")
>   weatherCond(dayOfdate, minDeg.toInt, maxDeg.toInt, meanDeg.toInt)
> }
>
> the question is ; how can I average the minDeg, maxDeg and meanDeg values
> for each month ;
>
> The data set example
>
> day, min, max , mean
> 2014-03-17,-3,5,5
> 2014-03-18,6,7,7
> 2014-03-19,6,14,10
>
> result has to be (2014-03, 3, 8.6, 7.3) -- (Average for 2014-03)
>
> Thanks
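For reference, here's a rough sketch of that DataFrame route. It assumes Spark 1.3 with the spark-csv package on the classpath, and that the header row parses to the column names day, min, max and mean as in your sample data; adjust the path and names to your actual file:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

val sqlContext = new SQLContext(sc)

// spark-csv, Spark 1.3 style: read the file using its header row.
val weather = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "weather.csv", "header" -> "true"))

// Keep only the YYYY-MM part of the date, cast the metrics to Double,
// then group by month and average each column.
val monthlyAvg = weather
  .select(
    col("day").substr(1, 7).as("month"),
    col("min").cast("double").as("min"),
    col("max").cast("double").as("max"),
    col("mean").cast("double").as("mean"))
  .groupBy("month")
  .agg(avg("min"), avg("max"), avg("mean"))

monthlyAvg.show()

(On Spark 1.4+ you'd use sqlContext.read.format("com.databricks.spark.csv") instead of the older load(...) call, but the groupBy/agg part stays the same.)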