Thanks for your replies. I solved the problem with this code:

val weathersRDD = sc.textFile(csvfilePath).map { line =>
  val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
    line.replaceAll("\"", "").trim.split(",")
  // key by "yyyy-MM" so we can aggregate per month
  (dayOfdate.substring(0, 7), (minDeg.toInt, maxDeg.toInt, meanDeg.toInt))
}.mapValues(x => (x, 1))
 .reduceByKey { (x, y) =>
   ((x._1._1 + y._1._1, x._1._2 + y._1._2, x._1._3 + y._1._3), x._2 + y._2)
 }
 .mapValues { case ((sumMin, sumMax, sumMean), count) =>
   (1.0 * sumMin / count, 1.0 * sumMax / count, 1.0 * sumMean / count)
 }.collectAsMap()
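For anyone who wants to check the arithmetic without a Spark context, the same group-by-month averaging can be sketched with plain Scala collections. This is only an illustration, not the Spark code itself; the sample rows are the ones from the original post, and the `monthlyAverages` name is mine:

```scala
// Monthly averages with plain Scala collections, mirroring the RDD logic above.
// Sample data taken from the original post (day, min, max, mean).
val rows = Seq(
  ("2014-03-17", -3, 5, 5),
  ("2014-03-18", 6, 7, 7),
  ("2014-03-19", 6, 14, 10)
)

val monthlyAverages: Map[String, (Double, Double, Double)] =
  rows
    .groupBy { case (day, _, _, _) => day.substring(0, 7) } // key = "yyyy-MM"
    .map { case (month, recs) =>
      val n = recs.size.toDouble
      month -> (
        recs.map(_._2).sum / n, // average min
        recs.map(_._3).sum / n, // average max
        recs.map(_._4).sum / n  // average mean
      )
    }

// For 2014-03 this gives (3.0, 8.666..., 7.333...),
// matching the expected result (2014-03, 3, 8.6, 7.3) from the thread.
println(monthlyAverages("2014-03"))
```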
but I will also try the DataFrame API. Thanks again.

2015-04-06 13:31 GMT-04:00 Cheng, Hao <hao.ch...@intel.com>:
> The Dataframe API should be perfectly helpful in this case.
> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html
>
> Some code snippet will look like:
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> // this is used to implicitly convert an RDD to a DataFrame.
> import sqlContext.implicits._
> weathersRDD.toDF.registerTempTable("weathers")
> val results = sqlContext.sql(
>   "SELECT avg(minDeg), avg(maxDeg), avg(meanDeg) FROM weathers " +
>   "GROUP BY dayToMonth(dayOfDate)")
> results.collect.foreach(println)
>
>
> -----Original Message-----
> From: barisak [mailto:baris.akg...@gmail.com]
> Sent: Monday, April 6, 2015 10:50 PM
> To: user@spark.apache.org
> Subject: Spark Avarage
>
> Hi
>
> I have a class as described below.
>
> case class weatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int,
>   meanDeg: Int)
>
> I am reading the data from a csv file and putting it into weatherCond
> instances with this code:
>
> val weathersRDD = sc.textFile("weather.csv").map { line =>
>   val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
>     line.replaceAll("\"", "").trim.split(",")
>   weatherCond(dayOfdate, minDeg.toInt, maxDeg.toInt, meanDeg.toInt)
> }
>
> The question is: how can I average the minDeg, maxDeg and meanDeg values
> for each month?
>
> Data set example:
>
> day, min, max, mean
> 2014-03-17,-3,5,5
> 2014-03-18,6,7,7
> 2014-03-19,6,14,10
>
> The result has to be (2014-03, 3, 8.6, 7.3) -- (average for 2014-03)
>
> Thanks
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Avarage-tp22391.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional
> commands, e-mail: user-h...@spark.apache.org
>
>
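A follow-up note on the quoted snippet: it calls a `dayToMonth` UDF that is never defined in the thread. One way to avoid writing a UDF at all is to group on a substring of the date column with the DataFrame API itself. This is only a sketch, assuming Spark 1.3 (as in the linked guide), an existing `sc`, and the `weathersRDD` of `weatherCond` objects from the original post; it has not been run against a cluster:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.avg

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // enables .toDF() on the RDD of case classes

val df = weathersRDD.toDF()

// "yyyy-MM-dd".substr(1, 7) yields "yyyy-MM", so grouping on it
// aggregates per month without any custom UDF.
val monthly = df
  .groupBy(df("dayOfdate").substr(1, 7).as("month"))
  .agg(avg("minDeg"), avg("maxDeg"), avg("meanDeg"))

monthly.collect().foreach(println)
```

The design trade-off versus the accepted RDD solution is mainly readability: the aggregation intent (average per month) is explicit here, and Spark's SQL engine handles the sum/count bookkeeping that the `reduceByKey` version spells out by hand.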