Thanks for your replies. I solved the problem with this code:

val weathersRDD = sc.textFile(csvfilePath).map { line =>
  val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
    line.replaceAll("\"", "").trim.split(",")
  (dayOfdate.substring(0, 7), (minDeg.toInt, maxDeg.toInt, meanDeg.toInt))
}.mapValues(x => (x, 1))
 .reduceByKey((x, y) =>
   ((x._1._1 + y._1._1, x._1._2 + y._1._2, x._1._3 + y._1._3), x._2 + y._2))
 .mapValues { case ((sumMin, sumMax, sumMean), count) =>
   (1.0 * sumMin / count, 1.0 * sumMax / count, 1.0 * sumMean / count)
 }
 .collectAsMap()
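As a side note, the same per-month averaging logic can be sanity-checked on plain Scala collections without a Spark cluster; here `groupBy` plays the role of `reduceByKey`, and the `MonthlyAverage` object and sample rows are mine for illustration:

```scala
object MonthlyAverage {
  // Average min/max/mean per "yyyy-MM" month from "yyyy-MM-dd,min,max,mean" lines.
  def monthlyAverages(rows: Seq[String]): Map[String, (Double, Double, Double)] =
    rows.map { line =>
      val Array(day, minDeg, maxDeg, meanDeg) =
        line.replaceAll("\"", "").trim.split(",")
      (day.substring(0, 7), (minDeg.toInt, maxDeg.toInt, meanDeg.toInt))
    }.groupBy(_._1).map { case (month, recs) =>
      val n = recs.size
      // Sum the three components, then divide by the record count.
      val sums = recs.map(_._2).reduce((a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3))
      month -> (1.0 * sums._1 / n, 1.0 * sums._2 / n, 1.0 * sums._3 / n)
    }

  def main(args: Array[String]): Unit = {
    val sample = Seq("2014-03-17,-3,5,5", "2014-03-18,6,7,7", "2014-03-19,6,14,10")
    monthlyAverages(sample).foreach(println)
  }
}
```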


but I will also try the DataFrame API.
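My first attempt at it will probably look something like this (an untested sketch against the Spark 1.3 API; I extract the month with `substring` instead of a `dayToMonth` helper):

```scala
// Untested sketch: monthly averages via the Spark 1.3 DataFrame API.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

case class WeatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int, meanDeg: Int)

val weathersDF = sc.textFile(csvfilePath).map { line =>
  val Array(day, minDeg, maxDeg, meanDeg) =
    line.replaceAll("\"", "").trim.split(",")
  WeatherCond(day, minDeg.toInt, maxDeg.toInt, meanDeg.toInt)
}.toDF()

weathersDF.registerTempTable("weathers")
// substring(dayOfdate, 1, 7) yields the "yyyy-MM" month key.
val results = sqlContext.sql(
  "SELECT substring(dayOfdate, 1, 7) AS month, " +
  "avg(minDeg), avg(maxDeg), avg(meanDeg) " +
  "FROM weathers GROUP BY substring(dayOfdate, 1, 7)")
results.collect().foreach(println)
```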

thanks again



2015-04-06 13:31 GMT-04:00 Cheng, Hao <hao.ch...@intel.com>:

> The Dataframe API should be perfectly helpful in this case.
> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html
>
> A code snippet would look like:
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> // this is used to implicitly convert an RDD to a DataFrame.
> import sqlContext.implicits._
> weathersRDD.toDF.registerTempTable("weathers")
> val results = sqlContext.sql(
>   "SELECT avg(minDeg), avg(maxDeg), avg(meanDeg) FROM weathers GROUP BY dayToMonth(dayOfDate)")
> results.collect.foreach(println)
>
>
> -----Original Message-----
> From: barisak [mailto:baris.akg...@gmail.com]
> Sent: Monday, April 6, 2015 10:50 PM
> To: user@spark.apache.org
> Subject: Spark Avarage
>
> Hi
>
> I have the following case class:
>
> case class weatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int,
> meanDeg: Int)
>
> I am reading the data from a CSV file and putting it into the weatherCond
> class with this code:
>
> val weathersRDD = sc.textFile("weather.csv").map { line =>
>   val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
>     line.replaceAll("\"", "").trim.split(",")
>   weatherCond(dayOfdate, minDeg.toInt, maxDeg.toInt, meanDeg.toInt)
> }
>
> The question is: how can I average the minDeg, maxDeg and meanDeg values
> for each month?
>
> The data set example
>
> day, min, max , mean
> 2014-03-17,-3,5,5
> 2014-03-18,6,7,7
> 2014-03-19,6,14,10
>
> The result has to be (2014-03, 3, 8.6, 7.3) -- the averages for 2014-03.
>
> Thanks
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Avarage-tp22391.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
