If you're going to do it this way, I would output dayOfdate.substring(0,7),
i.e. the month part, and instead of weatherCond you can emit
(month, (minDeg, maxDeg, meanDeg)) -- i.e. a pair RDD, so weathersRDD:
RDD[(String, (Double, Double, Double))]. Then use a reduceByKey, as shown in
multiple Spark examples. You'd end up with the sum of each metric, and at the
end you divide by the count to get the average of each column. If you want to
use Algebird you can output (month, (Avg(minDeg), Avg(maxDeg), Avg(meanDeg)))
and then all your reduce operations would be _ + _.
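
Something along these lines (a rough, untested sketch; it parses the CSV the
same way your code below does, skips the header row, and carries a count
alongside the three running sums):

import org.apache.spark.SparkContext._  // pair-RDD implicits on pre-1.3 Spark

// assumes an existing SparkContext named sc, as in your code
val monthlyAvgs = sc.textFile("weather.csv")
  .map(_.replaceAll("\"", "").trim.split(","))
  .filter(_(0) != "day")  // drop the header row
  .map { case Array(day, minDeg, maxDeg, meanDeg) =>
    // key on the YYYY-MM prefix; the 1L is the per-row count
    (day.substring(0, 7), (minDeg.toDouble, maxDeg.toDouble, meanDeg.toDouble, 1L))
  }
  .reduceByKey { case ((a1, b1, c1, n1), (a2, b2, c2, n2)) =>
    (a1 + a2, b1 + b2, c1 + c2, n1 + n2)  // sum each metric plus the count
  }
  .mapValues { case (minSum, maxSum, meanSum, n) =>
    (minSum / n, maxSum / n, meanSum / n)  // divide each sum by the count
  }

monthlyAvgs.collect().foreach(println)
// on your sample: (2014-03,(3.0,8.666666666666666,7.333333333333333))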
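
The Algebird variant would look roughly like this (untested; assumes
algebird-core is on your classpath -- Algebird doesn't literally call it Avg,
its averaging monoid is AveragedValue, and the Operators._ import is what
gives you the generic _ + _):

import com.twitter.algebird.AveragedValue
import com.twitter.algebird.Operators._  // enables + for any Algebird Semigroup

// weathersRDD: RDD[(String, (Double, Double, Double))], keyed by month as above
val monthlyAvgs = weathersRDD
  .mapValues { case (minDeg, maxDeg, meanDeg) =>
    // AveragedValue(count, mean): a count of 1 with the row's value
    (AveragedValue(1L, minDeg), AveragedValue(1L, maxDeg), AveragedValue(1L, meanDeg))
  }
  .reduceByKey(_ + _)  // tuples of semigroups are semigroups, so this just works
  .mapValues { case (minAvg, maxAvg, meanAvg) =>
    (minAvg.value, maxAvg.value, meanAvg.value)  // .value is the running mean
  }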

With that said, if you're using Spark 1.3, check out
https://github.com/databricks/spark-csv (you should probably use the CSV
package anyway, even with a lower version of Spark) and
https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.DataFrame
(especially the example at the top of the page). You'd just need .groupBy and
.agg, if you set up the DataFrame column you're grouping by to contain just
the YYYY-MM portion of your date string. A sketch is below.
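
Roughly like this on Spark 1.3 (untested; start the shell with --packages and
the spark-csv coordinates for your Scala/Spark versions, see the spark-csv
README, and it assumes a clean header row day,min,max,mean):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.avg

val sqlContext = new SQLContext(sc)

// load the CSV through the spark-csv data source, using the header row
val df = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "weather.csv", "header" -> "true"))

// derive a YYYY-MM column, then group and average; the casts are needed
// because spark-csv reads every column as a string unless you give it a schema
val monthly = df
  .withColumn("month", df("day").substr(1, 7))
  .groupBy("month")
  .agg(avg(df("min").cast("double")),
       avg(df("max").cast("double")),
       avg(df("mean").cast("double")))

monthly.show()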

On Mon, Apr 6, 2015 at 10:50 AM, barisak <baris.akg...@gmail.com> wrote:

> Hi
>
> I have a class as described below.
>
> case class weatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int, meanDeg: Int)
>
> I am reading the data from a CSV file and putting it into the weatherCond
> class with this code:
>
> val weathersRDD = sc.textFile("weather.csv").map { line =>
>   val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
>     line.replaceAll("\"", "").trim.split(",")
>   weatherCond(dayOfdate, minDeg.toInt, maxDeg.toInt, meanDeg.toInt)
> }
>
> The question is: how can I average the minDeg, maxDeg and meanDeg values
> for each month?
>
> An example of the data set:
>
> day,min,max,mean
> 2014-03-17,-3,5,5
> 2014-03-18,6,7,7
> 2014-03-19,6,14,10
>
> The expected result is (2014-03, 3, 8.67, 7.33), i.e. the averages for
> 2014-03.
>
> Thanks
