Hi Sai,

I don't quite see where you are using RDDs here (the split method isn't
defined on them). In any case, you should use the map function instead of
foreach: foreach's side effect is NOT idempotent, and some partitions
could be recomputed, executing the function multiple times.
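
To make the hazard concrete, here is a minimal sketch (nums is a
hypothetical RDD[Int], mirroring your result ::= pattern):

var result = List[Int]()
// The closure runs on the executors; if a partition is lost and
// recomputed, it runs again, so the side effect can happen twice.
// (On a cluster it also mutates a serialized copy of result, not
// the driver's list, so result may stay empty anyway.)
nums.foreach(x => result ::= x)

A map, by contrast, just produces a new RDD, so recomputing a partition
is harmless.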

What you are probably looking for is something like:

val input = sc.textFile(inputFile)   // already yields one record per line
val result = input.map(line => line.split("\\s+")(2).toInt)   // third whitespace-separated column
result.max
result.min

plus result.filter(...) for whatever predicate you need (see the sketch
below). Is that what you were looking for?
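
For example (just a sketch; the threshold 100 is made up):

result.filter(_ > 100).sum    // sum of the values above 100
result.filter(_ % 2 == 0).count()    // how many values are even

These are actions, so each one runs a job and returns its value to the
driver.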

Best,
EA



2014-04-22 11:02 GMT+02:00 Sai Prasanna <ansaiprasa...@gmail.com>:

> Hi All,
>
> I want to access a particular column of a DB table stored in a CSV format
> and perform some aggregate queries over it. I wrote the following query in
> scala as a first step.
>
> *var add=(x:String)=>x.split("\\s+")(2).toInt*
> *var result=List[Int]()*
>
> *input.split("\\n").foreach(x=>result::=add(x)) *
> *[Queries:]result.max/min/filter/sum...*
>
> But is there an efficient way / built-in function to access a particular
> column value, or an entire column, in Spark? A built-in implementation
> might be more efficient!
>
> Thanks.
>
