Hi Sai, I honestly don't see where you are using RDDs (the split method isn't defined on them). In any case, you should use map instead of foreach here: foreach is meant for side effects, and since Spark may recompute partitions, your function can run multiple times — and accumulating into an external list that way is NOT idempotent.
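To illustrate the point, here is a plain-Scala sketch (no Spark; the input string is made up) of replacing the side-effecting foreach with a pure map — the same transformation is safe to re-run any number of times:

```scala
object ColumnExtract {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample input: whitespace-separated columns, third column numeric
    val input = "a b 10\nc d 25\ne f 7"

    // Pure transformation: map each line to its third column as an Int,
    // instead of mutating an external list inside foreach
    val result = input.split("\n").map(_.split("\\s+")(2).toInt)

    println(result.max) // 25
    println(result.min) // 7
    println(result.sum) // 42
  }
}
```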
What you may be looking for is something like this (textFile already yields one record per line, so there is no need to split on "\\n" again):

val input = sc.textFile(inputFile)
val result = input.map(line => line.split("\\s+")(2).toInt)
result.max
result.min
result.filter ??

Best,
EA

2014-04-22 11:02 GMT+02:00 Sai Prasanna <ansaiprasa...@gmail.com>:

> Hi All,
>
> I want to access a particular column of a DB table stored in a CSV format
> and perform some aggregate queries over it. I wrote the following query in
> Scala as a first step.
>
> var add = (x: String) => x.split("\\s+")(2).toInt
> var result = List[Int]()
>
> input.split("\\n").foreach(x => result ::= add(x))
> [Queries:] result.max/min/filter/sum...
>
> But is there an efficient way/built-in function to access a particular
> column value or an entire column in Spark? Because a built-in
> implementation might be more efficient!
>
> Thanks.
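Since the quoted question mentions CSV specifically, here is a comma-delimited variant of the same extraction as a plain-Scala sketch (no Spark; the sample data and column index are made up for illustration):

```scala
object CsvColumn {
  def main(args: Array[String]): Unit = {
    // Hypothetical CSV input: the third field (index 2) is the numeric column of interest
    val csv = "id,name,score\n1,alice,30\n2,bob,12\n3,carol,21"

    // Drop the header line, split on commas, and pull out column index 2
    val scores = csv.split("\n").drop(1).map(_.split(",")(2).toInt)

    println(scores.max)                 // 30
    println(scores.min)                 // 12
    println(scores.filter(_ > 15).sum)  // 51
  }
}
```

With an RDD the pipeline is the same shape: replace `csv.split("\n")` with `sc.textFile(path)` and the per-line map carries over unchanged.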