On Fri, Jul 11, 2014 at 10:53 PM, bdamos <a...@adobe.com> wrote:
> I didn't make it clear in my first message that I want to obtain an RDD
> instead of an Iterable, and will be doing map-reduce like operations on the
> data by day. My problem is that groupBy returns an RDD[(K, Iterable[T])],
> but I really want an RDD[(K, RDD[T])].
> Is there a better approach to this?

Yeah, you can't have an RDD of RDDs. Why does it need to be an RDD? Is it because a single day could hold a huge amount of data? I ask because Scala collections have map and reduce methods and the like too.

If you really do want RDDs, you can just make a series of them, one per day, with some code like:

    (start/86400 to end/86400).map(day =>
      (day, rdd.filter(rec => rec.time >= day*86400 && rec.time < (day+1)*86400)))

I think that's your solution 1. Each filter is a separate pass over the data, but I don't imagine it's that bad if this is what you need to do.
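
For what it's worth, here is a slightly fuller sketch of that per-day filter idea. It is only an illustration under some assumptions: the record type `Rec`, its `value` field, the object name `PerDayRdds`, and the helper `splitByDay` are all made up for this example, and `time` is assumed to be epoch seconds. On older Spark releases you may also need `import org.apache.spark.SparkContext._` for `sum()` to resolve.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical record type; `time` is assumed to be epoch seconds.
case class Rec(time: Long, value: Double)

object PerDayRdds {
  val SecondsPerDay = 86400L

  // Builds one filtered RDD per day index (day = epoch seconds / 86400).
  // Each filter is a separate pass over `rdd`, so caching the input
  // beforehand is likely worthwhile if it fits in memory.
  def splitByDay(rdd: RDD[Rec], start: Long, end: Long): Seq[(Long, RDD[Rec])] =
    (start / SecondsPerDay to end / SecondsPerDay).map { day =>
      val lo = day * SecondsPerDay
      val hi = lo + SecondsPerDay
      (day, rdd.filter(rec => rec.time >= lo && rec.time < hi))
    }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption made only so the sketch runs standalone.
    val sc = new SparkContext(
      new SparkConf().setAppName("per-day-rdds").setMaster("local[*]"))
    try {
      val recs = sc.parallelize(Seq(
        Rec(10L, 1.0),
        Rec(SecondsPerDay + 5, 2.0),
        Rec(SecondsPerDay + 9, 3.0)
      )).cache()

      // The result is a plain Scala Seq on the driver, so ordinary
      // collection methods drive the per-day Spark jobs.
      splitByDay(recs, 0L, 2 * SecondsPerDay - 1).foreach { case (day, dayRdd) =>
        println(s"day $day: sum = ${dayRdd.map(_.value).sum()}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

The outer collection lives on the driver and only the inner per-day data is distributed, which is as close to an RDD[(K, RDD[T])] as you can get. If each day's data fits comfortably in memory on one executor, sticking with groupBy and the Iterable is probably simpler.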