On Fri, Jul 11, 2014 at 10:53 PM, bdamos <a...@adobe.com> wrote:
> I didn't make it clear in my first message that I want to obtain an RDD
> instead of an Iterable, and will be doing map-reduce like operations on the
> data by day. My problem is that groupBy returns an RDD[(K, Iterable[T])],
> but I really want an RDD[(K, RDD[T])].
> Is there a better approach to this?

Yeah, you can't have an RDD of RDDs. Why does it need to be an RDD?
Is it because a single day could hold a huge amount of data? I ask
because Scala collections have map and reduce methods and the like
too, so the Iterable you get from groupBy may already be enough.
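
To illustrate the point about Scala collections, here is a rough
sketch of a per-day reduction done directly on the grouped Iterable.
It assumes each day's records fit in memory on one executor. The Rec
type and its value field are made up for the example; only rec.time
(taken to be epoch seconds) comes from this thread.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on Spark 1.x

// Hypothetical record type; `value` is an assumed field for illustration.
case class Rec(time: Long, value: Double)

object PerDayTotals {
  val SecondsPerDay = 86400L

  // Group records by day index, then reduce each day's Iterable with
  // ordinary Scala collection methods.
  def totalsByDay(rdd: RDD[Rec]): RDD[(Long, Double)] =
    rdd
      .groupBy(rec => rec.time / SecondsPerDay)   // RDD[(Long, Iterable[Rec])]
      .mapValues(recs => recs.map(_.value).sum)   // plain Scala map + sum
}
```

If a single day is too large for that, this approach probably won't
work and separate RDDs are the way to go.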

I think that if you really want RDDs you can just make a series of
them, with some code like

  (start/86400 to end/86400).map(day =>
    (day, rdd.filter(rec =>
      rec.time >= day*86400 && rec.time < (day+1)*86400)))

I think that's your solution 1. I don't imagine it's that bad if this
is what you need to do.
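
Spelled out a bit more, the per-day filter could look like the sketch
below. The Rec type, the object name and the function name are
placeholders I made up; start and end are assumed to be epoch seconds,
like rec.time.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical record type; only `time` (epoch seconds) is from the thread.
case class Rec(time: Long, value: Double)

object SplitByDay {
  val SecondsPerDay = 86400L

  // Build one filtered RDD per day index between start and end (inclusive).
  // The result is a local Seq of RDDs, not an RDD of RDDs.
  def rddsByDay(rdd: RDD[Rec], start: Long, end: Long): Seq[(Long, RDD[Rec])] =
    (start / SecondsPerDay to end / SecondsPerDay).map { day =>
      val from  = day * SecondsPerDay   // first second of the day
      val until = from + SecondsPerDay  // first second of the next day
      (day, rdd.filter(rec => rec.time >= from && rec.time < until))
    }
}
```

Each filter is a separate pass over rdd, so it may be worth calling
rdd.cache() first if the data fits in memory.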
