Re: Spark streaming vs. spark usage

2014-07-29 Thread andy petrella
Yep,
But RDD/DStream would hardly fit the Monad contract (discussed several
times, and still under discussion here and there ;))
For instance, look at the signature of flatMap in both traits.

That said, an RDD that can generate other RDDs (via flatMap) would rather be something
like a DStream or a 'CRDD' (@see dev list :P)
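
To make the mismatch concrete, here is a rough sketch; the signatures are quoted from memory and should be treated as illustrative, not exact:

import scala.language.higherKinds

// What a Monad-style contract would require (illustrative, not Spark code):
trait Monadish[F[_]] {
  // the per-element function returns the container type itself
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
}

// What RDD and DStream actually expose (modulo implicits; exact types vary
// slightly by version), roughly:
//   RDD[T]:     def flatMap[U](f: T => TraversableOnce[U]): RDD[U]
//   DStream[T]: def flatMap[U](f: T => TraversableOnce[U]): DStream[U]
// The per-element function returns a local collection, not another
// distributed collection, so the Monad-style trait above does not fit directly.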

My 2c
Andy

--sent from crappy phone
On Dec 19, 2013 at 07:09, "Ashish Rangole"  wrote:

> I wonder if it will help to have a generic Monad container that wraps
> either RDD or DStream and provides
> map, flatmap, foreach and filter methods.
>
> case class DataMonad[A](data: A) {
>   def map[B]( f : A => B ) : DataMonad[B] = {
>     DataMonad( f( data ) )
>   }
>
>   def flatMap[B]( f : A => DataMonad[B] ) : DataMonad[B] = {
>     f( data )
>   }
>
>   def foreach ...
>   def withFilter ...
>   :
>   :
>   etc, something like that
> }
>
> On Wed, Dec 18, 2013 at 10:42 PM, Reynold Xin  wrote:
>
>>
>> On Wed, Dec 18, 2013 at 12:17 PM, Nathan Kronenfeld <
>> nkronenf...@oculusinfo.com> wrote:
>>
>>>
>>>
>>> Since many of the functions exist in parallel between the two, I guess I
>>> would expect something like:
>>>
>>> trait BasicRDDFunctions {
>>> def map...
>>> def reduce...
>>> def filter...
>>> def foreach...
>>> }
>>>
>>> class RDD extends BasicRDDFunctions...
>>> class DStream extends BasicRDDFunctions...
>>>
>>
>> I like this idea. We should discuss more about it on the dev list. It
>> would require refactoring some APIs, but does lead to better unification.
>>
>
>


Re: Spark streaming vs. spark usage

2014-07-28 Thread Ankur Dave
On Mon, Jul 28, 2014 at 12:53 PM, Nathan Kronenfeld <
nkronenf...@oculusinfo.com> wrote:

> But when done processing, one would still have to pull out the wrapped
> object, knowing what it was, and I don't see how to do that.


It's pretty tricky to get the level of type safety you're looking for. I
know of two ways:

1. Leave RDD and DStream as they are, but define a typeclass that allows
converting them to a common DistributedCollection type (a rough sketch of
this follows below).

2. Make RDD and DStream inherit from a common DistributedCollection trait,
as in your example, but use F-bounded polymorphism to express the concrete
types (also sketched below).
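
For concreteness, minimal sketches of what each option might look like; the names and helper types are illustrative, and these are not the examples originally linked:

// Option 1 (sketch): a typeclass over the existing RDD and DStream types.
import scala.language.higherKinds
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

trait DistributedCollection[F[_]] {
  def map[A, B: ClassTag](fa: F[A])(f: A => B): F[B]
  def filter[A](fa: F[A])(p: A => Boolean): F[A]
}

object DistributedCollection {
  implicit val rddInstance: DistributedCollection[RDD] =
    new DistributedCollection[RDD] {
      def map[A, B: ClassTag](fa: RDD[A])(f: A => B): RDD[B] = fa.map(f)
      def filter[A](fa: RDD[A])(p: A => Boolean): RDD[A] = fa.filter(p)
    }
  implicit val dstreamInstance: DistributedCollection[DStream] =
    new DistributedCollection[DStream] {
      def map[A, B: ClassTag](fa: DStream[A])(f: A => B): DStream[B] = fa.map(f)
      def filter[A](fa: DStream[A])(p: A => Boolean): DStream[A] = fa.filter(p)
    }
}

// Generic code keeps the concrete type, so nothing has to be unwrapped:
def doubleAll[F[_]](xs: F[Int])(implicit dc: DistributedCollection[F]): F[Int] =
  dc.map(xs)(_ * 2)

// Option 2 (sketch): F-bounded polymorphism, so each subtype's operations
// return that subtype. Toy types only; a real RDD/DStream would delegate to Spark.
trait DistCollection[T, Self[X] <: DistCollection[X, Self]] {
  def map[U](fcn: T => U): Self[U]
}

class LocalRDD[T](val data: Seq[T]) extends DistCollection[T, LocalRDD] {
  def map[U](fcn: T => U): LocalRDD[U] = new LocalRDD(data.map(fcn))
}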

Ankur 


Re: Spark streaming vs. spark usage

2014-07-28 Thread Nathan Kronenfeld
So after months and months, I finally started to try to tackle this, but
my Scala ability isn't up to it.

The problem is that, of course, even with the common interface, we don't
want interoperability between RDDs and DStreams.

I looked into Monads, as per Ashish's suggestion, and I think I understand
their relevance.  But when done processing, one would still have to pull
out the wrapped object, knowing what it was, and I don't see how to do that.
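
For illustration, a self-contained sketch of that problem (DataMonad as in Ashish's example quoted below; the wrapped values are hypothetical):

// The wrapper is fully generic, so code written against it can only hand
// `data` back; the caller must already know what concrete type it holds.
case class DataMonad[A](data: A) {
  def map[B](f: A => B): DataMonad[B] = DataMonad(f(data))
}

// val a = DataMonad(myRdd)      // DataMonad[RDD[Int]]     (hypothetical myRdd)
// val b = DataMonad(myStream)   // DataMonad[DStream[Int]] (hypothetical myStream)
// Anything written against a common DataMonad[_] loses track of which one it
// has, and pulling the payload back out means knowing (or casting to) the
// concrete RDD or DStream type.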

I'm guessing there is a way to do this in Scala, but I'm not seeing it.

In detail, the requirement would be something on the order of:

abstract class DistributedCollection[T] {
  def map[U](fcn: T => U): DistributedCollection[U]
  ...
}

class RDD[T] extends DistributedCollection[T] {
  // Note the return type that doesn't quite match the interface
  def map[U](fcn: T => U): RDD[U]
  ...
}

class DStream[T] extends DistributedCollection[T] {
  // Note the return type that doesn't quite match the interface
  def map[U](fcn: T => U): DStream[U]
  ...
}

Can anyone point me at a way to do this?

Thanks,
 -Nathan



On Thu, Dec 19, 2013 at 1:08 AM, Ashish Rangole  wrote:

> I wonder if it will help to have a generic Monad container that wraps
> either RDD or DStream and provides
> map, flatmap, foreach and filter methods.
>
> case class DataMonad[A](data: A) {
>   def map[B]( f : A => B ) : DataMonad[B] = {
>     DataMonad( f( data ) )
>   }
>
>   def flatMap[B]( f : A => DataMonad[B] ) : DataMonad[B] = {
>     f( data )
>   }
>
>   def foreach ...
>   def withFilter ...
>   :
>   :
>   etc, something like that
> }
>
> On Wed, Dec 18, 2013 at 10:42 PM, Reynold Xin  wrote:
>
>>
>> On Wed, Dec 18, 2013 at 12:17 PM, Nathan Kronenfeld <
>> nkronenf...@oculusinfo.com> wrote:
>>
>>>
>>>
>>> Since many of the functions exist in parallel between the two, I guess I
>>> would expect something like:
>>>
>>> trait BasicRDDFunctions {
>>> def map...
>>> def reduce...
>>> def filter...
>>> def foreach...
>>> }
>>>
>>> class RDD extends BasicRDDFunctions...
>>> class DStream extends BasicRDDFunctions...
>>>
>>
>> I like this idea. We should discuss more about it on the dev list. It
>> would require refactoring some APIs, but does lead to better unification.
>>
>
>


-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  nkronenf...@oculusinfo.com