Re: Incremental Updates to an RDD

2013-12-10 Thread Christopher Nguyen
Wes, it depends on what you mean by "sliding window" as related to "RDD": 1. Some operation over multiple rows of data within a single, large RDD, for which the operations are required to be temporally sequential. This may be the case where you're computing a running average over historic

Re: Incremental Updates to an RDD

2013-12-10 Thread Wes Mitchell
So, does that mean that if I want to do a sliding window, then I have to, in some fashion, build a stream from the RDD, push a new value on the head, filter out the oldest value, and re-persist as an RDD? On Fri, Dec 6, 2013 at 10:13 PM, Christopher Nguyen wrote: > Kyle, the fundamental contr
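The steps Wes describes (push a new value on the head, drop the oldest, keep the result as a fresh dataset) can be sketched in plain Python. This is only an illustration of the idea on a small in-memory tuple, not Spark code; the function name `slide` and the fixed `size` parameter are assumptions made for the example.

```python
from collections import deque
from typing import Tuple


def slide(window: Tuple[float, ...], new_value: float, size: int) -> Tuple[float, ...]:
    """Return a new window with new_value at the head and the oldest value dropped.

    Hypothetical helper for illustration; the window is ordered newest-first.
    """
    # A deque with maxlen discards from the opposite end once it is full.
    buf = deque(window, maxlen=size)
    buf.appendleft(new_value)  # push on the head; the oldest value falls off the tail
    return tuple(buf)          # a fresh immutable snapshot; the input tuple is untouched


w0 = (3.0, 2.0, 1.0)           # newest first
w1 = slide(w0, 4.0, size=3)    # new snapshot; w0 is still (3.0, 2.0, 1.0)
running_average = sum(w1) / len(w1)
```

Each call yields a new snapshot rather than mutating the old one, which is roughly the "re-persist" step in the question.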

Re: Incremental Updates to an RDD

2013-12-09 Thread Christopher Nguyen
Kyle, many of your design goals are something we also want. Indeed it's interesting you separate "resilient" from RDD, as I've suggested there should be ways to boost performance if you're willing to give up some or all of the "R" guarantees. We haven't started looking into this yet due to other p

Re: Incremental Updates to an RDD

2013-12-09 Thread Kyle Ellrott
I'd like to use Spark as an analytical stack; the only difference is that I would like to find the best way to connect it to a dataset that I'm actively working on. Perhaps saying 'updates to an RDD' is a bit of a loaded term, I don't need the 'resilient', just a distributed data set. Right now, the b

Re: Incremental Updates to an RDD

2013-12-06 Thread Christopher Nguyen
Kyle, the fundamental contract of a Spark RDD is that it is immutable. This follows the paradigm where data is (functionally) transformed into other data, rather than mutated. This allows these systems to make certain assumptions and guarantees that otherwise they wouldn't be able to. Now we've be
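As a rough illustration of "data is transformed into other data, rather than mutated", here is a minimal plain-Python sketch. `FrozenDataset` is a hypothetical stand-in invented for this example; it is not Spark's RDD class and models nothing beyond immutability.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class FrozenDataset:
    """Hypothetical immutable dataset used only to illustrate the contract."""
    rows: Tuple[int, ...]

    def map(self, f: Callable[[int], int]) -> "FrozenDataset":
        # Builds a new dataset; self.rows is never modified.
        return FrozenDataset(tuple(f(x) for x in self.rows))

    def filter(self, pred: Callable[[int], bool]) -> "FrozenDataset":
        # Also returns a new dataset containing only the matching rows.
        return FrozenDataset(tuple(x for x in self.rows if pred(x)))


base = FrozenDataset((1, 2, 3, 4))
doubled = base.map(lambda x: x * 2)        # new dataset derived from base
odds = base.filter(lambda x: x % 2 == 1)   # base itself is unchanged
```

Because `base` can never change, anything derived from it can be recomputed from the same input at any time.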

Incremental Updates to an RDD

2013-12-06 Thread Kyle Ellrott
I'm trying to figure out if I can use an RDD to backend an interactive server. One of the requirements would be to have incremental updates to elements in the RDD, i.e., transforms that change/add/delete a single element in the RDD. It seems pretty drastic to do a full RDD filter to remove a single el
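To make the cost in the question concrete, here is a small plain-Python sketch of change/add/delete expressed as whole-dataset transforms over an immutable tuple of keyed records. It is not Spark code; the `Record` layout and the helper names `delete_key` and `upsert` are assumptions made for the example.

```python
from typing import Tuple

Record = Tuple[str, int]          # hypothetical (key, value) layout
Dataset = Tuple[Record, ...]


def delete_key(data: Dataset, key: str) -> Dataset:
    # Removing one element still scans every record and builds a new dataset.
    return tuple(r for r in data if r[0] != key)


def upsert(data: Dataset, key: str, value: int) -> Dataset:
    # Change or add: drop any existing record for the key, then append the new one.
    return delete_key(data, key) + ((key, value),)


v0: Dataset = (("a", 1), ("b", 2), ("c", 3))
v1 = delete_key(v0, "b")     # delete a single element
v2 = upsert(v1, "a", 10)     # change an existing element
v3 = upsert(v2, "d", 4)      # add a new element
```

Every single-element operation touches the full dataset and yields a new version, which is the overhead the question is worried about.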