Wes, it depends on what you mean by "sliding window" as related to "RDD":
1. Some operation over multiple rows of data within a single, large RDD, for which the operations are required to be temporally sequential. This may be the case where you're computing a running average over historic data …
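The running-average case in item 1 can be sketched in plain Python. This is only an illustration of the idea, not Spark code: a `deque` stands in for the window over an ordered dataset, and the function name is made up for this example.

```python
from collections import deque

def running_averages(values, window_size):
    """Yield the mean of the most recent `window_size` values seen so far."""
    window = deque(maxlen=window_size)  # oldest element drops off automatically
    for v in values:
        window.append(v)
        yield sum(window) / len(window)

series = [10, 20, 30, 40, 50]
averages = list(running_averages(series, 3))
# → [10.0, 15.0, 20.0, 30.0, 40.0]
```

The key point for the RDD discussion is the "temporally sequential" part: each output depends on the order in which rows arrive, which is exactly what a bag-of-partitions abstraction like an RDD does not give you for free.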
So, does that mean that if I want to do a sliding window, then I have to, in some fashion, build a stream from the RDD, push a new value on the head, filter out the oldest value, and re-persist as an RDD?
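That rebuild-the-window pattern can be sketched as follows, with an immutable tuple standing in for the RDD (in Spark itself the same shape would be a `filter` plus a `union` producing a new RDD, which you would then persist again; this is a sketch, not the Spark API):

```python
def slide(window, new_value):
    """Return a new window: the oldest value evicted, new_value appended.

    `window` plays the role of an immutable RDD: it is never mutated,
    a replacement dataset is built instead.
    """
    evicted = window[1:]           # "filter out" the oldest element
    return evicted + (new_value,)  # "push" the new value

w0 = (1, 2, 3)
w1 = slide(w0, 4)   # w1 == (2, 3, 4); w0 is untouched
```

Note that each slide produces a whole new dataset, which is why this feels heavyweight compared to mutating in place.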
On Fri, Dec 6, 2013 at 10:13 PM, Christopher Nguyen wrote:
> Kyle, the fundamental contract of a Spark RDD is that it is immutable. …
Kyle, many of your design goals are something we also want. Indeed it's
interesting you separate "resilient" from RDD, as I've suggested there
should be ways to boost performance if you're willing to give up some or
all of the "R" guarantees.
We haven't started looking into this yet due to other p…
I'd like to use Spark as an analytical stack; the only difference is that I would like to find the best way to connect it to a dataset that I'm actively working on. Perhaps saying "updates to an RDD" is a bit of a loaded term: I don't need the "resilient", just a distributed data set.
Right now, the b…
Kyle, the fundamental contract of a Spark RDD is that it is immutable. This
follows the paradigm where data is (functionally) transformed into other
data, rather than mutated. This allows these systems to make certain
assumptions and guarantees that otherwise they wouldn't be able to.
Now we've be…
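The transform-don't-mutate contract described above can be shown in miniature with plain Python lists standing in for a parent and child RDD (again only a sketch of the paradigm, not Spark code):

```python
def double(x):
    return x * 2

data = [1, 2, 3]

# Transformation: derive a new dataset from the old one.
# `data` plays the role of the parent RDD; it is never mutated,
# which is what lets a system like Spark recompute `doubled`
# from `data` at any time (the basis of its guarantees).
doubled = [double(x) for x in data]

assert data == [1, 2, 3]      # parent unchanged
assert doubled == [2, 4, 6]   # child derived from it
```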
I'm trying to figure out if I can use an RDD to back an interactive server. One of the requirements would be to have incremental updates to elements in the RDD, i.e. transforms that change/add/delete a single element in the RDD.
It seems pretty drastic to do a full RDD filter to remove a single element.
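The filter-then-rebuild point update looks like this, with a list of key/value pairs standing in for the RDD (in Spark the equivalent shape would be `rdd.filter(...)` unioned with a one-element RDD; the helper name here is made up):

```python
def update_element(dataset, key, new_record):
    """'Update' one record of an immutable dataset by rebuilding it:
    filter the old record out, then union the replacement in.
    Every other record is copied through, which is the drastic part.
    """
    return [(k, v) for (k, v) in dataset if k != key] + [new_record]

records = [("a", 1), ("b", 2), ("c", 3)]
updated = update_element(records, "b", ("b", 99))
# updated contains ("a", 1), ("c", 3), and the new ("b", 99)
```

This makes the cost concrete: a single-element change touches the entire dataset, which is what makes an immutable RDD an awkward fit for fine-grained interactive updates.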