Processing one object isn't a distributed operation, and doesn't really involve Spark. Just invoke your function on your object in the driver; there's no magic at all to that.
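For concreteness, here's a minimal sketch of both approaches in Scala (myFunction, myObject and sc are placeholders for your own function, object and SparkContext):

    // No Spark involved: just call the function on the driver.
    val result = myFunction(myObject)

    // Or wrap the object in a one-element RDD and run a distributed
    // operation; this merely ships the object to an executor to invoke
    // the same function there.
    val viaSpark = sc.parallelize(Seq(myObject)).map(myFunction).first()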
You can make an RDD of one object and invoke a distributed Spark operation on it (the second variant above), but assuming you mean the object is on the driver, that's wasteful: it just copies the object to another machine to invoke the function.

On Wed, Jan 28, 2015 at 10:14 AM, Matan Safriel <[email protected]> wrote:

> Hi,
>
> How would I run a given function in Spark over a single input object?
> Would I first add the input to the file system, then somehow invoke the
> Spark function on just that input? Or should I rather twist the Spark
> streaming API for it?
>
> Assume I'd like to run a piece of computation that normally runs over a
> large dataset, over just one newly added datum. I'm a bit reluctant to
> adapt my code to Spark without knowing the limits of this scenario.
>
> Many thanks!
> Matan
