D.Y., it depends on what you mean by "multiprocess".

RDD lifecycles are currently limited to a single SparkContext, so to
"share" RDDs you need to somehow access that same SparkContext.

This means one way to share RDDs is to make sure your accessors are in the
same JVM that started the SparkContext.

Another is to make a "server" out of that JVM and serve up (via HTTP,
Thrift, etc.) some kind of reference to those RDDs to multiple clients
of that server, even though there is only one SparkContext (held by the
server). We have built a server product using this pattern, so I know it
can work well.
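To make the pattern concrete, here is a minimal sketch of "one context, many
clients". A real SparkContext can't run standalone here, so `FakeContext` and
its `datasets`/`count` members are stand-ins I've invented for illustration:
one server process owns the context, and clients never touch RDDs directly;
they ask the server to run an action by name and get back a plain result
over HTTP.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class FakeContext:
    """Stand-in for a SparkContext: owns named datasets (stand-ins for RDDs)."""
    def __init__(self):
        self.datasets = {"numbers": list(range(10))}

    def count(self, name):
        # With real Spark this would be rdd.count(), an action executed by
        # the single context that the server process holds.
        return len(self.datasets[name])


context = FakeContext()  # exactly one "context" lives in the server process


class RDDHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # URL scheme (made up for this sketch): /count/<dataset-name>
        parts = self.path.split("/")
        if len(parts) == 3 and parts[1] == "count" and parts[2] in context.datasets:
            name = parts[2]
            body = json.dumps({"name": name, "count": context.count(name)})
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body.encode())
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def serve_in_background(port):
    server = HTTPServer(("127.0.0.1", port), RDDHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = serve_in_background(8099)
    # Two "client processes" (simulated here as plain HTTP calls) share the
    # same dataset through the single server-held context.
    for _ in range(2):
        with urlopen("http://127.0.0.1:8099/count/numbers") as resp:
            print(json.loads(resp.read()))
    server.shutdown()
```

The point of the design is that only results (or opaque references) cross the
process boundary; the RDDs themselves never leave the JVM that owns the
SparkContext.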

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen



On Fri, Jan 24, 2014 at 6:06 PM, D.Y Feng <[email protected]> wrote:

> How can I share the RDD between multiprocess?
>
> --
>
>
> DY.Feng(叶毅锋)
> yyfeng88625@twitter
> Department of Applied Mathematics
> Guangzhou University,China
> [email protected]
>
>
