D.Y., it depends on what you mean by "multiprocess". RDD lifecycles are currently limited to a single SparkContext. So to "share" RDDs you need to somehow access the same SparkContext.
This means one way to share RDDs is to make sure your accessors run in the same JVM that started the SparkContext. Another is to make a "server" out of that JVM, and serve up (via HTTP, Thrift, etc.) some kind of reference to those RDDs to multiple clients of that server, even though there is only one SparkContext (held by the server). We have built a server product using this pattern, so I know it can work well.

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen


On Fri, Jan 24, 2014 at 6:06 PM, D.Y Feng <[email protected]> wrote:

> How can I share the RDD between multiprocess?
>
> --
> DY.Feng (叶毅锋)
> yyfeng88625@twitter
> Department of Applied Mathematics
> Guangzhou University, China
> [email protected]
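To make the server pattern concrete, here is a minimal Scala sketch of the idea: one server JVM owns the single SparkContext and keeps RDDs in a named registry, and clients refer to RDDs by name and get back computed results rather than the RDDs themselves. The `RddRegistry` and `SharedRddServer` names are hypothetical, and the in-process lookup stands in for whatever HTTP/Thrift layer a real server would expose.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.collection.concurrent.TrieMap

// Hypothetical registry held by the one server JVM that owns the SparkContext.
// Clients never hold RDDs directly; they hold names that resolve here.
object RddRegistry {
  private val rdds = TrieMap.empty[String, RDD[_]]

  def put(name: String, rdd: RDD[_]): Unit = rdds.put(name, rdd.cache())
  def get(name: String): Option[RDD[_]]    = rdds.get(name)
}

object SharedRddServer {
  def main(args: Array[String]): Unit = {
    // The one and only SparkContext lives in this server JVM.
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-server").setMaster("local[*]"))

    // Register an RDD under a name that clients can refer to.
    RddRegistry.put("numbers", sc.parallelize(1 to 1000))

    // A real server would expose this lookup over HTTP or Thrift; here a
    // client request for "numbers" is simulated with a direct call. The
    // client receives the computed result, not a reference to the RDD.
    val answer = RddRegistry.get("numbers")
      .map(_.asInstanceOf[RDD[Int]].sum())
    println(answer)

    sc.stop()
  }
}
```

Because every request is ultimately executed by the server's SparkContext, cached RDDs are reused across clients without ever being serialized out of the JVM that owns them.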
