Hi,

I am new to Spark and just going through the different features and
integration projects, so this could be a very naive question.

I have a requirement where I want to access data stored in another
application. It would be nice if I could share the Spark worker node
inside the same JVM. One of the docs pages (
https://spark.apache.org/docs/latest/job-scheduling.html) mentions that
this is not possible and lists different alternatives.

*Note that none of the modes currently provide memory sharing across
applications. If you would like to share data this way, we recommend
running a single server application that can serve multiple requests by
querying the same RDDs. For example, the Shark
<http://shark.cs.berkeley.edu> JDBC server works this way for SQL queries.
In future releases, in-memory storage systems such as Tachyon
<http://tachyon-project.org> will provide another approach to share RDDs.*
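
If I understand the recommendation correctly, the "single server
application" pattern would look roughly like the sketch below (the paths
and names are placeholders I made up, not an actual implementation):

    import org.apache.spark.{SparkConf, SparkContext}

    object SharedRddServer {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("shared-rdd-server"))

        // Load and cache the shared data set once; the path is a placeholder.
        val shared = sc.textFile("hdfs:///data/other-app/records")
          .map(line => (line.split(",")(0), line))
          .cache()

        // Each incoming request runs a job against the same cached RDD,
        // instead of creating a new SparkContext per request.
        def handleRequest(key: String): Seq[String] =
          shared.filter(_._1 == key).map(_._2).collect().toSeq

        // ... wire handleRequest into whatever RPC / JDBC front end the server exposes
      }
    }

Is that the intended reading of the docs?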

So I have the following questions:

1. Can Spark re-use JVMs, i.e., keep long-lived worker JVMs with cached data
around to run Spark tasks originating from different SparkContexts?
2. Can I dictate RDD partitioning so that I can ensure data locality when an
RDD from Spark is joined with the local data? (A rough sketch of what I mean
follows this list.)
3. Can a worker node be embedded inside an existing JVM?
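
For question 2, this is roughly what I have in mind: a minimal sketch
where a custom Partitioner mirrors how the other application lays out its
data, so matching keys from both sides land in the same partition. The
partitioner, paths, and key extraction are placeholders, not real code
from either application:

    import org.apache.spark.{Partitioner, SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    // Made-up partitioner standing in for the other application's placement rule.
    class MyAppPartitioner(numParts: Int) extends Partitioner {
      override def numPartitions: Int = numParts
      override def getPartition(key: Any): Int =
        math.abs(key.hashCode % numParts)
    }

    object PartitioningSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("partitioning-sketch"))

        val sparkSide = sc.textFile("hdfs:///data/spark-side")   // placeholder path
          .map(line => (line.split(",")(0), line))
          .partitionBy(new MyAppPartitioner(16))                 // dictate the layout
          .cache()

        val otherSide = sc.textFile("hdfs:///data/other-app")    // placeholder path
          .map(line => (line.split(",")(0), line))
          .partitionBy(new MyAppPartitioner(16))

        // Because both RDDs use the same partitioner, the join does not need a full shuffle.
        val joined = sparkSide.join(otherSide)
        println(joined.count())
      }
    }

Is something along these lines possible when one side of the join lives
outside Spark?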

Thanks,
Regards,
Tushar
