The only way I can think of is through some kind of wrapper. For Java/Scala,
use JNI. For Python, use extension modules. It should not be a lot of work if
you know these tools.
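As a rough illustration (the library name "cppstore" and the method readRecords below are made up, not an existing API), a JNI wrapper called from Scala could look something like this:

// Hypothetical wrapper: "cppstore" and readRecords stand in for whatever
// the real C++ side exposes via JNI.
object CppStoreBridge {
  System.loadLibrary("cppstore") // expects libcppstore.so on java.library.path

  // Implemented on the C++ side (JNI); copies the requested records into a byte array.
  @native def readRecords(key: String): Array[Byte]
}

// e.g. val bytes = CppStoreBridge.readRecords("partition-0")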
From: Robin East
To: Annabel Melongo
Cc: Jia ; Dewful ; "user @spark"
; "d...@spark.apache.org"
Sent: Monday, December 7, 2015 10:57 AM
Subject: Re: Shared memory between C++ process and Spark
Annabel
Spark works very well with data stored in HDFS but is certainly not tied to it.
Have a look at the wide variety of connectors to things like Cassandra, HBase,
etc.
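For example, with the spark-cassandra-connector on the classpath you can read a table straight into an RDD from the shell; the keyspace and table names here are just placeholders:

import com.datastax.spark.connector._ // from the spark-cassandra-connector package

// "sensor_ks" and "readings" are placeholder keyspace/table names.
val readings = sc.cassandraTable("sensor_ks", "readings")
println(readings.count())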
Robin
Sent from my iPhone
On 7 Dec 2015, at 18:50, Annabel Melongo wrote:
Jia,
I'm so confused on this. The architecture of Spark is to run on top of HDFS.
What you're requesting, reading and writing to a C++ process, is not part of
that requirement.
On Monday, December 7, 2015 1:42 PM, Jia wrote:
Thanks, Annabel, but I should clarify that I have no intention of writing and
running Spark UDFs in C++; I'm just wondering whether Spark can read and write
data to a C++ process with zero copy.
Best Regards,
Jia
On Dec 7, 2015, at 12:26 PM, Annabel Melongo wrote:
My guess is that Jia wants to run C++ on top of Spark. If that's the case, I'm
afraid this is not possible. Spark has support for Java, Python, Scala and R.
The best way to achieve this is to run your application in C++ and use the
data created by that application for manipulation within Spark.
On Monday, December 7, 2015 1:15 PM, Jia wrote:
Thanks, Dewful!
My impression is that Tachyon is a very nice in-memory file system that can
connect to multiple storage backends. However, because our data is also held in
memory, I suspect that connecting to Spark directly may give better
performance. But I definitely need to look at Tachyon more carefully, in case it
has a very efficient C++ binding mechanism.
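If Tachyon does turn out to fit, my understanding is that Spark can already read from it through its Hadoop-compatible filesystem interface, roughly like this (the master host, port and path are made up):

// Assumes a Tachyon master at tachyon-master:19998 and data written by the
// C++ side under /shared/data; both are placeholders.
val lines = sc.textFile("tachyon://tachyon-master:19998/shared/data")
println(lines.count())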
Best Regards,
Jia
On Dec 7, 2015, at 11:46 AM, Dewful wrote:
Maybe looking into something like Tachyon would help. I see some sample C++
bindings, but I'm not sure how much of the current functionality they support...

Hi Robin,
Thanks for your reply, and thanks for copying my question to the user mailing
list. Yes, we have a distributed C++ application that will store data on each
node in the cluster, and we hope to leverage Spark to do more fancy analytics
on that data. But we need high performance, and that's why we want shared
memory. Suggestions will be highly appreciated!
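To make the idea concrete, what I have in mind on the Spark side is roughly the sketch below: each task maps a node-local file that our C++ process keeps updated and scans it in place, assuming the tasks are co-located with the C++ processes (the path and the 8-byte record layout are placeholders):

import java.nio.ByteOrder
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// Sketch only: /dev/shm/cpp_store.bin stands in for a file our C++ process
// maintains on every node; records are assumed to be 8-byte longs.
val perNode = sc.parallelize(0 until sc.defaultParallelism, sc.defaultParallelism)
val partialSums = perNode.mapPartitions { _ =>
  val ch  = FileChannel.open(Paths.get("/dev/shm/cpp_store.bin"), StandardOpenOption.READ)
  val buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()) // zero-copy view of the region
  buf.order(ByteOrder.LITTLE_ENDIAN) // match the byte order the C++ writer uses
  var sum = 0L
  while (buf.remaining() >= 8) sum += buf.getLong()
  ch.close()
  Iterator(sum)
}
println(partialSums.sum())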
Best Regards,
Jia
On Dec 7, 2015, at 10:54 AM, Robin East wrote:
-dev, +user (this is not a question about development of Spark itself so you’ll
get more answers in the user mailing list)
First up let me say that I don’t really know how this could be done - I’m sure
it would be possible with enough tinkering but it’s not clear what you are
trying to achieve. Spark is a distributed processing system; it has multiple
JVMs running on different machines, each running a small part of the overall
processing. Unless your idea is to have multiple C++ processes collocated with
the distributed JVMs, using named memory-mapped files doesn't make
architectural sense.
---
Robin East
Spark GraphX in Action, by Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action
On 6 Dec 2015, at 20:43, Jia wrote:
Dear all, for one project I need to implement something so that Spark can read
data from a C++ process.
To provide high performance, I really hope to implement this through shared
memory between the C++ process and the JVM process.
It seems it may be possible to use named memory-mapped files and JNI to do
this, but I wonder whether there are any existing efforts or a more efficient
approach?
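For example, on the JVM side I was imagining something along the lines of the sketch below, using java.nio directly; the file name is a placeholder for whatever region the C++ process exposes, and JNI would only be needed for things the mapped buffer can't express, such as synchronization:

import java.nio.ByteOrder
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// "/dev/shm/shared_region" is a placeholder for the file the C++ process mmap()s.
val channel = FileChannel.open(Paths.get("/dev/shm/shared_region"), StandardOpenOption.READ)
val buffer  = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
buffer.order(ByteOrder.LITTLE_ENDIAN) // match whatever byte order the C++ writer uses

// Read the first 8-byte value without copying the region onto the JVM heap.
val firstValue = buffer.getLong(0)
println(s"first value in shared region: $firstValue")
channel.close()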
Thank you very much!
Best Regards,
Jia
-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org