Re: Shared memory between C++ process and Spark

Jia Mon, 07 Dec 2015 10:17:15 -0800

Thanks, Dewful!

My impression is that Tachyon is a very nice in-memory file system that can 
connect to multiple storages.
However, because our data is also held in memory, I suspect that connecting to 
Spark directly may be more efficient.
But I definitely need to look at Tachyon more carefully, in case it has a very 
efficient C++ binding mechanism.


Best Regards,
Jia

On Dec 7, 2015, at 11:46 AM, Dewful <dew...@gmail.com> wrote:

> Maybe looking into something like Tachyon would help. I see some sample C++ 
> bindings, but I am not sure how much of the current functionality they support...
> 
> Hi Robin,
> Thanks for your reply and for copying my question to the user mailing list.
> Yes, we have a distributed C++ application that stores data on each node 
> in the cluster, and we hope to leverage Spark to do more advanced analytics on 
> that data. But we need high performance, which is why we want shared memory.
> Suggestions will be highly appreciated!
> 
> Best Regards,
> Jia
> 
> On Dec 7, 2015, at 10:54 AM, Robin East <robin.e...@xense.co.uk> wrote:
> 
>> -dev, +user (this is not a question about development of Spark itself so 
>> you’ll get more answers in the user mailing list)
>> 
>> First up, let me say that I don’t really know how this could be done. I’m 
>> sure it would be possible with enough tinkering, but it’s not clear what you 
>> are trying to achieve. Spark is a distributed processing system: it has 
>> multiple JVMs running on different machines that each run a small part of 
>> the overall processing. Unless you plan to have multiple C++ processes 
>> collocated with the distributed JVMs, using named memory-mapped files 
>> doesn’t make architectural sense.
>> -------------------------------------------------------------------------------
>> Robin East
>> Spark GraphX in Action Michael Malak and Robin East
>> Manning Publications Co.
>> http://www.manning.com/books/spark-graphx-in-action
>> 
>> 
>> 
>> 
>> 
>>> On 6 Dec 2015, at 20:43, Jia <jacqueline...@gmail.com> wrote:
>>> 
>>> Dear all, for one project I need to implement something so that Spark can 
>>> read data from a C++ process.
>>> To provide high performance, I really hope to implement this through shared 
>>> memory between the C++ process and the JVM process.
>>> It seems it may be possible to use named memory-mapped files and JNI to do 
>>> this, but I wonder whether there are any existing efforts or a more 
>>> efficient approach?
>>> Thank you very much!
>>> 
>>> Best Regards,
>>> Jia
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>> 
>> 
>
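
For reference, the named memory-mapped file idea from the original question can be sketched on the JVM side with plain `java.nio`, without JNI. This is only a minimal illustration under assumptions that are not from the thread: the class name `SharedRegionDemo`, the file name, and the record layout (a little-endian int32 count followed by that many float64 values) are all hypothetical. In a real setup the C++ process would be the writer and would map the same file with its own OS API; here a Java method stands in for it so the example runs on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class SharedRegionDemo {
    // Hypothetical layout: [int32 count][count x float64], little-endian.
    static final int HEADER_BYTES = Integer.BYTES;

    // Stand-in for the C++ producer: maps the file read-write and fills it.
    static void writeValues(Path file, double[] values) {
        long size = HEADER_BYTES + (long) values.length * Double.BYTES;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // A READ_WRITE mapping extends the file to `size` if it is smaller.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.order(ByteOrder.LITTLE_ENDIAN);
            buf.putInt(values.length);
            for (double v : values) {
                buf.putDouble(v);
            }
            buf.force(); // ask the OS to flush the mapped pages to the file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // JVM consumer: maps the same file read-only and decodes the records.
    static double[] readValues(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN);
            int count = buf.getInt();
            double[] out = new double[count];
            for (int i = 0; i < count; i++) {
                out[i] = buf.getDouble();
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical file name in the platform temp directory.
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "shared_region_demo.bin");
        writeValues(file, new double[] {1.5, -2.0, 3.25});
        for (double v : readValues(file)) {
            System.out.println(v);
        }
    }
}
```

The sketch leaves out everything a real integration would need: synchronization between writer and reader, a versioned layout, and a way to hand the decoded values to Spark on each node. On many Linux systems, placing the file on a RAM-backed filesystem such as `/dev/shm` avoids disk I/O, but that depends on the deployment.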
