I guess you could write a custom RDD that can read data from a memory-mapped 
file - not really my area of expertise so I’ll leave it to other members of the 
forum to chip in with comments as to whether that makes sense. 
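
For what it’s worth, the JVM side of reading such a file should not need JNI, 
as long as the mapping is backed by a real file path (for example under 
/dev/shm on Linux), because java.nio can map it directly. A minimal sketch, 
assuming the C++ process writes a flat array of 64-bit integers in native byte 
order; the class name, method name and data layout are all hypothetical and 
not part of Spark:

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: reads a file of raw 64-bit integers through a memory map.
final class MappedFileReader {

    // Assumes the writer (e.g. the C++ process) stored the values back to back
    // in the machine's native byte order, and that the file is smaller than 2 GB.
    static long[] readLongs(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Map the whole file read-only into this JVM's address space.
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            buffer.order(ByteOrder.nativeOrder());

            // Copy the mapped values into an ordinary on-heap array.
            long[] values = new long[(int) (channel.size() / Long.BYTES)];
            buffer.asLongBuffer().get(values);
            return values;
        }
    }
}
```

In Spark, something like this could be called from a custom RDD’s compute 
method or inside mapPartitions, so that each executor reads the file on its 
own node. Note that the sketch still copies the data onto the JVM heap and 
does nothing to coordinate with a writer that is updating the file at the same 
time.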

But if you want ‘fancy analytics’ then won’t the processing time more than 
outweigh the savings from using memory-mapped files, particularly if your 
analytics involve any kind of aggregation of data across data nodes? Have you 
looked at a Lambda architecture? It could involve Spark, but wouldn’t 
necessarily mean going to the trouble of implementing a custom 
memory-mapped file reader.
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action 





> On 7 Dec 2015, at 17:32, Jia <jacqueline...@gmail.com> wrote:
> 
> Hi, Robin, 
> Thanks for your reply and thanks for copying my question to the user mailing 
> list. Yes, we have a distributed C++ application that will store data on each 
> node in the cluster, and we hope to leverage Spark to do fancier analytics on 
> that data. But we need high performance, which is why we want shared memory.
> Suggestions will be highly appreciated!
> 
> Best Regards,
> Jia
> 
> On Dec 7, 2015, at 10:54 AM, Robin East <robin.e...@xense.co.uk> wrote:
> 
>> -dev, +user (this is not a question about development of Spark itself so 
>> you’ll get more answers in the user mailing list)
>> 
>> First up, let me say that I don’t really know how this could be done. I’m 
>> sure it would be possible with enough tinkering, but it’s not clear what you 
>> are trying to achieve. Spark is a distributed processing system: it has 
>> multiple JVMs running on different machines that each run a small part of 
>> the overall processing. Unless you plan to have multiple C++ processes 
>> collocated with the distributed JVMs, using named memory-mapped files 
>> doesn’t make architectural sense. 
>> 
>>> On 6 Dec 2015, at 20:43, Jia <jacqueline...@gmail.com> wrote:
>>> 
>>> Dear all, for one project I need to implement something so that Spark can 
>>> read data from a C++ process. 
>>> To provide high performance, I really hope to implement this through shared 
>>> memory between the C++ process and the JVM process.
>>> It seems it may be possible to use named memory-mapped files and JNI to do 
>>> this, but I wonder whether there are any existing efforts or a more 
>>> efficient approach?
>>> Thank you very much!
>>> 
>>> Best Regards,
>>> Jia
