Thanks, Robin, you have a very good point!
We feel that the data copy and allocation overhead may become a performance 
bottleneck, and we are evaluating it right now.
We will pursue the shared-memory approach only if we are sure of the 
potential performance gain, and sure that there is no existing work in the 
Spark community that we can leverage for this.
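
For illustration, here is a minimal sketch of one way to compare the 
zero-copy and copying paths (plain java.nio, no Spark involved; the 
/dev/shm file path is hypothetical):

import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

object CopyOverheadCheck {
  def main(args: Array[String]): Unit = {
    val ch = FileChannel.open(Paths.get("/dev/shm/sample.bin"),
      StandardOpenOption.READ)
    val mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())

    // Zero-copy path: read bytes straight out of the mapped region.
    var t0 = System.nanoTime()
    var sum = 0L
    while (mapped.hasRemaining) sum += mapped.get()
    println(s"mapped read: ${(System.nanoTime() - t0) / 1e6} ms, sum=$sum")

    // Copying path: allocate a heap array and copy everything first.
    mapped.rewind()
    t0 = System.nanoTime()
    val copy = new Array[Byte](mapped.remaining())
    mapped.get(copy)
    sum = 0L
    var i = 0
    while (i < copy.length) { sum += copy(i); i += 1 }
    println(s"copied read: ${(System.nanoTime() - t0) / 1e6} ms, sum=$sum")
    ch.close()
  }
}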

Best Regards,
Jia


On Dec 7, 2015, at 11:56 AM, Robin East <robin.e...@xense.co.uk> wrote:

> I guess you could write a custom RDD that can read data from a memory-mapped 
> file - not really my area of expertise so I’ll leave it to other members of 
> the forum to chip in with comments as to whether that makes sense. 
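> 
> To make that concrete, here is a rough sketch of what such a custom RDD 
> might look like (the partition-per-file scheme, the paths, and the record 
> layout are all hypothetical, and error handling is omitted):
> 
> import java.nio.ByteOrder
> import java.nio.channels.FileChannel
> import java.nio.file.{Paths, StandardOpenOption}
> import org.apache.spark.{Partition, SparkContext, TaskContext}
> import org.apache.spark.rdd.RDD
> 
> // One partition per memory-mapped file; in practice you would also
> // override getPreferredLocations so each partition is computed on the
> // node that holds its file.
> case class MMapPartition(index: Int, path: String) extends Partition
> 
> class MMapLongRDD(sc: SparkContext, paths: Seq[String])
>     extends RDD[Long](sc, Nil) {
> 
>   override protected def getPartitions: Array[Partition] =
>     paths.zipWithIndex.map { case (p, i) => MMapPartition(i, p) }.toArray
> 
>   override def compute(split: Partition, context: TaskContext): Iterator[Long] = {
>     val part = split.asInstanceOf[MMapPartition]
>     val ch = FileChannel.open(Paths.get(part.path), StandardOpenOption.READ)
>     val buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
>     buf.order(ByteOrder.LITTLE_ENDIAN) // assume little-endian 64-bit values
>     val longs = buf.asLongBuffer()
>     new Iterator[Long] {
>       def hasNext: Boolean = longs.hasRemaining
>       def next(): Long = longs.get()
>     }
>   }
> }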
> 
> But if you want ‘fancy analytics’ then won’t the processing time more than 
> outweigh the savings from using memory-mapped files? Particularly if your 
> analytics involve any kind of aggregation of data across data nodes. Have you 
> looked at a Lambda architecture, which could involve Spark but wouldn’t 
> necessarily require you to go to the trouble of implementing a custom 
> memory-mapped file reader?
> -------------------------------------------------------------------------------
> Robin East
> Spark GraphX in Action Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
> 
> 
>> On 7 Dec 2015, at 17:32, Jia <jacqueline...@gmail.com> wrote:
>> 
>> Hi, Robin, 
>> Thanks for your reply, and thanks for copying my question to the user 
>> mailing list.
>> Yes, we have a distributed C++ application that will store data on each 
>> node in the cluster, and we hope to leverage Spark to do more fancy 
>> analytics on that data. But we need high performance, which is why we want 
>> shared memory.
>> Any suggestions would be highly appreciated!
>> 
>> Best Regards,
>> Jia
>> 
>> On Dec 7, 2015, at 10:54 AM, Robin East <robin.e...@xense.co.uk> wrote:
>> 
>>> -dev, +user (this is not a question about the development of Spark itself, 
>>> so you’ll get more answers on the user mailing list)
>>> 
>>> First up, let me say that I don’t really know how this could be done - I’m 
>>> sure it would be possible with enough tinkering, but it’s not clear what 
>>> you are trying to achieve. Spark is a distributed processing system: it has 
>>> multiple JVMs running on different machines, each running a small part of 
>>> the overall processing. Unless you have some sort of plan to collocate 
>>> multiple C++ processes with the distributed JVMs, using named memory-mapped 
>>> files doesn’t make architectural sense. 
>>> -------------------------------------------------------------------------------
>>> Robin East
>>> Spark GraphX in Action Michael Malak and Robin East
>>> Manning Publications Co.
>>> http://www.manning.com/books/spark-graphx-in-action
>>> 
>>> 
>>>> On 6 Dec 2015, at 20:43, Jia <jacqueline...@gmail.com> wrote:
>>>> 
>>>> Dear all, for one project I need to implement something so that Spark can 
>>>> read data from a C++ process.
>>>> To get high performance, I really hope to implement this through shared 
>>>> memory between the C++ process and the JVM process.
>>>> It seems it may be possible to use named memory-mapped files and JNI to do 
>>>> this, but I wonder whether there are any existing efforts or a more 
>>>> efficient approach?
>>>> Thank you very much!
>>>> 
>>>> Best Regards,
>>>> Jia
>>>> 
>>>> 
>>> 
>> 
> 
