The only way I can think of is through some kind of wrapper. For Java/Scala,
use JNI. For Python, use extension modules. It should not be a lot of work if
you know these tools.
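As a rough illustration (the library name "cppstore" and the method readRecords below are made up, not an existing API), a JNI wrapper called from Scala could look something like this:

// Hypothetical wrapper: "cppstore" and readRecords stand in for whatever
// the real C++ side exposes via JNI.
object CppStoreBridge {
  System.loadLibrary("cppstore") // expects libcppstore.so on java.library.path

  // Implemented on the C++ side (JNI); copies the requested records into a byte array.
  @native def readRecords(key: String): Array[Byte]
}

// e.g. val bytes = CppStoreBridge.readRecords("partition-0")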
From: Robin East
To: Annabel Melongo
Cc: Jia ; Dewful ; "user @spark"
; "d...@spark.apache.org"
Sent: Monday, December 7, 2015 10:57 AM
Subject: Re: Shared memory between C++ process and Spark
Annabel
Spark works very well with data stored in HDFS but is certainly not tied to it.
Have a look at the wide variety of connectors to things like Cassandra, HBase,
etc.
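For example, with the spark-cassandra-connector on the classpath you can read a table straight into an RDD from the shell; the keyspace and table names here are just placeholders:

import com.datastax.spark.connector._ // from the spark-cassandra-connector package

// "sensor_ks" and "readings" are placeholder keyspace/table names.
val readings = sc.cassandraTable("sensor_ks", "readings")
println(readings.count())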
Robin
Sent from my iPhone
On 7 Dec 2015, at 18:50, Annabel Melongo wrote:
Jia,
I'm so confused on this. The architecture of Spark is to run on top of HDFS.
What you're requesting, reading and writing to a C++ process, is not part of
that requirement.
On Monday, December 7, 2015 1:42 PM, Jia wrote:
Thanks, Annabel, but I should clarify that I have no intention of writing and
running Spark UDFs in C++; I'm just wondering whether Spark can read and write
data to a C++ process with zero copy.
Best Regards,
Jia
On Dec 7, 2015, at 12:26 PM, Annabel Melongo wrote:
My guess is that Jia wants to run C++ on top of Spark. If that's the case, I'm
afraid this is not possible. Spark has support for Java, Python, Scala and R.
The best way to achieve this is to run your application in C++ and use the
data created by that application for manipulation within Spark.
On Monday, December 7, 2015 1:15 PM, Jia wrote:
Thanks, Dewful!
My impression is that Tachyon is a very nice in-memory file system that can
connect to multiple storage backends. However, because our data is also held in
memory, I suspect that connecting to Spark directly may give better
performance. But I definitely need to look at Tachyon more carefully, in case it
has a very efficient C++ binding mechanism.
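If Tachyon does turn out to fit, my understanding is that Spark can already read from it through its Hadoop-compatible filesystem interface, roughly like this (the master host, port and path are made up):

// Assumes a Tachyon master at tachyon-master:19998 and data written by the
// C++ side under /shared/data; both are placeholders.
val lines = sc.textFile("tachyon://tachyon-master:19998/shared/data")
println(lines.count())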
Best Regards,
Jia
On Dec 7, 2015, at 11:46 AM, Dewful wrote:
Maybe looking into something like Tachyon would help. I see some sample C++
bindings, but I'm not sure how much of the current functionality they support...

Hi Robin,
Thanks for your reply, and thanks for copying my question to the user mailing
list. Yes, we have a distributed C++ application that will store data on each
node in the cluster, and we hope to leverage Spark to do more fancy analytics
on that data. But we need high performance, and that's why we want shared
memory. Suggestions will be highly appreciated!
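To make the idea concrete, what I have in mind on the Spark side is roughly the sketch below: each task maps a node-local file that our C++ process keeps updated and scans it in place, assuming the tasks are co-located with the C++ processes (the path and the 8-byte record layout are placeholders):

import java.nio.ByteOrder
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// Sketch only: /dev/shm/cpp_store.bin stands in for a file our C++ process
// maintains on every node; records are assumed to be 8-byte longs.
val perNode = sc.parallelize(0 until sc.defaultParallelism, sc.defaultParallelism)
val partialSums = perNode.mapPartitions { _ =>
  val ch  = FileChannel.open(Paths.get("/dev/shm/cpp_store.bin"), StandardOpenOption.READ)
  val buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()) // zero-copy view of the region
  buf.order(ByteOrder.LITTLE_ENDIAN) // match the byte order the C++ writer uses
  var sum = 0L
  while (buf.remaining() >= 8) sum += buf.getLong()
  ch.close()
  Iterator(sum)
}
println(partialSums.sum())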
Best Regards,
Jia
On Dec 7, 2015, at 10:54 AM, Robin East wrote:
-dev, +user (this is not a question about development of Spark itself so you’ll
get more answers in the user mailing list)
First up let me say that I don’t really know how this could be done - I’m sure
it would be possible with enough tinkering but it’s not clear what you are
trying to achieve. Spark is a distributed processing system; it has multiple
JVMs running on different machines, each running a small part of the overall
processing. Unless your idea is to have multiple C++ processes collocated with
the distributed JVMs, using named memory-mapped files doesn't make
architectural sense.
---
Robin East
Spark GraphX in Action, by Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action
On 6 Dec 2015, at 20:43, Jia wrote:
Dear all, for one project I need to implement something so that Spark can read
data from a C++ process.
To provide high performance, I really hope to implement this through shared
memory between the C++ process and the JVM process.
It seems it may be possible to use named memory-mapped files and JNI to do
this, but I wonder whether there are any existing efforts or a more efficient
approach?
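For example, on the JVM side I was imagining something along the lines of the sketch below, using java.nio directly; the file name is a placeholder for whatever region the C++ process exposes, and JNI would only be needed for things the mapped buffer can't express, such as synchronization:

import java.nio.ByteOrder
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// "/dev/shm/shared_region" is a placeholder for the file the C++ process mmap()s.
val channel = FileChannel.open(Paths.get("/dev/shm/shared_region"), StandardOpenOption.READ)
val buffer  = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
buffer.order(ByteOrder.LITTLE_ENDIAN) // match whatever byte order the C++ writer uses

// Read the first 8-byte value without copying the region onto the JVM heap.
val firstValue = buffer.getLong(0)
println(s"first value in shared region: $firstValue")
channel.close()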
Thank you very much!
Best Regards,
Jia
-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org