Re: Getting number of physical machines in Spark

2015-08-28 Thread Alexey Grishchenko
There's no canonical way to do this, as far as I understand. For instance, when running under YARN you have no idea where your containers will be started. Moreover, if one of the containers fails, it might be restarted on another machine, so the machine count might change at runtime.
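For reference, a common approximation from that era (with exactly the caveat above: the set of executors is a moving snapshot, not a stable count) was to count the distinct hosts reported by the driver. A minimal sketch, assuming an existing SparkContext `sc` (as in spark-shell) and the Spark 1.x `SparkContext.getExecutorMemoryStatus` API:

```scala
// Sketch: approximate the number of distinct machines currently hosting
// executors. Keys of getExecutorMemoryStatus are "host:port" strings and
// include the driver, so this is a snapshot that can change at runtime.
val hosts = sc.getExecutorMemoryStatus
  .keys
  .map(_.split(":")(0))   // drop the port, keep the host name
  .toSet

val numMachines = hosts.size
println(s"Approximate number of machines: $numMachines")
```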

Re: Getting number of physical machines in Spark

2015-08-28 Thread Jason
I've wanted similar functionality too: when network-IO bound (in my case I was trying to pull things from S3 to HDFS), I wish there were a `.mapMachines` API so I wouldn't have to guess at the proper partitioning of a 'driver' RDD for `sc.parallelize(1 to N, N).map(i => pull the i'th chunk from S3
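For what it's worth, the 'driver RDD' workaround alluded to here usually looks something like the sketch below: N elements spread over N partitions so each task pulls exactly one chunk. `fetchChunkFromS3ToHdfs` and `numChunks` are hypothetical placeholders for whatever does the actual transfer:

```scala
// Sketch of the driver-RDD pattern: one element per chunk, one partition per
// element, so each executor task handles exactly one chunk.
val numChunks = 64  // hypothetical number of S3 chunks to transfer

def fetchChunkFromS3ToHdfs(i: Int): String = {
  // ... copy the i-th chunk from S3 to HDFS here, return a status string ...
  s"chunk $i copied"
}

val results = sc.parallelize(1 to numChunks, numChunks)
  .map(i => fetchChunkFromS3ToHdfs(i))
  .collect()
```

The catch, as noted in the other reply, is that nothing guarantees how those N tasks map onto physical machines, which is what a `.mapMachines`-style API would have to solve.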

Getting number of physical machines in Spark

2015-08-27 Thread Young, Matthew T
What's the canonical way to find out the number of physical machines in a cluster at runtime in Spark? I believe SparkContext.defaultParallelism will give me the number of cores, but I'm interested in the number of NICs. I'm writing a Spark Streaming application to ingest from Kafka with the
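For context, the value the question contrasts with machine count can be read directly from the context. A minimal sketch, again assuming an existing SparkContext `sc` (e.g. in spark-shell); the host-count approximation sketched in the replies above is the closest stand-in for a per-machine figure:

```scala
// defaultParallelism typically reflects the total executor cores (or an
// explicit spark.default.parallelism setting), not the number of physical
// machines or NICs in the cluster.
val cores = sc.defaultParallelism
println(s"defaultParallelism = $cores")
```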