RE: No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-15 Thread Ulanov, Alexander
Hi Disha, This is a good question. We plan to elaborate on it in our talk at the upcoming Spark Summit. Fewer workers means less compute power; more workers means more communication overhead. So there exists an optimal number of workers for solving an optimization problem with batch gradient given
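
The trade-off described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the analysis from the talk: it assumes per-iteration time is a compute term that shrinks as 1/n plus a communication term that grows linearly with the number of workers n. The function names and the cost constants are hypothetical.

```python
def iteration_time(workers, compute_cost, comm_cost_per_worker):
    """Toy model (assumption): compute is split evenly across workers,
    while communication overhead grows linearly with the worker count."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return compute_cost / workers + comm_cost_per_worker * workers


def best_worker_count(compute_cost, comm_cost_per_worker, max_workers):
    """Return the worker count in 1..max_workers with the lowest modeled time."""
    return min(
        range(1, max_workers + 1),
        key=lambda n: iteration_time(n, compute_cost, comm_cost_per_worker),
    )


if __name__ == "__main__":
    # Hypothetical costs in arbitrary time units, chosen only for illustration.
    for n in (1, 5, 10, 20, 50):
        print(n, round(iteration_time(n, 100.0, 1.0), 2))
    print("best:", best_worker_count(100.0, 1.0, 50))
```

Under this model the minimum sits near sqrt(compute_cost / comm_cost_per_worker), so adding workers beyond that point makes each iteration slower rather than faster. Real clusters will not follow this curve exactly; the point is only that the curve has a minimum.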

Re: No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-15 Thread Disha Shrivastava
Hi Alexander, Thanks for your reply. Actually, I am working with a modified version of the actual MNIST dataset (maximum samples = 8.2 M) https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html. I have been running different sized versions (1, 10, 50, 1M, 8M samples) on

RE: No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-12 Thread Ulanov, Alexander
Hi Disha, The problem might be as follows. The data that you have might physically reside on only two nodes, and Spark launches data-local tasks. As a result, only two workers are used. You might want to force Spark to distribute the data across all nodes; however, it does not seem to be

No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-11 Thread Disha Shrivastava
Dear Spark developers, I am trying to study the effect of increasing the number of cores (CPUs) on speedup and accuracy (scalability with Spark ANN) performance for the MNIST dataset using the ANN implementation provided in the latest Spark release. I have formed a cluster of 5 machines with 88

Re: No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-11 Thread Mike Hynes
Having only 2 workers for 5 machines would be your problem: you probably want 1 worker per physical machine, which entails running the spark-daemon.sh script to start a worker on those machines. The partitioning is agnostic to how many executors are available for running the tasks, so you can't
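
To make the point about partitioning concrete, here is a rough sketch of how partition count and available cores interact. It assumes the usual Spark behaviour of one task per partition and one task per core at a time. The helper names and the numbers are hypothetical and are not taken from the cluster in this thread.

```python
import math


def task_waves(num_partitions, total_cores):
    """Number of sequential 'waves' needed to run one task per partition,
    assuming each core runs one task at a time (a simplification)."""
    if num_partitions < 1 or total_cores < 1:
        raise ValueError("both arguments must be >= 1")
    return math.ceil(num_partitions / total_cores)


def idle_cores(num_partitions, total_cores):
    """Cores left without any task when there are fewer partitions than cores."""
    return max(0, total_cores - num_partitions)


if __name__ == "__main__":
    # Hypothetical cluster: 16 cores in total.
    print(task_waves(4, 16), idle_cores(4, 16))    # too few partitions: cores sit idle
    print(task_waves(64, 16), idle_cores(64, 16))  # enough partitions: all cores busy
```

If the data has fewer partitions than there are cores, adding cores cannot speed up a stage, because the extra cores have no task to run.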