Spark performance on a 32-CPU server cluster

2015-02-20 Thread Dirceu Semighini Filho
Hi all, I'm running Spark 1.2.0 in standalone mode on different cluster and server sizes. All of my data is cached in memory. Basically I have a mass of data, about 8 GB, with about 37k columns, and I'm running different configs of a BinaryLogisticRegressionBFGS. When I put Spark to run on 9

Re: Spark performance on a 32-CPU server cluster

2015-02-20 Thread Sean Owen
It sounds like your computation just isn't CPU-bound, right? Or maybe only some stages are. It's not clear what work you are doing beyond the core LR. Stages don't wait on each other unless one depends on the other. You'd have to clarify what you mean by running stages in parallel, like what

Re: Spark performance on a 32-CPU server cluster

2015-02-20 Thread Sean Owen
Yes, that makes sense, but it doesn't make the jobs CPU-bound. What is the bottleneck: the model building, or other stages? I would think you can get the model building to be CPU-bound, unless you have chopped it up into really small partitions. I think it's best to look further into which stages are

Re: Spark performance on a 32-CPU server cluster

2015-02-20 Thread Dirceu Semighini Filho
Hi Sean, I'm trying to increase CPU usage by running logistic regression on different datasets in parallel; they shouldn't depend on each other. I train several logistic regression models from different column combinations of a main dataset. I processed the combinations in a ParArray in an
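The pattern described above can be sketched as follows. This is a minimal, hedged illustration, not the poster's actual code: `trainModel` is a hypothetical stand-in for one Spark job (e.g. a `LogisticRegressionWithLBFGS` fit on a column subset), and `scala.concurrent.Future` is used in place of a ParArray because it achieves the same concurrent job submission from the driver and also works on Scala 2.13+, where parallel collections were moved to a separate module. The key property is that the jobs are independent, so Spark's scheduler can overlap their stages and raise CPU utilisation.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelTraining {
  // Hypothetical stand-in for one expensive, independent training job;
  // with Spark this would be something like LogisticRegressionWithLBFGS.run
  // on an RDD projected down to the chosen columns.
  def trainModel(columns: Seq[Int]): Double =
    columns.sum.toDouble / (columns.length max 1) // fake "model quality" score

  def main(args: Array[String]): Unit = {
    val allColumns = (0 until 8).toVector
    // every 3-column combination of the feature set
    val combos = allColumns.combinations(3).toVector

    // Each Future submits one training job from the driver; because the jobs
    // do not depend on each other, they can run concurrently.
    val futures = combos.map(c => Future((c, trainModel(c))))
    val results = Await.result(Future.sequence(futures), Duration.Inf)

    val best = results.maxBy(_._2)
    println(s"trained ${results.length} models; best columns = ${best._1}")
  }
}
```

Note that submitting many jobs concurrently only helps if each job leaves cores idle; as Sean points out above, it is worth confirming in the Spark UI which stages are actually the bottleneck before adding driver-side parallelism.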