Hello,
I am new to Spark. I have adapted an example code to do binary
classification using logistic regression. I tried it on rcv1_train.binary
dataset using LBFGS.runLBFGS solver, and obtained correct loss.

Now, I'd like to run code in parallel across 16 cores of my single CPU
socket. If I understand correctly, parallelism in Spark is achieved by
partitioning dataset into some number of partitions, approximately 3-4 times
the amount of cores in the system.  To partition the data, I am calling
data.repartition(npart), where npart is number of partitions (16*4=64 in my
case).

I run the code as follows: 
spark-submit --master local[16] --class "logreg"
target/scala-2.10/logistic-regression_2.10-1.0.2.jar  72

However, I do not observe any speedup compared to when I just use one
partition. I would much appreciate your help understanding what I am doing
wrong and why I am not seeing any speedup due to 16 cores. Please find my
code below.

Best,
Mike

*CODE*
object logreg {
  def main(args: Array[String]) {

    val conf = new SparkConf().setAppName("logreg")
    val sc = new SparkContext(conf)
    val npart=args(0).toInt;
    val data_ = MLUtils.loadLibSVMFile(sc,
"rcv1_train.binary.0label").cache()
    val data=data_.repartition(npart); // partition dataset in "npart"
partitions
    val lambda=(1.0/data.count())
    val splits = data.randomSplit(Array(1.0, 0.0), seed = 11L)
    val training = splits(0).map(x => (x.label,
MLUtils.appendBias(x.features))).cache()
    val numFeatures = data.take(1)(0).features.size
    val start = System.currentTimeMillis
    val initialWeightsWithIntercept = Vectors.dense(new
Array[Double](numFeatures + 1))
    val (weightsWithIntercept, loss) = LBFGS.runLBFGS(training,
                                                      new
LogisticGradient(),
                                                      new
SquaredL2Updater(),
                                                      10,
                                                      1e-14,
                                                      100,
                                                      lambda,
                                                     
initialWeightsWithIntercept)
    val took = (System.currentTimeMillis - start)/1000.0;
    println("LBFGS.runLBFGS: " +  took + "s")
    sc.stop()
  }
}





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Parallel-execution-on-one-node-tp21052.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to