date:20140424

Re: Problem creating objects through reflection

2014-04-24 Thread Michael Armbrust

The Spark REPL is slightly modified from the normal Scala REPL to prevent work from being done twice when closures are deserialized on the workers. I'm not sure exactly why this causes your problem, but it's probably worth filing a JIRA about it. Here is another issue with classes defined in the

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread DB Tsai

rcv1.binary is too sparse (0.15% non-zero elements), so the dense format will not run because it runs out of memory. But the sparse format runs really well. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread DB Tsai

I'm doing the timer in runMiniBatchSGD after val numExamples = data.count(). See the following. Running the rcv1 dataset now, and will update soon.

    val startTime = System.nanoTime()
    for (i <- 1 to numIterations) {
      // Sample a subset (fraction miniBatchFraction) of the total data /

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread Xiangrui Meng

I don't understand why sparse falls so far behind dense at the very first iteration. I didn't see count() being called in https://github.com/dbtsai/spark-lbfgs-benchmark/blob/master/src/main/scala/org/apache/spark/mllib/benchmark/BinaryLogisticRegression.scala . Maybe you have local uncommitted chang

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread DB Tsai

Hi Xiangrui, Yes, I'm using yarn-cluster mode, and I did check that the number of executors I specified is the same as the number of executors actually running. For caching and materialization, I have the timer in the optimizer after calling count(); as a result, the time for materialization in the cache isn't included in the benchmark. T
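The timing approach described in this message (force materialization first, then start the timer) can be illustrated with a minimal plain-Scala sketch. It does not use Spark: a `lazy val` stands in for a cached dataset, and `TimingSketch` and `timed` are hypothetical names introduced only for illustration, not code from the benchmark.

```scala
object TimingSketch {
  // Hypothetical helper: evaluates `body` and returns (result, elapsed nanoseconds).
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for a cached dataset: it is only computed when first accessed.
    lazy val data: Vector[Double] = Vector.tabulate(100000)(i => math.sqrt(i.toDouble))

    // Force materialization outside the timed region, analogous to
    // calling count() before starting the timer.
    val numExamples = data.size

    val numIterations = 10
    val (total, elapsedNanos) = timed {
      var acc = 0.0
      for (_ <- 1 to numIterations) acc += data.sum
      acc
    }
    println(s"examples=$numExamples total=$total elapsedMs=${elapsedNanos / 1e6}")
  }
}
```

Because the data is forced before `timed` is entered, the measured interval covers only the iterations, not the one-time cost of building the data.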

Problem creating objects through reflection

2014-04-24 Thread Piotr Kołaczkowski

Hi, I'm working on Cassandra-Spark integration and I hit a pretty severe problem. One of the provided features is mapping Cassandra rows into objects of user-defined classes. E.g. like this: class MyRow(val key: String, val data: Int) sc.cassandraTable("keyspace", "table").select("key", "dat

Re: Fw: Is there any way to make a quick test on some pre-commit code?

2014-04-24 Thread Prashant Sharma

Not sure, but I use sbt/sbt ~compile instead of package. Is there any reason we use package instead of compile (which is slightly faster, of course)? Prashant Sharma On Thu, Apr 24, 2014 at 1:32 PM, Patrick Wendell wrote: > This is already on the wiki: > > https://cwiki.apache.org/confluence/display/SPARK/Usef

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread Xiangrui Meng

Hi DB, I saw you are using yarn-cluster mode for the benchmark. I tested the yarn-cluster mode and found that YARN does not always give you the exact number of executors requested. Just want to confirm that you've checked the number of executors. The second thing to check is that in the benchmark

Re: Fw: Is there any way to make a quick test on some pre-commit code?

2014-04-24 Thread Patrick Wendell

This is already on the wiki: https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools On Wed, Apr 23, 2014 at 6:52 PM, Nan Zhu wrote: > I was just asked the same question by others > > I think Reynold gave a pretty helpful tip on this, > > Shall we put this on Contribute-to-

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread Xiangrui Meng

I don't think it is easy to make sparse faster than dense with this sparsity and feature dimension. You can try rcv1.binary, which should show the difference easily. David, the breeze operators used here are 1. DenseVector dot SparseVector 2. axpy DenseVector SparseVector However, the SparseVect
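The two operations named in this message, a dense-sparse dot product and a sparse axpy into a dense vector, can be sketched in plain Scala as follows. This is only an illustration of why both touch just the stored non-zeros; it is not breeze's implementation, and `SparseOpsSketch`, `Sparse`, `dot`, and `axpy` are hypothetical names defined here.

```scala
object SparseOpsSketch {
  // Hypothetical minimal sparse vector: parallel arrays of indices and values.
  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double])

  // Dot product of a dense vector with a sparse one: visits only stored entries.
  def dot(dense: Array[Double], sparse: Sparse): Double = {
    require(dense.length == sparse.size, "dimension mismatch")
    var sum = 0.0
    var k = 0
    while (k < sparse.indices.length) {
      sum += dense(sparse.indices(k)) * sparse.values(k)
      k += 1
    }
    sum
  }

  // axpy: y += a * x, with sparse x and dense y, updated in place.
  def axpy(a: Double, x: Sparse, y: Array[Double]): Unit = {
    require(y.length == x.size, "dimension mismatch")
    var k = 0
    while (k < x.indices.length) {
      y(x.indices(k)) += a * x.values(k)
      k += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val x = Sparse(5, Array(1, 3), Array(2.0, 4.0))
    val w = Array(1.0, 2.0, 3.0, 4.0, 5.0)
    println(dot(w, x))        // 2.0*2.0 + 4.0*4.0 = 20.0
    axpy(0.5, x, w)
    println(w.mkString(", ")) // 1.0, 3.0, 3.0, 6.0, 5.0
  }
}
```

Both loops run in time proportional to the number of stored entries rather than the full dimension, which is where a sparse format is expected to gain on very sparse data such as rcv1.binary.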

10 matches