The Spark REPL is slightly modified from the normal Scala REPL to prevent
work from being done twice when closures are deserialized on the workers.
I'm not sure exactly why this causes your problem, but it's probably worth
filing a JIRA about it.
Here is another issue with classes defined in the
rcv1.binary is very sparse (0.15% non-zero elements), so the dense format
will not run due to running out of memory, but the sparse format runs really well.
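The gap between the two formats follows from a back-of-envelope memory estimate. A minimal sketch, assuming rcv1.binary's published dimensions (~47,236 features, ~20,242 training examples — these counts come from the public LIBSVM dataset page, not this thread), and a simple 12-bytes-per-nonzero sparse layout (4-byte index + 8-byte value):

```scala
// Rough memory comparison for an rcv1.binary-sized dataset.
// Feature/example counts are assumptions from the public dataset description.
val numFeatures = 47236L
val numExamples = 20242L
val density     = 0.0015

// Dense: one 8-byte Double per feature per example.
val denseBytes = numExamples * numFeatures * 8L

// Sparse: a 4-byte Int index plus an 8-byte Double value per stored non-zero.
val nnzPerRow   = (numFeatures * density).toLong
val sparseBytes = numExamples * nnzPerRow * 12L

println(f"dense:  ${denseBytes / 1e9}%.1f GB")
println(f"sparse: ${sparseBytes / 1e6}%.1f MB")
```

At 0.15% density the sparse representation is several hundred times smaller, which is why the dense run exhausts memory while the sparse run is comfortable.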
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On
I start the timer in runMiniBatchSGD after val numExamples = data.count().
See the following. I'm running the rcv1 dataset now and will update soon.
val startTime = System.nanoTime()
for (i <- 1 to numIterations) {
  // Sample a subset (fraction miniBatchFraction) of the total data
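The fragment above is cut off, but the overall timing pattern can be sketched without Spark. In this sketch the RDD is replaced by a plain Seq and the sampling uses scala.util.Random (runMiniBatchSGD's real sampling uses RDD.sample); numIterations and miniBatchFraction are illustrative values, not the benchmark's settings:

```scala
import scala.util.Random

// Stand-in for the timed mini-batch loop: sample a fraction of the data each
// iteration, inside a single System.nanoTime() window.
val data = (1 to 10000).map(_.toDouble)
val numIterations = 10
val miniBatchFraction = 0.1
val rng = new Random(42)

val startTime = System.nanoTime()
for (i <- 1 to numIterations) {
  // Sample a subset (fraction miniBatchFraction) of the total data
  val sample = data.filter(_ => rng.nextDouble() < miniBatchFraction)
  // ... compute the gradient contribution on `sample` here ...
}
val elapsedSeconds = (System.nanoTime() - startTime) / 1e9
println(f"done in $elapsedSeconds%.3f s")
```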
I don't understand why sparse falls behind dense so much at the very
first iteration. I didn't see count() being called in
https://github.com/dbtsai/spark-lbfgs-benchmark/blob/master/src/main/scala/org/apache/spark/mllib/benchmark/BinaryLogisticRegression.scala
. Maybe you have local uncommitted changes.
Hi Xiangrui,
Yes, I'm using yarn-cluster mode, and I did check # of executors I
specified are the same as the actual running executors.
For caching and materialization, I have the timer in the optimizer after
calling count(); as a result, the time for materialization in the cache
isn't included in the benchmark.
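The point of calling count() before starting the timer is that the first pass over a cached, lazily built dataset pays the construction cost, and later passes don't. A rough stand-in, using a memoized LazyList in place of a cached RDD (data.size plays the role of data.count()):

```scala
// First access materializes (and memoizes) the lazy collection; the timed
// work afterwards sees already-materialized data, as with cache() + count().
val data = LazyList.from(1).map(i => i.toLong * i).take(100000)

val t0 = System.nanoTime()
val n  = data.size                        // analogous to data.count()
val materializeMs = (System.nanoTime() - t0) / 1e6

val t1 = System.nanoTime()
val sum = data.foldLeft(0L)(_ + _)        // timed work: no rebuild cost
val workMs = (System.nanoTime() - t1) / 1e6

println(f"materialize: $materializeMs%.1f ms, work: $workMs%.1f ms")
```

If the timer started before the count, the first iteration would silently include the materialization cost, which is exactly what the benchmark is trying to exclude.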
T
Hi,
I'm working on Cassandra-Spark integration and I hit a pretty severe
problem. One of the provided features is mapping Cassandra rows into
objects of user-defined classes, e.g. like this:
class MyRow(val key: String, val data: Int)
sc.cassandraTable("keyspace", "table").select("key", "dat
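For readers unfamiliar with the feature being described: the idea is that each selected row is turned into an instance of the user's class. A hedged sketch of that shape, with the Cassandra row modeled as a plain Map[String, Any] — fromRow, the column names, and the values below are illustrative, not the connector's actual API:

```scala
// User-defined row class, as in the message above.
class MyRow(val key: String, val data: Int)

// Hypothetical mapper: pull the selected columns out of a row and pass them
// to the class constructor. A real connector would do this generically.
def fromRow(row: Map[String, Any]): MyRow =
  new MyRow(
    row("key").asInstanceOf[String],
    row("data").asInstanceOf[Int]
  )

val row   = Map[String, Any]("key" -> "k1", "data" -> 42)
val myRow = fromRow(row)
println(s"${myRow.key} -> ${myRow.data}")
```

The trouble alluded to earlier in the thread is that classes defined in the Spark REPL serialize differently from normally compiled classes, which can break this kind of mapping when the closures are deserialized on the workers.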
Not sure, but I use sbt/sbt ~compile instead of package. Any reason we use
package instead of compile (which is slightly faster, of course)?
Prashant Sharma
On Thu, Apr 24, 2014 at 1:32 PM, Patrick Wendell wrote:
> This is already on the wiki:
>
> https://cwiki.apache.org/confluence/display/SPARK/Usef
Hi DB,
I saw you are using yarn-cluster mode for the benchmark. I tested the
yarn-cluster mode and found that YARN does not always give you the
exact number of executors requested. Just want to confirm that you've
checked the number of executors.
The second thing to check is that in the benchmark
This is already on the wiki:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
On Wed, Apr 23, 2014 at 6:52 PM, Nan Zhu wrote:
> I was just asked the same question by others
>
> I think Reynold gave a pretty helpful tip on this,
>
> Shall we put this on Contribute-to-
I don't think it is easy to make sparse faster than dense with this
sparsity and feature dimension. You can try rcv1.binary, which should
show the difference easily.
David, the breeze operators used here are:
1. DenseVector dot SparseVector
2. axpy DenseVector SparseVector
However, the SparseVect
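Those two operations can be sketched on hand-rolled vectors (not Breeze's classes) to show why both cost O(nnz) rather than O(dimension): the sparse operand is stored as parallel index/value arrays, and each loop touches only the stored non-zeros. The vectors below are made-up illustrative values:

```scala
// Dense vector as a plain Array; sparse vector as parallel index/value arrays.
val dense         = Array(1.0, 2.0, 3.0, 4.0, 5.0)
val sparseIndices = Array(1, 3)          // positions of the non-zeros
val sparseValues  = Array(10.0, 20.0)

// 1. DenseVector dot SparseVector: iterate over the non-zeros only.
var dot = 0.0
for (k <- sparseIndices.indices)
  dot += dense(sparseIndices(k)) * sparseValues(k)

// 2. axpy (y += a * x) with a sparse x: again touches only the non-zeros.
val a = 2.0
val y = dense.clone()
for (k <- sparseIndices.indices)
  y(sparseIndices(k)) += a * sparseValues(k)

println(s"dot = $dot, y = ${y.mkString(", ")}")
```

With dense random access into the dense operand, both operations stay cheap per non-zero; the overhead that makes sparse slower in practice tends to come from indirection and cache behavior, not asymptotics.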