Yeah, that's probably the easiest though obviously pretty hacky.
I'm surprised that the Hessian approximation isn't worse than it is. (As
in, I'd expect error messages.) It's obviously line searching much more, so
the approximation must be worse. You might be interested in this online
BFGS:
Yeah, the Hessian approximation in LBFGS isn't stateless; it depends on
the previous LBFGS steps, as Xiangrui also pointed out. It's surprising
that it works without an error message. I also saw the loss fluctuating
like SGD during training.
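Since the statefulness of the Hessian approximation keeps coming up, here is a minimal sketch (not Breeze's actual code) of the L-BFGS two-loop recursion. The search direction is built from curvature pairs s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k saved from previous iterations, so if each iteration evaluates the gradient on a different mini-batch, the stored y_k mix gradients of different objectives and the approximation degrades:

```scala
object TwoLoop {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = {
    var acc = 0.0; var i = 0
    while (i < a.length) { acc += a(i) * b(i); i += 1 }
    acc
  }

  // q := q + c * v, in place
  private def addScaled(q: Vec, c: Double, v: Vec): Unit = {
    var i = 0
    while (i < q.length) { q(i) += c * v(i); i += 1 }
  }

  // Search direction from the current gradient and stored curvature
  // pairs (s, y), newest first. With an empty history this reduces to
  // plain gradient descent: direction = -grad.
  def direction(grad: Vec, history: List[(Vec, Vec)]): Vec = {
    val q = grad.clone()
    // First loop: fold each stored pair into q, remembering (rho, alpha).
    val alphas = history.map { case (s, y) =>
      val rho = 1.0 / dot(y, s)
      val a = rho * dot(s, q)
      addScaled(q, -a, y)
      (rho, a)
    }
    // Initial Hessian scaling from the newest pair (identity if no history).
    val gamma = history.headOption
      .map { case (s, y) => dot(s, y) / dot(y, y) }
      .getOrElse(1.0)
    val r = q.map(_ * gamma)
    // Second loop: walk the history oldest-to-newest.
    history.zip(alphas).reverse.foreach { case ((s, y), (rho, a)) =>
      val beta = rho * dot(y, r)
      addScaled(r, a - beta, s)
    }
    r.map(-_) // descent direction
  }
}
```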
We will remove the miniBatch mode in LBFGS in
Also, how many rejection failures will terminate the optimization
process? How is that related to numberOfImprovementFailures?
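For what it's worth, a hedged sketch of the kind of policy a parameter like numberOfImprovementFailures typically controls (an illustration, not Breeze's actual implementation): count consecutive steps whose relative improvement falls below a tolerance, and declare convergence once the count reaches a threshold.

```scala
object Convergence {
  // losses: the sequence of accepted objective values, oldest first.
  // Returns true if maxFailures consecutive steps each improved the loss
  // by less than tol (relative to the previous value).
  def converged(losses: Seq[Double], tol: Double, maxFailures: Int): Boolean = {
    var failures = 0
    losses.zip(losses.drop(1)).exists { case (prev, next) =>
      val improved = (prev - next) > tol * math.abs(prev)
      failures = if (improved) 0 else failures + 1
      failures >= maxFailures
    }
  }
}
```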
Thanks.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On
That's right.
FWIW, caching should be automatic now, but it might be that the version of
Breeze you're using doesn't do that yet.
Also, in breeze.util._ there's an implicit that adds a tee method to
Iterator, and also a last method. Both are useful for things like this.
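To illustrate the pattern (these stand-ins are hypothetical; the real implicits live in breeze.util._): tee lets you observe every state as it streams by, and last drains the iterator to the final state, so you can log the loss history while keeping only the final result.

```scala
object IterUtil {
  // Run a side effect on each element as it is consumed.
  def tee[A](it: Iterator[A])(f: A => Unit): Iterator[A] =
    it.map { a => f(a); a }

  // Drain the iterator and return its final element.
  def last[A](it: Iterator[A]): A = {
    var cur = it.next()
    while (it.hasNext) cur = it.next()
    cur
  }
}

// Usage sketch (lbfgs/costFun/init are placeholders for the real objects):
//   val history = scala.collection.mutable.ArrayBuffer.empty[Double]
//   val finalState =
//     IterUtil.last(IterUtil.tee(lbfgs.iterations(costFun, init))(s => history += s.value))
```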
-- David
On Sun, Apr 27,
Hi David,
I got most of the stuff working, and the loss is monotonically decreasing
when I get the history from the iterator of states.
However, in the costFun, I need to know the current iteration for
miniBatch, which means that within one iteration, if the optimizer calls
costFun several times for line
LBFGS will not take a step that sends the objective value up. It might try
a step that is too big and reject it, so if you're just logging
everything that gets tried by LBFGS, you could see that. The iterations
method of the minimizer should never return an increasing objective value.
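As a sanity check on the distinction above, one can assert that the accepted losses are non-increasing even though the line search may evaluate (and reject) worse points along the way. A minimal helper:

```scala
object Check {
  // True iff each loss is <= the one before it (ties allowed).
  def isMonotoneNonIncreasing(losses: Seq[Double]): Boolean =
    losses.zip(losses.drop(1)).forall { case (prev, next) => next <= prev }
}
```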
If you're
I don't know about Spark's implementation, but with LBFGS, there is a line
search step. Since computing the line search takes roughly the same work
as one iteration, an efficient implementation will take a full step and
simultaneously compute the gradient for the next step and check if the
update
I don't think it is easy to make sparse faster than dense with this
sparsity and feature dimension. You can try rcv1.binary, which should
show the difference easily.
David, the breeze operators used here are
1. DenseVector dot SparseVector
2. axpy DenseVector SparseVector
However, the
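The two kernels listed above can be sketched like this, with the sparse vector stored as parallel index/value arrays (a hedged illustration, not Breeze's implementation). Both touch only the nonzeros, so the cost is O(nnz) regardless of the dense dimension:

```scala
object Kernels {
  // 1. DenseVector dot SparseVector: sum over the sparse nonzeros only.
  def dotDS(dense: Array[Double], idx: Array[Int], vals: Array[Double]): Double = {
    var acc = 0.0
    var i = 0
    while (i < idx.length) { acc += dense(idx(i)) * vals(i); i += 1 }
    acc
  }

  // 2. axpy with a sparse x: y := a * x + y, updating only nnz entries of y.
  def axpyDS(a: Double, idx: Array[Int], vals: Array[Double], y: Array[Double]): Unit = {
    var i = 0
    while (i < idx.length) { y(idx(i)) += a * vals(i); i += 1 }
  }
}
```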
Hi DB,
I saw you are using yarn-cluster mode for the benchmark. I tested the
yarn-cluster mode and found that YARN does not always give you the
exact number of executors requested. Just want to confirm that you've
checked the number of executors.
The second thing to check is that in the
Hi Xiangrui,
Yes, I'm using yarn-cluster mode, and I did check that the number of
executors I specified matches the number actually running.
For caching and materialization, I have the timer in the optimizer after
calling count(); as a result, the time for materialization in the cache
isn't included in the benchmark.
I don't understand why sparse falls behind dense so much at the very
first iteration. I didn't see count() being called in
https://github.com/dbtsai/spark-lbfgs-benchmark/blob/master/src/main/scala/org/apache/spark/mllib/benchmark/BinaryLogisticRegression.scala
. Maybe you have local uncommitted
I'm doing the timer in runMiniBatchSGD after val numExamples = data.count()
See the following. Running rcv1 dataset now, and will update soon.
val startTime = System.nanoTime()
for (i <- 1 to numIterations) {
  // Sample a subset (fraction miniBatchFraction) of the total data
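The timing pattern in the fragment above can be sketched as a small helper (Spark-agnostic; runIteration is a hypothetical placeholder for the sampled gradient step), with count() called beforehand so materialization stays out of the measured time:

```scala
object Timing {
  // Time only the optimization loop; the caller should have materialized
  // the cached RDD (e.g. via count()) before calling this.
  def timeIterations(numIterations: Int)(runIteration: Int => Unit): Double = {
    val startTime = System.nanoTime()
    for (i <- 1 to numIterations) runIteration(i)
    (System.nanoTime() - startTime) / 1e9 // elapsed seconds
  }
}
```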
rcv1.binary is too sparse (0.15% non-zero elements), so dense format will
not run due to out of memory. But sparse format runs really well.
Hi all,
I'm benchmarking Logistic Regression in MLlib using the newly added
optimizers LBFGS and GD. I'm using the same dataset and the same
methodology as in this paper: http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf
I want to know how Spark scales when adding workers, and how optimizers and
input
What is the number of non-zeros per row (and the number of features) in the
sparse case? We've hit some issues with Breeze sparse support in the past,
but for sufficiently sparse data it's still pretty good.
On Apr 23, 2014, at 9:21 PM, DB Tsai dbt...@stanford.edu wrote:
Hi all,
I'm
Sorry - just saw the 11% number. That is around the spot where dense data is
usually faster (blocking, cache coherence, etc.). Is there any chance you
have a 1% (or so) sparse dataset to experiment with?
On Apr 23, 2014, at 9:21 PM, DB Tsai dbt...@stanford.edu wrote:
Hi all,
I'm
I don't think the attachment came through in the list. Could you upload the
results somewhere and link to them ?
On Wed, Apr 23, 2014 at 9:32 PM, DB Tsai dbt...@dbtsai.com wrote:
123 features per row, and on average, 89% are zeros.
On Apr 23, 2014 9:31 PM, Evan Sparks evan.spa...@gmail.com
Any suggestion for sparser dataset? Will test more tomorrow in the office.
On Apr 23, 2014 9:33 PM, Evan Sparks evan.spa...@gmail.com wrote:
Sorry - just saw the 11% number. That is around the spot where dense data
is usually faster (blocking, cache coherence, etc.). Is there any chance you
have
On Wed, Apr 23, 2014 at 9:30 PM, Evan Sparks evan.spa...@gmail.com wrote:
What is the number of non-zeros per row (and the number of features) in the
sparse case? We've hit some issues with Breeze sparse support in the past,
but for sufficiently sparse data it's still pretty good.
Any chance you
The figure showing the Log-Likelihood vs Time can be found here.
https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf
Let me know if you cannot open it.
Was the weight vector sparse? The gradients? Or just the feature vectors?
On Wed, Apr 23, 2014 at 10:08 PM, DB Tsai dbt...@dbtsai.com wrote:
The figure showing the Log-Likelihood vs Time can be found here.
PS: it doesn't make sense to have sparse weights and gradients unless
there's a strong L1 penalty.
On Wed, Apr 23, 2014 at 10:17 PM, DB Tsai
On Wed, Apr 23, 2014 at 10:18 PM, DB Tsai dbt...@dbtsai.com wrote:
PS: it doesn't make sense to have sparse weights and gradients unless
there's a strong L1 penalty.
Sure, I was just checking the obvious things. Have you run it through a
profiler to see where the problem is?
Not yet, since it's running on the cluster. I will run it locally with a
profiler. Thanks for the help.
On Wed, Apr 23, 2014 at 10:22 PM, David Hall