Re: NNLS bug

2014-10-16 Thread Xiangrui Meng
Thanks for reporting the bug! I will take a look.
-Xiangrui

On Thu, Oct 16, 2014 at 11:25 PM, Debasish Das wrote:
> Hi,
>
> I am validating the proximal algorithm for positive and bound constrained
> ALS and I came across the bug detailed in the JIRA while running ALS with
> NNLS:
>
> https://iss

NNLS bug

2014-10-16 Thread Debasish Das
Hi, I am validating the proximal algorithm for positive and bound constrained ALS, and I came across the bug detailed in the JIRA while running ALS with NNLS: https://issues.apache.org/jira/browse/SPARK-3987 The ADMM-based proximal algorithm came up with the correct result... Thanks. Deb

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread Nicholas Chammas
On Thu, Oct 16, 2014 at 3:55 PM, shane knapp wrote:
> i really, truly hate non-deterministic failures.

Amen bruddah.

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread shane knapp
yeah, at this point it might be worth trying. :) the absolutely irritating thing is that i am not seeing this happen w/any jobs other than the spark prb, nor does it seem to correlate w/time of day, network or system load, or what slave it runs on. nor are we hitting our limit of connectio

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread Nicholas Chammas
Thanks for continuing to look into this, Shane. One suggestion that Patrick brought up, if we have trouble getting to the bottom of this, is doing the git checkout ourselves in the run-tests-jenkins script and cutting out the Jenkins git plugin entirely. That way we can script retries and post fri
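The idea of scripting retries around the checkout can be sketched roughly as follows. This is only an illustration, not the actual run-tests-jenkins logic: `run_with_retries` and `make_flaky_fetch` are hypothetical names, and the flaky fetch is a simulated stand-in for a git operation that times out.

```python
import time


def run_with_retries(action, attempts=3, delay_seconds=0.0):
    """Call action() until it succeeds or the attempts run out.

    Returns (result, tries_used). Re-raises the last exception if
    every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action(), attempt
        except Exception:
            if attempt == attempts:
                raise
            # Wait before the next try; a real script might back off longer.
            time.sleep(delay_seconds)


def make_flaky_fetch(failures_before_success):
    """Hypothetical stand-in for a git fetch that times out a few times."""
    state = {"calls": 0}

    def fetch():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TimeoutError("simulated git fetch timeout")
        return "fetched"

    return fetch
```

In a real script, `action` would presumably wrap something like `subprocess.run(["git", "fetch", ...], check=True, timeout=...)`, so that both non-zero exit codes and timeouts surface as exceptions and trigger a retry.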

Re: Unit testing Master-Worker Message Passing

2014-10-16 Thread Josh Rosen
Hi Matt, I’m not sure whether those tests will actually find this specific issue. The tests that I linked to test Spark’s ZooKeeper-based multi-master mode, whereas it sounds like you’re seeing this issue in a regular standalone cluster. In those tests, the workers disconnect from the master be

Re: short jenkins downtime -- trying to get to the bottom of the git fetch timeouts

2014-10-16 Thread shane knapp
the bad news is that we've had a couple more failures due to timeouts, but the good news is that the frequency with which these happen has decreased significantly (3 in the past ~18hr). seems like the git plugin downgrade has helped relieve the problem, but hasn't fixed it. i'll be looking into this m

accumulators

2014-10-16 Thread Sean McNamara
Accumulators on the stage info page show the rolling lifetime value of each accumulator as well as the per-task value, which is handy. I think it would be useful to add another field to the “Accumulators” table that also shows the total for the stage you are looking at (basically just a merge of the accumula
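A per-stage total of this kind could be computed along these lines. This is a minimal sketch that assumes additive (numeric) accumulators and represents each task's updates as a plain dictionary; `stage_totals` is a hypothetical helper, not part of the Spark UI code.

```python
from collections import defaultdict


def stage_totals(task_updates):
    """Merge per-task accumulator updates into one total per accumulator.

    task_updates: iterable of {accumulator_name: value} dicts, one per task.
    Assumes every accumulator is additive.
    """
    totals = defaultdict(int)
    for update in task_updates:
        for name, value in update.items():
            totals[name] += value
    return dict(totals)
```

For example, `stage_totals([{"records": 3, "errors": 1}, {"records": 5}])` gives one entry per accumulator for the stage, independent of the rolling lifetime value.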

Re: Issues with ALS positive definite

2014-10-16 Thread Debasish Das
Just checked, QR is exposed by netlib:

import org.netlib.lapack.Dgeqrf

For the equality and bound version, I will use QR...it will be faster than the LU that I am using through jblas.solveSymmetric...

On Thu, Oct 16, 2014 at 8:34 AM, Debasish Das wrote:
> @xiangrui should we add this epsilon in

Re: Issues with ALS positive definite

2014-10-16 Thread Debasish Das
@xiangrui should we add this epsilon inside the ALS code itself? That way, if a user mistakenly passes 0.0 as the regularization, LAPACK failures do not show up...

@sean For the proximal algorithms I am using Cholesky for L1 and LU for equality and bound constraints (since the matrix is quasi definite)...I am

Re: Issues with ALS positive definite

2014-10-16 Thread Sean Owen
The Gramian is at least positive semidefinite and will be definite if the matrix is nonsingular, yes. That's usually but not always true. The lambda*I matrix is positive definite, well, when lambda is positive. Adding that makes it definite. At least, lambda=0 could be rejected as invalid. But t
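The point about lambda can be illustrated with a small pure-Python sketch: a Cholesky factorization fails on the Gramian of a rank-deficient matrix but succeeds once lambda*I is added. The helper names and the pivot tolerance `tol` are assumptions made for this illustration; this is not the LAPACK routine that ALS calls.

```python
import math


def gramian(a):
    """Return A^T A for a matrix given as a list of rows."""
    cols = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(cols)]
            for i in range(cols)]


def add_ridge(g, lam):
    """Return g + lam * I."""
    n = len(g)
    return [[g[i][j] + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def cholesky(m, tol=1e-12):
    """Return lower-triangular L with L L^T = m.

    Raises ValueError when a pivot is not above tol, i.e. when m is not
    (numerically) positive definite. tol is an assumed threshold.
    """
    n = len(m)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= tol:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (m[i][j] - s) / lower[j][j]
    return lower
```

With the rank-1 matrix `[[1.0, 2.0], [2.0, 4.0]]`, `cholesky(gramian(a))` raises `ValueError`, while `cholesky(add_ridge(gramian(a), 0.1))` returns a factor. Rejecting lambda=0 up front, as suggested above, would avoid reaching the failing case at all.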