Jenkins build is back to stable : SystemML-DailyTest #1598

2018-04-22 Thread jenkins
See 




Re: distributed cholesky on systemml

2018-04-22 Thread Matthias Boehm
sure no problem - thanks again for catching this issue that was hidden
for a while.

Yes, the same depth-first characteristic applies to the Cholesky
function as well. In contrast to U_triangular_inv, however, there are
data dependencies between the blocks per level (at least in the
current algorithm formulation), which means we cannot use the approach
I described for U_triangular_inv.

L11 = Cholesky(A11, nb)
A22 = ... U_triangular_inv(t(L11))
L22 = Cholesky(A22, nb)

However, note that there are much fewer calls to Cholesky due to the
switch to the builtin cholesky according to the given min block size.
For example, in our new test for dimensions 1362 x 1362 and min size
of 200, we call Cholesky 15 times but U_triangular_inv 2539 times.

For sufficiently large min block size this might be ok for Cholesky,
because each level also does a number of matrix multiplies that will
exploit the available parallelism of your cluster. In that regard. you
might want to experiment with different block sizes and driver memory
budgets. If I get a chance, I will also run a number of experiments
and see if we can rewrite these scripts.

Regards,
Matthias

On Sun, Apr 22, 2018 at 12:48 AM, Qifan Pu  wrote:
> Matthias,
>
> Thanks so much for taking time to fix. Really appreciated it.
> Does the same reasoning apply to the cholesky script? The recursive approach
> also looks inherently sequential.
>
> Best,
> Qifan
>
> On Sat, Apr 21, 2018 at 11:39 PM, Matthias Boehm  wrote:
>>
>> just as a quick update: this issue has now been fixed in SystemML
>> master - it was essentially a missing guard for recursive functions
>> when checking for unary size-preserving functions during
>> inter-procedural analysis (IPA).
>>
>> However, while working with this recursive cholesky function I came to
>> the conclusion that it may need some rework. The current top-down,
>> depth-first, approach is inherently sequential. This is partially
>> unnecessary because for the used recursive function U_triangular_inv
>> (which is called many more times than cholesky), blocks per level are
>> independent. Therefore, we should look into a bottom-up, breadth-first
>> approach to parallelize over the blocks in each level, which could be
>> done via parfor at script level.
>>
>> Regards,
>> Matthias
>>
>> On Sat, Apr 21, 2018 at 6:59 PM, Matthias Boehm  wrote:
>> > thanks for catching this - I just ran a toy example and this seems to
>> > be a rewrite issue (there are specific right indexing rewrites that
>> > collapse U[1:k,1:k] and U[1:k,k+1:n] into a single access to U which
>> > helps for large distributed matrices). As a workaround, you can set
>> > "sysml.optlevel" to 1 (instead of default 2, where 1 disables all
>> > rewrites), which worked fine for me. I'll fix this later today. Also
>> > I'll fix the naming from "Choleskey" to "Cholesky". Thanks again.
>> >
>> > Regards,
>> > Matthias
>> >
>> >
>> > On Sat, Apr 21, 2018 at 6:28 PM, Qifan Pu  wrote:
>> >> Hi Matthias,
>> >>
>> >> Thanks for the fast response and detailed information. This is really
>> >> helpful.
>> >>
>> >> I just tried to run it, and was tracing down a indexing bug that can be
>> >> repeated by simply running the test script of triangle solve[1]
>> >> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Invalid values
>> >> for
>> >> matrix indexing: [1667:,1:1666] must be within matrix dimensions
>> >> [1000,1000]
>> >>
>> >>
>> >> Am I missing some configuration here?
>> >>
>> >>
>> >> [1]
>> >>
>> >> https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/test/test_triangular_inv.dml
>> >>
>> >>
>> >> Best,
>> >> Qifan
>> >>
>> >>
>> >> On Sat, Apr 21, 2018 at 4:06 PM, Matthias Boehm 
>> >> wrote:
>> >>>
>> >>> Hi Qifan,
>> >>>
>> >>> thanks for your feedback. You're right, the builtin functions
>> >>> cholesky, inverse, eigen, solve, svd, qr, and lu are currently only
>> >>> supported as single-node operations because they're still implemented
>> >>> via Apache commons.math.
>> >>>
>> >>> However, there is an experimental script for distributed cholesky [1]
>> >>> which uses a recursive approach (with operations that allow for
>> >>> automatic distributed computation) for matrices larger than a
>> >>> user-defined block size. Once blocks become small enough, we use again
>> >>> the builtin cholesky. Graduating this script would require a broader
>> >>> set of experiments (and potential improvements) but it simply did not
>> >>> have the highest priority so far. You might want to give it a try
>> >>> though.
>> >>>
>> >>> Thanks again for your feedback - we'll consider a higher priority for
>> >>> these distributed operations when discussing the roadmap for the next
>> >>> releases.
>> >>>
>> >>> [1]
>> >>>
>> >>> https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/cholesky.dml
>> >>>
>> >>> Regards,
>> >>> Matthias
>> >>>