Re: Release Planning

2018-03-16 Thread Berthold Reinwald
thanks, Matthias. I will kick off the process on Monday morning PST.

Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinw...@us.ibm.com



From:   Matthias Boehm 
To: dev@systemml.apache.org
Date:   03/16/2018 09:21 PM
Subject:Re: Release Planning



Thanks for the patience. Meanwhile, all known issues have been resolved and
we're ready to cut an RC. Until this is done (or we have a 1.1 branch), I
would recommend limiting all commits to critical fixes.

Regards,
Matthias

On Tue, Mar 13, 2018 at 12:18 AM, Matthias Boehm wrote:

> just a quick update: the ANTLR issue and most other things that came up
> during QA are fixed now, except for performance issues with stratified
> stats for which I need some more time (SYSTEMML-2181 tracks the open
> issues). So it's probably a good idea to postpone the RC1 for a few days to
> avoid unnecessary release efforts.
>
> Regards,
> Matthias
>
> On Sun, Mar 11, 2018 at 5:43 PM, Matthias Boehm wrote:
>
>> well, after trying to run our perftest suite with Spark 2.3 and Spark 2.2
>> this seems to be more complicated. Although the ANTLR version update from
>> 4.5.3 to 4.7.1 solved the problem with Spark 2.3 (which uses 4.7.1),
>> SystemML would no longer be backwards compatible with Spark 2.2 and 2.1
>> (which use 4.5.3) because ANTLR checks bidirectional mismatches.
>> Unfortunately, removing ANTLR from our jar does not help because the above
>> versions are binary incompatible.
>>
>> We need to hold off the release until we have decided whether (1) we
>> directly release for Spark 2.3 and drop 2.2 and 2.1, or (2) we release for
>> Spark 2.2 and 2.1 with a subsequent ANTLR change and release for 2.3.
>>
>> Regards,
>> Matthias
>>
>> On Thu, Mar 8, 2018 at 5:38 PM, Niketan Pansare wrote:
>>
>>> +1. That will preserve current behavior on older Spark versions too.
>>>
>>> > On Mar 8, 2018, at 5:24 PM, Ted Yu  wrote:
>>> >
>>> > +1 on upgrading
>>> > Original message
>>> > From: Matthias Boehm <mboe...@gmail.com>
>>> > Date: 3/8/18 5:19 PM (GMT-08:00)
>>> > To: dev@systemml.apache.org
>>> > Subject: Re: Release Planning
>>> >
>>> > related to Spark 2.3, we might want to update our ANTLR version because
>>> > with Spark 2.3 every parsed DML script (i.e., multiple times with
>>> > imported DML files) produces the following warning:
>>> >
>>> > "ANTLR Tool version 4.5.3 used for code generation does not match the
>>> > current runtime version 4.7"
>>> >
>>> > Regards,
>>> > Matthias
>>> >
>>> >> On Thu, Mar 1, 2018 at 5:22 PM, Matthias Boehm wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> I'm sure you've seen that Spark 2.3 just got released. This lines up
>>> >> beautifully with our own SystemML 1.1 release. Accordingly, I would
>>> >> recommend using Spark 2.3 for our due diligence algorithm-level Q/A.
>>> >> How about we shoot for an RC1 by March 12? This should give enough time
>>> >> to run over reasonably large data and fix all related issues.
>>> >>
>>> >> Regards,
>>> >> Matthias
>>> >>
>>> >>> On Tue, Feb 6, 2018 at 12:51 PM, Matthias Boehm wrote:
>>> >>>
>>> >>> yes, absolutely. Here is a list of new features and improvements -
>>> >>> please feel free to extend as needed:
>>> >>>
>>> >>> 1) Extended Caffe2DML and Keras2DML APIs
>>> >>> 2) Support for large-dense blocks >16GB in CP
>>> >>> 3) New builtin functions ifelse, xor, as well as and/or/not over matrices
>>> >>> 4) Single-precision support for native conv2d and mm operations
>>> >>> 5) Performance features and correctness of ultra-sparse operations
>>> >>> 6) Codegen: new plan cache, nary cbind
>>> >>> 7) Parfor: result merge with accumulators +=, reduced initialization
>>> >>> overhead
>>> >>> 8) Compiler improvements: avoid unnecessary Spark instructions, corrected
>>> >>> memory estimates
>>> >>>
>>> >>> Until the RC, I'd like to support all deep learning builtin functions
>>> >>> in codegen, add a couple of pending parfor improvements, and potentially
>>> >>> do a first cut of the compiler/runtime integration for parameter
>>> >>> servers.
>>> >>>
>>> >>> Regards,
>>> >>> Matthias
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Feb 6, 2018 at 12:36 PM, Niketan Pansare wrote:
>>> >>>
>>> >>>> +1.
>>> >>>>
>>> >>>> We should consider including single precision native BLAS in the
>>> >>>> release notes as well. If possible, we should add JNI wrappers for
>>> >>>> PowerPC, Windows and Mac too in this release.
>>> >>>>
>>> >>>> > On Feb 6, 2018, at 12:27 PM, Berthold Reinwald <reinw...@us.ibm.com>
>>> >>>> > wrote:
>>> >>>> >
>>> >>>> > sure.
>>> >>>> >
>>> >>>> > Makes sense. Codegen and Keras2DML made good progress, and many other
>>> >>>> > fixes/improvements.
>>> >>>> >
>>> >>>> > What else do we want to time/track/highlight for it?

Re: Sub projects in Language and run time for parameter servers [SYSTEMML-2083]

2018-03-16 Thread Chamath Abeysinghe
Hi Matthias,
After going through the JIRA sub projects and the references you provided, I
thought of drafting a proposal focusing on the distributed Spark backend
project because it seems a challenging and exciting area to explore :-).
I have sketched a rough diagram of the design and the implementation plan for
the proposal:
https://drive.google.com/file/d/1MTlYWvkkApe28vDOodDR8hmxzVx9QwQX/view?usp=sharing


My idea is to give the paramserv runtime a design similar to the ParFor
runtime, extended to handle parameter exchange. I will first work on the
primitives the runtime requires to manage the PS, and then implement a
parameter server in Spark. Initially it will use a synchronous method; if
time allows, I will experiment with other methods and performance factors.
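
To make the synchronous method concrete, here is a minimal single-process
sketch in Python. It is only an illustration under stated assumptions: the
model is a plain least-squares regression, the workers are emulated by a
loop, and the names local_gradient and sync_paramserv are hypothetical and
not part of SystemML or of any existing paramserv API.

```python
import random


def local_gradient(w, batch):
    """Least-squares gradient of a linear model on one worker's partition."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / len(batch)
    return grad


def sync_paramserv(data, num_features, num_workers=4, epochs=300, lr=0.1):
    """Emulate a synchronous parameter server in a single process.

    Assumption: every worker holds a fixed partition of the data and the
    server applies one averaged update per round.
    """
    w = [0.0] * num_features
    # Round-robin partitioning of the training data across workers.
    partitions = [data[i::num_workers] for i in range(num_workers)]
    for _ in range(epochs):
        # Every worker pulls the same parameters and computes its gradient.
        grads = [local_gradient(w, part) for part in partitions]
        # Barrier: the server waits for all workers, then averages the
        # gradients and applies a single update (the synchronous step).
        avg = [sum(g[j] for g in grads) / num_workers
               for j in range(num_features)]
        w = [wj - lr * gj for wj, gj in zip(w, avg)]
    return w
```

An asynchronous variant would drop the barrier and let the server apply each
worker's gradient as soon as it arrives.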

Regarding the control program, I have one concern. The project JIRA mentions
that "PS strategies will be selected by the user". Does this include the
architecture of the parameter server (number of workers and servers), or does
that need to be handled in the project?

I hope this plan aligns with the expectations of the community and does not
conflict with other GSoC candidates. Your feedback on this is highly
appreciated; if there is anything wrong, please correct me. Thanks.

*PS: I am resending the same mail because it seems the previous mail with the
attachment was not delivered to the dev mailing list.*

Regards,
Chamath


On Fri, Mar 9, 2018 at 2:19 PM, Matthias Boehm wrote:

> Hi Chamath,
>
> ad 1: Yes, this is absolutely correct. However, it is important to realize
> that within the workers, we want to run DML functions, and for these we'll
> reuse our existing compiler, runtime, operations, and data structures.
>
> ad 2: Yes, this is also correct. Indeed we can use an existing parfor
> (with local execution mode) to emulate a local, synchronous parameter
> server. However, it would be very hard - and conflicting with our
> functional and thus, stateless execution semantics - to incorporate
> asynchronous updates and strategies such as Hogwild!. Furthermore, such a
> local parameter server might also be useful for very large
> models and batches, because this would enable distributed data-parallel
> operations spawned from each local worker.
>
> ad 3: Unfortunately, there is no single detailed architecture diagram
> because the system evolves over time. I would recommend looking at the
> following two papers, where especially [1] (the parfor paper, and its
> extensions for Spark in [2]) might give you a better idea of the parameter
> server and its workers, which are primarily meant to handle the
> orchestration and efficient parameter updates/exchange. If you're looking
> for coarse-grained components, then [3], slide 8 might be a starting point.
> At a high level, each operation and some constructs like parfor have
> physical operators for CP, SPARK, MR, and some for GPU. Similarly, this
> project aims to introduce a new paramserv builtin function (most similar to
> parfor) and its different physical operators.
>
> ad 4: Since this paramserv function is similar to parfor, we will be
> able to reuse key primitives for bringing up local/remote workers, shipping
> the compiled functions, and input data. The major extensions will be to
> call the shipped functions per batch, get the returned (i.e., updated)
> parameters and handle the exchange according to the paramserv
> configuration. However, since paramserv as an operation is implemented from
> scratch, we can customize as needed and are not restricted by script-level
> semantics, which renders the problem simpler than the general-purpose parfor
> construct. Both have their use cases.
>
> In case this did not clarify your questions, let us know and we'll sort
> it out.
>
> [1] http://www.vldb.org/pvldb/vol7/p553-boehm.pdf, 2014
> [2] http://www.vldb.org/pvldb/vol9/p1425-boehm.pdf, 2016
> [3] http://boss.dima.tu-berlin.de/media/BOSS16-Tutorial-mboehm.pdf, 2016
>
> Regards,
> Matthias
>
> On Thu, Mar 8, 2018 at 10:28 PM, Chamath Abeysinghe <
> abeysinghecham...@gmail.com> wrote:
>
>> Hi,
>> I am trying to understand the purpose and work needed for the different sub
>> projects in SYSTEMML-2083, and I have a few questions.
>>
>> * In the JIRA it was mentioned that we are not integrating an off-the-shelf
>> parameter server, but rather developing language and runtime support from
>> scratch. As far as I understand, this means creating syntax for DML to
>> interact with the parameter server, while the parameter server
>> implementation lives in the different back-ends. So, for example, in the
>> Spark back-end we have to create some kind of parameter server
>> implementation with different strategies, and it should be invoked by the
>> syntax in DML. Is this understanding correct?
>>
>> * In the JIRA there is a sub project for local multi threaded back-end.
>> In this project does "local" mean executing on