Fwd: Sub projects in Language and run time for parameter servers [SYSTEMML-2083]

2018-03-17 Thread Matthias Boehm
-- Forwarded message --
From: Matthias Boehm 
Date: Sat, Mar 17, 2018 at 5:41 PM
Subject: Re: Sub projects in Language and run time for parameter servers
[SYSTEMML-2083]
To: Chamath Abeysinghe 


Great to see that you're making progress on your proposal. However, as a
general note to all students, please don't share your personal proposals
here. Instead, submit your proposals through the official channel, the GSoC
website, and I will provide feedback there.

Having said that, if you have questions about SystemML in general or about
specifics of individual components, please don't hesitate to ask them here.

Regards,
Matthias

On Fri, Mar 16, 2018 at 4:17 AM, Chamath Abeysinghe <
abeysinghecham...@gmail.com> wrote:

> Hi Matthias,
> After going through the JIRA sub-projects and the references you provided,
> I thought of drafting a proposal focusing on the Distributed Spark backend
> project because it seems a challenging and exciting area to explore :-).
> I have sketched a rough diagram of the design and the implementation plan
> for the proposal:
> https://drive.google.com/file/d/1MTlYWvkkApe28vDOodDR8hmxzVx9QwQX/view?usp=sharing
>
> My idea is to give the Paramserv runtime a design similar to the ParFor
> runtime and, as an extension, have it handle parameter exchange. I will
> first work on the primitives the runtime needs to manage the PS, and then
> implement a parameter server in Spark. Initially it will use a synchronous
> method; if time allows, I will then experiment with other methods and
> performance factors.
>
> I also have a question regarding the control program. The project JIRA
> mentions that "PS strategies will be selected by the user". Does this also
> include the architecture of the parameter server (# of workers and
> servers), or does that need to be handled in the project?
>
> I hope this plan aligns with the expectations of the community and does
> not conflict with other GSoC candidates. Your feedback on this is highly
> appreciated; if anything is wrong, please correct me. Thanks.
>
> *PS: I am resending this mail because it seems the previous mail with the
> attachment was not delivered to the dev mailing list.*
>
> Regards,
> Chamath
>
>
> On Fri, Mar 9, 2018 at 2:19 PM, Matthias Boehm  wrote:
>
>> Hi Chamath,
>>
>> ad 1: Yes, this is absolutely correct. However, it is important to
>> realize that within the workers we want to run DML functions, and for
>> these we'll reuse our existing compiler, runtime, operations, and data
>> structures.
>>
>> ad 2: Yes, this is also correct. Indeed, we can use an existing parfor
>> (with local execution mode) to emulate a local, synchronous parameter
>> server. However, it would be very hard - and in conflict with our
>> functional and thus stateless execution semantics - to incorporate
>> asynchronous updates and strategies such as Hogwild!. Furthermore, such a
>> local parameter server might also be useful for very large models and
>> batches, because it would enable distributed data-parallel operations
>> spawned from each local worker.
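[Illustration for ad 2] A minimal sketch of how a parallel loop over a pure
function can emulate a local, synchronous parameter server. This is plain
Python, not DML or SystemML code; the names gradient, sync_epoch, and the toy
data are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def gradient(w, batch):
    """Pure function: gradient of the mean squared error of y ~ w * x on one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def sync_epoch(w, partitions, lr=0.05):
    """One synchronous step: every worker reads the same w, and the
    gradients are averaged only after all workers have finished."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        grads = list(pool.map(lambda part: gradient(w, part), partitions))
    return w - lr * sum(grads) / len(grads)

# Toy data for y = 2 * x, split across two workers (an assumption for this sketch).
partitions = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w = 0.0
for _ in range(200):
    w = sync_epoch(w, partitions)
```

Because each iteration only maps a stateless function over the partitions and
then merges the results, it fits a functional parallel-for. An asynchronous
scheme such as Hogwild! would instead need workers to mutate a shared model
while others are still reading it, which this pattern cannot express.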
>>
>> ad 3: Unfortunately, there is no single detailed architecture diagram
>> because the system evolves over time. I would recommend looking at the
>> following two papers, where especially [1] (the parfor paper, with its
>> extensions for Spark in [2]) might give you a better idea of the parameter
>> server and its workers, which are primarily meant to handle the
>> orchestration and efficient parameter updates/exchange. If you're looking
>> for coarse-grained components, then [3], slide 8, might be a starting
>> point. At a high level, each operation and some constructs like parfor
>> have physical operators for CP, SPARK, MR, and some for GPU. Similarly,
>> this project aims to introduce a new paramserv builtin function (most
>> similar to parfor) and its different physical operators.
>>
>> ad 4: Since this paramserv function is similar to parfor, we will be able
>> to reuse key primitives for bringing up local/remote workers and shipping
>> the compiled functions and input data. The major extensions will be to
>> call the shipped functions per batch, get the returned (i.e., updated)
>> parameters, and handle the exchange according to the paramserv
>> configuration. However, since paramserv as an operation is implemented
>> from scratch, we can customize it as needed and are not restricted by
>> script-level semantics, which renders the problem simpler than for the
>> general-purpose parfor construct. Both have their use cases.
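[Illustration for ad 4] A rough sketch of the extensions described above:
calling a shipped update function per batch, taking the returned parameters,
and exchanging them according to a configuration. Everything here is an
assumption made for illustration, including the ParamServer class, the
pull/push methods, and the "batch"/"epoch" exchange setting; none of it is
the actual paramserv design or API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Batch = Sequence[Tuple[float, float]]

@dataclass
class ParamServer:
    """Holds the global model (hypothetical interface)."""
    weight: float
    pushes: int = 0

    def pull(self) -> float:
        return self.weight

    def push(self, delta: float) -> None:
        self.weight += delta
        self.pushes += 1

def sgd_update(w: float, batch: Batch, lr: float = 0.01) -> float:
    """The 'shipped function': takes parameters and a batch, returns updated parameters."""
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad

def run_worker(server: ParamServer, update_fn: Callable[[float, Batch], float],
               batches: Sequence[Batch], exchange: str = "batch") -> None:
    """Call update_fn once per batch; push the accumulated change per batch or per epoch."""
    w_start = w = server.pull()
    for batch in batches:
        w = update_fn(w, batch)
        if exchange == "batch":
            server.push(w - w_start)
            w_start = w = server.pull()
    if exchange == "epoch":
        server.push(w - w_start)

def train(exchange: str, epochs: int = 50) -> ParamServer:
    server = ParamServer(weight=0.0)
    # Toy data for y = 2 * x: two workers with two single-row batches each.
    worker_batches = [
        [[(1.0, 2.0)], [(2.0, 4.0)]],
        [[(3.0, 6.0)], [(4.0, 8.0)]],
    ]
    for _ in range(epochs):
        for batches in worker_batches:  # sequential stand-in for parallel workers
            run_worker(server, sgd_update, batches, exchange)
    return server
```

Calling train("batch") exchanges parameters after every batch, while
train("epoch") exchanges them once per worker and epoch; the update function
itself stays unchanged, which is the point of keeping the exchange policy in
the operator rather than in the script.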
>>
>> In case this did not clarify your questions, let us know and we'll sort
>> it out.
>>
>> [1] http://www.vldb.org/pvldb/vol7/p553-boehm.pdf, 2014
>> [2] http://www.vldb.org/pvldb/vol9/p1425-boehm.pdf, 2016
>> [3] http://boss.dima.tu-berlin.de/media/BOSS16-Tutorial-mboehm.pdf, 2016
>>
>> Regards,
>> Matthias
>>
>> On Thu, 
