Re: Extending Codegen algorithm tests for heuristics

2018-04-03 Thread Chamath Abeysinghe
Hi,
Thanks for the tip. I solved the bug and opened a new PR for SYSTEMML-2169.

Regards,
Chamath

On Wed, Mar 14, 2018 at 7:42 AM, Matthias Boehm <mboe...@gmail.com> wrote:

> -- Forwarded message --
> From: Matthias Boehm <mboe...@gmail.com>
> Date: Tue, Mar 13, 2018 at 1:00 PM
> Subject: Re: Extending Codegen algorithm tests for heuristics
> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>
>
> Without debugging it's hard to tell, but usually something like this
> happens if blocks are incorrectly aligned. So I would recommend simply
> doing a mapToPair before the RDDAggregateUtils.mergeByKey and validating
> the correctness of the shifted block indexes. Maybe the newly introduced
> broadcasts are not shifted into their target positions? For example,
> consider cbind(A,B) - before aggregation, B needs to be shifted by ncol(A).
> Furthermore, it would be great to avoid unnecessary aggregation if all but
> one of the inputs are broadcasts.
>
> Regards,
> Matthias
>
> On Tue, Mar 13, 2018 at 7:36 AM, Chamath Abeysinghe <
> abeysinghecham...@gmail.com> wrote:
>
> > Hi Matthias,
> > I am working on the SYSTEMML-2169 issue. I have sent a partially completed PR
> > ( https://github.com/apache/systemml/pull/747 ). After those changes,
> > some test cases in NaryRBindTest are failing and I could not understand
> > the reason.
> > The test cases are failing with the following error:
> > *Caused by: org.apache.sysml.runtime.DMLRuntimeException: Mismatched block
> > sizes for: 280 101 1000 101*
> > * at
> > org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:622)*
> > * at
> > org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:596)*
> >
> > Even after debugging the whole process I could not find a reason for this.
> > If you can give any suggestions, that would be really helpful.
> >
> > If you have any other comments regarding the PR, I can modify the code
> > accordingly.
> >
> > Thanks,
> > Chamath
> >
> >
> > On Wed, Mar 7, 2018 at 11:44 AM, Matthias Boehm <mboe...@gmail.com> wrote:
> >
> >> -- Forwarded message --
> >> From: Matthias Boehm <mboe...@gmail.com>
> >> Date: Tue, Mar 6, 2018 at 10:14 PM
> >> Subject: Re: Extending Codegen algorithm tests for heuristics
> >> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
> >>
> >>
> >> Hi Chamath,
> >>
> >> great thanks for your contribution - I left a couple of comments but we
> >> should be ready to merge this in soon. If you want to get a better feeling
> >> for the distributed spark backend as well, I created SYSTEMML-2169, which
> >> aims to extend our recently added nary cbind/rbind operations to leverage
> >> broadcasts when applicable.
> >>
> >> Regarding the proposal, most of the backends are rather independent, but
> >> each backend depends on the language integration. We will help out where
> >> necessary. So it depends on your interests and ideas. If you're more
> >> interested in defining the language APIs, make this and a simple backend
> >> the core of your proposal. If you're more interested in the runtime
> >> backends, I would help and add a basic language integration in time, which
> >> would allow you to immediately start working on the backends.
> >>
> >> Following the GSoC guidelines, it's usually better to underscope the project
> >> than to overscope it because you want to ensure that you're able to
> >> successfully complete the project in the ambitious timeframe and there will
> >> always be unforeseen obstacles. I would recommend defining a core project
> >> and potential extensions you will address if time allows. For example, the
> >> local, multi-threaded backend can indeed be realized relatively quickly.
> >> However, subsequently we can add and experiment with Hogwild! (i.e.,
> >> unsynchronized updates), which is known to work well for sparse models,
> >> replication and partitioning in NUMA settings, and potentially the
> >> automatic selection of update strategies.
> >>
> >> Regards,
> >> Matthias
> >>
> >>
> >> On Tue, Mar 6, 2018 at 10:24 AM, Chamath Abeysinghe <
> >> abeysinghecham...@gmail.com> wrote:
> >>
> >> > Hi

Fwd: Extending Codegen algorithm tests for heuristics

2018-03-13 Thread Matthias Boehm
-- Forwarded message --
From: Matthias Boehm <mboe...@gmail.com>
Date: Tue, Mar 13, 2018 at 1:00 PM
Subject: Re: Extending Codegen algorithm tests for heuristics
To: Chamath Abeysinghe <abeysinghecham...@gmail.com>


Without debugging it's hard to tell, but usually something like this
happens if blocks are incorrectly aligned. So I would recommend simply
doing a mapToPair before the RDDAggregateUtils.mergeByKey and validating
the correctness of the shifted block indexes. Maybe the newly introduced
broadcasts are not shifted into their target positions? For example,
consider cbind(A,B) - before aggregation, B needs to be shifted by ncol(A).
Furthermore, it would be great to avoid unnecessary aggregation if all but
one of the inputs are broadcasts.
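
To make the shifting concrete, here is a minimal sketch in plain Java. It is
not SystemML code: the class BlockShiftSketch, its BlockIndex type, and the
helper names are hypothetical, and it assumes 1-based block indexes and that
ncol(A) is a multiple of the block size (an unaligned ncol(A) would require
splitting blocks of B across two output blocks, which is left out here). For
rbind the same idea would apply to the row index, shifted by nrow(A).

```java
// Hypothetical sketch, not SystemML code: all names below are illustrative.
final class BlockShiftSketch {

    /** 1-based (row, column) index of a block in a blocked matrix. */
    static final class BlockIndex {
        final long row;
        final long col;
        BlockIndex(long row, long col) { this.row = row; this.col = col; }
        @Override public String toString() { return "(" + row + "," + col + ")"; }
    }

    /** Number of blocks needed to cover len rows or columns. */
    static long numBlocks(long len, int blockSize) {
        return (len + blockSize - 1) / blockSize;
    }

    /** Target index of a block of B in cbind(A,B); assumes ncol(A) is block-aligned. */
    static BlockIndex shiftForCbind(BlockIndex ix, long ncolA, int blockSize) {
        if (ncolA % blockSize != 0)
            throw new IllegalArgumentException("unaligned ncol(A) requires splitting blocks");
        return new BlockIndex(ix.row, ix.col + ncolA / blockSize);
    }

    /** True if the block index lies inside an output of nrowOut x ncolOut cells. */
    static boolean inBounds(BlockIndex ix, long nrowOut, long ncolOut, int blockSize) {
        return ix.row >= 1 && ix.row <= numBlocks(nrowOut, blockSize)
            && ix.col >= 1 && ix.col <= numBlocks(ncolOut, blockSize);
    }

    public static void main(String[] args) {
        int blockSize = 1000;
        long ncolA = 2000, ncolB = 1500, nrow = 2500;
        // First block of B moves past the column blocks occupied by A.
        BlockIndex shifted = shiftForCbind(new BlockIndex(1, 1), ncolA, blockSize);
        System.out.println(shifted + " in bounds: "
            + inBounds(shifted, nrow, ncolA + ncolB, blockSize));
    }
}
```

A check like inBounds could be applied to every shifted key in the mapToPair
step, so that a wrongly positioned block is reported before the merge rather
than as a size mismatch inside it.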

Regards,
Matthias

On Tue, Mar 13, 2018 at 7:36 AM, Chamath Abeysinghe <
abeysinghecham...@gmail.com> wrote:

> Hi Matthias,
> I am working on the SYSTEMML-2169 issue. I have sent a partially completed PR
> ( https://github.com/apache/systemml/pull/747 ). After those changes,
> some test cases in NaryRBindTest are failing and I could not understand
> the reason.
> The test cases are failing with the following error:
> *Caused by: org.apache.sysml.runtime.DMLRuntimeException: Mismatched block
> sizes for: 280 101 1000 101*
> * at
> org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:622)*
> * at
> org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:596)*
>
> Even after debugging the whole process I could not find a reason for this.
> If you can give any suggestions, that would be really helpful.
>
> If you have any other comments regarding the PR, I can modify the code
> accordingly.
>
> Thanks,
> Chamath
>
>
> On Wed, Mar 7, 2018 at 11:44 AM, Matthias Boehm <mboe...@gmail.com> wrote:
>
>> -- Forwarded message --
>> From: Matthias Boehm <mboe...@gmail.com>
>> Date: Tue, Mar 6, 2018 at 10:14 PM
>> Subject: Re: Extending Codegen algorithm tests for heuristics
>> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>>
>>
>> Hi Chamath,
>>
>> great thanks for your contribution - I left a couple of comments but we
>> should be ready to merge this in soon. If you want to get a better feeling
>> for the distributed spark backend as well, I created SYSTEMML-2169, which
>> aims to extend our recently added nary cbind/rbind operations to leverage
>> broadcasts when applicable.
>>
>> Regarding the proposal, most of the backends are rather independent, but
>> each backend depends on the language integration. We will help out where
>> necessary. So it depends on your interests and ideas. If you're more
>> interested in defining the language APIs, make this and a simple backend
>> the core of your proposal. If you're more interested in the runtime
>> backends, I would help and add a basic language integration in time, which
>> would allow you to immediately start working on the backends.
>>
>> Following the GSoC guidelines, it's usually better to underscope the project
>> than to overscope it because you want to ensure that you're able to
>> successfully complete the project in the ambitious timeframe and there will
>> always be unforeseen obstacles. I would recommend defining a core project
>> and potential extensions you will address if time allows. For example, the
>> local, multi-threaded backend can indeed be realized relatively quickly.
>> However, subsequently we can add and experiment with Hogwild! (i.e.,
>> unsynchronized updates), which is known to work well for sparse models,
>> replication and partitioning in NUMA settings, and potentially the
>> automatic selection of update strategies.
>>
>> Regards,
>> Matthias
>>
>>
>> On Tue, Mar 6, 2018 at 10:24 AM, Chamath Abeysinghe <
>> abeysinghecham...@gmail.com> wrote:
>>
>> > Hi,
>> > I have sent a pull request for this issue.
>> > As a next step, could you suggest a new issue, or anything I should do
>> > to familiarize myself with the language and runtime for the parameter
>> > server project?
>> >
>> > Regarding the project proposal, I have a few questions.
>> > * The epic has a few subtasks. Is it enough to focus on a single
>> > task throughout the summer? Would that be enough work, or should I go
>> > for multiple tasks?
>> > * What is the linkage between subtasks? Do tasks like the Distributed Spark
>> > Back-end or Local multi threaded b

Re: Extending Codegen algorithm tests for heuristics

2018-03-13 Thread Chamath Abeysinghe
Hi Matthias,
I am working on the SYSTEMML-2169 issue. I have sent a partially completed PR (
https://github.com/apache/systemml/pull/747 ). After those changes, some
test cases in NaryRBindTest are failing and I could not understand the
reason.
The test cases are failing with the following error:
*Caused by: org.apache.sysml.runtime.DMLRuntimeException: Mismatched block
sizes for: 280 101 1000 101*
* at
org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:622)*
* at
org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:596)*

Even after debugging the whole process I could not find a reason for this.
If you can give any suggestions, that would be really helpful.

If you have any other comments regarding the PR, I can modify the code
accordingly.

Thanks,
Chamath


On Wed, Mar 7, 2018 at 11:44 AM, Matthias Boehm <mboe...@gmail.com> wrote:

> -- Forwarded message --
> From: Matthias Boehm <mboe...@gmail.com>
> Date: Tue, Mar 6, 2018 at 10:14 PM
> Subject: Re: Extending Codegen algorithm tests for heuristics
> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>
>
> Hi Chamath,
>
> great thanks for your contribution - I left a couple of comments but we
> should be ready to merge this in soon. If you want to get a better feeling
> for the distributed spark backend as well, I created SYSTEMML-2169, which
> aims to extend our recently added nary cbind/rbind operations to leverage
> broadcasts when applicable.
>
> Regarding the proposal, most of the backends are rather independent, but
> each backend depends on the language integration. We will help out where
> necessary. So it depends on your interests and ideas. If you're more
> interested in defining the language APIs, make this and a simple backend
> the core of your proposal. If you're more interested in the runtime
> backends, I would help and add a basic language integration in time, which
> would allow you to immediately start working on the backends.
>
> Following the GSoC guidelines, it's usually better to underscope the project
> than to overscope it because you want to ensure that you're able to
> successfully complete the project in the ambitious timeframe and there will
> always be unforeseen obstacles. I would recommend defining a core project
> and potential extensions you will address if time allows. For example, the
> local, multi-threaded backend can indeed be realized relatively quickly.
> However, subsequently we can add and experiment with Hogwild! (i.e.,
> unsynchronized updates), which is known to work well for sparse models,
> replication and partitioning in NUMA settings, and potentially the
> automatic selection of update strategies.
>
> Regards,
> Matthias
>
>
> On Tue, Mar 6, 2018 at 10:24 AM, Chamath Abeysinghe <
> abeysinghecham...@gmail.com> wrote:
>
> > Hi,
> > I have sent a pull request for this issue.
> > As a next step, could you suggest a new issue, or anything I should do
> > to familiarize myself with the language and runtime for the parameter
> > server project?
> >
> > Regarding the project proposal, I have a few questions.
> > * The epic has a few subtasks. Is it enough to focus on a single
> > task throughout the summer? Would that be enough work, or should I go
> > for multiple tasks?
> > * What is the linkage between subtasks? Do tasks like the Distributed Spark
> > back-end or the local multi-threaded back-end need previous tasks completed
> > before work on them can start?
> >
> > I would be glad if you could suggest some issues related to the distributed
> > Spark back-end or multi-threaded back-end tasks.
> >
> > Thanks.
> > Regards,
> > Chamath
> >
> >
> > On Fri, Mar 2, 2018 at 6:46 AM, Matthias Boehm <mboe...@gmail.com> wrote:
> >
> >> Hi Chamath,
> >>
> >> in general, you're absolutely right - you can enable -stats and
> >> programmatically probe the heavy hitter statistics for certain opcodes.
> >> However, uamin and uamax stand for "unary aggregate minimum" and "unary
> >> aggregate maximum", which correspond to min(X) and max(X) on script level.
> >> Instead, all generated fused operators are prefixed with spoof or sp_spoof
> >> (for distributed spark operations). The related junit assertion should
> >> already be in the existing tests, I just mentioned it for completeness.
> >>
> >> Regards,
> >> Matthias
> >>
> >> On Thu, Mar 1, 2018 at 4:30 AM, Chamath Abeysinghe <
> >> abeysinghecham...