Re: Extending Codegen algorithm tests for heuristics
Hi,

Thanks for the tip. I solved the bug and opened a new PR for SYSTEMML-2169.

Regards,
Chamath

On Wed, Mar 14, 2018 at 7:42 AM, Matthias Boehm <mboe...@gmail.com> wrote:

> -- Forwarded message --
> From: Matthias Boehm <mboe...@gmail.com>
> Date: Tue, Mar 13, 2018 at 1:00 PM
> Subject: Re: Extending Codegen algorithm tests for heuristics
> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>
> without debugging it's hard to tell, but usually something like this
> happens if blocks are incorrectly aligned. So I would recommend to simply
> do a mapToPair before the RDDAggregateUtils.mergeByKey and validate the
> correctness of shifted block indexes. Maybe the newly introduced broadcasts
> are not shifted into their target positions? For example, consider
> cbind(A,B) - before aggregation, B needs to be shifted by ncol(A).
> Furthermore, it would be great to avoid unnecessary aggregation if all but
> one inputs are broadcasts.
>
> Regards,
> Matthias
Fwd: Extending Codegen algorithm tests for heuristics
-- Forwarded message --
From: Matthias Boehm <mboe...@gmail.com>
Date: Tue, Mar 13, 2018 at 1:00 PM
Subject: Re: Extending Codegen algorithm tests for heuristics
To: Chamath Abeysinghe <abeysinghecham...@gmail.com>

without debugging it's hard to tell, but usually something like this
happens if blocks are incorrectly aligned. So I would recommend to simply
do a mapToPair before the RDDAggregateUtils.mergeByKey and validate the
correctness of shifted block indexes. Maybe the newly introduced broadcasts
are not shifted into their target positions? For example, consider
cbind(A,B) - before aggregation, B needs to be shifted by ncol(A).
Furthermore, it would be great to avoid unnecessary aggregation if all but
one inputs are broadcasts.

Regards,
Matthias

On Tue, Mar 13, 2018 at 7:36 AM, Chamath Abeysinghe <abeysinghecham...@gmail.com> wrote:

> Hi Matthias,
>
> I am working on the SYSTEMML-2169 issue. I have sent a partially completed
> PR ( https://github.com/apache/systemml/pull/747 ). After those changes,
> some test cases in NaryRBindTest are failing and I could not understand
> the reason.
>
> Test cases are failing with the following error:
>
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Mismatched block sizes for: 280 101 1000 101
>   at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:622)
>   at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:596)
>
> Even after debugging the whole process I could not find a reason for this.
> If you can give any suggestion that would be really helpful.
>
> If you have any other comment regarding the PR I could modify code
> according to that.
>
> Thanks,
> Chamath
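The alignment check suggested above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and all method names are hypothetical and are not SystemML APIs, block indexes are assumed to be 1-based, and the block size of 1000 and the 1280 x 101 output are guesses chosen to match the numbers in the error message (280 101 1000 101). The idea is that, after shifting, every block that lands on a given block index must have the shape that the output dimensions dictate for that index; a full 1000-row block arriving where a 280-row boundary block is expected would produce exactly this kind of mismatch.

```java
// Hypothetical sketch: none of these helpers are SystemML APIs.
class BlockShiftSketch {

    /** 1-based index of the block that contains the 1-based cell index. */
    public static long blockIndexOf(long cellIndex, int blockSize) {
        return (cellIndex - 1) / blockSize + 1;
    }

    /** Rows (or columns) held by a 1-based block along a dimension of length len. */
    public static int blockLength(long blockIndex, long len, int blockSize) {
        long remaining = len - (blockIndex - 1) * blockSize;
        return (int) Math.min(blockSize, remaining);
    }

    /** Column of cbind(A,B) that receives the 1-based column colInB of B. */
    public static long shiftedColumn(long colInB, long ncolA) {
        return ncolA + colInB;
    }

    /** True if a block at (rowBlock, colBlock) has the shape expected for an rlen x clen output. */
    public static boolean hasExpectedShape(long rowBlock, long colBlock, int rows, int cols,
                                           long rlen, long clen, int blockSize) {
        return rows == blockLength(rowBlock, rlen, blockSize)
            && cols == blockLength(colBlock, clen, blockSize);
    }

    public static void main(String[] args) {
        int blockSize = 1000;          // assumed block size
        long rlen = 1280, clen = 101;  // assumed output dimensions

        // cbind(A,B) with ncol(A) = 101: where does B's first column go?
        long target = shiftedColumn(1, 101);
        System.out.println("target column: " + target
            + ", column block: " + blockIndexOf(target, blockSize));

        // Two candidate blocks for row block 2: only the 280-row one fits.
        System.out.println("280 x 101 ok:  "
            + hasExpectedShape(2, 1, 280, 101, rlen, clen, blockSize));
        System.out.println("1000 x 101 ok: "
            + hasExpectedShape(2, 1, 1000, 101, rlen, clen, blockSize));
    }
}
```

In a Spark job, a check like hasExpectedShape would sit inside the mapToPair step that runs before the merge, so a misplaced block is reported with its index instead of failing later inside the merge function.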
Re: Extending Codegen algorithm tests for heuristics
Hi Matthias,

I am working on the SYSTEMML-2169 issue. I have sent a partially completed
PR ( https://github.com/apache/systemml/pull/747 ). After those changes,
some test cases in NaryRBindTest are failing and I could not understand
the reason.

Test cases are failing with the following error:

Caused by: org.apache.sysml.runtime.DMLRuntimeException: Mismatched block sizes for: 280 101 1000 101
  at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:622)
  at org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:596)

Even after debugging the whole process I could not find a reason for this.
If you can give any suggestion that would be really helpful.

If you have any other comment regarding the PR I could modify code
according to that.

Thanks,
Chamath

On Wed, Mar 7, 2018 at 11:44 AM, Matthias Boehm <mboe...@gmail.com> wrote:

> -- Forwarded message --
> From: Matthias Boehm <mboe...@gmail.com>
> Date: Tue, Mar 6, 2018 at 10:14 PM
> Subject: Re: Extending Codegen algorithm tests for heuristics
> To: Chamath Abeysinghe <abeysinghecham...@gmail.com>
>
> Hi Chamath,
>
> great thanks for your contribution - I left a couple of comments but we
> should be ready to merge this in soon. If you want to get a better feeling
> for the distributed spark backend as well, I created SYSTEMML-2169, which
> aims to extend our recently added nary cbind/rbind operations to leverage
> broadcasts when applicable.
>
> Regarding the proposal, most of the backends are rather independent, but
> each backend depends on the language integration. We will help out where
> necessary. So it depends on your interests and ideas. If you're more
> interested in defining the language APIs, make this and a simple backend
> the core of your proposal. If you're more interested in the runtime
> backends, I would help and add a basic language integration in time, which
> would allow you to immediately start working on the backends.
>
> Following the GSoC guidelines it's usually better to underscope the project
> than overscope it because you want to ensure that you're able to
> successfully complete the project in the ambitious timeframe and there will
> always be unforeseen obstacles. I would recommend to define a core project
> and potential extensions you will address if time allows. For example, the
> local, multi-threaded backend can indeed be realized relatively quickly.
> However, subsequently we can add and experiment with Hogwild! (i.e.,
> unsynchronized updates) which is known to work well for sparse models,
> replication and partitioning in NUMA settings, and potentially the
> automatic selection of update strategies.
>
> Regards,
> Matthias
>
> On Tue, Mar 6, 2018 at 10:24 AM, Chamath Abeysinghe <abeysinghecham...@gmail.com> wrote:
>
> > Hi,
> >
> > I have sent a pull request for this issue.
> > As a next step, could you suggest a new issue, or anything I have to do
> > to familiarize myself with the language and runtime for the parameter
> > servers project?
> >
> > And regarding writing the project proposal, I have a few questions.
> > * The epic has a few subtasks. Is it enough to focus on a single task
> >   throughout the summer? Would it have enough workload, or should I go
> >   for multiple tasks?
> > * What is the linkage between subtasks? Do tasks like the distributed
> >   Spark backend or the local multi-threaded backend need previous tasks
> >   completed before starting work?
> >
> > I would be glad if you could suggest some issues related to the
> > distributed Spark backend or multi-threaded backend tasks.
> >
> > Thanks.
> >
> > Regards,
> > Chamath
> >
> > On Fri, Mar 2, 2018 at 6:46 AM, Matthias Boehm <mboe...@gmail.com> wrote:
> >
> > > Hi Chamath,
> > >
> > > in general, you're absolutely right - you can enable -stats and
> > > programmatically probe the heavy hitter statistics for certain opcodes.
> > > However, uamin and uamax stand for "unary aggregate minimum" and "unary
> > > aggregate maximum" which correspond to min(X) and max(X) on script level.
> > > Instead, all generated fused operators are prefixed with spoof or
> > > sp_spoof (for distributed spark operations). The related junit assertion
> > > should already be in the existing tests, I just mentioned it for
> > > completeness.
> > >
> > > Regards,
> > > Matthias
> > >
> > > On Thu, Mar 1, 2018 at 4:30 AM, Chamath Abeysinghe <abeysinghecham...