Re: Removal of workaround flags

2017-02-16 Thread dusenberrymw
axpooling_backward 134.260 sec 17200
> -- 9)  +*                    133.959 sec  34400
> -- 10) conv2d_backward_data  128.046 sec  8600
> -- 11) relu_maxpooling       106.499 sec  17362
> -- 12) relu_backward         104.062 sec  34400
> -- 13) uack+                  90.104 sec  34400
> -- 14) r'                     70.932 sec  43000
> -- 15) *                      16.203 sec  95178
> -- 16) rand                   16.131 sec  8613
> -- 17) /                       7.988 sec  86492
> -- 18) rangeReIndex            7.640 sec  17208
> -- 19) sp_csvrblk              2.220 sec  2
> -- 20) +                       2.121 sec  96528
> -- 21) uark+                   2.079 sec  43241
> -- 22) rmvar                   1.580 sec  1451571
> -- 23) rshape                  1.533 sec  17200
> -- 24) write                   1.322 sec  9
> -- 25) createvar               0.976 sec  587259
> -- 26) -                       0.961 sec  86486
> -- 27) exp                     0.659 sec  17281
> -- 28) uasqk+                  0.314 sec  320
> -- 29) *2                      0.312 sec  2
> -- 30) log                     0.200 sec  160
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> 
> From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date: 02/13/2017 04:29 PM
> Subject: Re: Removal of workaround flags
> 
> 
> 
> 
> Well, I used exactly the mnist_lenet scenario discussed in the JIRA, but
> what I've observed are eviction times <2.5% of total execution time, almost
> no sparse intermediates, and the script execution time being dominated by
> conv2d_bias_add. Again, the discrepancy might very well stem from changes
> made since the JIRA was created.
> 
> In any case, I would rather address any existing performance issues than
> globally disabling evictions (which could easily lead to OOMs) or sparse
> matrix formats. Hence, I'd like to remove these workaround flags in order
> to prevent shortcuts that do not apply to all users.
> 
> Regards,
> Matthias
> 
> On Mon, Feb 13, 2017 at 9:19 AM, <dusenberr...@gmail.com> wrote:
> 
> > Thanks for bringing up the topic.  Our deep learning scripts (i.e.
> > algorithms with several intermediate transformations) have shown cache
> > release times to be a major bottleneck, thus leading to the creation of
> > SYSTEMML-1140.  Specifically, what did you use to attempt to reproduce 1140?
> >
> >
> > -Mike
> >
> > --
> >
> > Mike Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> >
> > Sent from my iPhone.
> >
> >
> > > On Feb 12, 2017, at 12:30 AM, Matthias Boehm <mboe...@googlemail.com> wrote:
> > >
> > > SYSTEMML-1140
> >
> 
> 
> 


Re: Removal of workaround flags

2017-02-15 Thread Niketan Pansare
-- 20)  +   2.121 sec   96528
-- 21)  uark+   2.079 sec   43241
-- 22)  rmvar   1.580 sec   1451571
-- 23)  rshape  1.533 sec   17200
-- 24)  write   1.322 sec   9
-- 25)  createvar   0.976 sec   587259
-- 26)  -   0.961 sec   86486
-- 27)  exp 0.659 sec   17281
-- 28)  uasqk+  0.314 sec   320
-- 29)  *2  0.312 sec   2
-- 30)  log 0.200 sec   160
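
For anyone who wants to compare runs, here is a minimal sketch of how the heavy-hitter rows above could be parsed and ranked. The row layout (`-- <rank>) <instruction> <seconds> sec <count>`) is assumed from the lines in this message only, and the names `ROW`, `parse_heavy_hitters`, and `SAMPLE` are hypothetical helpers, not part of SystemML.

```python
import re

# Assumed row layout, inferred from the statistics lines in this thread:
#   -- <rank>) <instruction> <seconds> sec <count>
ROW = re.compile(r"^--\s*(\d+)\)\s+(\S+)\s+([\d.]+)\s+sec\s+(\d+)\s*$")


def parse_heavy_hitters(text):
    """Return one dict per heavy-hitter row found in `text`."""
    rows = []
    for line in text.splitlines():
        # Strip email quote markers ("> ") and surrounding whitespace.
        line = line.lstrip("> ").strip()
        match = ROW.match(line)
        if match:
            rank, op, seconds, count = match.groups()
            rows.append({
                "rank": int(rank),
                "op": op,
                "seconds": float(seconds),
                "count": int(count),
            })
    return rows


# Three rows copied from the table above.
SAMPLE = """
-- 21)  uark+   2.079 sec   43241
> -- 22) rmvar 1.580 sec 1451571
-- 23)  rshape  1.533 sec   17200
"""

rows = parse_heavy_hitters(SAMPLE)
total = sum(r["seconds"] for r in rows)
for r in rows:
    share = 100.0 * r["seconds"] / total
    per_call_ms = 1000.0 * r["seconds"] / r["count"]
    print(f'{r["op"]:<10} {share:5.1f}% of listed time, {per_call_ms:.4f} ms/call')
```

The share is computed only over the rows that were parsed, not over the total execution time, so it is useful for ranking instructions against each other rather than for statements like "eviction is X% of the run".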

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:   Matthias Boehm <mboe...@googlemail.com>
To: dev@systemml.incubator.apache.org
Date:   02/13/2017 04:29 PM
Subject:        Re: Removal of workaround flags



Well, I used exactly the mnist_lenet scenario discussed in the JIRA, but
what I've observed are eviction times <2.5% of total execution time, almost
no sparse intermediates, and the script execution time being dominated by
conv2d_bias_add. Again, the discrepancy might very well stem from changes
made since the JIRA was created.

In any case, I would rather address any existing performance issues than
globally disabling evictions (which could easily lead to OOMs) or sparse
matrix formats. Hence, I'd like to remove these workaround flags in order
to prevent shortcuts that do not apply to all users.

Regards,
Matthias

On Mon, Feb 13, 2017 at 9:19 AM, <dusenberr...@gmail.com> wrote:

> Thanks for bringing up the topic.  Our deep learning scripts (i.e.
> algorithms with several intermediate transformations) have shown cache
> release times to be a major bottleneck, thus leading to the creation of
> SYSTEMML-1140.  Specifically, what did you use to attempt to reproduce 1140?
>
>
> -Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Feb 12, 2017, at 12:30 AM, Matthias Boehm <mboe...@googlemail.com> wrote:
> >
> > SYSTEMML-1140
>




Re: Removal of workaround flags

2017-02-13 Thread Matthias Boehm
Well, I used exactly the mnist_lenet scenario discussed in the JIRA, but
what I've observed are eviction times <2.5% of total execution time, almost
no sparse intermediates, and the script execution time being dominated by
conv2d_bias_add. Again, the discrepancy might very well stem from changes
made since the JIRA was created.

In any case, I would rather address any existing performance issues than
globally disabling evictions (which could easily lead to OOMs) or sparse
matrix formats. Hence, I'd like to remove these workaround flags in order
to prevent shortcuts that do not apply to all users.

Regards,
Matthias

On Mon, Feb 13, 2017 at 9:19 AM, <dusenberr...@gmail.com> wrote:

> Thanks for bringing up the topic.  Our deep learning scripts (i.e.
> algorithms with several intermediate transformations) have shown cache
> release times to be a major bottleneck, thus leading to the creation of
> SYSTEMML-1140.  Specifically, what did you use to attempt to reproduce 1140?
>
>
> -Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Feb 12, 2017, at 12:30 AM, Matthias Boehm wrote:
> >
> > SYSTEMML-1140
>


Re: Removal of workaround flags

2017-02-13 Thread dusenberrymw
Thanks for bringing up the topic.  Our deep learning scripts (i.e. algorithms 
with several intermediate transformations) have shown cache release times to be 
a major bottleneck, thus leading to the creation of SYSTEMML-1140.  
Specifically, what did you use to attempt to reproduce 1140?


-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Feb 12, 2017, at 12:30 AM, Matthias Boehm  wrote:
> 
> SYSTEMML-1140