Re: Proposal to add 'accuracy test suite' before 1.0 release

2017-02-17 Thread dusenberrymw
There is also the possibility of writing the correctness tests completely in 
DML itself, thus allowing an ML researcher / data scientist to easily create 
the tests. For example, the SystemML-NN library has a full test suite written 
entirely in DML in the `nn/test/` directory (i.e. no Java tests) that tests 
mathematical correctness of gradients, as well as general correctness of 
various layers as needed.
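
For illustration, here is the numerical idea behind such a gradient check,
sketched in numpy rather than DML (the toy affine layer and all names are
hypothetical, not the actual nn/test code): compare the analytic backward
pass against a centered finite-difference approximation.

    import numpy as np

    def affine_forward(X, W, b):
        # Toy affine layer: out = X W + b
        return X @ W + b

    def affine_backward(dout, X, W, b):
        # Analytic gradients of the affine layer w.r.t. X, W, b.
        return dout @ W.T, X.T @ dout, dout.sum(axis=0)

    def grad_check_W(X, W, b, eps=1e-5):
        # Gradient of sum(out) w.r.t. W, checked against centered finite
        # differences; a correct backward pass should agree to within
        # floating-point error.
        dout = np.ones((X.shape[0], W.shape[1]))  # d sum(out) / d out
        _, dW, _ = affine_backward(dout, X, W, b)
        dW_num = np.zeros_like(W)
        for i in range(W.shape[0]):
            for j in range(W.shape[1]):
                Wp, Wm = W.copy(), W.copy()
                Wp[i, j] += eps
                Wm[i, j] -= eps
                dW_num[i, j] = (affine_forward(X, Wp, b).sum()
                                - affine_forward(X, Wm, b).sum()) / (2 * eps)
        rel_err = np.abs(dW - dW_num).max() / (np.abs(dW_num).max() + 1e-12)
        assert rel_err < 1e-7, rel_err

    rng = np.random.default_rng(42)
    grad_check_W(rng.standard_normal((4, 3)),
                 rng.standard_normal((3, 2)),
                 rng.standard_normal(2))

The DML tests under nn/test/ check each layer's forward/backward functions
in this same spirit, directly in the language of the scripts under test.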

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Feb 17, 2017, at 5:46 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
> 
> +1 for creating tests for the main algorithm scripts. This would be a great
> addition to the project.
> 
> Note that the creation of tests (junit) typically requires some Java skills
> (and knowledge of ml algorithms), whereas a new algorithm script typically
> requires R/Python skills. Therefore, testing of algorithms probably
> requires some focused coordination between 'data scientists' and
> 'developers' to happen smoothly for new algorithms.
> 
> Deron
> 
> 
>> On Fri, Feb 17, 2017 at 5:28 PM, <dusenberr...@gmail.com> wrote:
>> 
>> +1 for testing our actual scripts (vs. the simplified test versions)
>> against some metric of choice. This will allow us to (1) ensure that each
>> script does not have a showstopper bug (engine bug), and (2) ensure that
>> each script still produces a reasonable mathematical result (math bug).
>> 
>> -Mike
>> 
>> --
>> 
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>> 
>> Sent from my iPhone.
>> 
>> 
>>> On Feb 17, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
>>> 
>>> For now, I have updated our python mllearn tests to compare the
>>> prediction of our algorithm to that of scikit-learn:
>>> https://github.com/apache/incubator-systemml/blob/master/src/main/python/tests/test_mllearn_numpy.py#L81
>>> 
>>> The test now uses scikit-learn predictions as the baseline and computes
>>> the scores (accuracy score for classifiers and r2 score for regressors).
>>> If the score is greater than 95%, the test passes. Though this approach
>>> does not measure the generalization capability of our algorithms, it at
>>> least ensures that they perform no worse than scikit-learn under default
>>> settings. We can make the testing even more rigorous later. The next step
>>> would be to enable these Python tests through Jenkins.
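
For illustration, a minimal sketch of this baseline-comparison pattern;
here plain scikit-learn stands in on both sides, so the "ours" estimator
and all variable names are hypothetical (in the real test it would be a
SystemML mllearn estimator):

    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

    # Baseline: scikit-learn predictions on the held-out set.
    baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)

    # Stand-in for the SystemML estimator's predictions.
    ours = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)

    # Score our predictions against the baseline predictions (not the
    # true labels) and require > 95% agreement, as described above.
    score = accuracy_score(baseline, ours)
    assert score > 0.95, "disagrees with scikit-learn baseline: %f" % score

A classifier would use accuracy_score as above; a regressor would use
r2_score with the same threshold.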
>>> 
>>> Thanks,
>>> 
>>> Niketan Pansare
>>> IBM Almaden Research Center
>>> E-mail: npansar At us.ibm.com
>>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>>> 
>>> From: Matthias Boehm <mboe...@googlemail.com>
>>> To: dev@systemml.incubator.apache.org
>>> Date: 02/17/2017 11:54 AM
>>> Subject: Re: Proposal to add 'accuracy test suite' before 1.0 release
>>> 
>>>
>>> Yes, this has been discussed a couple of times now, most recently in
>>> SYSTEMML-546. It takes quite some effort, though, to create a
>>> sophisticated algorithm-level test suite as done for GLM. So by all
>>> means, please, go ahead and add these tests.
>>> 
>>> However, I would not impose any constraints on the contribution of new
>>> algorithms in that regard, or similarly on tests with simplified
>>> algorithms, because it would raise the bar too high.
>>> 
>>> Regards,
>>> Matthias
>>> 
>>> 
>>>> On 2/17/2017 10:48 AM, Niketan Pansare wrote:
>>>> 
>>>> 
>>>> Hi all,
>>>> 
>>>> We currently test the correctness of individual runtime operators using
>>>> our integration tests, but not the "released" algorithms. To be fair, we
>>>> do test a subset of "simplified" algorithms on synthetic datasets and
>>>> compare the accuracy with R. Also, we are testing a subset of released
>>>> algorithms using our Python tests, but their intended purpose is only to
>>>> test the integration of the APIs:
>>>> Simplified algorithms:
>>>> https://github.com/apache/incubator-systemml/tree/master/src/test/scripts/applications
>>>> Released algorithms:
>>>> https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
>>>> Python tests:
>>>> https://github.com/apache/incubator-systemml/tree/master/src/main/python/tests
>>>>
>>>> Though each released algorithm is tested when it is initially
>>>> introduced, other artifacts (Spark versions, API changes, engine
>>>> improvements, etc.) could cause it to return incorrect results over a
>>>> period of time. Therefore, similar to our performance test suite
>>>> (https://github.com/apache/incubator-systemml/tree/master/scripts/perftest),
>>>> I propose we create another test suite (an "accuracy test suite", for
>>>> lack of a better term) that compares the accuracy (or some other metric)
>>>> of our released algorithms on standard datasets. We could also make it a
>>>> requirement to add tests to the accuracy test suite when adding new
>>>> algorithms.
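
For illustration, one hypothetical shape such an accuracy test suite could
take: run a released script on a standard dataset and compare the resulting
metric against the value recorded when the test was added. The runner
function, script names, and baseline numbers below are all assumptions, not
existing APIs:

    ACCURACY_BASELINES = {
        # (script, dataset) -> metric recorded when the test was added
        ("MultiLogReg.dml", "digits"): 0.92,
    }

    def check_accuracy(script, dataset, run_algorithm, tol=0.02):
        """run_algorithm(script, dataset) returns accuracy (or r2) on a
        held-out split; fail if it regresses beyond the tolerance."""
        expected = ACCURACY_BASELINES[(script, dataset)]
        actual = run_algorithm(script, dataset)
        assert actual >= expected - tol, (
            "%s on %s: %.3f fell below recorded baseline %.3f (tol %.2f)"
            % (script, dataset, actual, expected, tol))

Run periodically (e.g., through Jenkins, as suggested above), checks of this
kind would catch the slow regressions from Spark upgrades, API changes, or
engine improvements that the one-time tests at algorithm introduction cannot.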