Re: [ComDev] High resolution project logos wanted!

2018-08-23 Thread Deron Eriksson
Hi,

I uploaded our black svg logo to
https://svn.apache.org/repos/asf/comdev/project-logos/originals/

BTW In case anyone is looking for additional images, Jeremy Anderson
submitted a collection of Apache SystemML logos and images last year to the
website project at
https://github.com/apache/systemml-website/tree/master/branding_assets .

Deron


On Thu, Aug 23, 2018 at 5:33 AM Matthias Boehm  wrote:

> Could someone with access to our logos please commit them into the
> mentioned repo? @Deron: I remember you once gave me the archive with
> all versions of our logos. Thanks.
>
> Regards,
> Matthias
>
>
> -- Forwarded message --
> From: Daniel Gruno 
> Date: Thu, Aug 23, 2018 at 1:52 PM
> Subject: [ComDev] High resolution project logos wanted!
> To: d...@community.apache.org
>
>
> Hi awesome PMCs!
>
> This is a request for your project logo(s) in the best possible format
> you have. I'm on a crazy mission to gather up all project logos in the
> foundation and put them somewhere central and browseable.
>
> To that end, I need your help to speed up things!
>
> If you have or know of a high resolution or scalable version of your
> project's logo (or logos!), if you could please do the following, that
> would be really helpful:
>
> - Either commit your logo to
> https://svn.apache.org/repos/asf/comdev/project-logos/originals/
> (everyone should be able to write here)
> - Or send it to me at humbed...@apache.org
>
>
> Some guidelines:
>
> - Logos should be either scalable (SVG, EPS, PDF etc) or high
> resolution images, as good as you can get 'em, preferably PNG or high
> quality JPEGs/TIFFs.
> - Please name the file $project.$ext, e.g. cassandra.svg, httpd.png or
> netbeans.eps (use the subdomain/ldap name of your project). If you
> have multiple logos or multiple versions of your one logo, you can
> append '-1', '-2' etc to the name, for example tomee-1.svg,
> tomee-2.svg.
> - Do not include 'incubator-' in your project name, we'll treat
> podlings and projects like equals for this exercise.
> - If you have sub-projects, treat them like regular projects. Thus,
> rat should be rat.svg rather than creadur-rat.svg.
> - If someone else beat you to it, but you have a better version of the
> logo, please add it as a new file, using the same naming convention as
> above (-1, -2 etc.)
>
> For those that submit high res bitmaps, I'll try my best to convert to
> scalable formats where possible.
>
> If all goes well and enough projects collaborate on this, we should be
> able to present a web site for finding and downloading project logos
> from across the foundation.
>
> With warm regards,
> Daniel on behalf of the Community Development Project.
>
> PS: reply-to is set to my personal email, in case of out-of-office
> replies that people wouldn't want public.
>
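The naming guidelines in Daniel's message are mechanical enough to express as a small function. The following is a minimal Java sketch, not a tool provided by the Community Development project: the class and method names are hypothetical, and lower-casing the result is an assumption based on project LDAP/subdomain names being lower case. Sub-projects are handled by the caller passing the sub-project name (e.g. `rat` rather than `creadur-rat`).

```java
import java.util.Locale;

final class LogoNaming {
    private static final String INCUBATOR_PREFIX = "incubator-";

    private LogoNaming() {}

    /**
     * Builds a logo file name following the guidelines above.
     * variant == 0 means no suffix; 1, 2, ... append "-1", "-2", ...
     * Lower-casing is an assumption (LDAP/subdomain names are lower case).
     */
    static String logoFileName(String project, String ext, int variant) {
        if (variant < 0) {
            throw new IllegalArgumentException("variant must be >= 0");
        }
        String name = project.toLowerCase(Locale.ROOT);
        // Podlings and projects are treated alike, so drop any "incubator-" prefix.
        if (name.startsWith(INCUBATOR_PREFIX)) {
            name = name.substring(INCUBATOR_PREFIX.length());
        }
        String suffix = (variant == 0) ? "" : "-" + variant;
        return name + suffix + "." + ext.toLowerCase(Locale.ROOT);
    }
}
```

For example, `logoFileName("tomee", "svg", 2)` yields `tomee-2.svg`, and `logoFileName("systemml", "svg", 0)` yields `systemml.svg`.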


Re: [VOTE] Apache SystemML 1.2.0 (RC1)

2018-08-21 Thread Deron Eriksson
+1

Ran basic tests to verify bin artifacts run and that artifacts can be built
and run using the source artifact.

License files in artifacts appeared to be correct to me.

Note that I believe the lite jar should be removed from
https://dist.apache.org/repos/dist/dev/systemml/1.2.0-rc1/

Deron



On Tue, Aug 21, 2018 at 11:51 AM Anthony Thomas 
wrote:

> +1
>
> I ran the Python test suite on Red Hat Linux under Spark 2.2.0 (Python
> 2.7.5) and encountered no errors.
>
> Regards,
> Anthony
>
> On Tue, Aug 21, 2018 at 7:49 AM Guobao Li  wrote:
>
> > +1
> >
> > As the initiator and a user of the paramserv function, I just launched
> > several tests on my local PC with a script using paramserv without MKL,
> > and no bug was observed.
> >
> > Regards,
> > Guobao
> >
> > On Sun, Aug 19, 2018 at 8:09 PM Matthias Boehm wrote:
> >
> > > +1
> > >
> > > I ran the perftest suite multiple times up to 80GB with and without
> > > codegen. After fixing all the issues and regressions, the entire suite
> > > ran successfully against Spark 2.2 and 2.3 and all use cases showed
> > > equal or better performance compared to SystemML 1.1.
> > >
> > > Regards,
> > > Matthias
> > >
> > > On Fri, Aug 17, 2018 at 8:41 AM, Berthold Reinwald <reinw...@us.ibm.com>
> > > wrote:
> > > > Please vote on releasing the following candidate as Apache SystemML
> > > > version 1.2.0
> > > >
> > > > The vote is open for at least 72 hours and passes if a majority of at
> > > > least 3 +1 PMC votes are cast.
> > > >
> > > > [ ] +1 Release this package as Apache SystemML 1.2.0
> > > > [ ] -1 Do not release this package because ...
> > > >
> > > > To learn more about Apache SystemML, please see
> > > > http://systemml.apache.org/
> > > >
> > > >
> > > > The tag to be voted on is v1.2.0-rc1 (
> > > > a1a05e29f6ee78f3c33fea355f62c78ce21766ee):
> > > > https://github.com/apache/systemml/tree/v1.2.0-rc1
> > > >
> > > >
> > > > The release artifacts can be found at:
> > > > https://dist.apache.org/repos/dist/dev/systemml/1.2.0-rc1/
> > > >
> > > >
> > > > The maven release artifacts, including signatures, digests, etc. can
> > > > be found at:
> > > > https://repository.apache.org/content/repositories/orgapachesystemml-1030/org/apache/systemml/systemml/1.2.0/
> > > >
> > > >
> > > >
> > > > ===
> > > > == Apache Release policy ==
> > > > ===
> > > > http://www.apache.org/legal/release-policy.html
> > > >
> > > >
> > > > ===
> > > > == How can I help test this release? ==
> > > > ===
> > > > If you are a SystemML user, you can help us test this release by
> > > > taking an existing algorithm or workload, running it on this release
> > > > candidate, and then reporting any regressions.
> > > >
> > > > ===
> > > > == What justifies a -1 vote for this release? ==
> > > > ===
> > > > -1 votes should only occur for significant stop-ship bugs or legal-related
> > > > issues (e.g. wrong license, missing header files, etc.). Minor bugs or
> > > > regressions should not block this release.
> > > >
> > > >
> > > >
> > > > Regards,
> > > > Berthold Reinwald
> > > > IBM Almaden Research Center
> > > > office: (408) 927 2208; T/L: 457 2208
> > > > e-mail: reinw...@us.ibm.com
> > > >
> > >
> >
>
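The pass condition in the vote announcement can be written down as a small predicate. This Java sketch reads "a majority of at least 3 +1 PMC votes" as: at least three binding (PMC) +1 votes and more binding +1 than -1 votes, with the 72-hour minimum treated as a precondition. That reading is an assumption based on the ASF release policy linked in the email; the class and method names are hypothetical, non-binding votes are simply not passed in, and a -1 is not treated as a veto.

```java
import java.time.Duration;

final class ReleaseVoteTally {
    private static final Duration MIN_VOTING_PERIOD = Duration.ofHours(72);
    private static final int MIN_BINDING_PLUS_ONES = 3;

    private ReleaseVoteTally() {}

    /**
     * Returns true if a release vote passes under the reading described above:
     * open for at least 72 hours, at least three binding +1 votes,
     * and more binding +1 votes than binding -1 votes.
     */
    static boolean passes(Duration openFor, int bindingPlusOnes, int bindingMinusOnes) {
        boolean longEnough = openFor.compareTo(MIN_VOTING_PERIOD) >= 0;
        boolean enoughSupport = bindingPlusOnes >= MIN_BINDING_PLUS_ONES;
        boolean majority = bindingPlusOnes > bindingMinusOnes;
        return longEnough && enoughSupport && majority;
    }
}
```

Under this reading, three binding +1 votes and no -1 votes pass once 72 hours have elapsed, while two binding +1 votes never pass, however long the vote stays open.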


Re: GSoC Project Presentation Guobao (Parameter Server)

2018-08-09 Thread Deron Eriksson
Great work and presentation, Guobao! Congratulations!

Deron

On Thu, Aug 9, 2018 at 9:52 AM Matthias Boehm  wrote:

> https://hangouts.google.com/call/Wwq1uz89KHlqgkLCkginAAEE
>
> Regards,
> Matthias
>
> On Wed, Aug 8, 2018 at 1:46 PM, Matthias Boehm  wrote:
> > Just as a reminder: tomorrow at 10am PST, Guobao will give us an
> > overview of his work on parameter servers this summer. I'll post the
> > Hangouts link here and in our ASF Slack channel 10 min before the
> > presentation.
> >
> > Regards,
> > Matthias
> >
> > On Thu, Jul 19, 2018 at 12:16 AM, Matthias Boehm wrote:
> >> Hi all,
> >>
> >> please mark your calendars. Guobao will present the results of his
> >> GSoC project on local and distributed parameter servers in SystemML on
> >> Thu, August 9, 10am PST. Everyone interested is welcome to join. We'll
> >> use Google Hangouts for the presentation and demo. The details will be
> >> posted shortly before the presentation.
> >>
> >> Regards,
> >> Matthias
>


Re: Updates to systemml website

2018-06-28 Thread Deron Eriksson
Hi Luciano,

I published the latest version of the website with the update to the
download page.

Deron


On Thu, Jun 28, 2018 at 6:36 AM Luciano Resende 
wrote:

> Could someone please help to publish the latest version of the website. I
> made a minor change to avoid linking to unreleased code from the download
> page (link to git repository).
>
> Thanks
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: Copyright question regarding R4ML/SystemML

2017-10-03 Thread Deron Eriksson
Thank you Henry and Luciano.

Deron

On Tue, Oct 3, 2017 at 12:37 PM, Luciano Resende 
wrote:

> On Mon, Oct 2, 2017 at 11:05 PM, Henry Saputra 
> wrote:
>
> > Hi Deron,
> >
> > Sorry I missed this email. Here is the link to ASF policy about code
> > contribution: http://apache.org/foundation/how-it-works/legal.html
> >
> > For a big code contribution such as R4ML, a Software Grant needs to be
> > submitted to the ASF via http://apache.org/licenses/software-grant.txt
> >
> > Once it is approved, a PR can be sent to merge the change.
> >
> > - Henry
> >
>
>
> Sorry, I missed this as well,
>
> +1 to Henry's comments; basically, a software grant is required.
>
>
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: [PROPOSAL] R4ML Integration with SystemML

2017-09-22 Thread Deron Eriksson
>
> >> So I was thinking: is it an absolute must to keep the APIs in sync?
>
> Soft-yes, we should try our best to do so.
>

There are many benefits, for both SystemML users and developers, to having
the APIs be as consistent as possible. Based on user feedback, I know
Niketan and Glenn did a lot of work recently to make the Python MLContext
API much more consistent with the Java/Scala MLContext API. I think
SystemML users expect that code using one API will behave similarly, with
as few modifications as possible, when migrated to a different language.

As an example of a benefit to SystemML developers, if an R MLContext API is
consistent with the Scala and Python APIs, an R tab can be added to
http://apache.github.io/systemml/spark-mlcontext-programming-guide.html and
most of the MLContext documentation can be reused across the different
languages. This greatly simplifies the creation and maintenance of
documentation, which is very important with a project as large as SystemML.
In addition, consistency across MLContext APIs in different languages
simplifies code maintenance since a developer familiar with the API
features of one language can probably work without too much difficulty on
one of the other language APIs in the project. This would not be the case
if the APIs were significantly divergent.

Deron


Re: [PROPOSAL] R4ML Integration with SystemML

2017-09-22 Thread Deron Eriksson
>
>
> > 1) Would they like to merge R4ML code into the main SystemML project
> > itself? (Currently we have no modules.)
> ALOK: In R we have to follow a prescribed directory structure, and we might
> be able to create more R packages. There will be a subdirectory in SystemML
> called R (or something similar); inside it there will be a subdirectory
> R4ML (one R package), and in the future more R packages as subdirectories
> (more details later).
> > 2) What would they like to merge?
> ALOK: See 1).
> > 3) If so, how do they propose to do so?
> ALOK: Will explain in a future proposal email.
>
> > 4) Who will do the majority of the work to add R4ML code to SystemML? Or
> > who would like to volunteer to do this?
> ALOK: I will do the majority of the work.
> > 5) Who will maintain the contributed code? Or who would like to volunteer
> > to do this?
> ALOK: Alok and Brendan will maintain it.
> > 6) Documentation is needed (fit in SystemML documentation framework).
> ALOK: As Brendan pointed out, R docs are different and we will take care of
> them. They are self-contained.
>
> > 7) Testing is needed (fit into SystemML testing framework).
> ALOK: Testing will usually be done by the maven exec command, which just
> calls
> cd  ; ./bin/install_all
> > 8) How is this packaged?
> ALOK: As a subdirectory.
>
>

I think offering to add R4ML to SystemML and to maintain the codebase is
great. I think that addresses a couple of the main issues (how to get the
code into the project and how to maintain it).

Deron


Re: [DISCUSS] R-Interface to SystemML

2017-09-22 Thread Deron Eriksson
Hi Brendan,

Thank you for the detailed description. At a high level that sounds
feasible. Also, offering to help maintain the R codebase is extremely
helpful. Please let us know if you have any questions so that we can assist
you and Alok in your efforts, since as I said I think an R interface to
SystemML makes a lot of sense.

Deron


On Thu, Sep 21, 2017 at 4:36 PM, Brendan Dwyer <brendan.dw...@ibm.com>
wrote:

> Sorry for not responding sooner. I had some issues with my email client.
>
>
>
> I will do my best to address as many of the points that have been raised
> as I can. Hopefully Alok will be able to jump in as well once he resolves
> his email issues.
>
>
>
> - I would be happy to help maintain R4ML in SystemML and I’m sure Alok
> would too.
>
> - R4ML does allow arbitrary DML script to be executed via the
> `sysml.execute()` function.
>
> - I think we would like to merge the entire R4ML github repository into
> SystemML. We could do this the same way SparkR was merged into Spark (
> https://github.com/apache/spark/tree/master/R)
>
> - Currently the code is not ready to be merged into SystemML because we
> are still on the old ML context. We have a PR in the works that will update
> to the newest ML context. Once that happens we won’t need to duplicate the
> DML scripts.
>
> - Documentation is generated automatically with the R package “roxygen”.
> We would need to discuss how to incorporate this into the SystemML
> documentation. Perhaps we could look to Spark/SparkR for ideas.
>
> - Tests are done using the R testthat package. I can work with Alan to get
> that integrated into the SystemML Jenkins server.
>
>
>
> Matthias Boehm --- Re: [DISCUSS] R-Interface to SystemML ---
> From: "Matthias Boehm" <mboe...@googlemail.com>
> To: dev@systemml.apache.org, deron@apache.org
> Date: Thu, Sep 21, 2017 4:13 PM
> Subject: Re: [DISCUSS] R-Interface to SystemML
>
> I pretty much agree with Niketan and Deron. In general, it would be
> useful to provide an R API as well. However, I'm a bit concerned for two
> reasons:
> * Looking over the github repo, apparently R4ML is not under active
> development/maintenance anymore (last commit Jul 20). So who would be
> willing to maintain and extend it?
> * Providing wrappers for our algorithm scripts would be just a start
> because it hides our core value proposition of custom large-scale ML.
> Hence, we would also need an MLContext equivalent that allows to execute
> arbitrary DML scripts or R functions. Is there already a tentative design
> of such an API and if not, who would like to take it over?
>
> Regards,
> Matthias
>
> On Thu, Sep 21, 2017 at 3:43 PM, Deron Eriksson <deroneriks...@gmail.com>
> wrote:
>
> > I agree with Niketan. An R interface definitely makes sense for SystemML.
> > DML itself is based on R, so it's surprising we have Java/Scala/Python
> > interfaces to SystemML but we don't have an R interface.
> >
> > Perhaps R4ML committers could supply a little more info? For instance:
> > 1) Would they like to merge R4ML code into the main SystemML project
> > itself? (Currently we have no modules.)
> > 2) What would they like to merge?
> > 3) If so, how do they propose to do so?
> > 4) Who will do the majority of the work to add R4ML code to SystemML? Or
> > who would like to volunteer to do this?
> > 5) Who will maintain the contributed code? Or who would like to volunteer
> > to do this?
> > 6) Documentation is needed (fit in SystemML documentation framework).
> > 7) Testing is needed (fit into SystemML testing framework).
> > 8) How is this packaged?
> >
> > From a technology standpoint, I think an R interface totally makes sense.
> > As for a minor criticism (which I apply to other parts of SystemML too), I
> > see script wrappers at https://github.com/SparkTC/r4ml/tree/master/R4ML/R .
> > This tightly binds the existing DML scripts to R, which means DML
> > input/output modifications could potentially require modifications to R
> > code.
> >
> > Deron
> >
> > On Thu, Sep 21, 2017 at 11:00 AM, Niketan Pansare <npan...@us.ibm.com>
> > wrote:
> >
> > > Janardhan: I believe this is the R4ML repo:
> > > https://github.com/SparkTC/r4ml . Arvind: please correct me if I am wrong.

Re: [PROPOSAL] R4ML Integration with SystemML

2017-09-21 Thread Deron Eriksson
>
> * Looking over the github repo, apparently R4ML is not under active
> development/maintenance anymore (last commit Jul 20). So who would be
> willing to maintain and extend it?
>
> ALOK: We will be doing development on it. There are open PRs already.
>
>
No commits since Jul 20 does raise warning flags, as Matthias pointed out.
For some perspective, SystemML has 1013 commits in the last year (~2.78 per
day). No R4ML commits in 2 months is concerning for obvious reasons. It
implies no real work has been done on the project for months.




> * Providing wrappers for our algorithm scripts would be just a start
> because it hides our core value proposition of custom large-scale ML.
> Hence, we would also need an MLContext equivalent that allows to execute
> arbitrary DML scripts or R functions. Is there already a tentative design
> of such an API and if not, who would like to take it over?
>
> ALOK: Currently there is no out-of-the-box MLCtx.
>
>
I believe this also raises some warning flags. Looking over the code at
https://github.com/SparkTC/r4ml/blob/master/R4ML/R/sysml.bridge.R, it looks
like the code in the R4ML master branch utilizes an old API that does not
currently exist in SystemML. As Matthias pointed out, a key value
proposition of SystemML is customizable machine learning, which would
require R4ML to use an API that currently exists in the project.

That said, I believe an R interface to SystemML is extremely valuable
and I think the whole SystemML community would benefit from the R API, and
I hope you will pursue the issue further. It looks like it has been in
development since June (https://github.com/SparkTC/r4ml/pull/50).

Deron


Re: [DISCUSS] R-Interface to SystemML

2017-09-21 Thread Deron Eriksson
I agree with Niketan. An R interface definitely makes sense for SystemML.
DML itself is based on R, so it's surprising we have Java/Scala/Python
interfaces to SystemML but we don't have an R interface.

Perhaps R4ML committers could supply a little more info? For instance:
1) Would they like to merge R4ML code into the main SystemML project
itself? (Currently we have no modules.)
2) What would they like to merge?
3) If so, how do they propose to do so?
4) Who will do the majority of the work to add R4ML code to SystemML? Or
who would like to volunteer to do this?
5) Who will maintain the contributed code? Or who would like to volunteer
to do this?
6) Documentation is needed (fit in SystemML documentation framework).
7) Testing is needed (fit into SystemML testing framework).
8) How is this packaged?

From a technology standpoint, I think an R interface totally makes sense.
As for a minor criticism (which I apply to other parts of SystemML too), I
see script wrappers at https://github.com/SparkTC/r4ml/tree/master/R4ML/R.
This tightly binds the existing DML scripts to R, which means DML
input/output modifications could potentially require modifications to R
code.

Deron



On Thu, Sep 21, 2017 at 11:00 AM, Niketan Pansare 
wrote:

> Janardhan: I believe this is the R4ML repo: https://github.com/SparkTC/r4ml .
> Arvind: please correct me if I am wrong.
>
> Overall, having an R interface for SystemML is an awesome idea. Since I am
> not an R4ML expert, maybe R4ML committers can comment on how they envision
> "two code streams to work together".
>
> Also, comparing the features of R4ML with that of our Python APIs will be
> useful as it might make a stronger case for R4ML.
>
> As an FYI, here are different ways Python users can use SystemML:
> - Using MLContext to invoke DML scripts
> (http://apache.github.io/systemml/beginners-guide-python#invoking-dmlpydml-scripts-using-mlcontext
> and http://apache.github.io/systemml/spark-mlcontext-programming-guide.html)
> - Python algorithm wrappers
> (http://apache.github.io/systemml/beginners-guide-python#invoke-systemmls-algorithms)
> - (not important for R4ML discussion): Python DSL
> (http://apache.github.io/systemml/beginners-guide-python#matrix-operations)
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
>
> From: Janardhan 
> To: Arvind Surve , "dev@systemml.apache.org" <
> dev@systemml.apache.org>
> Date: 09/21/2017 04:44 AM
> Subject: Re: [DISCUSS] R-Interface to SystemML
> --
>
>
>
> Hi Arvind,
>
> This is a great idea. One question: does R4ML generate a plan like
> SystemML does with `DML`, or would we leverage this feature by providing
> some interface? And by community effort, do you mean collaborative
> algorithm implementation?
>
> Is this the Spark-R repo (https://github.com/rstudio/sparklyr)?
>
> Thanks,
> Janardhan
>
>
> >  Original Message 
> > Subject: [DISCUSS] R-Interface to SystemML
> > Local Time: September 20, 2017 12:50 PM
> > UTC Time: September 20, 2017 4:50 PM
> > From: ac...@yahoo.com.INVALID
> > To: dev@systemml.apache.org 
> >
> > Hi,
> > R4ML is an open source project which provides an R interface to
> > SystemML. It's a bridge between SystemML and SparkR.
> > Let's discuss here if and how we can get the two code streams to work
> > together to benefit development/community effort.
> >
> > Arvind Surve | Spark Technology Center | http://www.spark.tc/
>
>
>


Re: SystemML 0.15 Release Candidate build (Not 1.0 release)

2017-09-01 Thread Deron Eriksson
+1 Arvind, thank you for getting things moving with a 0.15 release. It's
very important with regard to the perceived health of the project. For my
next project status report to the Apache Board of Directors, I look forward
to reporting that we are about to publish a 0.15 release.

Deron


On Fri, Sep 1, 2017 at 1:35 AM, Arvind Surve 
wrote:

> Hi,
> As we could not complete some of the key features targeted for SystemML
> 1.0, we will go next with release 0.15 instead of release 1.0.
> If you have any critical fixes to be added for master repository (now
> targeted for release 0.15) please let me know by EOD 09/01/2017 PST.
> I will plan on starting SystemML 0.15 Release Candidate build by Tuesday
> 09/05/2017 morning.
> Thanks
> Arvind
> Arvind Surve | Spark Technology Center | http://www.spark.tc/


Re: Gentle ping for help on all my PRs. Thanks.

2017-08-23 Thread Deron Eriksson
Hi Janardhan,

Thank you for working on #2 [SYSTEMML-1645].

It looks like there are around 30 algorithms in the scripts directory. To
verify that each of these algorithms works with MLContext in a reproducible
form through the creation of MLContext tests for each algorithm is an
enormous undertaking for one person and is very difficult to handle in a
single pull request.

I would recommend:
1) Rather than working on all algorithms at once, focus on algorithms
one at a time.
2) If a JIRA to create an MLContext test for this algorithm does not exist
under [SYSTEMML-1645], create it.
3) Start by selecting shorter algorithm scripts (perhaps under 300 lines),
such as linear regression. GLM is very difficult (~1200 lines).
4) Create a test class with test cases for this algorithm (this can be
similar to MLContextUnivariateStatisticsTest).
5) Make sure this test class compiles locally.
6) Verify that the test cases run locally. You should be able to do this
in your IDE or using maven.
7) Once the test cases work locally, commit your changes and create a pull
request for this single test class.
(Optionally, you could also label the pull request with [WIP] and ask for
feedback if you are stuck. The mailing list is probably an even better
place to ask for feedback.)
8) Assuming the test suite passes, members of the community may give some
additional feedback, which can be incorporated into the pull request.
9) At this stage, the pull request for the single algorithm can be merged.

This can be repeated for each algorithm (or group of closely related
algorithms). This may mean the creation of 20 pull requests rather than 1
enormous pull request.

Additionally, note that we have certain JIRAs such as SYSTEMML-1646. This
JIRA says that LinearRegCG.dml and LinearRegDS.dml work with MLContext.
However, no MLContext test class exists for this JIRA, so this result can't
be automatically reproduced anywhere. It would be very beneficial to have an
MLContextLinRegTest (such as the one you created in PR 589). So you might
want to have SYSTEMML-1646 either reopened and assigned to you, or you could
create another JIRA issue for your MLContextLinRegTest class. Then, create a
branch for this MLContextLinRegTest, make your commits, and do a pull
request for this single test class. Once this pull request has been
accepted and merged, select another algorithm and create a corresponding
test class.

So, my advice would be to close PR 589. You can use the identical work you
did there as the basis for other pull requests, but divide the work into 1
JIRA issue/1 branch/1 pull request for each algorithm (or closely related
algorithms like LinearRegCG and LinearRegDS). I recommend focusing on only
one algorithm at a time. I think MLContextLinRegTest is probably a good
place to start.

Thanks for all the hard work!

Deron




On Wed, Aug 23, 2017 at 9:30 AM, Janardhan Pulivarthi <
janardhan.pulivar...@gmail.com> wrote:

> Dear committers,
>
> I am feeling that my contributions are not in an ordered way. So, I am
> listing them here. And, also listed the help required from volunteers.
>
> 1. [SYSTEMML-1437] - factorization machines
>  [*in progress*]
>  - till now, I have implemented the `*fm.dml*` core module.
>  - *Help: *I am unclear as to how the `i/o` will be for the example
> implementations, such as regression. A sample script for this, might help
> me complete all the examples *regression*, *classification*, & *ranking*.
>
>
> 2. [SYSTEMML-1645] - Verify whether all scripts work with MLContext &
> automate  [*in progress*]
> - This PR tries to write the test scripts for all top level algorithms
> for the new MLContext.
> - I am working with *Jerome* on this. Once he verifies all the scripts,
> I will add the tests for them.
> - *Help: *Can anybody help review this PR, and suggest what is missing
> in this PR. I am getting script execution failures.
>
> 3. [SYSTEMML-1444] - UDFs w/ single output in expressions
>  [*in progress*]
>- The objective is to make udf's callable from expressions. I've gone
> through all the Hop, Lop implementations, compiler, parser, api to have a
> clear picture.
>- I am still making my way through this.
>- *Help: *Hi Matthias, I tried implementing a lop
> *FunctionCallCPSingle.java
>  1fb71e441518b2859963b386b1869711>
>  *
>
> 4. [SYSTEMML-1216] - implement local svd( ) function
>  [*done*]
>- With previously implemented local svd(), I've added little
> improvements and tests.
>- *Help: *This is ready to be merged.  (I believe)
>
> 5. [SYSTEMML-1214] Implement scalable version of singular value
> decomposition  [*issue with
> the testing?*]
> - This PR depends on the above PR, this implements 

Re: Merging sequences of last-level statement blocks

2017-08-08 Thread Deron Eriksson
Could we add a "cut()" or "cutGraph()" function to DML for debugging that
would cut the graph in a similar fashion as "if(1==1){}" and
"while(false){}"? It might be a little more straightforward and explicit
for users.

Deron

On Sun, Aug 6, 2017 at 3:42 PM, Matthias Boehm 
wrote:

> Hi all,
>
> we see a lot of scripts where conditional statement blocks split DAGs of
> operations. After constant folding of if predicates, unnecessary branches
> are already removed (which is important for size propagation) but we don't
> merge sequences of statement blocks yet. Consider the following example:
>
> <block 1>
> if (intercept == 2) {
>
> }
> <block 2>
>
> If the script is invoked with intercept=0 or 1, the entire if block is
> removed and we end up with a sequence of block 1 and block 2. This cut
> unnecessarily hides optimization opportunities. I intend to add a rewrite
> that merges such sequences under certain conditions.
>
> Note that this renders the current debugging approach of explicit cuts via
> "if(1==1){}" ineffective because we will anyway merge the resulting blocks.
> You can use while(false) {} instead in the future.
>
> Regards,
> Matthias
>
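The proposed rewrite can be illustrated on a toy program representation. This Java sketch is not SystemML's implementation: the `Block`, `Basic`, `If` and `Loop` types are hypothetical, each last-level block is reduced to a list of operation names, and adjacent last-level blocks are merged unconditionally (the actual rewrite is described as applying only under certain conditions). Loops are never removed here, which models why `while(false){}` would still act as an explicit cut.

```java
import java.util.ArrayList;
import java.util.List;

final class StatementBlockMerge {
    // Hypothetical, heavily simplified program model: a flat list of blocks.
    interface Block {}

    // A last-level block; its DAG of operations is reduced to a list of op names.
    record Basic(List<String> ops) implements Block {}

    // An if block without else; constPredicate is TRUE/FALSE once folded, null if unknown.
    record If(Boolean constPredicate, List<Block> body) implements Block {}

    // A loop such as while(false){}; never removed here, so it still acts as a cut.
    record Loop(List<Block> body) implements Block {}

    private StatementBlockMerge() {}

    static List<Block> rewrite(List<Block> blocks) {
        // Step 1: drop or inline if blocks whose predicate was constant-folded.
        List<Block> folded = new ArrayList<>();
        for (Block b : blocks) {
            if (b instanceof If f && f.constPredicate() != null) {
                if (f.constPredicate()) {
                    folded.addAll(rewrite(f.body())); // always taken: inline the branch
                }
                // never taken (and no else branch): the block disappears
            } else {
                folded.add(b);
            }
        }
        // Step 2: merge each run of adjacent last-level blocks into one block.
        List<Block> merged = new ArrayList<>();
        for (Block b : folded) {
            int last = merged.size() - 1;
            if (b instanceof Basic cur && last >= 0 && merged.get(last) instanceof Basic prev) {
                List<String> ops = new ArrayList<>(prev.ops());
                ops.addAll(cur.ops());
                merged.set(last, new Basic(ops));
            } else {
                merged.add(b);
            }
        }
        return merged;
    }
}
```

With `intercept=0`, the example script corresponds to block 1, an `If` whose predicate folded to `false`, and block 2; `rewrite` returns a single merged block. Placing a `Loop` between the two blocks keeps all three blocks separate.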


Auto format DML?

2017-08-03 Thread Deron Eriksson
Hi,

I experimented with Atom's Beautify to see if I could automatically format
DML using their R format.

Since DML is quite similar to R, I found that making a few minor tweaks to
DML made it possible to run DML scripts through the R formatter.

DML vs R syntax differences (there may be more):
1) DML has input parameters preceded by $
2) DML has parfor
3) DML for/parfor can take unrecognized parameters such as "check=0"
4) DML functions can return multiple values
5) DML function signatures are different from R
6) certain # comments at the end of lines seem to produce
%InLiNe_IdEnTiFiEr% errors using Beautify

Does anyone happen to know if there's a way that DML could be automatically
formatted (using any tool) without needing to tweak the DML before/after
formatting due to the syntax differences?

BTW I took a look at this because currently our DML format is only defined
as having 2-space indentations (see
http://apache.github.io/systemml/contributing-to-systemml#dml-code-format),
and it would be nice if we had a standard format that could automatically
be applied to DML using a tool.
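For the rule that is currently defined (2-space indentation), even a naive re-indenter goes a long way. The Java sketch below tracks nesting by counting `{` and `}`; it is an illustration, not an existing SystemML tool, and it assumes braces never appear inside strings or comments, which a real formatter would have to handle by tokenizing the DML.

```java
import java.util.ArrayList;
import java.util.List;

final class DmlIndenter {
    private static final String INDENT = "  "; // two spaces, per the DML code format

    private DmlIndenter() {}

    /**
     * Naive re-indenter: nesting depth is tracked by counting '{' and '}'.
     * Braces inside strings or comments are NOT handled (simplifying assumption).
     */
    static String reindent(String dml) {
        List<String> out = new ArrayList<>();
        int depth = 0;
        for (String raw : dml.split("\n", -1)) {
            String line = raw.strip();
            // A line that starts by closing a block ("}" or "} else {") sits one level out.
            int lineDepth = line.startsWith("}") ? Math.max(depth - 1, 0) : depth;
            out.add(line.isEmpty() ? "" : INDENT.repeat(lineDepth) + line);
            for (char ch : line.toCharArray()) {
                if (ch == '{') {
                    depth++;
                } else if (ch == '}') {
                    depth = Math.max(depth - 1, 0);
                }
            }
        }
        return String.join("\n", out);
    }
}
```

A line such as `} else {` is printed one level out and then re-opens the block, so if/else chains keep their shape.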

Deron


Re: Performance Test: Phase-2 Release

2017-08-02 Thread Deron Eriksson
Hi Krishna,

First off, congratulations on your great GSOC contributions to SystemML!

WRT Phase 3 (c), Mike has created an epic (
https://issues.apache.org/jira/browse/SYSTEMML-493) to address
functionalizing the algorithms. Feel free to choose a DML algorithm (or
algorithms) and add it to the epic. Gus is currently working on this for
Kmeans (it's listed as an issue in the epic).

Also, please note that Imran has some good recommendations in
https://issues.apache.org/jira/browse/SYSTEMML-1667. We may want to create
a new version of the scripts using the folder structure recommended by
Imran.

Thanks!
Deron


On Wed, Aug 2, 2017 at 7:27 AM, Krishna Kalyan 
wrote:

> Dear SystemML Community,
> Thank you so much for your support and guidance so far.
>
> Our Phase-2 release includes:
> - HDFS support
> - Support to upload perf-test results automatically to Google Docs +
> automatically summarise data from Google Docs between two versions.
> - Support for additional arguments in our perf test suite like debug, stats,
> config, master etc.
>
> I would really appreciate it if the community could use / test our
> performance test suite and share their valuable feedback. (Please refer to
> the documentation below to get started testing our performance test suite.
> Also, if something is unclear please let me know.)
>
> Documentation
> https://github.com/apache/systemml/blob/master/docs/python-performance-test.md
>
> About Google Docs:
> We need an API client key to upload our performance test results to Google
> Docs. (Currently I am using my personal email account and personal API
> client key.) I was wondering whether a separate Gmail account should be
> created for the SystemML community, so that access to the Google Docs and
> the API client key can be shared with committers. (Results from my machine
> can be found at the URL at the end of this mail.)
>
> Phase 3:
> We have another 4 weeks before GSoC ends, and these are some of the tasks I
> will be working on:
> a) Offline CSV support in addition to Google Docs
> b) Plots comparing algorithm performance across different releases
> c) I would also be interested in refactoring / improving existing DML
> scripts. (@Deron could you please let me know how I could begin
> contributing here?)
> d) I would also like to know about other features the community would like
> us to build / add in our performance test suite. (We will definitely do our
> best to work on it.)
>
> Again, thank you so much for your time and effort.
>
> Cheers,
> Krishna
>
>
> Results obtained on my local system can be found below:
> https://docs.google.com/spreadsheets/d/1sIAPKGFph_X64ZiKMNc16IS5XiDcM23evelzBF3uwgM/edit#gid=0
>
> My System Config
> 16 GB RAM
> OSX
> 2.5 GHz, Intel i5
>
> [Phase1 PR]
> https://github.com/apache/systemml/pull/537
>
> [Phase2 PR]
> https://github.com/apache/systemml/pull/575
>
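As a rough illustration of Phase 3 item (a), offline CSV support, here is a minimal Python sketch that averages repeated runs per algorithm and compares two versions. The column names `algorithm` and `time_sec` and the functions `load_timings` and `compare_versions` are hypothetical; they are not the performance test suite's actual output format or API.

```python
import csv
import io
from statistics import mean


def load_timings(csv_text):
    """Return {algorithm: mean runtime in seconds} from CSV text.

    Assumes hypothetical columns 'algorithm' and 'time_sec'; repeated runs
    of the same algorithm are averaged.
    """
    runs = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        runs.setdefault(row["algorithm"], []).append(float(row["time_sec"]))
    return {alg: mean(times) for alg, times in runs.items()}


def compare_versions(old_csv, new_csv):
    """Return {algorithm: (old_mean, new_mean, speedup)} for shared algorithms.

    A speedup greater than 1 means the new version is faster.
    """
    old, new = load_timings(old_csv), load_timings(new_csv)
    return {alg: (old[alg], new[alg], old[alg] / new[alg])
            for alg in sorted(old.keys() & new.keys())}
```

To compare result files on disk, read each file's text (for example with `open(path).read()`) and pass the two strings to `compare_versions`. Algorithms present in only one version are skipped rather than reported.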


Re: Matrix non-range indexing should return a scalar

2017-07-28 Thread Deron Eriksson
Thank you Mike for bringing this up. To me, this definitely makes sense at
the user (DML) level.

For a Java-style pseudocode example, currently we require the user to do
the following:
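  int[][] m = new int[][]{{1, 2}, {3, 4}};
  int[][] n = m[0][0];  // pseudocode: a cell index yields a 1x1 matrix
  int x = (int) n;      // the user must then convert it (as.scalar in DML)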

I feel the following would be more 'natural':
  int[][] m = new int[][]{{1, 2}, {3, 4}};
  int x = m[0][0];      // a cell index yields the value directly

If a user asks for a specific cell (and not a range) in DML code, I think
the user clearly wants a value and not a matrix that the user needs to cast
via as.scalar.
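To make the proposed semantics concrete, here is a minimal Python sketch. The `Matrix` class is hypothetical (it is not part of SystemML or its Python APIs). It mimics DML's 1-based indices and inclusive ranges, returning a plain value for cell indexing and a matrix for any range, including the trivial 1x1 range `X[1:1, 2:2]`. NumPy behaves the same way with 0-based indices: `A[0, 1]` gives a scalar, while `A[0:1, 1:2]` gives a 1x1 array.

```python
class Matrix:
    """Hypothetical toy matrix illustrating the proposed indexing semantics.

    Indices are 1-based and ranges are inclusive, as in DML, so X[1:1, 2:2]
    selects the cell in row 1, column 2 as a 1x1 range.
    """

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @property
    def shape(self):
        return (len(self.rows), len(self.rows[0]) if self.rows else 0)

    @staticmethod
    def _positions(key, n):
        # Convert a 1-based index or inclusive slice into 0-based positions.
        if isinstance(key, slice):
            start = 1 if key.start is None else key.start
            stop = n if key.stop is None else key.stop
            return range(start - 1, stop)
        return range(key - 1, key)

    def __getitem__(self, key):
        i, j = key
        if not isinstance(i, slice) and not isinstance(j, slice):
            # Non-range (cell) indexing: return the scalar value directly.
            return self.rows[i - 1][j - 1]
        # Any range indexing, even a trivial 1x1 range, returns a Matrix.
        nrow, ncol = self.shape
        return Matrix([[self.rows[r][c] for c in self._positions(j, ncol)]
                       for r in self._positions(i, nrow)])


X = Matrix([[1, 2], [3, 4]])
print(X[1, 2])              # prints 2 (a scalar, no as.scalar needed)
print(X[1:1, 2:2].shape)    # prints (1, 1) (still a matrix)
```

Keeping the decision purely syntactic (is any index a range?) means the result type is known from the code alone, without inspecting the size of the selected region.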

Deron



On Fri, Jul 28, 2017 at 3:41 PM,  wrote:

> Currently, non-range matrix indexing, such as `X[1,2]`, returns a 1x1
> matrix in SystemML rather than a single scalar value.  This is inconsistent
> with mathematical semantics, and with array indexing semantics of any major
> language, thus leading to confusion for users.
>
> I would like to propose that non-range indexing at the language level,
> such as `X[1,2]`, should return a single scalar value, and range indexing
> of any kind at the language level, including the trivial example
> `X[1:1,2:2]`, should return a matrix.  This would lead to clear semantics
> that are consistent with mathematics and language array indexing, thus
> preventing user confusion.  Additionally, these are the semantics that the
> NumPy project uses.
>
> Interested to hear thoughts from the rest of the community!
>
> -Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>


Re: Mirror download links

2017-06-24 Thread Deron Eriksson
The Apache mirror issue has been fixed.

Deron


On Fri, Jun 23, 2017 at 2:45 PM, Deron Eriksson <deroneriks...@gmail.com>
wrote:

> I have updated the links on the downloads page (
> http://systemml.apache.org/download) to link directly to dist.apache.org
> so that SystemML users can download SystemML Apache releases until the
> Apache mirrors issue (https://issues.apache.org/jira/browse/INFRA-14250)
> is resolved.
>
> Deron
>
>
> On Tue, Jun 20, 2017 at 1:01 PM, Deron Eriksson <deroneriks...@gmail.com>
> wrote:
>
>> Please note that the Apache mirror download links on the project website (
>> http://systemml.apache.org/download) currently do not work (this relates
>> to our move to an Apache TLP).
>>
>> The project artifacts are available at:
>> https://dist.apache.org/repos/dist/release/systemml/0.14.0-incubating/
>>
>> This is being addressed (see comments) by:
>> https://issues.apache.org/jira/browse/INFRA-14250
>>
>> Deron
>>
>> --
>> Deron Eriksson
>> Spark Technology Center
>> http://www.spark.tc/
>>
>>
>


Mirror download links

2017-06-20 Thread Deron Eriksson
Please note that the Apache mirror download links on the project website (
http://systemml.apache.org/download) currently do not work (this relates to
our move to an Apache TLP).

The project artifacts are available at:
https://dist.apache.org/repos/dist/release/systemml/0.14.0-incubating/

This is being addressed (see comments) by:
https://issues.apache.org/jira/browse/INFRA-14250

Deron



Re: Rework inter-procedural analysis

2017-06-15 Thread Deron Eriksson
Documentation is an essential part of the software development process,
especially when working on a complex system in a collaborative environment
where we want to encourage community growth.



On Wed, Jun 14, 2017 at 10:11 PM, Nakul Jindal <naku...@gmail.com> wrote:

> Thank you Matthias for agreeing to do this!
>
> "Having a very verbose doc quickly gets outdated" is a problem many
> projects deal with. We can have the community comment on PRs that change
> those parts if the documentation does not reflect the submitted change.
> As a starting point, since you are most familiar with the component, very
> verbose documentation is VERY welcome :)
> Especially for a complicated component like this one, it would greatly help
> existing and new members (unless someone on the mailing list feels
> otherwise).
>
> -Nakul
>
>
>
>
> On Wed, Jun 14, 2017 at 9:04 PM, Matthias Boehm <mboe...@googlemail.com>
> wrote:
>
> > Sure - I'll try to add some documentation of IPA, probably inlined
> > directly into the code. Unfortunately, overly verbose dev documentation
> > quickly gets outdated because nobody updates it - let's see if we can find
> > the sweet spot that works for the project.
> >
> > Regards,
> > Matthias
> >
> >
> > On Wed, Jun 14, 2017 at 4:15 PM, <dusenberr...@gmail.com> wrote:
> >
> > > Agreed.  More documentation, especially within the optimizer portion of
> > > the engine, is quite useful.  Given that a large number of our bugs and
> > > performance issues stem from this area, it would be good for it to be
> > > clean and well documented so that future bug searches/fixes can be
> > > completed in a more expedient manner.
> > >
> > >
> > >
> > > > On Jun 14, 2017, at 8:51 AM, Nakul Jindal <naku...@gmail.com> wrote:
> > > >
> > > > Hi Matthias,
> > > >
> > > > If it's not too much trouble, could you please create a design
> > > > document for this change?
> > > > This will help the rest of the contributors work on this component as
> > > > well.
> > > >
> > > > Thanks,
> > > > Nakul
> > > >
> > > >
> > > > On Wed, Jun 14, 2017 at 12:00 AM, Matthias Boehm <mboe...@googlemail.com>
> > > > wrote:
> > > >
> > > >> Just a quick heads up: in the next couple of days, I'll rework our
> > > >> existing inter-procedural analysis (IPA) in order to (1) create
> > > >> well-defined IPA passes, (2) reuse function call graphs across
> > > >> multiple rounds of IPA, and (3) introduce new IPA passes such as
> > > >> fine-grained literal propagation and replacements, as well as inlining
> > > >> of functions with control structures. This will help improve the
> > > >> performance and debugging of scripts with complex function call
> > > >> patterns. However, since this is a rather disruptive change, we might
> > > >> temporarily experience some compiler issues - if that happens, please
> > > >> file anything you encounter against SYSTEMML-1668.
> > > >>
> > > >> Regards,
> > > >> Matthias
> > > >>
> > >
> >
>





Git/GitHub project migration steps for existing contributors

2017-06-08 Thread Deron Eriksson
Hi,

This morning Apache Infrastructure migrated us from "incubator-systemml" to
"systemml" on Git/GitHub (see
https://issues.apache.org/jira/browse/INFRA-14214).

If you are a SystemML contributor who has previously forked the project,
you probably want to do something such as the following:

1) Update git remotes
git remote -v
git remote set-url upstream https://github.com/apache/systemml.git
git remote set-url origin https://github.com/YOUR_GITHUB_ID/systemml.git
git remote -v

2) Update GitHub fork repository name
Go to the fork settings (
https://github.com/YOUR_GITHUB_ID/incubator-systemml/settings) and rename the
repository from 'incubator-systemml' to 'systemml'.

3) Remove "(Incubating)" from title
At fork main page (https://github.com/YOUR_GITHUB_ID/systemml), click edit
button next to title and remove "(Incubating)"
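Step 1 can also be scripted. The Python sketch below is a hypothetical helper, not an official migration tool: it rewrites any remote URL whose repository name is `incubator-systemml` (HTTPS or SSH form) to `systemml`, using only `git remote`, `git remote get-url` (available in reasonably recent Git versions), and `git remote set-url`. Steps 2 and 3 still have to be done on GitHub; run the helper after renaming your fork in step 2 so that the rewritten `origin` URL points to an existing repository.

```python
import re
import subprocess

# Matches a remote URL whose repository name is incubator-systemml, in HTTPS
# form (https://github.com/apache/incubator-systemml.git) or SSH form
# (git@github.com:YOUR_GITHUB_ID/incubator-systemml.git).
_OLD_REPO = re.compile(r"([/:])incubator-systemml(\.git)?$")


def migrated_url(url):
    """Return url with the repository renamed from incubator-systemml to systemml.

    URLs that do not point at an incubator-systemml repository are unchanged.
    """
    return _OLD_REPO.sub(r"\1systemml\2", url.strip())


def update_remotes(dry_run=True):
    """Rewrite every remote of the current clone that still uses the old name.

    Must be run from inside the clone. With dry_run=True (the default), the
    planned (name, old_url, new_url) changes are only returned, not applied.
    """
    names = subprocess.run(["git", "remote"], capture_output=True,
                           text=True, check=True).stdout.split()
    changes = []
    for name in names:
        old = subprocess.run(["git", "remote", "get-url", name],
                             capture_output=True, text=True,
                             check=True).stdout.strip()
        new = migrated_url(old)
        if new != old:
            changes.append((name, old, new))
            if not dry_run:
                # Equivalent to: git remote set-url <name> <new>
                subprocess.run(["git", "remote", "set-url", name, new],
                               check=True)
    return changes
```

From inside your clone, `update_remotes()` only reports the planned changes; `update_remotes(dry_run=False)` applies them. Afterwards, `git remote -v` should show the same URLs as the manual commands in step 1.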

Deron



Re: List of SystemML research papers?

2017-06-01 Thread Deron Eriksson
Perfect! Thank you Berthold and Matthias.

Deron


On Thu, Jun 1, 2017 at 11:45 AM, Matthias Boehm <mboe...@googlemail.com>
wrote:

> The following list (under the Publications tab) can serve as a starting
> point:
> https://researcher.watson.ibm.com/researcher/view_group_pubs.php?grp=3174
>
> I would actually recommend extending this from papers to papers, talks,
> and posters in order to have a single place where we make all such
> resources available.
>
> Regards,
> Matthias
>
> On Thu, Jun 1, 2017 at 11:27 AM, Deron Eriksson <deroneriks...@gmail.com>
> wrote:
>
>> Currently the SystemML website does not have a research page, but we
>> should probably add one, similar to Apache Spark's research page at
>> http://spark.apache.org/research.html.
>>
>> Could someone respond with a list of SystemML research papers that could
>> be added to a research page?
>>
>> Thanks,
>> Deron
>>
>>
>
>




List of SystemML research papers?

2017-06-01 Thread Deron Eriksson
Currently the SystemML website does not have a research page, but we should
probably add one, similar to Apache Spark's research page at
http://spark.apache.org/research.html.

Could someone respond with a list of SystemML research papers that could be
added to a research page?

Thanks,
Deron



SystemML migration to TLP

2017-05-23 Thread Deron Eriksson
FYI I have created a JIRA issue with Apache Infrastructure for the SystemML
TLP infrastructure migration (
https://issues.apache.org/jira/browse/INFRA-14212).

This is similar to CarbonData (
https://issues.apache.org/jira/browse/INFRA-13942), which also recently
graduated as a TLP.

Deron