Re: SYSTEMML-1451: Performance Regression Tests

2017-05-13 Thread Nakul Jindal
Hi Krishna,

I think the intention is to compare the performance to previous versions of
SystemML.

We can add a CSV or Excel file that contains performance info from previous
versions to the repo, and when a performance test is run, new numbers from
the performance run can be added into that CSV/Excel file and the graph
generated automatically. There are Python APIs to manipulate Excel files,
but I have not looked at what they can and cannot do.
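
As a rough illustration of the CSV idea, here is a minimal Python sketch that
appends the numbers from one run to a history file kept in the repo. The file
layout, the column names (version, algorithm, data_size, seconds) and the
function names are assumptions made up for this example, not anything that
exists in SystemML today.

```python
import csv
from pathlib import Path

# Hypothetical column layout for the history file (an assumption).
FIELDS = ["version", "algorithm", "data_size", "seconds"]


def append_run(history_path, version, timings):
    """Append one performance run to the CSV history file.

    timings maps (algorithm, data_size) -> elapsed seconds.
    The header row is written only when the file is new or empty.
    """
    path = Path(history_path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        # Sort so that rows for one run are written in a stable order.
        for (algorithm, data_size), seconds in sorted(timings.items()):
            writer.writerow({
                "version": version,
                "algorithm": algorithm,
                "data_size": data_size,
                "seconds": seconds,
            })


def load_history(history_path):
    """Read the whole history back as a list of dicts (values are strings)."""
    with open(history_path, newline="") as f:
        return list(csv.DictReader(f))
```

A plotting step could then read the same file back with load_history and draw
one line per algorithm across versions.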

We could also keep the spreadsheet online (on Google Docs, for instance) and
have the tests update it when the user who runs the perf tests provides the
appropriate login credentials.

Thoughts?

-Nakul



On Fri, May 12, 2017 at 9:52 PM, Krishna Kalyan 
wrote:

> Hello All,
> I have a question on SYSTEMML-1451
> (https://issues.apache.org/jira/browse/SYSTEMML-1451)
>
> I am not quite sure how we could achieve the 4th point (Automatically
> compare this performance to previous version to check for performance
> regressions).
> I was wondering if the expectation is to download an older version of the
> SystemML jar (previous release) from the Maven repository and run performance
> tests? Please share your initial thoughts on how to go about this.
>
> Regards,
> Krishna
>


New Google Summer of Code 2017 Student - Krishna Kalyan

2017-05-05 Thread Nakul Jindal
Hi All,

Let us all welcome Krishna Kalyan as a student of Google Summer of Code to
work on SystemML.
He will be working on automating the performance testing process of
SystemML.

His project proposal is attached and the JIRA tracking his project can be
found at https://issues.apache.org/jira/browse/SYSTEMML-1451

He has already been active with the community (
https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01209.html)
since January.

@Krishna - Even though I am officially the mentor, I encourage you to
address questions to various members of the community with issues you
encounter throughout the project. Dig through Pull Requests and discussions
to figure out who is familiar with which components.

(I can help a bit with my background - I have worked on the DML grammar and
ANTLR parser layer previously and am working on the GPU backend now. I also
ran the perf tests and am somewhat familiar with the work needed to
automate it.)

Welcome!

-Nakul


Re: [NOTICE] New Apache SystemML Committer and PPMC Member

2017-05-01 Thread Nakul Jindal
Welcome, Felix!

-Nakul

On Mon, May 1, 2017 at 4:41 PM, Deron Eriksson 
wrote:

> Congratulations and welcome, Felix!
>
> Deron
>
>
> On Mon, May 1, 2017 at 4:27 PM,  wrote:
>
> > Welcome, Felix!
> >
> > --
> >
> > Mike Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> >
> > Sent from my iPhone.
> >
> >
> > > On May 1, 2017, at 4:23 PM, Niketan Pansare 
> wrote:
> > >
> > > Congratulations Felix !!
> > >
> > >
> > > Luciano Resende ---05/01/2017 04:21:30 PM---Welcome Felix. On Mon, May
> > 1, 2017 at 4:18 PM, Arvind Surve 
> > >
> > > From: Luciano Resende 
> > > To: dev@systemml.incubator.apache.org, Arvind Surve 
> > > Date: 05/01/2017 04:21 PM
> > > Subject: Re: [NOTICE] New Apache SystemML Committer and PPMC Member
> > >
> > >
> > >
> > >
> > > Welcome Felix.
> > >
> > > On Mon, May 1, 2017 at 4:18 PM, Arvind Surve 
> > > wrote:
> > >
> > > > I would like to welcome Felix Schueler as a new
> > > > Committer and PPMC member of Apache SystemML.
> > > >
> > > > Thanks for all your work, and welcome !!!
> > > >
> > > >  Arvind Surve | Spark Technology Center  | http://www.spark.tc/
> > >
> > >
> > >
> > >
> > > --
> > > Luciano Resende
> > > http://twitter.com/lresende1975
> > > http://lresende.blogspot.com/
> > >
> > >
> > >
> >
>
>
>
> --
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/
>


Re: [DISCUSS] Remove old MLContext API

2017-05-01 Thread Nakul Jindal
+1

Nakul

On Mon, May 1, 2017 at 5:37 PM,  wrote:

> +1
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On May 1, 2017, at 5:13 PM, Niketan Pansare  wrote:
> >
> >
> >
> > Hi all,
> >
> > The old MLContext API (org.apache.sysml.api.MLContext,
> org.apache.sysml.api
> > .MLContextProxy, org.apache.sysml.api.MLMatrix, org.apache.sysml.api.
> > MLOutput and org.apache.sysml.api.MLBlock) has been deprecated for a
> while.
> > I would recommend removing it from our source code. Please email back if
> > you have concerns or objections.
> >
> > Thanks,
> >
> > Niketan Pansare
> > IBM Almaden Research Center
> > E-mail: npansar At us.ibm.com
> > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>


Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4)

2017-04-30 Thread Nakul Jindal
+1

Ran the performance suite on an IBM internal cluster. Most tests seem to
perform reasonably close to the previous release.

-Nakul


On Fri, Apr 28, 2017 at 5:51 PM, Matthias Boehm 
wrote:

> this regression is certainly something to look into but this release
> contains a large number of fixes including many that addressed severe OOM
> issues, so it might in fact be just an issue of more conservative but now
> correct execution plans given the current capabilities of our compiler.
>
> Regards,
> Matthias
>
> On Fri, Apr 28, 2017 at 5:39 PM,  wrote:
>
> > +1  Grabbed the tar binary and the tar source and tested various local
> > scripts in Scala & Python 2 + 3, and those ran fine.  However, I did run
> > the MNIST LeNet demo on both our 0.13 release and this 0.14 candidate,
> and
> > I noticed a regression in 0.14.  For the same script run back to back,
> the
> > 0.14 candidate took longer, and looking into the stats, on 0.13 there
> were
> > 864 Spark instructions executed, while on this 0.14 there were 2513 Spark
> > instructions executed.   This also brought the `sp_mapmm` and `sp_sel+`
> > instructions into the top 10 heavy hitters.  This could be related to the
> > issue that I am seeing in SYSTEMML-1561.
> >
> > Regardless, I'm still fine with releasing this, since the deep learning
> > support is still experimental for 0.14.  For our upcoming 1.0 release,
> all
> > engine bugs and issues related to deep learning need to be fixed.  Most
> of
> > these bugs are generally applicable to all algorithms, so it is in the
> > benefit of the project to fix them.
> >
> > --
> >
> > Mike Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> >
> > Sent from my iPhone.
> >
> >
> > > On Apr 28, 2017, at 10:37 AM, Arvind Surve 
> > wrote:
> > >
> > > +1
> > > > Completed the following verifications:
> > > > - License and Notice validations
> > > > - Binary runtime validations
> > > > - Source code compilation and runtime validations
> > > > - Python scripts validations using Python 2
> > > >
> > > > Arvind Surve | Spark Technology Center | http://www.spark.tc/
> > >
> > >  From: Glenn Weidner 
> > > To: dev@systemml.incubator.apache.org
> > > Sent: Monday, April 24, 2017 9:30 PM
> > > Subject: Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4)
> > >
> > > +1
> > >
> > > Successfully ran Linear Regression, Logistic Regression, Naive Bayes,
> > SVM in
> > > Python notebooks with Spark 2.0.2 (in cloud environment) and Spark 2.1
> > (on local test cluster) after pip install of RC4 python artifact
> > > systemml-0.14.0-incubating-python.tgz. Also ran Linear Regression
> > Conjugate Gradient in Scala notebooks.
> > >
> > > Regards,
> > > Glenn
> > >
> > > Matthias Boehm ---04/24/2017 02:02:12 AM---+1 I ran large-scale
> > experiments on Spark 2.1 for L2SVM, GLM, MLogreg,
> > >
> > > From: Matthias Boehm 
> > > To: dev@systemml.incubator.apache.org
> > > Date: 04/24/2017 02:02 AM
> > > Subject: Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4)
> > >
> > >
> > >
> > > +1
> > >
> > > I ran large-scale experiments on Spark 2.1 for L2SVM, GLM, MLogreg,
> > > LinregCG, LinregDS, and PCA over scaled versions of MNIST and ImageNet
> > (up
> > > to 1TB, with uncompressed and compressed linear algebra) without any
> > > issues.
> > >
> > > Compared to previous experiments with SystemML 0.11 and Spark 1.6, I've
> > > seen substantial performance improvements of >2x for iterative
> algorithms
> > > with RDD operations in the inner loop over out-of-core datasets.
> > >
> > > Regards,
> > > Matthias
> > >
> > > On Wed, Apr 19, 2017 at 4:17 PM, Arvind Surve  >
> > > wrote:
> > >
> > >> Please vote on releasing the following candidate as Apache SystemML
> > >> version 0.14.0-incubating !
> > >> The vote is open for at least 72 hours and passes if a majority of at
> > >> least 3 +1 PMC votes are cast.
> > > >> [ ] +1 Release this package as Apache SystemML 0.14.0-incubating
> > > >> [ ] -1 Do not release this package because ...
> > >> To learn more about Apache SystemML, please see
> http://systemml.apache.
> > >> org/
> > >> The tag to be voted on is v0.14.0-incubating-rc4 (
> > >> 8bdcf106ca9bd04c0f68924ad5827eb7d7d54952)
> > >> https://github.com/apache/incubator-systemml/commit/
> > >> 8bdcf106ca9bd04c0f68924ad5827eb7d7d54952
> > >>
> > > >> The release artifacts can be found at:
> > > >> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.14.0-incubating-rc4/
> > > >> The maven release artifacts, including signatures, digests, etc. can
> > > >> be found at:
> > > >> https://repository.apache.org/content/repositories/orgapachesystemml-1021/org/apache/systemml/systemml/0.14.0-incubating/
> > > >> === Apache Incubator release policy ===
> > > >> Please find below the guide to release management

Re: Build passed/failed messages for pull requests

2017-04-28 Thread Nakul Jindal
I like option (2) as well.
It is difficult for a new contributor to know the URL for the Jenkins
server.

In so far as this may be considered spam, I would suggest that this can be
controlled using the notification settings on github and filters on your
email server/client.



On Fri, Apr 28, 2017 at 10:42 AM, Deron Eriksson 
wrote:

> Hi,
>
> When a pull request is created or another commit is pushed to that pull
> request, a build including running our test suite is performed (Jenkins at
> https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/).
> This is the same model that other projects such as Apache Spark use
> (Jenkins at
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/).
>
> A few days ago, automated build passed/failed pull request messages were
> introduced to our pull requests, following the same type of Spark model.
> A) SystemML example: https://github.com/apache/incubator-systemml/pull/442
> B) Spark example: https://github.com/apache/spark/pull/17765
>
> Personally I like these messages because for contributors that do pull
> requests, it automatically tells them the status of the build for their
> pull requests and gives them a direct link to the build/test results. An
> opposing viewpoint would be that these messages are somewhat like spam.
>
> So we should make a public decision on the mailing list what to do about
> these automated build status messages.
>
> Some options:
> (1) keep the automated messages exactly as they are
> (2) keep the automated messages, but consolidate the two messages into one
> (such as "Build successful" and "Refer to this link...").
> (3) get rid of the automated messages
>
> I like (2). Any other opinions or options?
>
> Thoughts?
>
> Deron
>
>
> --
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/
>


Re: GSoC : Getting started contributions

2017-04-21 Thread Nakul Jindal
Hi Krishna,

What Arvind is describing is in essence a large part of your GSoC proposal.
You should work on this if and when your proposal gets approved. (we don't
know whether it has been approved and even if we did, we couldn't say).
In the meantime, I encourage you to play around with SystemML, go through
the JIRA site, (look at SYSTEMML-546 as Matthias suggested) and ask
questions on this mailing list that you may have.

Thanks,
Nakul


On Fri, Apr 21, 2017 at 11:00 AM, Arvind Surve 
wrote:

> Hi Krishna,
> There is an immediate need for the SystemML project to run performance tests
> and analyze the results efficiently. Though this is a small part of your
> overall GSoC project, it's important to start with it and grow from there.
> In the short term it will help the SystemML project expedite release cycles,
> and in the long run you will get a head start on the project.
>
> What we need in the short run to get performance testing for every release
> cycle or even beyond:
>    1. How to set up the environment with configurable parameters quickly.
>       (We have scripts; they may need some tweaking or some additional
>       configuration.)
>    2. Run the performance scripts with configuration options for data size
>       (8GB, 80GB, 8000GB etc. or ALL) and for different sets of algorithms
>       (regression, classification, or all).
>    3. Collect the time required to run each individual algorithm for a given
>       size, and store it in a CSV or any other suitable file format for
>       further processing.
>    4. Compare the results obtained from step 3 to previous runs (previous
>       release, RC, etc.).
>    5. Generate a report indicating failure scenarios, outliers (time taken
>       was more than the tolerance level, say x%), and successful cases. Each
>       should be separated out so that reading those reports will be easy.
>
>  Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>
>   From: Matthias Boehm 
>  To: dev@systemml.incubator.apache.org
>  Sent: Saturday, April 15, 2017 3:27 PM
>  Subject: Re: GSoC : Getting started contributions
>
> A great issue to start with would be SYSTEMML-546, which aims to clean up
> and extend our existing application tests. This would get you in touch with
> DML and PyDML algorithm scripts as well as the R scripts for comparisons.
>
> Regards,
> Matthias
>
> On Sat, Apr 15, 2017 at 2:58 PM, Krishna Kalyan 
> wrote:
>
> > Hello,
> > I quite recently applied for GSoC. (Proposal below)  [ Automate
> performance
> > testing and reporting]
> >
> > https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALG
> > jLH2DrIfRsJksA/edit#
> >
> > As part of my effort to understand the codebase, I would like to work on
> > minor/medium issues. Could someone from the community please guide with
> > JIRAs I could work on during my spare time.
> > (I am comfortable with Python, R and bash).
> >
> > Regards,
> > Krishna
> >
>
>
>
>
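
Steps 4 and 5 in Arvind's list could be sketched roughly as follows in Python.
This is only an illustration under assumptions: the test names, the dict-based
input format, the use of None to mark a failed run, and the 10% default
tolerance are all made up for the example.

```python
def compare_runs(baseline, current, tolerance_pct=10.0):
    """Classify each test in the current run against a baseline run.

    baseline and current map a test name to elapsed seconds (positive
    numbers); a value of None in current marks a run that failed.
    Returns a dict with the test names grouped into 'failure', 'outlier',
    'success', and 'new' (no baseline number to compare against).
    """
    report = {"failure": [], "outlier": [], "success": [], "new": []}
    for name in sorted(current):
        seconds = current[name]
        if seconds is None:
            report["failure"].append(name)
        elif name not in baseline:
            report["new"].append(name)
        else:
            # Positive values mean the current run was slower than the baseline.
            slowdown_pct = 100.0 * (seconds - baseline[name]) / baseline[name]
            bucket = "outlier" if slowdown_pct > tolerance_pct else "success"
            report[bucket].append(name)
    return report
```

Keeping the groups separate in the returned dict mirrors the request that
failures, outliers, and successful cases be easy to read apart in the report.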


Re: Error while building SystemML from source

2017-04-04 Thread Nakul Jindal
Try doing a 
mvn clean package

The "mvn clean install" is triggering the "verify" phase which we use for our 
integration tests. 

For the integration tests, you'll need to have "Rscript" installed along with 
the appropriate R packages. Also, the temporary directory the tests write to
needs to have 777 permissions. This is something that Hadoop requires. For the
tests, there is a "SystemML-config.xml" file (that is not the one in conf/) 
that you'll need to modify to specify the temp/scratch space used by the tests. 
A simple solution I use to run the integration tests manually is to copy the 
entire systemml directory to my /tmp and run it from there. 
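
A small pre-flight check along these lines might save a failed build. This is
a hedged Python sketch, not part of SystemML: the function name is made up, and
the scratch directory argument stands for whatever path you configure for the
tests. It only checks the two prerequisites mentioned above (Rscript on the
PATH and a 777 scratch directory), not the R packages.

```python
import os
import shutil
import stat


def check_integration_test_env(scratch_dir):
    """Return a list of problems that would likely break the integration tests."""
    problems = []
    # The integration tests call out to R, so Rscript must be on the PATH.
    if shutil.which("Rscript") is None:
        problems.append("Rscript not found on PATH")
    if not os.path.isdir(scratch_dir):
        problems.append(f"{scratch_dir} is not a directory")
    else:
        # Compare only the rwx bits for user, group, and other.
        mode = stat.S_IMODE(os.stat(scratch_dir).st_mode)
        if mode & 0o777 != 0o777:
            problems.append(f"{scratch_dir} has mode {mode:o}, expected 777")
    return problems
```

An empty list means both checks passed; anything else can be printed before
deciding whether to run "mvn clean install" or fall back to "mvn clean package".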

-Nakul


> On Apr 4, 2017, at 8:28 AM, Krishna Kalyan  wrote:
> 
> Hello,
> I wanted to build SystemML from source. However, I get errors while trying
> to do so.
> 
> Steps I followed:
> a) Clone System-ML repo (https://github.com/apache/incubator-systemml)
> b) mvn clean install
> 
> (Log below)
> https://gist.github.com/krishnakalyan3/b90cef0a76a77ce262889048340794ce
> 
> I guess I am missing some steps. Could someone please guide me?
> 
> Regards,
> Krishna


Re: GSoc 2017

2017-04-03 Thread Nakul Jindal
Your project proposal looks great. Be sure to submit a final project proposal 
wherever it is you need to. 

Thanks,
Nakul

> On Apr 2, 2017, at 4:08 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
> 
> Hello All,
> I have updated the proposal. I hope this one is better. Please share your 
> feedback.
> 
> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit#
> 
> FYI : Student Application Deadline April 3 16:00 UTC. 
> 
> 
> Regards,
> Krishna
> 
>> On Sun, Apr 2, 2017 at 2:39 PM, Krishna Kalyan <krishnakaly...@gmail.com> 
>> wrote:
>> Hello Nakul,
>> My comments in Italics below.
>> 
>>> On Sat, Apr 1, 2017 at 11:27 PM, Nakul Jindal <naku...@gmail.com> wrote:
>>> Hi Krishna,
>>> 
>>> Here are some questions/remarks i have about parts of your proposal:
>>> 
>>> In the section titled Summary -
>>> 
>>> "The systematic evaluation of performance can be measured with performance 
>>> tests and micro-benchmarks"
>>> We currently do not have any micro benchmarks. Do you plan on adding any? 
>>> (It would be awesome, but remember to keep the number of tasks reasonable 
>>> given the time frame and your familiarity with the project)
>> - Removed micro bench marks from the proposal. 
>>> 
>>> Your summary section feels like it's generally applicable for performance 
>>> testing on any project, which is good. However, when it comes to talking 
>>> about what you'd actually be doing, I see - " build a benchmark 
>>> infrastructure and conduct experiments, that compare different choices in 
>>> critical parts (sparsity thresholds, optimisation decisions, etc..)".
>> 
>> -  I agree and have made these changes.
>> 
>>> Going over each point:
>>> 
>>> 1. "build a benchmark infrastructure" - ok, i guess this subsumes pretty 
>>> much all the tasks involved 
>>> 2. "conduct experiments" - sure, although I think you mean testing your 
>>> benchmarking infrastructure, please correct me if this is not what you 
>>> meant 
>>> 3. "that compare different choices in critical parts"
>>>   a. "sparsity thresholds" - awesome. You'd need to figure out what 
>>> SystemML already does and what to add. 
>>>   b. "optimization decisions" - could you provide an example or two of what 
>>> exactly you mean by this. Do you mean to enable and/or disable certain 
>>> optimizations and run the perf suite and also automate the process? or 
>>> something else?
>>>   c. "etc" - more detail would be nice here. It would be nice to know what 
>>> exactly you are committing to.
>>> - will add more details in this section 
>>> 
>>> In the section titled Deliverables - 
>>> 
>>> You mention
>>> - "automation for all performance tests" - awesome! this is the primary task
>>> - "automatic scripts to test performance on a cloud provider" - this is 
>>> great
>>> - "web dashboard" - awesome! this is a nice-to-have
>>> 
>>> But before the "cloud provider" and "web dashboard" task, we'd like to 
>>> robustly check for errors and record performance numbers and generate 
>>> reports. (Tasks 2 - 6 on 
>>> https://issues.apache.org/jira/browse/SYSTEMML-1451). I see that you've 
>>> mentioned some of these tasks in your "Project milestones" section as 
>>> "Understand metrics to be captured like time, memory, errors". It'd be good 
>>> to put them here as well.
>> - Will add this information under Deliverables
>>> 
>>> Remember, you might also need to change the way SystemML reports errors and 
>>> performance numbers to complete your tasks. You, along with the currently 
>>> active members of SystemML might need to change the algorithms being tested 
>>> as well.
>> 
>> - Sure will keep this in mind and will account for this in proposal. 
>>> 
>>> In the section titled "Project Milestones" - 
>>> Your project timeline looks good: the initial set of things to do before May 
>>> 30 and the fact that you've set aside the final week as a buffer. You have 
>>> dug down into a week-by-week schedule, which is good. I have some 
>>> suggestions though:
>>> 
>>> You need to 
>>> T1. Understand what is happening now, try it out for yourself
>> 
>> - Yes, I am foll

Re: Dropping Java 6 and 7 support

2017-03-07 Thread Nakul Jindal
+1

-Nakul

On Tue, Mar 7, 2017 at 11:13 AM,  wrote:

> +1
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Mar 7, 2017, at 10:49 AM, Niketan Pansare  wrote:
> >
> > +1
> >
> > Thanks,
> >
> > Niketan Pansare
> > IBM Almaden Research Center
> > E-mail: npansar At us.ibm.com
> > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> >
> > Berthold Reinwald---03/06/2017 11:16:19 PM---+1 on removing java 6 and
> 7. Regards,
> >
> > From: Berthold Reinwald/Almaden/IBM@IBMUS
> > To: dev@systemml.incubator.apache.org
> > Date: 03/06/2017 11:16 PM
> > Subject: Re: Dropping Java 6 and 7 support
> >
> >
> >
> >
> > +1 on removing java 6 and 7.
> >
> > Regards,
> > Berthold Reinwald
> > IBM Almaden Research Center
> > office: (408) 927 2208; T/L: 457 2208
> > e-mail: reinw...@us.ibm.com
> >
> >
> >
> > From:   Matthias Boehm 
> > To: dev@systemml.incubator.apache.org
> > Date:   03/06/2017 10:58 PM
> > Subject:Dropping Java 6 and 7 support
> >
> >
> >
> > Hi all,
> >
> > I'd like to drop the support for Java 6 and 7 in our SystemML 1.0
> release.
> > Our build still refers to a java compliance level 6, which has not been
> > changed for more than 5 years now. Spark >= 1.5 anyway requires Java 7
> and
> > there has been some discussion on removing Java 7 as well because it
> > reached end of life in April 2015. Moving to Java 8 would allow us to
> > modernize the code base going forward and the 1.0 release would be the
> > perfect time for this change.
> >
> > Regards,
> > Matthias
> >
> >
> >
> >
> >
> >
> >
>


Re: [DISCUSS] SystemML Graduation

2017-03-03 Thread Nakul Jindal
+1

Thank you Luciano for starting this discussion and the guidance you've
provided on this project.
In addition to the aforementioned accomplishments of the project, the
roadmap (which has been on the mailing list) also directs us towards making
continued healthy progress.

Nakul Jindal


On Fri, Mar 3, 2017 at 5:00 PM, Glenn Weidner <gweid...@us.ibm.com> wrote:

> +1
>
> Thank you Luciano for starting the discussion and for all the guidance
> you've provided from the beginning of the project. I agree that the Apache
> SystemML community has grown and achieved many exciting things during
> incubation. For example, today we completed our fifth release of Apache
> SystemML after releasing previous version in February. Graduating to a
> top-level project will be another important accomplishment and help
> continue momentum with developers and users.
>
> Regards,
> Glenn
>
>
>
> From: Deron Eriksson <deroneriks...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 03/03/2017 12:02 PM
> Subject: Re: [DISCUSS] SystemML Graduation
> --
>
>
>
> +1
>
> Thank you for starting this important discussion Luciano, and thank you for
> all the guidance that you have provided us regarding the Apache Incubator,
> the Apache Software Foundation, and open-source software development! I'd
> also like to thank Henry for all the great assistance and hard work since
> becoming an additional mentor for the project.
>
> I believe that we may indeed be ready to graduate to a top level project
> due both to our technical efforts and our community efforts. Since we
> became an incubator project, in terms of code we have consistently
> demonstrated a high level of excellent activity from a wide range of
> contributors. We have 1,065 commits since we became an incubator project
> and have closed 391 pull requests in that time. Additionally, over time, we
> have all learned many best practices and Apache guidelines, for example how
> to properly validate our source releases in terms of content and licenses.
> We have also learned the processes involved with topics such as JIRA,
> GitHub, Git, Subversion, and software releases, and how to interact with
> groups such as Apache infrastructure to effectively develop open-source
> software following the Apache way.
>
> I think everyone on the SystemML project has also worked hard to build an
> open community around the project. We have open discussions on technical
> matters, especially in the area of pull requests, and these discussions
> demonstrate a consistent ability to reach consensus while allowing
> respectful disagreement. I believe our mailing list could be used more
> frequently, since it offers a more centralized location for discussions
> (compared to pull request discussions), which could be an additional way to
> help the community. However, we do have important discussions on the
> mailing list, for example in regards to questions from users, and
> communication on the mailing list is positive and encouraging to community
> growth.
>
> Deron
>
>
> On Thu, Mar 2, 2017 at 5:14 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
>
> > It has been an exciting 16 months so far, and the project has
> accomplished
> > 4 official Apache Releases and is currently requesting the IPMC to
> approve
> > the 5th release. We have voted 3 new committers and PPMC members and
> > welcomed a new Mentor. The community continues to evangelize the project
> at
> > universities, blog posts, public webcasts, and in multiple conferences
> > which culminate in the project being awarded 'Best Paper' at VLDB 2016.
> And
> > last, but not least, the Incubator has asked us to evaluate and possibly
> > start the graduation process [1].
> >
> > For now, I would like to get the community to take a quick look at the
> > 'Graduation Guide' [2] and use this thread to discuss your opinion about
> > SystemML graduation.
> >
> > In parallel, I will start working on updating the project page [3] with
> > milestones, and other details.
> >
> > [1] http://www.mail-archive.com/general@incubator.apache.org/
> msg58614.html
> > [2] http://incubator.apache.org/guides/graduation.html
> > [3] http://incubator.apache.org/projects/systemml.html
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
>
>
>
> --
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/
>
>
>
>


Re: [VOTE] Apache SystemML 0.13.0-incubating (RC2)

2017-02-23 Thread Nakul Jindal
+1

Basic sanity tests pass on Mac.

On Thu, Feb 23, 2017 at 1:14 PM, Deron Eriksson 
wrote:

> +1
>
> Performed the following validations for artifacts at
> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.
> 13.0-incubating-rc2/
> :
>
> 1. -bin.tgz/-bin.zip contain disclaimer, license, notice
> 2. -bin.tgz/-bin.zip licenses reference all included dependencies with
> correct licenses
> 3. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar contains
> disclaimer, license, notice
> 3. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar contains antlr
> runtime and wink classes
> 4. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar license references
> antlr runtime and wink
> 5. -python.tgz contains disclaimer, license notice
> 6. -python.tgz license references antlr runtime and wink with correct
> licenses
> 7. -python.tgz systemml/systemml-java/systemml-0.13.0-incubating.jar
> contains disclaimer, license, notice
> 8. -python.tgz systemml/systemml-java/systemml-0.13.0-incubating.jar
> contains antlr runtime and wink classes
> 9. -python.tgz systemml/systemml-java/systemml-0.13.0-incubating.jar
> license references antlr runtime and wink
> 10. -src.tgz/-src.zip contain disclaimer, license, notice
> 11. -src.tgz/-src.zip licenses reference all included projects (jquery,
> etc) with correct licenses
> 12. -src.tgz/-src.zip contain no binaries (dll, exe, pdb, lib)
> 13. -src.tgz/-src.zip build project artifacts (mvn clean package -P
> distribution)
> 14. -src.tgz/-src.zip SystemML jar runs (hello world)
> 15. -src.tgz/-src.zip test suite runs (mvn verify)
> 16. -bin.tgz/-bin.zip runStandaloneSystemML.sh (hello world)
> 17. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit
> 2.0.2
> (hello world)
> 18. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit
> 2.1.0
> (hello world)
> 19. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar hadoop 2.7 (hello
> world)
> 20. -bin.tgz/-bin.zip runStandaloneSystemML.sh (univar stats, haberman
> data)
> 21. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit
> 2.0.2
> (univar stats, generated data)
> 22. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit
> 2.1.0
> (univar stats, generated data)
> 23. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar hadoop 2.7
> default
> exec mode (univar stats, generated data)
> 24. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar hadoop 2.7 hadoop
> exec mode (univar stats, generated data)
> 25. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar MLContext
> spark-shell 2.0.2 (univar stats, haberman data)
> 26. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar MLContext
> spark-shell 2.1.0 (univar stats, haberman data)
>
>
>
> On Wed, Feb 22, 2017 at 7:23 PM, Arvind Surve 
> wrote:
>
> > Please vote on releasing the following candidate as Apache SystemML
> > version 0.13.0-incubating !
> >
> > The vote is open for at least 72 hours and passes if a majority of at
> > least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache SystemML 0.13.0-incubating
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache SystemML, please see http://systemml.apache.
> > org/
> >
> > The tag to be voted on is v0.13.0-incubating-rc2 (
> > ff3e741694e507f64a6b52ee71638bddecabe7af)
> >
> > https://github.com/apache/incubator-systemml/commit/
> > ff3e741694e507f64a6b52ee71638bddecabe7af
> >
> > The release artifacts can be found at :
> > https://dist.apache.org/repos/dist/dev/incubator/systemml/0.
> > 13.0-incubating-rc2/
> >
> > The maven release artifacts, including signatures, digests, etc. can
> > be found at:
> >
> > https://repository.apache.org/content/repositories/
> > orgapachesystemml-1017/org/apache/systemml/systemml/0.13.0-incubating/
> >
> > =
> > == Apache Incubator release policy ==
> > =
> > Please find below the guide to release management during incubation:
> > http://incubator.apache.org/guides/releasemanagement.html
> >
> > ===
> > == How can I help test this release? ==
> > ===
> > If you are a SystemML user, you can help us test this release by taking
> > an existing Algorithm or workload and running on this release candidate,
> > then
> > reporting any regressions.
> >
> > 
> > == What justifies a -1 vote for this release? ==
> > 
> > -1 votes should only occur for significant stop-ship bugs or legal
> > related issues (e.g. wrong license, missing header files, etc). Minor
> bugs
> > or regressions should not block this release.
> >  -Arvind Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>
>
>
>
> --
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/
>
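
Two of the source-artifact checks in Deron's list above (the archive contains
disclaimer, license, and notice, and it contains no dll/exe/pdb/lib binaries)
lend themselves to scripting. The following Python sketch is only an
illustration: the function name is made up, and the exact file names
DISCLAIMER, LICENSE, and NOTICE are an assumption about how the files are
named inside the archive. It matches by base name anywhere in the tree, which
is looser than a real release check would be.

```python
import tarfile

# Suffixes treated as binaries, taken from the checklist item.
BINARY_SUFFIXES = (".dll", ".exe", ".pdb", ".lib")
# Assumed file names for the required legal files.
REQUIRED_NAMES = ("DISCLAIMER", "LICENSE", "NOTICE")


def check_source_archive(tgz_path):
    """Return (binaries, missing) for a gzip-compressed source tarball.

    binaries: member paths whose name ends in one of BINARY_SUFFIXES.
    missing:  entries of REQUIRED_NAMES not found as a file base name.
    """
    with tarfile.open(tgz_path, "r:gz") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
    binaries = [n for n in names if n.lower().endswith(BINARY_SUFFIXES)]
    basenames = {n.rsplit("/", 1)[-1].upper() for n in names}
    missing = [r for r in REQUIRED_NAMES if r not in basenames]
    return binaries, missing
```

Both returned lists being empty would correspond to those two checklist items
passing; the license content itself still needs a manual review.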


Re: [VOTE] Apache SystemML 0.11.0-incubating (RC5)

2016-11-02 Thread Nakul Jindal
+1

On Tue, Nov 1, 2016 at 5:08 PM, Luciano Resende 
wrote:

> Please vote on releasing the following candidate as Apache SystemML version
> 0.11.0-incubating !
>
> The vote is open for at least 72 hours and passes if a majority of at least
> 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache SystemML 0.11.0-incubating
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache SystemML, please see
> http://systemml.apache.org/
>
> The tag to be voted on is v0.11.0-incubating-rc4 (
> c2e1670c2745863195d4789f1f77ed01ec11af5e)
>
> https://github.com/apache/incubator-systemml/tree/
> c2e1670c2745863195d4789f1f77ed01ec11af5e
>
> The release artifacts can be found at :
>
> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.
> 11.0-incubating-rc5/
>
> The maven release artifacts, including signatures, digests, etc. can be
> found at:
>
> https://repository.apache.org/content/repositories/orgapachesystemml-1011/
>
>
> =
> == Apache Incubator release policy ==
> =
> Please find below the guide to release management during incubation:
> http://incubator.apache.org/guides/releasemanagement.html
>
> ===
> == How can I help test this release? ==
> ===
> If you are a SystemML user, you can help us test this release by taking an
> existing Algorithm or workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> -1 votes should only occur for significant stop-ship bugs or legal related
> issues (e.g. wrong license, missing header files, etc). Minor bugs or
> regressions should not block this release.
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: [VOTE] Apache SystemML 0.11.0-incubating (RC4)

2016-10-28 Thread Nakul Jindal
+1

On Fri, Oct 28, 2016 at 12:28 PM, Glenn Weidner  wrote:

> Validated RC4 binaries on Windows environment:
> http://apache.github.io/incubator-systemml/release-process.html#all-binaries-execute
>
> --Glenn
>
> From: Niketan Pansare/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 10/28/2016 11:49 AM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC4)
> --
>
>
>
> All Python tests passed using Python 2.7 and Spark 1.6.1.
>
> +1
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Acs S 
> To: "dev@systemml.incubator.apache.org"
> Date: 10/27/2016 11:53 PM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC4)
> --
>
>
>
> +1
> -Arvind
>
> From: Glenn Weidner
> To: dev@systemml.incubator.apache.org
> Sent: Thursday, October 27, 2016 11:07 PM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC4)
>
> Completed performance test suite for:
>
> XS scenarios (80MB)
> S scenarios (800MB)
> M scenarios (8GB)
> L scenarios (80GB)
> XL subset (800GB)
>
> +1
>
> --Glenn
>
> From: Luciano Resende 
> To: dev@systemml.incubator.apache.org
> Date: 10/24/2016 04:11 PM
> Subject: [VOTE] Apache SystemML 0.11.0-incubating (RC4)
>
>
>
> Please vote on releasing the following candidate as Apache SystemML version
> 0.11.0-incubating !
>
> The vote is open for at least 72 hours and passes if a majority of at least
> 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache SystemML 0.11.0-incubating
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache SystemML, please see
> http://systemml.apache.org/
>
> The tag to be voted on is v0.11.0-incubating-rc4
> (6937683b01a13458990e698b0cf04f4f6ccecde3)
>
> https://github.com/apache/incubator-systemml/tree/6937683b01a13458990e698b0cf04f4f6ccecde3
>
> The release artifacts can be found at :
>
> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.11.0-incubating-rc4/
>
> The maven release artifacts, including signatures, digests, etc. can be
> found at:
>
>
> https://repository.apache.org/content/repositories/orgapachesystemml-1010/
>
>
> =
> == Apache Incubator release policy ==
> =
> Please find below the guide to release management during incubation:
> http://incubator.apache.org/guides/releasemanagement.html
>
> ===
> == How can I help test this release? ==
> ===
> If you are a SystemML user, you can help us test this release by taking an
> existing Algorithm or workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> -1 votes should only occur for significant stop-ship bugs or legal related
> issues (e.g. wrong license, missing header files, etc). Minor bugs or
> regressions should not block this release.
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: [VOTE] Apache SystemML 0.11.0-incubating (RC4)

2016-10-24 Thread Nakul Jindal
+1

On Mon, Oct 24, 2016 at 2:08 PM, Acs S  wrote:

> Please vote on releasing the following candidate as Apache SystemML version
> 0.11.0-incubating !
>
> The vote is open for at least 72 hours and passes if a majority of at least
> 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache SystemML 0.11.0-incubating
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache SystemML, please see
> http://systemml.apache.org/
>
> The tag to be voted on is v0.11.0-incubating-rc4
> (1baebfde400134b3af6d373c254ee084a6d28cc3)
>
> https://github.com/apache/incubator-systemml/tree/1baebfde400134b3af6d373c254ee084a6d28cc3
>
> The release artifacts can be found at :
>
> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.11.0-incubating-rc4/
>
> The maven release artifacts, including signatures, digests, etc. can be
> found at:
>
> https://repository.apache.org/content/repositories/orgapachesystemml-1009/
>
>
> =
> == Apache Incubator release policy ==
> =
> Please find below the guide to release management during incubation:
> http://incubator.apache.org/guides/releasemanagement.html
>
> ===
> == How can I help test this release? ==
> ===
> If you are a SystemML user, you can help us test this release by taking an
> existing Algorithm or workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> -1 votes should only occur for significant stop-ship bugs or legal related
> issues (e.g. wrong license, missing header files, etc). Minor bugs or
> regressions should not block this release.
>
>


Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Nakul Jindal
Hi,

There is an initial implementation and PR. 
https://github.com/apache/incubator-systemml/pull/273

-Nakul


> On Oct 24, 2016, at 12:59 AM, Berthold Reinwald <reinw...@us.ibm.com> wrote:
> 
> Thanks, Imran. I think it is a good idea to start off with the DML-bodied 
> function implementation. This will hold until we can have a built in 
> implementation.
> 
> We prototyped an implementation of distributed Cholesky as a DML bodied 
> function as well. For performance optimization, as the matrix becomes 
> "small" enough, we switched over and exploit a single node implementation.
> 
> Adding a new svd() built in function that initially routes to a local 
> library is fine. I don't know whether Apache commons math has an 
> implementation that can be re-used. 
> 
> I object to renaming the functions or changing the externals. Eventually 
> distributed instructions need to be added to these implementations, and 
> there are open jiras for it.
> 
> Regards,
> Berthold Reinwald
> IBM Almaden Research Center
> office: (408) 927 2208; T/L: 457 2208
> e-mail: reinw...@us.ibm.com
> 
> 
> 
> From:   Niketan Pansare/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date:   10/21/2016 01:14 PM
> Subject:Re: Local versions of Linear Algebra Operators in DML
> 
> 
> 
> I am also comfortable with option (2) ... "with a plan to implement its 
> distributed version"
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> 
> From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date: 10/21/2016 01:00 PM
> Subject: Re: Local versions of Linear Algebra Operators in DML
> 
> 
> 
> thanks Nakul for reaching out before starting work on this. Actually, 
> the introduction of these CP-only builtin functions was a big mistake 
> because (as you already mentioned) they mistakenly suggest that we 
> provide distributed operations for them too. The intent was to support 
> them in later versions with our own local and distributed 
> implementations. So far, this had low priority though because these 
> O(n^3) operations are seldom used over large data. However, a while 
> back, we lost potential users who were specifically interested in 
> distributed eigen - so there are still use cases.
> 
> Despite the good intentions behind the renaming, I would strongly argue 
> against it. First, it would unnecessarily lose compatibility with R 
> syntax. Second, it would defeat our clean abstraction by exposing 
> explicit local operations.
> 
> This leaves us with two options here: (1) you could use an external 
> (java-implemented) function, which gives you virtually the same runtime 
> behavior but a clear separation via an explicit registration, or (2) add 
> it to the list of CP-only operations (with a plan to implement its 
> distributed version) but name it 'svd' as in R.
> 
> 
> Regards,
> Matthias
> 
> 
>> On 10/21/2016 9:34 PM, Nakul Jindal wrote:
>> Hi,
>> 
>> Imran was planning on implementing a distributed SVD as a DML bodied
>> function.
>> The algorithm is described in the paper titled "A Distributed and
>> Incremental SVD Algorithm for Agglomerative Data Analysis on Large
>> Networks" available at https://arxiv.org/abs/1601.07010.
>> 
>> This algorithm requires the availability of a local SVD function, which we
>> currently do not have in SystemML.
>> Seeing as how there are other linear algebra functions (eigen, lu, qr,
>> cholesky) in DML that reroute to Apache Commons Math and only operate in
>> standalone/CP mode, would it be ok to add "svd" to this set?
>> 
>> Also, since these operations are local and not distributed and the
>> documentation doesn't make it clear that these operations won't operate in
>> distributed mode, would it make sense to rename them to "local_eigen",
>> "local_qr", "local_cholesky", etc?
>> Obviously, this change would go into the version after 0.11.
>> 
>> I understand that the ideal solution to this problem is to have a
>> distributed version of the aforementioned linear algebra routines, but for
>> the time being, would it be ok to go ahead and do the rename, while also
>> introducing a "local_svd"?
>> 
>> 
>> Niketan, Berthold, Matthias, Sasha - Any thoughts?
>> 
>> Thanks,
>> Nakul Jindal
>> 
> 


Re: [VOTE] SystemML New Logo Ideas

2016-10-21 Thread Nakul Jindal
+1 for #1 - more professional and minimalist

Second favorite is 3 or 4 - is more "fun" and conveys the idea of SystemML
being a "black box" and works well with the animation that Rose Peng worked
on.

-Nakul


On Fri, Oct 21, 2016 at 2:53 PM, Madison Myers 
wrote:

> +1 for SystemML action figures as well :D
>
> On Fri, Oct 21, 2016 at 2:53 PM, Madison Myers 
> wrote:
>
> > +1 for what Mike said! I'd say 1 is my favorite and 4 is my second
> > favorite. Would love to incorporate them all as suggested by Mike and
> > Jeremy.
> >
> > These are really great, btw. Awesome work!
> >
> > On Fri, Oct 21, 2016 at 2:33 PM, Jeremy Anderson <
> > jer...@objectadjective.com> wrote:
> >
> >> +1 Mike.
> >>
> >> We're working on t-shirt designs, sticker designs, maybe buttons. Action
> >> figures would be really cool!
> >>
> >> ...
> >>
> >> Jeremy Anderson
> >> https://twitter.com/ObjectAdjective
> >> http://www.linkedin.com/in/objectadjective
> >>
> >> On 21 October 2016 at 14:27,  wrote:
> >>
> >> > I like all of these options!  I'll give a +1 for #1 as the main logo,
> >> and
> >> > I also think it would be great to make use of the rest of the designs
> >> > throughout the website and other project materials.
> >> >
> >> > Thanks!!
> >> >
> >> > -Mike
> >> >
> >> > --
> >> >
> >> > Mike Dusenberry
> >> > GitHub: github.com/dusenberrymw
> >> > LinkedIn: linkedin.com/in/mikedusenberry
> >> >
> >> > Sent from my iPhone.
> >> >
> >> >
> >> > > On Oct 21, 2016, at 1:01 PM, Niketan Pansare 
> >> wrote:
> >> > >
> >> > > All the logos are awesome, thanks design team !!
> >> > >
> >> > > I vote for #4.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Niketan Pansare
> >> > > IBM Almaden Research Center
> >> > > E-mail: npansar At us.ibm.com
> >> > > http://researcher.watson.ibm.com/researcher/view.php?person=
> >> us-npansar
> >> > >
> >> > > From: Deron Eriksson 
> >> > > To: dev@systemml.incubator.apache.org
> >> > > Date: 10/21/2016 12:57 PM
> >> > > Subject: Re: [VOTE] SystemML New Logo Ideas
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > Given the overwhelming support for #1, I give my +1 to #1.
> >> > >
> >> > > Deron
> >> > >
> >> > >
> >> > > On Fri, Oct 21, 2016 at 12:46 PM, Jason Azares <
> >> jason.aza...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > I vote for #1
> >> > > >
> >> > > > On Fri, Oct 21, 2016 at 12:27 PM, Matthias Boehm <
> >> > mboe...@googlemail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > ha, that's interesting - thanks for the pointer Deron, I wasn't
> >> > expecting
> >> > > > > this at all. Somehow my eyes always ignored this.
> >> > > > >
> >> > > > > Regards,
> >> > > > > Matthias
> >> > > > >
> >> > > > >
> >> > > > > On 10/21/2016 9:22 PM, Deron Eriksson wrote:
> >> > > > >
> >> > > > >> I think they all look fantastic. My untrained eye likes the
> >> > features of
> >> > > > 3
> >> > > > >> and 4 but I completely defer to the judgements of others here
> >> since
> >> > I
> >> > > > have
> >> > > > >> no training in design and the multitude of considerations
> >> involved
> >> > such
> >> > > > as
> >> > > > >> scalability.
> >> > > > >>
> >> > > > >> I believe the logo trademark is an official requirement of the
> >> ASF (
> >> > > > >> http://www.apache.org/foundation/marks/pmcs.html#graphics),
> >> > although I
> >> > > > >> don't know how strict this is.
> >> > > > >>
> >> > > > >> Deron
> >> > > > >>
> >> > > > >>
> >> > > > >> On Fri, Oct 21, 2016 at 12:15 PM, Matthias Boehm <
> >> > > > mboe...@googlemail.com>
> >> > > > >> wrote:
> >> > > > >>
> >> > > > >> Thanks for these proposals. For all the options, I'd prefer to
> >> > remove
> >> > > > the
> >> > > > >>> TM - it's just a little odd for an open source project with no
> >> > > > intentions
> >> > > > >>> to register a trademark. I know, the new Spark logo has it too
> >> but
> >> > it's
> >> > > > >>> probably a different context, especially since there are
> >> > discussions to
> >> > > > >>> add
> >> > > > >>> SPARC support in Spark 2.1 ;-)
> >> > > > >>>
> >> > > > >>> Regards,
> >> > > > >>> Matthias
> >> > > > >>>
> >> > > > >>>
> >> > > > >>> On 10/21/2016 8:47 PM, Dexter Lesaca wrote:
> >> > > > >>>
> >> > > > >>> +1 for 1
> >> > > > 
> >> > > >  On Fri, Oct 21, 2016 at 11:44 AM Jeremy Anderson <
> >> > > >  jer...@objectadjective.com>
> >> > > >  wrote:
> >> > > > 
> >> > > >  +1 on option 1 as well.
> >> > > > 
> >> > > > >
> >> > > > > For the 4 options, I think it's important that full logo
> with
> >> > name
> >> > > > and
> >> > > > > mark, scales well. I'm concerned detail will get lost with
> the
> >> > other
> >> > > > 3,
> >> > > > > at
> >> > > > > small sizes. I would love to use all of the 

Local versions of Linear Algebra Operators in DML

2016-10-21 Thread Nakul Jindal
Hi,

Imran was planning on implementing a distributed SVD as a DML bodied
function.
The algorithm is described in the paper titled "A Distributed and
Incremental SVD Algorithm for Agglomerative Data Analysis on Large
Networks" available at https://arxiv.org/abs/1601.07010.

This algorithm requires the availability of a local SVD function, which we
currently do not have in SystemML.
Seeing as how there are other linear algebra functions (eigen, lu, qr,
cholesky) in DML that reroute to Apache Commons Math and only operate in
standalone/CP mode, would it be ok to add "svd" to this set?
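
To show why a local SVD is the building block here, below is a minimal sketch of the merge idea that, as I understand it, the cited paper builds on: split the matrix into column blocks, factor each block locally, then factor the concatenation of the per-block U*S factors. The singular values agree with a direct factorization because A*A^T equals the sum of A_i*A_i^T over the blocks. Everything in the sketch is an assumption made for illustration: the function names are hypothetical, and a plain-Python one-sided Jacobi routine stands in for the local SVD (it is not what SystemML or Commons Math uses).

```python
import math

def jacobi_us(columns, sweeps=60, eps=1e-13):
    """One-sided Jacobi: rotate column pairs until all columns are orthogonal.

    The returned columns are those of U*S (in no particular order), so each
    column norm is a singular value of the input matrix.
    """
    cols = [list(c) for c in columns]
    n = len(cols)
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= eps * math.sqrt(alpha * beta):
                    continue  # this pair is already orthogonal enough
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                new_p = [c * x - s * y for x, y in zip(cols[p], cols[q])]
                new_q = [s * x + c * y for x, y in zip(cols[p], cols[q])]
                cols[p], cols[q] = new_p, new_q
        if not rotated:
            break
    return cols

def singular_values(columns):
    """Singular values (descending) of a matrix given as a list of columns."""
    norms = (math.sqrt(sum(x * x for x in c)) for c in jacobi_us(columns))
    return sorted(norms, reverse=True)

# Small 4x4 example matrix, stored column by column.
A_cols = [[4.0, 1.0, 2.0, 0.0],
          [1.0, 3.0, 0.0, 1.0],
          [2.0, 0.0, 5.0, 1.0],
          [0.0, 1.0, 1.0, 2.0]]

# Direct route: factor the whole matrix at once.
direct = singular_values(A_cols)

# "Distributed" route: factor two column blocks locally, then factor the
# concatenation of their U*S factors.
block1 = jacobi_us(A_cols[:2])
block2 = jacobi_us(A_cols[2:])
merged = singular_values(block1 + block2)

# Largest disagreement between the two routes (floating-point noise only).
print(max(abs(a - b) for a, b in zip(direct, merged)))
```

In a real setting each block would live on a different worker and the per-block factors would typically be truncated to a low rank before merging; the sketch skips both.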

Also, since these operations are local and not distributed and the
documentation doesn't make it clear that these operations won't operate in
distributed mode, would it make sense to rename them to "local_eigen",
"local_qr", "local_cholesky", etc?
Obviously, this change would go into the version after 0.11.

I understand that the ideal solution to this problem is to have a
distributed version of the aforementioned linear algebra routines, but for
the time being, would it be ok to go ahead and do the rename, while also
introducing a "local_svd"?


Niketan, Berthold, Matthias, Sasha - Any thoughts?

Thanks,
Nakul Jindal


Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)

2016-10-20 Thread Nakul Jindal
Basic sanity tests passed on macOS following the process here:
http://apache.github.io/incubator-systemml/release-process.html#all-binaries-execute

(The in-memory jar was removed by [SYSTEMML-741])

+1

Nakul Jindal


On Thu, Oct 20, 2016 at 12:18 PM, <dusenberr...@gmail.com> wrote:

> Okay I've been testing the release candidate on a large-scale problem, and
> I'm currently running into a "java.lang.NegativeArraySizeException" in
> the SparseBlockMCSR that I do not believe was present previously. I'm
> currently investigating, and will post again soon.
>
> On another note, I successfully ran all of the Python tests on both Python
> 2.7 and 3.5.
>
> -Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Oct 19, 2016, at 2:46 PM, Glenn Weidner <gweid...@us.ibm.com> wrote:
> >
> > Yes - that is correct for test cases involving ID column for
> DataFrameVectorFrameConversionTest, DataFrameVectorScriptTest,
> MLContextTest. The four failures for MLContextFrameTest were slightly
> different and involve similar fix as done for FrameConverterTest under
> [SYSTEMML-568] where FrameRDDConverterUtils.csvToRowRDDused to
> incorporate schema information when converting to JavaRDD.
> >
> > Thanks,
> > Glenn
> >
> > From: Matthias Boehm <mboe...@googlemail.com>
> > To: dev@systemml.incubator.apache.org
> > Date: 10/19/2016 12:36 PM
> > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)
> >
> >
> >
> >
> > Glenn, all these issues were only caused by wrong tests that used an
> > invalid ID schema or populated this column incorrectly, right? If so,
> > then I think it's fine to release. However, if we touch it anyway, we
> > should globally change the ID schema from double to long, which is more
> > intuitive when created by hand.
> >
> > Regards,
> > Matthias
> >
> > On 10/19/2016 8:30 PM, Deron Eriksson wrote:
> > > OK, so I think it's my understanding that for the 'src' release for
> rc3,
> > > the pom is using Spark 1.4 and the test suite passes for Spark 1.4, so
> this
> > > issue being discussed regarding test cases on Spark 1.6 is not a
> blocker
> > > for this release since the 'src' release builds and all tests pass.
> > >
> > > If this is not correct, could someone please correct me?
> > >
> > > Deron
> > >
> > >
> > > On Wed, Oct 19, 2016 at 11:17 AM, Luciano Resende <
> luckbr1...@gmail.com>
> > > wrote:
> > >
> > >> if tests are consistently failing, then we should cancel the RC and
> either
> > >> fix the test or mark it as @ignored.
> > >>
> > >> Intermittent fails might be ok, but it's a community decision.
> > >>
> > >> On Wed, Oct 19, 2016 at 10:50 AM, Deron Eriksson <
> deroneriks...@gmail.com>
> > >> wrote:
> > >>
> > >>> I believe that for an Apache release, our test suite is supposed to
> pass
> > >>> (although I'm pretty sure random test fails can be ignored).
> > >>>
> > >>> See 2.1 of Release Check List here:
> > >>> http://incubator.apache.org/guides/releasemanagement.html#check-list
> > >>>
> > >>> "2.1 Build is successful including automated tests.
> > >>> The expanded source archive is expected to build and pass tests."
> > >>>
> > >>> Luciano, do you happen to know if some test failures are acceptable
> since
> > >>> our test suite is so enormous (6300+ tests)?
> > >>>
> > >>> Deron
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Oct 19, 2016 at 3:24 AM, Glenn Weidner <gweid...@us.ibm.com>
> > >>> wrote:
> > >>>
> > >>>> It's a nice-to-have but not a release blocker.
> > >>>>
> > >>>> Thanks,
> > >>>> Glenn
> > >>>>
> > >>>> [image: Inactive hide details for Niketan Pansare---10/18/2016
> 05:38:26
> > >>>> PM---Glenn: Would you prefer to have https://github.com/apache/]
> > >> Niketan
> > >>>> Pansare---10/18/2016 05:38:26 PM---Glenn: Would you prefer to have
> > >>>> https://g

Re: Building a community around SystemML

2016-09-28 Thread Nakul Jindal
Hi Dhiren,

Welcome to SystemML!

I would encourage you to go through some of these lectures, which explain the
internals of SystemML:
https://www.youtube.com/watch?v=64bnyFR5em0=PL9U7gw7DOIGhdiKZkMAqNPIDywFMlzCaY

Also look through the JIRA site:
https://issues.apache.org/jira/browse/SYSTEMML/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
and pick something that you find interesting.

I have worked a bit on the language layer and the GPU backend and can help you
in those areas should you choose them.
You can also take a look at the commit history to get a sense of who is the
most active in a particular component and ping them directly.

Hope this helps get you started.

-Nakul




> On Sep 28, 2016, at 1:50 PM, Dhiren Navani  wrote:
> 
> Hi everybody,
> 
> I am Dhiren Navani. I am a graduate student in Computer Science at Arizona
> State University. I am trying to develop Java and Apache Spark skills. I am
> interested in learning more about SystemML and would like to contribute. It
> would be great if there is someone who can mentor me or guide me to
> resources, since I am comparatively new to the open source world and
> industry in general.
> Also, I can assist with some promotional activities, as I am part of a lot of
> meetup groups here in Phoenix, Arizona.
> 
> On 28 September 2016 at 12:32, Luciano Resende  wrote:
> 
>> One of the remaining things that SystemML needs to do in order to graduate
>> is to build a better community around the project.
>> 
>> Some ideas are:
>> 
>> - Be more open with mailing lists discussions particularly with high level
>> designs that sometimes just get buried in PRs.
>> - Identify and participate on projects where more experienced community
>> members would mentor students or others interested in
>> participating/contributing to the project (e.g. GSoC)
>> - Identify top two main personas that would be interested in the project,
>> and bring up visibility on documentation based on these personas to make
>> their first experience with the project very smooth and without much
>> problems.
>> - Create simple JIRAs and flag them for initial contributors (e.g.
>> documentation, simple fix, etc)
>> 
>> Any other ideas ? And how do we execute this with some priority to get us
>> to graduate ?
>> 
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>> 
> 
> 
> 
> -- 
> Thanks and Regards,
> Dhiren Amar Navani
> (+1) 480-434-0661
> 
> [image: https://www.linkedin.com/pub/dhiren-navani/41/b36/372]
> 



Re: Proof of Concept: Embedded Scala DSL

2016-09-28 Thread Nakul Jindal
As I understand it, the way it is now is the following:

{ PyDML, DML } --> ANTLR AST (org.apache.sysml.parser.dml,
org.apache.sysml.parser.pydml) --> Legacy AST (DMLProgram, Expression,
ForStatement…) --> HOPS --> LOPS --> Runtime

Niketan’s embedded Python DSL --> PyDML
Felix’s embedded Scala DSL --> DML

@Niketan, when you say “IR should be at abstraction to allow Python/R DSL to be 
a thin layer”, do you mean something different than is already implemented?
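
To make the "thin layer" idea concrete, here is a minimal sketch of a shallow embedding whose operators only record DML text, so the host-language DSL stays a thin front end and the existing DML pipeline does the rest. The class names (DmlBuilder, Matrix) are hypothetical and are not the actual Python or Scala DSL classes; the only DML syntax assumed is `t()` for transpose and `%*%` for matrix multiplication.

```python
import itertools

class DmlBuilder:
    """Collects the DML statements emitted by Matrix handles (hypothetical helper)."""

    def __init__(self):
        self.statements = []
        self._ids = itertools.count(1)

    def emit(self, expr):
        # Bind the expression to a fresh temporary and return a handle to it.
        name = "m%d" % next(self._ids)
        self.statements.append("%s = %s" % (name, expr))
        return Matrix(self, name)

    def script(self):
        return "\n".join(self.statements)


class Matrix:
    """Lazy handle: operators record DML text instead of computing anything."""

    def __init__(self, builder, name):
        self.builder = builder
        self.name = name

    def __add__(self, other):
        return self.builder.emit("%s + %s" % (self.name, other.name))

    def __matmul__(self, other):
        # DML, like R, spells matrix multiplication as %*%.
        return self.builder.emit("%s %%*%% %s" % (self.name, other.name))

    def t(self):
        return self.builder.emit("t(%s)" % self.name)


b = DmlBuilder()
X = Matrix(b, "X")  # assumed to be bound to real inputs by the caller
y = Matrix(b, "y")
gram = X.t() @ X    # records two statements, computes nothing
rhs = X.t() @ y
print(b.script())
```

The generated string would then be handed to SystemML like any other DML script, which is the sense in which both embedded DSLs already sit on top of the pipeline above.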




> On Sep 28, 2016, at 12:37 PM, Niketan Pansare  wrote:
> 
> Hi Fred,
> 
> I would consider DMLProgram as an internal AST, which could be created by IR 
> (or IR could just create DML). According to me, IR should be at abstraction 
> to allow Python/R DSL to be a thin layer. This would maximize code reuse and 
> minimize bugs between DSLs. Something that Felix suggested (i.e. Matrix 
> class) would work best.
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar 
> 
> 
> From: Frederick R Reiss/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 09/28/2016 12:02 PM
> Subject: Re: Proof of Concept: Embedded Scala DSL
> 
> 
> 
> 
> Maybe I'm missing a subtle point here, but why not refactor the existing 
> class org.apache.sysml.parser.DMLProgram into our common internal 
> representation across DSLs? This class is already sufficiently expressive to 
> represent any DML or PyDML program.
> 
> Fred
> 
> From: Niketan Pansare/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 09/28/2016 11:20 AM
> Subject: Re: Proof of Concept: Embedded Scala DSL
> 
> 
> 
> Thanks Felix for the response.
> 
> +1 
> >> For the future design I will probably make the Matrix and Vector classes 
> abstract which allows for different concrete implementations. We could 
> then have one that is backed directly by SystemML and works similar to 
> the Python DSL in that it just uses mock operators and builds the DML 
> string that is then executed using SystemML. That way the deep embedding 
> would reuse the shallow embedding and we could offer the user to either 
> use the lazy MatrixType on the Repl or write code inside the macro.
> 
> Also, I agree that we can postpone the IR and integration of different DSLs 
> until the work on parallelize is completed.
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar 
> 
> 
> From: fschue...@posteo.de
> To: dev@systemml.incubator.apache.org
> Date: 09/28/2016 10:54 AM
> Subject: Re: Proof of Concept: Embedded Scala DSL
> 
> 
> 
> Hi Niketan,
> 
> thanks for your suggestions! I thought about it a bit and here are my 
> ideas on it:
> 
> The IR you are describing is basically already my user facing API. I am 
> not sure how much sense it makes to have an IR that looks exactly like 
> the API but with control structures renamed. A common IR for all DSLs 
> definitely makes sense in general but I am not sure if it should be part 
> of one particular DSL. For maintainability it might be better to have 
> that IR somewhere on the SystemML side.
> 
> Apart from that and to what Matthias suggested, I thought about how to 
> make the DSL more suitable for using on the Repl and I think we can find 
> a good compromise. Currently my API is backed by breeze for rapid 
> prototyping where breeze just forces evaluation of every statement. For 
> the future design I will probably make the Matrix and Vector classes 
> abstract which allows for different concrete implementations. We could 
> then have one that is backed directly by SystemML and works similar to 
> the Python DSL in that it just uses mock operators and builds the DML 
> string that is then executed using SystemML. That way the deep embedding 
> would reuse the shallow embedding and we could offer the user to either 
> use the lazy MatrixType on the Repl or write code inside the macro.
> 
> I haven't started playing around with this idea but let me know what you 
> think of it. The lazy, shallow DSL would basically do what you would 
> want from a separate IR, but I don't know if you want to call that from 
> the Python DSL.
> 
> Felix
> 
> Am 24.09.2016 19:39 schrieb Niketan Pansare:
> > Hi Felix,
> > 
> > Thanks for the summary. The document is extremely useful. I
> > particularly like the 

Re: DML in Zeppelin

2016-04-12 Thread Nakul Jindal
Hi All,

Niketan, this feedback is much appreciated and I will continue to work on
this. In the meantime, some of the other (offline) feedback I got for this
included making DML variables accessible across DML cells. Towards that
end, I've made some improvements to the Zeppelin-DML integration. There is
also a convenient (albeit large, ~2GB) Docker image to test this out with.

All the information is on the JIRA :
https://issues.apache.org/jira/browse/SYSTEMML-542
It has screenshots, docker instructions and steps to recreate the dev
environment to play with.

These are the features (thus far):

Launch a standalone DML cell which runs the DML interpreter locally (using
%dml)
- This has rudimentary features and will be developed if there is demand

Launch a DML cell which runs on Spark (using %spark.dml)
- Transfer data between Spark, PySpark, etc and DML Cells (as Dataframes)
  -- Read data in a Spark cell (as a DataFrame) and use it in a DML cell
  -- Write a DML matrix in a DML cell and read it as a DataFrame in a
Spark Cell
  -- This is done using ZeppelinContext (
https://zeppelin.incubator.apache.org/docs/latest/interpreter/spark.html)
- Transfer data between DML cells - scalar types (booleans, strings,
floats, integers) and matrices
  -- Any variable defined in a cell can be used (read from/written to)
in subsequent cells.
  -- This is very similar to how spark cells operate.


Any feedback is greatly appreciated.

Thanks,
Nakul Jindal



On Tue, Mar 8, 2016 at 10:30 AM, Niketan Pansare <npan...@us.ibm.com> wrote:

> Hi Nakul,
>
> This is good work !
>
> My 2 cents, we should add missing features (such as command-line
> arguments), document the API for this POC, come up with examples for
> existing algorithms with open-source datasets and put them in
> https://github.com/apache/incubator-systemml/tree/master/samples/zeppelin-notebooks
>
> This way, people are encouraged to try out (and may be even modify
> on-the-fly the) existing DML algorithms with specific datasets. Borrowing
> an example from
> http://scikit-learn.org/stable/tutorial/basic/tutorial.html:
> >>> from sklearn import datasets
> >>> iris = datasets.load_iris()
> >>> digits = datasets.load_digits()
> >>> from sklearn import svm
> >>> clf = svm.SVC(gamma=0.001, C=100.)
> >>> clf.fit(digits.data[:-1], digits.target[:-1])
> >>> clf.predict(digits.data[-1:])
>
> We can then put a link to the given example in
> http://apache.github.io/incubator-systemml/algorithms-classification.html#support-vector-machines
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Nakul Jindal <naku...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 03/06/2016 07:22 PM
> Subject: DML in Zeppelin
> --
>
>
>
> Hi,
>
> I've put together a proof of concept for having DML be a first class
> citizen in Apache Zeppelin.
>
> Brief intro to Zeppelin -
> Zeppelin is a "notebook" interface to interact with Spark, Cassandra, Hive
> and other projects. It can be thought of as a REPL in a browser.
> Small units of code are put into "cell"s. These individual "cells" can then
> be run interactively. Of course there is support for queue-ing up and
> running cells in parallel.
> Cells are contained in notebooks. Notebooks can be exported and are
> persistent between sessions.
>
> One can write (Scala) Spark code in cell 1 and save a data frame object,
> then write PySpark code in cell 2 and access the previously saved
> data frame.
> This is done by the Zeppelin runtime system by injecting a special variable
> called "z" into the Spark and PySpark environments in Zeppelin. This "z" is
> an object of type ZeppelinContext and makes available a "get" and a "put"
> method.
> DML in Spark mode can now access this feature as well.
>
> In this POC, DML can operate in 2 modes - standalone and spark.
>
> Screenshots of it working:
> http://imgur.com/a/m7ASx
>
> GIF of the screenshots:
> http://i.imgur.com/NttMuKC.gifv
>
> Instructions:
> https://gist.github.com/anonymous/6ab8c569b2360232e252
>
> JIRA:
> https://issues.apache.org/jira/browse/SYSTEMML-542
>
>
> Nakul Jindal
>
>
>
>


DML in Zeppelin

2016-03-06 Thread Nakul Jindal
Hi,

I've put together a proof of concept for having DML be a first-class
citizen in Apache Zeppelin.

Brief intro to Zeppelin -
Zeppelin is a "notebook" interface to interact with Spark, Cassandra, Hive
and other projects. It can be thought of as a REPL in a browser.
Small units of code are put into "cells". These individual cells can then
be run interactively. There is also support for queuing up cells and
running them in parallel.
Cells are contained in notebooks. Notebooks can be exported and are
persistent between sessions.

One can write (Scala) Spark code in cell 1 and save a data frame object,
then write PySpark code in cell 2 and access the previously saved
data frame.
This is done by the Zeppelin runtime system by injecting a special variable
called "z" into the Spark and PySpark environments in Zeppelin. This "z" is
an object of type ZeppelinContext and makes available a "get" and a "put"
method.
DML in Spark mode can now access this feature as well.
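
To make the put/get idea concrete, here is a minimal sketch in plain Python.
It does not use Zeppelin at all: SharedContext is a hypothetical stand-in for
the "z" object, and the key name "people" and the sample rows are made up for
illustration. The two "cells" are just consecutive blocks of code sharing one
object.

```python
# Hypothetical stand-in for Zeppelin's shared "z" object.
# This is NOT the real ZeppelinContext, only an illustration of put/get.
class SharedContext:
    """Minimal key-value store exposing put and get."""

    def __init__(self):
        self._store = {}

    def put(self, name, value):
        # Save an object under a name so a later cell can look it up.
        self._store[name] = value

    def get(self, name):
        # Return the object saved under this name.
        # Raises KeyError if nothing was put under that name.
        return self._store[name]


z = SharedContext()

# "Cell 1" (stands in for the Scala Spark cell): save a data-frame-like object.
z.put("people", [{"name": "a", "age": 30}, {"name": "b", "age": 41}])

# "Cell 2" (stands in for the PySpark or DML cell): read it back by name.
people = z.get("people")
ages = [row["age"] for row in people]
```

In Zeppelin itself the shared object is injected by the runtime, so cells in
different interpreters see the same "z" without creating it themselves.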

In this POC, DML can operate in 2 modes - standalone and spark.

Screenshots of it working:
http://imgur.com/a/m7ASx

GIF of the screenshots:
http://i.imgur.com/NttMuKC.gifv

Instructions:
https://gist.github.com/anonymous/6ab8c569b2360232e252

JIRA:
https://issues.apache.org/jira/browse/SYSTEMML-542


Nakul Jindal