[jira] [Commented] (SYSTEMML-1437) Implement and scale Factorization Machines using SystemML

2017-05-17 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014942#comment-16014942
 ] 

Nakul Jindal commented on SYSTEMML-1437:


Hi [~return_01], it turns out that this project was not accepted for GSoC 2017.
We encourage you to work on it anyway. The community can help with any 
questions you may have, and [~iyounus] can mentor you.

> Implement and scale Factorization Machines using SystemML
> -
>
> Key: SYSTEMML-1437
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1437
> Project: SystemML
>  Issue Type: Task
>Reporter: Imran Younus
>  Labels: factorization_machines, gsoc2017, machine_learning, 
> mentor, recommender_system
>
> Factorization Machines (FMs) have gained popularity in recent years due to 
> their effectiveness in recommendation systems. FMs are general predictors that 
> can capture interactions between all features in a feature matrix. The feature 
> matrices pertinent to recommendation systems are highly sparse. SystemML's 
> highly efficient distributed sparse matrix operations can be leveraged to 
> implement FMs in a scalable fashion. Given the closed-form model equation of 
> FMs, the model parameters can be learned using gradient descent methods.
> This project aims to implement FMs as described in the original paper:
> http://www.algo.uni-konstanz.de/members/rendle/pdf/Rendle2010FM.pdf
> We'll showcase the scalability of the SystemML implementation of FMs by 
> creating an end-to-end recommendation system.
> A basic understanding of machine learning and optimization techniques is 
> required, and the contributor will need to collaborate with the team to 
> resolve scaling and other systems-related issues.
> Rating: Medium
> Mentors:  [~iyounus], [~nakul02]
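For reference, the "closed-form model equation" mentioned in the description above is the second-order (degree d = 2) FM model from the Rendle 2010 paper linked there, written here in LaTeX notation:
{code}
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
           + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j
{code}
Here w_0 is the global bias, the w_i are per-feature weights, and each pairwise interaction weight is factorized as the dot product of latent vectors v_i and v_j, which is what keeps the model trainable on the highly sparse feature matrices described above.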



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1609) Update dmlFromResource to not require initial slash

2017-05-17 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-1609:


 Summary: Update dmlFromResource to not require initial slash
 Key: SYSTEMML-1609
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1609
 Project: SystemML
  Issue Type: Improvement
  Components: APIs
Reporter: Deron Eriksson
Assignee: Deron Eriksson
Priority: Minor


Currently, the resourcePath for ScriptFactory's dmlFromResource (and 
pydmlFromResource) requires an initial slash, such as:
{code}
ScriptFactory.dmlFromResource("/scripts/datagen/genRandData4ALS.dml");
{code}

Update dmlFromResource to not require an initial slash so that it mirrors 
dmlFromFile's behavior:
{code}
ScriptFactory.dmlFromFile("scripts/datagen/genRandData4ALS.dml");
ScriptFactory.dmlFromResource("scripts/datagen/genRandData4ALS.dml");

// initial slash also still works for dmlFromResource
ScriptFactory.dmlFromResource("/scripts/datagen/genRandData4ALS.dml");
{code}
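One possible way to implement this, sketched here only as an illustration (the actual fix may differ), is to normalize the resource path before the classpath lookup, since Class#getResourceAsStream treats a path without a leading slash as relative to the class's package:
{code}
// Hypothetical helper inside ScriptFactory; shown only as a sketch.
// Prepend a slash so the resource is always resolved as an absolute
// classpath path, whether or not the caller supplied the leading slash.
private static String normalizeResourcePath(String resourcePath) {
	if (resourcePath == null || resourcePath.isEmpty()) {
		return resourcePath;
	}
	return resourcePath.startsWith("/") ? resourcePath : "/" + resourcePath;
}
{code}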




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-137) Standard format needs to be applied to all Java files

2017-05-17 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson reassigned SYSTEMML-137:
---

Assignee: Deron Eriksson

> Standard format needs to be applied to all Java files
> -
>
> Key: SYSTEMML-137
> URL: https://issues.apache.org/jira/browse/SYSTEMML-137
> Project: SystemML
>  Issue Type: Task
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>Priority: Critical
>
> A standard code format template (such as those used in Eclipse and IntelliJ), 
> setting things such as code line widths, comment line widths, and brace style 
> (typical C style vs. typical Java style), needs to be applied to all Java 
> files. It's very important that all Java files follow the same standard code 
> formatting. This is especially critical for tasks such as file version 
> comparisons. If people apply different code formats, it becomes essentially 
> impossible to compare file versions across the code format change.
> Therefore, we need to pick a standard code format template and apply this 
> format to all Java files.
> Everyone working on SystemML should then utilize the same standard code 
> format. Pull requests that don't follow the standard code format should be 
> rejected since it makes file version comparisons difficult or impossible.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-991) Website LICENSE should accurately reflect css and js

2017-05-17 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-991.
-
   Resolution: Fixed
Fix Version/s: Not Applicable

Fixed by [PR49|https://github.com/apache/incubator-systemml-website/pull/49].

> Website LICENSE should accurately reflect css and js
> 
>
> Key: SYSTEMML-991
> URL: https://issues.apache.org/jira/browse/SYSTEMML-991
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Website
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: Not Applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-991) Website LICENSE should accurately reflect css and js

2017-05-17 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-991.
---

> Website LICENSE should accurately reflect css and js
> 
>
> Key: SYSTEMML-991
> URL: https://issues.apache.org/jira/browse/SYSTEMML-991
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Website
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: Not Applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (SYSTEMML-1437) Implement and scale Factorization Machines using SystemML

2017-05-17 Thread Janardhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janardhan updated SYSTEMML-1437:

Comment: was deleted

(was: Sir, my proposal title is "SYSTEMML-1437".)

> Implement and scale Factorization Machines using SystemML
> -
>
> Key: SYSTEMML-1437
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1437
> Project: SystemML
>  Issue Type: Task
>Reporter: Imran Younus
>  Labels: factorization_machines, gsoc2017, machine_learning, 
> mentor, recommender_system
>
> Factorization Machines (FMs) have gained popularity in recent years due to 
> their effectiveness in recommendation systems. FMs are general predictors that 
> can capture interactions between all features in a feature matrix. The feature 
> matrices pertinent to recommendation systems are highly sparse. SystemML's 
> highly efficient distributed sparse matrix operations can be leveraged to 
> implement FMs in a scalable fashion. Given the closed-form model equation of 
> FMs, the model parameters can be learned using gradient descent methods.
> This project aims to implement FMs as described in the original paper:
> http://www.algo.uni-konstanz.de/members/rendle/pdf/Rendle2010FM.pdf
> We'll showcase the scalability of the SystemML implementation of FMs by 
> creating an end-to-end recommendation system.
> A basic understanding of machine learning and optimization techniques is 
> required, and the contributor will need to collaborate with the team to 
> resolve scaling and other systems-related issues.
> Rating: Medium
> Mentors:  [~iyounus], [~nakul02]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1608) Add ALS notebook example

2017-05-17 Thread Imran Younus (JIRA)
Imran Younus created SYSTEMML-1608:
--

 Summary: Add ALS notebook example
 Key: SYSTEMML-1608
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1608
 Project: SystemML
  Issue Type: Bug
Reporter: Imran Younus






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-991) Website LICENSE should accurately reflect css and js

2017-05-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson reassigned SYSTEMML-991:
---

Assignee: Deron Eriksson

> Website LICENSE should accurately reflect css and js
> 
>
> Key: SYSTEMML-991
> URL: https://issues.apache.org/jira/browse/SYSTEMML-991
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Website
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1108) Convert roadmap to md file using page-md layout

2017-05-16 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012861#comment-16012861
 ] 

Deron Eriksson commented on SYSTEMML-1108:
--

Could we resolve and close this as 'won't fix'? All the CSS styles are 
externalized, so I believe working with the current roadmap file is actually 
quite straightforward.

cc [~gweidner] [~luciano resende] [~jeremyanderson]



> Convert roadmap to md file using page-md layout
> ---
>
> Key: SYSTEMML-1108
>     URL: https://issues.apache.org/jira/browse/SYSTEMML-1108
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Website
>Reporter: Luciano Resende
>
> There is no need to have fancy HTML for a bullet list of roadmap items. Also, 
> having it in HTML makes it more difficult for community members to update it, 
> since it requires HTML and CSS skills.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1068) Add Code Highlighting

2017-05-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1068.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Fixed by [PR43|https://github.com/apache/incubator-systemml-website/pull/43].

> Add Code Highlighting
> -
>
> Key: SYSTEMML-1068
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1068
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Website
>Reporter: Mike Dusenberry
>Assignee: Dexter Lesaca
> Fix For: SystemML 1.0
>
>
> For our tutorials, it would be nice to have code syntax highlighting to make 
> it easier to understand the code snippets.  Jekyll supports this feature 
> \[1], as do a number of other libraries.  At a minimum, we should have R, 
> Python, and Scala syntax highlighting.
> \[1]: https://jekyllrb.com/docs/posts/#highlighting-code-snippets



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-1068) Add Code Highlighting

2017-05-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson reassigned SYSTEMML-1068:


Assignee: Dexter Lesaca

> Add Code Highlighting
> -
>
> Key: SYSTEMML-1068
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1068
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Website
>Reporter: Mike Dusenberry
>Assignee: Dexter Lesaca
> Fix For: SystemML 1.0
>
>
> For our tutorials, it would be nice to have code syntax highlighting to make 
> it easier to understand the code snippets.  Jekyll supports this feature 
> \[1], as do a number of other libraries.  At a minimum, we should have R, 
> Python, and Scala syntax highlighting.
> \[1]: https://jekyllrb.com/docs/posts/#highlighting-code-snippets



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1068) Add Code Highlighting

2017-05-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1068.


> Add Code Highlighting
> -
>
> Key: SYSTEMML-1068
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1068
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Website
>Reporter: Mike Dusenberry
>Assignee: Dexter Lesaca
> Fix For: SystemML 1.0
>
>
> For our tutorials, it would be nice to have code syntax highlighting to make 
> it easier to understand the code snippets.  Jekyll supports this feature 
> \[1], as do a number of other libraries.  At a minimum, we should have R, 
> Python, and Scala syntax highlighting.
> \[1]: https://jekyllrb.com/docs/posts/#highlighting-code-snippets



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1083) Fix spelling of "Demonstration" in VLDB award

2017-05-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1083.


> Fix spelling of "Demonstration" in VLDB award
> -
>
> Key: SYSTEMML-1083
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1083
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Website
>Reporter: Jeremy Anderson
>Assignee: Jason Azares
>Priority: Minor
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1083) Fix spelling of "Demonstration" in VLDB award

2017-05-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1083.
--
   Resolution: Fixed
 Assignee: Jason Azares
Fix Version/s: SystemML 1.0

> Fix spelling of "Demonstration" in VLDB award
> -
>
> Key: SYSTEMML-1083
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1083
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Website
>Reporter: Jeremy Anderson
>Assignee: Jason Azares
>Priority: Minor
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1600) Display version in MLContext welcome message

2017-05-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1600.


> Display version in MLContext welcome message
> 
>
> Key: SYSTEMML-1600
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1600
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Deron Eriksson
>Assignee: Krishna Kalyan
>Priority: Minor
> Fix For: SystemML 1.0
>
>
> Append SystemML version number to MLContext welcome message. It is available 
> via the MLContext version() method.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1600) Display version in MLContext welcome message

2017-05-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1600.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Fixed by [PR502|https://github.com/apache/incubator-systemml/pull/502].

> Display version in MLContext welcome message
> 
>
> Key: SYSTEMML-1600
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1600
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Deron Eriksson
>Assignee: Krishna Kalyan
>Priority: Minor
> Fix For: SystemML 1.0
>
>
> Append SystemML version number to MLContext welcome message. It is available 
> via the MLContext version() method.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-1600) Display version in MLContext welcome message

2017-05-15 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson reassigned SYSTEMML-1600:


Assignee: Krishna Kalyan

> Display version in MLContext welcome message
> 
>
> Key: SYSTEMML-1600
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1600
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Deron Eriksson
>Assignee: Krishna Kalyan
>Priority: Minor
>
> Append SystemML version number to MLContext welcome message. It is available 
> via the MLContext version() method.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1600) Display version in MLContext welcome message

2017-05-15 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011229#comment-16011229
 ] 

Deron Eriksson commented on SYSTEMML-1600:
--

Hi [~KrishnaKalyan3]

This is a great way to start becoming familiar with some of SystemML's codebase.

To get the version number, you might want to try something such as the 
following:
{code}
try {
  ProjectInfo info = ProjectInfo.getProjectInfo();
  if (info.version() != null) {
    // display info.version()
  }
} catch (MLContextException e) {
}
{code}

If the SystemML jar file is available, this will obtain the version number from 
the jar manifest. If the jar file isn't available (such as when 
developing/debugging in an IDE such as Eclipse), an MLContextException will be 
thrown, in which case we can just ignore the exception.
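As a follow-up, here is a minimal sketch of wiring this into the welcome message (variable names are illustrative; the message itself is built in MLContextUtil's welcomeMessage method referenced in the stack trace above):
{code}
// Sketch only: resolve the version defensively, then append it to the message.
// 'sb' is assumed to be the StringBuilder used to build the welcome message.
String version = "";
try {
	ProjectInfo info = ProjectInfo.getProjectInfo();
	if (info.version() != null) {
		version = " " + info.version();
	}
} catch (MLContextException e) {
	// no jar manifest available (e.g., running in an IDE); omit the version
}
sb.append("\nWelcome to Apache SystemML").append(version).append("!\n");
{code}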



> Display version in MLContext welcome message
> 
>
> Key: SYSTEMML-1600
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1600
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Deron Eriksson
>Priority: Minor
>
> Append SystemML version number to MLContext welcome message. It is available 
> via the MLContext version() method.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1605) Migrate zeppelin notebooks

2017-05-13 Thread Glenn Weidner (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009379#comment-16009379
 ] 

Glenn Weidner commented on SYSTEMML-1605:
-

Additional updates:
[PR 495|https://github.com/apache/incubator-systemml/pull/495]
[PR 498|https://github.com/apache/incubator-systemml/pull/498]

> Migrate zeppelin notebooks
> --
>
> Key: SYSTEMML-1605
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1605
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Glenn Weidner
>Assignee: Glenn Weidner
>
> Zeppelin notebook samples in 
> https://github.com/gweidner/incubator-systemml/tree/master/samples/zeppelin-notebooks
>  need to be replaced.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1600) Display version in MLContext welcome message

2017-05-13 Thread Krishna Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009321#comment-16009321
 ] 

Krishna Kalyan commented on SYSTEMML-1600:
--

[~deron],
I have tried something like this below. I think I am missing something. 
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/mlcontext/MLContextUtil.java#L976

MLContext ml = MLContext.getActiveMLContext();
StringBuilder sb;
sb = new StringBuilder();
sb.append("\nWelcome to Apache SystemML!\n");
sb.append(version.version());

Error I get after compiling this change to a jar

scala> val ml = new MLContext(spark)
java.lang.NullPointerException
  at 
org.apache.sysml.api.mlcontext.MLContextUtil.welcomeMessage(MLContextUtil.java:981)
  at org.apache.sysml.api.mlcontext.MLContext.initMLContext(MLContext.java:230)
  at org.apache.sysml.api.mlcontext.MLContext.<init>(MLContext.java:179)
  ... 52 elided

I wanted to get familiar with the code base, and this looked like an easy fix!

> Display version in MLContext welcome message
> 
>
> Key: SYSTEMML-1600
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1600
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Deron Eriksson
>Priority: Minor
>
> Append SystemML version number to MLContext welcome message. It is available 
> via the MLContext version() method.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1470) MLContext Statistics does not reset heavy hitter instructions between runs.

2017-05-12 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1470.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Fixed by [PR493|https://github.com/apache/incubator-systemml/pull/493].

> MLContext Statistics does not reset heavy hitter instructions between runs.
> ---
>
> Key: SYSTEMML-1470
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1470
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Currently, if multiple scripts are invoked sequentially using MLContext with 
> statistics turned on, the list of heavy hitter instructions is not reset in 
> between scripts.  Therefore, the heavy hitters will carry over to each 
> subsequent script that is executed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1470) MLContext Statistics does not reset heavy hitter instructions between runs.

2017-05-12 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1470.


> MLContext Statistics does not reset heavy hitter instructions between runs.
> ---
>
> Key: SYSTEMML-1470
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1470
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Currently, if multiple scripts are invoked sequentially using MLContext with 
> statistics turned on, the list of heavy hitter instructions is not reset in 
> between scripts.  Therefore, the heavy hitters will carry over to each 
> subsequent script that is executed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1483) Add Deconvolution layer in nn library and Caffe2DML

2017-05-12 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare resolved SYSTEMML-1483.
---
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Resolved by commit 
https://github.com/apache/incubator-systemml/commit/d04d2381f369bc29c4c33e98381bcdc8a4d0aebb

> Add Deconvolution layer in nn library and Caffe2DML
> ---
>
> Key: SYSTEMML-1483
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1483
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>Assignee: Prithviraj Sen
> Fix For: SystemML 1.0
>
>
> http://caffe.berkeleyvision.org/tutorial/layers/deconvolution.html
> [~mwdus...@us.ibm.com] [~prithvi_r_s] 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-1483) Add Deconvolution layer in nn library and Caffe2DML

2017-05-12 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare reassigned SYSTEMML-1483:
-

Assignee: Prithviraj Sen

> Add Deconvolution layer in nn library and Caffe2DML
> ---
>
> Key: SYSTEMML-1483
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1483
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>Assignee: Prithviraj Sen
>
> http://caffe.berkeleyvision.org/tutorial/layers/deconvolution.html
> [~mwdus...@us.ibm.com] [~prithvi_r_s] 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1607) Add Linear Regression Notebook example

2017-05-12 Thread Arvind Surve (JIRA)
Arvind Surve created SYSTEMML-1607:
--

 Summary: Add Linear Regression Notebook example
 Key: SYSTEMML-1607
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1607
 Project: SystemML
  Issue Type: New Feature
Reporter: Arvind Surve






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1606) Update notebook samples with latest code

2017-05-12 Thread Arvind Surve (JIRA)
Arvind Surve created SYSTEMML-1606:
--

 Summary: Update notebook samples with latest code
 Key: SYSTEMML-1606
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1606
 Project: SystemML
  Issue Type: Bug
Reporter: Arvind Surve






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1605) Migrate zeppelin notebooks

2017-05-11 Thread Glenn Weidner (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007466#comment-16007466
 ] 

Glenn Weidner commented on SYSTEMML-1605:
-

Submitted [PR 494|https://github.com/apache/incubator-systemml/pull/494].

> Migrate zeppelin notebooks
> --
>
> Key: SYSTEMML-1605
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1605
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Glenn Weidner
>Assignee: Glenn Weidner
>
> Zeppelin notebook samples in 
> https://github.com/gweidner/incubator-systemml/tree/master/samples/zeppelin-notebooks
>  need to be replaced.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1605) Migrate zeppelin notebooks

2017-05-11 Thread Glenn Weidner (JIRA)
Glenn Weidner created SYSTEMML-1605:
---

 Summary: Migrate zeppelin notebooks
 Key: SYSTEMML-1605
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1605
 Project: SystemML
  Issue Type: Sub-task
Reporter: Glenn Weidner
Assignee: Glenn Weidner


Zeppelin notebook samples in 
https://github.com/gweidner/incubator-systemml/tree/master/samples/zeppelin-notebooks
 need to be replaced.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1604) Update examples including notebooks

2017-05-11 Thread Glenn Weidner (JIRA)
Glenn Weidner created SYSTEMML-1604:
---

 Summary: Update examples including notebooks
 Key: SYSTEMML-1604
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1604
 Project: SystemML
  Issue Type: Umbrella
Reporter: Glenn Weidner


Various examples in the documentation, including sample notebooks, are out of 
date and need to be updated or removed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1603) Move release notes under _src directory

2017-05-11 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1603.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Fixed by [PR47|https://github.com/apache/incubator-systemml-website/pull/47].

> Move release notes under _src directory
> ---
>
> Key: SYSTEMML-1603
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1603
> Project: SystemML
>  Issue Type: Improvement
>  Components: Website
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Currently the project release notes are stored in the root of the website 
> project:
> {code}
> 0.9.0-incubating/release_notes.md
> 0.10.0-incubating/release_notes.md
> 0.11.0-incubating/release_notes.md
> 0.12.0-incubating/release_notes.md
> 0.13.0-incubating/release_notes.md
> {code}
> Since these are not included under the _src directory, they are not generated 
> for the website, so currently the roadmap on the website links to them on 
> GitHub, which is awkward (links such as 
> https://github.com/apache/incubator-systemml-website/blob/master/0.13.0-incubating/release_notes.md).
> They can be moved to a location such as 
> _src/release_notes/release_notes-0.9.0-incubating.md, etc...
> This would allow them to be generated for the project website, which can then 
> be linked to internally rather than linking to GitHub.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1603) Move release notes under _src directory

2017-05-11 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1603.


> Move release notes under _src directory
> ---
>
> Key: SYSTEMML-1603
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1603
> Project: SystemML
>  Issue Type: Improvement
>  Components: Website
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Currently the project release notes are stored in the root of the website 
> project:
> {code}
> 0.9.0-incubating/release_notes.md
> 0.10.0-incubating/release_notes.md
> 0.11.0-incubating/release_notes.md
> 0.12.0-incubating/release_notes.md
> 0.13.0-incubating/release_notes.md
> {code}
> Since these are not included under the _src directory, they are not generated 
> for the website, so currently the roadmap on the website links to them on 
> GitHub, which is awkward (links such as 
> https://github.com/apache/incubator-systemml-website/blob/master/0.13.0-incubating/release_notes.md).
> They can be moved to a location such as 
> _src/release_notes/release_notes-0.9.0-incubating.md, etc...
> This would allow them to be generated for the project website, which can then 
> be linked to internally rather than linking to GitHub.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1598) Output statistics via MLContext to stdout

2017-05-11 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1598.


> Output statistics via MLContext to stdout
> -
>
> Key: SYSTEMML-1598
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1598
> Project: SystemML
>  Issue Type: Task
>  Components: APIs
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Recently some refactoring was performed and statistics via MLContext are now 
> output to log4j rather than standard output. In an interactive environment 
> such as Spark Shell, after setting statistics to true, it is more intuitive 
> for users to return statistics to standard output. This was the previous 
> behavior for statistics and is the existing behavior for "explain".



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1598) Output statistics via MLContext to stdout

2017-05-11 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1598.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Fixed by [PR491|https://github.com/apache/incubator-systemml/pull/491].

> Output statistics via MLContext to stdout
> -
>
> Key: SYSTEMML-1598
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1598
> Project: SystemML
>  Issue Type: Task
>  Components: APIs
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Recently some refactoring was performed and statistics via MLContext are now 
> output to log4j rather than standard output. In an interactive environment 
> such as Spark Shell, after setting statistics to true, it is more intuitive 
> for users to return statistics to standard output. This was the previous 
> behavior for statistics and is the existing behavior for "explain".



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1603) Move release notes under _src directory

2017-05-11 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-1603:


 Summary: Move release notes under _src directory
 Key: SYSTEMML-1603
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1603
 Project: SystemML
  Issue Type: Improvement
  Components: Website
Reporter: Deron Eriksson
Assignee: Deron Eriksson


Currently the project release notes are stored in the root of the website 
project:
{code}
0.9.0-incubating/release_notes.md
0.10.0-incubating/release_notes.md
0.11.0-incubating/release_notes.md
0.12.0-incubating/release_notes.md
0.13.0-incubating/release_notes.md
{code}

Since these are not included under the _src directory, they are not generated 
for the website, so currently the roadmap on the website links to them on 
GitHub, which is awkward (links such as 
https://github.com/apache/incubator-systemml-website/blob/master/0.13.0-incubating/release_notes.md).

They can be moved to a location such as 
_src/release_notes/release_notes-0.9.0-incubating.md, etc...

This would allow them to be generated for the project website, which can then 
be linked to internally rather than linking to GitHub.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1602) Improve formatting of heavy hitter output

2017-05-11 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-1602:


 Summary: Improve formatting of heavy hitter output
 Key: SYSTEMML-1602
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1602
 Project: SystemML
  Issue Type: Improvement
Reporter: Deron Eriksson
Assignee: Deron Eriksson


Currently the statistics heavy hitter instructions are output in tabular form. 
Readability can be improved by ensuring the results are properly aligned into 
columns.

Current example:
{code}
Heavy hitter instructions (name, time, count):
-- 1)   rand    0.070 sec   2   
-- 2)   ctableexpand    0.001 sec   1   
-- 3)   round   0.000 sec   1   
-- 4)   rmvar   0.000 sec   18  
-- 5)   *   0.000 sec   8   
-- 6)   seq 0.000 sec   1   
-- 7)   assignvar   0.000 sec   16  
-- 8)   createvar   0.000 sec   5   
-- 9)   sqrt    0.000 sec   3   
-- 10)  /   0.000 sec   2   
{code}
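A minimal sketch of the kind of fixed-width formatting that would align these columns (illustrative only; the names, times, and counts below are copied from the example above, and this is not the actual SystemML change):
{code}
// Illustrative: pad the instruction name to a fixed width so the time and
// count columns line up regardless of the name's length.
String[] names = {"rand", "ctableexpand", "round", "rmvar", "*"};
double[] times = {0.070, 0.001, 0.000, 0.000, 0.000};
int[] counts = {2, 1, 1, 18, 8};
StringBuilder sb = new StringBuilder("Heavy hitter instructions (name, time, count):\n");
for (int i = 0; i < names.length; i++) {
	sb.append(String.format("-- %2d) %-15s %8.3f sec %6d%n", i + 1, names[i], times[i], counts[i]));
}
System.out.println(sb);
{code}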





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1599) Extend nn layers to support different initialization type

2017-05-11 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007144#comment-16007144
 ] 

Mike Dusenberry commented on SYSTEMML-1599:
---

Oh, one more thing: it would be really good to first add default arguments at 
the parser level (I think the runtime already supports them), so that we can 
have a sane default for the layer {{init}} functions.

> Extend nn layers to support different initialization type
> -
>
> Key: SYSTEMML-1599
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1599
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>
> Caffe currently allows users to configure different initialization 
> types:
> 1. constant
> 2. uniform
> 3. gaussian
> 4. positive_unitball
> 5. xavier
> 6. msra
> 7. bilinear
> The init() function of each layer should accept a `type` variable which can 
> be passed by Caffe2DML.
> [~mwdus...@us.ibm.com] [~prithvi_r_s]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-1470) MLContext Statistics does not reset heavy hitter instructions between runs.

2017-05-11 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson reassigned SYSTEMML-1470:


Assignee: Deron Eriksson

> MLContext Statistics does not reset heavy hitter instructions between runs.
> ---
>
> Key: SYSTEMML-1470
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1470
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Deron Eriksson
>
> Currently, if multiple scripts are invoked sequentially using MLContext with 
> statistics turned on, the list of heavy hitter instructions is not reset in 
> between scripts.  Therefore, the heavy hitters will carry over to each 
> subsequent script that is executed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1599) Extend nn layers to support different initialization type

2017-05-11 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006946#comment-16006946
 ] 

Mike Dusenberry commented on SYSTEMML-1599:
---

Thanks for bringing this up, [~niketanpansare].  This has been on my todo list 
for a while now.  Specifically, I'd like to create an {{init}} function in 
{{utils.dml}} that accepts {{(rows, cols, type)}} and that all of the layer 
{{init}} functions call once they decide the number of rows & cols appropriate 
for their respective layers.  Additionally, I'd like to add a few more 
functions to {{utils.dml}}, one for each of the initialization types, that the 
{{utils::init}} function calls, such as {{normal()}}, {{uniform()}}, etc.  
In addition to the Caffe init types, we should also include the 
initialization schemes from He et al., such as the He normal variant that is 
used in all of the current layers.  Altogether, that will basically give us 
the union of the init schemes used by Caffe and Keras 
(https://keras.io/initializers).

> Extend nn layers to support different initialization type
> -
>
> Key: SYSTEMML-1599
>     URL: https://issues.apache.org/jira/browse/SYSTEMML-1599
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>
> Caffe currently allows users to configure different initialization 
> types:
> 1. constant
> 2. uniform
> 3. gaussian
> 4. positive_unitball
> 5. xavier
> 6. msra
> 7. bilinear
> The init() function of each layer should accept a `type` variable which can 
> be passed by Caffe2DML.
> [~mwdus...@us.ibm.com] [~prithvi_r_s]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1601) Update License files with 0.14 feedback

2017-05-11 Thread Luciano Resende (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende resolved SYSTEMML-1601.
---
   Resolution: Fixed
Fix Version/s: SystemML 1.0

> Update License files with 0.14 feedback
> ---
>
> Key: SYSTEMML-1601
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1601
> Project: SystemML
>  Issue Type: Bug
>Reporter: Luciano Resende
>Assignee: Luciano Resende
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1510) Add Pygments license in source distribution

2017-05-11 Thread Luciano Resende (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende resolved SYSTEMML-1510.
---
   Resolution: Fixed
Fix Version/s: SystemML 0.14

> Add Pygments license in source distribution
> ---
>
> Key: SYSTEMML-1510
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1510
> Project: SystemML
>  Issue Type: Bug
>Reporter: Arvind Surve
>Assignee: Arvind Surve
>Priority: Minor
> Fix For: SystemML 0.14
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1601) Update License files with 0.14 feedback

2017-05-11 Thread Luciano Resende (JIRA)
Luciano Resende created SYSTEMML-1601:
-

 Summary: Update License files with 0.14 feedback
 Key: SYSTEMML-1601
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1601
 Project: SystemML
  Issue Type: Bug
Reporter: Luciano Resende
Assignee: Luciano Resende






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1600) Display version in MLContext welcome message

2017-05-10 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-1600:


 Summary: Display version in MLContext welcome message
 Key: SYSTEMML-1600
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1600
 Project: SystemML
  Issue Type: Improvement
  Components: APIs
Reporter: Deron Eriksson
Priority: Minor


Append SystemML version number to MLContext welcome message. It is available 
via the MLContext version() method.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1599) Extend nn layers to support different initialization type

2017-05-10 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-1599:
-

 Summary: Extend nn layers to support different initialization type
 Key: SYSTEMML-1599
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1599
 Project: SystemML
  Issue Type: Sub-task
Reporter: Niketan Pansare


Caffe currently allows users to configure different initialization types:
1. constant
2. uniform
3. gaussian
4. positive_unitball
5. xavier
6. msra
7. bilinear

The init() function of each layer should accept a `type` variable which can be 
passed by Caffe2DML.

[~mwdus...@us.ibm.com] [~prithvi_r_s]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1598) Output statistics via MLContext to stdout

2017-05-10 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-1598:


 Summary: Output statistics via MLContext to stdout
 Key: SYSTEMML-1598
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1598
 Project: SystemML
  Issue Type: Task
  Components: APIs
Reporter: Deron Eriksson
Assignee: Deron Eriksson


Recently some refactoring was performed and statistics via MLContext are now 
output to log4j rather than standard output. In an interactive environment such 
as Spark Shell, after setting statistics to true, it is more intuitive for 
users to return statistics to standard output. This was the previous behavior 
for statistics and is the existing behavior for "explain".
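For context, the interactive pattern this affects looks roughly like the following sketch (the setter name setStatistics is assumed here, and the DML string is just an example):
{code}
// Sketch of interactive use: with statistics enabled, the summary should be
// printed to stdout after execute(), just like the "explain" output.
MLContext ml = new MLContext(spark);  // 'spark' is an existing SparkSession
ml.setStatistics(true);               // assumed name of the statistics toggle
Script script = ScriptFactory.dml("X = rand(rows=100, cols=100); print(sum(X));");
ml.execute(script);
{code}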



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SYSTEMML-1597) Random Forest With Categorical Variables

2017-05-10 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005743#comment-16005743
 ] 

Mike Dusenberry edited comment on SYSTEMML-1597 at 5/11/17 1:05 AM:


cc [~prithvianight], [~reinwald], [~reinw...@us.ibm.com], [~mboehm7], [~iyounus]


was (Author: mwdus...@us.ibm.com):
cc [~prithvianight], [~reinwald], [~reinw...@us.ibm.com], [~mboehm7]

> Random Forest With Categorical Variables
> 
>
> Key: SYSTEMML-1597
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1597
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: rf.scala
>
>
> Our random forest scripts accept an {{R}} matrix that is designed to allow 
> the user to specify columns that have been one-hot encoded.  It does not seem 
> to work currently.  We should investigate and get the attached example 
> working correctly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1597) Random Forest With Categorical Variables

2017-05-10 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-1597:
-

 Summary: Random Forest With Categorical Variables
 Key: SYSTEMML-1597
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1597
 Project: SystemML
  Issue Type: Bug
Reporter: Mike Dusenberry
 Attachments: rf.scala

Our random forest scripts accept an {{R}} matrix that is designed to allow the 
user to specify columns that have been one-hot encoded.  It does not seem to 
work currently.  We should investigate and get the attached example working 
correctly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SYSTEMML-1597) Random Forest With Categorical Variables

2017-05-10 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005743#comment-16005743
 ] 

Mike Dusenberry edited comment on SYSTEMML-1597 at 5/11/17 1:04 AM:


cc [~prithvianight], [~reinwald], [~reinw...@us.ibm.com], [~mboehm7]


was (Author: mwdus...@us.ibm.com):
cc [~prithvianight], [~reinwald], [~reinw...@us.ibm.com]

> Random Forest With Categorical Variables
> 
>
> Key: SYSTEMML-1597
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1597
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: rf.scala
>
>
> Our random forest scripts accept an {{R}} matrix that is designed to allow 
> the user to specify columns that have been one-hot encoded.  It does not seem 
> to work currently.  We should investigate and get the attached example 
> working correctly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1597) Random Forest With Categorical Variables

2017-05-10 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005743#comment-16005743
 ] 

Mike Dusenberry commented on SYSTEMML-1597:
---

cc [~prithvianight], [~reinwald], [~reinw...@us.ibm.com]

> Random Forest With Categorical Variables
> 
>
> Key: SYSTEMML-1597
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1597
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: rf.scala
>
>
> Our random forest scripts accept an {{R}} matrix that is designed to allow 
> the user to specify columns that have been one-hot encoded.  It does not seem 
> to work currently.  We should investigate and get the attached example 
> working correctly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1596) Set runtime platform via MLContext

2017-05-10 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-1596:


 Summary: Set runtime platform via MLContext
 Key: SYSTEMML-1596
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1596
 Project: SystemML
  Issue Type: Improvement
  Components: APIs
Reporter: Deron Eriksson
Assignee: Deron Eriksson
Priority: Minor


Currently in Spark Shell, the runtime platform can be specified by setting 
DMLScript's rtplatform field:
{code}
org.apache.sysml.api.DMLScript.rtplatform = 
org.apache.sysml.api.DMLScript.RUNTIME_PLATFORM.SPARK
{code}

Since setting the runtime platform can be a fairly common operation when 
developing/debugging, MLContext could offer a setRuntimePlatform or other such 
method for usability.
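A sketch of what such a convenience method could look like (the method name and signature here are hypothetical, simply mirroring the DMLScript fields shown above):
{code}
// Hypothetical convenience method on MLContext; the final API may differ.
public void setRuntimePlatform(DMLScript.RUNTIME_PLATFORM platform) {
	DMLScript.rtplatform = platform;
}
{code}
which would reduce the Spark Shell incantation above to something like ml.setRuntimePlatform(DMLScript.RUNTIME_PLATFORM.SPARK).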




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1583) Implement converter in Python to convert caffemodel in SystemML format

2017-05-10 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005332#comment-16005332
 ] 

Mike Dusenberry commented on SYSTEMML-1583:
---

I definitely think it is a great idea to be able to accept Caffe model files, in 
addition to the protobuf definition files.  In the near future, we can also 
extend this to Keras, using the same infrastructure from the base of Caffe2DML. 
 I agree with Fred that it would be ideal to use Caffe files directly so that 
users can seamlessly move from Caffe(2) to SystemML.  It would be nice to be 
able to simply pass in URLs of protobufs & model files, and the system could 
cache them locally to avoid having to download them repeatedly.  We could 
possibly even cache the SystemML binary format of the models transparently for 
the user.  Overall, we need to continue to focus on flexibility for creating 
and running models at scale.
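On the caching idea, here is a self-contained sketch of the "download once, reuse locally" behavior (this ModelCache helper is hypothetical, not an existing SystemML API):
{code}
import java.io.InputStream;
import java.net.URL;
import java.nio.file.*;

// Hypothetical helper: fetch a URL into a cache directory the first time it
// is requested and return the cached copy on subsequent calls.
public class ModelCache {
	public static Path fetch(String url, Path cacheDir) throws Exception {
		Files.createDirectories(cacheDir);
		String fileName = Paths.get(new URL(url).getPath()).getFileName().toString();
		Path target = cacheDir.resolve(fileName);
		if (!Files.exists(target)) {
			try (InputStream in = new URL(url).openStream()) {
				Files.copy(in, target);
			}
		}
		return target;
	}
}
{code}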

> Implement converter in Python to convert caffemodel in SystemML format
> --
>
> Key: SYSTEMML-1583
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1583
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>Assignee: Arvind Surve
>
> Ideally, this converter should not require Caffe to be installed. Please 
> see 
> http://stackoverflow.com/questions/37572948/extracting-weights-from-caffemodel-without-caffe-installed-in-python
> Example code to convert a Caffe model to CSV if Caffe is installed:
> {code}
> import caffe
> import numpy as np
> #net = 
> caffe.Net('/home/biuser/nike/barista/VGG_ILSVRC_19_layers_train_val.prototxt',
>  caffe.TEST)
> net = 
> caffe.Net('/home/biuser/VGG_trained_models/VGG_ILSVRC_19_layers_deploy.prototxt',
>  '/home/biuser/VGG_trained_models/VGG_ILSVRC_19_layers.caffemodel', 
> caffe.TEST)
> #surgery.transplant(net, base_net)
> for l in [ "conv1_1", "conv1_2", "conv2_1", "conv2_2", "conv3_1", "conv3_2", 
> "conv3_3", "conv3_4", "conv4_1", "conv4_2", "conv4_3", "conv4_4", "conv5_1", 
> "conv5_2", "conv5_3", "conv5_4", "fc6", "fc7", "fc8" ]:
> w = net.params[l][0].data
> w = w.reshape(w.shape[0], -1)
> b = net.params[l][1].data
> b = b.reshape(b.shape[0], -1)
> # You may have to reshape it for fc layers
> np.savetxt("VGG_trained_models/" + l + "_weight.csv", w, 
> delimiter=",")
> np.savetxt("VGG_trained_models/" + l + "_bias.csv", b, delimiter=",")
> {code}
> Here is an example pyspark script to test this JIRA:
> {code}
> from systemml.mllearn import Caffe2DML
> from pyspark.sql import SQLContext
> import numpy as np
> import urllib, os, scipy.ndimage
> from PIL import Image
> import systemml as sml
> # ImageNet specific parameters
> img_shape = (3, 224, 224)
> # Downloads a jpg image, resizes it to 224 and return as numpy array in N X 
> CHW format
> url = 
> 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/MountainLion.jpg/312px-MountainLion.jpg'
> outFile = 'test.jpg'
> urllib.urlretrieve(url, outFile)
> input_image = sml.convertImageToNumPyArr(Image.open(outFile), 
> img_shape=img_shape)
> # Download the ResNet network
> import urllib
> urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/resnet/ilsvrc12/ResNet_50_network.proto',
>  'ResNet_50_network.proto')
> urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/resnet/ilsvrc12/ResNet_50_solver.proto',
>  'ResNet_50_solver.proto')
> home_dir = os.path.expanduser('~')
> # let's assume that this function is implemented as 
> saveAsBinaryBlock(inputCaffeModel, outputDir)
> resnet_pretrained_weight_dir = os.path.join(home_dir, 'model_zoo', 'caffe', 
> 'vision', 'resnet', 'ilsvrc12', 'ResNet_50_pretrained_weights')
> urllib.urlretrieve('https://deepdetect.com/models/resnet/ResNet-50-model.caffemodel',
>  'ResNet-50-model.caffemodel')
> ###
> # To be implemented as part of this JIRA
> sml.saveAsBinaryBlock('ResNet-50-model.caffemodel', 
> resnet_pretrained_weight_dir)
> ###
> resnet = Caffe2DML(sqlCtx, solver='ResNet_50_solver.proto', 
> weights=resnet_pretrained_weight_dir, input_shape=img_shape)
> resnet.predict(input_image)
> # This should return array(['cougar, puma, catamount, mountain lion, painter, 
> panther, Felis '], dtype='|S64')
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1584) Add Scala Script.out(...) method that accepts Lists.

2017-05-10 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1584.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Addressed by [PR488|https://github.com/apache/incubator-systemml/pull/488].

> Add Scala Script.out(...) method that accepts Lists.
> 
>
> Key: SYSTEMML-1584
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1584
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Currently, we have {{Script.out(...)}} methods that accept one or many String 
> arguments.  We should add a method that also accepts a list of Strings.  This 
> would make it easier to define a map of inputs and a list of outputs in one 
> location, and then pass these into the appropriate {{in}} and {{out}} methods 
> elsewhere.  Likewise, it would be nice to be able to use the same list to get 
> the results after running an algorithm, although of course this may not be 
> possible with the current {{getTuple}} methods.
> cc [~deron]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1584) Add Scala Script.out(...) method that accepts Lists.

2017-05-10 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1584.


> Add Scala Script.out(...) method that accepts Lists.
> 
>
> Key: SYSTEMML-1584
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1584
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Currently, we have {{Script.out(...)}} methods that accept one or many String 
> arguments.  We should add a method that also accepts a list of Strings.  This 
> would make it easier to define a map of inputs and a list of outputs in one 
> location, and then pass these into the appropriate {{in}} and {{out}} methods 
> elsewhere.  Likewise, it would be nice to be able to use the same list to get 
> the results after running an algorithm, although of course this may not be 
> possible with the current {{getTuple}} methods.
> cc [~deron]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1595) Missing Block Sizes For PersistentWrites & TransientWrites

2017-05-10 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry closed SYSTEMML-1595.
-

> Missing Block Sizes For PersistentWrites & TransientWrites
> --
>
> Key: SYSTEMML-1595
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1595
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
> Attachments: scenario1.dml
>
>
> In the attached script, the resulting PersistentWrites for {{doutc1_agg}} & 
> {{dWc1_agg}} end up having unknown block sizes, despite the input DAGs for 
> those variables having known block sizes.  Due to this, when we use MLContext 
> and mark those variables as outputs, the PersistentWrites will be rewritten 
> to TransientWrites, and the block sizes will remain unknown.
> To run:
> {code}
> spark-submit $SYSTEMML_HOME/target/SystemML.jar -f scenario1.dml -explain 
> recompile_hops
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1595) Missing Block Sizes For PersistentWrites & TransientWrites

2017-05-10 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry resolved SYSTEMML-1595.
---
   Resolution: Fixed
Fix Version/s: SystemML 1.0

> Missing Block Sizes For PersistentWrites & TransientWrites
> --
>
> Key: SYSTEMML-1595
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1595
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
> Attachments: scenario1.dml
>
>
> In the attached script, the resulting PersistentWrites for {{doutc1_agg}} & 
> {{dWc1_agg}} end up having unknown block sizes, despite the input DAGs for 
> those variables having known block sizes.  Due to this, when we use MLContext 
> and mark those variables as outputs, the PersistentWrites will be rewritten 
> to TransientWrites, and the block sizes will remain unknown.
> To run:
> {code}
> spark-submit $SYSTEMML_HOME/target/SystemML.jar -f scenario1.dml -explain 
> recompile_hops
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1595) Missing Block Sizes For PersistentWrites & TransientWrites

2017-05-10 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005270#comment-16005270
 ] 

Mike Dusenberry commented on SYSTEMML-1595:
---

Great, thanks [~mboehm7].  This has fixed the problem.

> Missing Block Sizes For PersistentWrites & TransientWrites
> --
>
> Key: SYSTEMML-1595
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1595
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: scenario1.dml
>
>
> In the attached script, the resulting PersistentWrites for {{doutc1_agg}} & 
> {{dWc1_agg}} end up having unknown block sizes, despite the input DAGs for 
> those variables having known block sizes.  Due to this, when we use MLContext 
> and mark those variables as outputs, the PersistentWrites will be rewritten 
> to TransientWrites, and the block sizes will remain unknown.
> To run:
> {code}
> spark-submit $SYSTEMML_HOME/target/SystemML.jar -f scenario1.dml -explain 
> recompile_hops
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-1595) Missing Block Sizes For PersistentWrites & TransientWrites

2017-05-10 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry reassigned SYSTEMML-1595:
-

Assignee: Matthias Boehm

> Missing Block Sizes For PersistentWrites & TransientWrites
> --
>
> Key: SYSTEMML-1595
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1595
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Matthias Boehm
> Attachments: scenario1.dml
>
>
> In the attached script, the resulting PersistentWrites for {{doutc1_agg}} & 
> {{dWc1_agg}} end up having unknown block sizes, despite the input DAGs for 
> those variables having known block sizes.  Due to this, when we use MLContext 
> and mark those variables as outputs, the PersistentWrites will be rewritten 
> to TransientWrites, and the block sizes will remain unknown.
> To run:
> {code}
> spark-submit $SYSTEMML_HOME/target/SystemML.jar -f scenario1.dml -explain 
> recompile_hops
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1595) Missing Block Sizes For PersistentWrites & TransientWrites

2017-05-09 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003892#comment-16003892
 ] 

Matthias Boehm commented on SYSTEMML-1595:
--

yes, I already modified the persistent to transient write rewrite and I'm 
currently in the process of adding some additional tests.

> Missing Block Sizes For PersistentWrites & TransientWrites
> --
>
> Key: SYSTEMML-1595
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1595
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: scenario1.dml
>
>
> In the attached script, the resulting PersistentWrites for {{doutc1_agg}} & 
> {{dWc1_agg}} end up having unknown block sizes, despite the input DAGs for 
> those variables having known block sizes.  Due to this, when we use MLContext 
> and mark those variables as outputs, the PersistentWrites will be rewritten 
> to TransientWrites, and the block sizes will remain unknown.
> To run:
> {code}
> spark-submit $SYSTEMML_HOME/target/SystemML.jar -f scenario1.dml -explain 
> recompile_hops
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1595) Missing Block Sizes For PersistentWrites & TransientWrites

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003852#comment-16003852
 ] 

Mike Dusenberry commented on SYSTEMML-1595:
---

Interesting.  I just checked, and MLContext uses plain "text" dummy {{write}} 
statements as well, so that explains why I was seeing the issue originally with 
MLContext.  Should we start adding the block sizes to these persistent writes 
anyway?  Or should we just update the persistent write -> transient write 
rewrite rule to grab the block sizes from the input to the PersistentWrite?

> Missing Block Sizes For PersistentWrites & TransientWrites
> --
>
> Key: SYSTEMML-1595
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1595
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: scenario1.dml
>
>
> In the attached script, the resulting PersistentWrites for {{doutc1_agg}} & 
> {{dWc1_agg}} end up having unknown block sizes, despite the input DAGs for 
> those variables having known block sizes.  Due to this, when we use MLContext 
> and mark those variables as outputs, the PersistentWrites will be rewritten 
> to TransientWrites, and the block sizes will remain unknown.
> To run:
> {code}
> spark-submit $SYSTEMML_HOME/target/SystemML.jar -f scenario1.dml -explain 
> recompile_hops
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1595) Missing Block Sizes For PersistentWrites & TransientWrites

2017-05-09 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003809#comment-16003809
 ] 

Matthias Boehm commented on SYSTEMML-1595:
--

As it turns out, the "unknown" block sizes originate from the default format 
"text", which is not a blocked representation. By specifying {{format="binary"}} 
in the write statements, the block sizes are correctly set. 

However, every transient write should have proper block sizes and we need to 
make sure they are set correctly when modifying persistent writes to transient 
writes.
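
As a minimal DML sketch of the difference described above (the output path is 
hypothetical; {{doutc1_agg}} is taken from the issue description, and the actual 
scenario1.dml attachment is not reproduced here):

{code}
# Default format is "text", which is not a blocked representation,
# so the resulting PersistentWrite carries unknown block sizes.
write(doutc1_agg, "out/doutc1_agg");

# Explicitly requesting the blocked binary format lets the block sizes
# of the input DAG be set on the write.
write(doutc1_agg, "out/doutc1_agg", format="binary");
{code}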

> Missing Block Sizes For PersistentWrites & TransientWrites
> --
>
> Key: SYSTEMML-1595
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1595
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: scenario1.dml
>
>
> In the attached script, the resulting PersistentWrites for {{doutc1_agg}} & 
> {{dWc1_agg}} end up having unknown block sizes, despite the input DAGs for 
> those variables having known block sizes.  Due to this, when we use MLContext 
> and mark those variables as outputs, the PersistentWrites will be rewritten 
> to TransientWrites, and the block sizes will remain unknown.
> To run:
> {code}
> spark-submit $SYSTEMML_HOME/target/SystemML.jar -f scenario1.dml -explain 
> recompile_hops
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003710#comment-16003710
 ] 

Mike Dusenberry edited comment on SYSTEMML-1561 at 5/9/17 10:57 PM:


Well, I tried logging the rewrites with {{ProgramRewriter.LDEBUG = true}} 
enabled and log4j set to DEBUG, but it only displayed the common subexpression 
elimination rewrites during the second chance pass.  Looking into it further, 
rewrites like the constant folding don't seem to ever emit debug logging, so I 
don't think the log isn't showing the whole picture.  Regardless, here's the 
trace (look for the {{ABOUT TO START STATIC REWRITE + IPA SECOND 
CHANCE}} section).

{code}
.

17/05/09 15:50:35 DEBUG DMLScript:
DML config:
INFO: localtmpdir: /tmp/systemml
INFO: scratch: scratch_space
INFO: optlevel: 2
INFO: numreducers: 10
INFO: defaultblocksize: 1000
INFO: dml.yarn.appmaster: false
INFO: dml.yarn.appmaster.mem: 2048
INFO: dml.yarn.mapreduce.mem: -1
INFO: cp.parallel.matrixmult: true
INFO: cp.parallel.textio: true
INFO: native.blas: auto
INFO: compressed.linalg: false
INFO: codegen.enabled: false
INFO: codegen.literals: 1
INFO: codegen.plancache: true
INFO: systemml.stats.extraGPU: false
INFO: systemml.stats.extraDNN: false

17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/examples/mnist_lenet.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/affine.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/conv2d_builtin.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/cross_entropy_loss.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/dropout.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/l2_reg.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/max_pool2d_builtin.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/relu.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/softmax.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/optim/sgd_nesterov.dml
17/05/09 15:50:36 DEBUG MRConfigurationNames: Hadoop build version: 2.6.5 from 
e8c9fe0b4c252caf2ebf1464220599650f119997 by sjlee source checksum 
f05c9fa095a395faa9db9f7ba5d754
17/05/09 15:50:36 DEBUG MRConfigurationNames: Using hadoop 2.x configuration 
properties.
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
about=, always=false, type=DEFAULT, value=[Rate of successful kerberos logins 
and latency (milliseconds)], valueName=Time)
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
about=, always=false, type=DEFAULT, value=[Rate of failed kerberos logins and 
latency (milliseconds)], valueName=Time)
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
about=, always=false, type=DEFAULT, value=[GetGroups], valueName=Time)
17/05/09 15:50:36 DEBUG MetricsSystemImpl: UgiMetrics, User and group related 
metrics
17/05/09 15:50:36 DEBUG KerberosName: Kerberos krb5 configuration not found, 
setting default realm to empty
17/05/09 15:50:36 DEBUG Groups:  Creating new Groups object
17/05/09 15:50:36 DEBUG NativeCodeLoader: Trying to load the custom-built 
native-hadoop library...
17/05/09 15:50:36 DEBUG NativeCodeLoader: Failed to load native-hadoop with 
error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
17/05/09 15:50:36 DEBUG NativeCodeLoader: 
java.library.path=/Users/mwdusenb/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
17/05/09 15:50:36 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
17/05/09 15:50:36 DEBUG PerformanceAdvisory: Falling back to shell based
17/05/09 15:50:36 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping

[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003710#comment-16003710
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

Well, I tried logging the rewrites with {{ProgramRewriter.LDEBUG = true}} 
enabled and log4j set to DEBUG, but it only displayed the common subexpression 
elimination rewrites during the second chance pass.  Looking into it further, 
rewrites like the constant folding don't seem to ever emit debug logging, so I 
don't think the log is showing the whole picture.  Regardless, here's the 
trace (look for the {{ABOUT TO START STATIC REWRITE + IPA SECOND 
CHANCE}} section).

{code}
17/05/09 15:50:35 DEBUG DMLScript:
DML config:
INFO: localtmpdir: /tmp/systemml
INFO: scratch: scratch_space
INFO: optlevel: 2
INFO: numreducers: 10
INFO: defaultblocksize: 1000
INFO: dml.yarn.appmaster: false
INFO: dml.yarn.appmaster.mem: 2048
INFO: dml.yarn.mapreduce.mem: -1
INFO: cp.parallel.matrixmult: true
INFO: cp.parallel.textio: true
INFO: native.blas: auto
INFO: compressed.linalg: false
INFO: codegen.enabled: false
INFO: codegen.literals: 1
INFO: codegen.plancache: true
INFO: systemml.stats.extraGPU: false
INFO: systemml.stats.extraDNN: false

17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/examples/mnist_lenet.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/affine.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/conv2d_builtin.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/cross_entropy_loss.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/dropout.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/l2_reg.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/max_pool2d_builtin.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/relu.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/layers/softmax.dml
17/05/09 15:50:35 DEBUG DMLScript: Looking for the following file in the local 
file system: ./nn/optim/sgd_nesterov.dml
17/05/09 15:50:36 DEBUG MRConfigurationNames: Hadoop build version: 2.6.5 from 
e8c9fe0b4c252caf2ebf1464220599650f119997 by sjlee source checksum 
f05c9fa095a395faa9db9f7ba5d754
17/05/09 15:50:36 DEBUG MRConfigurationNames: Using hadoop 2.x configuration 
properties.
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
about=, always=false, type=DEFAULT, value=[Rate of successful kerberos logins 
and latency (milliseconds)], valueName=Time)
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
about=, always=false, type=DEFAULT, value=[Rate of failed kerberos logins and 
latency (milliseconds)], valueName=Time)
17/05/09 15:50:36 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
about=, always=false, type=DEFAULT, value=[GetGroups], valueName=Time)
17/05/09 15:50:36 DEBUG MetricsSystemImpl: UgiMetrics, User and group related 
metrics
17/05/09 15:50:36 DEBUG KerberosName: Kerberos krb5 configuration not found, 
setting default realm to empty
17/05/09 15:50:36 DEBUG Groups:  Creating new Groups object
17/05/09 15:50:36 DEBUG NativeCodeLoader: Trying to load the custom-built 
native-hadoop library...
17/05/09 15:50:36 DEBUG NativeCodeLoader: Failed to load native-hadoop with 
error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
17/05/09 15:50:36 DEBUG NativeCodeLoader: 
java.library.path=/Users/mwdusenb/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
17/05/09 15:50:36 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
17/05/09 15:50:36 DEBUG PerformanceAdvisory: Falling back to shell based
17/05/09 15:50:36 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
17/05/09 15:50:36 DEBUG Gro

[jira] [Commented] (SYSTEMML-1527) Use top-level algorithm scripts for application tests

2017-05-09 Thread Krishna Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003639#comment-16003639
 ] 

Krishna Kalyan commented on SYSTEMML-1527:
--

I am using `mvn clean verify` to check that my changes do not break anything. 
However, this takes a long time. Is there a way to test just a single component, 
such as GLM?
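
A hedged sketch of one way to do this, assuming the application tests are run as 
integration tests through the Maven Failsafe plugin and that {{GLMTest}} is the 
name of the test class of interest (both assumptions, not verified against the 
project's build setup):

{code}
# run a single integration test class via Failsafe (class name is an assumption)
mvn verify -Dit.test=GLMTest

# for plain Surefire unit tests, the analogous selector is -Dtest
mvn test -Dtest=GLMTest
{code}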

> Use top-level algorithm scripts for application tests
> -
>
> Key: SYSTEMML-1527
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1527
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Matthias Boehm
>
> There are numerous dml and pydml application tests that aim to test our 
> existing algorithms. However, these tests use replicated (and mostly 
> outdated) scripts. This task aims to remove the duplicated dml and pydml 
> scripts and to refer directly to the existing algorithm tests. This also 
> includes the update of R comparison scripts.
> See SYSTEMML-1363 for examples.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003628#comment-16003628
 ] 

Matthias Boehm commented on SYSTEMML-1561:
--

and it's great to see that the recompilation times are still in a reasonable 
range: 5978 DAGs in 3.2s - generally, we try to keep recompilation of average 
DAGs at around 1ms. 

> Improve constant folding during compilation
> ---
>
> Key: SYSTEMML-1561
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1561
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
> Fix For: SystemML 1.0
>
> Attachments: scenario1_plan.txt, scenario1.py, scenario2_plan.txt, 
> scenario2.py
>
>
> In our `nn` library, our convolution and pooling layers have to pass around 
> the spatial dimensions (height and width) of the images that are stretched 
> out into rows of the input/output matrices.  These output dimensions are 
> computed within the forward functions of the above layers as small scalar 
> equations.  From a mathematical standpoint, these sizes can be determined at 
> compile time, and it is nice to have these size equations in DML (vs. hiding 
> them inside the engine within built-in functions).  However, we do not 
> currently evaluate these expressions during compilation, and thus we are left 
> with unknown sizes even during recompilation.  This naturally leads to max 
> memory estimates and thus often leads to unnecessary distributed runtime ops 
> rather than simple CP ones.
> I have two related scenarios for which this is a problem.  They both involve 
> the {{Houtc1}} & {{Woutc1}} values that are returned from a 
> `conv2d::forward(...)` function.  These represent the spatial dimensions of 
> the volume within each of the rows of the output {{outc1}} of the function, and 
> the third dimension is {{F1}}.  Thus, {{outc1}} has a number of columns equal 
> to {{F1*Houtc1*Woutc1}}.
> In the first scenario ({{scenario1.py}}), a random matrix {{doutc1}} is 
> created that should have the same dimensions as {{outc1}}.  For the columns, 
> if I use {{cols=ncol(outc1)}} in this rand statement, the size will be 
> propagated and CP ops will be compiled and run.  If I instead use 
> {{cols=F1*Houtc1*Woutc1}}, the size will forever be unknown, even during 
> recompilation, and thus Spark ops will be compiled and run.  I have included 
> the recompile hops plan ({{scenario1_plan.txt}}).
> In the second scenario ({{scenario2.py}}), a {{max_pool2d::forward(...)}} 
> function is inserted after the {{conv2d::forward(...)}} function that 
> requires the {{Houtc1}} and {{Woutc1}} variables to be supplied as arguments. 
>  Since those latter variables are not executed during compilation time, the 
> max pooling sizes remain unknown, even during recompilation, and thus Spark 
> ops will be compiled and run.  I have included the recompile hops plan 
> ({{scenario2_plan.txt}}).
> We should either improve or fix our constant folding rewrites so that these 
> scenarios are fixed, as they are necessary for performant deep learning 
> applications.  Note too that this issue will be present in other non-deep 
> learning scenarios as well.
> Mailing list thread: 
> https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01657.html
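
To make the first scenario concrete, a minimal DML sketch of the two rand variants 
described above (assuming {{outc1}}, {{F1}}, {{Houtc1}}, and {{Woutc1}} have already 
been produced by {{conv2d::forward(...)}}; the full scenario1.py script is not 
reproduced here):

{code}
# Variant 1: cols is tied to an existing matrix, so the size is
# propagated and CP ops are compiled.
doutc1 = rand(rows=nrow(outc1), cols=ncol(outc1));

# Variant 2: cols is a scalar product that is not constant-folded,
# so the size stays unknown even during recompilation and Spark ops
# are compiled instead.
doutc1 = rand(rows=nrow(outc1), cols=F1*Houtc1*Woutc1);
{code}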



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003618#comment-16003618
 ] 

Matthias Boehm commented on SYSTEMML-1561:
--

that's awesome - just one question: do we understand what reduced the number of 
cache writes to HDFS (export) from 2100 to 8?

> Improve constant folding during compilation
> ---
>
> Key: SYSTEMML-1561
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1561
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
> Fix For: SystemML 1.0
>
> Attachments: scenario1_plan.txt, scenario1.py, scenario2_plan.txt, 
> scenario2.py
>
>
> In our `nn` library, our convolution and pooling layers have to pass around 
> the spatial dimensions (height and width) of the images that are stretched 
> out into rows of the input/output matrices.  These output dimensions are 
> computed within the forward functions of the above layers as small scalar 
> equations.  From a mathematical standpoint, these sizes can be determined at 
> compile time, and it is nice to have these size equations in DML (vs. hiding 
> them inside the engine within built-in functions).  However, we do not 
> currently evaluate these expressions during compilation, and thus we are left 
> with unknown sizes even during recompilation.  This naturally leads to max 
> memory estimates and thus often leads to unnecessary distributed runtime ops 
> rather than simple CP ones.
> I have two related scenarios for which this is a problem.  They both involve 
> the {{Houtc1}} & {{Woutc1}} values that are returned from a 
> `conv2d::forward(...)` function.  These represent the spatial dimensions of 
> the volume within each of the rows of the output {{outc1}} of the function, and 
> the third dimension is {{F1}}.  Thus, {{outc1}} has a number of columns equal 
> to {{F1*Houtc1*Woutc1}}.
> In the first scenario ({{scenario1.py}}), a random matrix {{doutc1}} is 
> created that should have the same dimensions as {{outc1}}.  For the columns, 
> if I use {{cols=ncol(outc1)}} in this rand statement, the size will be 
> propagated and CP ops will be compiled and run.  If I instead use 
> {{cols=F1*Houtc1*Woutc1}}, the size will forever be unknown, even during 
> recompilation, and thus Spark ops will be compiled and run.  I have included 
> the recompile hops plan ({{scenario1_plan.txt}}).
> In the second scenario ({{scenario2.py}}), a {{max_pool2d::forward(...)}} 
> function is inserted after the {{conv2d::forward(...)}} function that 
> requires the {{Houtc1}} and {{Woutc1}} variables to be supplied as arguments. 
>  Since those latter variables are not executed during compilation time, the 
> max pooling sizes remain unknown, even during recompilation, and thus Spark 
> ops will be compiled and run.  I have included the recompile hops plan 
> ({{scenario2_plan.txt}}).
> We should either improve or fix our constant folding rewrites so that these 
> scenarios are fixed, as they are necessary for performant deep learning 
> applications.  Note too that this issue will be present in other non-deep 
> learning scenarios as well.
> Mailing list thread: 
> https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01657.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003546#comment-16003546
 ] 

Mike Dusenberry edited comment on SYSTEMML-1561 at 5/9/17 9:04 PM:
---

As I noted on SYSTEMML-1566, I ran experiments again using (1) the commit 
before the IPA scalar replacement update, (2) the commit with the IPA scalar 
replacement update, and (3) the proposed commit with the updated constant 
folding (which relies on the IPA update for usefulness), and measured the 
following results:

commit 2c5c3b14e1906cda70ae1581b19a5e908b3ab329 (pre IPA update)
{code}
17/05/05 14:39:49 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 712.183 sec.
Total compilation time: 1.996 sec.
Total execution time:   710.187 sec.
Number of compiled Spark inst:  134.
Number of executed Spark inst:  2513.
Cache hits (Mem, WB, FS, HDFS): 153624/0/0/862.
Cache writes (WB, FS, HDFS):79043/0/2170.
Cache times (ACQr/m, RLS, EXP): 32.052/0.038/5.508/55.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5979.
HOP DAGs recompile time:3.670 sec.
Functions recompiled:   10.
Functions recompile time:   0.082 sec.
Spark ctx create time (lazy):   0.959 sec.
Spark trans counts (par,bc,col):347/1649/862.
Spark trans times (par,bc,col): 0.671/25.076/31.988 secs.
Total JIT compile time: 118.9 sec.
Total JVM GC count: 267.
Total JVM GC time:  7.523 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   671.994 sec 1
-- 2)   conv2d_bias_add 198.398 sec 3298
-- 3)   maxpooling_backward 174.666 sec 1720
-- 4)   predict 140.782 sec 9
-- 5)   sp_mapmm94.035 sec  1649
-- 6)   conv2d_backward_filter  63.328 sec  1720
-- 7)   sp_sel+ 39.259 sec  860
-- 8)   ba+*18.615 sec  5089
-- 9)   +*  16.627 sec  10320
-- 10)  conv2d_backward_data14.297 sec  860
{code}

commit abc9686fbaaa11c12cfa02c49c7675165acdf176 (w/ IPA update)
{code}
17/05/05 15:05:16 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 673.900 sec.
Total compilation time: 1.938 sec.
Total execution time:   671.962 sec.
Number of compiled Spark inst:  128.
Number of executed Spark inst:  2513.
Cache hits (Mem, WB, FS, HDFS): 153645/0/0/862.
Cache writes (WB, FS, HDFS):79043/0/2149.
Cache times (ACQr/m, RLS, EXP): 31.568/0.038/4.639/54.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:3.705 sec.
Functions recompiled:   10.
Functions recompile time:   0.068 sec.
Spark ctx create time (lazy):   0.948 sec.
Spark trans counts (par,bc,col):368/1649/862.
Spark trans times (par,bc,col): 0.689/26.035/31.503 secs.
Total JIT compile time: 111.921 sec.
Total JVM GC count: 265.
Total JVM GC time:  7.118 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   634.306 sec 1
-- 2)   conv2d_bias_add 190.557 sec 3298
-- 3)   maxpooling_backward 141.588 sec 1720
-- 4)   predict 135.222 sec 9
-- 5)   sp_mapmm94.025 sec  1649
-- 6)   conv2d_backward_filter  66.058 sec  1720
-- 7)   sp_sel+ 39.204 sec  860
-- 8)   +*  18.272 sec  10320
-- 9)   ba+*15.804 sec  5089
-- 10)  conv2d_backward_data13.627 sec  860
{code}

w/ updated constant folding
{code}
17/05/05 15:15:19 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 405.615 sec.
Total compilation time: 2.070 sec.
Total execution time:   403.545 sec.
Number of compiled Spark inst:  139.
Number of executed Spark inst:  793.
Cache hits (Mem, WB, FS, HDFS): 156654/0/0/2.
Cache writes (WB, FS, HDFS):79043/0/8.
Cache times (ACQr/m, RLS, EXP): 3.467/0.043/3.566/1.175 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:3.178 sec.
Functions recompiled:   10.
Functions recompile time:   0.072 sec.
Spark ctx create time (lazy):   1.024 sec.
Spark trans counts (par,bc,col):789/789/2.
Spark trans times (par,bc,col): 0.982/0.299/3.418 secs.
Total JIT compile time: 145.368 sec.
Total JVM GC count: 438.
Total JVM GC time:  8.992 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   370.373 sec 1
-- 2)   conv2d_bias_add 178.914 sec 3298
-- 3)   predict 116.145 sec 9
-- 4)   conv2d_backward_filter  55.582 sec  1720
-- 5)   +*  18.948 sec  10320
-- 6)   sel+18.238 sec  3369
-- 7)   ba+*16.171 sec  5949
-- 8)   conv2d_backward_data15.038 sec  860
-- 9)   sp_mapmm13.980 sec  789
-- 10)  relu_maxpooling 12.415 sec  3298
{code}

With the IPA scalar replacement + constant folding updates, we've gained an 
additio

[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003546#comment-16003546
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

As I noted on SystemML-1566, I ran experiments again using (1) the commit 
before the IPA scalar replacement update, (2) the commit with the IPA scalar 
replacement update, and (3) the proposed commit with the updated constant 
folding (which relies on the IPA update for usefulness), and measured the 
following results:

commit 2c5c3b14e1906cda70ae1581b19a5e908b3ab329 (pre IPA update)
{code}
17/05/05 14:39:49 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 712.183 sec.
Total compilation time: 1.996 sec.
Total execution time:   710.187 sec.
Number of compiled Spark inst:  134.
Number of executed Spark inst:  2513.
Cache hits (Mem, WB, FS, HDFS): 153624/0/0/862.
Cache writes (WB, FS, HDFS):79043/0/2170.
Cache times (ACQr/m, RLS, EXP): 32.052/0.038/5.508/55.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5979.
HOP DAGs recompile time:3.670 sec.
Functions recompiled:   10.
Functions recompile time:   0.082 sec.
Spark ctx create time (lazy):   0.959 sec.
Spark trans counts (par,bc,col):347/1649/862.
Spark trans times (par,bc,col): 0.671/25.076/31.988 secs.
Total JIT compile time: 118.9 sec.
Total JVM GC count: 267.
Total JVM GC time:  7.523 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   671.994 sec 1
-- 2)   conv2d_bias_add 198.398 sec 3298
-- 3)   maxpooling_backward 174.666 sec 1720
-- 4)   predict 140.782 sec 9
-- 5)   sp_mapmm94.035 sec  1649
-- 6)   conv2d_backward_filter  63.328 sec  1720
-- 7)   sp_sel+ 39.259 sec  860
-- 8)   ba+*18.615 sec  5089
-- 9)   +*  16.627 sec  10320
-- 10)  conv2d_backward_data14.297 sec  860
{code}

commit abc9686fbaaa11c12cfa02c49c7675165acdf176 (w/ IPA update)
{code}
17/05/05 15:05:16 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 673.900 sec.
Total compilation time: 1.938 sec.
Total execution time:   671.962 sec.
Number of compiled Spark inst:  128.
Number of executed Spark inst:  2513.
Cache hits (Mem, WB, FS, HDFS): 153645/0/0/862.
Cache writes (WB, FS, HDFS):79043/0/2149.
Cache times (ACQr/m, RLS, EXP): 31.568/0.038/4.639/54.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:3.705 sec.
Functions recompiled:   10.
Functions recompile time:   0.068 sec.
Spark ctx create time (lazy):   0.948 sec.
Spark trans counts (par,bc,col):368/1649/862.
Spark trans times (par,bc,col): 0.689/26.035/31.503 secs.
Total JIT compile time: 111.921 sec.
Total JVM GC count: 265.
Total JVM GC time:  7.118 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   634.306 sec 1
-- 2)   conv2d_bias_add 190.557 sec 3298
-- 3)   maxpooling_backward 141.588 sec 1720
-- 4)   predict 135.222 sec 9
-- 5)   sp_mapmm94.025 sec  1649
-- 6)   conv2d_backward_filter  66.058 sec  1720
-- 7)   sp_sel+ 39.204 sec  860
-- 8)   +*  18.272 sec  10320
-- 9)   ba+*15.804 sec  5089
-- 10)  conv2d_backward_data13.627 sec  860
{code}

w/ updated constant folding
{code}
17/05/05 15:15:19 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 405.615 sec.
Total compilation time: 2.070 sec.
Total execution time:   403.545 sec.
Number of compiled Spark inst:  139.
Number of executed Spark inst:  793.
Cache hits (Mem, WB, FS, HDFS): 156654/0/0/2.
Cache writes (WB, FS, HDFS):79043/0/8.
Cache times (ACQr/m, RLS, EXP): 3.467/0.043/3.566/1.175 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:3.178 sec.
Functions recompiled:   10.
Functions recompile time:   0.072 sec.
Spark ctx create time (lazy):   1.024 sec.
Spark trans counts (par,bc,col):789/789/2.
Spark trans times (par,bc,col): 0.982/0.299/3.418 secs.
Total JIT compile time: 145.368 sec.
Total JVM GC count: 438.
Total JVM GC time:  8.992 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   370.373 sec 1
-- 2)   conv2d_bias_add 178.914 sec 3298
-- 3)   predict 116.145 sec 9
-- 4)   conv2d_backward_filter  55.582 sec  1720
-- 5)   +*  18.948 sec  10320
-- 6)   sel+18.238 sec  3369
-- 7)   ba+*16.171 sec  5949
-- 8)   conv2d_backward_data15.038 sec  860
-- 9)   sp_mapmm13.980 sec  789
-- 10)  relu_maxpooling 12.415 sec  3298
{code}

With the IPA scalar replacement + constant folding updates, we've gained an 
additional ~300s, for a ~1.75x speedup in this

[jira] [Comment Edited] (SYSTEMML-1566) Possible regression from 0.13 -> 0.14 for MNIST LeNet script

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003543#comment-16003543
 ] 

Mike Dusenberry edited comment on SYSTEMML-1566 at 5/9/17 9:01 PM:
---

I ran experiments again using (1) the commit before the IPA scalar replacement 
update, (2) the commit with the IPA scalar replacement update, and (3) the 
proposed commit with the updated constant folding (which relies on the IPA 
update for usefulness), and measured the following results:

commit 2c5c3b14e1906cda70ae1581b19a5e908b3ab329 (pre IPA update)
{code}
17/05/05 14:39:49 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 712.183 sec.
Total compilation time: 1.996 sec.
Total execution time:   710.187 sec.
Number of compiled Spark inst:  134.
Number of executed Spark inst:  2513.
Cache hits (Mem, WB, FS, HDFS): 153624/0/0/862.
Cache writes (WB, FS, HDFS):79043/0/2170.
Cache times (ACQr/m, RLS, EXP): 32.052/0.038/5.508/55.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5979.
HOP DAGs recompile time:3.670 sec.
Functions recompiled:   10.
Functions recompile time:   0.082 sec.
Spark ctx create time (lazy):   0.959 sec.
Spark trans counts (par,bc,col):347/1649/862.
Spark trans times (par,bc,col): 0.671/25.076/31.988 secs.
Total JIT compile time: 118.9 sec.
Total JVM GC count: 267.
Total JVM GC time:  7.523 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   671.994 sec 1
-- 2)   conv2d_bias_add 198.398 sec 3298
-- 3)   maxpooling_backward 174.666 sec 1720
-- 4)   predict 140.782 sec 9
-- 5)   sp_mapmm94.035 sec  1649
-- 6)   conv2d_backward_filter  63.328 sec  1720
-- 7)   sp_sel+ 39.259 sec  860
-- 8)   ba+*18.615 sec  5089
-- 9)   +*  16.627 sec  10320
-- 10)  conv2d_backward_data14.297 sec  860
{code}

commit abc9686fbaaa11c12cfa02c49c7675165acdf176 (w/ IPA update)
{code}
17/05/05 15:05:16 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 673.900 sec.
Total compilation time: 1.938 sec.
Total execution time:   671.962 sec.
Number of compiled Spark inst:  128.
Number of executed Spark inst:  2513.
Cache hits (Mem, WB, FS, HDFS): 153645/0/0/862.
Cache writes (WB, FS, HDFS):79043/0/2149.
Cache times (ACQr/m, RLS, EXP): 31.568/0.038/4.639/54.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:3.705 sec.
Functions recompiled:   10.
Functions recompile time:   0.068 sec.
Spark ctx create time (lazy):   0.948 sec.
Spark trans counts (par,bc,col):368/1649/862.
Spark trans times (par,bc,col): 0.689/26.035/31.503 secs.
Total JIT compile time: 111.921 sec.
Total JVM GC count: 265.
Total JVM GC time:  7.118 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   634.306 sec 1
-- 2)   conv2d_bias_add 190.557 sec 3298
-- 3)   maxpooling_backward 141.588 sec 1720
-- 4)   predict 135.222 sec 9
-- 5)   sp_mapmm94.025 sec  1649
-- 6)   conv2d_backward_filter  66.058 sec  1720
-- 7)   sp_sel+ 39.204 sec  860
-- 8)   +*  18.272 sec  10320
-- 9)   ba+*15.804 sec  5089
-- 10)  conv2d_backward_data13.627 sec  860
{code}

w/ updated constant folding
{code}
17/05/05 15:15:19 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 405.615 sec.
Total compilation time: 2.070 sec.
Total execution time:   403.545 sec.
Number of compiled Spark inst:  139.
Number of executed Spark inst:  793.
Cache hits (Mem, WB, FS, HDFS): 156654/0/0/2.
Cache writes (WB, FS, HDFS):79043/0/8.
Cache times (ACQr/m, RLS, EXP): 3.467/0.043/3.566/1.175 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:3.178 sec.
Functions recompiled:   10.
Functions recompile time:   0.072 sec.
Spark ctx create time (lazy):   1.024 sec.
Spark trans counts (par,bc,col):789/789/2.
Spark trans times (par,bc,col): 0.982/0.299/3.418 secs.
Total JIT compile time: 145.368 sec.
Total JVM GC count: 438.
Total JVM GC time:  8.992 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   370.373 sec 1
-- 2)   conv2d_bias_add 178.914 sec 3298
-- 3)   predict 116.145 sec 9
-- 4)   conv2d_backward_filter  55.582 sec  1720
-- 5)   +*  18.948 sec  10320
-- 6)   sel+18.238 sec  3369
-- 7)   ba+*16.171 sec  5949
-- 8)   conv2d_backward_data15.038 sec  860
-- 9)   sp_mapmm13.980 sec  789
-- 10)  relu_maxpooling 12.415 sec  3298
{code}

It appears that there was a bug with the max pooling built-in operator that was 
fixed since the 0.14 release, which bro

[jira] [Commented] (SYSTEMML-1566) Possible regression from 0.13 -> 0.14 for MNIST LeNet script

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003543#comment-16003543
 ] 

Mike Dusenberry commented on SYSTEMML-1566:
---

I ran experiments again using (1) the commit before the IPA scalar replacement 
update, (2) the commit with the IPA scalar replacement update, and (3) the 
proposed commit with the updated constant folding (which relies on the IPA 
update for usefulness), and measured the following results:

commit 2c5c3b14e1906cda70ae1581b19a5e908b3ab329 (pre IPA update)
{code}
17/05/05 14:39:49 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 712.183 sec.
Total compilation time: 1.996 sec.
Total execution time:   710.187 sec.
Number of compiled Spark inst:  134.
Number of executed Spark inst:  2513.
Cache hits (Mem, WB, FS, HDFS): 153624/0/0/862.
Cache writes (WB, FS, HDFS):79043/0/2170.
Cache times (ACQr/m, RLS, EXP): 32.052/0.038/5.508/55.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5979.
HOP DAGs recompile time:3.670 sec.
Functions recompiled:   10.
Functions recompile time:   0.082 sec.
Spark ctx create time (lazy):   0.959 sec.
Spark trans counts (par,bc,col):347/1649/862.
Spark trans times (par,bc,col): 0.671/25.076/31.988 secs.
Total JIT compile time: 118.9 sec.
Total JVM GC count: 267.
Total JVM GC time:  7.523 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   671.994 sec 1
-- 2)   conv2d_bias_add 198.398 sec 3298
-- 3)   maxpooling_backward 174.666 sec 1720
-- 4)   predict 140.782 sec 9
-- 5)   sp_mapmm94.035 sec  1649
-- 6)   conv2d_backward_filter  63.328 sec  1720
-- 7)   sp_sel+ 39.259 sec  860
-- 8)   ba+*18.615 sec  5089
-- 9)   +*  16.627 sec  10320
-- 10)  conv2d_backward_data14.297 sec  860
{code}

commit abc9686fbaaa11c12cfa02c49c7675165acdf176 (w/ IPA update)
{code}
17/05/05 15:05:16 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 673.900 sec.
Total compilation time: 1.938 sec.
Total execution time:   671.962 sec.
Number of compiled Spark inst:  128.
Number of executed Spark inst:  2513.
Cache hits (Mem, WB, FS, HDFS): 153645/0/0/862.
Cache writes (WB, FS, HDFS):79043/0/2149.
Cache times (ACQr/m, RLS, EXP): 31.568/0.038/4.639/54.790 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:3.705 sec.
Functions recompiled:   10.
Functions recompile time:   0.068 sec.
Spark ctx create time (lazy):   0.948 sec.
Spark trans counts (par,bc,col):368/1649/862.
Spark trans times (par,bc,col): 0.689/26.035/31.503 secs.
Total JIT compile time: 111.921 sec.
Total JVM GC count: 265.
Total JVM GC time:  7.118 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   634.306 sec 1
-- 2)   conv2d_bias_add 190.557 sec 3298
-- 3)   maxpooling_backward 141.588 sec 1720
-- 4)   predict 135.222 sec 9
-- 5)   sp_mapmm94.025 sec  1649
-- 6)   conv2d_backward_filter  66.058 sec  1720
-- 7)   sp_sel+ 39.204 sec  860
-- 8)   +*  18.272 sec  10320
-- 9)   ba+*15.804 sec  5089
-- 10)  conv2d_backward_data13.627 sec  860
{code}

w/ updated constant folding
{code}
17/05/05 15:15:19 INFO ScriptExecutorUtils: SystemML Statistics:
Total elapsed time: 405.615 sec.
Total compilation time: 2.070 sec.
Total execution time:   403.545 sec.
Number of compiled Spark inst:  139.
Number of executed Spark inst:  793.
Cache hits (Mem, WB, FS, HDFS): 156654/0/0/2.
Cache writes (WB, FS, HDFS):79043/0/8.
Cache times (ACQr/m, RLS, EXP): 3.467/0.043/3.566/1.175 sec.
HOP DAGs recompiled (PRED, SB): 0/5978.
HOP DAGs recompile time:3.178 sec.
Functions recompiled:   10.
Functions recompile time:   0.072 sec.
Spark ctx create time (lazy):   1.024 sec.
Spark trans counts (par,bc,col):789/789/2.
Spark trans times (par,bc,col): 0.982/0.299/3.418 secs.
Total JIT compile time: 145.368 sec.
Total JVM GC count: 438.
Total JVM GC time:  8.992 sec.
Heavy hitter instructions (name, time, count):
-- 1)   train   370.373 sec 1
-- 2)   conv2d_bias_add 178.914 sec 3298
-- 3)   predict 116.145 sec 9
-- 4)   conv2d_backward_filter  55.582 sec  1720
-- 5)   +*  18.948 sec  10320
-- 6)   sel+18.238 sec  3369
-- 7)   ba+*16.171 sec  5949
-- 8)   conv2d_backward_data15.038 sec  860
-- 9)   sp_mapmm13.980 sec  789
-- 10)  relu_maxpooling 12.415 sec  3298
{code}

It appears that there was a bug with the max pooling built-in operator that was 
fixed since the 0.14 release, which brought the execution time down from ~1000s 
to ~

[jira] [Commented] (SYSTEMML-1594) Mlogit performance

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003335#comment-16003335
 ] 

Mike Dusenberry commented on SYSTEMML-1594:
---

That's a long time.  Do you happen to have the log of model performance during 
training?

> Mlogit performance
> --
>
> Key: SYSTEMML-1594
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1594
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms
>Affects Versions: SystemML 0.13
> Environment: --master yarn --deploy-mode client --driver-memory 10G 
> --executor-memory 100G --num-executors 2 --executor-cores 48
>Reporter: Brendan Dwyer
>Priority: Minor
>  Labels: performance
>
> Mlogit in SparkR with a dense csv file (~120,000,000 rows & 10 columns) takes 
> about 200 seconds while in SystemML it takes hours.
> stats when I killed the job:
> {code}
> Total elapsed time:   3810.682 sec.
> Total compilation time:   1.346 sec.
> Total execution time: 3809.336 sec.
> Number of compiled Spark inst:86.
> Number of executed Spark inst:199.
> Cache hits (Mem, WB, FS, HDFS):   3130/0/116/31.
> Cache writes (WB, FS, HDFS):  454/348/0.
> Cache times (ACQr/m, RLS, EXP):   427.049/0.007/644.593/0.000 sec.
> HOP DAGs recompiled (PRED, SB):   0/693.
> HOP DAGs recompile time:  0.482 sec.
> Spark ctx create time (lazy): 7.391 sec.
> Spark trans counts (par,bc,col):0/274/30.
> Spark trans times (par,bc,col):   0.000/491.867/176.149 secs.
> Total JIT compile time:   112.869 sec.
> Total JVM GC count:   1222.
> Total JVM GC time:306.026 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) sp_mapmmchain   1606.520 sec108
> -- 2) sp_mapmm367.025 sec 56
> -- 3) append  272.604 sec 29
> -- 4) sprop   239.554 sec 108
> -- 5) exp 238.915 sec 29
> -- 6) rangeReIndex217.167 sec 164
> -- 7) -   188.271 sec 318
> -- 8) /   164.405 sec 334
> -- 9) tak+*   138.371 sec 29
> -- 10)log 132.135 sec 30
> {code}
> Hops explain:
> {code}
> PROGRAM
> --MAIN PROGRAM
> GENERIC (lines 69-98) [recompile=true]
> --(4) TWrite fileB [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(6) TWrite fileLog [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(8) TWrite fmtB [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(10) TWrite intercept_status [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(12) TWrite regularization [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(15) TWrite maxiter [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(17) TWrite maxinneriter [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(19) TWrite tol [0,0,0,0,-1] [0,0,0 -> 0MB], CP
> --(21) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> --(23) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> --(34) PRead X [120748239,9,1000,1000,-1] [0,0,8291 -> 8291MB] [rblk], 
> SPARK
> --(35) TWrite X (34) [120748239,9,1000,1000,-1] [8291,0,0 -> 8291MB], 
> SPARK
> --(37) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> --(48) PRead Y_vec [120748239,1,1000,1000,-1] [0,0,921 -> 921MB] 
> [rblk,chkpt], CP
> --(49) TWrite Y_vec (48) [120748239,1,1000,1000,-1] [921,0,0 -> 921MB], CP
> --(51) TWrite eta0 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(53) TWrite eta1 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(55) TWrite eta2 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(57) TWrite sigma1 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(59) TWrite sigma2 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(61) TWrite sigma3 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(63) TWrite psi [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(65) TWrite N [0,0,0,0,-1] [0,0,0 -> 0MB], CP
> --(67) TWrite D [0,0,0,0,-1] [0,0,0 -> 0MB], CP
> GENERIC (lines 103-104) [recompile=true]
> --(76) TRead X [120748239,9,1000,1000,-1] [0,0,8291 -> 8291MB], SPARK
> --(85) dg(rand) [120748239,1,1000,1000,120748239] [0,1,921 -> 922MB], CP
> --(86) b(cbind) (76,85) [120748239,10,1000,1000,-1] [9212,0,9212 -> 
> 18425MB] [chkpt], SPARK
> --(87) TWrite X (86) [120748239,10,1000,1000,-1] [9212,0,0 -> 9212MB], 
> SPARK
> --(89) TWrite D [0,0,0,0,-1] [0,0,0 -> 0MB], CP
> GENERIC (lines 107-107) [recompile=false]
> --(90) TRead D [0,0,0,0,-1] [0,0,0 -> 0MB]
> --(98) dg(rand) (90) [10,1,1000,1000,-1] [0,0,0 -> 0MB], CP
> --(99) TWr

[jira] [Commented] (SYSTEMML-1561) Improve constant folding during compilation

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003330#comment-16003330
 ] 

Mike Dusenberry commented on SYSTEMML-1561:
---

[PR 484 | https://github.com/apache/incubator-systemml/pull/484] submitted.  
[~mboehm7] Can you please review when you get a chance?

> Improve constant folding during compilation
> ---
>
> Key: SYSTEMML-1561
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1561
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
> Fix For: SystemML 1.0
>
> Attachments: scenario1_plan.txt, scenario1.py, scenario2_plan.txt, 
> scenario2.py
>
>
> In our `nn` library, our convolution and pooling layers have to pass around 
> the spatial dimensions (height and width) of the images that are stretched 
> out into rows of the input/output matrices.  These output dimensions are 
> computed within the forward functions of the above layers as small scalar 
> equations.  From a mathematical standpoint, these sizes can be determined at 
> compile time, and it is nice to have these size equations in DML (vs. hiding 
> them inside the engine within built-in functions).  However, we do not 
> currently evaluate these expressions during compilation, and thus we are left 
> with unknown sizes even during recompilation.  This naturally leads to max 
> memory estimates and thus often leads to unnecessary distributed runtime ops 
> rather than simple CP ones.
> I have two related scenarios for which this is a problem.  They both involve 
> the {{Houtc1}} & {{Woutc1}} values that are returned from a 
> `conv2d::forward(...)` function.  These represent the spatial dimensions of 
> the volume within each of the rows of the output {{outc1}} of the function, and 
> the third dimension is {{F1}}.  Thus, {{outc1}} has a number of columns equal 
> to {{F1*Houtc1*Woutc1}}.
> In the first scenario ({{scenario1.py}}), a random matrix {{doutc1}} is 
> created that should have the same dimensions as {{outc1}}.  For the columns, 
> if I use {{cols=ncol(outc1)}} in this rand statement, the size will be 
> propagated and CP ops will be compiled and run.  If I instead use 
> {{cols=F1*Houtc1*Woutc1}}, the size will forever be unknown, even during 
> recompilation, and thus Spark ops will be compiled and run.  I have included 
> the recompile hops plan ({{scenario1_plan.txt}}).
> In the second scenario ({{scenario2.py}}), a {{max_pool2d::forward(...)}} 
> function is inserted after the {{conv2d::forward(...)}} function that 
> requires the {{Houtc1}} and {{Woutc1}} variables to be supplied as arguments. 
>  Since those latter variables are not executed during compilation time, the 
> max pooling sizes remain unknown, even during recompilation, and thus Spark 
> ops will be compiled and run.  I have included the recompile hops plan 
> ({{scenario2_plan.txt}}).
> We should either improve or fix our constant folding rewrites so that these 
> scenarios are fixed, as they are necessary for performant deep learning 
> applications.  Note too that this issue will be present in other non-deep 
> learning scenarios as well.
> Mailing list thread: 
> https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01657.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1595) Missing Block Sizes For PersistentWrites & TransientWrites

2017-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003329#comment-16003329
 ] 

Mike Dusenberry commented on SYSTEMML-1595:
---

cc [~mboehm7]

> Missing Block Sizes For PersistentWrites & TransientWrites
> --
>
> Key: SYSTEMML-1595
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1595
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: scenario1.dml
>
>
> In the attached script, the resulting PersistentWrites for {{doutc1_agg}} & 
> {{dWc1_agg}} end up having unknown block sizes, despite the input DAGs for 
> those variables having known block sizes.  Due to this, when we use MLContext 
> and mark those variables as outputs, the PersistentWrites will be rewritten 
> to TransientWrites, and the block sizes will remain unknown.
> To run:
> {code}
> spark-submit $SYSTEMML_HOME/target/SystemML.jar -f scenario1.dml -explain 
> recompile_hops
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1595) Missing Block Sizes For PersistentWrites & TransientWrites

2017-05-09 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-1595:
-

 Summary: Missing Block Sizes For PersistentWrites & TransientWrites
 Key: SYSTEMML-1595
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1595
 Project: SystemML
  Issue Type: Bug
Reporter: Mike Dusenberry
 Attachments: scenario1.dml

In the attached script, the resulting PersistentWrites for {{doutc1_agg}} & 
{{dWc1_agg}} end up having unknown block sizes, despite the input DAGs for 
those variables having known block sizes.  Due to this, when we use MLContext 
and mark those variables as outputs, the PersistentWrites will be rewritten to 
TransientWrites, and the block sizes will remain unknown.

To run:
{code}
spark-submit $SYSTEMML_HOME/target/SystemML.jar -f scenario1.dml -explain 
recompile_hops
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1594) Mlogit performance

2017-05-09 Thread Brendan Dwyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003102#comment-16003102
 ] 

Brendan Dwyer commented on SYSTEMML-1594:
-

I ran the job yesterday evening and let it finish:

{code}
17/05/08 21:38:48 INFO DMLScript: SystemML Statistics:
Total elapsed time: 18081.240 sec.
Total compilation time: 1.370 sec.
Total execution time:   18079.870 sec.
Number of compiled Spark inst:  86.
Number of executed Spark inst:  932.
Cache hits (Mem, WB, FS, HDFS): 15148/0/398/102.
Cache writes (WB, FS, HDFS):2060/1435/1.
Cache times (ACQr/m, RLS, EXP): 1374.417/0.022/2553.565/3.074 sec.
HOP DAGs recompiled (PRED, SB): 0/3371.
HOP DAGs recompile time:1.432 sec.
Spark ctx create time (lazy):   5.885 sec.
Spark trans counts (par,bc,col):0/1454/101.
Spark trans times (par,bc,col): 0.000/2932.903/546.381 secs.
Total JIT compile time: 170.597 sec.
Total JVM GC count: 4993.
Total JVM GC time:  1406.29 sec.
Heavy hitter instructions (name, time, count):
-- 1)   sp_mapmmchain   9438.160 sec627
-- 2)   sprop   1456.056 sec627
-- 3)   sp_mapmm1415.644 sec199
-- 4)   append  933.689 sec 100
-- 5)   rangeReIndex910.256 sec 825
-- 6)   exp 837.920 sec 100
-- 7)   log 711.848 sec 101
-- 8)   -   662.898 sec 1019
-- 9)   /   555.954 sec 1656
-- 10)  tak+*   468.617 sec 100
{code}

> Mlogit performance
> --
>
> Key: SYSTEMML-1594
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1594
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms
>Affects Versions: SystemML 0.13
> Environment: --master yarn --deploy-mode client --driver-memory 10G 
> --executor-memory 100G --num-executors 2 --executor-cores 48
>Reporter: Brendan Dwyer
>Priority: Minor
>  Labels: performance
>
> Mlogit in SparkR with a dense csv file (~120,000,000 rows & 10 columns) takes 
> about 200 seconds while in SystemML it takes hours.
> stats when I killed the job:
> {code}
> Total elapsed time:   3810.682 sec.
> Total compilation time:   1.346 sec.
> Total execution time: 3809.336 sec.
> Number of compiled Spark inst:86.
> Number of executed Spark inst:199.
> Cache hits (Mem, WB, FS, HDFS):   3130/0/116/31.
> Cache writes (WB, FS, HDFS):  454/348/0.
> Cache times (ACQr/m, RLS, EXP):   427.049/0.007/644.593/0.000 sec.
> HOP DAGs recompiled (PRED, SB):   0/693.
> HOP DAGs recompile time:  0.482 sec.
> Spark ctx create time (lazy): 7.391 sec.
> Spark trans counts (par,bc,col):0/274/30.
> Spark trans times (par,bc,col):   0.000/491.867/176.149 secs.
> Total JIT compile time:   112.869 sec.
> Total JVM GC count:   1222.
> Total JVM GC time:306.026 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) sp_mapmmchain   1606.520 sec108
> -- 2) sp_mapmm367.025 sec 56
> -- 3) append  272.604 sec 29
> -- 4) sprop   239.554 sec 108
> -- 5) exp 238.915 sec 29
> -- 6) rangeReIndex217.167 sec 164
> -- 7) -   188.271 sec 318
> -- 8) /   164.405 sec 334
> -- 9) tak+*   138.371 sec 29
> -- 10)log 132.135 sec 30
> {code}
> Hops explain:
> {code}
> PROGRAM
> --MAIN PROGRAM
> GENERIC (lines 69-98) [recompile=true]
> --(4) TWrite fileB [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(6) TWrite fileLog [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(8) TWrite fmtB [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(10) TWrite intercept_status [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(12) TWrite regularization [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(15) TWrite maxiter [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(17) TWrite maxinneriter [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(19) TWrite tol [0,0,0,0,-1] [0,0,0 -> 0MB], CP
> --(21) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> --(23) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> --(34) PRead X [120748239,9,1000,1000,-1] [0,0,8291 -> 8291MB] [rblk], 
> SPARK
> --(35) TWrite X (34) [120748239,9,1000,1000,-1] [8291,0,0 -> 8291MB], 
> SPARK
> --(37) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> --(48) PRead Y_vec [120748239,1,1000,1000,-1] [0,0,921 -> 921MB] 
> [rblk,chkpt], CP
> --(49) TWrite Y_vec (48) [120748239,1,1000,1000,-1] [921,0,0 -> 921MB], CP
> --(51) TWrite eta0 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> ---

[jira] [Created] (SYSTEMML-1594) Mlogit performance

2017-05-08 Thread Brendan Dwyer (JIRA)
Brendan Dwyer created SYSTEMML-1594:
---

 Summary: Mlogit performance
 Key: SYSTEMML-1594
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1594
 Project: SystemML
  Issue Type: Bug
  Components: Algorithms
Affects Versions: SystemML 0.13
 Environment: --master yarn --deploy-mode client --driver-memory 10G 
--executor-memory 100G --num-executors 2 --executor-cores 48
Reporter: Brendan Dwyer
Priority: Minor


Mlogit in SparkR with a dense csv file (~120,000,000 rows & 10 columns) takes 
about 200 seconds while in SystemML it takes hours.


stats when I killed the job:
{code}
Total elapsed time: 3810.682 sec.
Total compilation time: 1.346 sec.
Total execution time:   3809.336 sec.
Number of compiled Spark inst:  86.
Number of executed Spark inst:  199.
Cache hits (Mem, WB, FS, HDFS): 3130/0/116/31.
Cache writes (WB, FS, HDFS):454/348/0.
Cache times (ACQr/m, RLS, EXP): 427.049/0.007/644.593/0.000 sec.
HOP DAGs recompiled (PRED, SB): 0/693.
HOP DAGs recompile time:0.482 sec.
Spark ctx create time (lazy):   7.391 sec.
Spark trans counts (par,bc,col):0/274/30.
Spark trans times (par,bc,col): 0.000/491.867/176.149 secs.
Total JIT compile time: 112.869 sec.
Total JVM GC count: 1222.
Total JVM GC time:  306.026 sec.
Heavy hitter instructions (name, time, count):
-- 1)   sp_mapmmchain   1606.520 sec    108
-- 2)   sp_mapmm        367.025 sec     56
-- 3)   append          272.604 sec     29
-- 4)   sprop           239.554 sec     108
-- 5)   exp             238.915 sec     29
-- 6)   rangeReIndex    217.167 sec     164
-- 7)   -               188.271 sec     318
-- 8)   /               164.405 sec     334
-- 9)   tak+*           138.371 sec     29
-- 10)  log             132.135 sec     30
{code}

Hops explain:
{code}
PROGRAM
--MAIN PROGRAM
GENERIC (lines 69-98) [recompile=true]
--(4) TWrite fileB [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(6) TWrite fileLog [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(8) TWrite fmtB [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(10) TWrite intercept_status [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(12) TWrite regularization [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(15) TWrite maxiter [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(17) TWrite maxinneriter [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(19) TWrite tol [0,0,0,0,-1] [0,0,0 -> 0MB], CP
--(21) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
--(23) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
--(34) PRead X [120748239,9,1000,1000,-1] [0,0,8291 -> 8291MB] [rblk], SPARK
--(35) TWrite X (34) [120748239,9,1000,1000,-1] [8291,0,0 -> 8291MB], SPARK
--(37) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
--(48) PRead Y_vec [120748239,1,1000,1000,-1] [0,0,921 -> 921MB] 
[rblk,chkpt], CP
--(49) TWrite Y_vec (48) [120748239,1,1000,1000,-1] [921,0,0 -> 921MB], CP
--(51) TWrite eta0 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(53) TWrite eta1 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(55) TWrite eta2 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(57) TWrite sigma1 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(59) TWrite sigma2 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(61) TWrite sigma3 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(63) TWrite psi [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
--(65) TWrite N [0,0,0,0,-1] [0,0,0 -> 0MB], CP
--(67) TWrite D [0,0,0,0,-1] [0,0,0 -> 0MB], CP
GENERIC (lines 103-104) [recompile=true]
--(76) TRead X [120748239,9,1000,1000,-1] [0,0,8291 -> 8291MB], SPARK
--(85) dg(rand) [120748239,1,1000,1000,120748239] [0,1,921 -> 922MB], CP
--(86) b(cbind) (76,85) [120748239,10,1000,1000,-1] [9212,0,9212 -> 
18425MB] [chkpt], SPARK
--(87) TWrite X (86) [120748239,10,1000,1000,-1] [9212,0,0 -> 9212MB], SPARK
--(89) TWrite D [0,0,0,0,-1] [0,0,0 -> 0MB], CP
GENERIC (lines 107-107) [recompile=false]
--(90) TRead D [0,0,0,0,-1] [0,0,0 -> 0MB]
--(98) dg(rand) (90) [10,1,1000,1000,-1] [0,0,0 -> 0MB], CP
--(99) TWrite scale_lambda (98) [10,1,1000,1000,-1] [0,0,0 -> 0MB], CP
GENERIC (lines 110-110) [recompile=false]
--(108) TRead scale_lambda [10,1,1000,1000,-1] [0,0,0 -> 0MB], CP
--(107) TRead D [0,0,0,0,-1] [0,0,0 -> 0MB], CP
--(112) lix (108,107,107) [10,1,1000,1000,-1] [0,0,0 -> 0MB], CP
--(113) TWrite scale_lambda (112) [10,1,1000,1000,-1] [0,0,0 -> 0MB], CP
GENERIC (lines 124-126) [recompile=true]
--(174) TRead D [0,0,0,0,-1] [0,0,0 -> 0MB]
--(183) dg(rand) (174) [10,1,1000,1000,-1] [0,0,0 -> 0MB], CP
--(184) TWrite scale_X (183) [10,1,1000,1000,-1] [0,0,0 -> 0MB], CP
--(192) dg(rand) (174) [10,1,1000,1000,0] [0,0,0 -> 0MB], CP
--(193) TWrite shift_X (192) [10,1,1000,1000,0] [0,0,0 -> 0MB], CP
--(175) TRead X [120748239,10,1000,1000,-1] [0,0,9212 -> 9212M

[jira] [Updated] (SYSTEMML-1594) Mlogit performance

2017-05-08 Thread Brendan Dwyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brendan Dwyer updated SYSTEMML-1594:

Labels: performance  (was: )

> Mlogit performance
> --
>
> Key: SYSTEMML-1594
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1594
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms
>Affects Versions: SystemML 0.13
> Environment: --master yarn --deploy-mode client --driver-memory 10G 
> --executor-memory 100G --num-executors 2 --executor-cores 48
>Reporter: Brendan Dwyer
>Priority: Minor
>  Labels: performance
>
> Mlogit in SparkR with a dense csv file (~120,000,000 rows & 10 columns) takes 
> about 200 seconds while in SystemML it takes hours.
> stats when I killed the job:
> {code}
> Total elapsed time:   3810.682 sec.
> Total compilation time:   1.346 sec.
> Total execution time: 3809.336 sec.
> Number of compiled Spark inst:86.
> Number of executed Spark inst:199.
> Cache hits (Mem, WB, FS, HDFS):   3130/0/116/31.
> Cache writes (WB, FS, HDFS):  454/348/0.
> Cache times (ACQr/m, RLS, EXP):   427.049/0.007/644.593/0.000 sec.
> HOP DAGs recompiled (PRED, SB):   0/693.
> HOP DAGs recompile time:  0.482 sec.
> Spark ctx create time (lazy): 7.391 sec.
> Spark trans counts (par,bc,col):0/274/30.
> Spark trans times (par,bc,col):   0.000/491.867/176.149 secs.
> Total JIT compile time:   112.869 sec.
> Total JVM GC count:   1222.
> Total JVM GC time:306.026 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)  sp_mapmmchain   1606.520 sec    108
> -- 2)  sp_mapmm        367.025 sec     56
> -- 3)  append          272.604 sec     29
> -- 4)  sprop           239.554 sec     108
> -- 5)  exp             238.915 sec     29
> -- 6)  rangeReIndex    217.167 sec     164
> -- 7)  -               188.271 sec     318
> -- 8)  /               164.405 sec     334
> -- 9)  tak+*           138.371 sec     29
> -- 10) log             132.135 sec     30
> {code}
> Hops explain:
> {code}
> PROGRAM
> --MAIN PROGRAM
> GENERIC (lines 69-98) [recompile=true]
> --(4) TWrite fileB [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(6) TWrite fileLog [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(8) TWrite fmtB [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(10) TWrite intercept_status [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(12) TWrite regularization [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(15) TWrite maxiter [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(17) TWrite maxinneriter [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(19) TWrite tol [0,0,0,0,-1] [0,0,0 -> 0MB], CP
> --(21) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> --(23) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> --(34) PRead X [120748239,9,1000,1000,-1] [0,0,8291 -> 8291MB] [rblk], 
> SPARK
> --(35) TWrite X (34) [120748239,9,1000,1000,-1] [8291,0,0 -> 8291MB], 
> SPARK
> --(37) u(print) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
> --(48) PRead Y_vec [120748239,1,1000,1000,-1] [0,0,921 -> 921MB] 
> [rblk,chkpt], CP
> --(49) TWrite Y_vec (48) [120748239,1,1000,1000,-1] [921,0,0 -> 921MB], CP
> --(51) TWrite eta0 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(53) TWrite eta1 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(55) TWrite eta2 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(57) TWrite sigma1 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(59) TWrite sigma2 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(61) TWrite sigma3 [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(63) TWrite psi [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
> --(65) TWrite N [0,0,0,0,-1] [0,0,0 -> 0MB], CP
> --(67) TWrite D [0,0,0,0,-1] [0,0,0 -> 0MB], CP
> GENERIC (lines 103-104) [recompile=true]
> --(76) TRead X [120748239,9,1000,1000,-1] [0,0,8291 -> 8291MB], SPARK
> --(85) dg(rand) [120748239,1,1000,1000,120748239] [0,1,921 -> 922MB], CP
> --(86) b(cbind) (76,85) [120748239,10,1000,1000,-1] [9212,0,9212 -> 
> 18425MB] [chkpt], SPARK
> --(87) TWrite X (86) [120748239,10,1000,1000,-1] [9212,0,0 -> 9212MB], 
> SPARK
> --(89) TWrite D [0,0,0,0,-1] [0,0,0 -> 0MB], CP
> GENERIC (lines 107-107) [recompile=false]
> --(90) TRead D [0,0,0,0,-1] [0,0,0 -> 0MB]
> --(98) dg(rand) (90) [10,1,1000,1000,-1] [0,0,0 -> 0MB], CP
> --(99) TWrite scale_lambda (98) [10,1,1000,1000,-1] [0,0,0 -> 0MB], CP
> GENERIC (lines 110-110) [recompile=false]
> --(1

[jira] [Created] (SYSTEMML-1593) Performance issues rexpand to ultra-sparse matrix

2017-05-08 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1593:


 Summary: Performance issues rexpand to ultra-sparse matrix
 Key: SYSTEMML-1593
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1593
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


For a detailed description see 
https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01741.html

The issue is caused by (1) wrong input partitioning (small vector input to huge 
output only leverages a small degree of parallelism), and (2) unnecessary 
shuffle. 
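For illustration, a minimal MLContext sketch of this scenario (not part of the original report; the Java driver and sizes below are assumptions): a small input vector is expanded via the table(seq(1,nrow(X)), X) pattern into a huge, ultra-sparse output, which is the kind of rexpand expansion described above.

{code}
import org.apache.spark.sql.SparkSession;
import org.apache.sysml.api.mlcontext.MLContext;
import org.apache.sysml.api.mlcontext.Script;
import org.apache.sysml.api.mlcontext.ScriptFactory;

public class RexpandSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("RexpandSketch").getOrCreate();
    MLContext ml = new MLContext(spark.sparkContext());

    // Small vector in, huge ultra-sparse 0/1 matrix out: one nonzero per row,
    // i.e., roughly 1e6 nonzeros in a 1e6 x 1e6 output.
    Script s = ScriptFactory.dml(
        "X = round(rand(rows=1000000, cols=1, min=1, max=1000000));\n"
      + "E = table(seq(1, nrow(X)), X);\n"  // expansion of a small vector into a huge output
      + "print(sum(E));");
    ml.execute(s);
  }
}
{code}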



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (SYSTEMML-1584) Add Scala Script.out(...) method that accepts Lists.

2017-05-08 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson updated SYSTEMML-1584:
-
Issue Type: Improvement  (was: Bug)

> Add Scala Script.out(...) method that accepts Lists.
> 
>
> Key: SYSTEMML-1584
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1584
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>Assignee: Deron Eriksson
>
> Currently, we have {{Script.out(...)}} methods that accept one or many String 
> arguments.  We should add a method that also accepts a list of Strings.  This 
> would make it easier to define a map of inputs and a list of outputs in one 
> location, and then pass these into the appropriate {{in}} and {{out}} methods 
> elsewhere.  Likewise, it would be nice to be able to use the same list to get 
> the results after running an algorithm, although of course this may not be 
> possible with the current {{getTuple}} methods.
> cc [~deron]
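A hypothetical usage sketch of the requested addition (the List-accepting overload shown commented out does not exist yet; only the String-based forms do):

{code}
import java.util.Arrays;
import java.util.List;
import org.apache.sysml.api.mlcontext.Script;
import org.apache.sysml.api.mlcontext.ScriptFactory;

public class OutListSketch {
  public static void main(String[] args) {
    Script script = ScriptFactory.dml("m = matrix(1, rows=2, cols=2); s = sum(m);");

    // Existing API: one or many String arguments.
    script.out("m", "s");

    // Requested (hypothetical, not yet implemented): accept a List, so the same
    // collection could later drive result retrieval after execution.
    List<String> outputs = Arrays.asList("m", "s");
    // script.out(outputs);
  }
}
{code}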



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-1584) Add Scala Script.out(...) method that accepts Lists.

2017-05-08 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson reassigned SYSTEMML-1584:


Assignee: Deron Eriksson

> Add Scala Script.out(...) method that accepts Lists.
> 
>
> Key: SYSTEMML-1584
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1584
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Deron Eriksson
>
> Currently, we have {{Script.out(...)}} methods that accept one or many String 
> arguments.  We should add a method that also accepts a list of Strings.  This 
> would make it easier to define a map of inputs and a list of outputs in one 
> location, and then pass these into the appropriate {{in}} and {{out}} methods 
> elsewhere.  Likewise, it would be nice to be able to use the same list to get 
> the results after running an algorithm, although of course this may not be 
> possible with the current {{getTuple}} methods.
> cc [~deron]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1559) Update pom to allow SystemML to be used as a library

2017-05-08 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1559.


> Update pom to allow SystemML to be used as a library
> 
>
> Key: SYSTEMML-1559
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1559
> Project: SystemML
>  Issue Type: Improvement
>  Components: Build
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Allow a user to use SystemML as a Java-style library using maven coordinates, 
> such as:
> {code}
> <dependency>
>   <groupId>org.apache.systemml</groupId>
>   <artifactId>systemml</artifactId>
>   <version>1.0.0-incubating-SNAPSHOT</version>
> </dependency>
> {code}
> Currently additional dependencies need to be manually added. Ideally a user 
> can add a single systemml dependency to the user's project's pom without 
> adding additional dependencies.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1559) Update pom to allow SystemML to be used as a library

2017-05-08 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1559.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Addressed by [PR470|https://github.com/apache/incubator-systemml/pull/470].

> Update pom to allow SystemML to be used as a library
> 
>
> Key: SYSTEMML-1559
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1559
> Project: SystemML
>  Issue Type: Improvement
>  Components: Build
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Allow a user to use SystemML as a Java-style library using maven coordinates, 
> such as:
> {code}
> <dependency>
>   <groupId>org.apache.systemml</groupId>
>   <artifactId>systemml</artifactId>
>   <version>1.0.0-incubating-SNAPSHOT</version>
> </dependency>
> {code}
> Currently additional dependencies need to be manually added. Ideally a user 
> can add a single systemml dependency to the user's project's pom without 
> adding additional dependencies.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1511) Tab completion for scripts using MLContext

2017-05-08 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1511.


> Tab completion for scripts using MLContext
> --
>
> Key: SYSTEMML-1511
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1511
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Allow tab completion for navigating script folders, obtaining scripts, and 
> executing functions using MLContext API.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1511) Tab completion for scripts using MLContext

2017-05-08 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1511.
--
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Addressed by [PR473|https://github.com/apache/incubator-systemml/pull/473].


> Tab completion for scripts using MLContext
> --
>
> Key: SYSTEMML-1511
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1511
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 1.0
>
>
> Allow tab completion for navigating script folders, obtaining scripts, and 
> executing functions using MLContext API.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1451) Automate performance testing and reporting

2017-05-08 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001236#comment-16001236
 ] 

Nakul Jindal commented on SYSTEMML-1451:


Let's start with those two. But I think people would be interested in more 
eventually, and you should attempt to work on them, time permitting.

> Automate performance testing and reporting
> --
>
> Key: SYSTEMML-1451
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1451
> Project: SystemML
>  Issue Type: Improvement
>  Components: Infrastructure, Test
>Reporter: Nakul Jindal
>  Labels: gsoc2017, mentor, performance, reporting, testing
>
> As part of a release (and in general), performance tests are run for SystemML.
> Currently, running and reporting on these performance tests are a manual 
> process. There are helper scripts, but largely the process is manual.
> The aim of this GSoC 2017 project is to automate performance testing and its 
> reporting.
> These are the tasks that this entails
> 1. Automate running of the performance tests, including generation of test 
> data
> 2. Detect errors and report if any
> 3. Record performance benchmarking information
> 4. Automatically compare this performance to previous version to check for 
> performance regressions
> 5. Automatically compare to Spark MLLib, R?, Julia?
> 6. Prepare report with all the information about failed jobs, performance 
> information, perf info against other comparable projects/algorithms 
> (plotted/in plain text in CSV, PDF or other common format)
> 7. Create scripts to automatically run this process on a cloud provider that 
> spins up machines, runs the test, saves the reports and spins down the 
> machines.
> 8. Create a web application to do this interactively without dropping down 
> into a shell.
> As part of this project, the student will need to know scripting (in Bash, 
> Python, etc). It may also involve changing error reporting and performance 
> reporting code in SystemML. 
> Rating - Medium (for the amount of work)
> Mentor - [~nakul02] (Other co-mentors will join in)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1583) Implement converter in Python to convert caffemodel in SystemML format

2017-05-08 Thread Frederick Reiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000968#comment-16000968
 ] 

Frederick Reiss commented on SYSTEMML-1583:
---

I presume that the conversion from a .caffemodel file to our binary block 
format is bottlenecked on the I/O system, is it not? Back-of-the-envelope, even 
large models of hundreds of megabytes should take less than a second to convert.

I recommend skipping the proprietary binary coefficient format and disk layout. 
If you need to generate the files in the background as a short-term expedient, 
that's ok for now, but at least don't force the user to create and manage them. 
And let's use the actual Caffe model zoo at 
[https://github.com/BVLC/caffe/wiki/Model-Zoo], not create an out-of-date 
partial copy on Github.

> Implement converter in Python to convert caffemodel in SystemML format
> --
>
> Key: SYSTEMML-1583
>     URL: https://issues.apache.org/jira/browse/SYSTEMML-1583
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>Assignee: Arvind Surve
>
> Ideally, this converter should not require Caffe to be installed. Please 
> see 
> http://stackoverflow.com/questions/37572948/extracting-weights-from-caffemodel-without-caffe-installed-in-python
> An example code to convert a caffe model to csv if caffe is installed:
> {code}
> import caffe
> import numpy as np
> #net = caffe.Net('/home/biuser/nike/barista/VGG_ILSVRC_19_layers_train_val.prototxt', caffe.TEST)
> net = caffe.Net('/home/biuser/VGG_trained_models/VGG_ILSVRC_19_layers_deploy.prototxt', '/home/biuser/VGG_trained_models/VGG_ILSVRC_19_layers.caffemodel', caffe.TEST)
> #surgery.transplant(net, base_net)
> for l in [ "conv1_1", "conv1_2", "conv2_1", "conv2_2", "conv3_1", "conv3_2", "conv3_3", "conv3_4", "conv4_1", "conv4_2", "conv4_3", "conv4_4", "conv5_1", "conv5_2", "conv5_3", "conv5_4", "fc6", "fc7", "fc8" ]:
>     w = net.params[l][0].data
>     w = w.reshape(w.shape[0], -1)
>     b = net.params[l][1].data
>     b = b.reshape(b.shape[0], -1)
>     # You may have to reshape it for fc layers
>     np.savetxt("VGG_trained_models/" + l + "_weight.csv", w, delimiter=",")
>     np.savetxt("VGG_trained_models/" + l + "_bias.csv", b, delimiter=",")
> {code}
> Here is an example pyspark script to test this JIRA:
> {code}
> from systemml.mllearn import Caffe2DML
> from pyspark.sql import SQLContext
> import numpy as np
> import urllib, os, scipy.ndimage
> from PIL import Image
> import systemml as sml
> # ImageNet specific parameters
> img_shape = (3, 224, 224)
> # Downloads a jpg image, resizes it to 224 and return as numpy array in N X CHW format
> url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/MountainLion.jpg/312px-MountainLion.jpg'
> outFile = 'test.jpg'
> urllib.urlretrieve(url, outFile)
> input_image = sml.convertImageToNumPyArr(Image.open(outFile), img_shape=img_shape)
> # Download the ResNet network
> import urllib
> urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/resnet/ilsvrc12/ResNet_50_network.proto', 'ResNet_50_network.proto')
> urllib.urlretrieve('https://raw.githubusercontent.com/niketanpansare/model_zoo/master/caffe/vision/resnet/ilsvrc12/ResNet_50_solver.proto', 'ResNet_50_solver.proto')
> home_dir = os.path.expanduser('~')
> # let's assume that this function is implemented as saveAsBinaryBlock(inputCaffeModel, outputDir)
> resnet_pretrained_weight_dir = os.path.join(home_dir, 'model_zoo', 'caffe', 'vision', 'resnet', 'ilsvrc12', 'ResNet_50_pretrained_weights')
> urllib.urlretrieve('https://deepdetect.com/models/resnet/ResNet-50-model.caffemodel', 'ResNet-50-model.caffemodel')
> ###
> # To be implemented as part of this JIRA
> sml.saveAsBinaryBlock('ResNet-50-model.caffemodel', resnet_pretrained_weight_dir)
> ###
> resnet = Caffe2DML(sqlCtx, solver='ResNet_50_solver.proto', weights=resnet_pretrained_weight_dir, input_shape=img_shape)
> resnet.predict(input_image)
> # This should return array(['cougar, puma, catamount, mountain lion, painter, panther, Felis '], dtype='|S64')
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1451) Automate performance testing and reporting

2017-05-07 Thread Krishna Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000335#comment-16000335
 ] 

Krishna Kalyan commented on SYSTEMML-1451:
--

According to the proposal I have mentioned that the following Performance 
Metrics need to be captured
a) Runtime 
b) Number of Errors
Please let me know if anything else needs to be included.
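A minimal sketch of capturing those two metrics for a single run (the spark-submit command line and the log patterns below are illustrative assumptions, not the project's actual perf-test harness):

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.concurrent.TimeUnit;

public class PerfRunSketch {
  public static void main(String[] args) throws Exception {
    long start = System.nanoTime();
    // Illustrative command; the real perf tests are driven by the existing helper scripts.
    Process p = new ProcessBuilder("spark-submit", "SystemML.jar", "-f", "scripts/algorithms/GLM.dml")
        .redirectErrorStream(true).start();

    int errors = 0;
    try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
      String line;
      while ((line = r.readLine()) != null)
        if (line.contains("ERROR") || line.contains("Exception"))
          errors++;  // metric (b): number of errors
    }
    p.waitFor();

    long runtimeMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);  // metric (a): runtime
    System.out.println("runtime_ms=" + runtimeMs + " errors=" + errors);
  }
}
{code}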

> Automate performance testing and reporting
> --
>
> Key: SYSTEMML-1451
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1451
> Project: SystemML
>  Issue Type: Improvement
>  Components: Infrastructure, Test
>Reporter: Nakul Jindal
>  Labels: gsoc2017, mentor, performance, reporting, testing
>
> As part of a release (and in general), performance tests are run for SystemML.
> Currently, running and reporting on these performance tests are a manual 
> process. There are helper scripts, but largely the process is manual.
> The aim of this GSoC 2017 project is to automate performance testing and its 
> reporting.
> These are the tasks that this entails
> 1. Automate running of the performance tests, including generation of test 
> data
> 2. Detect errors and report if any
> 3. Record performance benchmarking information
> 4. Automatically compare this performance to previous version to check for 
> performance regressions
> 5. Automatically compare to Spark MLLib, R?, Julia?
> 6. Prepare report with all the information about failed jobs, performance 
> information, perf info against other comparable projects/algorithms 
> (plotted/in plain text in CSV, PDF or other common format)
> 7. Create scripts to automatically run this process on a cloud provider that 
> spins up machines, runs the test, saves the reports and spins down the 
> machines.
> 8. Create a web application to do this interactively without dropping down 
> into a shell.
> As part of this project, the student will need to know scripting (in Bash, 
> Python, etc). It may also involve changing error reporting and performance 
> reporting code in SystemML. 
> Rating - Medium (for the amount of work)
> Mentor - [~nakul02] (Other co-mentors will join in)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SYSTEMML-1451) Automate performance testing and reporting

2017-05-07 Thread Krishna Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000335#comment-16000335
 ] 

Krishna Kalyan edited comment on SYSTEMML-1451 at 5/8/17 6:43 AM:
--

According to the proposal I have mentioned that the following Performance 
Metrics need to be captured
a) Runtime 
b) Number of Errors
Please let me know if anything else needs to be included.


was (Author: krishnakalyan3):
According to the proposal I have mentioned that the following Performance 
Metrics need to be captured
a) Runtime 
b) Number of Errors
Please let me know if anything else needs to be included.

> Automate performance testing and reporting
> --
>
> Key: SYSTEMML-1451
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1451
> Project: SystemML
>  Issue Type: Improvement
>  Components: Infrastructure, Test
>Reporter: Nakul Jindal
>  Labels: gsoc2017, mentor, performance, reporting, testing
>
> As part of a release (and in general), performance tests are run for SystemML.
> Currently, running and reporting on these performance tests are a manual 
> process. There are helper scripts, but largely the process is manual.
> The aim of this GSoC 2017 project is to automate performance testing and its 
> reporting.
> These are the tasks that this entails
> 1. Automate running of the performance tests, including generation of test 
> data
> 2. Detect errors and report if any
> 3. Record performance benchmarking information
> 4. Automatically compare this performance to previous version to check for 
> performance regressions
> 5. Automatically compare to Spark MLLib, R?, Julia?
> 6. Prepare report with all the information about failed jobs, performance 
> information, perf info against other comparable projects/algorithms 
> (plotted/in plain text in CSV, PDF or other common format)
> 7. Create scripts to automatically run this process on a cloud provider that 
> spins up machines, runs the test, saves the reports and spins down the 
> machines.
> 8. Create a web application to do this interactively without dropping down 
> into a shell.
> As part of this project, the student will need to know scripting (in Bash, 
> Python, etc). It may also involve changing error reporting and performance 
> reporting code in SystemML. 
> Rating - Medium (for the amount of work)
> Mentor - [~nakul02] (Other co-mentors will join in)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1591) Improve efficiency sparse-unsafe cellwise operations

2017-05-07 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1591.


> Improve efficiency sparse-unsafe cellwise operations
> 
>
> Key: SYSTEMML-1591
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1591
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> For sparse-unsafe cellwise operations, we currently iterate over all cells 
> and use binary search to access the individual values. This is unnecessarily 
> inefficient and should be reworked in favor of a sequential scan with gap 
> handling, which would also allow us to consolidate the different code paths 
> for sparse-safe and -unsafe operations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1591) Improve efficiency sparse-unsafe cellwise operations

2017-05-07 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1591.
--
   Resolution: Done
 Assignee: Matthias Boehm
Fix Version/s: (was: SystemML 0.14)
   SystemML 1.0

> Improve efficiency sparse-unsafe cellwise operations
> 
>
> Key: SYSTEMML-1591
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1591
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> For sparse-unsafe cellwise operations, we currently iterate over all cells 
> and use binary search to access the individual values. This is unnecessarily 
> inefficient and should be reworked in favor of a sequential scan with gap 
> handling, which would also allow us to consolidate the different code paths 
> for sparse-safe and -unsafe operations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1590) Codegen crashes for unsupported row aggregates

2017-05-07 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1590.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Codegen crashes for unsupported row aggregates
> --
>
> Key: SYSTEMML-1590
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1590
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.14
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> For row aggregate patterns with unsupported aggregation functions such as 
> {{rowIndexMax(X)}}, codegen currently crashes with exceptions as follows:
> {code}
> Caused by: java.lang.RuntimeException: 8 ua(maxindexR)
>   at 
> org.apache.sysml.hops.codegen.template.TemplateRow.rConstructCplan(TemplateRow.java:300)
>   at 
> org.apache.sysml.hops.codegen.template.TemplateRow.rConstructCplan(TemplateRow.java:157)
>   at 
> org.apache.sysml.hops.codegen.template.TemplateRow.rConstructCplan(TemplateRow.java:157)
>   at 
> org.apache.sysml.hops.codegen.template.TemplateRow.constructCplan(TemplateRow.java:124)
>   at 
> org.apache.sysml.hops.codegen.SpoofCompiler.rConstructCPlans(SpoofCompiler.java:561)
>   at 
> org.apache.sysml.hops.codegen.SpoofCompiler.rConstructCPlans(SpoofCompiler.java:573)
>   at 
> org.apache.sysml.hops.codegen.SpoofCompiler.constructCPlans(SpoofCompiler.java:477)
>   at 
> org.apache.sysml.hops.codegen.SpoofCompiler.optimize(SpoofCompiler.java:346)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1590) Codegen crashes for unsupported row aggregates

2017-05-07 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1590.


> Codegen crashes for unsupported row aggregates
> --
>
> Key: SYSTEMML-1590
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1590
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.14
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> For row aggregate patterns with unsupported aggregation functions such as 
> {{rowIndexMax(X)}}, codegen currently crashes with exceptions as follows:
> {code}
> Caused by: java.lang.RuntimeException: 8 ua(maxindexR)
>   at 
> org.apache.sysml.hops.codegen.template.TemplateRow.rConstructCplan(TemplateRow.java:300)
>   at 
> org.apache.sysml.hops.codegen.template.TemplateRow.rConstructCplan(TemplateRow.java:157)
>   at 
> org.apache.sysml.hops.codegen.template.TemplateRow.rConstructCplan(TemplateRow.java:157)
>   at 
> org.apache.sysml.hops.codegen.template.TemplateRow.constructCplan(TemplateRow.java:124)
>   at 
> org.apache.sysml.hops.codegen.SpoofCompiler.rConstructCPlans(SpoofCompiler.java:561)
>   at 
> org.apache.sysml.hops.codegen.SpoofCompiler.rConstructCPlans(SpoofCompiler.java:573)
>   at 
> org.apache.sysml.hops.codegen.SpoofCompiler.constructCPlans(SpoofCompiler.java:477)
>   at 
> org.apache.sysml.hops.codegen.SpoofCompiler.optimize(SpoofCompiler.java:346)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1592) Improve handling of sparse outputs and sideway inputs

2017-05-07 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1592:


 Summary: Improve handling of sparse outputs and sideway inputs
 Key: SYSTEMML-1592
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1592
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1591) Improve efficiency sparse-unsafe cellwise operations

2017-05-07 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1591:


 Summary: Improve efficiency sparse-unsafe cellwise operations
 Key: SYSTEMML-1591
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1591
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm


For sparse-unsafe cellwise operations, we currently iterate over all cells and 
use binary search to access the individual values. This is unnecessarily 
inefficient and should be reworked in favor of a sequential scan with gap 
handling, which would also allow us to consolidate the different code paths for 
sparse-safe and -unsafe operations.
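As a rough illustration of the intended direction (a conceptual sketch only, not the actual SystemML runtime code), a sequential scan over one sparse row that handles the gaps of zeros explicitly, so a sparse-unsafe function is applied to every cell without per-cell binary search:

{code}
/** Conceptual sketch: apply a sparse-unsafe unary function to all cells of one sparse row. */
public class SparseUnsafeRowScan {
  interface UnaryFn { double apply(double v); }

  // aix/avals hold the column indexes and values of the alen nonzeros of the row.
  static void scanRow(int[] aix, double[] avals, int alen, int ncol, UnaryFn fn, double[] out) {
    double fnZero = fn.apply(0);               // sparse-unsafe: f(0) may be nonzero
    int prev = 0;
    for (int k = 0; k < alen; k++) {
      for (int j = prev; j < aix[k]; j++)      // gap of zeros before the next nonzero
        out[j] = fnZero;
      out[aix[k]] = fn.apply(avals[k]);        // the nonzero cell itself
      prev = aix[k] + 1;
    }
    for (int j = prev; j < ncol; j++)          // trailing gap of zeros
      out[j] = fnZero;
  }

  public static void main(String[] args) {
    int[] aix = {1, 4};
    double[] avals = {2.0, -3.0};
    double[] out = new double[6];
    scanRow(aix, avals, aix.length, out.length, v -> v + 1, out);  // f(x) = x + 1 is sparse-unsafe
    System.out.println(java.util.Arrays.toString(out));            // [1.0, 3.0, 1.0, 1.0, -2.0, 1.0]
  }
}
{code}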



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1589) conv2d_bias_add fails w/ NPE on lenet with random data

2017-05-07 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare closed SYSTEMML-1589.
-
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Closed by the commit 
https://github.com/apache/incubator-systemml/commit/6863632088c8d0b548a17413692b399d512a991d

> conv2d_bias_add fails w/ NPE on lenet with random data
> --
>
> Key: SYSTEMML-1589
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1589
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Niketan Pansare
> Fix For: SystemML 1.0
>
>
> The lenet dml script fails with a null pointer exception for random multi 
> class data, generated with
> {code}
> X_full = rand(rows=6,cols=784);
> y_full = round(rand(rows=nrow(X_full), cols=1, min=1, max=10));
> {code}
> The detailed stacktrace is as follows:
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.sysml.runtime.matrix.data.LibMatrixDNN.getRowInDenseFormat(LibMatrixDNN.java:1355)
> at 
> org.apache.sysml.runtime.matrix.data.LibMatrixDNN.doIm2colSparse(LibMatrixDNN.java:1382)
> at 
> org.apache.sysml.runtime.matrix.data.LibMatrixDNN.doIm2col(LibMatrixDNN.java:1421)
> at 
> org.apache.sysml.runtime.matrix.data.LibMatrixDNN.doLoopedIm2ColConv2d(LibMatrixDNN.java:406)
> at 
> org.apache.sysml.runtime.matrix.data.LibMatrixDNN.access$400(LibMatrixDNN.java:51)
> at 
> org.apache.sysml.runtime.matrix.data.LibMatrixDNN$ConvTask.call(LibMatrixDNN.java:1143)
> at 
> org.apache.sysml.runtime.matrix.data.LibMatrixDNN$ConvTask.call(LibMatrixDNN.java:1076)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1573) Incorporate ALLOW_OPERATOR_FUSION in ConvolutionOp for developer testing

2017-05-07 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare closed SYSTEMML-1573.
-
   Resolution: Fixed
 Assignee: Niketan Pansare
Fix Version/s: SystemML 1.0

Closed by the commit 
https://github.com/apache/incubator-systemml/commit/6c215e700c1855074228972f952663663f6eabaa.

> Incorporate ALLOW_OPERATOR_FUSION in ConvolutionOp for developer testing
> 
>
> Key: SYSTEMML-1573
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1573
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1590) Codegen crashes for unsupported row aggregates

2017-05-07 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1590:


 Summary: Codegen crashes for unsupported row aggregates
 Key: SYSTEMML-1590
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1590
 Project: SystemML
  Issue Type: Bug
Affects Versions: SystemML 0.14
Reporter: Matthias Boehm


For row aggregate patterns with unsupported aggregation functions such as 
{{rowIndexMax(X)}}, codegen currently crashes with exceptions as follows:

{code}
Caused by: java.lang.RuntimeException: 8 ua(maxindexR)
at 
org.apache.sysml.hops.codegen.template.TemplateRow.rConstructCplan(TemplateRow.java:300)
at 
org.apache.sysml.hops.codegen.template.TemplateRow.rConstructCplan(TemplateRow.java:157)
at 
org.apache.sysml.hops.codegen.template.TemplateRow.rConstructCplan(TemplateRow.java:157)
at 
org.apache.sysml.hops.codegen.template.TemplateRow.constructCplan(TemplateRow.java:124)
at 
org.apache.sysml.hops.codegen.SpoofCompiler.rConstructCPlans(SpoofCompiler.java:561)
at 
org.apache.sysml.hops.codegen.SpoofCompiler.rConstructCPlans(SpoofCompiler.java:573)
at 
org.apache.sysml.hops.codegen.SpoofCompiler.constructCPlans(SpoofCompiler.java:477)
at 
org.apache.sysml.hops.codegen.SpoofCompiler.optimize(SpoofCompiler.java:346)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1527) Use top-level algorithm scripts for application tests

2017-05-07 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000196#comment-16000196
 ] 

Matthias Boehm commented on SYSTEMML-1527:
--

yes, you could simply (1) delete ./src/test/scripts/applications/glm/GLM.dml, 
(2) modify test.integration.applications.GLMTest to point to 
{{fullDMLScriptName = "scripts/algorithms/GLM.dml"}} if the script type is dml, 
and (3) modify the R script if needed. Other algorithms might also require 
changing the input parameters from positional arguments to named arguments.

I would recommend to first handle all dml scripts and disregard pydml for now. 
Furthermore, you might want to think about a nice abstraction such as 
{{AutomatedTestBase.getScript()}} to ensure consistency across the algorithms. 
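A rough sketch of what step (2) and the suggested abstraction could look like (the class name, argument values, and the getScript() helper below are hypothetical; only fullDMLScriptName and AutomatedTestBase come from the existing test harness):

{code}
// Hypothetical fragment of an application test after the change: the test points
// directly at the top-level algorithm script instead of a replicated copy.
public abstract class GLMTestSketch /* extends AutomatedTestBase in the real harness */ {

  protected String fullDMLScriptName;

  protected void runGLMTest(boolean isDml) {
    if (isDml) {
      fullDMLScriptName = getScript("GLM");  // step (2): scripts/algorithms/GLM.dml
    }
    // Named arguments instead of positional ones (illustrative values only),
    // which would be passed to runTest(...) in the real harness.
    String[] programArgs = new String[] {
        "-nvargs", "X=in/X.mtx", "Y=in/Y.mtx", "B=out/B.mtx", "moi=5", "mii=5"};
    // runTest(...);  // provided by AutomatedTestBase
  }

  /** Suggested abstraction: resolve top-level algorithm scripts consistently. */
  protected String getScript(String algorithm) {
    return "scripts/algorithms/" + algorithm + ".dml";
  }
}
{code}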

> Use top-level algorithm scripts for application tests
> -
>
> Key: SYSTEMML-1527
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1527
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Matthias Boehm
>
> There are numerous dml and pydml application tests that aim to test our 
> existing algorithms. However, these tests use replicated (and mostly 
> outdated) scripts. This task aims to remove the duplicated dml and pydml 
> scripts and to refer directly to the existing algorithm tests. This also 
> includes the update of R comparison scripts.
> See SYSTEMML-1363 for examples.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SYSTEMML-1527) Use top-level algorithm scripts for application tests

2017-05-07 Thread Krishna Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000193#comment-16000193
 ] 

Krishna Kalyan edited comment on SYSTEMML-1527 at 5/8/17 2:25 AM:
--

Hello [~mboehm7],
Could you please help me with some questions related to this tasks below.

Lets take for example GLM algorithm
Essentially both the files are the same except for some minor differences
https://gist.github.com/krishnakalyan3/be28d984714a1be97f66dee15b9154ae

As far as I understand, the scope of this sub task would be to change the file 
below
https://github.com/apache/incubator-systemml/blob/master/src/test/scripts/applications/glm/GLM.dml
to point to the actual algorithm
https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/GLM.dml

Reproduce the same for comparison with R in GLM.R
Translate GLM.dml code to GLM.pydml (Not sure about this!).

Could you please confirm that I am not missing anything?.

Regards,
Krishna

cc [~nakul02]

https://docs.google.com/spreadsheets/d/1IjI4OqgXfZKNKmj8uxib4nG4SMk_ep0PIxj8n64SQMI/edit#gid=0


was (Author: krishnakalyan3):
Hello [~mboehm7],
I could you please help me with some questions related to this tasks below.

Lets take for example GLM algorithm
Essentially both the files are the same except for some minor differences
https://gist.github.com/krishnakalyan3/be28d984714a1be97f66dee15b9154ae

As far as I understand, the scope of this sub task would be to change the file 
below
https://github.com/apache/incubator-systemml/blob/master/src/test/scripts/applications/glm/GLM.dml
to point to the actual algorithm
https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/GLM.dml

Reproduce the same for comparison with R in GLM.R
Translate GLM.dml code to GLM.pydml (Not sure about this!).

Could you please confirm that I am not missing anything?.

Regards,
Krishna

cc [~nakul02]

https://docs.google.com/spreadsheets/d/1IjI4OqgXfZKNKmj8uxib4nG4SMk_ep0PIxj8n64SQMI/edit#gid=0

> Use top-level algorithm scripts for application tests
> -
>
> Key: SYSTEMML-1527
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1527
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Matthias Boehm
>
> There are numerous dml and pydml application tests that aim to test our 
> existing algorithms. However, these tests use replicated (and mostly 
> outdated) scripts. This task aims to remove the duplicated dml and pydml 
> scripts and to refer directly to the existing algorithm tests. This also 
> includes the update of R comparison scripts.
> See SYSTEMML-1363 for examples.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1527) Use top-level algorithm scripts for application tests

2017-05-07 Thread Krishna Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000193#comment-16000193
 ] 

Krishna Kalyan commented on SYSTEMML-1527:
--

Hello [~mboehm7],
I could you please help me with some questions related to this tasks below.

Lets take for example GLM algorithm
Essentially both the files are the same except for some minor differences
https://gist.github.com/krishnakalyan3/be28d984714a1be97f66dee15b9154ae

As far as I understand, the scope of this sub task would be to change the file 
below
https://github.com/apache/incubator-systemml/blob/master/src/test/scripts/applications/glm/GLM.dml
to point to the actual algorithm
https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/GLM.dml

Reproduce the same for comparison with R in GLM.R
Translate GLM.dml code to GLM.pydml (Not sure about this!).

Could you please confirm that I am not missing anything?.

Regards,
Krishna

cc [~nakul02]

https://docs.google.com/spreadsheets/d/1IjI4OqgXfZKNKmj8uxib4nG4SMk_ep0PIxj8n64SQMI/edit#gid=0

> Use top-level algorithm scripts for application tests
> -
>
> Key: SYSTEMML-1527
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1527
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Matthias Boehm
>
> There are numerous dml and pydml application tests that aim to test our 
> existing algorithms. However, these tests use replicated (and mostly 
> outdated) scripts. This task aims to remove the duplicated dml and pydml 
> scripts and to refer directly to the existing algorithm tests. This also 
> includes the update of R comparison scripts.
> See SYSTEMML-1363 for examples.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1589) conv2d_bias_add fails w/ NPE on lenet with random data

2017-05-07 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1589:


 Summary: conv2d_bias_add fails w/ NPE on lenet with random data
 Key: SYSTEMML-1589
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1589
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm
Assignee: Niketan Pansare


The lenet dml script fails with a null pointer exception for random multi class 
data, generated with
{code}
X_full = rand(rows=6,cols=784);
y_full = round(rand(rows=nrow(X_full), cols=1, min=1, max=10));
{code}

The detailed stacktrace is as follows:
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.sysml.runtime.matrix.data.LibMatrixDNN.getRowInDenseFormat(LibMatrixDNN.java:1355)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixDNN.doIm2colSparse(LibMatrixDNN.java:1382)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixDNN.doIm2col(LibMatrixDNN.java:1421)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixDNN.doLoopedIm2ColConv2d(LibMatrixDNN.java:406)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixDNN.access$400(LibMatrixDNN.java:51)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixDNN$ConvTask.call(LibMatrixDNN.java:1143)
at 
org.apache.sysml.runtime.matrix.data.LibMatrixDNN$ConvTask.call(LibMatrixDNN.java:1076)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SYSTEMML-1585) Include JCuda jars into SystemML's extra.jar

2017-05-07 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998701#comment-15998701
 ] 

Niketan Pansare edited comment on SYSTEMML-1585 at 5/7/17 9:16 PM:
---

[~nakul02] [~deron] 


was (Author: niketanpansare):
[~nakul02]

> Include JCuda jars into SystemML's extra.jar
> 
>
> Key: SYSTEMML-1585
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1585
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Niketan Pansare
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

