[jira] [Commented] (SYSTEMML-1645) Verify whether all scripts work with MLContext & automate

2017-07-28 Thread Janardhan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105238#comment-16105238
 ] 

Janardhan commented on SYSTEMML-1645:
-

Hi [~nilmeier], thanks for handling this jira. I will work along with you to 
write an automated script for the test suite after you've verified the algorithm 
scripts. PR https://github.com/apache/systemml/pull/589 handles the automated 
scripts and keeps track of which scripts have been verified. We will open a 
separate PR for each algorithm for changes to the algorithm file itself. 
Thanks. :) 


> Verify whether all scripts work with MLContext & automate
> -
>
> Key: SYSTEMML-1645
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1645
> Project: SystemML
>  Issue Type: Epic
>  Components: Algorithms
>Reporter: Imran Younus
>Assignee: Jerome
> Fix For: SystemML 1.0
>
>
> Due to some read/write and initialization issues, algorithm scripts may or 
> may not work with MLContext. This jira tracks work needed to make sure all 
> the scripts work with MLContext. Some algorithms may need significant 
> modifications.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1648) Verify whether SVM scripts work with MLContext

2017-07-28 Thread Janardhan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105246#comment-16105246
 ] 

Janardhan commented on SYSTEMML-1648:
-

Hi [~nilmeier]. BTW, this algorithm has already been taken up by Imran Younus, 
and the PR is https://github.com/apache/systemml/pull/529. Only a few changes 
remain to be made.

> Verify whether SVM scripts work with MLContext
> --
>
> Key: SYSTEMML-1648
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1648
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms
>Reporter: Imran Younus
>Assignee: Jerome
>
> This jira plans to verify whether the existing SVM scripts work properly with 
> the new MLContext. These scripts include l2-svm.dml, l2-svm-predict.dml, 
> m-svm.dml, and m-svm-predict.dml.





[jira] [Commented] (SYSTEMML-1760) Improve engine robustness of distributed SGD training

2017-07-28 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105497#comment-16105497
 ] 

Fei Hu commented on SYSTEMML-1760:
--

The following table shows the history of performance improvements. After fixing 
SYSTEMML-1762 and SYSTEMML-1774, the distributed MNIST_LeNet model could be 
trained in parallel with the HYBRID_SPARK and REMOTE_SPARK parfor modes. 
Changing the default parfor result merge to REMOTE_SPARK reduced the runtime 
significantly, which indicates that the result merge may be a performance 
bottleneck. 

!Runtime_Table.png!

> Improve engine robustness of distributed SGD training
> -
>
> Key: SYSTEMML-1760
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1760
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Reporter: Mike Dusenberry
>Assignee: Fei Hu
> Attachments: Runtime_Table.png
>
>
> Currently, we have a mathematical framework in place for training with 
> distributed SGD in a [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml].
>   This task aims to push this at scale to determine (1) the current behavior 
> of the engine (i.e., does the optimizer actually run this in a distributed 
> fashion), and (2) ways to improve the robustness and performance for this 
> scenario.  The distributed SGD framework from this example has already been 
> ported into Caffe2DML, and thus improvements made for this task will directly 
> benefit our efforts towards distributed training of Caffe models (and Keras 
> in the future).
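The gradient-averaging pattern behind that example can be sketched in plain NumPy. This is an illustrative stand-in, not the SystemML engine or the mnist_lenet_distrib_sgd.dml script: each simulated "worker" computes a gradient on its data shard (the parfor body), and a result-merge step averages the per-worker gradients before the model update.

```python
import numpy as np

# Data-parallel SGD sketch on a least-squares problem (assumed toy setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

def grad(w, Xs, ys):
    # Gradient of the mean squared loss 0.5 * ||Xs w - ys||^2 / n on one shard.
    return Xs.T @ (Xs @ w - ys) / Xs.shape[0]

w = np.zeros(3)
shards = np.array_split(np.arange(64), 4)  # 4 simulated workers
for _ in range(300):
    # Each worker computes a gradient on its shard (parfor-style body).
    per_worker = [grad(w, X[s], y[s]) for s in shards]
    # "Result merge": average the per-worker gradients, then update.
    w -= 0.1 * np.mean(per_worker, axis=0)
```

With equal-sized shards covering all the data, the merged gradient equals the full-batch gradient, so the recovered weights converge to `true_w`.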





[jira] [Created] (SYSTEMML-1816) toString can return -0

2017-07-28 Thread Deron Eriksson (JIRA)
Deron Eriksson created SYSTEMML-1816:


 Summary: toString can return -0
 Key: SYSTEMML-1816
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1816
 Project: SystemML
  Issue Type: Bug
  Components: Runtime
Reporter: Deron Eriksson


When matrix values are displayed with toString, -0 can appear in the output.

Example:
{code}
m = matrix("50 99 100 200",rows=2,cols=2);
x = 100;
m = (m - x) * ((m-x) >= 0)
print(toString(m))
{code}
gives:
{code}
-0.000 -0.000
0.000 100.000
{code}

Using as.scalar on the individual cells returns 0:
{code}
for (i in 1:nrow(m)) {
  for (j in 1:ncol(m)) {
    n = m[i,j]
    print('[' + i + ',' + j + ']:' + as.scalar(n))
  }
}
{code}
gives:
{code}
[1,1]:0.0
[1,2]:0.0
[2,1]:0.0
[2,2]:100.0
{code}
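The root cause can be reproduced outside SystemML with IEEE-754 signed zeros. This Python sketch (not SystemML's own toString code) shows why `(m - x) * ((m - x) >= 0)` yields `-0.000` under fixed-width formatting, and one common normalization, adding positive zero, which maps -0.0 to +0.0 and leaves all other values unchanged:

```python
# (50 - 100) * 0.0 multiplies a negative number by zero; IEEE 754 keeps the
# sign of the nonzero factor, so the product is -0.0, and fixed-width
# formatting then prints "-0.000".
vals = [50.0, 99.0, 100.0, 200.0]
x = 100.0
masked = [(v - x) * (1.0 if (v - x) >= 0 else 0.0) for v in vals]
formatted = ["%.3f" % m for m in masked]
# Normalization: under round-to-nearest, -0.0 + 0.0 == +0.0, so adding
# positive zero scrubs the sign without changing any nonzero value.
fixed = ["%.3f" % (m + 0.0) for m in masked]
```

This also explains why `as.scalar` on individual cells prints `0.0`: the scalar path normalizes (or formats) the value differently than the matrix toString path.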






[jira] [Comment Edited] (SYSTEMML-1760) Improve engine robustness of distributed SGD training

2017-07-28 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105497#comment-16105497
 ] 

Fei Hu edited comment on SYSTEMML-1760 at 7/28/17 7:04 PM:
---

cc [~mboehm7], [~dusenberrymw], [~niketanpansare] The following table shows the 
history of performance improvements. After fixing SYSTEMML-1762 and 
SYSTEMML-1774, the distributed MNIST_LeNet model could be trained in parallel 
with the HYBRID_SPARK and REMOTE_SPARK parfor modes. Changing the default parfor 
result merge to REMOTE_SPARK reduced the runtime significantly, which indicates 
that the result merge may be a performance bottleneck. 

!Runtime_Table.png!


was (Author: tenma):
The following table shows the history of performance improvement. After fixing 
the issues SYSTEMML-1762 and SYSTEMML-1774, the distributed MNIST_LeNet model 
could be trained in parallel with the Hybrid_Spark and Remote_Spark parfor 
mode. By changing the default Parfor_Result_Merge into REMOTE_SPARK, the run 
time reduced a lot. It indicates that the result merge may be a bottleneck for 
the performance. 

!Runtime_Table.png!

> Improve engine robustness of distributed SGD training
> -
>
> Key: SYSTEMML-1760
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1760
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Reporter: Mike Dusenberry
>Assignee: Fei Hu
> Attachments: Runtime_Table.png
>
>
> Currently, we have a mathematical framework in place for training with 
> distributed SGD in a [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml].
>   This task aims to push this at scale to determine (1) the current behavior 
> of the engine (i.e., does the optimizer actually run this in a distributed 
> fashion), and (2) ways to improve the robustness and performance for this 
> scenario.  The distributed SGD framework from this example has already been 
> ported into Caffe2DML, and thus improvements made for this task will directly 
> benefit our efforts towards distributed training of Caffe models (and Keras 
> in the future).





[jira] [Assigned] (SYSTEMML-1652) Verify whether ALS scripts work with MLContext

2017-07-28 Thread Glenn Weidner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glenn Weidner reassigned SYSTEMML-1652:
---

Assignee: Imran Younus  (was: Jerome)

> Verify whether ALS scripts work with MLContext
> --
>
> Key: SYSTEMML-1652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1652
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms
>Reporter: Imran Younus
>Assignee: Imran Younus
> Fix For: SystemML 1.0
>
>
> This jira will verify whether all ALS scripts work properly with the new 
> MLContext. These scripts include ALS-DS.dml, ALS-CG.dml, and ALS-predict.dml.
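For context, the alternating least squares scheme those scripts implement can be sketched in a few lines of NumPy. This is an illustrative toy on a dense, fully observed matrix, not the ALS-DS.dml/ALS-CG.dml implementations, which handle sparse ratings and different solvers:

```python
import numpy as np

# Toy ALS: factor X ~= U V^T by alternating regularized least-squares solves.
rng = np.random.default_rng(1)
k, lam = 2, 0.01
X = rng.normal(size=(20, k)) @ rng.normal(size=(k, 15))  # exact rank-k data
U = rng.normal(size=(20, k))
V = rng.normal(size=(15, k))
for _ in range(30):
    # Fix V and solve the ridge regression for U, then symmetrically for V.
    U = X @ V @ np.linalg.inv(V.T @ V + lam * np.eye(k))
    V = X.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(k))
rel_err = np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
```

On exact rank-k data the alternation converges quickly, leaving only a small bias from the regularization term.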





[jira] [Comment Edited] (SYSTEMML-1760) Improve engine robustness of distributed SGD training

2017-07-28 Thread Fei Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105497#comment-16105497
 ] 

Fei Hu edited comment on SYSTEMML-1760 at 7/28/17 7:02 PM:
---

The following table shows the history of performance improvements. After fixing 
SYSTEMML-1762 and SYSTEMML-1774, the distributed MNIST_LeNet model could be 
trained in parallel with the HYBRID_SPARK and REMOTE_SPARK parfor modes. 
Changing the default parfor result merge to REMOTE_SPARK reduced the runtime 
significantly, which indicates that the result merge may be a performance 
bottleneck. 

!Runtime_Table.png!


was (Author: tenma):
The following table shows the history of performance improvement. After fixing 
the issues SYSTEMML-1762 and 1774, the distributed MNIST_LeNet model could be 
trained in parallel with the Hybrid_Spark and Remote_Spark parfor mode. By 
changing the default Parfor_Result_Merge into REMOTE_SPARK, the run time 
reduced a lot. It indicates that the result merge may be a bottleneck for the 
performance. 

!Runtime_Table.png!

> Improve engine robustness of distributed SGD training
> -
>
> Key: SYSTEMML-1760
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1760
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Reporter: Mike Dusenberry
>Assignee: Fei Hu
> Attachments: Runtime_Table.png
>
>
> Currently, we have a mathematical framework in place for training with 
> distributed SGD in a [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml].
>   This task aims to push this at scale to determine (1) the current behavior 
> of the engine (i.e., does the optimizer actually run this in a distributed 
> fashion), and (2) ways to improve the robustness and performance for this 
> scenario.  The distributed SGD framework from this example has already been 
> ported into Caffe2DML, and thus improvements made for this task will directly 
> benefit our efforts towards distributed training of Caffe models (and Keras 
> in the future).





[jira] [Assigned] (SYSTEMML-1646) Verify whether Linear Regression scripts work with MLContext

2017-07-28 Thread Glenn Weidner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glenn Weidner reassigned SYSTEMML-1646:
---

Assignee: Imran Younus  (was: Jerome)

> Verify whether Linear Regression scripts work with MLContext
> 
>
> Key: SYSTEMML-1646
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1646
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Imran Younus
>Assignee: Imran Younus
> Fix For: SystemML 1.0
>
>
> This jira plans to verify whether the linear regression scripts in SystemML 
> work properly with the new MLContext. These scripts include LinearRegCG.dml 
> and LinearRegDS.dml.





[jira] [Commented] (SYSTEMML-1645) Verify whether all scripts work with MLContext & automate

2017-07-28 Thread Janardhan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105379#comment-16105379
 ] 

Janardhan commented on SYSTEMML-1645:
-

So, we only need to test the GLM, Survival Analysis, and Logistic Regression 
scripts. Verifying these will resolve this issue.

> Verify whether all scripts work with MLContext & automate
> -
>
> Key: SYSTEMML-1645
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1645
> Project: SystemML
>  Issue Type: Epic
>  Components: Algorithms
>Reporter: Imran Younus
>Assignee: Jerome
> Fix For: SystemML 1.0
>
>
> Due to some read/write and initialization issues, algorithm scripts may or 
> may not work with MLContext. This jira tracks work needed to make sure all 
> the scripts work with MLContext. Some algorithms may need significant 
> modifications.





[jira] [Updated] (SYSTEMML-1760) Improve engine robustness of distributed SGD training

2017-07-28 Thread Fei Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hu updated SYSTEMML-1760:
-
Attachment: Runtime_Table.png

> Improve engine robustness of distributed SGD training
> -
>
> Key: SYSTEMML-1760
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1760
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Reporter: Mike Dusenberry
>Assignee: Fei Hu
> Attachments: Runtime_Table.png
>
>
> Currently, we have a mathematical framework in place for training with 
> distributed SGD in a [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml].
>   This task aims to push this at scale to determine (1) the current behavior 
> of the engine (i.e., does the optimizer actually run this in a distributed 
> fashion), and (2) ways to improve the robustness and performance for this 
> scenario.  The distributed SGD framework from this example has already been 
> ported into Caffe2DML, and thus improvements made for this task will directly 
> benefit our efforts towards distributed training of Caffe models (and Keras 
> in the future).


