[jira] [Commented] (SYSTEMML-1645) Verify whether all scripts work with MLContext & automate
[ https://issues.apache.org/jira/browse/SYSTEMML-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105238#comment-16105238 ] Janardhan commented on SYSTEMML-1645:
-
Hi [~nilmeier], thanks for handling this jira. I will work along with you to write an automated script for the test suite after you've verified the algorithm scripts. PR https://github.com/apache/systemml/pull/589 handles the automated scripts and keeps track of which scripts have been verified. So we will open a separate PR for each algorithm for changes to the algorithm file itself. Thanks. :)

> Verify whether all scripts work with MLContext & automate
> -
>
> Key: SYSTEMML-1645
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1645
> Project: SystemML
> Issue Type: Epic
> Components: Algorithms
> Reporter: Imran Younus
> Assignee: Jerome
> Fix For: SystemML 1.0
>
>
> Due to some read/write and initialization issues, algorithm scripts may or may not work with MLContext. This jira tracks work needed to make sure all the scripts work with MLContext. Some algorithms may need significant modifications.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (SYSTEMML-1648) Verify whether SVM scripts work with MLContext
[ https://issues.apache.org/jira/browse/SYSTEMML-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105246#comment-16105246 ] Janardhan commented on SYSTEMML-1648:
-
Hi [~nilmeier]. BTW, this algorithm has already been taken up by Imran Younus; the PR is https://github.com/apache/systemml/pull/529. Only a few small changes remain.

> Verify whether SVM scripts work with MLContext
> --
>
> Key: SYSTEMML-1648
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1648
> Project: SystemML
> Issue Type: Improvement
> Components: Algorithms
> Reporter: Imran Younus
> Assignee: Jerome
>
>
> This jira plans to verify whether the existing SVM scripts work properly with the new MLContext. These scripts include l2-svm.dml, l2-svm-predict.dml, m-svm.dml, and m-svm-predict.dml.
[jira] [Commented] (SYSTEMML-1760) Improve engine robustness of distributed SGD training
[ https://issues.apache.org/jira/browse/SYSTEMML-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105497#comment-16105497 ] Fei Hu commented on SYSTEMML-1760:
--
The following table shows the history of performance improvements. After fixing SYSTEMML-1762 and SYSTEMML-1774, the distributed MNIST_LeNet model could be trained in parallel with the Hybrid_Spark and Remote_Spark parfor modes. Changing the default Parfor_Result_Merge to REMOTE_SPARK reduced the runtime significantly, which indicates that the result merge may be a performance bottleneck. !Runtime_Table.png!

> Improve engine robustness of distributed SGD training
> -
>
> Key: SYSTEMML-1760
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1760
> Project: SystemML
> Issue Type: Improvement
> Components: Algorithms, Compiler, ParFor
> Reporter: Mike Dusenberry
> Assignee: Fei Hu
> Attachments: Runtime_Table.png
>
>
> Currently, we have a mathematical framework in place for training with distributed SGD in a [distributed MNIST LeNet example | https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml]. This task aims to push this at scale to determine (1) the current behavior of the engine (i.e., does the optimizer actually run this in a distributed fashion), and (2) ways to improve the robustness and performance for this scenario. The distributed SGD framework from this example has already been ported into Caffe2DML, and thus improvements made for this task will directly benefit our efforts towards distributed training of Caffe models (and Keras in the future).
[jira] [Created] (SYSTEMML-1816) toString can return -0
Deron Eriksson created SYSTEMML-1816:
-

Summary: toString can return -0
Key: SYSTEMML-1816
URL: https://issues.apache.org/jira/browse/SYSTEMML-1816
Project: SystemML
Issue Type: Bug
Components: Runtime
Reporter: Deron Eriksson

When displaying matrix values with toString, -0 can be displayed. Example:
{code}
m = matrix("50 99 100 200", rows=2, cols=2);
x = 100;
m = (m - x) * ((m - x) >= 0);
print(toString(m));
{code}
gives:
{code}
-0.000 -0.000
0.000 100.000
{code}
Using as.scalar on the individual cells returns 0:
{code}
for (i in 1:nrow(m)) {
  for (j in 1:ncol(m)) {
    n = m[i,j];
    print('[' + i + ',' + j + ']:' + as.scalar(n));
  }
}
{code}
gives:
{code}
[1,1]:0.0
[1,2]:0.0
[2,1]:0.0
[2,2]:100.0
{code}
[jira] [Comment Edited] (SYSTEMML-1760) Improve engine robustness of distributed SGD training
[ https://issues.apache.org/jira/browse/SYSTEMML-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105497#comment-16105497 ] Fei Hu edited comment on SYSTEMML-1760 at 7/28/17 7:04 PM:
---
cc [~mboehm7], [~dusenberrymw], [~niketanpansare]

The following table shows the history of performance improvements. After fixing SYSTEMML-1762 and SYSTEMML-1774, the distributed MNIST_LeNet model could be trained in parallel with the Hybrid_Spark and Remote_Spark parfor modes. Changing the default Parfor_Result_Merge to REMOTE_SPARK reduced the runtime significantly, which indicates that the result merge may be a performance bottleneck. !Runtime_Table.png!

was (Author: tenma):
The following table shows the history of performance improvement. After fixing the issues SYSTEMML-1762 and SYSTEMML-1774, the distributed MNIST_LeNet model could be trained in parallel with the Hybrid_Spark and Remote_Spark parfor mode. By changing the default Parfor_Result_Merge into REMOTE_SPARK, the run time reduced a lot. It indicates that the result merge may be a bottleneck for the performance. !Runtime_Table.png!
[jira] [Assigned] (SYSTEMML-1652) Verify whether ALS scripts work with MLContext
[ https://issues.apache.org/jira/browse/SYSTEMML-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Weidner reassigned SYSTEMML-1652:
---
Assignee: Imran Younus (was: Jerome)

> Verify whether ALS scripts work with MLContext
> --
>
> Key: SYSTEMML-1652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1652
> Project: SystemML
> Issue Type: Improvement
> Components: Algorithms
> Reporter: Imran Younus
> Assignee: Imran Younus
> Fix For: SystemML 1.0
>
>
> This jira will verify whether all ALS scripts work properly with the new MLContext. These scripts include ALS-DS.dml, ALS-CG.dml, and ALS-predict.dml.
[jira] [Comment Edited] (SYSTEMML-1760) Improve engine robustness of distributed SGD training
[ https://issues.apache.org/jira/browse/SYSTEMML-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105497#comment-16105497 ] Fei Hu edited comment on SYSTEMML-1760 at 7/28/17 7:02 PM:
---
The following table shows the history of performance improvements. After fixing SYSTEMML-1762 and SYSTEMML-1774, the distributed MNIST_LeNet model could be trained in parallel with the Hybrid_Spark and Remote_Spark parfor modes. Changing the default Parfor_Result_Merge to REMOTE_SPARK reduced the runtime significantly, which indicates that the result merge may be a performance bottleneck. !Runtime_Table.png!

was (Author: tenma):
The following table shows the history of performance improvement. After fixing the issues SYSTEMML-1762 and 1774, the distributed MNIST_LeNet model could be trained in parallel with the Hybrid_Spark and Remote_Spark parfor mode. By changing the default Parfor_Result_Merge into REMOTE_SPARK, the run time reduced a lot. It indicates that the result merge may be a bottleneck for the performance. !Runtime_Table.png!
[jira] [Assigned] (SYSTEMML-1646) Verify whether Linear Regression scripts work with MLContext
[ https://issues.apache.org/jira/browse/SYSTEMML-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Weidner reassigned SYSTEMML-1646:
---
Assignee: Imran Younus (was: Jerome)

> Verify whether Linear Regression scripts work with MLContext
>
>
> Key: SYSTEMML-1646
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1646
> Project: SystemML
> Issue Type: Improvement
> Reporter: Imran Younus
> Assignee: Imran Younus
> Fix For: SystemML 1.0
>
>
> This jira plans to verify whether the linear regression scripts in SystemML work properly with the new MLContext. These scripts include LinearRegCG.dml and LinearRegDS.dml.
[jira] [Commented] (SYSTEMML-1645) Verify whether all scripts work with MLContext & automate
[ https://issues.apache.org/jira/browse/SYSTEMML-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105379#comment-16105379 ] Janardhan commented on SYSTEMML-1645:
-
So we only need to test the GLM, survival analysis, and logistic regression scripts. Verifying these will resolve this issue.
[jira] [Updated] (SYSTEMML-1760) Improve engine robustness of distributed SGD training
[ https://issues.apache.org/jira/browse/SYSTEMML-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fei Hu updated SYSTEMML-1760:
-
Attachment: Runtime_Table.png