[jira] [Commented] (SYSTEMML-490) Runtime Platform Should Automatically Be Set To Hybrid_Spark When Executed On Spark

2016-06-03 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314497#comment-15314497
 ] 

Mike Dusenberry commented on SYSTEMML-490:
--

Great, glad to see that this has been fixed.

> Runtime Platform Should Automatically Be Set To Hybrid_Spark When Executed On 
> Spark
> ---
>
> Key: SYSTEMML-490
> URL: https://issues.apache.org/jira/browse/SYSTEMML-490
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> Currently, the default runtime platform is set to "hybrid" mode, which is an 
> automatically optimized hybrid between single-node and Hadoop MR.  When 
> running on Spark, we should automatically detect and change the mode to the 
> correct setting of "hybrid_spark".  Of course, our {{sparkDML.sh}} script 
> appends this runtime mode explicitly, but a user shouldn't have to do this.
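The proposed auto-detection could look like the following minimal sketch (the environment check and function name are hypothetical illustrations, not SystemML's actual implementation):

```python
import os

# Hypothetical sketch: pick the SystemML execution mode automatically based
# on whether we appear to be running under Spark. The SPARK_HOME check is an
# assumption for illustration only, not the actual SystemML logic.
def detect_exec_mode(env):
    return "hybrid_spark" if env.get("SPARK_HOME") else "hybrid"

print(detect_exec_mode({"SPARK_HOME": "/opt/spark"}))  # hybrid_spark
print(detect_exec_mode({}))  # hybrid
```

In practice the detection would live inside the launcher, so neither `sparkDML.sh` nor the user would need to pass `-exec hybrid_spark` explicitly.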



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs

2016-06-09 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-633:
-
Priority: Blocker  (was: Critical)

> Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs
> 
>
> Key: SYSTEMML-633
> URL: https://issues.apache.org/jira/browse/SYSTEMML-633
> Project: SystemML
>  Issue Type: Improvement
>  Components: ParFor
>Reporter: Mike Dusenberry
>Priority: Blocker
> Attachments: Im2colWrapper.java, log.txt, log.txt, perf-dml.dml, 
> perf-tf.py, perf.sh, run.sh, systemml-nn-05.16.16.zip, systemml-nn.zip, 
> time.txt
>
>
> In the experimental deep learning DML library I've been building 
> ([https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]),
>  I've experienced severe bottlenecks due to *left-indexing* in parfor loops.  
> Here, I will highlight a few particular instances with simplified examples, 
> but the same issue is shared across many areas of the library, particularly 
> in the convolution and max pooling layers, and is exacerbated in real 
> use-cases.
> *Quick note* on setup for any of the below experiments.  Please grab a copy 
> of the above repo (particularly the {{nn}} directory), and run any 
> experiments with the {{nn}} package available at the base directory of the 
> experiment.
> Scenario: *Convolution*
> * In the library above, the forward pass of the convolution function 
> ([{{conv::forward(...)}} | 
> https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L8]
>  in {{nn/layers/conv.dml}}) essentially accepts a matrix {{X}} of images, a 
> matrix of weights {{W}}, and several other parameters corresponding to image 
> sizes, filter sizes, etc.  It then loops through the images with a {{parfor}} 
> loop, and for each image it pads the image with {{util::pad_image}}, extracts 
> "patches" of the image into columns of a matrix in a sliding fashion across 
> the image with {{util::im2col}}, performs a matrix multiplication between the 
> matrix of patch columns and the weight matrix, and then saves the result into 
> a matrix defined outside of the parfor loop using left-indexing.
> * Left-indexing has been identified as the bottleneck by a wide margin.
> * Left-indexing is used in the main {{conv::forward(...)}} function in the 
> [last line in the parfor 
> loop|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L61],
>  in the 
> [{{util::pad_image(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L196]
>  function used by {{conv::forward(...)}}, as well as in the 
> [{{util::im2col(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L96]
>  function used by {{conv::forward(...)}}.
> * Test script (assuming the {{nn}} package is available):
> ** {{speed-633.dml}} {code}
> source("nn/layers/conv.dml") as conv
> source("nn/util.dml") as util
> # Generate data
> N = 64  # num examples
> C = 30  # num channels
> Hin = 28  # input height
> Win = 28  # input width
> F = 20  # num filters
> Hf = 3  # filter height
> Wf = 3  # filter width
> stride = 1
> pad = 1
> X = rand(rows=N, cols=C*Hin*Win)
> # Create layer
> [W, b] = conv::init(F, C, Hf, Wf)
> # Forward
> [out, Hout, Wout] = conv::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, 
> stride, pad, pad)
> print("Out: " + nrow(out) + "x" + ncol(out))
> print("Hout: " + Hout)
> print("Wout: " + Wout)
> print("")
> print(sum(out))
> {code}
> * Invocation:
> ** {{java -jar 
> $SYSTEMML_HOME/target/systemml-0.10.0-incubating-SNAPSHOT-standalone.jar -f 
> speed-633.dml -stats -explain -exec singlenode}}
> * Stats output (modified to output up to 100 instructions):
> ** {code}
> ...
> Total elapsed time:   26.834 sec.
> Total compilation time:   0.529 sec.
> Total execution time:   26.304 sec.
> Number of compiled MR Jobs: 0.
> Number of executed MR Jobs: 0.
> Cache hits (Mem, WB, FS, HDFS): 9196235/0/0/0.
> Cache writes (WB, FS, HDFS):  3070724/0/0.
> Cache times (ACQr/m, RLS, EXP): 1.474/1.120/26.998/0.000 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time:  0.268 sec.
> Functions recompiled:   129.
> Functions recompile time: 0.841 sec.
> ParFor loops optimized:   1.
> ParFor optimize time:   0.032 sec.
> ParFor initialize time:   0.015 sec.
> ParFor result merge time: 0.028 sec.
> ParFor total update in-place: 0/0/1559360
> Total JIT compile time:   14.235 sec.
> Total JVM GC count:   94.
> Total JVM GC time:0.366 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)   leftIndex   41.670 sec  1559360
> -- 2)   forward   26.212 sec  1
> -- 3)   
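The im2col patch extraction described in the quoted report can be sketched in NumPy as follows (an illustrative single-channel version, not the DML implementation in {{nn/util.dml}}):

```python
import numpy as np

# Illustrative sketch of im2col: extract every Hf x Wf patch of a (padded)
# single-channel image into a column of a matrix, in sliding-window order,
# so convolution reduces to one matrix multiplication.
def im2col(img, Hf, Wf, stride=1):
    H, W = img.shape
    Hout = (H - Hf) // stride + 1
    Wout = (W - Wf) // stride + 1
    cols = np.empty((Hf * Wf, Hout * Wout))
    c = 0
    for i in range(0, H - Hf + 1, stride):
        for j in range(0, W - Wf + 1, stride):
            cols[:, c] = img[i:i+Hf, j:j+Wf].ravel()
            c += 1
    return cols

img = np.arange(16, dtype=float).reshape(4, 4)
print(im2col(img, 3, 3).shape)  # (9, 4)
```

The per-image left-indexed write in the parfor loop corresponds to storing each image's `cols` result as one row slice of the output matrix.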

[jira] [Created] (SYSTEMML-709) Move `SystemML.py` to `src/main/python` Directory.

2016-05-25 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-709:


 Summary: Move `SystemML.py` to `src/main/python` Directory.
 Key: SYSTEMML-709
 URL: https://issues.apache.org/jira/browse/SYSTEMML-709
 Project: SystemML
  Issue Type: Bug
Reporter: Mike Dusenberry
Priority: Minor


Just as we have source directories for Java ({{src/main/java}}) and Scala 
({{src/main/scala}}), we should create a Python directory ({{src/main/python}}) 
and move our {{SystemML.py}} file (currently at 
{{src/main/java/org/apache/sysml/api/python/SystemML.py}}) to this location.

{{src/main/java/org/apache/sysml/api/python/SystemML.py}} --> 
{{src/main/python/SystemML.py}}





[jira] [Updated] (SYSTEMML-708) Release checklist for 0.10.0-incubating-rc1

2016-05-25 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-708:
-
Description: 
|| Task || Status || Notes ||
| All Artifacts and Checksums Present | {panel:bgColor=#bfffba}Pass{panel} | |
| Release Candidate Build - Windows   | | |
| Release Candidate Build - OS X  | {panel:bgColor=#bfffba}Pass{panel} | |
| Release Candidate Build - Linux | | |
| Test Suite Passes - Windows | | |
| Test Suite Passes - OS X| {panel:bgColor=#bfffba}Pass{panel} | 
(Deron will re-verify) |
| Test Suite Passes - Linux   | {panel:bgColor=#bfffba}Pass{panel} | |
| All Binaries Execute| {panel:bgColor=#bfffba}Pass{panel} | 
Verified on OS X (Deron will re-verify) |
| Check LICENSE and NOTICE Files  | {panel:bgColor=#bfffba}Pass{panel} | 
(Deron will re-verify) |
| Src Artifact Builds and Tests Pass  | | |
| Single-Node Standalone - Windows| | |
| Single-Node Standalone - OS X   | {panel:bgColor=#bfffba}Pass{panel} | |
| Single-Node Standalone - Linux  | | |
| Single-Node Spark   | {panel:bgColor=#bfffba}Pass{panel} | 
Verified on OS X |
| Single-Node Hadoop  | {panel:bgColor=#bfffba}Pass{panel} | 
Verified on OS X |
| Notebooks - Jupyter | {panel:bgColor=#bfffba}Pass{panel} | 
Verified on OS X |
| Notebooks - Zeppelin| {panel:bgColor=#bfffba}Pass{panel} | 
Verified on OS X |
| Performance Suite - Spark   | {panel:bgColor=#bfffba}Pass{panel} | 
Run on Spark 1.6.1 for data sizes {80MB, 800MB, 8GB, 80GB} |
| Performance Suite - Hadoop  | | |


  was:
|| Task || Status || Notes ||
| All Artifacts and Checksums Present | {panel:bgColor=#bfffba}Pass{panel} | |
| Release Candidate Build - Windows   | | |
| Release Candidate Build - OS X  | {panel:bgColor=#bfffba}Pass{panel} | |
| Release Candidate Build - Linux | | |
| Test Suite Passes - Windows | | |
| Test Suite Passes - OS X| {panel:bgColor=#bfffba}Pass{panel} | 
(Deron will re-verify) |
| Test Suite Passes - Linux   | {panel:bgColor=#bfffba}Pass{panel} | |
| All Binaries Execute| {panel:bgColor=#bfffba}Pass{panel} | 
Verified on OS X (Deron will re-verify) |
| Check LICENSE and NOTICE Files  | {panel:bgColor=#bfffba}Pass{panel} | 
(Deron will re-verify) |
| Src Artifact Builds and Tests Pass  | | |
| Single-Node Standalone - Windows| | |
| Single-Node Standalone - OS X   | {panel:bgColor=#bfffba}Pass{panel} | |
| Single-Node Standalone - Linux  | | |
| Single-Node Spark   | {panel:bgColor=#bfffba}Pass{panel} | 
Verified on OS X |
| Single-Node Hadoop  | {panel:bgColor=#bfffba}Pass{panel} | 
Verified on OS X |
| Notebooks - Jupyter | | |
| Notebooks - Zeppelin| | |
| Performance Suite - Spark   | {panel:bgColor=#bfffba}Pass{panel} | 
Run on Spark 1.6.1 for data sizes {80MB, 800MB, 8GB, 80GB} |
| Performance Suite - Hadoop  | | |



> Release checklist for 0.10.0-incubating-rc1
> ---
>
> Key: SYSTEMML-708
> URL: https://issues.apache.org/jira/browse/SYSTEMML-708
> Project: SystemML
>  Issue Type: Task
>Reporter: Deron Eriksson
>
> || Task || Status || Notes ||
> | All Artifacts and Checksums Present | {panel:bgColor=#bfffba}Pass{panel} | |
> | Release Candidate Build - Windows   | | |
> | Release Candidate Build - OS X  | {panel:bgColor=#bfffba}Pass{panel} | |
> | Release Candidate Build - Linux | | |
> | Test Suite Passes - Windows | | |
> | Test Suite Passes - OS X| {panel:bgColor=#bfffba}Pass{panel} | 
> (Deron will re-verify) |
> | Test Suite Passes - Linux   | {panel:bgColor=#bfffba}Pass{panel} | |
> | All Binaries Execute| {panel:bgColor=#bfffba}Pass{panel} | 
> Verified on OS X (Deron will re-verify) |
> | Check LICENSE and NOTICE Files  | {panel:bgColor=#bfffba}Pass{panel} | 
> (Deron will re-verify) |
> | Src Artifact Builds and Tests Pass  | | |
> | Single-Node Standalone - Windows| | |
> | Single-Node Standalone - OS X   | {panel:bgColor=#bfffba}Pass{panel} | |
> | Single-Node Standalone - Linux  | | |
> | Single-Node Spark   | {panel:bgColor=#bfffba}Pass{panel} | 
> Verified on OS X |
> | Single-Node Hadoop  | {panel:bgColor=#bfffba}Pass{panel} | 
> Verified on OS X |
> | Notebooks - Jupyter | {panel:bgColor=#bfffba}Pass{panel} | 
> Verified on OS X |
> | Notebooks - Zeppelin| {panel:bgColor=#bfffba}Pass{panel} | 
> Verified on OS X |
> | Performance Suite - Spark   | {panel:bgColor=#bfffba}Pass{panel} | 
> Run on Spark 1.6.1 for data sizes {80MB, 800MB, 8GB, 80GB} |
> | Performance Suite - Hadoop  | | |

[jira] [Updated] (SYSTEMML-710) Add `SystemML.py` To All Distribution Releases

2016-05-25 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-710:
-
Labels: starter  (was: )

> Add `SystemML.py` To All Distribution Releases
> --
>
> Key: SYSTEMML-710
> URL: https://issues.apache.org/jira/browse/SYSTEMML-710
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>Priority: Minor
>  Labels: starter
>
> Currently, our {{SystemML.py}} Python API file is not included in the release 
> assemblies.  We should add it to the base directory of each of the release 
> packages so that our {{MLContext}} PySpark API can be used.





[jira] [Commented] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs

2016-06-14 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330247#comment-15330247
 ] 

Mike Dusenberry commented on SYSTEMML-633:
--

Awesome, thanks [~mboehm7]!  Looking forward to your thoughts on it.

> Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs
> 
>
> Key: SYSTEMML-633
> URL: https://issues.apache.org/jira/browse/SYSTEMML-633
> Project: SystemML
>  Issue Type: Improvement
>  Components: ParFor
>Reporter: Mike Dusenberry
>Priority: Blocker
> Attachments: Im2colWrapper.java, log.txt, log.txt, log_06.11.16.txt, 
> perf-dml.dml, perf-tests.tar.gz, perf-tf.py, perf.sh, run.sh, 
> systemml-nn-05.16.16.zip, systemml-nn.zip, time.txt, time_06.11.16.txt
>
>
> In the experimental deep learning DML library I've been building 
> ([https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]),
>  I've experienced severe bottlenecks due to *left-indexing* in parfor loops.  
> Here, I will highlight a few particular instances with simplified examples, 
> but the same issue is shared across many areas of the library, particularly 
> in the convolution and max pooling layers, and is exacerbated in real 
> use-cases.
> *Quick note* on setup for any of the below experiments.  Please grab a copy 
> of the above repo (particularly the {{nn}} directory), and run any 
> experiments with the {{nn}} package available at the base directory of the 
> experiment.
> Scenario: *Convolution*
> * In the library above, the forward pass of the convolution function 
> ([{{conv::forward(...)}} | 
> https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L8]
>  in {{nn/layers/conv.dml}}) essentially accepts a matrix {{X}} of images, a 
> matrix of weights {{W}}, and several other parameters corresponding to image 
> sizes, filter sizes, etc.  It then loops through the images with a {{parfor}} 
> loop, and for each image it pads the image with {{util::pad_image}}, extracts 
> "patches" of the image into columns of a matrix in a sliding fashion across 
> the image with {{util::im2col}}, performs a matrix multiplication between the 
> matrix of patch columns and the weight matrix, and then saves the result into 
> a matrix defined outside of the parfor loop using left-indexing.
> * Left-indexing has been identified as the bottleneck by a wide margin.
> * Left-indexing is used in the main {{conv::forward(...)}} function in the 
> [last line in the parfor 
> loop|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L61],
>  in the 
> [{{util::pad_image(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L196]
>  function used by {{conv::forward(...)}}, as well as in the 
> [{{util::im2col(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L96]
>  function used by {{conv::forward(...)}}.
> * Test script (assuming the {{nn}} package is available):
> ** {{speed-633.dml}} {code}
> source("nn/layers/conv.dml") as conv
> source("nn/util.dml") as util
> # Generate data
> N = 64  # num examples
> C = 30  # num channels
> Hin = 28  # input height
> Win = 28  # input width
> F = 20  # num filters
> Hf = 3  # filter height
> Wf = 3  # filter width
> stride = 1
> pad = 1
> X = rand(rows=N, cols=C*Hin*Win)
> # Create layer
> [W, b] = conv::init(F, C, Hf, Wf)
> # Forward
> [out, Hout, Wout] = conv::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, 
> stride, pad, pad)
> print("Out: " + nrow(out) + "x" + ncol(out))
> print("Hout: " + Hout)
> print("Wout: " + Wout)
> print("")
> print(sum(out))
> {code}
> * Invocation:
> ** {{java -jar 
> $SYSTEMML_HOME/target/systemml-0.10.0-incubating-SNAPSHOT-standalone.jar -f 
> speed-633.dml -stats -explain -exec singlenode}}
> * Stats output (modified to output up to 100 instructions):
> ** {code}
> ...
> Total elapsed time:   26.834 sec.
> Total compilation time:   0.529 sec.
> Total execution time:   26.304 sec.
> Number of compiled MR Jobs: 0.
> Number of executed MR Jobs: 0.
> Cache hits (Mem, WB, FS, HDFS): 9196235/0/0/0.
> Cache writes (WB, FS, HDFS):  3070724/0/0.
> Cache times (ACQr/m, RLS, EXP): 1.474/1.120/26.998/0.000 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time:  0.268 sec.
> Functions recompiled:   129.
> Functions recompile time: 0.841 sec.
> ParFor loops optimized:   1.
> ParFor optimize time:   0.032 sec.
> ParFor initialize time:   0.015 sec.
> ParFor result merge time: 0.028 sec.
> ParFor total update in-place: 0/0/1559360
> Total JIT compile time:   14.235 sec.
> Total JVM GC count:   94.
> Total JVM GC time:0.366 sec.
> Heavy hitter 

[jira] [Updated] (SYSTEMML-547) Implement built-in functions for max and average pooling

2016-05-27 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-547:
-
Assignee: Niketan Pansare  (was: Nakul Jindal)

> Implement built-in functions for max and average pooling
> 
>
> Key: SYSTEMML-547
> URL: https://issues.apache.org/jira/browse/SYSTEMML-547
> Project: SystemML
>  Issue Type: New Feature
>  Components: Parser, Runtime
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
>Priority: Minor
> Fix For: SystemML 0.10
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> pool2d(input, pool_size, stride_length, border_mode="valid", pool_mode="max")
> Performs downscaling of the input matrix.
> The arguments to this function are:
> 1. input is a 2-dimensional matrix.
> 2. pool_size is a required integer parameter.
> 3. stride_length is an optional Int parameter. The default value is 1.
> 4. border_mode is an optional String parameter. The valid values are "same" 
> and "valid".
> 5. pool_mode is an optional String parameter. The valid values are "max" and 
> "avg". We can later add additional operators here (such as sum).
> For detailed documentation, see Theano's pool_2d function: 
> https://github.com/Theano/Theano/blob/master/theano/tensor/signal/pool.py#L40
> As an example, our pool2d(input=X, pool_size=2, stride_length=1, 
> border_mode="valid", pool_mode="avg") invocation is similar to Theano's 
> pool_2d(X, ds=(2,2), st=(1,1), ignore_border=True, padding=(0, 0), 
> mode="average_exc_pad")
> Since padding=(0,0) is the most common padding (probably the only one most 
> people will use), I thought of simplifying the interface by borrowing 
> concepts from TensorFlow's functions max_pool and avg_pool. See 
> https://www.tensorflow.org/versions/r0.7/api_docs/python/nn.html#avg_pool
> The above example will translate into the following TensorFlow code:
> tf.nn.avg_pool(X, pool_size=(1,2,2,1), strides=(1,1,1,1), padding="VALID")
> Another good reference for understanding the pooling operation is 
> http://cs231n.github.io/convolutional-networks/#pool
> [~mwdus...@us.ibm.com], [~nakul02], [~prithvi_r_s], [~reinw...@us.ibm.com]
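A minimal NumPy sketch of the proposed {{pool2d}} semantics with border_mode="valid" (illustrative only, not the SystemML runtime implementation; the parameter names mirror the signature described above):

```python
import numpy as np

# Illustrative "valid" pooling: slide a pool_size x pool_size window over the
# input with the given stride and reduce each window with max or mean.
def pool2d(x, pool_size, stride=1, pool_mode="max"):
    H, W = x.shape
    Hout = (H - pool_size) // stride + 1
    Wout = (W - pool_size) // stride + 1
    op = np.max if pool_mode == "max" else np.mean
    out = np.empty((Hout, Wout))
    for i in range(Hout):
        for j in range(Wout):
            patch = x[i*stride:i*stride+pool_size, j*stride:j*stride+pool_size]
            out[i, j] = op(patch)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, pool_size=2, stride=1, pool_mode="avg"))
```

With padding fixed at (0,0), "valid" pooling shrinks a 4x4 input to 3x3 for a 2x2 window at stride 1, matching the Theano `mode="average_exc_pad"` example above.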





[jira] [Closed] (SYSTEMML-641) Performance features core block matrix multiply

2016-05-27 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry closed SYSTEMML-641.

Assignee: Matthias Boehm

> Performance features core block matrix multiply 
> 
>
> Key: SYSTEMML-641
> URL: https://issues.apache.org/jira/browse/SYSTEMML-641
> Project: SystemML
>  Issue Type: Task
>  Components: Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
>
> 1) Cache-conscious dense-dense with large skinny rhs (> L3 cache)
> 2) Scheduling improvements multi-threaded operations with short lhs
> 3) Column-wise parallelization with wide rhs
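Item 1 refers to cache blocking of the right-hand side; a rough illustration of the idea (a Python sketch with an arbitrary panel size, not the SystemML kernel):

```python
import numpy as np

# Illustrative cache-blocked matrix multiply: process the rhs in column
# panels so each panel stays cache-resident while streaming over A.
# The block size of 64 is an arbitrary assumption for this sketch.
def blocked_matmul(A, B, block=64):
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for j0 in range(0, n, block):
        j1 = min(j0 + block, n)
        # one cache-friendly rhs panel at a time
        C[:, j0:j1] = A @ B[:, j0:j1]
    return C

A = np.random.rand(8, 5)
B = np.random.rand(5, 200)
print(np.allclose(blocked_matmul(A, B), A @ B))  # True
```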





[jira] [Updated] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs

2016-06-13 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-633:
-
Attachment: time_06.11.16.txt
log_06.11.16.txt
perf-tests.tar.gz

@mboehm I ran the experiment again from [commit 
c76b01a753837150c590c79557acdccb9d756a7e | 
https://github.com/apache/incubator-systemml/commit/c76b01a753837150c590c79557acdccb9d756a7e]
 on the same server, with the same singlenode execution mode.  The performance 
is similar, and it does not appear to be applying the update-in-place rule.  I 
also tried without the singlenode flag, but performance was worse due to MR 
jobs, even after increasing the memory allocation substantially.

I've attached {{time_06.11.16.txt}} with the timings, and {{log_06.11.16.txt}} 
with the full log of all tests.

I also included {{perf-tests.tar.gz}}, a tar archive containing everything 
needed to reproduce the results except the JAR file.  Due to upload size 
limits, I couldn't include the standalone SystemML JAR -- just build it from 
the above commit and drop the standalone JAR into the {{perf-tests}} folder.  
For Python, you'll need to install TensorFlow via pip following [these 
directions | 
https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#pip-installation].
Execute the experiments with {{run.sh}}.

> Improve Left-Indexing Performance with (Nested) Parfor Loops in UDFs
> 
>
> Key: SYSTEMML-633
> URL: https://issues.apache.org/jira/browse/SYSTEMML-633
> Project: SystemML
>  Issue Type: Improvement
>  Components: ParFor
>Reporter: Mike Dusenberry
>Priority: Blocker
> Attachments: Im2colWrapper.java, log.txt, log.txt, log_06.11.16.txt, 
> perf-dml.dml, perf-tests.tar.gz, perf-tf.py, perf.sh, run.sh, 
> systemml-nn-05.16.16.zip, systemml-nn.zip, time.txt, time_06.11.16.txt
>
>
> In the experimental deep learning DML library I've been building 
> ([https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]),
>  I've experienced severe bottlenecks due to *left-indexing* in parfor loops.  
> Here, I will highlight a few particular instances with simplified examples, 
> but the same issue is shared across many areas of the library, particularly 
> in the convolution and max pooling layers, and is exacerbated in real 
> use-cases.
> *Quick note* on setup for any of the below experiments.  Please grab a copy 
> of the above repo (particularly the {{nn}} directory), and run any 
> experiments with the {{nn}} package available at the base directory of the 
> experiment.
> Scenario: *Convolution*
> * In the library above, the forward pass of the convolution function 
> ([{{conv::forward(...)}} | 
> https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L8]
>  in {{nn/layers/conv.dml}}) essentially accepts a matrix {{X}} of images, a 
> matrix of weights {{W}}, and several other parameters corresponding to image 
> sizes, filter sizes, etc.  It then loops through the images with a {{parfor}} 
> loop, and for each image it pads the image with {{util::pad_image}}, extracts 
> "patches" of the image into columns of a matrix in a sliding fashion across 
> the image with {{util::im2col}}, performs a matrix multiplication between the 
> matrix of patch columns and the weight matrix, and then saves the result into 
> a matrix defined outside of the parfor loop using left-indexing.
> * Left-indexing has been identified as the bottleneck by a wide margin.
> * Left-indexing is used in the main {{conv::forward(...)}} function in the 
> [last line in the parfor 
> loop|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L61],
>  in the 
> [{{util::pad_image(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L196]
>  function used by {{conv::forward(...)}}, as well as in the 
> [{{util::im2col(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L96]
>  function used by {{conv::forward(...)}}.
> * Test script (assuming the {{nn}} package is available):
> ** {{speed-633.dml}} {code}
> source("nn/layers/conv.dml") as conv
> source("nn/util.dml") as util
> # Generate data
> N = 64  # num examples
> C = 30  # num channels
> Hin = 28  # input height
> Win = 28  # input width
> F = 20  # num filters
> Hf = 3  # filter height
> Wf = 3  # filter width
> stride = 1
> pad = 1
> X = rand(rows=N, cols=C*Hin*Win)
> # Create layer
> [W, b] = conv::init(F, C, Hf, Wf)
> # Forward
> [out, Hout, Wout] = conv::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, 
> stride, pad, pad)
> print("Out: " + nrow(out) + "x" + 

[jira] [Commented] (SYSTEMML-760) PYDML save function ijv and mm formats use 1-based indexing

2016-06-17 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336341#comment-15336341
 ] 

Mike Dusenberry commented on SYSTEMML-760:
--

Keeping the matrix market format the same is reasonable, as it is an existing 
standard.  The IJV format is more questionable, though: since Python and PyDML 
are 0-based, 0-based IJV datasets are more likely among these users.  We should 
probably assume 0-based IJV for PyDML, with an optional 1-based flag on reads.  
That way, users in the Python ecosystem, which PyDML targets, will not face 
confusion over data formats, yet will still be able to use datasets from other 
communities.
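A minimal sketch of that read behavior (the {{one_based}} flag name and the helper are hypothetical illustrations, not an actual SystemML API):

```python
# Hypothetical sketch: read IJV (text) triplets and normalize the indices to
# 0-based, as PyDML users would expect. The `one_based` flag is an assumed
# name for illustration only.
def read_ijv(lines, one_based=True):
    offset = 1 if one_based else 0
    entries = []
    for line in lines:
        i, j, v = line.split()
        entries.append((int(i) - offset, int(j) - offset, float(v)))
    return entries

data = ["1 1 1.0", "1 2 2.0", "3 3 9.0"]
print(read_ijv(data))  # indices shifted to 0-based
```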

> PYDML save function ijv and mm formats use 1-based indexing
> ---
>
> Key: SYSTEMML-760
> URL: https://issues.apache.org/jira/browse/SYSTEMML-760
> Project: SystemML
>  Issue Type: Task
>  Components: APIs
>Reporter: Deron Eriksson
>
> PYDML now uses 0-based indexing rather than 1-based indexing. Saving to ijv 
> (text) and mm (matrix market) formats uses 1-based indices.
> The following code:
> {code}
> m = full("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
> save(m, "m.txt", format="text")
> save(m, "m.mm", format="mm")
> {code}
> generates:
> m.txt:
> {code}
> 1 1 1.0
> 1 2 2.0
> 1 3 3.0
> 3 1 7.0
> 3 2 8.0
> 3 3 9.0
> {code}
> and
> m.mm:
> {code}
> %%MatrixMarket matrix coordinate real general
> 4 3 6
> 1 1 1.0
> 1 2 2.0
> 1 3 3.0
> 3 1 7.0
> 3 2 8.0
> 3 3 9.0
> {code}
> 0-based indexing for m.txt would be:
> {code}
> 0 0 1.0
> 0 1 2.0
> 0 2 3.0
> 2 0 7.0
> 2 1 8.0
> 2 2 9.0
> {code}
> A similar situation would exist for the m.mm file.
> Note: The reading of the matrices should also be 0-based if PYDML is 0-based.





[jira] [Updated] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-17 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-762:
-
Attachment: log2.txt

Attaching a new log file, {{log2.txt}}.

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Attachments: log.txt, log2.txt
>
>






[jira] [Updated] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-24 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-762:
-
Attachment: log4.txt

Attaching {{log4.txt}} based on running with {{java -Xmx50g -Xms50g -Xmn2048m 
-server -jar 
$SYSTEMML_HOME/target/systemml-0.11.0-incubating-SNAPSHOT-standalone.jar -f 
lenet-train.dml -explain recompile_hops -explain recompile_runtime -stats 
-nvargs X=train_images.csv Y=train_labels.csv Xt=test_images.csv 
Yt=test_labels.csv Xv=val_images.csv Yv=val_labels.csv FMAPS1=32 FMAPS2=64 
NODES=512 lambda=5e-04}}.

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Attachments: log.txt, log2.txt, log3.txt, log4.txt
>
>






[jira] [Commented] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-24 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348898#comment-15348898
 ] 

Mike Dusenberry commented on SYSTEMML-762:
--

Looks like MR jobs are being created now for the CSV reblocking.

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Attachments: log.txt, log2.txt, log3.txt, log4.txt, log5.txt, log6.txt
>
>






[jira] [Reopened] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-24 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry reopened SYSTEMML-762:
--

[~niketanpansare] Thanks for working on this!  Unfortunately, I'm still seeing 
MR jobs with the latest build given the same experimental setup.  Can you look 
into this further?

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Attachments: log.txt, log2.txt, log3.txt
>
>






[jira] [Updated] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-24 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-762:
-
Attachment: log5.txt

Attaching {{log5.txt}} based on running with {code}java -Xmx50g -Xms50g 
-Xmn2048m -server -jar 
$SYSTEMML_HOME/target/systemml-0.11.0-incubating-SNAPSHOT-standalone.jar -f 
lenet-train.dml -explain recompile_runtime -stats -nvargs X=train_images.csv 
Y=train_labels.csv Xt=test_images.csv Yt=test_labels.csv Xv=val_images.csv 
Yv=val_labels.csv FMAPS1=32 FMAPS2=64 NODES=512 lambda=5e-04{code}.

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Attachments: log.txt, log2.txt, log3.txt, log4.txt, log5.txt
>
>






[jira] [Updated] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-24 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-762:
-
Attachment: log7.txt

Attaching {{log7.txt}} based on running with {code}$SPARK_HOME/bin/spark-submit 
--master local --driver-memory 50G --executor-memory 50G --conf 
spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128 
$SYSTEMML_HOME/target/SystemML.jar -f lenet-train.dml -stats -nvargs 
X=train_images.csv Y=train_labels.csv Xt=test_images.csv Yt=test_labels.csv 
Xv=val_images.csv Yv=val_labels.csv FMAPS1=32 FMAPS2=64 NODES=512 
lambda=5e-04{code}.

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Attachments: log.txt, log2.txt, log3.txt, log4.txt, log5.txt, 
> log6.txt, log7.txt
>
>






[jira] [Updated] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-516:
-
Description: 
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[,2:]  # select all rows, and all columns except the first one
{code}.

  was:
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified.  It would be useful to be able to specify *either* a lower *or* 
upper bound, with the missing bound implicitly added internally.  This would 
allow for scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[,2:]  # select all rows, and all columns except the first one
{code}.


> Index Range Slicing Should Allow Implicit Upper Or Lower Bounds
> ---
>
> Key: SYSTEMML-516
> URL: https://issues.apache.org/jira/browse/SYSTEMML-516
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
> 2:6]}}.  However, this currently requires that *both* a lower *and* upper 
> bound be specified.
> It would be useful to be able to specify *either* a lower *or* upper bound, 
> with the missing bound implicitly added internally.  This would allow for 
> scenarios such as selecting all columns *except* the first one, as in
> {code}
> data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
> X = X[,2:]  # select all rows, and all columns except the first one
> {code}.





[jira] [Updated] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-516:
-
Description: 
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[1:4, 2:]  # select rows 1 to 4, and columns 2 to ncol(X)
{code}.

This is the same functionality that [NumPy provides 
|http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html].
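For comparison, the NumPy semantics referenced above can be illustrated with a short sketch (note NumPy uses 0-based, half-open ranges, while DML indexing is 1-based and inclusive; the variable names here are illustrative only):

```python
import numpy as np

# A 4x5 matrix with entries 1..20, row-major.
X = np.arange(1, 21).reshape(4, 5)

# Proposed DML X[, 2:]   -> NumPy X[:, 1:]   (all rows, drop the first column)
all_but_first_col = X[:, 1:]

# Proposed DML X[1:4, 2:] -> NumPy X[0:4, 1:] (rows 1-4, columns 2 to ncol(X))
sub = X[0:4, 1:]

print(all_but_first_col.shape)  # (4, 4)
```

Both slices leave the upper column bound implicit, which is exactly the behavior being requested for DML.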

  was:
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[1:4, 2:]  # select rows 1 to 4, and columns 2 to ncol(X)
{code}.


> Index Range Slicing Should Allow Implicit Upper Or Lower Bounds
> ---
>
> Key: SYSTEMML-516
> URL: https://issues.apache.org/jira/browse/SYSTEMML-516
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
> 2:6]}}.  However, this currently requires that *both* a lower *and* upper 
> bound be specified for a given row or column range.
> It would be useful to be able to specify *either* a lower *or* upper bound, 
> with the missing bound implicitly added internally.  This would allow for 
> scenarios such as selecting all columns *except* the first one, as in
> {code}
> data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
> X = X[1:4, 2:]  # select rows 1 to 4, and columns 2 to ncol(X)
> {code}.
> This is the same functionality that [NumPy provides 
> |http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html].





[jira] [Updated] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-516:
-
Description: 
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[1:4, 2:]  # select rows 1-4, and columns 2-numColumns
{code}.

  was:
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[,2:]  # select all rows, and all columns except the first one
{code}.


> Index Range Slicing Should Allow Implicit Upper Or Lower Bounds
> ---
>
> Key: SYSTEMML-516
> URL: https://issues.apache.org/jira/browse/SYSTEMML-516
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
> 2:6]}}.  However, this currently requires that *both* a lower *and* upper 
> bound be specified for a given row or column range.
> It would be useful to be able to specify *either* a lower *or* upper bound, 
> with the missing bound implicitly added internally.  This would allow for 
> scenarios such as selecting all columns *except* the first one, as in
> {code}
> data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
> X = X[1:4, 2:]  # select rows 1-4, and columns 2-numColumns
> {code}.





[jira] [Updated] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-516:
-
Description: 
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[1:4, 2:]  # select rows 1 to 4, and columns 2 to numColumns
{code}.

  was:
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[1:4, 2:]  # select rows 1-4, and columns 2-numColumns
{code}.


> Index Range Slicing Should Allow Implicit Upper Or Lower Bounds
> ---
>
> Key: SYSTEMML-516
> URL: https://issues.apache.org/jira/browse/SYSTEMML-516
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
> 2:6]}}.  However, this currently requires that *both* a lower *and* upper 
> bound be specified for a given row or column range.
> It would be useful to be able to specify *either* a lower *or* upper bound, 
> with the missing bound implicitly added internally.  This would allow for 
> scenarios such as selecting all columns *except* the first one, as in
> {code}
> data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
> X = X[1:4, 2:]  # select rows 1 to 4, and columns 2 to numColumns
> {code}.





[jira] [Updated] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-516:
-
Description: 
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[1:4, 2:]  # select rows 1 to 4, and columns 2 to ncol(X)
{code}.

  was:
DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified for a given row or column range.

It would be useful to be able to specify *either* a lower *or* upper bound, 
with the missing bound implicitly added internally.  This would allow for 
scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[1:4, 2:]  # select rows 1 to 4, and columns 2 to numColumns
{code}.


> Index Range Slicing Should Allow Implicit Upper Or Lower Bounds
> ---
>
> Key: SYSTEMML-516
> URL: https://issues.apache.org/jira/browse/SYSTEMML-516
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
> 2:6]}}.  However, this currently requires that *both* a lower *and* upper 
> bound be specified for a given row or column range.
> It would be useful to be able to specify *either* a lower *or* upper bound, 
> with the missing bound implicitly added internally.  This would allow for 
> scenarios such as selecting all columns *except* the first one, as in
> {code}
> data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
> X = X[1:4, 2:]  # select rows 1 to 4, and columns 2 to ncol(X)
> {code}.





[jira] [Created] (SYSTEMML-516) Index Range Slicing Should Allow Implicit Upper Or Lower Bounds

2016-02-12 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-516:


 Summary: Index Range Slicing Should Allow Implicit Upper Or Lower 
Bounds
 Key: SYSTEMML-516
 URL: https://issues.apache.org/jira/browse/SYSTEMML-516
 Project: SystemML
  Issue Type: Improvement
Reporter: Mike Dusenberry


DML allows for index slicing of matrices for specified ranges, as in {{X[1:4, 
2:6]}}.  However, this currently requires that *both* a lower *and* upper bound 
be specified.  It would be useful to be able to specify *either* a lower *or* 
upper bound, with the missing bound implicitly added internally.  This would 
allow for scenarios such as selecting all columns *except* the first one, as in
{code}
data = rand(rows=10, cols=20, min=0, max=1, pdf="uniform", sparsity=0.2)
X = X[,2:]  # select all rows, and all columns except the first one
{code}.





[jira] [Updated] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-02-11 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-512:
-
Summary: DML Script With UDFs Results In Out Of Memory Error As Compared to 
Without UDFs  (was: DML Script With UDFs Results In Out Of Memory Error)

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java out of memory heap error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1212)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1103)
>   at 
> org.apache.sysml.runtime.instructions.cp.MatrixMatrixArithmeticCPInstruction.processInstruction(MatrixMatrixArithmeticCPInstruction.java:60)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>   at 
> org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:183)
>   at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
>   at 
> org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
> 
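For reference, the PNMF multiplicative-update loop in the scripts above can be sketched in NumPy (a hypothetical `pnmf` helper that mirrors the DML logic; this is an illustration, not SystemML's implementation, and the seed/range values are arbitrary):

```python
import numpy as np

def pnmf(V, rank, max_iteration=10, seed=0):
    """Poisson NMF via multiplicative updates, mirroring the DML loop."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.0, 0.01, size=(n, rank))
    H = rng.uniform(0.0, 0.01, size=(rank, m))
    for _ in range(max_iteration):
        # H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
        H = H * (W.T @ (V / (W @ H))) / W.sum(axis=0)[:, None]
        # W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
        W = W * ((V / (W @ H)) @ H.T) / H.sum(axis=1)[None, :]
    return W, H

def negloglik(V, W, H):
    # -1 * (sum(V*log(W%*%H)) - colSums(W) %*% rowSums(H))
    return -(np.sum(V * np.log(W @ H)) - W.sum(axis=0) @ H.sum(axis=1))
```

In DML the same loop is expressed once and run either inline or inside a UDF; this issue is about the two variants having very different memory behavior, not about the math itself.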

[jira] [Updated] (SYSTEMML-540) Deep Learning

2016-02-25 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-540:
-
Description: 
This epic covers the addition of deep learning to SystemML, including:

* Core DML layer abstractions for deep (convolutional) neural nets.
* DML language support as necessary.
* DML code generation (Caffe, Torch, Theano, TensorFlow, etc. integration)
* etc.

  was:
This epic covers the addition of deep learning to SystemML, including:

* Core DML layer abstractions for deep (convolutional) neural nets.
* DML language support as necessary.
* DML code generation (Caffe, Theano, etc. integration)
* etc.


> Deep Learning
> -
>
> Key: SYSTEMML-540
> URL: https://issues.apache.org/jira/browse/SYSTEMML-540
> Project: SystemML
>  Issue Type: Epic
>Reporter: Mike Dusenberry
>
> This epic covers the addition of deep learning to SystemML, including:
> * Core DML layer abstractions for deep (convolutional) neural nets.
> * DML language support as necessary.
> * DML code generation (Caffe, Torch, Theano, TensorFlow, etc. integration)
> * etc.





[jira] [Commented] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-02-17 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151623#comment-15151623
 ] 

Mike Dusenberry commented on SYSTEMML-512:
--

[~mboehm7] I've added two Scala files with code that expresses the issue.  
{{test1.scala}} works correctly, and {{test2.scala}} has the issue described 
above.  The only difference is the PNMF script stored in {{val pnmf = ...}}.  
To replicate this, I used {{$SPARK_HOME/bin/spark-shell --master local[*] 
--driver-memory 1G --jars $SYSTEMML_HOME/target/SystemML.jar}}, and then 
{{:load test1.scala}} and {{:load test2.scala}} to run the scripts.  You will 
need the Amazon data in the same directory.

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: test1.scala, test2.scala
>
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java out of memory heap error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1212)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1103)
>   at 
> org.apache.sysml.runtime.instructions.cp.MatrixMatrixArithmeticCPInstruction.processInstruction(MatrixMatrixArithmeticCPInstruction.java:60)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>   at 
> org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:183)
>   at 
> 

[jira] [Comment Edited] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-02-17 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151623#comment-15151623
 ] 

Mike Dusenberry edited comment on SYSTEMML-512 at 2/18/16 2:52 AM:
---

[~mboehm7] I've added two Scala files with code that expresses the issue.  
{{test1.scala}} works correctly, and {{test2.scala}} has the issue described 
above.  The only difference is the PNMF script stored in {{val pnmf = ...}}.  
To replicate this, I used {{$SPARK_HOME/bin/spark-shell --master local[*] 
--driver-memory 1G --jars $SYSTEMML_HOME/target/SystemML.jar}}, and then 
{{:load test1.scala}} and {{:load test2.scala}} to run the scripts.  You will 
need the Amazon data in the same directory.

Also, smaller data sizes (2000) will allow {{test2.scala}} to run to 
completion, but it will run much slower than {{test1.scala}}.


was (Author: mwdus...@us.ibm.com):
[~mboehm7] I've added two Scala files with code that expresses the issue.  
{{test1.scala}} works correctly, and {{test2.scala}} has the issue described 
above.  The only difference is the PNMF script stored in {{val pnmf = ...}}.  
To replicate this, I used {{$SPARK_HOME/bin/spark-shell --master local[*] 
--driver-memory 1G --jars $SYSTEMML_HOME/target/SystemML.jar}}, and then 
{{:load test1.scala}} and {{:load test2.scala}} to run the scripts.  You will 
need the Amazon data in the same directory.

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: test1.scala, test2.scala
>
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java out of memory heap error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at 
> 

[jira] [Commented] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-02-18 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152761#comment-15152761
 ] 

Mike Dusenberry commented on SYSTEMML-512:
--

[~mboehm7] Confirmed -- the OOM issue is indeed related to the young-generation 
heap size.  Setting {{-Xmn100M}} with driver memory still set to 1G allows the 
script to run.  Is there anything we can do internally to avoid this?

For clarity to anyone else reading this, the long runtime issue is still 
present.
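For anyone reproducing this workaround, one way to pass the young-generation flag to the driver JVM from the earlier {{spark-shell}} invocation is via {{--driver-java-options}} (a sketch under the same assumed {{$SPARK_HOME}}/{{$SYSTEMML_HOME}} setup as the comments above):

```shell
# Launch spark-shell with 1G driver heap and a 100 MB young generation,
# matching the configuration described in this comment thread:
$SPARK_HOME/bin/spark-shell --master 'local[*]' \
  --driver-memory 1G \
  --driver-java-options "-Xmn100M" \
  --jars $SYSTEMML_HOME/target/SystemML.jar
```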

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: test1.scala, test2.scala
>
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java out of memory heap error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1212)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1103)
>   at 
> org.apache.sysml.runtime.instructions.cp.MatrixMatrixArithmeticCPInstruction.processInstruction(MatrixMatrixArithmeticCPInstruction.java:60)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>   at 
> org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:183)
>   at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
>   at 
> 

[jira] [Created] (SYSTEMML-581) Add Scala API Tests to Maven Test Suites

2016-03-18 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-581:


 Summary: Add Scala API Tests to Maven Test Suites
 Key: SYSTEMML-581
 URL: https://issues.apache.org/jira/browse/SYSTEMML-581
 Project: SystemML
  Issue Type: New Feature
Reporter: Mike Dusenberry
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-577) Add High-Level "executeScript" API to Python MLContext

2016-03-15 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-577:
-
Description: 
This adds the {{executeScript(...)}} function to the Python MLContext API, and 
in the process hides the need to use {{registerInput(...)}} and 
{{registerOutput(...)}} by allowing the user to pass in a dictionary of 
key:value inputs of any type, and an array of outputs to keep.

Example:
{code}
pnmf = """ // script here """
outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
["W", "H", "negloglik"])
{code}

  was:
This adds the {{executeScript(...)}} function to the Python MLContext API, and 
in the process hides the need to use {{registerInput(...)}} and 
{{registerOutput(...)}} by allowing the user to pass in a dictionary of 
key:value inputs of any type, and an array of outputs to keep.

Example:
{code}
pnmf = """ // script here """
outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
["W", "H", "negloglik"])
{code}


> Add High-Level "executeScript" API to Python MLContext
> --
>
> Key: SYSTEMML-577
> URL: https://issues.apache.org/jira/browse/SYSTEMML-577
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>
> This adds the {{executeScript(...)}} function to the Python MLContext API, 
> and in the process hides the need to use {{registerInput(...)}} and 
> {{registerOutput(...)}} by allowing the user to pass in a dictionary of 
> key:value inputs of any type, and an array of outputs to keep.
> Example:
> {code}
> pnmf = """ // script here """
> outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
> ["W", "H", "negloglik"])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-579) Packing our algorithm scripts into JAR

2016-03-19 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-579:
-
Description: 
Packing our algorithm scripts into the JAR without looking into the user's filesystem.

We should look into the possibility of packing our algorithm scripts into the 
JAR during build time as perhaps a Maven "resource" that would be available to 
the Java process without needing to look into the user's filesystem.  This 
should help with the Scala API introduced in SYSTEMML-580.  One issue I see 
with the current approach is if a user wishes to attach the SystemML JAR to a 
cloud notebook (such as Databricks Cloud) in which an environment variable may 
not be able to be set, the API will not function.
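For intuition, the same idea in Python terms: resources bundled inside a package can be loaded directly from the installed package rather than from a user-supplied filesystem path. This is only an analogy sketch; the package and file names a real build would use are not specified here, so the demo reads a file that ships with the standard library:

```python
from importlib import resources

def load_script(package, name):
    # Load a text resource bundled inside a package -- the Python analogue
    # of reading a .dml script packed into the JAR as a Maven resource.
    return resources.files(package).joinpath(name).read_text()

# Demonstrate with a module that ships with the standard library; a real
# build would bundle the .dml scripts as package data instead.
text = load_script("email", "__init__.py")
assert len(text) > 0
```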

  was:Packing our algorithm to JAR without look into the user's filesystem.


> Packing our algorithm scripts into JAR
> --
>
> Key: SYSTEMML-579
> URL: https://issues.apache.org/jira/browse/SYSTEMML-579
> Project: SystemML
>  Issue Type: Task
>  Components: Algorithms, APIs
>Affects Versions: SystemML 0.9
>Reporter: Tommy Yu
>Priority: Minor
>
> Packing our algorithm scripts into the JAR without looking into the user's filesystem.
> We should look into the possibility of packing our algorithm scripts into the 
> JAR during build time as perhaps a Maven "resource" that would be available 
> to the Java process without needing to look into the user's filesystem.  This 
> should help with the Scala API introduced in SYSTEMML-580.  One issue I see 
> with the current approach is if a user wishes to attach the SystemML JAR to a 
> cloud notebook (such as Databricks Cloud) in which an environment variable 
> may not be able to be set, the API will not function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (SYSTEMML-545) Document Scala build support in Eclipse

2016-03-19 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry closed SYSTEMML-545.


> Document Scala build support in Eclipse
> ---
>
> Key: SYSTEMML-545
> URL: https://issues.apache.org/jira/browse/SYSTEMML-545
> Project: SystemML
>  Issue Type: Improvement
>  Components: Build
>Reporter: Glenn Weidner
>Assignee: Glenn Weidner
>
> In preparation for [SYSTEMML-543 Refactor MLContext in 
> Scala|https://issues.apache.org/jira/browse/SYSTEMML-543], the project build 
> needs to support Scala in Eclipse.  Initial investigation and discussion can 
> be found in [PR70|https://github.com/apache/incubator-systemml/pull/70].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-617) Override default namespace for imported custom functions

2016-04-07 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-617:
-
Assignee: Glenn Weidner

> Override default namespace for imported custom functions
> 
>
> Key: SYSTEMML-617
> URL: https://issues.apache.org/jira/browse/SYSTEMML-617
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Parser
>Reporter: Glenn Weidner
>Assignee: Glenn Weidner
>Priority: Minor
>
> This sub-task targets a specific scenario described in SYSTEMML-590.
> Example of error:
> org.apache.sysml.api.DMLException: org.apache.sysml.parser.LanguageException: 
> ERROR: null -- line 0, column 0 -- function g is undefined in namespace 
> .defaultNS
>   at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:350)
>   at org.apache.sysml.api.DMLScript.main(DMLScript.java:197)
> ...
> Caused by: org.apache.sysml.parser.LanguageException: ERROR: null -- line 0, 
> column 0 -- function g is undefined in namespace .defaultNS
>   at 
> org.apache.sysml.parser.StatementBlock.isMergeableFunctionCallBlock(StatementBlock.java:201)
>   at 
> org.apache.sysml.parser.StatementBlock.mergeFunctionCalls(StatementBlock.java:328)
>   at 
> org.apache.sysml.parser.DMLTranslator.liveVariableAnalysis(DMLTranslator.java:165)
>   at org.apache.sysml.api.DMLScript.execute(DMLScript.java:592)
>   at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:338)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-590) Improve Namespace Handling for UDFs

2016-04-07 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230474#comment-15230474
 ] 

Mike Dusenberry commented on SYSTEMML-590:
--

Thanks, [~gweidner]!  I've linked SYSTEMML-617 to this JIRA as well as a 
dependent.

> Improve Namespace Handling for UDFs
> ---
>
> Key: SYSTEMML-590
> URL: https://issues.apache.org/jira/browse/SYSTEMML-590
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>
> Currently, if a UDF body involves calling another UDF, the default global 
> namespace is assumed, unless a namespace is explicitly indicated.  This 
> becomes a problem when a file contains UDFs, and is then sourced from another 
> script.
> Imagine a file {{funcs.dml}} as follows:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Then, let's try to call {{f}}:
> {code}
> script = """
> source ("funcs.dml") as funcs
> ans = funcs::f(3, 1)
> print(ans)
> """
> ml.reset()
> ml.executeScript(script)
> {code}
> This results in an error since {{f}} is in the {{funcs}} namespace, but the 
> call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
> user intends to use the {{g}} that is located in the same file.
> Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
> assume that {{f}} and {{g}} are in a {{funcs}} namespace:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = funcs::g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Instead, it would be better to simply first look for {{g}} in its parent's 
> namespace.  In this case, the "parent" would be the function {{f}}, and the 
> namespace we have selected is {{funcs}}, although that choice would be left 
> up to the end-user.  Then, namespace assumptions would not be necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-618) Deep Learning DML Library

2016-04-06 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-618:
-
Issue Type: New Feature  (was: Sub-task)
Parent: (was: SYSTEMML-540)

> Deep Learning DML Library
> -
>
> Key: SYSTEMML-618
> URL: https://issues.apache.org/jira/browse/SYSTEMML-618
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>
> Create an experimental, layers-based library in pure DML to contain layer 
> abstractions with simple forward/backward APIs for affine, convolution (start 
> with 2D), max-pooling, non-linearities (relu, sigmoid, softmax, etc.), 
> dropout, loss functions, other layers, optimizers, and gradient checks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SYSTEMML-618) Deep Learning DML Library

2016-04-06 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-618:


 Summary: Deep Learning DML Library
 Key: SYSTEMML-618
 URL: https://issues.apache.org/jira/browse/SYSTEMML-618
 Project: SystemML
  Issue Type: Sub-task
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry


Create an experimental, layers-based library in pure DML to contain layer 
abstractions with simple forward/backward APIs for affine, convolution (start 
with 2D), max-pooling, non-linearities (relu, sigmoid, softmax, etc.), dropout, 
loss functions, other layers, optimizers, and gradient checks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (SYSTEMML-619) Same usage of same random matrix leads to different results

2016-04-06 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229134#comment-15229134
 ] 

Mike Dusenberry edited comment on SYSTEMML-619 at 4/6/16 9:18 PM:
--

It continues:

Replace {{dX = 2 * X}} with the equivalent {{dX = 2 * (X+0.1-0.1)}}:

{code}
# Generate data
N = 3
D = 2
X = rand(rows=N, cols=D)

# Function
dX = 2 * (X+0.1-0.1)

# Print elements of dX
for (i in 1:nrow(dX)) {
  for (j in 1:ncol(dX)) {
print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
  }
}
print("")

# Function
dX = 2 * (X+0.1-0.1)

# Print elements of dX
for (i in 1:nrow(dX)) {
  for (j in 1:ncol(dX)) {
print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
  }
}
print("")


print("")
print("")
print("")
{code}

{code}
dX[1,1]: 0.3631027757233218
dX[1,2]: 0.7463204132762002
dX[2,1]: 1.396631386586599
dX[2,2]: 0.4570007065789227
dX[3,1]: 0.7576032950257092
dX[3,2]: 1.191150376801223

dX[1,1]: 1.0917431959015071
dX[1,2]: 1.0804855841809473
dX[2,1]: 0.36778443121041454
dX[2,2]: 1.2503688857941493
dX[3,1]: 1.8974469686808841
dX[3,2]: 0.20545803320254397
{code}

Now, swap the order of the addition and subtraction and replace it with {{dX = 
2 * (X-0.1+0.1)}}:

{code}
# Generate data
N = 3
D = 2
X = rand(rows=N, cols=D)

# Function
dX = 2 * (X-0.1+0.1)

# Print elements of dX
for (i in 1:nrow(dX)) {
  for (j in 1:ncol(dX)) {
print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
  }
}
print("")

# Function
dX = 2 * (X-0.1+0.1)

# Print elements of dX
for (i in 1:nrow(dX)) {
  for (j in 1:ncol(dX)) {
print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
  }
}
print("")


print("")
print("")
print("")
{code}

{code}
dX[1,1]: 0.5866339382864822
dX[1,2]: 0.23074794178126168
dX[2,1]: 0.46171941318733056
dX[2,2]: 0.2481946969297868
dX[3,1]: 1.6808807265603098
dX[3,2]: 1.6798556486631278

dX[1,1]: 0.5866339382864822
dX[1,2]: 0.23074794178126168
dX[2,1]: 0.4617194131873305
dX[2,2]: 0.2481946969297868
dX[3,1]: 1.6808807265603098
dX[3,2]: 1.6798556486631278
{code}
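For intuition, the behavior above resembles an expression being lazily re-evaluated instead of materialized once. The following stdlib-Python sketch illustrates the symptom only; it is an analogy, not a claim about what the SystemML optimizer actually does:

```python
import random

# "Lazy" dX: re-derives X from the generator on every evaluation, so two
# evaluations of the "same" expression disagree.
rng = random.Random()
lazy_dX = lambda: [2 * rng.random() for _ in range(6)]
assert lazy_dX() != lazy_dX()

# Materialized X: computed once, so dX is stable across uses.
r = random.Random(42)
X = [r.random() for _ in range(6)]
dX = [2 * x for x in X]
assert dX == [2 * x for x in X]
```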


was (Author: mwdus...@us.ibm.com):
It continues:

Replace {{dX = 2 * X}} with the equivalent {{dX = 2 * (X+0.1-0.1)}}:

{code}
# Generate data
N = 3
D = 2
X = rand(rows=N, cols=D)

# Function
dX = 2 * (X+0.1-0.1)

# Print elements of dX
for (i in 1:nrow(dX)) {
  for (j in 1:ncol(dX)) {
print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
  }
}
print("")

# Function
dX = 2 * (X+0.1-0.1)

# Print elements of dX
for (i in 1:nrow(dX)) {
  for (j in 1:ncol(dX)) {
print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
  }
}
print("")


print("")
print("")
print("")
{code}

{code}
dX[1,1]: 0.3631027757233218
dX[1,2]: 0.7463204132762002
dX[2,1]: 1.396631386586599
dX[2,2]: 0.4570007065789227
dX[3,1]: 0.7576032950257092
dX[3,2]: 1.191150376801223

dX[1,1]: 1.0917431959015071
dX[1,2]: 1.0804855841809473
dX[2,1]: 0.36778443121041454
dX[2,2]: 1.2503688857941493
dX[3,1]: 1.8974469686808841
dX[3,2]: 0.20545803320254397
{code}

Now, swap the order of the addition and subtraction and replace it with {{dX = 
2 * (X-0.1+0.1)}}:

{code}
# Generate data
N = 3
D = 2
X = rand(rows=N, cols=D)

# Function
dX = 2 * (X-0.1+0.1)

# Print elements of dX
for (i in 1:nrow(dX)) {
  for (j in 1:ncol(dX)) {
print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
  }
}
print("")

# Function
dX = 2 * (X-0.1+0.1)

# Print elements of dX
for (i in 1:nrow(dX)) {
  for (j in 1:ncol(dX)) {
print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
  }
}
print("")


print("")
print("")
print("")
{code}

{code}
dX[1,1]: 0.5866339382864822
dX[1,2]: 0.23074794178126168
dX[2,1]: 0.46171941318733056
dX[2,2]: 0.2481946969297868
dX[3,1]: 1.6808807265603098
dX[3,2]: 1.6798556486631278

dX[1,1]: 0.5866339382864822
dX[1,2]: 0.23074794178126168
dX[2,1]: 0.4617194131873305
dX[2,2]: 0.2481946969297868
dX[3,1]: 1.6808807265603098
dX[3,2]: 1.6798556486631278
{code}

> Same usage of same random matrix leads to different results
> ---
>
> Key: SYSTEMML-619
> URL: https://issues.apache.org/jira/browse/SYSTEMML-619
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>
> Interesting bug, as of [commit e16fe1d | 
> https://github.com/apache/incubator-systemml/commit/e16fe1df586371408a9dc3de29b13c98982ff57c]:
> Start by creating a random matrix {{X}}, multiplying it by 2 and assigning to 
> a variable {{dX}}, and then print the results:
> {code}
> # Generate data
> N = 3
> D = 2
> X = rand(rows=N, cols=D)
> # Function
> dX = 2 * X
> # Print elements of dX
> for (i in 1:nrow(dX)) {
>   for (j in 1:ncol(dX)) {
> print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
>   }
> }
> print("")
> print("")
> print("")
> print("")
> {code}
> Output:
> {code}
> dX[1,1]: 1.0743268190621265
> dX[1,2]: 1.403590780383033
> dX[2,1]: 1.9404746268735837
> dX[2,2]: 0.8689030633611705
> dX[3,1]: 0.2589227727050818
> dX[3,2]: 

[jira] [Updated] (SYSTEMML-619) Same usage of same random matrix leads to different results

2016-04-08 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-619:
-
Fix Version/s: SystemML 0.9
   SystemML 0.10

> Same usage of same random matrix leads to different results
> ---
>
> Key: SYSTEMML-619
> URL: https://issues.apache.org/jira/browse/SYSTEMML-619
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.9, SystemML 0.10
>Reporter: Mike Dusenberry
>Assignee: Matthias Boehm
> Fix For: SystemML 0.9, SystemML 0.10
>
>
> Interesting bug, as of [commit e16fe1d | 
> https://github.com/apache/incubator-systemml/commit/e16fe1df586371408a9dc3de29b13c98982ff57c]:
> Start by creating a random matrix {{X}}, multiplying it by 2 and assigning to 
> a variable {{dX}}, and then print the results:
> {code}
> # Generate data
> N = 3
> D = 2
> X = rand(rows=N, cols=D)
> # Function
> dX = 2 * X
> # Print elements of dX
> for (i in 1:nrow(dX)) {
>   for (j in 1:ncol(dX)) {
> print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
>   }
> }
> print("")
> print("")
> print("")
> print("")
> {code}
> Output:
> {code}
> dX[1,1]: 1.0743268190621265
> dX[1,2]: 1.403590780383033
> dX[2,1]: 1.9404746268735837
> dX[2,2]: 0.8689030633611705
> dX[3,1]: 0.2589227727050818
> dX[3,2]: 0.342402157694327
> {code}
> Now, copy and paste the assignment to {{dX}} and the print statement, thus 
> literally repeating the same code again.
> {code}
> # Generate data
> N = 3
> D = 2
> X = rand(rows=N, cols=D)
> # Function
> dX = 2 * X
> # Print elements of dX
> for (i in 1:nrow(dX)) {
>   for (j in 1:ncol(dX)) {
> print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
>   }
> }
> print("")
> # Function
> dX = 2 * X
> # Print elements of dX
> for (i in 1:nrow(dX)) {
>   for (j in 1:ncol(dX)) {
> print("dX["+i+","+j+"]: " + as.scalar(dX[i,j]))
>   }
> }
> print("")
> print("")
> print("")
> print("")
> {code}
> Output:
> {code}
> dX[1,1]: 1.527333070705
> dX[1,2]: 1.951679510186853
> dX[2,1]: 0.9372371721327426
> dX[2,2]: 0.11462997451231827
> dX[3,1]: 0.3913879515630596
> dX[3,2]: 0.4411374996556454
> dX[1,1]: 0.15757825641372136
> dX[1,2]: 1.6331143898957619
> dX[2,1]: 0.7271506546939133
> dX[2,2]: 0.648694276576909
> dX[3,1]: 1.4763697903577369
> dX[3,2]: 1.2645782773949483
> {code}
> Notice that the outputs are different... magic!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SYSTEMML-582) Determine If Multiple Builds Are Needed For Different Scala Versions.

2016-03-19 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-582:


 Summary: Determine If Multiple Builds Are Needed For Different 
Scala Versions.
 Key: SYSTEMML-582
 URL: https://issues.apache.org/jira/browse/SYSTEMML-582
 Project: SystemML
  Issue Type: New Feature
Reporter: Mike Dusenberry






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-540) Deep Learning

2016-03-19 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198220#comment-15198220
 ] 

Mike Dusenberry commented on SYSTEMML-540:
--

Update: I'm working on an experimental, layers-based framework directly in DML 
to contain layer abstractions with simple forward/backward APIs for affine, 
convolution (start with 2D), max-pooling, non-linearities (relu, sigmoid, 
softmax, etc.), dropout, loss functions, and other layers.  As part of this 
experiment, I'm starting by implementing as much as possible in DML, and then 
will move to built-in functions as necessary.

> Deep Learning
> -
>
> Key: SYSTEMML-540
> URL: https://issues.apache.org/jira/browse/SYSTEMML-540
> Project: SystemML
>  Issue Type: Epic
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>
> This epic covers the addition of deep learning to SystemML, including:
> * Core DML layer abstractions for deep (convolutional, recurrent) neural 
> nets, with simple forward/backward API: affine, convolution (start with 2D), 
> max-pooling, non-linearities (relu, sigmoid, softmax), dropout, loss 
> functions.
> * Modularized DML optimizers: (mini-batch, stochastic) gradient descent (w/ 
> momentum, etc.).
> * Additional DML language support as necessary (tensors, built-in functions 
> such as convolution, function pointers, list structures, etc.).
> * Integration with other deep learning frameworks (Caffe, Torch, Theano, 
> TensorFlow, etc.) via automatic DML code generation.
> * etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SYSTEMML-580) Add Scala LogisticRegression API For Spark Pipeline

2016-03-19 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-580:


 Summary: Add Scala LogisticRegression API For Spark Pipeline
 Key: SYSTEMML-580
 URL: https://issues.apache.org/jira/browse/SYSTEMML-580
 Project: SystemML
  Issue Type: New Feature
Reporter: Tommy Yu
Assignee: Tommy Yu


I wrote a Scala ML pipeline wrapper for the LogisticRegression model as an 
example for Scala users.

I propose a Scala version of the example because the Java version has some 
weaknesses.

It is not natural to extend a Scala class from Java code; we need to know the 
post-compilation method names, like
@Override
public void 
org$apache$spark$ml$param$shared$HasElasticNetParam$setter$elasticNetParam_$eq(DoubleParam
 arg0) {}

I assume this is a setter method, but it does nothing here.

It is also hard to follow the Spark ML parameter style; instead, parameters end 
up being defined like below:

private IntParam icpt = new IntParam(this, "icpt", "Value of intercept");
private DoubleParam reg = new DoubleParam(this, "reg", "Value of regularization 
parameter");



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (SYSTEMML-580) Add Scala LogisticRegression API For Spark ML Pipeline

2016-03-19 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry resolved SYSTEMML-580.
--
Resolution: Fixed

> Add Scala LogisticRegression API For Spark ML Pipeline
> --
>
> Key: SYSTEMML-580
> URL: https://issues.apache.org/jira/browse/SYSTEMML-580
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Tommy Yu
>Assignee: Tommy Yu
>
> I wrote a scala ml pipeline wrapper for LogisticRegression Model as a example 
> for scala user.
> I propose a scala version example since some weakness for java version.
> It's not naturally to extend scala class in java code. We need know function 
> style after compile, like
> @Override
> public void 
> org$apache$spark$ml$param$shared$HasElasticNetParam$setter$elasticNetParam_$eq(DoubleParam
>  arg0) {}
> I assume it's set function, but do nothing here
> Hard to follow ml parameter style, but define parameter like below
> private IntParam icpt = new IntParam(this, "icpt", "Value of intercept");
> private DoubleParam reg = new DoubleParam(this, "reg", "Value of 
> regularization parameter");



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-580) Add Scala LogisticRegression API For Spark ML Pipeline

2016-03-19 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-580:
-
Summary: Add Scala LogisticRegression API For Spark ML Pipeline  (was: Add 
Scala LogisticRegression API For Spark Pipeline)

> Add Scala LogisticRegression API For Spark ML Pipeline
> --
>
> Key: SYSTEMML-580
> URL: https://issues.apache.org/jira/browse/SYSTEMML-580
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Tommy Yu
>Assignee: Tommy Yu
>
> I wrote a scala ml pipeline wrapper for LogisticRegression Model as a example 
> for scala user.
> I propose a scala version example since some weakness for java version.
> It's not naturally to extend scala class in java code. We need know function 
> style after compile, like
> @Override
> public void 
> org$apache$spark$ml$param$shared$HasElasticNetParam$setter$elasticNetParam_$eq(DoubleParam
>  arg0) {}
> I assume it's set function, but do nothing here
> Hard to follow ml parameter style, but define parameter like below
> private IntParam icpt = new IntParam(this, "icpt", "Value of intercept");
> private DoubleParam reg = new DoubleParam(this, "reg", "Value of 
> regularization parameter");



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-580) Add Scala LogisticRegression API For Spark ML Pipeline

2016-03-19 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197785#comment-15197785
 ] 

Mike Dusenberry commented on SYSTEMML-580:
--

[PR 70 | https://github.com/apache/incubator-systemml/pull/70] submitted.

> Add Scala LogisticRegression API For Spark ML Pipeline
> --
>
> Key: SYSTEMML-580
> URL: https://issues.apache.org/jira/browse/SYSTEMML-580
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Tommy Yu
>Assignee: Tommy Yu
>
> I wrote a scala ml pipeline wrapper for LogisticRegression Model as a example 
> for scala user.
> I propose a scala version example since some weakness for java version.
> It's not naturally to extend scala class in java code. We need know function 
> style after compile, like
> @Override
> public void 
> org$apache$spark$ml$param$shared$HasElasticNetParam$setter$elasticNetParam_$eq(DoubleParam
>  arg0) {}
> I assume it's set function, but do nothing here
> Hard to follow ml parameter style, but define parameter like below
> private IntParam icpt = new IntParam(this, "icpt", "Value of intercept");
> private DoubleParam reg = new DoubleParam(this, "reg", "Value of 
> regularization parameter");



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-580) Add Scala LogisticRegression API For Spark ML Pipeline

2016-03-20 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197797#comment-15197797
 ] 

Mike Dusenberry commented on SYSTEMML-580:
--

[PR 70 | https://github.com/apache/incubator-systemml/pull/70] merged as 
[commit 7ce19c8097f3d24d07be87d9427890834f9a9bea | 
https://github.com/apache/incubator-systemml/commit/7ce19c8097f3d24d07be87d9427890834f9a9bea].

> Add Scala LogisticRegression API For Spark ML Pipeline
> --
>
> Key: SYSTEMML-580
> URL: https://issues.apache.org/jira/browse/SYSTEMML-580
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Tommy Yu
>Assignee: Tommy Yu
>
> I wrote a scala ml pipeline wrapper for LogisticRegression Model as a example 
> for scala user.
> I propose a scala version example since some weakness for java version.
> It's not naturally to extend scala class in java code. We need know function 
> style after compile, like
> @Override
> public void 
> org$apache$spark$ml$param$shared$HasElasticNetParam$setter$elasticNetParam_$eq(DoubleParam
>  arg0) {}
> I assume it's set function, but do nothing here
> Hard to follow ml parameter style, but define parameter like below
> private IntParam icpt = new IntParam(this, "icpt", "Value of intercept");
> private DoubleParam reg = new DoubleParam(this, "reg", "Value of 
> regularization parameter");



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SYSTEMML-588) Improve UDFs

2016-03-22 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-588:


 Summary: Improve UDFs
 Key: SYSTEMML-588
 URL: https://issues.apache.org/jira/browse/SYSTEMML-588
 Project: SystemML
  Issue Type: Epic
Reporter: Mike Dusenberry


This epic aims to improve the state of user-defined functions (UDFs) in DML & 
PyDML.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SYSTEMML-587) Improvements Triggered By Deep Learning Work

2016-03-22 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-587:


 Summary: Improvements Triggered By Deep Learning Work
 Key: SYSTEMML-587
 URL: https://issues.apache.org/jira/browse/SYSTEMML-587
 Project: SystemML
  Issue Type: Umbrella
Reporter: Mike Dusenberry
Priority: Minor


This convenience umbrella tracks all improvements triggered by the work on deep 
learning (SYSTEMML-540), but not directly related to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SYSTEMML-589) Add Default Parameter Values to UDFs

2016-03-22 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-589:


 Summary: Add Default Parameter Values to UDFs
 Key: SYSTEMML-589
 URL: https://issues.apache.org/jira/browse/SYSTEMML-589
 Project: SystemML
  Issue Type: Sub-task
Reporter: Mike Dusenberry


This task aims to add default parameter values to UDFs for scalar and boolean 
types.  There may already be runtime support, but the grammar does not seem to 
allow it.

Example that currently works:
{code}
script = """
f = function(double x, int a) return (double ans) {
  ans = a * x
}

ans = f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

Example that would be nice:
{code}
script = """
f = function(double x, int a=1) return (double ans) {
  ans = a * x
}

ans = f(3)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}
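The default-value semantics requested here can be illustrated with a small, hypothetical sketch (plain Python, not SystemML code; {{resolve_call}} and its signature are invented for illustration): trailing parameters that carry defaults are filled in whenever the call site omits them.

```python
# Hypothetical sketch (not SystemML's implementation): resolving a call
# against a UDF signature in which some parameters carry default values.
def resolve_call(params, defaults, args):
    """params: ordered parameter names; defaults: name -> default value
    for optional parameters; args: positional argument values supplied."""
    if len(args) > len(params):
        raise TypeError("too many arguments")
    bound = dict(zip(params, args))       # bind supplied positionals
    for name in params[len(args):]:       # fill the rest from defaults
        if name not in defaults:
            raise TypeError("missing required argument: " + name)
        bound[name] = defaults[name]
    return bound

# f = function(double x, int a=1): calling f(3) fills a from its default.
print(resolve_call(["x", "a"], {"a": 1}, [3]))  # {'x': 3, 'a': 1}
```

Under this scheme, {{f(3)}} against the signature {{f(double x, int a=1)}} binds {{x=3}} and falls back to {{a=1}}, which matches the "example that would be nice" above.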





[jira] [Updated] (SYSTEMML-590) Assume Parent's Namespace for Nested UDF calls.

2016-03-22 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-590:
-
Description: 
Currently, if a UDF body involves calling another UDF, the default global 
namespace is assumed, unless a namespace is explicitly indicated.  This becomes 
a problem when a file contains UDFs, and is then sourced from another script.

Imagine a file {{funcs.dml}} as follows:
{code}
f = function(double x, int a) return (double ans) {
  x2 = g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Then, let's try to call {{f}}:
{code}
script = """
source ("funcs.dml") as funcs

ans = funcs::f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

This results in an error since {{f}} is in the {{funcs}} namespace, but the 
call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
user intends to use the {{g}} that is located in the same file.

Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
assume that {{f}} and {{g}} are in a {{funcs}} namespace:
{code}
f = function(double x, int a) return (double ans) {
  x2 = funcs::g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Instead, it would be better to simply first look for {{g}} in its parent's 
namespace.  In this case, the "parent" would be the function {{f}}, and the 
namespace we have selected is {{funcs}}, although that choice would be left up 
to the end-user.  Then, namespace assumptions would not be necessary.

  was:
Currently, if a UDF body involves calling another UDF, the default global 
namespace is assumed, unless a namespace is explicitly indicated.  This becomes 
a problem when a file contains UDFs, and is then sourced from another script.

Imagine a file {{funcs.dml}} as follows:
{code}
f = function(double x, int a) return (double ans) {
  x2 = g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Then, let's try to call {{f}}:
{code}
script = """
source ("funcs.dml") as funcs

ans = funcs::f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

This results in an error since {{f}} is in the {{funcs}} namespace, but the 
call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
user intends to use the {{g}} that is located in the same file.

Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
assume that {{f}} and {{g}} are in a {{funcs}} namespace:
{code}
f = function(double x, int a) return (double ans) {
  x2 = funcs::g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Instead, it would be better to simply first look for {{g}} in its parent's 
namespace.  In this case, the "parent" would be the function {{f}}, and the 
namespace we have selected is {{funcs}}.  Then, namespace assumptions would not 
be necessary.


> Assume Parent's Namespace for Nested UDF calls.
> ---
>
> Key: SYSTEMML-590
> URL: https://issues.apache.org/jira/browse/SYSTEMML-590
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>
> Currently, if a UDF body involves calling another UDF, the default global 
> namespace is assumed, unless a namespace is explicitly indicated.  This 
> becomes a problem when a file contains UDFs, and is then sourced from another 
> script.
> Imagine a file {{funcs.dml}} as follows:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Then, let's try to call {{f}}:
> {code}
> script = """
> source ("funcs.dml") as funcs
> ans = funcs::f(3, 1)
> print(ans)
> """
> ml.reset()
> ml.executeScript(script)
> {code}
> This results in an error since {{f}} is in the {{funcs}} namespace, but the 
> call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
> user intends to use the {{g}} that is located in the same file.
> Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
> assume that {{f}} and {{g}} are in a {{funcs}} namespace:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = funcs::g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Instead, it would be better to simply first look for {{g}} in its parent's 
> namespace.  In this case, the "parent" would be the function {{f}}, and the 
> namespace we have selected is {{funcs}}, although that choice would be left 
> up to the end-user.  Then, namespace assumptions would not be necessary.





[jira] [Updated] (SYSTEMML-590) Assume Parent's Namespace for Nested UDF calls.

2016-03-22 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-590:
-
Description: 
Currently, if a UDF body involves calling another UDF, the default global 
namespace is assumed, unless a namespace is explicitly indicated.  This becomes 
a problem when a file contains UDFs, and is then sourced from another script.

Imagine a file {{funcs.dml}} as follows:
{code}
f = function(double x, int a) return (double ans) {
  x2 = g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Then, let's try to call {{f}}:
{code}
script = """
source ("funcs.dml") as funcs

ans = funcs::f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

This results in an error since {{f}} is in the {{funcs}} namespace, but the 
call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
user intends to use the {{g}} that is located in the same file.

Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
assume that {{f}} and {{g}} are in a {{funcs}} namespace:
{code}
f = function(double x, int a) return (double ans) {
  x2 = funcs::g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Instead, it would be better to simply first look for {{g}} in its parent's 
namespace.  In this case, the "parent" would be the function {{f}}, and the 
namespace we have selected is {{funcs}}.  Then, namespace assumptions would not 
be necessary.

  was:
Currently, if a UDF body involves calling another UDF, the default global 
namespace is assumed, unless a namespace is explicitly indicated.  This becomes 
a problem when a file contains UDFs, and is then sourced from another script.

Imagine a file {{funcs.dml}} as follows:
{code}
f = function(double x, int a) return (double ans) {
  x2 = g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Then, let's try to call {{f}}:
{code}
script = """
source ("funcs.dml") as funcs

ans = funcs::f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

This results in an error since {{f}} is in the {{funcs}} namespace, but the 
call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
user intends to use the {{g}} that is located in the same file.

Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
assume that {{f}} and {{g}} are in a {{funcs}} namespace:
{code}
f = function(double x, int a) return (double ans) {
  x2 = funcs::g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Instead, it would be better to simply first look for {{g}} in its parent's 
namespace.  In this case, the "parent" would be the function {{f}}, and the 
namespace we have selected is {{funcs}}.  Then, namespace assumptions would not 
be necessary.


> Assume Parent's Namespace for Nested UDF calls.
> ---
>
> Key: SYSTEMML-590
> URL: https://issues.apache.org/jira/browse/SYSTEMML-590
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>
> Currently, if a UDF body involves calling another UDF, the default global 
> namespace is assumed, unless a namespace is explicitly indicated.  This 
> becomes a problem when a file contains UDFs, and is then sourced from another 
> script.
> Imagine a file {{funcs.dml}} as follows:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Then, let's try to call {{f}}:
> {code}
> script = """
> source ("funcs.dml") as funcs
> ans = funcs::f(3, 1)
> print(ans)
> """
> ml.reset()
> ml.executeScript(script)
> {code}
> This results in an error since {{f}} is in the {{funcs}} namespace, but the 
> call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
> user intends to use the {{g}} that is located in the same file.
> Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
> assume that {{f}} and {{g}} are in a {{funcs}} namespace:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = funcs::g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Instead, it would be better to simply first look for {{g}} in its parent's 
> namespace.  In this case, the "parent" would be the function {{f}}, and the 
> namespace we have selected is {{funcs}}.  Then, namespace assumptions would 
> not be necessary.





[jira] [Created] (SYSTEMML-590) Assume Parent's Namespace for Nested UDF calls.

2016-03-22 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-590:


 Summary: Assume Parent's Namespace for Nested UDF calls.
 Key: SYSTEMML-590
 URL: https://issues.apache.org/jira/browse/SYSTEMML-590
 Project: SystemML
  Issue Type: Sub-task
Reporter: Mike Dusenberry


Currently, if a UDF body involves calling another UDF, the default, global 
namespace is assumed, unless a namespace is explicitly indicated.  This becomes 
a problem when a file contains UDFs, and is then sourced from another file.

Imagine a file {{funcs.dml}} as follows:
{code}
f = function(double x, int a) return (double ans) {
  x2 = g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Then, let's try to call {{f}}:
{code}
script = """
source ("funcs.dml") as funcs

ans = funcs::f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

This results in an error since {{f}} is in the {{funcs}} namespace, but the 
call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
user intends to use the {{g}} that is located in the same file.

Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
assume that {{f}} and {{g}} are in a {{funcs}} namespace:
{code}
f = function(double x, int a) return (double ans) {
  x2 = funcs::g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Instead, it would be better to simply first look for {{g}} in its parent's 
namespace.  In this case, the "parent" would be the function {{f}}, and the 
namespace we have selected is {{funcs}}.  Then, namespace assumptions would not 
be necessary.





[jira] [Updated] (SYSTEMML-590) Assume Parent's Namespace for Nested UDF calls.

2016-03-22 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-590:
-
Description: 
Currently, if a UDF body involves calling another UDF, the default global 
namespace is assumed, unless a namespace is explicitly indicated.  This becomes 
a problem when a file contains UDFs, and is then sourced from another script.

Imagine a file {{funcs.dml}} as follows:
{code}
f = function(double x, int a) return (double ans) {
  x2 = g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Then, let's try to call {{f}}:
{code}
script = """
source ("funcs.dml") as funcs

ans = funcs::f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

This results in an error since {{f}} is in the {{funcs}} namespace, but the 
call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
user intends to use the {{g}} that is located in the same file.

Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
assume that {{f}} and {{g}} are in a {{funcs}} namespace:
{code}
f = function(double x, int a) return (double ans) {
  x2 = funcs::g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Instead, it would be better to simply first look for {{g}} in its parent's 
namespace.  In this case, the "parent" would be the function {{f}}, and the 
namespace we have selected is {{funcs}}.  Then, namespace assumptions would not 
be necessary.

  was:
Currently, if a UDF body involves calling another UDF, the default, global 
namespace is assumed, unless a namespace is explicitly indicated.  This becomes 
a problem when a file contains UDFs, and is then sourced from another file.

Imagine a file {{funcs.dml}} as follows:
{code}
f = function(double x, int a) return (double ans) {
  x2 = g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Then, let's try to call {{f}}:
{code}
script = """
source ("funcs.dml") as funcs

ans = funcs::f(3, 1)
print(ans)
"""
ml.reset()
ml.executeScript(script)
{code}

This results in an error since {{f}} is in the {{funcs}} namespace, but the 
call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
user intends to use the {{g}} that is located in the same file.

Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
assume that {{f}} and {{g}} are in a {{funcs}} namespace:
{code}
f = function(double x, int a) return (double ans) {
  x2 = funcs::g(x)
  ans = a * x2
}

g = function(double x) return (double ans) {
  ans = x * x
}
{code}

Instead, it would be better to simply first look for {{g}} in its parent's 
namespace.  In this case, the "parent" would be the function {{f}}, and the 
namespace we have selected is {{funcs}}.  Then, namespace assumptions would not 
be necessary.


> Assume Parent's Namespace for Nested UDF calls.
> ---
>
> Key: SYSTEMML-590
> URL: https://issues.apache.org/jira/browse/SYSTEMML-590
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>
> Currently, if a UDF body involves calling another UDF, the default global 
> namespace is assumed, unless a namespace is explicitly indicated.  This 
> becomes a problem when a file contains UDFs, and is then sourced from another 
> script.
> Imagine a file {{funcs.dml}} as follows:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Then, let's try to call {{f}}:
> {code}
> script = """
> source ("funcs.dml") as funcs
> ans = funcs::f(3, 1)
> print(ans)
> """
> ml.reset()
> ml.executeScript(script)
> {code}
> This results in an error since {{f}} is in the {{funcs}} namespace, but the 
> call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
> user intends to use the {{g}} that is located in the same file.
> Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
> assume that {{f}} and {{g}} are in a {{funcs}} namespace:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = funcs::g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Instead, it would be better to simply first look for {{g}} in its parent's 
> namespace.  In this case, the "parent" would be the function {{f}}, and the 
> namespace we have selected is {{funcs}}.  Then, namespace assumptions would 
> not be necessary.





[jira] [Commented] (SYSTEMML-590) Assume Parent's Namespace for Nested UDF calls.

2016-03-23 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208761#comment-15208761
 ] 

Mike Dusenberry commented on SYSTEMML-590:
--

[~mboehm7] Great, I'm glad we're in agreement regarding the importance of this, 
and I agree that it could be broadened to a general improvement of the concept 
of importing other files.

Things I would like to see in an improved notion of sourcing/imports:
1. Handling of source statements at the top of a file that is itself sourced, 
without any naming issues.  A user should only have to import the file he or 
she is interested in using, regardless of whatever dependencies that file may 
itself have, and the namespace names that the user selects should not interfere 
with the names used in the file itself for its dependencies.
2. As this JIRA issue already points out, UDF calls in a file should default to 
functions defined in that file, unless a namespace is explicitly selected.  
Additionally, any namespace selected in this manner should correspond to a 
source statement at the top of the file, and as stated in (1), the source 
statements should be executed if this file is itself sourced, without conflicts 
to the parent environment.
3. If we do make changes towards an improved sourcing/importing system, I would 
prefer *no* additional boilerplate code, such as Java-style "package" 
statements at the top of files.  As with the current sourcing mechanism, and 
quite similar to Python 3, modules/packages should be tied to the filename 
itself, keeping the DML within the file simple.
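The parent-namespace-first resolution proposed in this issue can be sketched in plain Python (hypothetical names; {{lookup}} and the namespace dictionary are illustrative, not SystemML internals): an unqualified call is first resolved in the calling function's own namespace, then in the default namespace.

```python
# Hypothetical sketch: resolve an unqualified UDF call by first looking
# in the namespace of the calling function, then falling back to the
# default (global) namespace, as proposed in this issue.
def lookup(fname, caller_ns, namespaces, default_ns=".defaultNS"):
    """namespaces: dict mapping namespace name -> dict of function defs."""
    for ns in (caller_ns, default_ns):
        funcs = namespaces.get(ns, {})
        if fname in funcs:
            return ns, funcs[fname]
    raise NameError("undefined function: " + fname)

namespaces = {
    "funcs": {"f": "<def f>", "g": "<def g>"},
    ".defaultNS": {},
}
# Inside funcs::f, the unqualified call g(x) resolves to funcs::g first.
print(lookup("g", "funcs", namespaces))  # ('funcs', '<def g>')
```

With this lookup order, the {{funcs.dml}} example above works without the explicit {{funcs::g(x)}} qualification.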

> Assume Parent's Namespace for Nested UDF calls.
> ---
>
> Key: SYSTEMML-590
> URL: https://issues.apache.org/jira/browse/SYSTEMML-590
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>
> Currently, if a UDF body involves calling another UDF, the default global 
> namespace is assumed, unless a namespace is explicitly indicated.  This 
> becomes a problem when a file contains UDFs, and is then sourced from another 
> script.
> Imagine a file {{funcs.dml}} as follows:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Then, let's try to call {{f}}:
> {code}
> script = """
> source ("funcs.dml") as funcs
> ans = funcs::f(3, 1)
> print(ans)
> """
> ml.reset()
> ml.executeScript(script)
> {code}
> This results in an error since {{f}} is in the {{funcs}} namespace, but the 
> call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
> user intends to the use the {{g}} that is located in the same file.
> Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
> assume that {{f}} and {{g}} are in a {{funcs}} namespace:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = funcs::g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Instead, it would be better to simply first look for {{g}} in its parent's 
> namespace.  In this case, the "parent" would be the function {{f}}, and the 
> namespace we have selected is {{funcs}}, although that choice would be left 
> up to the end-user.  Then, namespace assumptions would not be necessary.





[jira] [Updated] (SYSTEMML-592) GSoC 2016 Project Ideas - SystemML

2016-03-23 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-592:
-
Description: 
This JIRA issue serves to hold discussions regarding potential Google Summer of 
Code 2016 project ideas.  If you have an idea, please reach out!

Possible areas of focus:

* ML Algorithms
* Engine (distributed computing, performance, etc.)
* Continued integration with Spark & PySpark
* Deep language integration (Python, etc.)
* Others!

  was:
This JIRA issue serves to hold discussions regarding potential Google Summer of 
Code 2016 project ideas.  If you have an idea, please reach out!

Possible areas of focus:

* ML Algorithms
* Engine
* Integration with Spark & PySpark; deeper language integration with Python
* Others!


> GSoC 2016 Project Ideas - SystemML
> --
>
> Key: SYSTEMML-592
> URL: https://issues.apache.org/jira/browse/SYSTEMML-592
> Project: SystemML
>  Issue Type: Brainstorming
>Reporter: Mike Dusenberry
>  Labels: gsoc2016
>
> This JIRA issue serves to hold discussions regarding potential Google Summer 
> of Code 2016 project ideas.  If you have an idea, please reach out!
> Possible areas of focus:
> * ML Algorithms
> * Engine (distributed computing, performance, etc.)
> * Continued integration with Spark & PySpark
> * Deep language integration (Python, etc.)
> * Others!





[jira] [Updated] (SYSTEMML-592) GSoC 2016 Project Ideas - SystemML

2016-03-23 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-592:
-
Summary: GSoC 2016 Project Ideas - SystemML  (was: GSoC 2016 Project Ideas)

> GSoC 2016 Project Ideas - SystemML
> --
>
> Key: SYSTEMML-592
> URL: https://issues.apache.org/jira/browse/SYSTEMML-592
> Project: SystemML
>  Issue Type: Brainstorming
>Reporter: Mike Dusenberry
>  Labels: gsoc2016
>
> This JIRA issue serves to hold discussions regarding potential Google Summer 
> of Code 2016 project ideas.  If you have an idea, please reach out!
> Possible areas of focus:
> * ML Algorithms
> * Engine
> * Integration with Spark & PySpark; deeper language integration with Python
> * Others!





[jira] [Updated] (SYSTEMML-592) GSoC 2016 Project Ideas - SystemML

2016-03-23 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-592:
-
Description: 
This JIRA issue serves to hold discussions regarding potential Google Summer of 
Code 2016 project ideas.  If you have an idea, please reach out!

Possible areas of focus:

* ML Algorithms
* Engine (parser, compiler, runtime; distributed computing, performance, etc.)
* Continued integration with Spark & PySpark
* Deep language integration (Python, etc.)
* Others!

  was:
This JIRA issue serves to hold discussions regarding potential Google Summer of 
Code 2016 project ideas.  If you have an idea, please reach out!

Possible areas of focus:

* ML Algorithms
* Engine (distributed computing, performance, etc.)
* Continued integration with Spark & PySpark
* Deep language integration (Python, etc.)
* Others!


> GSoC 2016 Project Ideas - SystemML
> --
>
> Key: SYSTEMML-592
> URL: https://issues.apache.org/jira/browse/SYSTEMML-592
> Project: SystemML
>  Issue Type: Brainstorming
>Reporter: Mike Dusenberry
>  Labels: gsoc2016
>
> This JIRA issue serves to hold discussions regarding potential Google Summer 
> of Code 2016 project ideas.  If you have an idea, please reach out!
> Possible areas of focus:
> * ML Algorithms
> * Engine (parser, compiler, runtime; distributed computing, performance, etc.)
> * Continued integration with Spark & PySpark
> * Deep language integration (Python, etc.)
> * Others!





[jira] [Updated] (SYSTEMML-592) GSoC 2016 Project Ideas

2016-03-23 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-592:
-
Labels: gsoc2016  (was: )

> GSoC 2016 Project Ideas
> ---
>
> Key: SYSTEMML-592
> URL: https://issues.apache.org/jira/browse/SYSTEMML-592
> Project: SystemML
>  Issue Type: Brainstorming
>Reporter: Mike Dusenberry
>  Labels: gsoc2016
>
> This JIRA issue serves to hold discussions regarding potential Google Summer 
> of Code 2016 project ideas.  If you have an idea, please reach out!
> Possible areas of focus:
> * ML Algorithms
> * Engine
> * Integration with Spark & PySpark; deeper language integration with Python
> * Others!





[jira] [Created] (SYSTEMML-597) Random Forest Execution Fails

2016-03-25 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-597:


 Summary: Random Forest Execution Fails
 Key: SYSTEMML-597
 URL: https://issues.apache.org/jira/browse/SYSTEMML-597
 Project: SystemML
  Issue Type: Bug
Affects Versions: SystemML 0.9
Reporter: Mike Dusenberry


An issue was raised with running our random forest algorithm on SystemML 0.9 
via MLContext with Scala Spark on a cluster.  The following example runs on a 
local machine (on the bleeding-edge build), but not on the given cluster.

Notice that the error involves {{distinct_values_offset}}.  That variable should 
only be used if {{num_cat_features}} is > 0, but based on line 106 
(https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/random-forest.dml#L106)
 the else branch is taken (because no R file is supplied), and {{num_cat_features}} is set 
to 0.  Also, based on that same line 106 (with no R file), the variable should 
never be initialized either.
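The guard described above can be sketched in plain Python (hypothetical {{build_offsets}} helper, not the actual DML): the offset variable is computed and consumed only under the same {{num_cat_features}} > 0 condition, so the no-categorical-features path never touches it.

```python
# Hypothetical sketch of the guard described above: a variable that is
# only meaningful for categorical features must be both initialized and
# read under the same num_cat_features > 0 condition.
def build_offsets(num_cat_features, distinct_counts):
    if num_cat_features > 0:
        # cumulative offsets into the concatenated distinct-value table
        offsets, total = [], 0
        for c in distinct_counts:
            offsets.append(total)
            total += c
        return offsets
    return None  # no categorical features: must never be read downstream

print(build_offsets(3, [2, 4, 3]))  # [0, 2, 6]
print(build_offsets(0, []))         # None
```

The reported failure suggests the runtime path references the offset variable even when the {{num_cat_features == 0}} branch was taken, i.e. the read is not under the same guard as the initialization.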

Code:
{code}
// Create a SystemML context
import org.apache.sysml.api.MLContext
val ml = new MLContext(sc)

// Generate random data
val X = sc.parallelize(Seq((0.3, 0.1, 0.5), (0.3, 1.0, 0.6), (0.7, 0.8, 1.0), 
(0.3, 0.1, 0.1), (0.5, 0.8, 0.5))).toDF
val Y = sc.parallelize(Seq((1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0), (1, 0, 
0))).toDF

// Register inputs & outputs
ml.reset()
ml.registerInput("X", X)
ml.registerInput("Y_bin", Y)
ml.registerOutput("M")

// Run the script
val nargs = Map("X" -> "", "Y" -> "", "M" -> "")
val outputs = 
ml.execute("/home/biadmin/spark-enablement/installs/SystemML/algorithms/random-forest.dml",
 nargs)
val M = outputs.getDF(sqlContext, "M")
{code}

Output:
{code}
import org.apache.sysml.api.MLContext
ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@677d8e36
X: org.apache.spark.sql.DataFrame = [_1: double, _2: double, _3: double]
Y: org.apache.spark.sql.DataFrame = [_1: int, _2: int, _3: int]
nargs: scala.collection.immutable.Map[String,String] = Map(X -> "", Y -> "", M 
-> "")
org.apache.sysml.runtime.DMLRuntimeException: 
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while 
program block generated from while statement block between lines 263 and 1167 
-- Error evaluating while program block.
at 
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:153)
at 
org.apache.sysml.api.MLContext.executeUsingSimplifiedCompilationChain(MLContext.java:1337)
at 
org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1203)
at 
org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1149)
at org.apache.sysml.api.MLContext.execute(MLContext.java:631)
at org.apache.sysml.api.MLContext.execute(MLContext.java:666)
at org.apache.sysml.api.MLContext.execute(MLContext.java:679)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:79)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:84)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:86)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:88)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:90)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:92)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:94)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:96)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:98)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:100)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:102)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:104)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:106)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:108)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:110)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:112)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:114)
at 

[jira] [Comment Edited] (SYSTEMML-594) Add Jupyter Example Notebook

2016-03-25 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212266#comment-15212266
 ] 

Mike Dusenberry edited comment on SYSTEMML-594 at 3/25/16 7:07 PM:
---

cc [~reinw...@us.ibm.com]


was (Author: mwdus...@us.ibm.com):
cc [~reinw...@us.ibm.com]]

> Add Jupyter Example Notebook
> 
>
> Key: SYSTEMML-594
> URL: https://issues.apache.org/jira/browse/SYSTEMML-594
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>






[jira] [Commented] (SYSTEMML-594) Add Jupyter Example Notebook

2016-03-25 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212266#comment-15212266
 ] 

Mike Dusenberry commented on SYSTEMML-594:
--

cc [~reinw...@us.ibm.com]]

> Add Jupyter Example Notebook
> 
>
> Key: SYSTEMML-594
> URL: https://issues.apache.org/jira/browse/SYSTEMML-594
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>






[jira] [Closed] (SYSTEMML-594) Add Jupyter Example Notebook

2016-03-25 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry closed SYSTEMML-594.


> Add Jupyter Example Notebook
> 
>
> Key: SYSTEMML-594
> URL: https://issues.apache.org/jira/browse/SYSTEMML-594
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>






[jira] [Comment Edited] (SYSTEMML-594) Add Jupyter Example Notebook

2016-03-25 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212262#comment-15212262
 ] 

Mike Dusenberry edited comment on SYSTEMML-594 at 3/25/16 7:05 PM:
---

Guide available [here | 
http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html#jupyter-pyspark-notebook-example---poisson-nonnegative-matrix-factorization].


was (Author: mwdus...@us.ibm.com):
Guide available [here | 
http://mikedusenberry.com/incubator-systemml/spark-mlcontext-programming-guide.html#jupyter-pyspark-notebook-example---poisson-nonnegative-matrix-factorization].

> Add Jupyter Example Notebook
> 
>
> Key: SYSTEMML-594
> URL: https://issues.apache.org/jira/browse/SYSTEMML-594
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-594) Add Jupyter Example Notebook

2016-03-25 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212262#comment-15212262
 ] 

Mike Dusenberry commented on SYSTEMML-594:
--

Guide available [here | 
http://mikedusenberry.com/incubator-systemml/spark-mlcontext-programming-guide.html#jupyter-pyspark-notebook-example---poisson-nonnegative-matrix-factorization].

> Add Jupyter Example Notebook
> 
>
> Key: SYSTEMML-594
> URL: https://issues.apache.org/jira/browse/SYSTEMML-594
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-597) Random Forest Execution Fails

2016-03-25 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212409#comment-15212409
 ] 

Mike Dusenberry commented on SYSTEMML-597:
--

cc [~mboehm7], [~acs_s]

> Random Forest Execution Fails
> -
>
> Key: SYSTEMML-597
> URL: https://issues.apache.org/jira/browse/SYSTEMML-597
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.9
>Reporter: Mike Dusenberry
>
> An issue was raised with running our random forest algorithm on SystemML 0.9 
> via MLContext with Scala Spark on a cluster.  The following example runs on a 
> local machine (on bleeding-edge), but not on the given cluster.
> Notice the error involves {{distinct_values_offset}}.  That variable should 
> only be used if {{num_cat_features}} is > 0, but based on line 106 
> (https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/random-forest.dml#L106)
>  the else clause is used (because no R file), and {{num_cat_features}} is set 
> to 0.  Also, based on the above line 106 (and no R file) that variable should 
> never be initialized either.
> Code:
> {code}
> // Create a SystemML context
> import org.apache.sysml.api.MLContext
> val ml = new MLContext(sc)
> // Generate random data
> val X = sc.parallelize(Seq((0.3, 0.1, 0.5), (0.3, 1.0, 0.6), (0.7, 0.8, 1.0), 
> (0.3, 0.1, 0.1), (0.5, 0.8, 0.5))).toDF
> val Y = sc.parallelize(Seq((1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0), (1, 0, 
> 0))).toDF
> // Register inputs & outputs
> ml.reset()
> ml.registerInput("X", X)
> ml.registerInput("Y_bin", Y)
> ml.registerOutput("M")
> // Run the script
> val nargs = Map("X" -> "", "Y" -> "", "M" -> "")
> val outputs = 
> ml.execute("/home/biadmin/spark-enablement/installs/SystemML/algorithms/random-forest.dml",
>  nargs)
> val M = outputs.getDF(sqlContext, "M")
> {code}
> Output:
> {code}
> import org.apache.sysml.api.MLContext
> ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@677d8e36
> X: org.apache.spark.sql.DataFrame = [_1: double, _2: double, _3: double]
> Y: org.apache.spark.sql.DataFrame = [_1: int, _2: int, _3: int]
> nargs: scala.collection.immutable.Map[String,String] = Map(X -> "", Y -> "", 
> M -> "")
> org.apache.sysml.runtime.DMLRuntimeException: 
> org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while 
> program block generated from while statement block between lines 263 and 1167 
> -- Error evaluating while program block.
>   at 
> org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:153)
>   at 
> org.apache.sysml.api.MLContext.executeUsingSimplifiedCompilationChain(MLContext.java:1337)
>   at 
> org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1203)
>   at 
> org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1149)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:631)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:666)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:679)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:79)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:84)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:86)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:88)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:90)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:92)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:94)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:96)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:98)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:100)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:102)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:104)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:106)
>   at 
> 

[jira] [Resolved] (SYSTEMML-577) Add High-Level "executeScript" API to Python MLContext

2016-03-24 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry resolved SYSTEMML-577.
--
Resolution: Fixed

> Add High-Level "executeScript" API to Python MLContext
> --
>
> Key: SYSTEMML-577
> URL: https://issues.apache.org/jira/browse/SYSTEMML-577
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>
> This adds the {{executeScript(...)}} function to the Python MLContext API, 
> and in the process hides the need to use {{registerInput(...)}} and 
> {{registerOutput(...)}} by allowing the user to pass in a dictionary of 
> key:value inputs of any type, and an array of outputs to keep.
> Example:
> {code}
> pnmf = """ // script here """
> outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
> ["W", "H", "negloglik"])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (SYSTEMML-577) Add High-Level "executeScript" API to Python MLContext

2016-03-24 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry closed SYSTEMML-577.


> Add High-Level "executeScript" API to Python MLContext
> --
>
> Key: SYSTEMML-577
> URL: https://issues.apache.org/jira/browse/SYSTEMML-577
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>
> This adds the {{executeScript(...)}} function to the Python MLContext API, 
> and in the process hides the need to use {{registerInput(...)}} and 
> {{registerOutput(...)}} by allowing the user to pass in a dictionary of 
> key:value inputs of any type, and an array of outputs to keep.
> Example:
> {code}
> pnmf = """ // script here """
> outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
> ["W", "H", "negloglik"])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-577) Add High-Level "executeScript" API to Python MLContext

2016-03-24 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211104#comment-15211104
 ] 

Mike Dusenberry commented on SYSTEMML-577:
--

[PR 91 | https://github.com/apache/incubator-systemml/pull/91] submitted.

> Add High-Level "executeScript" API to Python MLContext
> --
>
> Key: SYSTEMML-577
> URL: https://issues.apache.org/jira/browse/SYSTEMML-577
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>
> This adds the {{executeScript(...)}} function to the Python MLContext API, 
> and in the process hides the need to use {{registerInput(...)}} and 
> {{registerOutput(...)}} by allowing the user to pass in a dictionary of 
> key:value inputs of any type, and an array of outputs to keep.
> Example:
> {code}
> pnmf = """ // script here """
> outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
> ["W", "H", "negloglik"])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-577) Add High-Level "executeScript" API to Python MLContext

2016-03-24 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211105#comment-15211105
 ] 

Mike Dusenberry commented on SYSTEMML-577:
--

Merged via [commit c26a3b030ea5837f5d809a08fa8b4d13c90a4617 | 
https://github.com/apache/incubator-systemml/commit/c26a3b030ea5837f5d809a08fa8b4d13c90a4617].

> Add High-Level "executeScript" API to Python MLContext
> --
>
> Key: SYSTEMML-577
> URL: https://issues.apache.org/jira/browse/SYSTEMML-577
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>
> This adds the {{executeScript(...)}} function to the Python MLContext API, 
> and in the process hides the need to use {{registerInput(...)}} and 
> {{registerOutput(...)}} by allowing the user to pass in a dictionary of 
> key:value inputs of any type, and an array of outputs to keep.
> Example:
> {code}
> pnmf = """ // script here """
> outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
> ["W", "H", "negloglik"])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (SYSTEMML-509) Add "executeScript" To The Python MLContext API

2016-03-24 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry resolved SYSTEMML-509.
--
Resolution: Fixed
  Assignee: Mike Dusenberry

> Add "executeScript" To The Python MLContext API 
> 
>
> Key: SYSTEMML-509
> URL: https://issues.apache.org/jira/browse/SYSTEMML-509
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (SYSTEMML-510) Generalized wdivmm w/ eps all patterns

2016-03-24 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry reassigned SYSTEMML-510:


Assignee: Glenn Weidner

> Generalized wdivmm w/ eps all patterns
> --
>
> Key: SYSTEMML-510
> URL: https://issues.apache.org/jira/browse/SYSTEMML-510
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler, Parser, Runtime
>Reporter: Mike Dusenberry
>Assignee: Glenn Weidner
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> If we look at the inner loop of Poisson nonnegative matrix factorization 
> (PNMF) in general, we update the factors as 
> {code}
> H = (H * (t(W) %*% (V/(W%*%H + 1e-17))))/t(colSums(W))
> W = (W * ((V/(W%*%H + 1e-17)) %*% t(H)))/t(rowSums(H))
> {code}.
> Notice the addition of the "1e-17" epsilon term in the denominators.  
> Mathematically, we need this in case any cell of {code}W%*%H{code} evaluates 
> to zero so that we can avoid dividing by zero.  R needs this, but SystemML 
> technically does not due to a fused operator, "wdivmm", that takes care of 
> these situations (or this may be done in the general case?).  This fused 
> operator is currently applied to the pattern {code}t(W) %*% (V / (W %*% 
> H)){code}, amongst other similar patterns.  Ideally, this would easily apply 
> to {code}t(W) %*% (V/(W%*%H + 1e-17)){code}, regardless of the unneeded 
> epsilon term.  Currently, the addition of the epsilon term causes the 
> algorithm to run in non-linear time (quadratic or exponential).  Initially, the 
> behavior pointed towards the possibility of the optimizer avoiding the 
> rewrite to the fused operator, resulting in naive computation, and non-linear 
> growth in training time.  Further exploration seems to show that the rewrite 
> is indeed still being applied, but there seems to also be a recursive nesting 
> of the same rewrite over various regions of the above statements that is not 
> found when the epsilon term is removed.
> The following is the full PNMF DML script used:
> {code}
> V = read($X)
> max_iteration = $maxiter
> rank = $rank
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> loglik0 = sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H))
> i=0
> while(i < max_iteration) {
>   # Addition of epsilon (1e-17) term causes script to run in non-linear time:
>   H = (H * (t(W) %*% (V/(W%*%H + 1e-17))))/t(colSums(W))
>   W = (W * ((V/(W%*%H + 1e-17)) %*% t(H)))/t(rowSums(H))
>   # Removal of epsilon works correctly:
>   #H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   #W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> loglik = sum(V*log(W%*%H+1e-17)) - as.scalar(colSums(W)%*%rowSums(H))
> print("pnmf: " + loglik0 + " -> " + loglik)
> {code}
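For readers less familiar with DML, the multiplicative PNMF updates in the script above can be sketched in plain NumPy. This is an illustrative sketch only (the {{pnmf}} function name, defaults, and seeding are hypothetical, and it has none of SystemML's fused-operator optimizations that this issue is about):

```python
import numpy as np

def pnmf(V, rank, max_iter=100, eps=1e-17, seed=0):
    """Poisson NMF via multiplicative updates, mirroring the DML script."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.0, 0.01, size=(n, rank))
    H = rng.uniform(0.0, 0.01, size=(rank, m))
    for _ in range(max_iter):
        # H update: t(W) %*% (V/(W%*%H + eps)), scaled by t(colSums(W))
        H = (H * (W.T @ (V / (W @ H + eps)))) / W.sum(axis=0)[:, None]
        # W update: (V/(W%*%H + eps)) %*% t(H), scaled by t(rowSums(H))
        W = (W * ((V / (W @ H + eps)) @ H.T)) / H.sum(axis=1)[None, :]
    return W, H
```

The epsilon term guards the elementwise division when a cell of {{W %*% H}} is zero, which is exactly the case the fused wdivmm operator handles internally without the explicit epsilon.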



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-03-24 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-512:
-
Issue Type: Sub-task  (was: Bug)
Parent: SYSTEMML-588

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
> Attachments: test1.scala, test2.scala
>
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
> H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java out of memory heap error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1212)
>   at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1103)
>   at 
> org.apache.sysml.runtime.instructions.cp.MatrixMatrixArithmeticCPInstruction.processInstruction(MatrixMatrixArithmeticCPInstruction.java:60)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>   at 
> org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:183)
>   at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
>   at 
> org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>   at 
> 

[jira] [Updated] (SYSTEMML-590) Improve Namespace Handling for UDFs

2016-03-24 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-590:
-
Summary: Improve Namespace Handling for UDFs  (was: Assume Parent's 
Namespace for Nested UDF calls.)

> Improve Namespace Handling for UDFs
> ---
>
> Key: SYSTEMML-590
> URL: https://issues.apache.org/jira/browse/SYSTEMML-590
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>
> Currently, if a UDF body involves calling another UDF, the default global 
> namespace is assumed, unless a namespace is explicitly indicated.  This 
> becomes a problem when a file contains UDFs, and is then sourced from another 
> script.
> Imagine a file {{funcs.dml}} as follows:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Then, let's try to call {{f}}:
> {code}
> script = """
> source ("funcs.dml") as funcs
> ans = funcs::f(3, 1)
> print(ans)
> """
> ml.reset()
> ml.executeScript(script)
> {code}
> This results in an error since {{f}} is in the {{funcs}} namespace, but the 
> call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
> user intends to use the {{g}} that is located in the same file.
> Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
> assume that {{f}} and {{g}} are in a {{funcs}} namespace:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = funcs::g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Instead, it would be better to simply first look for {{g}} in its parent's 
> namespace.  In this case, the "parent" would be the function {{f}}, and the 
> namespace we have selected is {{funcs}}, although that choice would be left 
> up to the end-user.  Then, namespace assumptions would not be necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (SYSTEMML-594) Add Jupyter Example Notebook

2016-03-24 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-594:


 Summary: Add Jupyter Example Notebook
 Key: SYSTEMML-594
 URL: https://issues.apache.org/jira/browse/SYSTEMML-594
 Project: SystemML
  Issue Type: New Feature
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-594) Add Jupyter Example Notebook

2016-03-24 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211153#comment-15211153
 ] 

Mike Dusenberry commented on SYSTEMML-594:
--

[PR 97 | https://github.com/apache/incubator-systemml/pull/97] submitted.

> Add Jupyter Example Notebook
> 
>
> Key: SYSTEMML-594
> URL: https://issues.apache.org/jira/browse/SYSTEMML-594
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (SYSTEMML-594) Add Jupyter Example Notebook

2016-03-25 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry resolved SYSTEMML-594.
--
Resolution: Fixed

Merged via [commit 32962fea3b81da19e755b17107d94b007fb0f6c5 | 
https://github.com/apache/incubator-systemml/commit/32962fea3b81da19e755b17107d94b007fb0f6c5].

> Add Jupyter Example Notebook
> 
>
> Key: SYSTEMML-594
> URL: https://issues.apache.org/jira/browse/SYSTEMML-594
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-543) Refactor MLContext in Scala

2016-03-03 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179337#comment-15179337
 ] 

Mike Dusenberry commented on SYSTEMML-543:
--

[~tommy_cug] Thanks for reaching out!  I haven't started on this, so please 
feel free to work on it.  However, I think that the redesign will rely on what 
[~deron] is working on with SYSTEMML-544, so please coordinate with him! :)

> Refactor MLContext in Scala
> ---
>
> Key: SYSTEMML-543
> URL: https://issues.apache.org/jira/browse/SYSTEMML-543
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
>
> Our {{MLContext}} API relies on a myriad of optional parameters as 
> conveniences for end-users, which has led to our Java implementation growing 
> in size.  Moving to Scala will allow us to use default parameters and 
> continue to expand the capabilities of the API in a clean way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-597) Random Forest Execution Fails

2016-03-28 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214390#comment-15214390
 ] 

Mike Dusenberry commented on SYSTEMML-597:
--

Confirmed locally and on a cluster to be an issue on 0.9, but not on the 
current head.

Thanks for the explanation as well.

> Random Forest Execution Fails
> -
>
> Key: SYSTEMML-597
> URL: https://issues.apache.org/jira/browse/SYSTEMML-597
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.9
>Reporter: Mike Dusenberry
>
> An issue was raised with running our random forest algorithm on SystemML 0.9 
> via MLContext with Scala Spark on a cluster.  The following example runs on a 
> local machine (on bleeding-edge), but not on the given cluster.
> Notice the error involves {{distinct_values_offset}}.  That variable should 
> only be used if {{num_cat_features}} is > 0, but based on line 106 
> (https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/random-forest.dml#L106)
>  the else clause is used (because no R file), and {{num_cat_features}} is set 
> to 0.  Also, based on the above line 106 (and no R file) that variable should 
> never be initialized either.
> Code:
> {code}
> // Create a SystemML context
> import org.apache.sysml.api.MLContext
> val ml = new MLContext(sc)
> // Generate random data
> val X = sc.parallelize(Seq((0.3, 0.1, 0.5), (0.3, 1.0, 0.6), (0.7, 0.8, 1.0), 
> (0.3, 0.1, 0.1), (0.5, 0.8, 0.5))).toDF
> val Y = sc.parallelize(Seq((1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0), (1, 0, 
> 0))).toDF
> // Register inputs & outputs
> ml.reset()
> ml.registerInput("X", X)
> ml.registerInput("Y_bin", Y)
> ml.registerOutput("M")
> // Run the script
> val nargs = Map("X" -> "", "Y" -> "", "M" -> "")
> val outputs = 
> ml.execute("/home/biadmin/spark-enablement/installs/SystemML/algorithms/random-forest.dml",
>  nargs)
> val M = outputs.getDF(sqlContext, "M")
> {code}
> Output:
> {code}
> import org.apache.sysml.api.MLContext
> ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@677d8e36
> X: org.apache.spark.sql.DataFrame = [_1: double, _2: double, _3: double]
> Y: org.apache.spark.sql.DataFrame = [_1: int, _2: int, _3: int]
> nargs: scala.collection.immutable.Map[String,String] = Map(X -> "", Y -> "", 
> M -> "")
> org.apache.sysml.runtime.DMLRuntimeException: 
> org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while 
> program block generated from while statement block between lines 263 and 1167 
> -- Error evaluating while program block.
>   at 
> org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:153)
>   at 
> org.apache.sysml.api.MLContext.executeUsingSimplifiedCompilationChain(MLContext.java:1337)
>   at 
> org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1203)
>   at 
> org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1149)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:631)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:666)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:679)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:79)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:84)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:86)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:88)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:90)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:92)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:94)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:96)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:98)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:100)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:102)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:104)
>   at 
> 

[jira] [Closed] (SYSTEMML-597) Random Forest Execution Fails

2016-03-28 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry closed SYSTEMML-597.


> Random Forest Execution Fails
> -
>
> Key: SYSTEMML-597
> URL: https://issues.apache.org/jira/browse/SYSTEMML-597
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.9
>Reporter: Mike Dusenberry
>Assignee: Arvind Surve
> Fix For: SystemML 0.10
>
>
> An issue was raised with running our random forest algorithm on SystemML 0.9 
> via MLContext with Scala Spark on a cluster.  The following example runs on a 
> local machine (on bleeding-edge), but not on the given cluster.
> Notice the error involves {{distinct_values_offset}}.  That variable should 
> only be used if {{num_cat_features}} is > 0, but based on line 106 
> (https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/random-forest.dml#L106)
>  the else clause is used (because no R file), and {{num_cat_features}} is set 
> to 0.  Also, based on the above line 106 (and no R file) that variable should 
> never be initialized either.
> Code:
> {code}
> // Create a SystemML context
> import org.apache.sysml.api.MLContext
> val ml = new MLContext(sc)
> // Generate random data
> val X = sc.parallelize(Seq((0.3, 0.1, 0.5), (0.3, 1.0, 0.6), (0.7, 0.8, 1.0), 
> (0.3, 0.1, 0.1), (0.5, 0.8, 0.5))).toDF
> val Y = sc.parallelize(Seq((1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0), (1, 0, 
> 0))).toDF
> // Register inputs & outputs
> ml.reset()
> ml.registerInput("X", X)
> ml.registerInput("Y_bin", Y)
> ml.registerOutput("M")
> // Run the script
> val nargs = Map("X" -> "", "Y" -> "", "M" -> "")
> val outputs = 
> ml.execute("/home/biadmin/spark-enablement/installs/SystemML/algorithms/random-forest.dml",
>  nargs)
> val M = outputs.getDF(sqlContext, "M")
> {code}
> Output:
> {code}
> import org.apache.sysml.api.MLContext
> ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@677d8e36
> X: org.apache.spark.sql.DataFrame = [_1: double, _2: double, _3: double]
> Y: org.apache.spark.sql.DataFrame = [_1: int, _2: int, _3: int]
> nargs: scala.collection.immutable.Map[String,String] = Map(X -> "", Y -> "", 
> M -> "")
> org.apache.sysml.runtime.DMLRuntimeException: 
> org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while 
> program block generated from while statement block between lines 263 and 1167 
> -- Error evaluating while program block.
>   at 
> org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:153)
>   at 
> org.apache.sysml.api.MLContext.executeUsingSimplifiedCompilationChain(MLContext.java:1337)
>   at 
> org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1203)
>   at 
> org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1149)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:631)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:666)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:679)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:79)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:84)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:86)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:88)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:90)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:92)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:94)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:96)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:98)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:100)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:102)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:104)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:106)
>   at 
> 

[jira] [Resolved] (SYSTEMML-597) Random Forest Execution Fails

2016-03-28 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry resolved SYSTEMML-597.
--
   Resolution: Fixed
Fix Version/s: SystemML 0.10

Fixed in SYSTEMML-576.

> Random Forest Execution Fails
> -
>
> Key: SYSTEMML-597
> URL: https://issues.apache.org/jira/browse/SYSTEMML-597
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.9
>Reporter: Mike Dusenberry
>Assignee: Arvind Surve
> Fix For: SystemML 0.10
>
>
> An issue was raised with running our random forest algorithm on SystemML 0.9 
> via MLContext with Scala Spark on a cluster.  The following example runs on a 
> local machine (on bleeding-edge), but not on the given cluster.
> Notice the error involves {{distinct_values_offset}}.  That variable should 
> only be used if `num_cat_features` is > 0, but based on line 106 
> (https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/random-forest.dml#L106)
>  the else clause is used (because no R file), and {{num_cat_features}} is set 
> to 0.  Also, based on the above line 106 (and no R file) that variable should 
> never be initialized either.
> Code:
> {code}
> // Create a SystemML context
> import org.apache.sysml.api.MLContext
> val ml = new MLContext(sc)
> // Generate random data
> val X = sc.parallelize(Seq((0.3, 0.1, 0.5), (0.3, 1.0, 0.6), (0.7, 0.8, 1.0), 
> (0.3, 0.1, 0.1), (0.5, 0.8, 0.5))).toDF
> val Y = sc.parallelize(Seq((1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0), (1, 0, 
> 0))).toDF
> // Register inputs & outputs
> ml.reset()
> ml.registerInput("X", X)
> ml.registerInput("Y_bin", Y)
> ml.registerOutput("M")
> // Run the script
> val nargs = Map("X" -> "", "Y" -> "", "M" -> "")
> val outputs = 
> ml.execute("/home/biadmin/spark-enablement/installs/SystemML/algorithms/random-forest.dml",
>  nargs)
> val M = outputs.getDF(sqlContext, "M")
> {code}
> Output:
> {code}
> import org.apache.sysml.api.MLContext
> ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@677d8e36
> X: org.apache.spark.sql.DataFrame = [_1: double, _2: double, _3: double]
> Y: org.apache.spark.sql.DataFrame = [_1: int, _2: int, _3: int]
> nargs: scala.collection.immutable.Map[String,String] = Map(X -> "", Y -> "", 
> M -> "")
> org.apache.sysml.runtime.DMLRuntimeException: 
> org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while 
> program block generated from while statement block between lines 263 and 1167 
> -- Error evaluating while program block.
>   at 
> org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:153)
>   at 
> org.apache.sysml.api.MLContext.executeUsingSimplifiedCompilationChain(MLContext.java:1337)
>   at 
> org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1203)
>   at 
> org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1149)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:631)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:666)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:679)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:79)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:84)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:86)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:88)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:90)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:92)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:94)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:96)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:98)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:100)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:102)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:104)
>   at 
> 

[jira] [Commented] (SYSTEMML-560) Distributed frame representation

2016-03-28 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214447#comment-15214447
 ] 

Mike Dusenberry commented on SYSTEMML-560:
--

{quote}
Seamless integration: First, we aim for a seamless integration with (1) Spark's 
DataFrame and DataSet representations, (2) csv text formats, and (3) SystemML's 
binary block matrix representations.
{quote}

This is great, especially given Spark's push away from RDDs and to DataFrames 
(and now DataSets).  In addition to being seamless, will the conversion from 
DataFrames/DataSets to binary block matrices also be performant?  It is natural 
to use DataFrames/DataSets with the MLContext API for matrices (and the new 
frame format), so it would be great to improve that conversion performance, 
both for inputs from DataFrames/DataSets, and outputs back to 
DataFrames/DataSets.

> Distributed frame representation
> 
>
> Key: SYSTEMML-560
> URL: https://issues.apache.org/jira/browse/SYSTEMML-560
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Arvind Surve
>
> The major design goals for our distributed binary block frame representation 
> are twofold:
> * Seamless integration: First, we aim for a seamless integration with (1) 
> Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
> SystemML's binary block matrix representations.
> * Memory efficiency: Second, we are still interested in a block 
> representation to exploit the column-wise native array storage of 
> FrameBlocks.  
> As a good compromise with regard to both design goals, the initial design 
> proposal is 
> {code}
> FRAME := JavaPairRDD<Long, FrameBlock> 
> {code}
> where the keys represent the row offsets of frameblock values, a frameblock 
> value covers one or multiple rows and all columns of the frame, and most 
> importantly frameblock values do not exhibit a fixed block size. NOTE that in 
> comparison to Spark's data frames, SystemML's frames are row-indexed (not a 
> set of rows) in order to allow well-defined indexing operations over frames 
> (as possible in R).  
> This representation would allow a shuffle-free conversion from DataFrames, 
> DataSets, CSV to SystemML's Frames and vice versa while still exploiting a 
> block structure whenever possible (moderate numbers of columns). Similarly, 
> binary block matrix to frame conversions can also be done without shuffle in 
> the common case ncol <= blocksize (default 1k). Finally, this representation 
> also seems to be advantageous with regard to the common frame operations of 
> transform, transform apply, indexing, append, and transform decode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
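[Editor's sketch] The row-indexed frame blocking proposed above can be illustrated with a minimal plain-Python sketch (illustrative only, not SystemML code; `to_frame_blocks` and its signature are hypothetical). Each block covers a contiguous run of rows and all columns, stores its data column-wise, and is keyed by the offset of its first row, which is what makes the DataFrame-to-frame conversion shuffle-free:

```python
from typing import List, Tuple

Row = Tuple  # one row of the frame (all columns)

def to_frame_blocks(rows: List[Row], blocksize: int = 1000) -> List[Tuple[int, List[list]]]:
    """Partition rows into (1-based row offset, column-wise block) pairs.

    Each block holds one array per column (column-wise native array
    storage), and blocks are keyed by the offset of their first row, so a
    row-partitioned input can be blocked without any shuffle.
    """
    blocks = []
    for start in range(0, len(rows), blocksize):
        chunk = rows[start:start + blocksize]
        # transpose the chunk: one array per column
        columns = [list(col) for col in zip(*chunk)]
        blocks.append((start + 1, columns))  # keys are 1-based row offsets
    return blocks

rows = [(i, float(i), "r%d" % i) for i in range(5)]
blocks = to_frame_blocks(rows, blocksize=2)
# 5 rows with blocksize 2 -> blocks keyed at row offsets 1, 3, 5
print([k for k, _ in blocks])  # [1, 3, 5]
```

Note the final block is allowed to be smaller than `blocksize`, matching the "no fixed block size" property of the proposal.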


[jira] [Updated] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-26 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-652:
-
Description: 
I've found a bug in which assigning the result of a DML function to a portion 
of a matrix with left-indexing results in the left-hand matrix being reduced in 
size dimensionally. This bug was encountered while working on the deep learning 
DML library, and the following simplified example aims to provide a simple, 
reproducible example.

Given the following code,
{code}
forward = function(matrix[double] X) return (matrix[double] out) {
  out = 1 / (1 + exp(-X))
}
N = 3
M = 5
X = rand(rows=N, cols=M)
X[,1:2] = forward(X[,1:2])
print("X1: " + nrow(X) + "x" + ncol(X))

X = rand(rows=N, cols=M)
temp = forward(X[,1:2])
X[,1:2] = temp
if(1==1){}
print("X2: " + nrow(X) + "x" + ncol(X))

if(1==1){}
print("")
{code},

notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
case, {{X}} is truncated to a {{3x2}} matrix.

{code}
X1: 3x2
X2: 3x5
{code}

Note: The {{if(1==1){}}} statements are included because otherwise the print 
statements are outputted out of order.

  was:
I've found a bug in which assigning the result of a DML function to a portion 
of a matrix with left-indexing results in the left-hand matrix being reduced in 
size dimensionally. This bug was encountered while working on the deep learning 
DML library, and the following simplified example aims to provide a simple, 
reproducible example.

Given the following code,
{code}
forward = function(matrix[double] X) return (matrix[double] out) {
  out = 1 / (1 + exp(-X))
}
N = 3
M = 5
X = rand(rows=N, cols=M)
X[,1:2] = forward(X[,1:2])
print("X1: " + nrow(X) + "x" + ncol(X))

X = rand(rows=N, cols=M)
temp = forward(X[,1:2])
X[,1:2] = temp
if(1==1){}
print("X2: " + nrow(X) + "x" + ncol(X))

if(1==1){}
print("")
{code},

notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
case, {{X}} is truncated to a {{3x2}} matrix.

{code}
X1: 3x2
X2: 3x5
{code}

Note: The {{if(1==1)}} statements are included because otherwise the print 
statements are outputted out of order.


> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing results in the left-hand matrix being reduced 
> in size dimensionally. This bug was encountered while working on the deep 
> learning DML library, and the following simplified example aims to provide a 
> simple, reproducible example.
> Given the following code,
> {code}
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> N = 3
> M = 5
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> if(1==1){}
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code},
> notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
> case, {{X}} is truncated to a {{3x2}} matrix.
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are outputted out of order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
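[Editor's sketch] The expected semantics of the DML example above can be checked against a pure-Python analogue (illustrative only, not DML): assigning the result of a function to the first two columns must leave `X` at its full 3x5 shape, whereas the reported bug truncated `X` to 3x2 in the fused `X[,1:2] = forward(X[,1:2])` case.

```python
import math
import random

def forward(X):
    """Sigmoid, as in the DML forward(): 1 / (1 + exp(-X))."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in X]

N, M = 3, 5
X = [[random.random() for _ in range(M)] for _ in range(N)]

# analogue of the left-indexed assignment X[,1:2] = forward(X[,1:2])
sub = forward([row[0:2] for row in X])
for row, new in zip(X, sub):
    row[0:2] = new  # writes only the indexed columns; shape is untouched

print("X1: %dx%d" % (len(X), len(X[0])))  # X1: 3x5 -- shape unchanged
```

Only the indexed region is overwritten; the matrix dimensions never depend on the shape of the right-hand side beyond that region.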


[jira] [Updated] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-26 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-652:
-
Description: 
I've found a bug in which assigning the result of a DML function to a portion 
of a matrix with left-indexing results in the left-hand matrix being reduced in 
size dimensionally. This bug was encountered while working on the deep learning 
DML library, and the following simplified example aims to provide a simple, 
reproducible example.

Given the following code,
{code}
forward = function(matrix[double] X) return (matrix[double] out) {
  out = 1 / (1 + exp(-X))
}
N = 3
M = 5
X = rand(rows=N, cols=M)
X[,1:2] = forward(X[,1:2])
print("X1: " + nrow(X) + "x" + ncol(X))

X = rand(rows=N, cols=M)
temp = forward(X[,1:2])
X[,1:2] = temp
if(1==1){}
print("X2: " + nrow(X) + "x" + ncol(X))

if(1==1){}
print("")
{code},

notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
case, {{X}} is truncated to a {{3x2}} matrix.

{code}
X1: 3x2
X2: 3x5
{code}

[Note: The {{if(1==1){}}} statements are included because otherwise the print 
statements are outputted out of order.]

  was:
I've found a bug in which assigning the result of a DML function to a portion 
of a matrix with left-indexing results in the left-hand matrix being reduced in 
size dimensionally. This bug was encountered while working on the deep learning 
DML library, and the following simplified example aims to provide a simple, 
reproducible example.

Given the following code,
{code}
forward = function(matrix[double] X) return (matrix[double] out) {
  out = 1 / (1 + exp(-X))
}
N = 3
M = 5
X = rand(rows=N, cols=M)
X[,1:2] = forward(X[,1:2])
print("X1: " + nrow(X) + "x" + ncol(X))

X = rand(rows=N, cols=M)
temp = forward(X[,1:2])
X[,1:2] = temp
if(1==1){}
print("X2: " + nrow(X) + "x" + ncol(X))

if(1==1){}
print("")
{code},

notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
case, {{X}} is truncated to a {{3x2}} matrix.

{code}
X1: 3x2
X2: 3x5
{code}


> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing results in the left-hand matrix being reduced 
> in size dimensionally. This bug was encountered while working on the deep 
> learning DML library, and the following simplified example aims to provide a 
> simple, reproducible example.
> Given the following code,
> {code}
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> N = 3
> M = 5
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> if(1==1){}
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code},
> notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
> case, {{X}} is truncated to a {{3x2}} matrix.
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> [Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are outputted out of order.]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-27 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-652:
-
Affects Version/s: SystemML 0.9

> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Affects Versions: SystemML 0.9, SystemML 0.10
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing results in the left-hand matrix being reduced 
> in size dimensionally. This bug was encountered while working on the deep 
> learning DML library, and the following simplified example aims to provide a 
> simple, reproducible example.
> Given the following code,
> {code}
> N = 3
> M = 5
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code}
> , notice that {{X}} should always be a {{3x5}} matrix, as both cases are 
> equivalent.  However, in the first case, {{X}} is truncated to a {{3x2}} 
> matrix:
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are executed out of order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-27 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260484#comment-15260484
 ] 

Mike Dusenberry edited comment on SYSTEMML-652 at 4/27/16 5:04 PM:
---

[~mboehm7] Yeah I encountered this with the latest master yesterday.  Thanks 
for looking into it!

I definitely think we need to allow for function calls in expressions, and I 
think we also need to place more focus on the integration of functions with the 
rest of the optimizer - i.e. to avoid bugs such as this one, as well as to 
apply valid rewrites such as the update-in-place in SYSTEMML-633 to functions.  
Otherwise, it makes it quite difficult to build libraries that have any 
reasonable performance, and I think that hurts the project and the idea of 
declarative ML overall.


was (Author: mwdus...@us.ibm.com):
[~mboehm7], yeah I encountered this with the latest master yesterday.  Glad 
that the problem was determined.

I definitely think we need to allow for function calls in expressions, and I 
think we also need to place more focus on the integration of functions with the 
rest of the optimizer - i.e. to avoid bugs such as this one, as well as to 
apply valid rewrites such as the update-in-place in SYSTEMML-633 to functions.  
Otherwise, it makes it quite difficult to build libraries that have any 
reasonable performance, and I think that hurts the project and the idea of 
declarative ML overall.

> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Affects Versions: SystemML 0.9, SystemML 0.10
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing results in the left-hand matrix being reduced 
> in size dimensionally. This bug was encountered while working on the deep 
> learning DML library, and the following simplified example aims to provide a 
> simple, reproducible example.
> Given the following code,
> {code}
> N = 3
> M = 5
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code}
> , notice that {{X}} should always be a {{3x5}} matrix, as both cases are 
> equivalent.  However, in the first case, {{X}} is truncated to a {{3x2}} 
> matrix:
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are executed out of order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
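[Editor's sketch] The update-in-place rewrite referenced in the comment above (SYSTEMML-633) can be illustrated with a small plain-Python sketch (illustrative only, not SystemML internals; both function names are hypothetical). Without the rewrite, each left-indexed assignment clones the whole matrix before writing, an O(N*M) cost per update; with it, only the indexed region is written into the existing matrix:

```python
def update_copy(X, col, val):
    """Copy-on-write style: clone the whole matrix, then write one column."""
    Y = [row[:] for row in X]  # full O(N*M) copy on every assignment
    for row in Y:
        row[col] = val
    return Y

def update_in_place(X, col, val):
    """Update-in-place style: touch only the indexed column of X itself."""
    for row in X:
        row[col] = val
    return X

N, M = 4, 6
X = [[0.0] * M for _ in range(N)]
Y = update_copy(X, 0, 1.0)      # returns a fresh matrix, X untouched
Z = update_in_place(X, 0, 1.0)  # mutates X, no clone

print(Y is X, Z is X)  # False True
```

Inside a (nested) parfor loop in a UDF, the copy-on-write variant repeats the full clone on every iteration, which is exactly the left-indexing overhead the rewrite is meant to eliminate.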


[jira] [Comment Edited] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-27 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260484#comment-15260484
 ] 

Mike Dusenberry edited comment on SYSTEMML-652 at 4/27/16 5:03 PM:
---

[~mboehm7], yeah I encountered this with the latest master yesterday.  Glad 
that the problem was determined.

I definitely think we need to allow for function calls in expressions, and I 
think we also need to place more focus on the integration of functions with the 
rest of the optimizer - i.e. to avoid bugs such as this one, as well as to 
apply valid rewrites such as the update-in-place in SYSTEMML-633 to functions.  
Otherwise, it makes it quite difficult to build libraries that have any 
reasonable performance, and I think that hurts the project and the idea of 
declarative ML overall.


was (Author: mwdus...@us.ibm.com):
[~mboehm7], yeah I encountered this with the latest master yesterday.  Glad 
that the problem was determined.

I definitely think we need to allow for function calls in expressions, and I 
think we also need to place more focus on the integration of functions with the 
rest of the optimizer - i.e. valid rewrites such as the update-in-place in 
SYSTEMML-633 should be applied to functions.  Otherwise, it makes it quite 
difficult to build libraries that have any reasonable performance, and I think 
that hurts the project and the idea of declarative ML overall.

> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Affects Versions: SystemML 0.9, SystemML 0.10
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing results in the left-hand matrix being reduced 
> in size dimensionally. This bug was encountered while working on the deep 
> learning DML library, and the following simplified example aims to provide a 
> simple, reproducible example.
> Given the following code,
> {code}
> N = 3
> M = 5
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code}
> , notice that {{X}} should always be a {{3x5}} matrix, as both cases are 
> equivalent.  However, in the first case, {{X}} is truncated to a {{3x2}} 
> matrix:
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are executed out of order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-26 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259239#comment-15259239
 ] 

Mike Dusenberry commented on SYSTEMML-652:
--

cc [~mboehm7]

> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing results in the left-hand matrix being reduced 
> in size dimensionally. This bug was encountered while working on the deep 
> learning DML library, and the following simplified example aims to provide a 
> simple, reproducible example.
> Given the following code,
> {code}
> N = 3
> M = 5
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code}
> , notice that {{X}} should always be a {{3x5}} matrix, as both cases are 
> equivalent.  However, in the first case, {{X}} is truncated to a {{3x2}} 
> matrix:
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are executed out of order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-26 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-652:
-
Component/s: Compiler

> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing results in the left-hand matrix being reduced 
> in size dimensionally. This bug was encountered while working on the deep 
> learning DML library, and the following simplified example aims to provide a 
> simple, reproducible example.
> Given the following code,
> {code}
> N = 3
> M = 5
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code}
> , notice that {{X}} should always be a {{3x5}} matrix, as both cases are 
> equivalent.  However, in the first case, {{X}} is truncated to a {{3x2}} 
> matrix:
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are executed out of order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-26 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-652:
-
Description: 
I've found a bug in which assigning the result of a DML function to a portion 
of a matrix with left-indexing results in the left-hand matrix being reduced in 
size dimensionally. This bug was encountered while working on the deep learning 
DML library, and the following simplified example aims to provide a simple, 
reproducible example.

Given the following code,
{code}
forward = function(matrix[double] X) return (matrix[double] out) {
  out = 1 / (1 + exp(-X))
}
N = 3
M = 5
X = rand(rows=N, cols=M)
X[,1:2] = forward(X[,1:2])
print("X1: " + nrow(X) + "x" + ncol(X))

X = rand(rows=N, cols=M)
temp = forward(X[,1:2])
X[,1:2] = temp
if(1==1){}
print("X2: " + nrow(X) + "x" + ncol(X))

if(1==1){}
print("")
{code}

, notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
case, {{X}} is truncated to a {{3x2}} matrix.

{code}
X1: 3x2
X2: 3x5
{code}

Note: The {{if(1==1){}}} statements are included because otherwise the print 
statements are outputted out of order.

  was:
I've found a bug in which assigning the result of a DML function to a portion 
of a matrix with left-indexing results in the left-hand matrix being reduced in 
size dimensionally. This bug was encountered while working on the deep learning 
DML library, and the following simplified example aims to provide a simple, 
reproducible example.

Given the following code,
{code}
forward = function(matrix[double] X) return (matrix[double] out) {
  out = 1 / (1 + exp(-X))
}
N = 3
M = 5
X = rand(rows=N, cols=M)
X[,1:2] = forward(X[,1:2])
print("X1: " + nrow(X) + "x" + ncol(X))

X = rand(rows=N, cols=M)
temp = forward(X[,1:2])
X[,1:2] = temp
if(1==1){}
print("X2: " + nrow(X) + "x" + ncol(X))

if(1==1){}
print("")
{code},

notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
case, {{X}} is truncated to a {{3x2}} matrix.

{code}
X1: 3x2
X2: 3x5
{code}

Note: The {{if(1==1){}}} statements are included because otherwise the print 
statements are outputted out of order.


> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing results in the left-hand matrix being reduced 
> in size dimensionally. This bug was encountered while working on the deep 
> learning DML library, and the following simplified example aims to provide a 
> simple, reproducible example.
> Given the following code,
> {code}
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> N = 3
> M = 5
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> if(1==1){}
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code}
> , notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
> case, {{X}} is truncated to a {{3x2}} matrix.
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are outputted out of order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-26 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-652:
-
Description: 
I've found a bug in which assigning the result of a DML function to a portion 
of a matrix with left-indexing results in the left-hand matrix being reduced in 
size dimensionally. This bug was encountered while working on the deep learning 
DML library, and the following simplified example aims to provide a simple, 
reproducible example.

Given the following code,
{code}
N = 3
M = 5
forward = function(matrix[double] X) return (matrix[double] out) {
  out = 1 / (1 + exp(-X))
}

X = rand(rows=N, cols=M)
X[,1:2] = forward(X[,1:2])
print("X1: " + nrow(X) + "x" + ncol(X))

if(1==1){}

X = rand(rows=N, cols=M)
temp = forward(X[,1:2])
X[,1:2] = temp
print("X2: " + nrow(X) + "x" + ncol(X))

if(1==1){}
print("")
{code}

, notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
case, {{X}} is truncated to a {{3x2}} matrix:

{code}
X1: 3x2
X2: 3x5
{code}

Note: The {{if(1==1){}}} statements are included because otherwise the print 
statements are executed out of order.

  was:
I've found a bug in which assigning the result of a DML function to a portion 
of a matrix with left-indexing results in the left-hand matrix being reduced in 
size dimensionally. This bug was encountered while working on the deep learning 
DML library, and the following simplified example aims to provide a simple, 
reproducible example.

Given the following code,
{code}
N = 3
M = 5
forward = function(matrix[double] X) return (matrix[double] out) {
  out = 1 / (1 + exp(-X))
}

X = rand(rows=N, cols=M)
X[,1:2] = forward(X[,1:2])
print("X1: " + nrow(X) + "x" + ncol(X))

if(1==1){}

X = rand(rows=N, cols=M)
temp = forward(X[,1:2])
X[,1:2] = temp
print("X2: " + nrow(X) + "x" + ncol(X))

if(1==1){}
print("")
{code}

, notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
case, {{X}} is truncated to a {{3x2}} matrix:

{code}
X1: 3x2
X2: 3x5
{code}

Note: The {{if(1==1){}}} statements are included because otherwise the print 
statements are outputted out of order.


> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing results in the left-hand matrix being reduced 
> in size dimensionally. This bug was encountered while working on the deep 
> learning DML library, and the following simplified example aims to provide a 
> simple, reproducible example.
> Given the following code,
> {code}
> N = 3
> M = 5
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code}
> , notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
> case, {{X}} is truncated to a {{3x2}} matrix:
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are executed out of order.





[jira] [Updated] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-26 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-652:
-
Description: 
I've found a bug in which assigning the result of a DML function to a portion 
of a matrix with left-indexing results in the left-hand matrix being reduced in 
size dimensionally. This bug was encountered while working on the deep learning 
DML library, and the following simplified example aims to provide a simple, 
reproducible example.

Given the following code,
{code}
N = 3
M = 5
forward = function(matrix[double] X) return (matrix[double] out) {
  out = 1 / (1 + exp(-X))
}

X = rand(rows=N, cols=M)
X[,1:2] = forward(X[,1:2])
print("X1: " + nrow(X) + "x" + ncol(X))

if(1==1){}

X = rand(rows=N, cols=M)
temp = forward(X[,1:2])
X[,1:2] = temp
print("X2: " + nrow(X) + "x" + ncol(X))

if(1==1){}
print("")
{code}

, notice that {{X}} should always be a {{3x5}} matrix, as both cases are 
equivalent.  However, in the first case, {{X}} is truncated to a {{3x2}} matrix:

{code}
X1: 3x2
X2: 3x5
{code}

Note: The {{if(1==1){}}} statements are included because otherwise the print 
statements are executed out of order.

  was:
I've found a bug in which assigning the result of a DML function to a portion 
of a matrix with left-indexing results in the left-hand matrix being reduced in 
size dimensionally. This bug was encountered while working on the deep learning 
DML library, and the following simplified example aims to provide a simple, 
reproducible example.

Given the following code,
{code}
N = 3
M = 5
forward = function(matrix[double] X) return (matrix[double] out) {
  out = 1 / (1 + exp(-X))
}

X = rand(rows=N, cols=M)
X[,1:2] = forward(X[,1:2])
print("X1: " + nrow(X) + "x" + ncol(X))

if(1==1){}

X = rand(rows=N, cols=M)
temp = forward(X[,1:2])
X[,1:2] = temp
print("X2: " + nrow(X) + "x" + ncol(X))

if(1==1){}
print("")
{code}

, notice that {{X}} should always be a {{3x5}} matrix.  However, in the first 
case, {{X}} is truncated to a {{3x2}} matrix:

{code}
X1: 3x2
X2: 3x5
{code}

Note: The {{if(1==1){}}} statements are included because otherwise the print 
statements are executed out of order.


> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing results in the left-hand matrix being reduced 
> in size dimensionally. This bug was encountered while working on the deep 
> learning DML library, and the following simplified example aims to provide a 
> simple, reproducible example.
> Given the following code,
> {code}
> N = 3
> M = 5
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code}
> , notice that {{X}} should always be a {{3x5}} matrix, as both cases are 
> equivalent.  However, in the first case, {{X}} is truncated to a {{3x2}} 
> matrix:
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are executed out of order.





[jira] [Commented] (SYSTEMML-650) Error while trying to load data as a DataFrame in PySpark

2016-04-26 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258478#comment-15258478
 ] 

Mike Dusenberry commented on SYSTEMML-650:
--

Great, thanks [~kartikkanna...@gmail.com].

> Error while trying to load data as a DataFrame in PySpark
> -
>
> Key: SYSTEMML-650
> URL: https://issues.apache.org/jira/browse/SYSTEMML-650
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.9
> Environment: Cloudera Distribution CDH 5.5.0
> Hadoop 2.6.0
> Spark 1.5.0
> SystemML 0.9.0
> Python 2.7.6
>Reporter: Kartik Kannapur
>  Labels: documentation, newbie
> Fix For: SystemML 0.10
>
>
> I tried to run the sample code for "Jupyter (PySpark) Notebook Example - 
> Poisson Nonnegative Matrix Factorization"  as provided in the documentation.
> The code fails at the line where we try to run the PNMF script on SystemML 
> with Spark:
> {code:xml}
> outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, 
> ["W", "H", "losses"])
> {code}
> The script seems to fail at the first line itself, where *X_train* is passed 
> as a DataFrame into the variable *X*.
> The error message is as below:
> {code:xml}
> /tmp/spark-e7974be5-4438-44b2-ae83-574b2c2bad21/userFiles-5a3c99c5-9bb7-46fe-af83-5119f9358e0f/SystemML.py
>  in executeScript(self, dmlScript, nargs, outputs, configFilePath)
> 126 
> 127 # Execute script
> --> 128 jml_out = self.ml.executeScript(dmlScript, nargs, 
> configFilePath)
> 129 ml_out = MLOutput(jml_out, self.sc)
> 130 return ml_out
> /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
> 536 answer = self.gateway_client.send_command(command)
> 537 return_value = get_return_value(answer, self.gateway_client,
> --> 538 self.target_id, self.name)
> 539 
> 540 for temp_arg in temp_args:
> /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/pyspark/sql/utils.pyc
>  in deco(*a, **kw)
>  34 def deco(*a, **kw):
>  35 try:
> ---> 36 return f(*a, **kw)
>  37 except py4j.protocol.Py4JJavaError as e:
>  38 s = e.java_exception.toString()
> /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py
>  in get_return_value(answer, gateway_client, target_id, name)
> 302 raise Py4JError(
> 303 'An error occurred while calling {0}{1}{2}. 
> Trace:\n{3}\n'.
> --> 304 format(target_id, '.', name, value))
> 305 else:
> 306 raise Py4JError(
> Py4JError: An error occurred while calling o79.executeScript. Trace:
> py4j.Py4JException: Method executeScript([class java.lang.String, class 
> java.util.HashMap, null]) does not exist
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>   at py4j.Gateway.invoke(Gateway.java:252)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Is there any workaround for this?





[jira] [Updated] (SYSTEMML-654) DML Functions Should Override Builtin Functions

2016-04-29 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-654:
-
Description: 
Currently, if a user defines a DML-bodied function that has the same name as a 
builtin function, an error will be returned.  This occurs both if the function 
is defined in the same file as it is being called (which could look like a 
builtin function call, although the user does not wish it to be), or if the 
function is defined in a separate file and called with a namespace notation.  
As we grow the language with an increasing number of builtin functions, this is 
not the desired behavior.  Instead, any DML functions should override any 
builtin functions.

Example 1:
{code}
min = function(int i) {
  print("hi" + i)
}
tmp = min(1)  # fail!
{code}
{code}
: org.apache.sysml.parser.LanguageException: Unsupported Parameters : ERROR: 
null -- line 6, column 0 -- Expecting matrix parameter for function MIN
at 
org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:421)
at 
org.apache.sysml.parser.BuiltinFunctionExpression.checkMatrixParam(BuiltinFunctionExpression.java:1221)
at 
org.apache.sysml.parser.BuiltinFunctionExpression.validateExpression(BuiltinFunctionExpression.java:314)
at 
org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:598)
at 
org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:136)
at 
org.apache.sysml.api.MLContext.executeUsingSimplifiedCompilationChain(MLContext.java:1325)
at 
org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1227)
at org.apache.sysml.api.MLContext.executeScript(MLContext.java:1165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
{code}

Example 2:
{code}
# util.dml
min = function(int i) {
  print("hi" + i)
}
{code}
{code}
source("util.dml") as util
tmp = util::min(1)  # fail!
{code}
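The requested shadowing behavior is what most general-purpose languages already provide. In Python, for instance, a user definition simply overrides the builtin in its scope while the builtin remains reachable explicitly (a hypothetical analogue for comparison, not DML):

```python
import builtins

def min(i):                    # user definition shadows the builtin `min` here
    return "hi" + str(i)

print(min(1))                  # resolves to the user-defined function
print(builtins.min([3, 1]))    # the original builtin stays reachable explicitly
```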

  was:
Currently, if a user defines a DML-bodied function that has the same name as a 
builtin function, an error will be returned.  This occurs both if the function 
is defined in the same file as it is being called (which could look like a 
builtin function call, although the user does not wish it to be), or if the 
function is defined in a separate file and called with a namespace notation.  
As we grow the language with an increasing number of builtin functions, this is 
not the desired behavior.  Instead, any DML functions should override any 
builtin functions.

Example 1:
{code}
min = function(int i) {
  print("hi" + i)
}
tmp = min(1)  # fail!
{code}

Example 2:
{code}
# util.dml
min = function(int i) {
  print("hi" + i)
}
{code}
{code}
source("util.dml") as util
tmp = util::min(1)  # fail!
{code}


> DML Functions Should Override Builtin Functions
> ---
>
> Key: SYSTEMML-654
> URL: https://issues.apache.org/jira/browse/SYSTEMML-654
> Project: SystemML
>  Issue Type: Sub-task
>Affects Versions: SystemML 0.10
>Reporter: Mike Dusenberry
>
> Currently, if a user defines a DML-bodied function that has the same name as 
> a builtin function, an error will be returned.  This occurs both if the 
> function is defined in the same file as it is being called (which could look 
> like a builtin function call, although the user does not wish it to be), or 
> if the function is defined in a separate file and called with a namespace 
> notation.  As we grow the language with an increasing number of builtin 
> functions, this is not the desired behavior.  Instead, any DML functions 
> should override any builtin functions.
> Example 1:
> {code}
> min = function(int i) {
>   print("hi" + i)
> }
> tmp = min(1)  # fail!
> {code}
> {code}
> : org.apache.sysml.parser.LanguageException: Unsupported Parameters : ERROR: 
> null -- line 6, column 0 -- Expecting matrix parameter for function MIN
>   at 
> org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:421)
>   at 
> org.apache.sysml.parser.BuiltinFunctionExpression.checkMatrixParam(BuiltinFunctionExpression.java:1221)
>   at 
> 

[jira] [Commented] (SYSTEMML-618) Deep Learning DML Library

2016-05-18 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290212#comment-15290212
 ] 

Mike Dusenberry commented on SYSTEMML-618:
--

Update: Initial version submitted in PR 160 merged into the main SystemML repo 
in [commit 781d24d | 
https://github.com/apache/incubator-systemml/commit/781d24d86dea1de880c6b66a75882ecfa5f1086c].

> Deep Learning DML Library
> -
>
> Key: SYSTEMML-618
> URL: https://issues.apache.org/jira/browse/SYSTEMML-618
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>
> This issue tracks the creation of an experimental, layers-based library in 
> pure PyDML & DML that contains layers with simple forward/backward APIs for 
> affine, convolution (start with 2D), max-pooling, non-linearities (relu, 
> sigmoid, softmax, etc.), dropout, loss functions, other layers, optimizers, 
> and gradient checks.
> *SystemML-NN*: 
> [https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]
> _Current status:_
> * Layers:
> ** Core:
> *** Affine
> *** Spatial Convolution
> *** LSTM
> *** Max Pooling
> *** RNN
> ** Nonlinearities:
> *** ReLU
> *** Sigmoid
> *** Softmax
> *** Tanh
> ** Loss:
> *** Cross-entropy loss
> *** L1 loss
> *** L2 loss
> *** Log ("Logistic") loss
> ** Regularization:
> *** Dropout
> *** L1 reg
> *** L2 reg
> * Optimizers:
> ** Adagrad
> ** Adam
> ** RMSprop
> ** SGD
> ** SGD w/ Momentum
> ** SGD w/ Nesterov Momentum
> * Tests:
> ** Gradient Checks





[jira] [Commented] (SYSTEMML-508) Extend "executeScript" In MLContext To Accept PyDML

2016-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276697#comment-15276697
 ] 

Mike Dusenberry commented on SYSTEMML-508:
--

cc [~deron] just fyi...

> Extend "executeScript" In MLContext To Accept PyDML
> ---
>
> Key: SYSTEMML-508
> URL: https://issues.apache.org/jira/browse/SYSTEMML-508
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: SystemML 0.10
>
>
> When executing a script stored in a string with {{MLContext::executeScript}}, 
> PyDML is currently not supported.  This task is to extend the 
> {{executeScript}} API to accept PyDML, much like {{execute}} currently does.





[jira] [Resolved] (SYSTEMML-508) Extend "executeScript" In MLContext To Accept PyDML

2016-05-09 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry resolved SYSTEMML-508.
--
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Extend "executeScript" In MLContext To Accept PyDML
> ---
>
> Key: SYSTEMML-508
> URL: https://issues.apache.org/jira/browse/SYSTEMML-508
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: SystemML 0.10
>
>
> When executing a script stored in a string with {{MLContext::executeScript}}, 
> PyDML is currently not supported.  This task is to extend the 
> {{executeScript}} API to accept PyDML, much like {{execute}} currently does.





[jira] [Comment Edited] (SYSTEMML-508) Extend "executeScript" In MLContext To Accept PyDML

2016-05-09 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276713#comment-15276713
 ] 

Mike Dusenberry edited comment on SYSTEMML-508 at 5/9/16 5:52 PM:
--

[~deron] Also, for the new {{MLContext}}, I'd like us to think about defaulting 
to PyDML when running from the Python API based on an assumed language 
preference.


was (Author: mwdus...@us.ibm.com):
@deron Also, for the new {{MLContext}}, I'd like us to think about defaulting 
to PyDML when running from the Python API based on an assumed language 
preference.

> Extend "executeScript" In MLContext To Accept PyDML
> ---
>
> Key: SYSTEMML-508
> URL: https://issues.apache.org/jira/browse/SYSTEMML-508
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: SystemML 0.10
>
>
> When executing a script stored in a string with {{MLContext::executeScript}}, 
> PyDML is currently not supported.  This task is to extend the 
> {{executeScript}} API to accept PyDML, much like {{execute}} currently does.





[jira] [Updated] (SYSTEMML-668) Python MLOutput.getDF() Can't Access JVM SQLContext

2016-05-09 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-668:
-
Description: In PySpark, access to the JVM SQLContext from a PySpark 
SQLContext instance -has changed from {{sqlContext._scala_SQLContext}} to 
{{sqlContext._ssql_ctx}}- has always been officially exposed via 
{{sqlContext._ssql_ctx}}.  However, we have been using an unofficial variable, 
{{sqlContext._scala_SQLContext}}, which has been renamed in 2.0, breaking any 
previous code using the former construct, such as our Python 
{{MLOutput.getDF(...)}} method.  Therefore, we just need to update our PySpark 
API to use the official access point.  (was: In PySpark, access to the JVM 
SQLContext from a PySpark SQLContext instance -has changed from 
{{sqlContext._scala_SQLContext}} to {{sqlContext._ssql_ctx}}- has always been 
official exposed via {{sqlContext._ssql_ctx}}.  However, we have been using an 
unofficial variable, {{sqlContext._scala_SQLContext}}, which has been renamed 
in 2.0, breaking any previous code using the former construct, such as our 
Python {{MLOutput.getDF(...)}} method.  Therefore, we just need to update our 
PySpark API to use the official access point.)
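The fix amounts to reading the JVM handle through the officially exposed attribute rather than the renamed internal one. A defensive lookup (a sketch using a stand-in object, not the actual SystemML.py code; only the attribute names come from the description above) could look like:

```python
class FakeSQLContext:
    """Stand-in for a PySpark SQLContext; only the attribute names are real."""
    def __init__(self):
        self._ssql_ctx = "jvm-sql-context"   # the officially exposed handle

def jvm_sql_context(sql_ctx):
    # Prefer the official attribute; fall back to the legacy internal name
    # only for old PySpark versions that still expose it.
    for attr in ("_ssql_ctx", "_scala_SQLContext"):
        if hasattr(sql_ctx, attr):
            return getattr(sql_ctx, attr)
    raise AttributeError("no JVM SQLContext handle found")

print(jvm_sql_context(FakeSQLContext()))
```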

> Python MLOutput.getDF() Can't Access JVM SQLContext
> ---
>
> Key: SYSTEMML-668
> URL: https://issues.apache.org/jira/browse/SYSTEMML-668
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
>
> In PySpark, access to the JVM SQLContext from a PySpark SQLContext instance 
> -has changed from {{sqlContext._scala_SQLContext}} to 
> {{sqlContext._ssql_ctx}}- has always been officially exposed via 
> {{sqlContext._ssql_ctx}}.  However, we have been using an unofficial 
> variable, {{sqlContext._scala_SQLContext}}, which has been renamed in 2.0, 
> breaking any previous code using the former construct, such as our Python 
> {{MLOutput.getDF(...)}} method.  Therefore, we just need to update our 
> PySpark API to use the official access point.





[jira] [Closed] (SYSTEMML-668) Python MLOutput.getDF() Can't Access JVM SQLContext

2016-05-09 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry closed SYSTEMML-668.


> Python MLOutput.getDF() Can't Access JVM SQLContext
> ---
>
> Key: SYSTEMML-668
> URL: https://issues.apache.org/jira/browse/SYSTEMML-668
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: SystemML 0.10
>
>
> In PySpark, access to the JVM SQLContext from a PySpark SQLContext instance 
> -has changed from {{sqlContext._scala_SQLContext}} to 
> {{sqlContext._ssql_ctx}}- has always been officially exposed via 
> {{sqlContext._ssql_ctx}}.  However, we have been using an unofficial 
> variable, {{sqlContext._scala_SQLContext}}, which has been renamed in 2.0, 
> breaking any previous code using the former construct, such as our Python 
> {{MLOutput.getDF(...)}} method.  Therefore, we just need to update our 
> PySpark API to use the official access point.





[jira] [Resolved] (SYSTEMML-668) Python MLOutput.getDF() Can't Access JVM SQLContext

2016-05-09 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry resolved SYSTEMML-668.
--
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Python MLOutput.getDF() Can't Access JVM SQLContext
> ---
>
> Key: SYSTEMML-668
> URL: https://issues.apache.org/jira/browse/SYSTEMML-668
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>Priority: Minor
> Fix For: SystemML 0.10
>
>
> In PySpark, access to the JVM SQLContext from a PySpark SQLContext instance 
> -has changed from {{sqlContext._scala_SQLContext}} to 
> {{sqlContext._ssql_ctx}}- has always been officially exposed via 
> {{sqlContext._ssql_ctx}}.  However, we have been using an unofficial 
> variable, {{sqlContext._scala_SQLContext}}, which has been renamed in 2.0, 
> breaking any previous code using the former construct, such as our Python 
> {{MLOutput.getDF(...)}} method.  Therefore, we just need to update our 
> PySpark API to use the official access point.





[jira] [Updated] (SYSTEMML-618) Deep Learning DML Library

2016-05-10 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-618:
-
Description: 
This issue tracks the creation of an experimental, layers-based library in pure 
PyDML & DML that contains layers with simple forward/backward APIs for affine, 
convolution (start with 2D), max-pooling, non-linearities (relu, sigmoid, 
softmax, etc.), dropout, loss functions, other layers, optimizers, and gradient 
checks.

*SystemML-NN*: 
[https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]
_Current status:_
* Layers:
** Core:
*** Affine
*** Spatial Convolution
*** LSTM
*** Max Pooling
*** RNN
** Nonlinearities:
*** ReLU
*** Sigmoid
*** Softmax
*** Tanh
** Loss:
*** Cross-entropy loss
*** L1 loss
*** L2 loss
*** Log ("Logistic") loss
** Regularization:
*** Dropout
*** L1 reg
*** L2 reg
* Optimizers:
** Adagrad
** Adam
** RMSprop
** SGD
** SGD w/ Momentum
** SGD w/ Nesterov Momentum
* Tests:
** Gradient Checks

  was:
This issue tracks the creation of an experimental, layers-based library in pure 
PyDML & DML that contains layers with simple forward/backward APIs for affine, 
convolution (start with 2D), max-pooling, non-linearities (relu, sigmoid, 
softmax, etc.), dropout, loss functions, other layers, optimizers, and gradient 
checks.

*SystemML-NN*: 
[https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]
_Current status:_
* Layers:
** Core:
*** Affine
*** Spatial Convolution
*** LSTM
*** Max Pooling
*** RNN
** Nonlinearities:
*** ReLU
*** Sigmoid
*** Softmax
*** Tanh
** Loss:
*** Cross-entropy loss
*** L1 loss
*** L2 loss
*** Log ("Logistic") loss
** Regularization:
*** Dropout
*** L1 reg
*** L2 reg
* Optimizers:
** Adagrad
** Adam
** RMSprop
** SGD
** SGD w/ Momentum
** SGD w/ Nesterov Momentum


> Deep Learning DML Library
> -
>
> Key: SYSTEMML-618
> URL: https://issues.apache.org/jira/browse/SYSTEMML-618
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Mike Dusenberry
>Assignee: Mike Dusenberry
>
> This issue tracks the creation of an experimental, layers-based library in 
> pure PyDML & DML that contains layers with simple forward/backward APIs for 
> affine, convolution (start with 2D), max-pooling, non-linearities (relu, 
> sigmoid, softmax, etc.), dropout, loss functions, other layers, optimizers, 
> and gradient checks.
> *SystemML-NN*: 
> [https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]
> _Current status:_
> * Layers:
> ** Core:
> *** Affine
> *** Spatial Convolution
> *** LSTM
> *** Max Pooling
> *** RNN
> ** Nonlinearities:
> *** ReLU
> *** Sigmoid
> *** Softmax
> *** Tanh
> ** Loss:
> *** Cross-entropy loss
> *** L1 loss
> *** L2 loss
> *** Log ("Logistic") loss
> ** Regularization:
> *** Dropout
> *** L1 reg
> *** L2 reg
> * Optimizers:
> ** Adagrad
> ** Adam
> ** RMSprop
> ** SGD
> ** SGD w/ Momentum
> ** SGD w/ Nesterov Momentum
> * Tests:
> ** Gradient Checks
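The simple forward/backward layer API described above can be illustrated with a minimal NumPy affine layer (a sketch of the general pattern only; the library itself is written in DML/PyDML):

```python
import numpy as np

def affine_forward(X, W, b):
    # Forward pass: out = X W + b
    return X @ W + b

def affine_backward(dout, X, W, b):
    # Backward pass: gradients of the loss w.r.t. input and parameters
    dX = dout @ W.T
    dW = X.T @ dout
    db = dout.sum(axis=0)
    return dX, dW, db

X = np.random.rand(4, 3)
W = np.random.rand(3, 2)
b = np.zeros(2)
out = affine_forward(X, W, b)
dX, dW, db = affine_backward(np.ones_like(out), X, W, b)
```

Each layer exposes the same forward/backward pair, which is what makes the gradient checks listed above straightforward to apply uniformly.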





[jira] [Closed] (SYSTEMML-592) GSoC 2016 Project Ideas - SystemML

2016-05-10 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry closed SYSTEMML-592.

Resolution: Done

> GSoC 2016 Project Ideas - SystemML
> --
>
> Key: SYSTEMML-592
> URL: https://issues.apache.org/jira/browse/SYSTEMML-592
> Project: SystemML
>  Issue Type: Brainstorming
>Reporter: Mike Dusenberry
>  Labels: gsoc2016
>
> This JIRA issue serves to hold discussions regarding potential Google Summer 
> of Code 2016 project ideas.  If you have an idea, please reach out!
> Possible areas of focus:
> * ML Algorithms
> * Engine (parser, compiler, runtime; distributed computing, performance, etc.)
> * Continued integration with Spark & PySpark
> * Deep language integration (Python, etc.)
> * Others!





  1   2   3   4   5   6   7   8   9   >