from:"Matthias Boehm \(JIRA\)"

[jira] [Commented] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops

2016-04-15 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244030#comment-15244030
 ] 

Matthias Boehm commented on SYSTEMML-633:
-

[~mwdus...@us.ibm.com] you might want to double-check it, but here is a DML 
snippet that shows the general idea of vectorizing this pad_img function. The 
first block is the current parfor-based version; the second is the vectorized 
replacement:

{code}
img_padded = matrix(0, rows=C, cols=(Hin+2*padh)*(Win+2*padw))  # zeros
parfor (c in 1:C) {
  img_slice = matrix(img[c,], rows=Hin, cols=Win)  # depth slice c, reshaped
  img_padded_slice = matrix(0, rows=Hin+2*padh, cols=Win+2*padw)
  img_padded_slice[padh+1:padh+Hin, padw+1:padw+Win] = img_slice
  img_padded[c,] = matrix(img_padded_slice, rows=1,
                          cols=(Hin+2*padh)*(Win+2*padw))  # reshape
}
{code}

{code}
#prepare mask (independent of data)
xpad = matrix(0, rows=Hin+2*padh, cols=Win+2*padw);
xpad[padh+1:padh+Hin, padw+1:padw+Win] = matrix(1, rows=Hin, cols=Win);
rxpad = matrix(xpad, rows=1, cols=(Hin+2*padh)*(Win+2*padw));
mask = removeEmpty(target=t(rxpad) * seq(1,ncol(rxpad)), margin="rows");

#vectorized image padding
img_padded = t(aggregate(target=t(img), groups=mask, fn="sum",
                         ngroups=(Hin+2*padh)*(Win+2*padw)));
{code}

The nice things are (1) that the mask is independent of the data, so you can 
reuse it over all images, and (2) that the function, which no longer contains a 
loop, gets inlined, removing unknowns (which might enable update-in-place).
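To check the equivalence of the two snippets on a small case, here is a plain-Python sketch (not DML) that mirrors both under the same row-major layout. `pad_loop` follows the parfor version with left-indexing, `build_mask` mirrors the mask preparation (`removeEmpty` over `t(rxpad) * seq(1, ncol(rxpad))`), and `pad_grouped` emulates the grouped `aggregate(..., fn="sum", ngroups=...)` by adding pixel j of every channel into group `mask[j]`. All function names are hypothetical, and the sketch assumes that groups without members produce zeros, which is what the padding needs.

```python
def pad_loop(img, C, Hin, Win, padh, padw):
    """Mirror of the parfor version: pad each channel via left-indexing."""
    Hp, Wp = Hin + 2 * padh, Win + 2 * padw
    out = []
    for c in range(C):
        padded = [[0.0] * Wp for _ in range(Hp)]
        for i in range(Hin):
            for j in range(Win):
                padded[padh + i][padw + j] = img[c][i * Win + j]
        out.append([v for row in padded for v in row])  # row-major reshape to 1 x Hp*Wp
    return out


def build_mask(Hin, Win, padh, padw):
    """Data-independent mask: 1-based padded positions of the interior pixels."""
    Hp, Wp = Hin + 2 * padh, Win + 2 * padw
    xpad = [[1 if padh <= i < padh + Hin and padw <= j < padw + Win else 0
             for j in range(Wp)] for i in range(Hp)]
    rxpad = [v for row in xpad for v in row]
    # t(rxpad) * seq(1, ncol(rxpad)) followed by removeEmpty(margin="rows")
    return [k + 1 for k, v in enumerate(rxpad) if v != 0]


def pad_grouped(img, mask, ngroups):
    """Emulates aggregate(target=t(img), groups=mask, fn="sum", ngroups=...), transposed back."""
    out = []
    for row in img:
        padded = [0.0] * ngroups  # groups without members stay zero
        for j, g in enumerate(mask):
            padded[g - 1] += row[j]
        out.append(padded)
    return out


C, Hin, Win, padh, padw = 2, 3, 4, 1, 2
img = [[float(c * 100 + k) for k in range(Hin * Win)] for c in range(C)]
mask = build_mask(Hin, Win, padh, padw)  # computed once, reusable for all images
ngroups = (Hin + 2 * padh) * (Win + 2 * padw)
print(pad_grouped(img, mask, ngroups) == pad_loop(img, C, Hin, Win, padh, padw))
```

Because each group receives at most one input pixel, the grouped sum reduces to a scatter into the padded positions, which is why one mask serves every image of the same size.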

> Improve Left-Indexing Performance with (Nested) Parfor Loops
> 
>
> Key: SYSTEMML-633
> URL: https://issues.apache.org/jira/browse/SYSTEMML-633
> Project: SystemML
>  Issue Type: Improvement
>  Components: ParFor
>Reporter: Mike Dusenberry
> Attachments: log.txt
>
>
> In the experimental deep learning DML library I've been building 
> ([https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]),
>  I've experienced severe bottlenecks due to *left-indexing* in parfor loops.  
> Here, I will highlight a few particular instances with simplified examples, 
> but the same issue is shared across many areas of the library, particularly 
> in the convolution and max pooling layers, and is exaggerated in real 
> use-cases.
> *Quick note* on setup for any of the below experiments.  Please grab a copy 
> of the above repo (particularly the {{nn}} directory), and run any 
> experiments with the {{nn}} package available at the base directory of the 
> experiment.
> Scenario: *Convolution*
> * In the library above, the forward pass of the convolution function 
> ([{{conv::forward(...)}} | 
> https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L8]
>  in {{nn/layers/conv.dml}}) essentially accepts a matrix {{X}} of images, a 
> matrix of weights {{W}}, and several other parameters corresponding to image 
> sizes, filter sizes, etc.  It then loops through the images with a {{parfor}} 
> loop, and for each image it pads the image with {{util::pad_image}}, extracts 
> "patches" of the image into columns of a matrix in a sliding fashion across 
> the image with {{util::im2col}}, performs a matrix multiplication between the 
> matrix of patch columns and the weight matrix, and then saves the result into 
> a matrix defined outside of the parfor loop using left-indexing.
> * Left-indexing has been identified as the bottleneck by a wide margin.
> * Left-indexing is used in the main {{conv::forward(...)}} function in the 
> [last line in the parfor 
> loop|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L61],
>  in the 
> [{{util::pad_image(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L196]
>  function used by {{conv::forward(...)}}, as well as in the 
> [{{util::im2col(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L96]
>  function used by {{conv::forward(...)}}.
> * Test script (assuming the {{nn}} package is available):
> ** {{speed-633.dml}} {code}
> source("nn/layers/conv.dml") as conv
> source("nn/util.dml") as util
> # Generate data
> N = 64  # num examples
> C = 30  # num channels
> Hin = 28  # input height
> Win = 28  # input width
> F = 20  # num filters
> Hf = 3  # filter height
> Wf = 3  # filter width
> stride = 1
> pad = 1
> X = rand(rows=N, cols=C*Hin*Win)
> # Create layer
> [W, b] = conv::init(F, C, Hf, Wf)
> # Forward
> [out, Hout, Wout] = conv::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, 
> stride, pad, pad)
> print("Out: " + nrow(out) + "x" + ncol(out))
> print("Hout: " + Hout)
> print("Wout: " + Wout)
> print("")
> print(sum(out))
> {code}
> * Invocation:
> ** {{java -jar 
> $SYSTEMML_HOME/target/systemml-0.10.0-incubating-SNAPSHOT-standalone.jar -f 
> speed-633.dml -stats -explain -exec sin

[jira] [Commented] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops

2016-04-16 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244065#comment-15244065
 ] 

Matthias Boehm commented on SYSTEMML-633:
-

While looking over the im2col implementation, I came to the conclusion that we 
should aim to replace the implementation of 3 nested parfor loops with a 
single vectorized table(ipos, jpos, imgvals) call (where all three arguments 
are column vectors). Since the mapping is very systematic, we should be able to 
come up with an elegant way of computing the ipos and jpos vectors in a 
vectorized manner, too.
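One possible way to compute these vectors, sketched in plain Python rather than DML. The sketch assumes an already padded image and a layout where output row `c*Hf*Wf + hf*Wf + wf` holds patch element (c, hf, wf) and output column `hout*Wout + wout` holds patch (hout, wout). Under that assumption, the source position of every output cell is the outer sum of a per-row base offset and a per-column patch offset. `table` emulates DML's `table(ipos, jpos, vals)` by summing values into 1-based cells. The layout and all helper names are assumptions for illustration, not the nn library's code.

```python
def im2col_loops(img, C, Hin, Win, Hf, Wf, stride):
    """Reference version with explicit loops over (hout, wout, c, hf, wf)."""
    Hout = (Hin - Hf) // stride + 1
    Wout = (Win - Wf) // stride + 1
    cols = [[0.0] * (Hout * Wout) for _ in range(C * Hf * Wf)]
    for hout in range(Hout):
        for wout in range(Wout):
            for c in range(C):
                for hf in range(Hf):
                    for wf in range(Wf):
                        r = c * Hf * Wf + hf * Wf + wf
                        src = (c * Hin * Win + (hout * stride + hf) * Win
                               + (wout * stride + wf))
                        cols[r][hout * Wout + wout] = img[src]
    return cols


def im2col_index_vectors(C, Hin, Win, Hf, Wf, stride):
    """ipos, jpos (1-based) and source positions, without per-patch loops."""
    Hout = (Hin - Hf) // stride + 1
    Wout = (Win - Wf) // stride + 1
    # per output row: position of patch element (c, hf, wf) relative to the patch corner
    base = [c * Hin * Win + hf * Win + wf
            for c in range(C) for hf in range(Hf) for wf in range(Wf)]
    # per output column: position of the patch's top-left corner
    offset = [hout * stride * Win + wout * stride
              for hout in range(Hout) for wout in range(Wout)]
    R, K = len(base), len(offset)
    ipos = [r + 1 for r in range(R) for _ in range(K)]
    jpos = [k + 1 for _ in range(R) for k in range(K)]
    src = [b + o for b in base for o in offset]  # outer sum, same (r, k) order
    return ipos, jpos, src, R, K


def table(ipos, jpos, vals, nrow, ncol):
    """Like table(ipos, jpos, vals): sum each value into cell (ipos, jpos)."""
    out = [[0.0] * ncol for _ in range(nrow)]
    for i, j, v in zip(ipos, jpos, vals):
        out[i - 1][j - 1] += v
    return out


C, Hin, Win, Hf, Wf, stride = 2, 4, 5, 3, 3, 1
img = [float(k + 1) for k in range(C * Hin * Win)]  # one padded image, flattened
ipos, jpos, src, R, K = im2col_index_vectors(C, Hin, Win, Hf, Wf, stride)
cols = table(ipos, jpos, [img[s] for s in src], R, K)  # gather values, then table
print(cols == im2col_loops(img, C, Hin, Win, Hf, Wf, stride))
```

The only remaining iteration is over index ranges, so each comprehension corresponds to a whole-vector operation rather than a per-patch left-indexing step.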


[jira] [Resolved] (SYSTEMML-635) Dmlconfig parsing issues in parfor mr jobs

2016-04-16 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-635.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Dmlconfig parsing issues in parfor mr jobs
> --
>
> Key: SYSTEMML-635
> URL: https://issues.apache.org/jira/browse/SYSTEMML-635
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
>
> If the DML config file does not exist, parfor's remote MR jobs 
> (Parfor-DPEMR, Parfor-EMR) fail during task setup, as described here:
> https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00454.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (SYSTEMML-638) Random Forest Predict Execution Fails

2016-04-18 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246210#comment-15246210
 ] 

Matthias Boehm commented on SYSTEMML-638:
-

Thanks for reporting this issue, [~ronaghan]. Since it seems to be a 
data-related issue (invalid input to the failing grouped aggregate operation), 
could you share some more details on the input (e.g., the types of features: 
scale vs. categorical)?

> Random Forest Predict Execution Fails
> -
>
> Key: SYSTEMML-638
> URL: https://issues.apache.org/jira/browse/SYSTEMML-638
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.10
>Reporter: Stacey Ronaghan
>
> Issue executing the prediction for random forest algorithm on SystemML 0.10 
> (incubating) via MLContext with Scala Spark on a cluster.
> Related to [SYSTEMML-597|https://issues.apache.org/jira/browse/SYSTEMML-597]. 
> X is the same input passed into execute for random-forest.dml (mentioned in 
> [SYSTEMML-597|https://issues.apache.org/jira/browse/SYSTEMML-597]) and M is 
> its output model.
> Code:
> {code}
> // Register inputs & outputs for prediction
> ml.reset()
> ml.registerInput("X", X)
> //ml.registerInput("Y", Y)
> ml.registerInput("M", M)
> ml.registerOutput("P")
> //ml.registerOutput("A")
> // Run the script
> //val nargs = Map("X" -> "", "Y" -> "", "M" -> "", "P" -> "", "A" -> "")
> val nargs = Map("X" -> "", "M" -> "", "P" -> "")
> val outputs = 
> ml.execute("/home/biadmin/spark-enablement/installs/SystemML/algorithms/random-forest-predict.dml",
>  nargs)
> val P = outputs.getDF(sqlContext, "P")
> //val A = outputs.getDF(sqlContext, "A")
> {code}
> Output:
> {code}
> import org.apache.sysml.api.MLContext
> ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@5649f7b4
> nargs: scala.collection.immutable.Map[String,String] = Map(X -> "", M -> "", P -> "")
> org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 68 and 89 -- Error evaluating instruction: CP°groupedagg°target=_mVar60580°groups=_mVar60580°fn=count°k=40°_mVar60581·MATRIX·DOUBLE
>   at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
>   at org.apache.sysml.api.MLContext.executeUsingSimplifiedCompilationChain(MLContext.java:1365)
>   at org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1225)
>   at org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1173)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:640)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:675)
>   at org.apache.sysml.api.MLContext.execute(MLContext.java:688)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:41)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:46)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:48)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:50)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:52)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:54)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:56)
>   at $iwC$$iwC$$iwC$$iwC$$iwC.(:58)
>   at $iwC$$iwC$$iwC$$iwC.(:60)
>   at $iwC$$iwC$$iwC.(:62)
>   at $iwC$$iwC.(:64)
>   at $iwC.(:66)
>   at (:68)
>   at .(:72)
>   at .()
>   at .(:7)
>   at .()
>   at $print()
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>   at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:646)
>   at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:611)
>   at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:604)
>   at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>   at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:292)
>   at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>   at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:5

[jira] [Resolved] (SYSTEMML-555) Language extensions for frames

2016-04-19 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-555.
-
Resolution: Done

> Language extensions for frames
> --
>
> Key: SYSTEMML-555
> URL: https://issues.apache.org/jira/browse/SYSTEMML-555
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler, Parser, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>
> Full data frame support requires some modifications of existing builtin 
> functions, especially transform. This task covers these modifications as well 
> as related cleanups to allow seamless usage of transform in JMLC scoring.
> 1) Read parameter 'schema' (* for all strings, or a list of value types; 
> defaults to * if unspecified, for backwards compatibility).
> 2) Transform parameter 'spec' as a string instead of 'transformSpec' as a 
> filename.
> 3) Transform recode maps as frame inputs and outputs.
> 4) Split transform and transformapply into separate functions due to the 
> different number of outputs (alternatively: always return the recode maps, 
> even if already passed in).
> 5) Transform parameter 'spec' for the existing transformapply, to allow 
> different specifications with common recode maps.




[jira] [Created] (SYSTEMML-640) Parfor sample script fails w/ dimension mismatch

2016-04-19 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-640:
---

 Summary: Parfor sample script fails w/ dimension mismatch
 Key: SYSTEMML-640
 URL: https://issues.apache.org/jira/browse/SYSTEMML-640
 Project: SystemML
  Issue Type: Bug
  Components: Compiler, ParFor
Affects Versions: SystemML 0.9
Reporter: Matthias Boehm


The parfor util script sample.dml fails with a dimension mismatch in special 
cases where the remote memory budget of map/reduce tasks is larger than the 
driver memory budget and the permutation matrix multiplication would be 
compiled to MR in local parfor but to CP in remote parfor execution.

In these cases, we trigger a forced recompile to CP, which internally tries to 
reduce overhead by recompiling only DAGs whose runtime plan contains MR 
instructions. This selective recompilation is invalid for permutation matrix 
multiplications that span two subsequent DAGs, where the first DAG does not 
necessarily contain MR instructions.

Since the overhead of recompiling an average DAG (50-100 operators) is by now 
less than 1ms, we should always recompile the entire parfor body program in 
these cases.




[jira] [Commented] (SYSTEMML-640) Parfor sample script fails w/ dimension mismatch

2016-04-19 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249165#comment-15249165
 ] 

Matthias Boehm commented on SYSTEMML-640:
-

just FYI [~reinwald] [~niketanpansare] - I'll deliver the fix tomorrow. 


[jira] [Updated] (SYSTEMML-640) Parfor sample script fails w/ dimension mismatch

2016-04-19 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-640:

Description: 
The parfor util script sample.dml fails with a dimension mismatch in special 
cases where the remote memory budget of map/reduce tasks is larger than the 
driver memory budget and the permutation matrix multiplication would be 
compiled to MR in local parfor but to CP in remote parfor execution.

In these cases, we trigger a forced recompile to CP, which internally tries to 
reduce overhead by recompiling only DAGs whose runtime plan contains MR 
instructions. This selective recompilation is invalid for permutation matrix 
multiplications that span two subsequent DAGs, where the first DAG does not 
necessarily contain MR instructions.

Since the overhead of recompiling an average DAG (50-100 operators) is by now 
less than 1ms, we should always recompile the entire parfor body program in 
these cases.

As a related note: since we now support removeEmpty with selection vectors, we 
should rewrite these permutation matrix multiplications into removeEmpty with 
a selection vector, which is equivalent from a runtime perspective but would 
simplify debugging compared to the current multi-DAG rewrite.

  was:
The parfor util script sample.dml fails with a dimension mismatch in special 
cases where the remote memory budget of map/reduce tasks is larger than the 
driver memory budget and the permutation matrix multiplication would be 
compiled to MR in local parfor but to CP in remote parfor execution.

In these cases, we trigger a forced recompile to CP, which internally tries to 
reduce overhead by recompiling only DAGs whose runtime plan contains MR 
instructions. This selective recompilation is invalid for permutation matrix 
multiplications that span two subsequent DAGs, where the first DAG does not 
necessarily contain MR instructions.

Since the overhead of recompiling an average DAG (50-100 operators) is by now 
less than 1ms, we should always recompile the entire parfor body program in 
these cases.


> Parfor sample script fails w/ dimension mismatch
> 
>
> Key: SYSTEMML-640
> URL: https://issues.apache.org/jira/browse/SYSTEMML-640
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler, ParFor
>Affects Versions: SystemML 0.9
>Reporter: Matthias Boehm
>
> The parfor util script sample.dml fails with a dimension mismatch in special 
> cases where the remote memory budget of map/reduce tasks is larger than the 
> driver memory budget and the permutation matrix multiplication would be 
> compiled to MR in local parfor but to CP in remote parfor execution.
> In these cases, we trigger a forced recompile to CP, which internally tries to 
> reduce overhead by recompiling only DAGs whose runtime plan contains MR 
> instructions. This selective recompilation is invalid for permutation matrix 
> multiplications that span two subsequent DAGs, where the first DAG does not 
> necessarily contain MR instructions.
> Since the overhead of recompiling an average DAG (50-100 operators) is by now 
> less than 1ms, we should always recompile the entire parfor body program in 
> these cases.
> As a related note: since we now support removeEmpty with selection vectors, 
> we should rewrite these permutation matrix multiplications into removeEmpty 
> with a selection vector, which is equivalent from a runtime perspective but 
> would simplify debugging compared to the current multi-DAG rewrite.
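The equivalence behind the related note can be checked on a toy example. This plain-Python sketch (helper names are hypothetical, not SystemML APIs) builds the permutation matrix `removeEmpty(diag(V))` with margin rows and compares `P %*% X` against selecting the rows of X where V is non-zero, which is what removeEmpty with a selection vector does. It assumes V is a 0/1 indicator vector, as in a sampling script; with other non-zero values, the matrix multiplication would additionally scale the selected rows. Note that P, and therefore the result, has as many rows as V has non-zeros.

```python
def diag(v):
    """diag(V) for a column vector given as a list."""
    n = len(v)
    return [[v[i] if i == j else 0 for j in range(n)] for i in range(n)]


def remove_empty_rows(M, select=None):
    """removeEmpty(target=M, margin="rows"), optionally with a selection vector."""
    if select is None:
        return [row[:] for row in M if any(x != 0 for x in row)]
    return [row[:] for row, s in zip(M, select) if s != 0]


def matmult(A, B):
    """Naive A %*% B."""
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))]
            for row in A]


V = [1, 0, 0, 1, 1, 0]                                  # 0/1 sample indicator (assumed)
X = [[i * 10 + j for j in range(3)] for i in range(6)]
P = remove_empty_rows(diag(V))                          # nnz(V) x nrow(V) permutation matrix
print(len(P), matmult(P, X) == remove_empty_rows(X, select=V))
```

The selection vector matters: a plain removeEmpty on X would also drop selected rows that happen to be all zeros, while `P %*% X` keeps them.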




[jira] [Created] (SYSTEMML-641) Performance features core block matrix multiply

2016-04-20 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-641:
---

 Summary: Performance features core block matrix multiply 
 Key: SYSTEMML-641
 URL: https://issues.apache.org/jira/browse/SYSTEMML-641
 Project: SystemML
  Issue Type: Task
  Components: Runtime
Reporter: Matthias Boehm







[jira] [Updated] (SYSTEMML-643) Sample script produces outputs w/ wrong dimensions meta data

2016-04-21 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-643:

Description: 
This bug tracks the issues related to the following issue raised on the dev 
list:
https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00452.html

In detail, this is a compiler bug that occurs when compiling an MR permutation 
matrix multiplication for removeEmpty(diag(V)) %*% X, where we incorrectly use 
nrow(V) as the number of output rows instead of the number of non-zeros in V. 
Along with the fix, this task also covers extending our test suite to include 
sample.dml.

  was:
This bug tracks the issues related to the following issues raised on the dev 
list:
https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00452.html

In detail, this is a compiler bug that occurs when compiling an MR permutation 
matrix multiplication for removeEmpty(diag(V)) %*% X, where we incorrectly use 
nrow(V) as the number of output rows instead of the number of non-zeros in V. 
Along with the fix, this task also covers extending our test suite to include 
sample.dml.


> Sample script produces outputs w/ wrong dimensions meta data
> 
>
> Key: SYSTEMML-643
> URL: https://issues.apache.org/jira/browse/SYSTEMML-643
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>
> This bug tracks the issues related to the following issue raised on the dev 
> list:
> https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00452.html
> In detail, this is a compiler bug that occurs when compiling an MR permutation 
> matrix multiplication for removeEmpty(diag(V)) %*% X, where we incorrectly use 
> nrow(V) as the number of output rows instead of the number of non-zeros in V. 
> Along with the fix, this task also covers extending our test suite to include 
> sample.dml.




[jira] [Created] (SYSTEMML-643) Sample script produces outputs w/ wrong dimensions meta data

2016-04-21 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-643:
---

 Summary: Sample script produces outputs w/ wrong dimensions meta 
data
 Key: SYSTEMML-643
 URL: https://issues.apache.org/jira/browse/SYSTEMML-643
 Project: SystemML
  Issue Type: Bug
  Components: Compiler
Reporter: Matthias Boehm
Assignee: Matthias Boehm


This bug tracks the issues related to the following issues raised on the dev 
list:
https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00452.html

In detail, this is a compiler bug that occurs when compiling an MR permutation 
matrix multiplication for removeEmpty(diag(V)) %*% X, where we incorrectly use 
nrow(V) as the number of output rows instead of the number of non-zeros in V. 
Along with the fix, this task also covers extending our test suite to include 
sample.dml.




[jira] [Commented] (SYSTEMML-643) Sample script produces outputs w/ wrong dimensions meta data

2016-04-21 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253106#comment-15253106
 ] 

Matthias Boehm commented on SYSTEMML-643:
-

cc [~shirisht] [~ethanyifanxu]


[jira] [Commented] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops

2016-04-21 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253209#comment-15253209
 ] 

Matthias Boehm commented on SYSTEMML-633:
-

[~mwdus...@us.ibm.com] Yes, indeed, this is one of the issues. The other issue 
is an overly conservative application of update-in-place for parfor 
intermediates (your test only covers update-in-place for parfor result 
variables).
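To illustrate why update-in-place matters for left-indexing inside loops, here is a simplified plain-Python cost model. It is an assumption for illustration, not SystemML's buffer pool or runtime logic: without update-in-place, each `X[i,] = row` copies the whole matrix before writing the row, so filling n rows costs on the order of n times the matrix size in cell writes, while an in-place update only writes the row itself. The function names are hypothetical.

```python
def left_index_copy(X, i, row, counter):
    """X[i,] = row without update-in-place: copy the whole matrix, then write."""
    Y = [r[:] for r in X]
    counter[0] += sum(len(r) for r in X)  # cells copied
    Y[i] = list(row)
    counter[0] += len(row)                # cells written for the new row
    return Y


def left_index_inplace(X, i, row, counter):
    """X[i,] = row with update-in-place: only the target row is written."""
    X[i][:] = row
    counter[0] += len(row)
    return X


def fill_rows(n, m, update):
    """Fill an n x m matrix row by row, as a loop with left-indexing would."""
    X = [[0.0] * m for _ in range(n)]
    counter = [0]
    for i in range(n):
        X = update(X, i, [float(i)] * m, counter)
    return X, counter[0]


A, copy_cells = fill_rows(100, 10, left_index_copy)
B, inplace_cells = fill_rows(100, 10, left_index_inplace)
print(A == B, copy_cells, inplace_cells)
```

Both variants produce the same matrix; only the amount of copying differs, and that gap grows quadratically with the number of rows.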

> Improve Left-Indexing Performance with (Nested) Parfor Loops
> 
>
> Key: SYSTEMML-633
> URL: https://issues.apache.org/jira/browse/SYSTEMML-633
> Project: SystemML
>  Issue Type: Improvement
>  Components: ParFor
>Reporter: Mike Dusenberry
> Attachments: Im2colWrapper.java, log.txt, systemml-nn.zip
>
>
> In the experimental deep learning DML library I've been building 
> ([https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]),
>  I've experienced severe bottlenecks due to *left-indexing* in parfor loops.  
> Here, I will highlight a few particular instances with simplified examples, 
> but the same issue is shared across many areas of the library, particularly 
> in the convolution and max pooling layers, and is exaggerated in real 
> use-cases.
> *Quick note* on setup for any of the below experiments.  Please grab a copy 
> of the above repo (particularly the {{nn}} directory), and run any 
> experiments with the {{nn}} package available at the base directory of the 
> experiment.
> Scenario: *Convolution*
> * In the library above, the forward pass of the convolution function 
> ([{{conv::forward(...)}} | 
> https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L8]
>  in {{nn/layers/conv.dml}}) essentially accepts a matrix {{X}} of images, a 
> matrix of weights {{W}}, and several other parameters corresponding to image 
> sizes, filter sizes, etc.  It then loops through the images with a {{parfor}} 
> loop, and for each image it pads the image with {{util::pad_image}}, extracts 
> "patches" of the image into columns of a matrix in a sliding fashion across 
> the image with {{util::im2col}}, performs a matrix multiplication between the 
> matrix of patch columns and the weight matrix, and then saves the result into 
> a matrix defined outside of the parfor loop using left-indexing.
> * Left-indexing has been identified as the bottleneck by a wide margin.
> * Left-indexing is used in the main {{conv::forward(...)}} function in the 
> [last line in the parfor 
> loop|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L61],
>  in the 
> [{{util::pad_image(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L196]
>  function used by {{conv::forward(...)}}, as well as in the 
> [{{util::im2col(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L96]
>  function used by {{conv::forward(...)}}.
> * Test script (assuming the {{nn}} package is available):
> ** {{speed-633.dml}} {code}
> source("nn/layers/conv.dml") as conv
> source("nn/util.dml") as util
> # Generate data
> N = 64  # num examples
> C = 30  # num channels
> Hin = 28  # input height
> Win = 28  # input width
> F = 20  # num filters
> Hf = 3  # filter height
> Wf = 3  # filter width
> stride = 1
> pad = 1
> X = rand(rows=N, cols=C*Hin*Win)
> # Create layer
> [W, b] = conv::init(F, C, Hf, Wf)
> # Forward
> [out, Hout, Wout] = conv::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride, pad, pad)
> print("Out: " + nrow(out) + "x" + ncol(out))
> print("Hout: " + Hout)
> print("Wout: " + Wout)
> print("")
> print(sum(out))
> {code}
> * Invocation:
> ** {{java -jar $SYSTEMML_HOME/target/systemml-0.10.0-incubating-SNAPSHOT-standalone.jar -f speed-633.dml -stats -explain -exec singlenode}}
> * Stats output (modified to output up to 100 instructions):
> ** {code}
> ...
> Total elapsed time:   26.834 sec.
> Total compilation time:   0.529 sec.
> Total execution time:   26.304 sec.
> Number of compiled MR Jobs: 0.
> Number of executed MR Jobs: 0.
> Cache hits (Mem, WB, FS, HDFS): 9196235/0/0/0.
> Cache writes (WB, FS, HDFS):  3070724/0/0.
> Cache times (ACQr/m, RLS, EXP): 1.474/1.120/26.998/0.000 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time:  0.268 sec.
> Functions recompiled:   129.
> Functions recompile time: 0.841 sec.
> ParFor loops optimized:   1.
> ParFor optimize time:   0.032 sec.
> ParFor initialize time:   0.015 sec.
> ParFor result merge time: 0.028 sec.
> ParFor total update in-place: 0/0/1559360
> Total JIT compile time:   14.235 sec.
> Total JVM GC count:   94.
> Total JVM GC time:0.366 sec.
> Heavy hitter instructions (name, t

[jira] [Commented] (SYSTEMML-644) l2svm hang whan handle small data

2016-04-21 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253218#comment-15253218
 ] 

Matthias Boehm commented on SYSTEMML-644:
-

[~tommy_cug] Without running it, I would say that this run (on the given, 
extremely tiny input) simply fails to converge in the inner loop, which is not 
guarded by a maximum number of iterations. As a workaround, you can modify the 
script slightly and introduce a parameter mii for the maximum number of inner iterations.
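A minimal sketch of such a guard follows. It is written in Python rather than DML and assumes the inner loop is a one-dimensional Newton iteration on the step size along the search direction. That is an assumption about scripts/algorithms/l2-svm.dml, not a copy of it. `newton_line_search`, `grad` and `hess` are hypothetical names; only `mii` comes from the comment above.

```python
def newton_line_search(grad, hess, tol=1e-10, mii=20):
    """Hypothetical 1-D Newton iteration on the step size, guarded by mii.

    grad(step) and hess(step) return the first and second derivative of the
    objective along the search direction. Returns (step, iterations, converged).
    """
    step = 0.0
    for it in range(1, mii + 1):
        g = grad(step)
        h = hess(step)
        step = step - g / h
        # Newton-decrement style stopping test: g^2 / h < tol
        if g * g / h < tol:
            return step, it, True
    # without this bound, a non-converging input would loop forever
    return step, mii, False


# converges: objective (s - 3)^2 along the direction
print(newton_line_search(lambda s: 2 * (s - 3), lambda s: 2.0))  # (3.0, 2, True)
# never converges: the gradient stays at 1, so the loop stops after mii iterations
print(newton_line_search(lambda s: 1.0, lambda s: 1.0, mii=5))   # (-5.0, 5, False)
```

The design choice is to report non-convergence instead of raising, so that the outer loop can decide whether to continue with the last step size.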

> l2svm hang whan handle small data
> -
>
> Key: SYSTEMML-644
> URL: https://issues.apache.org/jira/browse/SYSTEMML-644
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms
>Affects Versions: SystemML 0.9, SystemML 0.10
> Environment: spark, hadoop, standalone.
>Reporter: Tommy Yu
>
> l2svm hangs when processing the data below.
> X:
> 1.0
> 0.0
> Y:
> 1.0
> 2.0
> With script:
> hadoop jar SystemML.jar -f scripts/algorithms/l2-svm.dml -nvargs X=../data/l2svm/X Y=../data/l2svm/Y icpt=0 tol=0.001 reg=1 maxiter=100 model=../data/l2svm/w Log=../data/l2svm/Log fmt="text"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Resolved] (SYSTEMML-643) Sample script produces outputs w/ wrong dimensions meta data

2016-04-22 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-643.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Sample script produces outputs w/ wrong dimensions meta data
> 
>
> Key: SYSTEMML-643
> URL: https://issues.apache.org/jira/browse/SYSTEMML-643
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
>
> This bug tracks the issues related to the following issue raised on the dev 
> list:
> https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00452.html
> In detail, this is a compiler bug that occurs when compiling an MR permutation 
> matrix multiplication for removeEmpty(diag(V)) %*% X, where we incorrectly use 
> nrow(V) as the number of rows. Along with the fix, this task also extends our 
> test suite to include sample.dml. 
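To make the expected dimensions concrete, here is a small pure-Python sketch. It is not SystemML code, and the helper names are hypothetical. Read as removeEmpty(target=diag(V), margin="rows"), the selection matrix keeps only the rows of diag(V) whose diagonal entry is non-zero. The product with X therefore has nnz(V) rows, not nrow(V).

```python
def diag(v):
    """n x n matrix with v on the diagonal (hypothetical helper)."""
    n = len(v)
    return [[float(v[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]

def remove_empty_rows(m):
    """Keep only rows with at least one non-zero, like removeEmpty with margin="rows"."""
    return [row for row in m if any(x != 0 for x in row)]

def matmul(a, b):
    """Naive dense matrix multiplication a %*% b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

V = [1, 0, 1, 0]                      # selection vector with nnz(V) = 2
X = [[1, 2], [3, 4], [5, 6], [7, 8]]  # nrow(X) = nrow(V) = 4

P = remove_empty_rows(diag(V))        # 2 x 4 selection matrix
Y = matmul(P, X)
print(len(Y), "x", len(Y[0]))         # 2 x 2: nnz(V) rows, not nrow(V)
```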




[jira] [Created] (SYSTEMML-649) JMLC/MLContext for scalar output variables

2016-04-24 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-649:
---

 Summary: JMLC/MLContext for scalar output variables
 Key: SYSTEMML-649
 URL: https://issues.apache.org/jira/browse/SYSTEMML-649
 Project: SystemML
  Issue Type: Task
  Components: APIs
Reporter: Matthias Boehm


Right now neither JMLC nor MLContext supports scalar output variables. This 
task aims to extend both APIs with the required primitives.

The workaround is to cast any output scalar at script level to a 1x1 matrix via 
as.matrix and handle it in the calling application. However, especially with 
MLContext, this puts an unnecessary burden on the user, who needs to deal with 
RDDs even for a simple scalar. 




[jira] [Updated] (SYSTEMML-649) JMLC/MLContext support for scalar output variables

2016-04-24 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-649:

Summary: JMLC/MLContext support for scalar output variables  (was: 
JMLC/MLContext for scalar output variables)

> JMLC/MLContext support for scalar output variables
> --
>
> Key: SYSTEMML-649
> URL: https://issues.apache.org/jira/browse/SYSTEMML-649
> Project: SystemML
>  Issue Type: Task
>  Components: APIs
>Reporter: Matthias Boehm




[jira] [Commented] (SYSTEMML-649) JMLC/MLContext support for scalar output variables

2016-04-25 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256794#comment-15256794
 ] 

Matthias Boehm commented on SYSTEMML-649:
-

thanks [~deron] for taking this over. I just tried it and yes there seems to be 
an issue with scalar inputs here. I guess it never showed up because lots of 
users pass constant scalars as name/value arguments to the script. 

Let me know if you need help to resolve this issue.

> JMLC/MLContext support for scalar output variables
> --
>
> Key: SYSTEMML-649
> URL: https://issues.apache.org/jira/browse/SYSTEMML-649
> Project: SystemML
>  Issue Type: Task
>  Components: APIs
>Reporter: Matthias Boehm
>Assignee: Deron Eriksson




[jira] [Commented] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-26 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259245#comment-15259245
 ] 

Matthias Boehm commented on SYSTEMML-652:
-

On which SystemML version was this encountered? I fixed a similar issue a while 
back - back then it was an issue of our leftindexing-chain vectorization 
rewrite. 

> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Mike Dusenberry
>
> I've found a bug in which assigning the result of a DML function to a portion 
> of a matrix with left-indexing shrinks the dimensions of the left-hand matrix. 
> This bug was encountered while working on the deep learning DML library, and 
> the following simplified example reproduces it.
> Consider the following code:
> {code}
> N = 3
> M = 5
> forward = function(matrix[double] X) return (matrix[double] out) {
>   out = 1 / (1 + exp(-X))
> }
> X = rand(rows=N, cols=M)
> X[,1:2] = forward(X[,1:2])
> print("X1: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> X = rand(rows=N, cols=M)
> temp = forward(X[,1:2])
> X[,1:2] = temp
> print("X2: " + nrow(X) + "x" + ncol(X))
> if(1==1){}
> print("")
> {code}
> Notice that {{X}} should be a {{3x5}} matrix in both cases, since the two cases 
> are equivalent. However, in the first case, {{X}} is truncated to a {{3x2}} 
> matrix:
> {code}
> X1: 3x2
> X2: 3x5
> {code}
> Note: The {{if(1==1){}}} statements are included because otherwise the print 
> statements are executed out of order.




[jira] [Comment Edited] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-26 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259245#comment-15259245
 ] 

Matthias Boehm edited comment on SYSTEMML-652 at 4/27/16 12:17 AM:
---

On which SystemML version was this encountered? I fixed a similar issue a while 
back - back then it was an issue of our leftindexing-chain vectorization 
rewrite. Anyway, I'll have a look.


was (Author: mboehm7):
On which SystemML version was this encountered? I fixed a similar issue a while 
back - back then it was an issue of our leftindexing-chain vectorization 
rewrite. 

> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Mike Dusenberry




[jira] [Commented] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-26 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259550#comment-15259550
 ] 

Matthias Boehm commented on SYSTEMML-652:
-

Thanks for catching this, [~mwdus...@us.ibm.com]. The issue is that we don't 
support function calls in expressions (here, a left-indexing expression), so the 
workaround is to assign the function output to a temporary variable. Normally, 
we throw a controlled language exception. In this case, however, the function 
got inlined, which replaced the indexed identifier with a normal data identifier; 
this resulted in a plain assignment to X and thus changed the dimensions of X. 

I now have a patch that prevents inlining in these unsupported situations, so that 
error handling is consistent with non-inlined functions. 

Fundamentally, we need to reconsider functions in expressions (it has been a to-do 
for a long time now). The only problem is that function outputs are often unknown 
in size, so we really want to split the DAG after the function call anyway to 
create a recompilation hook. However, we should do this automatically (similar 
to our rewrites for ctable and removeEmpty) rather than relying on the user to 
explicitly bind outputs to temporary variables (which implicitly cuts the DAG after the 
function call). 
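The difference between the intended left-indexing semantics and what the faulty inlining effectively produced can be modeled in a few lines of Python. This is a sketch that uses nested lists instead of SystemML matrices, and the helper names are hypothetical. The workaround binds the function result to `temp` first, so `X[,1:2] = temp` only overwrites two columns. The inlined version behaved like the plain assignment `X = temp`.

```python
import math

def forward(m):
    """Element-wise sigmoid, like the DML forward function in the issue."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in m]

def cols(m, lo, hi):
    """Hypothetical model of right indexing X[,lo:hi] (1-based, inclusive)."""
    return [row[lo - 1:hi] for row in m]

def left_index(m, lo, hi, value):
    """Hypothetical model of X[,lo:hi] = value; returns a copy with unchanged dims."""
    out = [row[:] for row in m]
    for r, vrow in zip(out, value):
        r[lo - 1:hi] = vrow
    return out

N, M = 3, 5
X = [[0.0] * M for _ in range(N)]

temp = forward(cols(X, 1, 2))        # workaround: bind the function output first
X_ok = left_index(X, 1, 2, temp)     # intended semantics: still 3x5
X_bug = temp                         # faulty inlining behaved like X = temp: 3x2

print(len(X_ok), "x", len(X_ok[0]))    # 3 x 5
print(len(X_bug), "x", len(X_bug[0]))  # 3 x 2
```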

> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Mike Dusenberry




[jira] [Resolved] (SYSTEMML-652) Left-Indexing With Result of DML Function Changes Matrix Size

2016-04-27 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-652.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Left-Indexing With Result of DML Function Changes Matrix Size
> -
>
> Key: SYSTEMML-652
> URL: https://issues.apache.org/jira/browse/SYSTEMML-652
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Affects Versions: SystemML 0.9, SystemML 0.10
>Reporter: Mike Dusenberry
> Fix For: SystemML 0.10




[jira] [Created] (SYSTEMML-653) Performance features buffer pool

2016-04-28 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-653:
---

 Summary: Performance features buffer pool 
 Key: SYSTEMML-653
 URL: https://issues.apache.org/jira/browse/SYSTEMML-653
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm







[jira] [Updated] (SYSTEMML-641) Performance features core block matrix multiply

2016-04-28 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-641:

Description: 
1) Cache-conscious dense-dense with large skinny rhs (> L3 cache)
2) Scheduling improvements for multi-threaded operations with short lhs
3) Column-wise parallelization with wide rhs
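Item 1 refers to cache-conscious (blocked) execution. As a rough illustration only, here is a tiled dense-dense multiply in pure Python. It shows the loop structure rather than any real cache effect, and the block size `bs` is an arbitrary assumption. The loops iterate over tiles of the shared dimension and of the rhs columns, so each rhs tile is reused across all lhs rows before moving on.

```python
def matmul_blocked(a, b, bs=2):
    """Dense C = A %*% B, tiled over the shared dimension k and the columns j of B."""
    m, kdim, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for k0 in range(0, kdim, bs):          # block of rows of B (shared dimension)
        for j0 in range(0, n, bs):         # block of columns of B
            for i in range(m):             # stream all rows of A against this B tile
                ai, ci = a[i], c[i]
                for k in range(k0, min(k0 + bs, kdim)):
                    aik = ai[k]
                    if aik == 0.0:         # skip zeros in the lhs
                        continue
                    bk = b[k]
                    for j in range(j0, min(j0 + bs, n)):
                        ci[j] += aik * bk[j]
    return c


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(matmul_blocked(A, B))  # [[58.0, 64.0], [139.0, 154.0]]
```

In a real kernel, `bs` would be chosen so that a tile of B fits into the targeted cache level.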

> Performance features core block matrix multiply 
> 
>
> Key: SYSTEMML-641
> URL: https://issues.apache.org/jira/browse/SYSTEMML-641
> Project: SystemML
>  Issue Type: Task
>  Components: Runtime
>Reporter: Matthias Boehm




[jira] [Resolved] (SYSTEMML-641) Performance features core block matrix multiply

2016-04-28 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-641.
-
Resolution: Done

> Performance features core block matrix multiply 
> 
>
> Key: SYSTEMML-641
> URL: https://issues.apache.org/jira/browse/SYSTEMML-641
> Project: SystemML
>  Issue Type: Task
>  Components: Runtime
>Reporter: Matthias Boehm




[jira] [Updated] (SYSTEMML-653) Performance features buffer pool

2016-04-28 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-653:

Description: 
1) Asynchronous cleanup of evicted files
2) Others (async file eviction, variable buffer pool size, etc)
Component/s: Runtime
 Issue Type: Task  (was: Bug)
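Item 1 could, for example, follow a producer/consumer pattern. The eviction path only enqueues the path of a file that is no longer needed, and a background thread performs the slow deletion. The following Python sketch is hypothetical: `AsyncFileCleaner` is not part of SystemML, and the sketch only illustrates the pattern.

```python
import os
import queue
import tempfile
import threading

class AsyncFileCleaner:
    """Hypothetical helper: deletes files on a background thread so callers never block."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def delete_async(self, path):
        """Called from the eviction path; returns immediately."""
        self._queue.put(path)

    def close(self):
        """Drain all pending deletions (FIFO), then stop the worker thread."""
        self._queue.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            path = self._queue.get()
            if path is self._STOP:
                break
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # already gone; nothing to clean up


# usage: create three scratch files, hand them to the cleaner, then drain it
paths = []
for _ in range(3):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    paths.append(path)

cleaner = AsyncFileCleaner()
for path in paths:
    cleaner.delete_async(path)
cleaner.close()
print(all(not os.path.exists(p) for p in paths))  # True
```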

> Performance features buffer pool 
> -
>
> Key: SYSTEMML-653
> URL: https://issues.apache.org/jira/browse/SYSTEMML-653
> Project: SystemML
>  Issue Type: Task
>  Components: Runtime
>Reporter: Matthias Boehm




[jira] [Assigned] (SYSTEMML-400) Multi-threaded reorg operations

2016-04-29 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-400:
---

Assignee: Matthias Boehm

> Multi-threaded reorg operations
> ---
>
> Key: SYSTEMML-400
> URL: https://issues.apache.org/jira/browse/SYSTEMML-400
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>





[jira] [Updated] (SYSTEMML-400) Multi-threaded reorg operations

2016-04-29 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-400:

Description: 1) Transpose operations (initially dense-dense and sparse-dense) 
w/ parallelization over rows/cols according to the input matrix shape. Despite a 
significant serial fraction (for result allocation, which might be addressed by 
another change), an initial prototype already showed 2-3x improvements for 
sufficiently large matrices.
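The partitioning idea can be sketched in Python with a thread pool. This is a hypothetical illustration: CPython's global interpreter lock means it will not actually run faster, and splitting along the longer dimension is our assumption of what "according to input matrix shape" means. The result allocation stays serial, while the tasks write disjoint output cells.

```python
from concurrent.futures import ThreadPoolExecutor

def transpose_mt(a, k=4):
    """Hypothetical dense transpose with k tasks over disjoint row or column ranges."""
    m, n = len(a), len(a[0])
    out = [[0.0] * m for _ in range(n)]  # serial fraction: result allocation
    by_rows = m >= n                     # assumption: split the longer dimension
    length = m if by_rows else n
    blk = -(-length // k)                # ceil(length / k) rows/cols per task

    def task(lo, hi):
        # each task writes a disjoint set of output cells, so no locking is needed
        if by_rows:
            for i in range(lo, hi):
                row = a[i]
                for j in range(n):
                    out[j][i] = row[j]
        else:
            for j in range(lo, hi):
                col = out[j]
                for i in range(m):
                    col[i] = a[i][j]

    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(task, lo, min(lo + blk, length))
                   for lo in range(0, length, blk)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out


print(transpose_mt([[1, 2, 3], [4, 5, 6]]))         # wide input: split over columns
print(transpose_mt([[1, 2], [3, 4], [5, 6]], k=2))  # tall input: split over rows
```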

> Multi-threaded reorg operations
> ---
>
> Key: SYSTEMML-400
> URL: https://issues.apache.org/jira/browse/SYSTEMML-400
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm




[jira] [Commented] (SYSTEMML-656) Read-in boolean variable math treated as doubles rather than booleans

2016-05-07 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275446#comment-15275446
 ] 

Matthias Boehm commented on SYSTEMML-656:
-

It's the other way around: TRUE+TRUE == 2.0 is correct (to maintain 
consistency with R); we need to fix the language-level handling of TRUE+TRUE to 
use proper arithmetic.  
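For reference, R evaluates `TRUE + TRUE` to `2` by coercing logicals to numbers before arithmetic. Below is a minimal Python sketch of that rule. The function names are hypothetical. Python happens to apply the same coercion natively, because `bool` is a subclass of `int`.

```python
def to_double(v):
    """Hypothetical coercion of a boolean/int/float scalar: TRUE -> 1.0, FALSE -> 0.0."""
    return float(v)

def scalar_plus(x, y):
    """Hypothetical R-consistent '+' on scalars: booleans are added as numbers, never OR-ed."""
    return to_double(x) + to_double(y)


print(scalar_plus(True, True))   # 2.0, matching the read()-based result
print(scalar_plus(True, False))  # 1.0
print(True + True)               # 2: Python's native bool arithmetic agrees
```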

> Read-in boolean variable math treated as doubles rather than booleans
> -
>
> Key: SYSTEMML-656
> URL: https://issues.apache.org/jira/browse/SYSTEMML-656
> Project: SystemML
>  Issue Type: Bug
>Reporter: Deron Eriksson
>
> If we have two boolean variables and add them normally, the addition is 
> treated as boolean algebra (true + true = true).
> {code}
> x = TRUE;
> y = TRUE;
> z = x + y;
> print(x);
> print(y);
> print(z);
> {code}
> produces
> {code}
> TRUE
> TRUE
> TRUE
> {code}
> However, if we read in boolean scalars using the read statement and add the 
> boolean variables, the math ends up giving a double result instead of a 
> boolean result:
> {code}
> x = read("./tmp/sc1", data_type="scalar", value_type="boolean");
> y = read("./tmp/sc2", data_type="scalar", value_type="boolean");
> z = x + y;
> print(x);
> print(y);
> print(z);
> {code}
> This produces (where ./tmp/sc1 contains "TRUE" and ./tmp/sc2 contains "TRUE"):
> {code}
> TRUE
> TRUE
> 2.0
> {code}




[jira] [Assigned] (SYSTEMML-656) Read-in boolean variable math treated as doubles rather than booleans

2016-05-08 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-656:
---

Assignee: Matthias Boehm

> Read-in boolean variable math treated as doubles rather than booleans
> -
>
> Key: SYSTEMML-656
> URL: https://issues.apache.org/jira/browse/SYSTEMML-656
> Project: SystemML
>  Issue Type: Bug
>Reporter: Deron Eriksson
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10




[jira] [Resolved] (SYSTEMML-656) Read-in boolean variable math treated as doubles rather than booleans

2016-05-08 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-656.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Read-in boolean variable math treated as doubles rather than booleans
> -
>
> Key: SYSTEMML-656
> URL: https://issues.apache.org/jira/browse/SYSTEMML-656
> Project: SystemML
>  Issue Type: Bug
>Reporter: Deron Eriksson
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10




[jira] [Commented] (SYSTEMML-663) Remove redundancies in standalone tar.gz and zip artifacts

2016-05-09 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276747#comment-15276747
 ] 

Matthias Boehm commented on SYSTEMML-663:
-

removing these jars from the standalone assembly sounds good to me.

> Remove redundancies in standalone tar.gz and zip artifacts
> --
>
> Key: SYSTEMML-663
> URL: https://issues.apache.org/jira/browse/SYSTEMML-663
> Project: SystemML
>  Issue Type: Task
>  Components: Build
>Reporter: Deron Eriksson
>
> The standalone tar.gz and zip artifacts (such as 
> systemml-0.10.0-incubating-SNAPSHOT-standalone.tar.gz and 
> systemml-0.10.0-incubating-SNAPSHOT-standalone.zip) contain redundancies in 
> their included projects.
> In the lib directory, we have the following redundant jars:
> 1) antlr4-annotations-4.3.jar
> 2) antlr4-runtime-4.3.jar
> 3) wink-json4j-1.4.jar
> These dependencies are also contained (as class files) in the SystemML jar 
> file within the standalone tar.gz and zip files (such as the 
> lib/systemml-0.10.0-incubating-SNAPSHOT.jar file) because they are set to 
> "compile" scope.
> So, either these jars should be removed from the assembly for these 
> standalone artifacts (src/assembly/standalone.xml), or in pom.xml these 
> dependencies should be set to "provided" scope. Also, the LICENSE/NOTICE 
> files (in src/assembly/standalone) should be updated appropriately.
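One way to confirm this kind of redundancy, before changing the assembly or the dependency scope, is to compare the `.class` entries of each jar in `lib` with those of the SystemML jar. Below is a small Python sketch using the standard `zipfile` module. The function names are illustrative, and the script is not part of the build.

```python
import zipfile

def class_entries(jar_path):
    """Set of .class entries contained in a jar (jars are zip files)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}

def redundancy(dep_jar, fat_jar):
    """Fraction of dep_jar's classes that are also packaged inside fat_jar."""
    dep = class_entries(dep_jar)
    if not dep:
        return 0.0
    return len(dep & class_entries(fat_jar)) / len(dep)

# e.g. redundancy("lib/antlr4-runtime-4.3.jar",
#                 "lib/systemml-0.10.0-incubating-SNAPSHOT.jar")
# a value of 1.0 means every class of the dependency is already in the SystemML jar
```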




[jira] [Resolved] (SYSTEMML-640) Parfor sample script fails w/ dimension mismatch

2016-05-09 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-640.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Parfor sample script fails w/ dimension mismatch
> 
>
> Key: SYSTEMML-640
> URL: https://issues.apache.org/jira/browse/SYSTEMML-640
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler, ParFor
>Affects Versions: SystemML 0.9
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
>
> The parfor util script sample.dml fails with a dimension mismatch in special 
> cases where the remote memory budget of map/reduce tasks is larger than the 
> driver memory budget, so that the permutation matrix multiplication would be 
> compiled to MR in local parfor but to CP in remote parfor execution. 
> In these cases, we trigger a forced recompile to CP, which internally tries to 
> reduce the overhead by recompiling only DAGs whose runtime plan contains 
> MR instructions. This selective recompilation is invalid for permutation 
> matrix multiplications that span two subsequent DAGs, where the first DAG 
> does not necessarily contain MR instructions. 
> Since the overhead of recompiling average DAGs (50-100 operators) has meanwhile 
> dropped below 1ms, we should always recompile the entire parfor body program 
> in these cases. 
> As a related note: since we now support removeEmpty with selection vectors, 
> we should rewrite these permutation matrix multiplications to removeEmpty w/ 
> selection, which is equivalent from a runtime perspective but would simplify 
> debugging compared to the current multi-DAG rewrite. 
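The equivalence in the related note can be checked on a toy example with a pure-Python sketch. The helper names are hypothetical, and `remove_empty_select` models our understanding of removeEmpty(target=X, margin="rows", select=v). For a 0/1 selection vector v, multiplying by the selection matrix built from removeEmpty(diag(v)) gives the same result as directly keeping the rows of X selected by v.

```python
def remove_empty_select(x, v):
    """Hypothetical model: keep rows i of x with v[i] != 0 (removeEmpty w/ selection)."""
    return [row[:] for row, s in zip(x, v) if s != 0]

def permutation_matmul(x, v):
    """Hypothetical model: build P = removeEmpty(diag(v)) explicitly, compute P %*% x."""
    n = len(v)
    p = [[float(v[i]) if i == j else 0.0 for j in range(n)]
         for i in range(n) if v[i] != 0]
    return [[sum(p_row[k] * x[k][j] for k in range(n)) for j in range(len(x[0]))]
            for p_row in p]


X = [[1, 2], [3, 4], [5, 6], [7, 8]]
v = [0, 1, 1, 0]
print(remove_empty_select(X, v) == permutation_matmul(X, v))  # True
```

The selection-based form avoids materializing P, which is one reason it is simpler to debug than a rewrite that spans two DAGs.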




[jira] [Assigned] (SYSTEMML-640) Parfor sample script fails w/ dimension mismatch

2016-05-09 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-640:
---

Assignee: Matthias Boehm

> Parfor sample script fails w/ dimension mismatch
> 
>
> Key: SYSTEMML-640
> URL: https://issues.apache.org/jira/browse/SYSTEMML-640
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler, ParFor
>Affects Versions: SystemML 0.9
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10




[jira] [Resolved] (SYSTEMML-608) StepLinregDS algorithm output file issues

2016-05-09 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-608.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> StepLinregDS algorithm output file issues
> -
>
> Key: SYSTEMML-608
> URL: https://issues.apache.org/jira/browse/SYSTEMML-608
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms
>Reporter: Matthias Boehm
> Fix For: SystemML 0.10
>
>
> The StepLinregDS has a couple of script level issues: (1) currently we write 
> the model (if requested) for each individual call of linear regression (which 
> is unnecessary and potentially leads to concurrent write conflicts), and (2) 
> the output format parameter is not properly configured.




[jira] [Resolved] (SYSTEMML-644) l2svm hang whan handle small data

2016-05-09 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-644.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> l2svm hang whan handle small data
> -
>
> Key: SYSTEMML-644
> URL: https://issues.apache.org/jira/browse/SYSTEMML-644
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms
>Affects Versions: SystemML 0.9, SystemML 0.10
> Environment: spark, hadoop, standalone.
>Reporter: Tommy Yu
> Fix For: SystemML 0.10




[jira] [Created] (SYSTEMML-675) Negative increment support on for/parfor loops

2016-05-09 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-675:
---

 Summary: Negative increment support on for/parfor loops
 Key: SYSTEMML-675
 URL: https://issues.apache.org/jira/browse/SYSTEMML-675
 Project: SystemML
  Issue Type: Bug
  Components: Compiler, Runtime
Reporter: Matthias Boehm


Currently, for and parfor loops do not support negative increments. However, 
unspecified negative increments (e.g., 7:1) are treated as positive increments, 
which results in surprising behavior (unexpected because it is not consistent with seq).
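
To make the intended behavior concrete, here is a minimal Python sketch of a loop range that stays consistent with seq. It is not SystemML code, and `seq_values` is a hypothetical name. Under the stated assumption, an unspecified increment follows the direction of the range, so 7:1 counts down. An explicit increment with the wrong sign is reported as an error rather than silently accepted. The element count uses the common floor((to - from) / incr) + 1 formula, which is also an assumption here.

```python
import math


def seq_values(frm, to, incr=None):
    """Hypothetical sketch of a seq-consistent loop range (not SystemML code)."""
    if incr is None:
        # Assumption: an unspecified increment follows the direction of the range,
        # so 7:1 counts down instead of being treated as a positive increment.
        incr = 1 if frm <= to else -1
    if incr == 0:
        raise ValueError("increment must be non-zero")
    if (to - frm) * incr < 0:
        # The increment points away from 'to': report an error instead of
        # silently producing a surprising iteration space.
        raise ValueError(f"wrong sign for increment {incr} in seq({frm}, {to})")
    # Number of elements; equals 1 whenever frm == to, for either increment sign.
    n = math.floor((to - frm) / incr) + 1
    return [frm + i * incr for i in range(n)]


print(seq_values(7, 1))      # [7, 6, 5, 4, 3, 2, 1]
print(seq_values(1, 10, 2))  # [1, 3, 5, 7, 9]
print(seq_values(1, 1, -1))  # [1]
```

The sketch separates two concerns. The wrong-sign check corresponds to improved error handling, and the inferred default direction corresponds to consolidating loop ranges with seq.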




[jira] [Updated] (SYSTEMML-675) Negative increment support on for/parfor loops

2016-05-09 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-675:

Description: Currently, for and parfor loops do not support negative 
increments. However, unspecified negative increments (e.g., 7:1) are treated as 
positive increments, which results in surprising behavior (unexpected because 
it is not consistent with seq). This task covers two improvements: In a first step, we 
need to improve error handling of for and parfor loops. In a second step, we 
should also add support for negative increments by consolidating the 
functionality with the seq builtin function.   (was: Currently, for and parfor 
loops do not support negative increments. However, unspecified negative 
increments (e.g., 7:1) are treated as positive increments, which results in 
surprising behavior (unexpected because it is not consistent with seq).)

> Negative increment support on for/parfor loops
> --
>
> Key: SYSTEMML-675
> URL: https://issues.apache.org/jira/browse/SYSTEMML-675
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>
> Currently, for and parfor loops do not support negative increments. However, 
> unspecified negative increments (e.g., 7:1) are treated as positive 
> increments, which results in surprising behavior (unexpected because it is not 
> consistent with seq). This task covers two improvements: In a first step, we need 
> to improve error handling of for and parfor loops. In a second step, we 
> should also add support for negative increments by consolidating the 
> functionality with the seq builtin function. 




[jira] [Created] (SYSTEMML-677) Random data generator for decision tree fails w/ data type mismatch

2016-05-09 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-677:
---

 Summary: Random data generator for decision tree fails w/ data 
type mismatch 
 Key: SYSTEMML-677
 URL: https://issues.apache.org/jira/browse/SYSTEMML-677
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm
 Fix For: SystemML 0.9


The data generator for decision tree is composed of a shell script that calls 
two dml scripts in order to apply the file-based transform (which requires an 
existing file during compilation) in the second script. However, there is a 
data type mismatch as the first script outputs a matrix and the second script 
expects a frame.

This task covers (1) a script level change to output a frame from the first 
script, and (2) a fix for writing the frame meta data file with a value type 
accepted by the subsequent transform. 

Note that the script level change already exploits matrix-frame casting which 
has been introduced as part of SYSTEMML-554 but this builtin function is as of 
today only in CP. This means the data generator only works for small data that 
fits into the driver memory. Once the Spark/MR converters from SYSTEMML- are 
fully integrated, the script will run for large data too without further 
script changes.




[jira] [Updated] (SYSTEMML-677) Random data generator for decision tree fails w/ data type mismatch

2016-05-09 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-677:

Description: 
The data generator for decision tree is composed of a shell script that calls 
two dml scripts in order to apply the file-based transform (which requires an 
existing file during compilation) in the second script. However, there is a 
data type mismatch as the first script outputs a matrix and the second script 
expects a frame.

This task covers (1) a script level change to output a frame from the first 
script, and (2) a fix for writing the frame meta data file with a value type 
accepted by the subsequent transform. 

Note that the script level change already exploits matrix-frame casting which 
has been introduced as part of SYSTEMML-554 but this builtin function is as of 
today only in CP. This means the data generator only works for small data that 
fits into the driver memory. Once the Spark/MR converters from SYSTEMML-560 are 
fully integrated, the script will run for large data too without further 
script changes.

  was:
The data generator for decision tree is composed of a shell script that calls 
two dml scripts in order to apply the file-based transform (which requires an 
existing file during compilation) in the second script. However, there is a 
data type mismatch as the first script outputs a matrix and the second script 
expects a frame.

This task covers (1) a script level change to output a frame from the first 
script, and (2) a fix for writing the frame meta data file with a value type 
accepted by the subsequent transform. 

Note that the script level change already exploits matrix-frame casting which 
has been introduced as part of SYSTEMML-554 but this builtin function is as of 
today only in CP. This means the data generator only works for small data that 
fits into the driver memory. Once the Spark/MR converters from SYSTEMML- are 
fully integrated, the script will run for large data too without further 
script changes.


> Random data generator for decision tree fails w/ data type mismatch 
> 
>
> Key: SYSTEMML-677
> URL: https://issues.apache.org/jira/browse/SYSTEMML-677
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
> Fix For: SystemML 0.9
>
>
> The data generator for decision tree is composed of a shell script that calls 
> two dml scripts in order to apply the file-based transform (which requires an 
> existing file during compilation) in the second script. However, there is a 
> data type mismatch as the first script outputs a matrix and the second script 
> expects a frame.
> This task covers (1) a script level change to output a frame from the first 
> script, and (2) a fix for writing the frame meta data file with a value type 
> accepted by the subsequent transform. 
> Note that the script level change already exploits matrix-frame casting which 
> has been introduced as part of SYSTEMML-554 but this builtin function is as 
> of today only in CP. This means the data generator only works for small data 
> that fits into the driver memory. Once the Spark/MR converters from 
> SYSTEMML-560 are fully integrated, the script will run for large data too 
> without further script changes.




[jira] [Assigned] (SYSTEMML-677) Random data generator for decision tree fails w/ data type mismatch

2016-05-09 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-677:
---

Assignee: Matthias Boehm

> Random data generator for decision tree fails w/ data type mismatch 
> 
>
> Key: SYSTEMML-677
> URL: https://issues.apache.org/jira/browse/SYSTEMML-677
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.9
>
>
> The data generator for decision tree is composed of a shell script that calls 
> two dml scripts in order to apply the file-based transform (which requires an 
> existing file during compilation) in the second script. However, there is a 
> data type mismatch as the first script outputs a matrix and the second script 
> expects a frame.
> This task covers (1) a script level change to output a frame from the first 
> script, and (2) a fix for writing the frame meta data file with a value type 
> accepted by the subsequent transform. 
> Note that the script level change already exploits matrix-frame casting which 
> has been introduced as part of SYSTEMML-554 but this builtin function is as 
> of today only in CP. This means the data generator only works for small data 
> that fits into the driver memory. Once the Spark/MR converters from 
> SYSTEMML-560 are fully integrated, the script will run for large data too 
> without further script changes.




[jira] [Updated] (SYSTEMML-677) Random data generator for decision tree fails w/ data type mismatch

2016-05-09 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-677:

Affects Version/s: SystemML 0.9
Fix Version/s: (was: SystemML 0.9)

> Random data generator for decision tree fails w/ data type mismatch 
> 
>
> Key: SYSTEMML-677
> URL: https://issues.apache.org/jira/browse/SYSTEMML-677
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.9
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>
> The data generator for decision tree is composed of a shell script that calls 
> two dml scripts in order to apply the file-based transform (which requires an 
> existing file during compilation) in the second script. However, there is a 
> data type mismatch as the first script outputs a matrix and the second script 
> expects a frame.
> This task covers (1) a script level change to output a frame from the first 
> script, and (2) a fix for writing the frame meta data file with a value type 
> accepted by the subsequent transform. 
> Note that the script level change already exploits matrix-frame casting which 
> has been introduced as part of SYSTEMML-554 but this builtin function is as 
> of today only in CP. This means the data generator only works for small data 
> that fits into the driver memory. Once the Spark/MR converters from 
> SYSTEMML-560 are fully integrated, the script will run for large data too 
> without further script changes.




[jira] [Updated] (SYSTEMML-677) Random data generator for decision tree fails w/ data type mismatch

2016-05-09 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-677:

Description: 
The data generator for decision tree is composed of a shell script that calls 
two dml scripts in order to apply the file-based transform (which requires an 
existing file during compilation) in the second script. However, there is a 
data type mismatch as the first script outputs a matrix and the second script 
expects a frame.

This task covers (1) a script level change to output a frame from the first 
script, and (2) a fix for writing the frame meta data file with a value type 
accepted by the subsequent transform. 

Note that the script level change already exploits matrix-frame casting which 
has been introduced as part of SYSTEMML-554 but this builtin function is as of 
today only supported in CP. This means the data generator only works for small 
data that fits into the driver memory. Once the Spark/MR converters from 
SYSTEMML-560 are fully integrated, the script will run for large data too 
without further script changes.

  was:
The data generator for decision tree is composed of a shell script that calls 
two dml scripts in order to apply the file-based transform (which requires an 
existing file during compilation) in the second script. However, there is a 
data type mismatch as the first script outputs a matrix and the second script 
expects a frame.

This task covers (1) a script level change to output a frame from the first 
script, and (2) a fix for writing the frame meta data file with a value type 
accepted by the subsequent transform. 

Note that the script level change already exploits matrix-frame casting which 
has been introduced as part of SYSTEMML-554 but this builtin function is as of 
today only in CP. This means the data generator only works for small data that 
fits into the driver memory. Once the Spark/MR converters from SYSTEMML-560 are 
fully integrated, the script will run for large data too without further 
script changes.


> Random data generator for decision tree fails w/ data type mismatch 
> 
>
> Key: SYSTEMML-677
> URL: https://issues.apache.org/jira/browse/SYSTEMML-677
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.9
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>
> The data generator for decision tree is composed of a shell script that calls 
> two dml scripts in order to apply the file-based transform (which requires an 
> existing file during compilation) in the second script. However, there is a 
> data type mismatch as the first script outputs a matrix and the second script 
> expects a frame.
> This task covers (1) a script level change to output a frame from the first 
> script, and (2) a fix for writing the frame meta data file with a value type 
> accepted by the subsequent transform. 
> Note that the script level change already exploits matrix-frame casting which 
> has been introduced as part of SYSTEMML-554 but this builtin function is as 
> of today only supported in CP. This means the data generator only works for 
> small data that fits into the driver memory. Once the Spark/MR converters 
> from SYSTEMML-560 are fully integrated, the script will run for large data 
> too without further script changes.




[jira] [Commented] (SYSTEMML-677) Random data generator for decision tree fails w/ data type mismatch

2016-05-09 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277496#comment-15277496
 ] 

Matthias Boehm commented on SYSTEMML-677:
-

cc [~acs_s]

> Random data generator for decision tree fails w/ data type mismatch 
> 
>
> Key: SYSTEMML-677
> URL: https://issues.apache.org/jira/browse/SYSTEMML-677
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.9
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>
> The data generator for decision tree is composed of a shell script that calls 
> two dml scripts in order to apply the file-based transform (which requires an 
> existing file during compilation) in the second script. However, there is a 
> data type mismatch as the first script outputs a matrix and the second script 
> expects a frame.
> This task covers (1) a script level change to output a frame from the first 
> script, and (2) a fix for writing the frame meta data file with a value type 
> accepted by the subsequent transform. 
> Note that the script level change already exploits matrix-frame casting which 
> has been introduced as part of SYSTEMML-554 but this builtin function is as 
> of today only supported in CP. This means the data generator only works for 
> small data that fits into the driver memory. Once the Spark/MR converters 
> from SYSTEMML-560 are fully integrated, the script will run for large data 
> too without further script changes.




[jira] [Resolved] (SYSTEMML-677) Random data generator for decision tree fails w/ data type mismatch

2016-05-10 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-677.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Random data generator for decision tree fails w/ data type mismatch 
> 
>
> Key: SYSTEMML-677
> URL: https://issues.apache.org/jira/browse/SYSTEMML-677
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.9
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
>
> The data generator for decision tree is composed of a shell script that calls 
> two dml scripts in order to apply the file-based transform (which requires an 
> existing file during compilation) in the second script. However, there is a 
> data type mismatch as the first script outputs a matrix and the second script 
> expects a frame.
> This task covers (1) a script level change to output a frame from the first 
> script, and (2) a fix for writing the frame meta data file with a value type 
> accepted by the subsequent transform. 
> Note that the script level change already exploits matrix-frame casting which 
> has been introduced as part of SYSTEMML-554 but this builtin function is as 
> of today only supported in CP. This means the data generator only works for 
> small data that fits into the driver memory. Once the Spark/MR converters 
> from SYSTEMML-560 are fully integrated, the script will run for large data 
> too without further script changes.




[jira] [Resolved] (SYSTEMML-675) Negative increment support on for/parfor loops

2016-05-10 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-675.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Negative increment support on for/parfor loops
> --
>
> Key: SYSTEMML-675
> URL: https://issues.apache.org/jira/browse/SYSTEMML-675
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
> Fix For: SystemML 0.10
>
>
> Currently, for and parfor loops do not support negative increments. However, 
> unspecified negative increments (e.g., 7:1) are treated as positive 
> increments, which results in surprising behavior (unexpected because it is not 
> consistent with seq). This task covers two improvements: In a first step, we need 
> to improve error handling of for and parfor loops. In a second step, we 
> should also add support for negative increments by consolidating the 
> functionality with the seq builtin function. 




[jira] [Created] (SYSTEMML-681) Reverse single-element sequence fails w/ compiler/runtime issues

2016-05-10 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-681:
---

 Summary: Reverse single-element sequence fails w/ compiler/runtime 
issues
 Key: SYSTEMML-681
 URL: https://issues.apache.org/jira/browse/SYSTEMML-681
 Project: SystemML
  Issue Type: Bug
  Components: Compiler, Runtime
Reporter: Matthias Boehm


We support sequences with both positive and negative increments. However, the 
special case of a single-element sequence only works with a positive increment 
(e.g., seq(1,1,1)) but fails with a negative increment (e.g., seq(1,1,-1)), 
although both should return equivalent results.
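
Under the usual element-count formula for sequences, both calls describe exactly one element. This formula is an assumption here and is not taken from the SystemML sources. The numerator is zero, so the sign of the increment does not matter:

```latex
n = \left\lfloor \frac{\mathit{to} - \mathit{from}}{\mathit{incr}} \right\rfloor + 1,
\qquad
n_{\mathrm{seq}(1,1,1)} = \left\lfloor \tfrac{0}{1} \right\rfloor + 1 = 1,
\qquad
n_{\mathrm{seq}(1,1,-1)} = \left\lfloor \tfrac{0}{-1} \right\rfloor + 1 = 1
```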




[jira] [Commented] (SYSTEMML-678) MLContext parallelization

2016-05-10 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279418#comment-15279418
 ] 

Matthias Boehm commented on SYSTEMML-678:
-

thanks for the question [~johannes.tud]. 

In general, SystemML provides, for every operation that involves matrices (with 
very few exceptions), both single-node in-memory (CP) and data-parallel 
distributed (Spark/MR) operators. If the operation (with pinned inputs/outputs) 
fits into the driver memory budget (70% of the driver heap size), we execute this 
operation in single-node CP (multi-threaded or single-threaded, depending on the 
operation); otherwise, we compile distributed operations depending on data and 
cluster characteristics. For Spark, operator selection is slightly different, as 
we also transitively pull certain operations into distributed pipelines if their 
inputs are already distributed. Task-parallel computation (via parfor, which the 
script author uses to assert independent iterations) complements these 
data-parallel operations, and the two can be arbitrarily combined (e.g., 
multi-threaded single-node execution, concurrent data-parallel jobs, distributed 
task-parallel computation). However, apart from some very specific loop 
vectorization rewrites, we do not yet automatically identify subprograms other 
than parfor loops to execute in a task-parallel manner. Extended automatic 
vectorization is certainly an interesting direction, and we welcome any contributions here. 
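
As a rough illustration of the CP-versus-distributed decision described above, the Python sketch below compares a dense size estimate of the inputs and the output against 70% of the driver heap. It is a deliberate simplification. The function names, the dense 8-bytes-per-cell estimate, and the "SPARK" label are assumptions. SystemML's actual memory estimates are more involved; for example, they consider sparsity.

```python
def dense_size_bytes(rows, cols):
    """Rough in-memory size of a dense matrix of 8-byte doubles (ignores object overhead)."""
    return rows * cols * 8


def select_exec_type(input_sizes, output_size, driver_heap_bytes, budget_ratio=0.7):
    """Pick single-node CP if the pinned inputs plus output fit into the memory budget."""
    estimate = sum(input_sizes) + output_size
    budget = budget_ratio * driver_heap_bytes
    return "CP" if estimate <= budget else "SPARK"


heap = 1024 ** 3                          # hypothetical 1 GB driver heap
small = dense_size_bytes(1_000, 1_000)    # 8 MB per matrix
large = dense_size_bytes(10_000, 10_000)  # 800 MB per matrix
print(select_exec_type([small, small], small, heap))  # CP
print(select_exec_type([large, large], large, heap))  # SPARK
```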

Now back to the actual script at hand. Even with parfor, SystemML is currently 
not able to run this loop in a task-parallel manner because there are 
loop-carried dependencies over 'sum'. By specifying the parfor parameter 
'check=0', you disable the dependency analysis; the loop then runs, but it would 
produce undefined results. There are often ways to express the computation 
slightly differently to work around current shortcomings of the compiler. Feel 
free to post the problem to our dev list: d...@systemml.incubator.apache.org. 
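
One common way to remove such a dependency is to let each iteration write only its own result and then aggregate afterwards. The Python sketch below is a generic illustration, not DML. `per_row_score` is a hypothetical stand-in for the loop body, and `ThreadPoolExecutor.map` plays the role of a parfor with independent iterations. The rewrite is only equivalent when the early-exit condition on the running total never fires. For example, with rows [[-5], [3]] the sequential version stops at -5, while the aggregated version returns -2.

```python
from concurrent.futures import ThreadPoolExecutor


def per_row_score(row):
    """Hypothetical stand-in for the loop body: it only reads its own row."""
    return sum(row)


def sequential_with_carried_sum(rows):
    """Loop-carried dependency: iteration j reads 'total' written by iteration j-1,
    and the loop condition depends on it as well."""
    total, j = 0, 0
    while j < len(rows) and total >= 0:
        total += per_row_score(rows[j])
        j += 1
    return total


def independent_then_aggregate(rows):
    """Each task writes only its own result slot, so iterations can run concurrently;
    the former carried variable becomes a final aggregation."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(per_row_score, rows))
    return sum(scores)


rows = [[1, 2], [3], [0, 4]]
print(sequential_with_carried_sum(rows))  # 10
print(independent_then_aggregate(rows))   # 10
```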

> MLContext parallelization
> -
>
> Key: SYSTEMML-678
> URL: https://issues.apache.org/jira/browse/SYSTEMML-678
> Project: SystemML
>  Issue Type: Question
>  Components: Algorithms, Parser, Runtime
>Affects Versions: SystemML 0.10
>Reporter: Johannes Wilke
>
> I am trying to execute a script in the MLContext. It executes, but it does not 
> run in parallel. For smaller scripts, it works fine, but this script does not, 
> and it is not clear why. I think it is because of the 4 loop levels, but I am 
> not sure. 
> Is there documentation on what is parallelizable and what is not?
> If I change the main while loop, which I wish to parallelize, to a parfor loop, 
> it works.
> Here is the script:
> X = read($Xin)
> P = read($Pin)
> #errorMatrix = matrix(0.0,rows=1,cols=1)
> j = 1
> sum = 0
> while (j <=nrow(X) & sum >= 0){ # this should be parallelized 
> #parfor(j in 1: nrow(X),check=0){
>   first = TRUE
>   windows = matrix(0,rows=1,cols=1)
>   offsetPreWindowDefinitions = 0
>   sumWindowLength = 0
>   mastercount = 0
>   totalwindowLength = 0
>   s = 0
>   for(i in 1: nrow(P)){
>     if((as.scalar(P[i,1])*as.scalar(P[i,2]))>totalwindowLength){
>       totalwindowLength = (as.scalar(P[i,1])*as.scalar(P[i,2]))
>     }
>     s = s+1
>   }
>   lastWindow = matrix(0,rows=sum(P[,1]),cols=1)
>
>   for(i in 1:nrow(P)){# for every Window-Definition
>
>     for(k in 1: as.integer(as.scalar(P[i,1]))){# for every pnum
>       column = matrix(0,rows=as.integer(as.scalar(P[1,4])),cols=1)
>       for(l in 1: nrow(column)+1){
>         offsetPreWindowDefinitions = totalwindowLength - (as.scalar(P[i,1])*as.scalar(P[i,2]))
>         tsindex = ((k-1) * as.scalar(P[i,2])) + l-1 + offsetPreWindowDefinitions
>         if(l==nrow(column)+1){
>           lastWindow[sumWindowLength+k,1] = X[j,tsindex+1]
>         } else {
>
>           column[l,1] = X[j,tsindex+1]
>         }
>         mastercount = mastercount +1
>         #print(mastercount)
>       }
>       if(first){
>         first = FALSE;
>         windows = column
>       } else {
>         windows = cbind(windows,column)
>       }
>     }
>
>     sumWindowLength = sumWindowLength + as.scalar(P[i,1])
>   }
>
>
>   result = matrix(14.3,rows=as.integer(as.scalar(P[1,4])),cols=1)
>   for(i in 
> totalwindowLength:as.integer(as.scalar(P[1,4]))+totalwin

[jira] [Commented] (SYSTEMML-680) eigen() fails with "Unsupported function EIGEN" but diag() works

2016-05-10 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279432#comment-15279432
 ] 

Matthias Boehm commented on SYSTEMML-680:
-

thanks for reporting this issue [~gvele...@gmail.com] - could you please 
provide some more details such as the full stacktrace and maybe data sizes? 

In general, SystemML is supposed to work out-of-the-box on any platform. 
However, eigen is indeed a special operation (similar to qr, lu, cholesky and 
solve) which is currently only supported for single-node, in-memory operations 
because we only call out to commons-math. A common issue is that we expect 
commons-math in the classpath (which is true for hadoop 2.x), but on hadoop 1.x 
it might fail with ClassNotFoundExceptions - the workaround would be to change the 
SystemML pom.xml as described here: 
https://apache.github.io/incubator-systemml/troubleshooting-guide.html#classnotfoundexception-for-commons-math3.
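
If commons-math3 is suspected to be missing, one quick check is whether a given jar bundles its classes. The sketch below uses Python's standard-library zipfile module. The function name and the jar path in the usage comment are hypothetical. It looks for class files under the org/apache/commons/math3/ package. A negative result is not conclusive on its own, because commons-math3 may instead be provided by the Hadoop classpath at runtime, as noted above for hadoop 2.x.

```python
import zipfile

# Package prefix of Apache Commons Math 3 classes inside a jar.
COMMONS_MATH3_PREFIX = "org/apache/commons/math3/"


def bundles_commons_math3(jar_path):
    """Return True if the jar contains any class file under org/apache/commons/math3/."""
    with zipfile.ZipFile(jar_path) as jar:
        return any(
            name.startswith(COMMONS_MATH3_PREFIX) and name.endswith(".class")
            for name in jar.namelist()
        )


# Hypothetical usage: print(bundles_commons_math3("SystemML.jar"))
```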

> eigen() fails with "Unsupported function EIGEN" but diag() works
> 
>
> Key: SYSTEMML-680
> URL: https://issues.apache.org/jira/browse/SYSTEMML-680
> Project: SystemML
>  Issue Type: Bug
>  Components: APIs
>Affects Versions: SystemML 0.9
> Environment: Linux ip-172-20-42-170 3.13.0-61-generic #100-Ubuntu SMP 
> Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> spark-core_2.10-1.3.0.jar
>Reporter: Golda Velez
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Could be some simple error, since I'm new to SystemML
> I'm running a tiny DML script:
> X = read($Xin)
> ee = eigen(X)
> via some things to set up the matrix in scala, ending with
> val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, df, mc, false)
> val ml = new MLContext(sc)
> ml.reset()
> ml.registerInput("X", sysMlMatrix, numRows, numCols)
> ml.registerOutput("e")
> val nargs = Map("Xin" -> " ", "Eout" -> " ")
> val outputs = ml.execute("dum.dml", nargs)
> I could certainly be doing something wrong, but it does run if I replace 
> eigen() with diag() and both are listed similarly in the guide 
> http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html
> Is eigen() supported currently and does it require some installation of some 
> library?  I didn't see anything about that in the docs.
> Thanks, this looks super useful!
> This might just be a documentation bug, not a code bug, but I'm not sure how 
> else to contact people about it and get it resolved.  Are there forums?
> --Golda




[jira] [Comment Edited] (SYSTEMML-680) eigen() fails with "Unsupported function EIGEN" but diag() works

2016-05-10 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279432#comment-15279432
 ] 

Matthias Boehm edited comment on SYSTEMML-680 at 5/11/16 3:21 AM:
--

thanks for reporting this issue [~gvele...@gmail.com] - could you please 
provide some more details such as the full stacktrace and maybe data sizes? 

In general, SystemML is supposed to work out-of-the-box on any platform. 
However, eigen is indeed a special operation (similar to qr, lu, cholesky and 
solve) which is currently only supported for single-node, in-memory operations 
because we only call out to commons-math. A common issue is that we expect 
commons-math in the classpath (which is true for hadoop 2.x), but, for example, on 
hadoop 1.x it might fail with ClassNotFoundExceptions - the workaround would 
be to change the SystemML pom.xml as described here: 
https://apache.github.io/incubator-systemml/troubleshooting-guide.html#classnotfoundexception-for-commons-math3.


was (Author: mboehm7):
thanks for reporting this issue [~gvele...@gmail.com] - could you please 
provide some more details such as the full stacktrace and maybe data sizes? 

In general, SystemML is supposed to work out-of-the-box on any platform. 
However, eigen is indeed a special operation (similar to qr, lu, cholesky and 
solve) which is currently only supported for single-node, in-memory operations 
because we only call out to commons-math. A common issue is that we expect 
commons-math in the classpath (which is true for hadoop 2.x), but on hadoop 1.x 
it might fail with ClassNotFoundExceptions - the workaround would be to change the 
SystemML pom.xml as described here: 
https://apache.github.io/incubator-systemml/troubleshooting-guide.html#classnotfoundexception-for-commons-math3.

> eigen() fails with "Unsupported function EIGEN" but diag() works
> 
>
> Key: SYSTEMML-680
> URL: https://issues.apache.org/jira/browse/SYSTEMML-680
> Project: SystemML
>  Issue Type: Bug
>  Components: APIs
>Affects Versions: SystemML 0.9
> Environment: Linux ip-172-20-42-170 3.13.0-61-generic #100-Ubuntu SMP 
> Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> spark-core_2.10-1.3.0.jar
>Reporter: Golda Velez
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Could be some simple error, since I'm new to SystemML
> I'm running a tiny DML script:
> X = read($Xin)
> ee = eigen(X)
> via some things to set up the matrix in scala, ending with
> val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, df, mc, false)
> val ml = new MLContext(sc)
> ml.reset()
> ml.registerInput("X", sysMlMatrix, numRows, numCols)
> ml.registerOutput("e")
> val nargs = Map("Xin" -> " ", "Eout" -> " ")
> val outputs = ml.execute("dum.dml", nargs)
> I could certainly be doing something wrong, but it does run if I replace 
> eigen() with diag() and both are listed similarly in the guide 
> http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html
> Is eigen() supported currently and does it require some installation of some 
> library?  I didn't see anything about that in the docs.
> Thanks, this looks super useful!
> This might just be a documentation bug, not a code bug, but I'm not sure how 
> else to contact people about it and get it resolved.  Are there forums?
> --Golda




[jira] [Comment Edited] (SYSTEMML-680) eigen() fails with "Unsupported function EIGEN" but diag() works

2016-05-10 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279432#comment-15279432
 ] 

Matthias Boehm edited comment on SYSTEMML-680 at 5/11/16 3:24 AM:
--

thanks for reporting this issue [~gvele...@gmail.com] - could you please 
provide some more details such as the full stacktrace and maybe data sizes? 

In general, SystemML is supposed to work out-of-the-box on any platform. 
However, eigen is indeed a special operation (similar to qr, lu, cholesky and 
solve) which is currently only supported for single-node, in-memory operations 
because we only call out to commons-math. A common issue is that we expect 
commons-math in the classpath (which is true for hadoop 2.x), but, for example, on 
hadoop 1.x it might fail with ClassNotFoundExceptions - the workaround would 
be to change the SystemML pom.xml as described here: 
https://apache.github.io/incubator-systemml/troubleshooting-guide.html#classnotfoundexception-for-commons-math3.

Currently, we use our dev mailing list (d...@systemml.incubator.apache.org) for 
both user and dev questions - so please feel free to post any further questions 
there to make sure it's not overlooked. Thanks.


> eigen() fails with "Unsupported function EIGEN" but diag() works
> 
>
> Key: SYSTEMML-680
> URL: https://issues.apache.org/jira/browse/SYSTEMML-680
> Project: SystemML
>  Issue Type: Bug
>  Components: APIs
>Affects Versions: SystemML 0.9
> Environment: Linux ip-172-20-42-170 3.13.0-61-generic #100-Ubuntu SMP 
> Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> spark-core_2.10-1.3.0.jar
>Reporter: Golda Velez
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Could be some simple error, since I'm new to SystemML
> I'm running a tiny DML script:
> {code}
> X = read($Xin)
> ee = eigen(X)
> {code}
> via some things to set up the matrix in scala, ending with
> {code}
> val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, df, mc, false)
> val ml = new MLContext(sc)
> ml.reset()
> ml.registerInput("X", sysMlMatrix, numRows, numCols)
> ml.registerOutput("e")
> val nargs = Map("Xin" -> " ", "Eout" -> " ")
> val outputs = ml.execute("dum.dml", nargs)
> {code}
> I could certainly be doing something wrong, but it does run if I replace 
> eigen() with diag() and both are listed similarly in the guide 
> http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html
> Is eigen() supported currently and does it require some installation of some 
> library?  I didn't see anything about that in the docs.
> Thanks, this looks super useful!
> This might just be a documentation bug, not a code bug, but I'm not sure how 
> else to contact people about it and get it resolved.  Are there forums?
> --Golda




[jira] [Commented] (SYSTEMML-680) eigen() fails with "Unsupported function EIGEN" but diag() works

2016-05-11 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280521#comment-15280521
 ] 

Matthias Boehm commented on SYSTEMML-680:
-

No problem, but just to be clear: the statement above only applies to eigen, 
qr, lu, cholesky and solve. All other operations are automatically compiled to 
distributed operations on MR/Spark.
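To make the distinction concrete, here is a hypothetical toy model of the decision, written in Python rather than SystemML's Java. The names `CP_ONLY_OPS` and `select_exec_type`, and the single memory-budget comparison, are assumptions for illustration and not the actual compiler logic: ops backed by a single-node library call are always planned in-memory, while everything else falls back to a distributed plan when its memory estimate exceeds the budget.

```python
# Hypothetical sketch of execution-type selection; not SystemML's compiler code.

# Ops that (per the comment above) call out to a single-node library such as
# commons-math and therefore have no distributed implementation.
CP_ONLY_OPS = {"eigen", "qr", "lu", "cholesky", "solve"}


def select_exec_type(op: str, est_mem_bytes: int, mem_budget_bytes: int) -> str:
    """Return "CP" (single-node, in-memory) or "DIST" (distributed, e.g. MR/Spark)."""
    if op in CP_ONLY_OPS:
        # Always in-memory; inputs that exceed the budget would fail at runtime.
        return "CP"
    # Other ops stay in-memory if the estimate fits, otherwise go distributed.
    return "CP" if est_mem_bytes <= mem_budget_bytes else "DIST"


budget = 1 << 30  # assumed 1 GiB driver memory budget
print(select_exec_type("eigen", 8 << 30, budget))    # CP
print(select_exec_type("matmult", 8 << 30, budget))  # DIST
print(select_exec_type("matmult", 1 << 20, budget))  # CP
```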


[jira] [Created] (SYSTEMML-694) Misc performance features

2016-05-13 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-694:
---

 Summary: Misc performance features
 Key: SYSTEMML-694
 URL: https://issues.apache.org/jira/browse/SYSTEMML-694
 Project: SystemML
  Issue Type: Task
  Components: Compiler, Runtime
Reporter: Matthias Boehm
Priority: Minor







[jira] [Updated] (SYSTEMML-694) Misc performance features

2016-05-13 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-694:

Description: 
1) Improved constant folding (all unary operations other than print)
2) Multi-threaded rand: minimum parallelization thresholds
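Item 1 can be illustrated with a small constant folder over a hypothetical expression tree. This is a Python sketch, not SystemML's HOP rewrite code; the tuple encoding and the set of pure ops are assumptions. Unary operations on literal inputs are evaluated at compile time, while `print` is never folded because it has a side effect.

```python
import math

# Hypothetical expression encoding: ("lit", value) or ("unary", op_name, child).
# Pure unary ops are safe to evaluate at compile time when the input is a literal.
PURE_UNARY = {
    "abs": abs,
    "sqrt": math.sqrt,
    "exp": math.exp,
    "log": math.log,
    "uminus": lambda x: -x,
}


def fold(node):
    """Bottom-up constant folding of unary operations (print is never folded)."""
    if node[0] == "lit":
        return node
    _, op, child = node
    child = fold(child)
    if child[0] == "lit" and op in PURE_UNARY:
        return ("lit", PURE_UNARY[op](child[1]))
    # Unknown or side-effecting ops (e.g. "print") stay in the tree.
    return ("unary", op, child)


print(fold(("unary", "sqrt", ("unary", "abs", ("lit", -16.0)))))  # ('lit', 4.0)
print(fold(("unary", "print", ("unary", "uminus", ("lit", 3)))))  # ('unary', 'print', ('lit', -3))
```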

> Misc performance features
> -
>
> Key: SYSTEMML-694
> URL: https://issues.apache.org/jira/browse/SYSTEMML-694
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Priority: Minor
>
> 1) Improved constant folding (all unary operations other than print)
> 2) Multi-threaded rand: minimum parallelization thresholds




[jira] [Created] (SYSTEMML-695) Incorrect rand normal w/ fused scalar operation

2016-05-13 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-695:
---

 Summary: Incorrect rand normal w/ fused scalar operation
 Key: SYSTEMML-695
 URL: https://issues.apache.org/jira/browse/SYSTEMML-695
 Project: SystemML
  Issue Type: Bug
  Components: Compiler
Reporter: Matthias Boehm
Assignee: Matthias Boehm







[jira] [Commented] (SYSTEMML-695) Incorrect rand normal w/ fused scalar operation

2016-05-13 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283398#comment-15283398
 ] 

Matthias Boehm commented on SYSTEMML-695:
-

Thanks for catching this [~niketanpansare]. It was caused by incorrect 'fuse 
datagen' rewrites that did not take the pdf into account. The workaround is to 
disable rewrites, but I'll push the fix tomorrow.
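The failure mode can be sketched with a hypothetical Python model. It is not the actual rewrite. It assumes that the rewrite absorbs a scalar multiplication into the `min`/`max` bounds of `rand`, and that, as in DML, `pdf="normal"` draws standard normal values and ignores `min`/`max`. Under those assumptions, scaling the bounds is only valid for uniform draws. For normal draws the fused operation would silently drop the scalar, so the rewrite has to check the pdf first.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Rand:
    rows: int
    cols: int
    min: float
    max: float
    pdf: str  # "uniform" or "normal"


def fuse_rand_times_scalar(r: Rand, s: float) -> Optional[Rand]:
    """Fold `rand(...) * s` into a single rand, or return None if unsafe."""
    if r.pdf != "uniform":
        # Normal draws ignore min/max, so scaling the bounds would lose `s`.
        return None
    lo, hi = r.min * s, r.max * s
    # A negative scalar flips the interval, so re-order the bounds.
    return replace(r, min=min(lo, hi), max=max(lo, hi))


print(fuse_rand_times_scalar(Rand(2, 3, 0.0, 1.0, "uniform"), 3.0))
print(fuse_rand_times_scalar(Rand(2, 3, 0.0, 1.0, "normal"), 3.0))  # None
```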

> Incorrect rand normal w/ fused scalar operation
> ---
>
> Key: SYSTEMML-695
> URL: https://issues.apache.org/jira/browse/SYSTEMML-695
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>





[jira] [Commented] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-05-13 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283430#comment-15283430
 ] 

Matthias Boehm commented on SYSTEMML-512:
-

After a detailed look, it turned out that the 'as.integer' calls prevented the 
constant propagation of rank at parser level (into the function call). So far, 
we only propagate scalar literals into functions. However, tomorrow I will 
deliver an extended IPA (SYSTEMML-427) that also propagates scalar variables 
into functions that are called once. Thanks [~mwdus...@us.ibm.com] for making the 
case for that.
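A minimal sketch of the extension described here, in hypothetical Python rather than SystemML's IPA code. The call-site encoding and the `known_consts` map are assumptions. An argument that is a variable with a known constant value gets bound to the callee parameter, but only when the function has exactly one call site, because several call sites could pass different values.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

Value = Union[int, float]
Arg = Union[int, float, str]  # a literal, or the name of a variable at the call site


def propagate_into_functions(
    call_sites: List[Tuple[str, Dict[str, Arg]]],
    known_consts: Dict[str, Value],
) -> Dict[str, Dict[str, Value]]:
    """Return, per function, the parameters that can be treated as constants."""
    num_calls = Counter(func for func, _ in call_sites)
    result: Dict[str, Dict[str, Value]] = {}
    for func, args in call_sites:
        if num_calls[func] != 1:
            continue  # multiple call sites may bind different values; skipped here
        consts = {}
        for param, arg in args.items():
            if isinstance(arg, str):
                if arg in known_consts:  # scalar variable with a known value
                    consts[param] = known_consts[arg]
            else:  # scalar literal
                consts[param] = arg
        if consts:
            result[func] = consts
    return result


# Assumed example: $maxiter=100 and $rank=10, after as.integer(...) has been folded.
calls = [("pnmf", {"V": "V", "max_iteration": "max_iteration", "rank": "rank"})]
print(propagate_into_functions(calls, {"max_iteration": 100, "rank": 10}))
# {'pnmf': {'max_iteration': 100, 'rank': 10}}
```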

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
> Attachments: test1.scala, test2.scala
>
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java out of memory heap error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1212)
>   at org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1103)
>   at org.apache.sysml.runtime.instructions.cp.MatrixMatrixArithmeticCPInstruction.processInstruction(MatrixMatrixArithmeticCPInstruction.java:60)
>   at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>   at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>   at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:183)
>   at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
>
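The two updates in the PNMF loop above are Lee and Seung's multiplicative updates for KL-divergence (Poisson) NMF. The following pure-Python sketch mirrors each DML line on small nested lists, so the shapes and the roles of `colSums(W)` and `rowSums(H)` are explicit. It is only an illustration, not SystemML's execution. `init_range` (renamed from the script's `range` to avoid shadowing Python's builtin) and `seed` are parameters added here for reproducibility.

```python
import math
import random


def matmul(A, B):
    """Dense matrix product of nested lists (A: n x k, B: k x m)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def negloglik(V, W, H):
    """-(sum(V * log(W %*% H)) - colSums(W) %*% rowSums(H))."""
    WH = matmul(W, H)
    ll = sum(V[i][j] * math.log(WH[i][j])
             for i in range(len(V)) for j in range(len(V[0])))
    # sum(W %*% H) equals colSums(W) %*% rowSums(H), without forming W %*% H
    total = sum(sum(row[a] for row in W) * sum(H[a]) for a in range(len(H)))
    return -(ll - total)


def pnmf(V, rank, max_iteration, init_range=0.01, seed=0):
    """Multiplicative-update PNMF mirroring the DML loop; returns W, H, loss history."""
    n, m = len(V), len(V[0])
    rnd = random.Random(seed)
    W = [[rnd.uniform(0, init_range) for _ in range(rank)] for _ in range(n)]
    H = [[rnd.uniform(0, init_range) for _ in range(m)] for _ in range(rank)]
    history = [negloglik(V, W, H)]
    for _ in range(max_iteration):
        # H = (H * (t(W) %*% (V/(W%*%H)))) / t(colSums(W))
        WH = matmul(W, H)
        R = [[V[i][j] / WH[i][j] for j in range(m)] for i in range(n)]
        col_w = [sum(W[i][a] for i in range(n)) for a in range(rank)]
        H = [[H[a][j] * sum(W[i][a] * R[i][j] for i in range(n)) / col_w[a]
              for j in range(m)] for a in range(rank)]
        # W = (W * ((V/(W%*%H)) %*% t(H))) / t(rowSums(H))
        WH = matmul(W, H)
        R = [[V[i][j] / WH[i][j] for j in range(m)] for i in range(n)]
        row_h = [sum(H[a]) for a in range(rank)]
        W = [[W[i][a] * sum(R[i][j] * H[a][j] for j in range(m)) / row_h[a]
              for a in range(rank)] for i in range(n)]
        history.append(negloglik(V, W, H))
    return W, H, history


V = [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [5.0, 1.0, 2.0], [3.0, 3.0, 3.0]]
W, H, history = pnmf(V, rank=2, max_iteration=50)
print(history[0], history[-1])  # the loss should go down
```

For this objective the updates are known to be non-increasing, which the usage line lets you check on a toy matrix.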

[jira] [Updated] (SYSTEMML-427) Extended inter-procedure analysis (constant propagation)

2016-05-13 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-427:

Description: 1) Scalar propagation: So far we only propagate scalar 
literals into functions. Additionally, we should propagate also scalar 
variables (propagated as constants during IPA), which is safe if a function is 
called once.  (was: 1) Scalar propagation: So far we only propagate scalar 
literals into functions. Additionally, we should propagate also scalar 
variables (propagates as constants during IPA), which is safe if a function is 
called once.)

> Extended inter-procedure analysis (constant propagation)
> 
>
> Key: SYSTEMML-427
> URL: https://issues.apache.org/jira/browse/SYSTEMML-427
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler
>Reporter: Matthias Boehm
>
> 1) Scalar propagation: So far we only propagate scalar literals into 
> functions. Additionally, we should propagate also scalar variables 
> (propagated as constants during IPA), which is safe if a function is called 
> once.




[jira] [Updated] (SYSTEMML-427) Extended inter-procedure analysis (constant propagation)

2016-05-13 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-427:

Description: 1) Scalar propagation: So far we only propagate scalar 
literals into functions. Additionally, we should propagate also scalar 
variables (propagated as constants during IPA), which is safe if a function is 
called once.

> Extended inter-procedure analysis (constant propagation)
> 
>
> Key: SYSTEMML-427
> URL: https://issues.apache.org/jira/browse/SYSTEMML-427
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler
>Reporter: Matthias Boehm
>
> 1) Scalar propagation: So far we only propagate scalar literals into 
> functions. Additionally, we should propagate also scalar variables 
> (propagated as constants during IPA), which is safe if a function is called 
> once.




[jira] [Resolved] (SYSTEMML-681) Reverse single-element sequence fails w/ compiler/runtime issues

2016-05-13 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-681.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Reverse single-element sequence fails w/ compiler/runtime issues
> 
>
> Key: SYSTEMML-681
> URL: https://issues.apache.org/jira/browse/SYSTEMML-681
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
>
> We support sequences with both positive and negative increments. However, the 
> special case of a single-element sequence only works for positive increments 
> (e.g., seq(1,1,1)) but fails with a negative increment (e.g., seq(1,1,-1)), 
> although both should return equivalent results.
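A hedged Python sketch of inclusive `seq(from, to, incr)` semantics that handles the single-element case before checking the sign of the increment, so `seq(1,1,1)` and `seq(1,1,-1)` both yield one element. Raising an error for a wrong-sign increment is an assumption here, and the element-count formula is exact only for integer arguments.

```python
import math


def seq(start, end, incr):
    """Inclusive sequence start, start+incr, ... up to end (sketch of DML seq)."""
    if incr == 0:
        raise ValueError("increment must be non-zero")
    if start == end:
        # Single-element sequence: valid for either sign of the increment.
        return [start]
    if (end - start) * incr < 0:
        raise ValueError("increment has the wrong sign for this range")
    count = math.floor((end - start) / incr) + 1  # exact for integer arguments
    return [start + i * incr for i in range(count)]


print(seq(1, 1, 1), seq(1, 1, -1))  # [1] [1]
print(seq(5, 1, -2))                # [5, 3, 1]
```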




[jira] [Assigned] (SYSTEMML-681) Reverse single-element sequence fails w/ compiler/runtime issues

2016-05-13 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-681:
---

Assignee: Matthias Boehm


[jira] [Commented] (SYSTEMML-680) eigen() fails with "Unsupported function EIGEN" but diag() works

2016-05-13 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283450#comment-15283450
 ] 

Matthias Boehm commented on SYSTEMML-680:
-

Just in case anybody else runs into this issue: the error is due to 
{code}
ee = eigen(X)
{code}
because eigen returns both the eigenvalues and the eigenvectors, and hence needs 
to be called as follows:
{code}
[eval, evect] = eigen(X)
{code}
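The multi-return convention can be mimicked in Python with tuple unpacking. The sketch below uses a hypothetical helper that handles symmetric 2x2 matrices only. It is not SystemML's commons-math-backed implementation, and its eigenvalue ordering implies nothing about SystemML's. It returns both outputs, and the usage line corresponds to `[eval, evect] = eigen(X)`.

```python
import math


def eigen_sym2(A):
    """Eigen-decompose a symmetric 2x2 matrix; returns (eigenvalues, eigenvectors as columns)."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    if b == 0:
        return [a, d], [[1.0, 0.0], [0.0, 1.0]]
    mean = (a + d) / 2
    r = math.hypot((a - d) / 2, b)
    values = [mean - r, mean + r]
    vecs = []
    for lam in values:
        # (A - lam*I) v = 0 is satisfied by v = (b, lam - a); normalize it.
        x, y = b, lam - a
        norm = math.hypot(x, y)
        vecs.append((x / norm, y / norm))
    vectors = [[vecs[0][0], vecs[1][0]], [vecs[0][1], vecs[1][1]]]
    return values, vectors


# Like [eval, evect] = eigen(X): both results must be bound.
evalues, evectors = eigen_sym2([[2.0, 1.0], [1.0, 2.0]])
print(evalues)  # [1.0, 3.0]
```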


[jira] [Commented] (SYSTEMML-633) Improve Left-Indexing Performance with (Nested) Parfor Loops

2016-05-14 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283466#comment-15283466
 ] 

Matthias Boehm commented on SYSTEMML-633:
-

I just tried to reproduce this but got stuck in namespace resolution issues. 
[~mwdus...@us.ibm.com] is the attached setup everything I need?

> Improve Left-Indexing Performance with (Nested) Parfor Loops
> 
>
> Key: SYSTEMML-633
> URL: https://issues.apache.org/jira/browse/SYSTEMML-633
> Project: SystemML
>  Issue Type: Improvement
>  Components: ParFor
>Reporter: Mike Dusenberry
>Priority: Critical
> Attachments: Im2colWrapper.java, log.txt, log.txt, perf-dml.dml, 
> perf-tf.py, perf.sh, run.sh, systemml-nn.zip, time.txt
>
>
> In the experimental deep learning DML library I've been building 
> ([https://github.com/dusenberrymw/systemml-nn|https://github.com/dusenberrymw/systemml-nn]),
>  I've experienced severe bottlenecks due to *left-indexing* in parfor loops.  
> Here, I will highlight a few particular instances with simplified examples, 
> but the same issue is shared across many areas of the library, particularly 
> in the convolution and max pooling layers, and is exaggerated in real 
> use-cases.
> *Quick note* on setup for any of the below experiments.  Please grab a copy 
> of the above repo (particularly the {{nn}} directory), and run any 
> experiments with the {{nn}} package available at the base directory of the 
> experiment.
> Scenario: *Convolution*
> * In the library above, the forward pass of the convolution function 
> ([{{conv::forward(...)}} | 
> https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L8]
>  in {{nn/layers/conv.dml}}) essentially accepts a matrix {{X}} of images, a 
> matrix of weights {{W}}, and several other parameters corresponding to image 
> sizes, filter sizes, etc.  It then loops through the images with a {{parfor}} 
> loop, and for each image it pads the image with {{util::pad_image}}, extracts 
> "patches" of the image into columns of a matrix in a sliding fashion across 
> the image with {{util::im2col}}, performs a matrix multiplication between the 
> matrix of patch columns and the weight matrix, and then saves the result into 
> a matrix defined outside of the parfor loop using left-indexing.
> * Left-indexing has been identified as the bottleneck by a wide margin.
> * Left-indexing is used in the main {{conv::forward(...)}} function in the 
> [last line in the parfor 
> loop|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/layers/conv.dml#L61],
>  in the 
> [{{util::pad_image(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L196]
>  function used by {{conv::forward(...)}}, as well as in the 
> [{{util::im2col(...)}}|https://github.com/dusenberrymw/systemml-nn/blob/f6d3e077ae3c303eb8426b31329d3734e3483d5f/nn/util.dml#L96]
>  function used by {{conv::forward(...)}}.
> * Test script (assuming the {{nn}} package is available):
> ** {{speed-633.dml}} {code}
> source("nn/layers/conv.dml") as conv
> source("nn/util.dml") as util
> # Generate data
> N = 64  # num examples
> C = 30  # num channels
> Hin = 28  # input height
> Win = 28  # input width
> F = 20  # num filters
> Hf = 3  # filter height
> Wf = 3  # filter width
> stride = 1
> pad = 1
> X = rand(rows=N, cols=C*Hin*Win)
> # Create layer
> [W, b] = conv::init(F, C, Hf, Wf)
> # Forward
> [out, Hout, Wout] = conv::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, 
> stride, pad, pad)
> print("Out: " + nrow(out) + "x" + ncol(out))
> print("Hout: " + Hout)
> print("Wout: " + Wout)
> print("")
> print(sum(out))
> {code}
> * Invocation:
> ** {{java -jar 
> $SYSTEMML_HOME/target/systemml-0.10.0-incubating-SNAPSHOT-standalone.jar -f 
> speed-633.dml -stats -explain -exec singlenode}}
> * Stats output (modified to output up to 100 instructions):
> ** {code}
> ...
> Total elapsed time:   26.834 sec.
> Total compilation time:   0.529 sec.
> Total execution time:   26.304 sec.
> Number of compiled MR Jobs: 0.
> Number of executed MR Jobs: 0.
> Cache hits (Mem, WB, FS, HDFS): 9196235/0/0/0.
> Cache writes (WB, FS, HDFS):  3070724/0/0.
> Cache times (ACQr/m, RLS, EXP): 1.474/1.120/26.998/0.000 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time:  0.268 sec.
> Functions recompiled:   129.
> Functions recompile time: 0.841 sec.
> ParFor loops optimized:   1.
> ParFor optimize time:   0.032 sec.
> ParFor initialize time:   0.015 sec.
> ParFor result merge time: 0.028 sec.
> ParFor total update in-place: 0/0/1559360
> Total JIT compile time:   14.235 sec.
> Total JVM GC count:   94.
> Total JVM GC time:   0.366 sec.
> Heavy hitter instructions (

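The pad, im2col, and matrix-multiply steps described for `conv::forward(...)` can be sketched for a single image in plain Python. The function names echo `util::pad_image` and `util::im2col`, but this is an illustrative reimplementation that assumes a channel-major, row-major patch layout, not the library's code. It also builds new lists instead of left-indexing into preallocated matrices, the pattern the issue identifies as the bottleneck inside parfor loops.

```python
def pad_image(img, pad_h, pad_w):
    """Zero-pad each channel of img (C x Hin x Win nested lists)."""
    padded = []
    for channel in img:
        width = len(channel[0]) + 2 * pad_w
        rows = [[0.0] * width for _ in range(pad_h)]
        rows += [[0.0] * pad_w + list(r) + [0.0] * pad_w for r in channel]
        rows += [[0.0] * width for _ in range(pad_h)]
        padded.append(rows)
    return padded


def im2col(img, Hf, Wf, stride):
    """Return a (C*Hf*Wf) x (Hout*Wout) matrix whose columns are flattened patches."""
    C, H, W = len(img), len(img[0]), len(img[0][0])
    Hout = (H - Hf) // stride + 1
    Wout = (W - Wf) // stride + 1
    patches = []
    for i in range(Hout):
        for j in range(Wout):
            patches.append([img[c][i * stride + u][j * stride + v]
                            for c in range(C) for u in range(Hf) for v in range(Wf)])
    cols = [list(row) for row in zip(*patches)]  # transpose: one patch per column
    return cols, Hout, Wout


def conv_forward_single(img, filters, bias, Hf, Wf, stride, pad):
    """filters: F rows of length C*Hf*Wf (same c,u,v order as im2col); bias: F values."""
    X_cols, Hout, Wout = im2col(pad_image(img, pad, pad), Hf, Wf, stride)
    out = [[sum(w[k] * X_cols[k][p] for k in range(len(w))) + b
            for p in range(Hout * Wout)]
           for w, b in zip(filters, bias)]
    return out, Hout, Wout  # out: F x (Hout*Wout)


img = [[[1.0] * 3 for _ in range(3)]]  # C=1, 3x3 image of ones
out, Hout, Wout = conv_forward_single(img, [[1.0] * 9], [0.0], 3, 3, stride=1, pad=1)
print(out)  # [[4.0, 6.0, 4.0, 6.0, 9.0, 6.0, 4.0, 6.0, 4.0]]
```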
[jira] [Assigned] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-05-14 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-512:
---

Assignee: Matthias Boehm

> DML Script With UDFs Results In Out Of Memory Error As Compared to Without 
> UDFs
> ---
>
> Key: SYSTEMML-512
> URL: https://issues.apache.org/jira/browse/SYSTEMML-512
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
> Attachments: test1.scala, test2.scala
>
>
> Currently, the following script for running a simple version of Poisson 
> non-negative matrix factorization (PNMF) runs in linear time as desired:
> {code}
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF
> n = nrow(V)
> m = ncol(V)
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> # compute negative log-likelihood
> negloglik_temp = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> However, a small refactoring of this same script to pull the core PNMF 
> algorithm and the negative log-likelihood computation out into separate UDFs 
> results in non-linear runtime and a Java out of memory heap error on the same 
> dataset.  
> {code}
> pnmf = function(matrix[double] V, integer max_iteration, integer rank) return 
> (matrix[double] W, matrix[double] H) {
> n = nrow(V)
> m = ncol(V)
> 
> range = 0.01
> W = Rand(rows=n, cols=rank, min=0, max=range, pdf="uniform")
> H = Rand(rows=rank, cols=m, min=0, max=range, pdf="uniform")
> 
> i=0
> while(i < max_iteration) {
>   H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W))
>   W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
>   i = i + 1;
> }
> }
> negloglikfunc = function(matrix[double] V, matrix[double] W, matrix[double] 
> H) return (double negloglik) {
> negloglik = -1 * (sum(V*log(W%*%H)) - as.scalar(colSums(W)%*%rowSums(H)))
> }
> # data & args
> X = read($X)
> X = X+1 # change product IDs to be 1-based, rather than 0-based
> V = table(X[,1], X[,2])
> V = V[1:$size,1:$size]
> max_iteration = as.integer($maxiter)
> rank = as.integer($rank)
> # run PNMF and evaluate
> [W, H] = pnmf(V, max_iteration, rank)
> negloglik_temp = negloglikfunc(V, W, H)
> # write outputs
> negloglik = matrix(negloglik_temp, rows=1, cols=1)
> write(negloglik, $negloglikout)
> write(W, $Wout)
> write(H, $Hout)
> {code}
> The expectation would be that such modularization at the DML level should be 
> allowed without any impact on performance.
> Details:
> - Data: Amazon product co-purchasing dataset from Stanford 
> [http://snap.stanford.edu/data/amazon0601.html | 
> http://snap.stanford.edu/data/amazon0601.html]
> - Execution mode: Spark {{MLContext}}, but should be applicable to 
> command-line invocation as well. 
> - Error message:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:415)
>   at org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1212)
>   at org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1103)
>   at org.apache.sysml.runtime.instructions.cp.MatrixMatrixArithmeticCPInstruction.processInstruction(MatrixMatrixArithmeticCPInstruction.java:60)
>   at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>   at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>   at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:183)
>   at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
>   at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
>   at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>   at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:

[jira] [Resolved] (SYSTEMML-512) DML Script With UDFs Results In Out Of Memory Error As Compared to Without UDFs

2016-05-14 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-512.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10


[jira] [Assigned] (SYSTEMML-680) eigen() fails with "Unsupported function EIGEN" but diag() works

2016-05-14 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-680:
---

Assignee: Matthias Boehm

> eigen() fails with "Unsupported function EIGEN" but diag() works
> 
>
> Key: SYSTEMML-680
> URL: https://issues.apache.org/jira/browse/SYSTEMML-680
> Project: SystemML
>  Issue Type: Bug
>  Components: APIs
>Affects Versions: SystemML 0.9
> Environment: Linux ip-172-20-42-170 3.13.0-61-generic #100-Ubuntu SMP 
> Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> spark-core_2.10-1.3.0.jar
>Reporter: Golda Velez
>Assignee: Matthias Boehm
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Could be some simple error, since I'm new to SystemML
> I'm running a tiny DML script:
> X = read($Xin)
> ee = eigen(X)
> via some things to set up the matrix in scala, ending with
> val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, df, mc, false)
> val ml = new MLContext(sc)
> ml.reset()
> ml.registerInput("X", sysMlMatrix, numRows, numCols)
> ml.registerOutput("e")
> val nargs = Map("Xin" -> " ", "Eout" -> " ")
> val outputs = ml.execute("dum.dml", nargs)
> I could certainly be doing something wrong, but it does run if I replace 
> eigen() with diag() and both are listed similarly in the guide 
> http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html
> Is eigen() supported currently and does it require some installation of some 
> library?  I didn't see anything about that in the docs.
> Thanks, this looks super useful!
> This might just be a documentation bug, not a code bug, but I'm not sure how 
> else to contact people about it and get it resolved.  Are there forums?
> --Golda



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Resolved] (SYSTEMML-680) eigen() fails with "Unsupported function EIGEN" but diag() works

2016-05-14 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-680.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

We have now added explicit error handling for qr, lu, and eigen that points to 
the correct usage via multi-return assignment.
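For illustration only, the kind of check described above might look like the following Java sketch. `MultiReturnCheck` and `validate` are hypothetical names, not SystemML's parser code, and the output counts and usage strings (`[evalues, evectors] = eigen(A)`, `[P, L, U] = lu(A)`, `[H, R] = qr(A)`) are assumptions based on the DML language reference. Applied to the script in the report, `ee = eigen(X)` has a single assignment target and would be rejected with a hint toward the multi-return form.

```java
import java.util.Map;

// Hypothetical parser-side check (not SystemML code): reject single-target
// use of multi-return builtins and name the expected DML syntax.
class MultiReturnCheck {

    // Assumed number of outputs per multi-return builtin.
    static final Map<String, Integer> NUM_OUTPUTS = Map.of("eigen", 2, "lu", 3, "qr", 2);

    // Assumed canonical usage shown in the error message.
    static final Map<String, String> USAGE = Map.of(
        "eigen", "[evalues, evectors] = eigen(A)",
        "lu", "[P, L, U] = lu(A)",
        "qr", "[H, R] = qr(A)");

    // Throws if a known multi-return builtin is assigned to the wrong number of targets;
    // other functions pass through unchecked.
    static void validate(String fname, int numTargets) {
        Integer expected = NUM_OUTPUTS.get(fname);
        if (expected != null && expected != numTargets)
            throw new IllegalArgumentException(fname + "() returns " + expected
                + " outputs and must be called via multi-return assignment, e.g. "
                + USAGE.get(fname));
    }

    public static void main(String[] args) {
        validate("eigen", 2);      // [evalues, evectors] = eigen(X): accepted
        try {
            validate("eigen", 1);  // ee = eigen(X): rejected with a usage hint
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```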

> eigen() fails with "Unsupported function EIGEN" but diag() works
> 
>
> Key: SYSTEMML-680
> URL: https://issues.apache.org/jira/browse/SYSTEMML-680
> Project: SystemML
>  Issue Type: Bug
>  Components: APIs
>Affects Versions: SystemML 0.9
> Environment: Linux ip-172-20-42-170 3.13.0-61-generic #100-Ubuntu SMP 
> Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> spark-core_2.10-1.3.0.jar
>Reporter: Golda Velez
>Assignee: Matthias Boehm
>Priority: Minor
> Fix For: SystemML 0.10
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Could be some simple error, since I'm new to SystemML
> I'm running a tiny DML script:
> X = read($Xin)
> ee = eigen(X)
> via some things to set up the matrix in scala, ending with
> val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, df, mc, false)
> val ml = new MLContext(sc)
> ml.reset()
> ml.registerInput("X", sysMlMatrix, numRows, numCols)
> ml.registerOutput("e")
> val nargs = Map("Xin" -> " ", "Eout" -> " ")
> val outputs = ml.execute("dum.dml", nargs)
> I could certainly be doing something wrong, but it does run if I replace 
> eigen() with diag() and both are listed similarly in the guide 
> http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html
> Is eigen() supported currently and does it require some installation of some 
> library?  I didn't see anything about that in the docs.
> Thanks, this looks super useful!
> This might just be a documentation bug, not a code bug, but I'm not sure how 
> else to contact people about it and get it resolved.  Are there forums?
> --Golda




[jira] [Created] (SYSTEMML-714) Compile error on rewrite 'pushdown sum on binary' w/ multiple consumers

2016-05-26 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-714:
---

 Summary: Compile error on rewrite 'pushdown sum on binary' w/ 
multiple consumers
 Key: SYSTEMML-714
 URL: https://issues.apache.org/jira/browse/SYSTEMML-714
 Project: SystemML
  Issue Type: Bug
Affects Versions: SystemML 0.10
Reporter: Matthias Boehm
 Fix For: SystemML 0.11







[jira] [Commented] (SYSTEMML-708) Release checklist for 0.10.0-incubating-rc1

2016-05-26 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303574#comment-15303574
 ] 

Matthias Boehm commented on SYSTEMML-708:
-

known issues: SYSTEMML-714

> Release checklist for 0.10.0-incubating-rc1
> ---
>
> Key: SYSTEMML-708
> URL: https://issues.apache.org/jira/browse/SYSTEMML-708
> Project: SystemML
>  Issue Type: Task
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>
> || Task || Status || Notes ||
> | All Artifacts and Checksums Present | {panel:bgColor=#bfffba}Pass{panel} | |
> | Release Candidate Build - Windows | {panel:bgColor=#bfffba}Pass{panel} | |
> | Release Candidate Build - OS X | {panel:bgColor=#bfffba}Pass{panel} | |
> | Release Candidate Build - Linux | {panel:bgColor=#bfffba}Pass{panel} | |
> | Test Suite Passes - Windows | {panel:bgColor=#bfffba}Pass{panel} | SYSTEMML-712 opened for intermittent test failure |
> | Test Suite Passes - OS X | {panel:bgColor=#bfffba}Pass{panel} | |
> | Test Suite Passes - Linux | {panel:bgColor=#bfffba}Pass{panel} | |
> | All Binaries Execute | {panel:bgColor=#bfffba}Pass{panel} | Verified on OS X |
> | Check LICENSE and NOTICE Files | {panel:bgColor=#bfffba}Pass{panel} | non-blocker, SYSTEMML-711 filed |
> | Src Artifact Builds and Tests Pass | {panel:bgColor=#bfffba}Pass{panel} | 5037 of 5038 passed on OS X (RightIndexingMatrixTest failed) |
> | Single-Node Standalone - Windows | {panel:bgColor=#bfffba}Pass{panel} | |
> | Single-Node Standalone - OS X | {panel:bgColor=#bfffba}Pass{panel} | |
> | Single-Node Standalone - Linux | {panel:bgColor=#bfffba}Pass{panel} | |
> | Single-Node Spark | {panel:bgColor=#bfffba}Pass{panel} | Verified on OS X |
> | Single-Node Hadoop | {panel:bgColor=#bfffba}Pass{panel} | Verified on OS X |
> | Notebooks - Jupyter | {panel:bgColor=#bfffba}Pass{panel} | Verified on OS X |
> | Notebooks - Zeppelin | {panel:bgColor=#bfffba}Pass{panel} | Verified on OS X |
> | Performance Suite - Spark | {panel:bgColor=#bfffba}Pass{panel} | Run on Spark 1.6.1 for data sizes {80MB, 800MB, 8GB, 80GB} |
> | Performance Suite - Hadoop | | |




[jira] [Updated] (SYSTEMML-714) Compile error on rewrite 'pushdown sum on binary' w/ multiple consumers

2016-05-26 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-714:

Description: 
The dynamic simplification rewrite 'pushdown sum on binary +' with multiple 
consumers corrupts the HOP DAG, leading to compilation errors. Consider 
the following script as an example:
{code}
A = rand(rows=10, cols=1);
B = rand(rows=10, cols=1);
C = rand(rows=10, cols=1);
D = rand(rows=10, cols=1);

r1 = sum(A*B + C*D);
r2 = r1;
print("ret1="+r1+", ret2="+r2);
{code} 

The trace of applied rewrites is as follows

{code}
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied pushdownSumOnAdditiveBinary.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied simplifyDotProductSum.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied fuseDatagenReorgOperation.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied simplifyDotProductSum.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied fuseDatagenReorgOperation
{code}

Finally, this issue results in the following or similar exception on subsequent 
rewrites:

{code}
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.simplifyColwiseAggregate(RewriteAlgebraicSimplificationDynamic.java:566)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:154)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:185)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rewriteHopDAGs(RewriteAlgebraicSimplificationDynamic.java:91)
at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteHopDAGs(ProgramRewriter.java:279)
at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteStatementBlockHopDAGs(ProgramRewriter.java:263)
at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteProgramHopDAGs(ProgramRewriter.java:206)
at org.apache.sysml.parser.DMLTranslator.rewriteHopsDAG(DMLTranslator.java:273)
at org.apache.sysml.api.DMLScript.execute(DMLScript.java:602)
at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:337)
{code}

The issue is caused by incorrect handling of multiple parents in the rewrite 
'pushdown sum on binary +'. Workarounds are to disable rewrites (optimization 
level 1 instead of 2) or, preferably, to create an "if(1==1){}" cut right after 
the sum.
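A minimal sketch of the general fix, assuming a toy DAG rather than SystemML's `Hop` classes (`RewireSketch`, `Node`, `replaceInAllParents`, and `pushdownSumOnPlus` are hypothetical): when the rewrite replaces `sum(X + Y)` by `sum(X) + sum(Y)` (valid when X and Y have equal dimensions), every consumer of the old `sum` node must be redirected to the replacement. In the script above, the sum feeds both `r2 = r1` and the `print`; a consumer left pointing at a detached node with no inputs would plausibly fail with the kind of `IndexOutOfBoundsException: Index: 0, Size: 0` shown in the trace.

```java
import java.util.ArrayList;
import java.util.List;

// Toy DAG (hypothetical, not SystemML's Hop API) showing a rewrite that
// redirects every consumer of the replaced node.
class RewireSketch {

    static final class Node {
        final String op;
        final List<Node> inputs = new ArrayList<>();
        final List<Node> parents = new ArrayList<>();

        Node(String op, Node... ins) {
            this.op = op;
            for (Node child : ins) {
                inputs.add(child);
                child.parents.add(this);
            }
        }
    }

    // Point every parent of 'old' at 'repl', then detach 'old' from the DAG.
    static void replaceInAllParents(Node old, Node repl) {
        for (Node p : new ArrayList<>(old.parents)) {
            p.inputs.set(p.inputs.indexOf(old), repl);
            repl.parents.add(p);
        }
        old.parents.clear();
        for (Node child : old.inputs) child.parents.remove(old);
        old.inputs.clear();
    }

    // sum(X + Y) -> sum(X) + sum(Y); assumes X and Y have equal dimensions.
    // The orphaned '+' node is left for dead-code cleanup (not shown).
    static Node pushdownSumOnPlus(Node sum) {
        Node plus = sum.inputs.get(0);
        if (!plus.op.equals("+")) return sum;
        Node repl = new Node("+",
            new Node("sum", plus.inputs.get(0)),
            new Node("sum", plus.inputs.get(1)));
        replaceInAllParents(sum, repl);
        return repl;
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        Node sum = new Node("sum", new Node("+", new Node("*", a, b), new Node("*", c, d)));
        Node r2 = new Node("r2", sum);        // r2 = r1
        Node print = new Node("print", sum);  // print(... r1 ...)
        Node repl = pushdownSumOnPlus(sum);
        System.out.println(r2.inputs.get(0) == repl && print.inputs.get(0) == repl);
    }
}
```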

> Compile error on rewrite 'pushdown sum on binary' w/ multiple consumers
> ---
>
> Key: SYSTEMML-714
> URL: https://issues.apache.org/jira/browse/SYSTEMML-714
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Affects Versions: SystemML 0.10
>Reporter: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> The dynamic simplification rewrite 'pushdown sum on binary +' with multiple 
> consumers corrupts the HOP DAG, leading to compilation errors. Consider 
> the following script as an example:
> {code}
> A = rand(rows=10, cols=1);
> B = rand(rows=10, cols=1);
> C = rand(rows=10, cols=1);
> D = rand(rows=10, cols=1);
> r1 = sum(A*B + C*D);
> r2 = r1;
> print("ret1="+r1+", ret2="+r2);
> {code} 
> The trace of applied rewrites is as follows
> {code}
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied pushdownSumOnAdditiveBinary.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied simplifyDotProductSum.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied fuseDatagenReorgOperation.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied simplifyDotProductSum.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied fuseDatagenReorgOperation
> {code}
> Finally, this issue results in the following or similar exception on 
> subsequent rewrites:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.simplifyColwiseAggregate(RewriteAlgebraicSimplificationDynamic.java:566)
> at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:154)
> at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:185)
> at org.apache.sysml

[jira] [Updated] (SYSTEMML-714) Compile error on rewrite 'pushdown sum on binary' w/ multiple consumers

2016-05-26 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-714:

Description: 
The dynamic simplification rewrite 'pushdown sum on binary +' with multiple 
consumers creates a HOP DAG corruption leading to compilation errors. Consider 
the following script as an example
{code}
A = rand(rows=10, cols=1);
B = rand(rows=10, cols=1);
C = rand(rows=10, cols=1);
D = rand(rows=10, cols=1);

r1 = sum(A*B + C*D);
r2 = r1;
print("ret1="+r1+", ret2="+r2);
{code} 

The trace of applied rewrites is as follows

{code}
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied pushdownSumOnAdditiveBinary.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied simplifyDotProductSum.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied fuseDatagenReorgOperation.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied simplifyDotProductSum.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied fuseDatagenReorgOperation
{code}

Finally, this issue results in the following or similar exception on subsequent 
rewrites:

{code}
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.simplifyColwiseAggregate(RewriteAlgebraicSimplificationDynamic.java:566)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:154)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:185)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rewriteHopDAGs(RewriteAlgebraicSimplificationDynamic.java:91)
at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteHopDAGs(ProgramRewriter.java:279)
at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteStatementBlockHopDAGs(ProgramRewriter.java:263)
at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteProgramHopDAGs(ProgramRewriter.java:206)
at org.apache.sysml.parser.DMLTranslator.rewriteHopsDAG(DMLTranslator.java:273)
at org.apache.sysml.api.DMLScript.execute(DMLScript.java:602)
at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:337)
{code}

The issue is caused by incorrect handling of multiple parents in the rewrite 
'pushdown sum on binary +'. Workarounds are to disable rewrites (optimization 
level 1 instead of 2) or, preferably, to create an "if(1==1){}" cut right after 
the sum.

  was:
The dynamic simplification rewrite 'pushdown sum on binary +' with multiple 
consumes creates a HOP DAG corruption leading to compilation errors. Consider 
the following script as an example
{code}
A = rand(rows=10, cols=1);
B = rand(rows=10, cols=1);
C = rand(rows=10, cols=1);
D = rand(rows=10, cols=1);

r1 = sum(A*B + C*D);
r2 = r1;
print("ret1="+r1+", ret2="+r2);
{code} 

The trace of applied rewrites is as follows

{code}
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied pushdownSumOnAdditiveBinary.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied simplifyDotProductSum.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied fuseDatagenReorgOperation.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied simplifyDotProductSum.
DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied fuseDatagenReorgOperation
{code}

Finally, this issue results in the following or similar exception on subsequent 
rewrites:

{code}
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.simplifyColwiseAggregate(RewriteAlgebraicSimplificationDynamic.java:566)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:154)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:185)
at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rewriteHopDAGs(RewriteAlgebraicSimplificationDynamic.java:91)
at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteHopDAGs(ProgramRewriter.java:279)
at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteStatementBlockHopDAGs(ProgramRewriter.java:263)
at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteProgramHopDAGs(ProgramRewriter.java:206)
at org.apache.sysml.parser.DMLTranslator.rewriteHopsDAG(DMLTranslator.java:273)
at

[jira] [Commented] (SYSTEMML-693) Automatically invoke toString when user tries to print a matrix

2016-05-27 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305001#comment-15305001
 ] 

Matthias Boehm commented on SYSTEMML-693:
-

The recommended approach would be option (3): automatically inject toString during 
HOP creation for print / string concatenation (not during parsing) whenever the 
input to either of the two is a matrix, because this would allow us to 
properly handle the case of large matrices.
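A toy sketch of option (3), assuming a simplified expression tree instead of SystemML's HOP classes (`ToStringInjection`, `Expr`, `Type`, and `inject` are hypothetical): a pass wraps any matrix-typed input of a `print` or a string-typed `+` in a `toString` node, so `print("matrix : " + m)` behaves like `print("matrix : " + toString(m))`. The sketch ignores the large-matrix handling that motivates doing this at HOP creation.

```java
// Hypothetical expression tree; not SystemML's Hop/Lop classes.
class ToStringInjection {

    enum Type { STRING, NUMBER, MATRIX }

    static final class Expr {
        final String op;
        final Type type;
        final Expr[] inputs;

        Expr(String op, Type type, Expr... inputs) {
            this.op = op;
            this.type = type;
            this.inputs = inputs;
        }
    }

    // Rebuilds the tree bottom-up; inside a print or a string-typed "+",
    // every matrix-typed input is wrapped in a toString node. Leaves are reused.
    static Expr inject(Expr e) {
        if (e.inputs.length == 0) return e;
        boolean stringContext = e.op.equals("print")
            || (e.op.equals("+") && e.type == Type.STRING);
        Expr[] ins = new Expr[e.inputs.length];
        for (int i = 0; i < ins.length; i++) {
            Expr child = inject(e.inputs[i]);
            ins[i] = (stringContext && child.type == Type.MATRIX)
                ? new Expr("toString", Type.STRING, child)
                : child;
        }
        return new Expr(e.op, e.type, ins);
    }

    public static void main(String[] args) {
        Expr m = new Expr("m", Type.MATRIX);
        Expr lit = new Expr("\"matrix : \"", Type.STRING);
        Expr prog = new Expr("print", Type.STRING, new Expr("+", Type.STRING, lit, m));
        Expr out = inject(prog);
        System.out.println(out.inputs[0].inputs[1].op); // prints toString
    }
}
```

Matrix-matrix arithmetic such as a matrix-typed `+` is left untouched, because only string contexts trigger the wrapping.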

> Automatically invoke toString when user tries to print a matrix
> ---
>
> Key: SYSTEMML-693
> URL: https://issues.apache.org/jira/browse/SYSTEMML-693
> Project: SystemML
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Nakul Jindal
>Priority: Minor
>
> The {{toString}} builtin function was added as [PR 
> #120|https://github.com/apache/incubator-systemml/pull/120] and SYSTEMML-693. 
> The way to print a matrix with this builtin function is
> {code}
> m = ... # Create Matrix
> print("matrix : " + toString(m))
> {code}
> To improve usability, the DML programmer should be able to say
> {code}
> m = ... # Create Matrix
> print("matrix : " + m)
> {code}
> The call to {{toString}} should be automatically inserted.




[jira] [Created] (SYSTEMML-747) Wrong in-memory csv reblock decision w/ unknowns

2016-06-02 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-747:
---

 Summary: Wrong in-memory csv reblock decision w/ unknowns
 Key: SYSTEMML-747
 URL: https://issues.apache.org/jira/browse/SYSTEMML-747
 Project: SystemML
  Issue Type: Bug
  Components: Runtime
Affects Versions: SystemML 0.10
Reporter: Matthias Boehm







[jira] [Updated] (SYSTEMML-747) Wrong in-memory csv reblock decision w/ unknowns

2016-06-02 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-747:

Description: The decision on CP in-memory reblock for text input matrices 
is based on the estimated size in memory. For csv reblock, we support 
persistent reads with unknown dimension sizes. In scenarios with unknown 
dimensions, however, the memory estimate is negative and therefore always passes 
the memory budget check, so we always choose in-memory reblocks, which either 
take very long or even run out of memory.
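The failure mode can be sketched as follows, assuming unknown dimensions are encoded as -1 and the estimator returns a negative sentinel for them. `ReblockDecision` and its methods are hypothetical, and the size formula is a simplified dense estimate, not SystemML's. A negative estimate always passes an `estimate < budget` check; one possible guard is to require a known, non-negative estimate before choosing an in-memory reblock.

```java
// Hypothetical sketch of the CP in-memory reblock decision; not SystemML code.
class ReblockDecision {

    // Simplified dense size estimate in bytes; -1 (unknown) if any dimension is unknown.
    static double estimateSize(long rows, long cols) {
        if (rows < 0 || cols < 0) return -1;
        return rows * (double) cols * 8.0;
    }

    // Buggy: a negative estimate for unknown dimensions always "fits" the budget.
    static boolean inMemoryReblockBuggy(long rows, long cols, double budget) {
        return estimateSize(rows, cols) < budget;
    }

    // Guarded: only reblock in memory when the estimate is known and fits.
    static boolean inMemoryReblock(long rows, long cols, double budget) {
        double est = estimateSize(rows, cols);
        return est >= 0 && est < budget;
    }

    public static void main(String[] args) {
        double budget = 1e6; // illustrative budget in bytes
        System.out.println("buggy, unknown dims:   " + inMemoryReblockBuggy(-1, -1, budget));
        System.out.println("guarded, unknown dims: " + inMemoryReblock(-1, -1, budget));
        System.out.println("guarded, 1000 x 100:   " + inMemoryReblock(1000, 100, budget));
    }
}
```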

> Wrong in-memory csv reblock decision w/ unknowns
> 
>
> Key: SYSTEMML-747
> URL: https://issues.apache.org/jira/browse/SYSTEMML-747
> Project: SystemML
>  Issue Type: Bug
>  Components: Runtime
>Affects Versions: SystemML 0.10
>Reporter: Matthias Boehm
>
> The decision on CP in-memory reblock for text input matrices is based on 
> the estimated size in memory. For csv reblock, we support persistent reads 
> with unknown dimension sizes. In scenarios with unknown dimensions, however, 
> the memory estimate is negative and therefore always passes the memory budget 
> check, so we always choose in-memory reblocks, which either take very long or 
> even run out of memory.




[jira] [Created] (SYSTEMML-749) Failed nrow call after spark removeEmpty operation

2016-06-02 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-749:
---

 Summary: Failed nrow call after spark removeEmpty operation
 Key: SYSTEMML-749
 URL: https://issues.apache.org/jira/browse/SYSTEMML-749
 Project: SystemML
  Issue Type: Bug
  Components: Runtime
Affects Versions: SystemML 0.10
Reporter: Matthias Boehm







[jira] [Updated] (SYSTEMML-749) Failed nrow call after spark removeEmpty operation

2016-06-02 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-749:

Description: In the special case of removeEmpty over a completely empty 
input matrix, the spark removeEmpty instruction creates an invalid output of 
dimensions [0 x ncol(in)] or [nrow(in) x 0] which is not supported in SystemML. 
Accordingly, any subsequent operation would have undefined behavior; in case of 
meta data operations like nrow or ncol, this actually leads to an explicit 
error with the following message: "Invalid meta data returned by nrow: 0".
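One possible guard, sketched in Java on a dense `double[][]`: when every row is empty, return a single all-zero row instead of a 0-row result, so that `nrow` and `ncol` stay valid. `RemoveEmptyRows` is a hypothetical helper, and the 1 x ncol convention is an assumption about a reasonable fix, not a description of the Spark instruction's actual change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical removeEmpty(target=X, margin="rows") on a dense array; not SystemML code.
class RemoveEmptyRows {

    // Keeps rows with at least one non-zero; assumes x has at least one row.
    static double[][] removeEmptyRows(double[][] x) {
        int ncol = x[0].length;
        List<double[]> kept = new ArrayList<>();
        for (double[] row : x) {
            boolean nonZero = false;
            for (double v : row) {
                if (v != 0) { nonZero = true; break; }
            }
            if (nonZero) kept.add(row.clone());
        }
        // Guard against the invalid [0 x ncol] output described in the issue.
        if (kept.isEmpty()) return new double[1][ncol];
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] r = removeEmptyRows(new double[3][4]);
        System.out.println(r.length + " x " + r[0].length);
    }
}
```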

> Failed nrow call after spark removeEmpty operation
> --
>
> Key: SYSTEMML-749
> URL: https://issues.apache.org/jira/browse/SYSTEMML-749
> Project: SystemML
>  Issue Type: Bug
>  Components: Runtime
>Affects Versions: SystemML 0.10
>Reporter: Matthias Boehm
>
> In the special case of removeEmpty over a completely empty input matrix, the 
> spark removeEmpty instruction creates an invalid output of dimensions [0 x 
> ncol(in)] or [nrow(in) x 0] which is not supported in SystemML. Accordingly, 
> any subsequent operation would have undefined behavior; in case of meta data 
> operations like nrow or ncol, this actually leads to an explicit error with 
> the following message: "Invalid meta data returned by nrow: 0".




[jira] [Closed] (SYSTEMML-638) Random Forest Predict Execution Fails

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-638.
---
Resolution: Fixed

I'm closing this as all related compiler and input issues have been resolved. 

> Random Forest Predict Execution Fails
> -
>
> Key: SYSTEMML-638
> URL: https://issues.apache.org/jira/browse/SYSTEMML-638
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.10
>Reporter: Stacey Ronaghan
>
> Issue executing the prediction for random forest algorithm on SystemML 0.10 
> (incubating) via MLContext with Scala Spark on a cluster.
> Related to [SYSTEMML-597|https://issues.apache.org/jira/browse/SYSTEMML-597]. 
> X is the same input passed into execute for random-forest.dml (mentioned in 
> [SYSTEMML-597|https://issues.apache.org/jira/browse/SYSTEMML-597]) and M is 
> its output model.
> Code:
> {code}
> // Register inputs & outputs for prediction
> ml.reset()
> ml.registerInput("X", X)
> //ml.registerInput("Y", Y)
> ml.registerInput("M", M)
> ml.registerOutput("P")
> //ml.registerOutput("A")
> // Run the script
> //val nargs = Map("X" -> "", "Y" -> "", "M" -> "", "P" -> "", "A" -> "")
> val nargs = Map("X" -> "", "M" -> "", "P" -> "")
> val outputs = 
> ml.execute("/home/biadmin/spark-enablement/installs/SystemML/algorithms/random-forest-predict.dml",
>  nargs)
> val P = outputs.getDF(sqlContext, "P")
> //val A = outputs.getDF(sqlContext, "A")
> {code}
> Output:
> {code}
> import org.apache.sysml.api.MLContext
> ml: org.apache.sysml.api.MLContext = org.apache.sysml.api.MLContext@5649f7b4
> nargs: scala.collection.immutable.Map[String,String] = Map(X -> "", M -> "", P -> "")
> org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 68 and 89 -- Error evaluating instruction: CP°groupedagg°target=_mVar60580°groups=_mVar60580°fn=count°k=40°_mVar60581·MATRIX·DOUBLE
>  at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
>  at org.apache.sysml.api.MLContext.executeUsingSimplifiedCompilationChain(MLContext.java:1365)
>  at org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1225)
>  at org.apache.sysml.api.MLContext.compileAndExecuteScript(MLContext.java:1173)
>  at org.apache.sysml.api.MLContext.execute(MLContext.java:640)
>  at org.apache.sysml.api.MLContext.execute(MLContext.java:675)
>  at org.apache.sysml.api.MLContext.execute(MLContext.java:688)
>  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
>  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
>  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
>  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
>  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
>  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:54)
>  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
>  at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
>  at $iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
>  at $iwC$$iwC$$iwC.<init>(<console>:62)
>  at $iwC$$iwC.<init>(<console>:64)
>  at $iwC.<init>(<console>:66)
>  at <init>(<console>:68)
>  at .<init>(<console>:72)
>  at .<clinit>(<console>)
>  at .<init>(<console>:7)
>  at .<clinit>(<console>)
>  at $print(<console>)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:497)
>  at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>  at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>  at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>  at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:646)
>  at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:611)
>  at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:604)
>  at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>  at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>  at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:292)
>  at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>  at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at java.

[jira] [Resolved] (SYSTEMML-695) Incorrect rand normal w/ fused scalar operation

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-695.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Incorrect rand normal w/ fused scalar operation
> ---
>
> Key: SYSTEMML-695
> URL: https://issues.apache.org/jira/browse/SYSTEMML-695
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
>





[jira] [Closed] (SYSTEMML-714) Compile error on rewrite 'pushdown sum on binary' w/ multiple consumers

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-714.
---

> Compile error on rewrite 'pushdown sum on binary' w/ multiple consumers
> ---
>
> Key: SYSTEMML-714
> URL: https://issues.apache.org/jira/browse/SYSTEMML-714
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Affects Versions: SystemML 0.10
>Reporter: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> The dynamic simplification rewrite 'pushdown sum on binary +' with multiple 
> consumers creates a HOP DAG corruption leading to compilation errors. 
> Consider the following script as an example
> {code}
> A = rand(rows=10, cols=1);
> B = rand(rows=10, cols=1);
> C = rand(rows=10, cols=1);
> D = rand(rows=10, cols=1);
> r1 = sum(A*B + C*D);
> r2 = r1;
> print("ret1="+r1+", ret2="+r2);
> {code} 
> The trace of applied rewrites is as follows
> {code}
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied pushdownSumOnAdditiveBinary.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied simplifyDotProductSum.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied fuseDatagenReorgOperation.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied simplifyDotProductSum.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied fuseDatagenReorgOperation
> {code}
> Finally, this issue results in the following or similar exception on 
> subsequent rewrites:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.simplifyColwiseAggregate(RewriteAlgebraicSimplificationDynamic.java:566)
> at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:154)
> at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:185)
> at org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rewriteHopDAGs(RewriteAlgebraicSimplificationDynamic.java:91)
> at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteHopDAGs(ProgramRewriter.java:279)
> at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteStatementBlockHopDAGs(ProgramRewriter.java:263)
> at org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteProgramHopDAGs(ProgramRewriter.java:206)
> at org.apache.sysml.parser.DMLTranslator.rewriteHopsDAG(DMLTranslator.java:273)
> at org.apache.sysml.api.DMLScript.execute(DMLScript.java:602)
> at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:337)
> {code}
> The issue is caused by incorrect handling of multiple parents in the rewrite 
> 'pushdown sum on binary +'. Workarounds are to disable rewrites 
> (optimization level 1 instead of 2) or, preferably, to create an "if(1==1){}" 
> cut right after the sum.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Closed] (SYSTEMML-695) Incorrect rand normal w/ fused scalar operation

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-695.
---

> Incorrect rand normal w/ fused scalar operation
> ---
>
> Key: SYSTEMML-695
> URL: https://issues.apache.org/jira/browse/SYSTEMML-695
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
>





[jira] [Resolved] (SYSTEMML-714) Compile error on rewrite 'pushdown sum on binary' w/ multiple consumers

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-714.
-
Resolution: Fixed

> Compile error on rewrite 'pushdown sum on binary' w/ multiple consumers
> ---
>
> Key: SYSTEMML-714
> URL: https://issues.apache.org/jira/browse/SYSTEMML-714
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Affects Versions: SystemML 0.10
>Reporter: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> The dynamic simplification rewrite 'pushdown sum on binary +' corrupts the 
> HOP DAG if the sum has multiple consumers, leading to compilation errors. 
> Consider the following script as an example:
> {code}
> A = rand(rows=10, cols=1);
> B = rand(rows=10, cols=1);
> C = rand(rows=10, cols=1);
> D = rand(rows=10, cols=1);
> r1 = sum(A*B + C*D);
> r2 = r1;
> print("ret1="+r1+", ret2="+r2);
> {code} 
> The trace of applied rewrites is as follows:
> {code}
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied 
> pushdownSumOnAdditiveBinary.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied 
> simplifyDotProductSum.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied 
> fuseDatagenReorgOperation.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied 
> simplifyDotProductSum.
> DEBUG rewrite.RewriteAlgebraicSimplificationDynamic: Applied 
> fuseDatagenReorgOperation
> {code}
> Ultimately, this issue results in the following (or a similar) exception 
> during subsequent rewrites:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:653)
> at java.util.ArrayList.get(ArrayList.java:429)
> at 
> org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.simplifyColwiseAggregate(RewriteAlgebraicSimplificationDynamic.java:566)
> at 
> org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:154)
> at 
> org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rule_AlgebraicSimplification(RewriteAlgebraicSimplificationDynamic.java:185)
> at 
> org.apache.sysml.hops.rewrite.RewriteAlgebraicSimplificationDynamic.rewriteHopDAGs(RewriteAlgebraicSimplificationDynamic.java:91)
> at 
> org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteHopDAGs(ProgramRewriter.java:279)
> at 
> org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteStatementBlockHopDAGs(ProgramRewriter.java:263)
> at 
> org.apache.sysml.hops.rewrite.ProgramRewriter.rewriteProgramHopDAGs(ProgramRewriter.java:206)
> at 
> org.apache.sysml.parser.DMLTranslator.rewriteHopsDAG(DMLTranslator.java:273)
> at org.apache.sysml.api.DMLScript.execute(DMLScript.java:602)
> at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:337)
> {code}
> The issue is caused by incorrect handling of multiple parents in the rewrite 
> 'pushdown sum on binary +'. The workaround is to disable rewrites 
> (optimization level 1 instead of 2) or, preferably, to insert an "if(1==1){}" 
> cut right after the sum.
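The rewrite itself preserves semantics, since sum(X + Y) = sum(X) + sum(Y). The bug lies in how the rewritten subplan is wired back into the DAG.

Below is a minimal sketch of what correct handling of multiple parents means, written against a toy DAG model. The `Node` class and both functions are hypothetical; they are not SystemML's Hop API. Every consumer of the original sum (modeled here as r1 and r2) is rewired to the new subplan, and the dead nodes are then detached.

```python
class Node:
    """Toy DAG node: `inputs` are children, `parents` are consumers."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = list(inputs)
        self.parents = []
        for child in self.inputs:
            child.parents.append(self)

    def __repr__(self):
        if not self.inputs:
            return self.op
        return f"{self.op}({', '.join(map(repr, self.inputs))})"


def replace_child(parent, old, new):
    """Rewire every reference from `parent` to `old` so it points to `new`."""
    for k, child in enumerate(parent.inputs):
        if child is old:
            parent.inputs[k] = new
            old.parents.remove(parent)
            new.parents.append(parent)


def pushdown_sum_on_add(s):
    """Rewrite sum(X + Y) -> sum(X) + sum(Y) for ALL consumers of `s`."""
    if s.op != "sum" or s.inputs[0].op != "+":
        return s
    plus = s.inputs[0]
    x, y = plus.inputs
    new = Node("+", [Node("sum", [x]), Node("sum", [y])])
    # Copy the parent list: replace_child shrinks s.parents while we iterate.
    for p in list(s.parents):
        replace_child(p, s, new)
    # Detach the dead sum, and the old + if nothing else consumes it.
    plus.parents.remove(s)
    s.inputs = []
    if not plus.parents:
        for child in plus.inputs:
            child.parents.remove(plus)
        plus.inputs = []
    return new


# Mirror of the script above: r1 = sum(A*B + C*D), consumed twice.
A, B, C, D = (Node(n) for n in "ABCD")
s = Node("sum", [Node("+", [Node("*", [A, B]), Node("*", [C, D])])])
r1, r2 = Node("r1", [s]), Node("r2", [s])
new = pushdown_sum_on_add(s)
print(r1)  # r1(+(sum(*(A, B)), sum(*(C, D))))
print(r2)  # r2(+(sum(*(A, B)), sum(*(C, D))))
```

If only the first parent were rewired, r2 would still point to the detached sum node. That dangling reference is the kind of DAG inconsistency that later rewrites trip over.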




[jira] [Assigned] (SYSTEMML-490) Runtime Platform Should Automatically Be Set To Hybrid_Spark When Executed On Spark

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-490:
---

Assignee: Matthias Boehm

> Runtime Platform Should Automatically Be Set To Hybrid_Spark When Executed On 
> Spark
> ---
>
> Key: SYSTEMML-490
> URL: https://issues.apache.org/jira/browse/SYSTEMML-490
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Matthias Boehm
>
> Currently, the default runtime platform is set to "hybrid" mode, which is an 
> automatically optimized hybrid between single-node and Hadoop MR.  When 
> running on Spark, we should automatically detect and change the mode to the 
> correct setting of "hybrid_spark".  Of course, our {{sparkDML.sh}} script 
> appends this runtime mode explicitly, but a user shouldn't have to do this.
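Below is a minimal sketch of the requested default. It assumes the runtime can tell whether it was launched on Spark; the `spark_context_active` flag and the function name are hypothetical. An explicitly requested platform still wins. Otherwise, execution on Spark defaults to "hybrid_spark" instead of "hybrid".

```python
def select_runtime_platform(explicit=None, spark_context_active=False):
    """Choose the default runtime platform (hypothetical helper).

    explicit: a platform the user requested, e.g. "singlenode"; it always wins.
    spark_context_active: stands in for however the runtime detects that
    it was launched on Spark.
    """
    if explicit is not None:
        return explicit
    # Without an explicit setting, pick the hybrid mode matching the backend.
    return "hybrid_spark" if spark_context_active else "hybrid"


print(select_runtime_platform(spark_context_active=True))  # hybrid_spark
```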




[jira] [Assigned] (SYSTEMML-749) Failed nrow call after spark removeEmpty operation

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-749:
---

Assignee: Matthias Boehm

> Failed nrow call after spark removeEmpty operation
> --
>
> Key: SYSTEMML-749
> URL: https://issues.apache.org/jira/browse/SYSTEMML-749
> Project: SystemML
>  Issue Type: Bug
>  Components: Runtime
>Affects Versions: SystemML 0.10
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>
> In the special case of removeEmpty over a completely empty input matrix, the 
> spark removeEmpty instruction creates an invalid output of dimensions [0 x 
> ncol(in)] or [nrow(in) x 0], which SystemML does not support. Consequently, 
> any subsequent operation has undefined behavior; for metadata operations like 
> nrow or ncol, this leads to an explicit error with the following message: 
> "Invalid meta data returned by nrow: 0".
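Below is a minimal sketch of removeEmpty semantics over a list-of-lists matrix. It includes a guard against the invalid [0 x ncol(in)] and [nrow(in) x 0] outputs. The guard assumes the fix returns a single row (or column) of zeros, which we believe matches SystemML's single-node removeEmpty. The issue itself does not say how the Spark instruction was fixed, and the function names are hypothetical.

```python
def remove_empty_rows(m):
    """removeEmpty(margin="rows") on a list-of-lists matrix.

    Assumed guard: if every row is empty, return one row of zeros
    (1 x ncol) instead of an invalid 0 x ncol result.
    """
    ncol = len(m[0])
    kept = [list(row) for row in m if any(v != 0 for v in row)]
    return kept if kept else [[0.0] * ncol]


def remove_empty_cols(m):
    """removeEmpty(margin="cols"); the assumed guard returns nrow x 1 zeros."""
    nrow, ncol = len(m), len(m[0])
    keep = [j for j in range(ncol) if any(m[i][j] != 0 for i in range(nrow))]
    if not keep:
        return [[0.0] for _ in range(nrow)]
    return [[m[i][j] for j in keep] for i in range(nrow)]


empty = [[0.0, 0.0, 0.0] for _ in range(4)]
print(len(remove_empty_rows(empty)))  # 1, so nrow() stays valid
```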




[jira] [Assigned] (SYSTEMML-747) Wrong in-memory csv reblock decision w/ unknowns

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-747:
---

Assignee: Matthias Boehm

> Wrong in-memory csv reblock decision w/ unknowns
> 
>
> Key: SYSTEMML-747
> URL: https://issues.apache.org/jira/browse/SYSTEMML-747
> Project: SystemML
>  Issue Type: Bug
>  Components: Runtime
>Affects Versions: SystemML 0.10
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>
> The decision on a cp in-memory reblock for text input matrices is based on 
> the estimated in-memory size. For csv reblock, we support persistent reads 
> with unknown dimensions. With unknown dimensions, however, the memory 
> estimate is always negative, so the in-memory reblock is always chosen, which 
> either takes very long or runs out of memory.
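Below is a minimal sketch of why the decision goes wrong. It assumes an unknown estimate is encoded as -1 and uses a rough dense estimate of 8 bytes per cell; the function names are hypothetical, not SystemML's. A negative estimate always passes a `< budget` check, so the decision must first require a valid estimate.

```python
UNKNOWN = -1  # assumed encoding of an unknown dimension or size estimate


def estimate_dense_bytes(rows, cols):
    """Rough in-memory estimate: 8 bytes per double cell; UNKNOWN if dims unknown."""
    if rows < 0 or cols < 0:
        return UNKNOWN
    return 8 * rows * cols


def inmemory_reblock_buggy(rows, cols, budget):
    # A negative (unknown) estimate is always below the budget.
    return estimate_dense_bytes(rows, cols) < budget


def inmemory_reblock_fixed(rows, cols, budget):
    # Only reblock in memory if the estimate is valid AND fits the budget.
    est = estimate_dense_bytes(rows, cols)
    return 0 <= est < budget


budget = 2 * 1024**3  # e.g., a 2 GiB memory budget
print(inmemory_reblock_buggy(UNKNOWN, UNKNOWN, budget))  # True
print(inmemory_reblock_fixed(UNKNOWN, UNKNOWN, budget))  # False
```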




[jira] [Assigned] (SYSTEMML-707) diag not generate square matrix if given Nx1 matrix of zeroes

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-707:
---

Assignee: Matthias Boehm

> diag not generate square matrix if given Nx1 matrix of zeroes
> -
>
> Key: SYSTEMML-707
> URL: https://issues.apache.org/jira/browse/SYSTEMML-707
> Project: SystemML
>  Issue Type: Bug
>  Components: APIs
>Reporter: Deron Eriksson
>Assignee: Matthias Boehm
>
> Thank you, Matthew Plourde, for finding this!
> If an Nx1 matrix of 0's is passed to the diag() function, an Nx1 matrix of 0's 
> is returned. However, if the Nx1 matrix contains any nonzero value, an NxN 
> diagonal matrix is returned. This is inconsistent; diag() on an Nx1 matrix of 
> 0's should probably return an NxN matrix as well.
> Example 1:
> {code}
> zeroes=matrix(0, 5, 1);
> print(toString(zeroes));
> print(toString(diag(zeroes)));
> {code}
> gives:
> {code}
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> {code}
> Example 2:
> {code}
> ones=matrix(1, 5, 1);
> print(toString(ones));
> print(toString(diag(ones)));
> {code}
> gives:
> {code}
> 1.000
> 1.000
> 1.000
> 1.000
> 1.000
> 1.000 0.000 0.000 0.000 0.000
> 0.000 1.000 0.000 0.000 0.000
> 0.000 0.000 1.000 0.000 0.000
> 0.000 0.000 0.000 1.000 0.000
> 0.000 0.000 0.000 0.000 1.000
> {code}
> Example 3:
> {code}
> nums=matrix("0 1 2 3 4", 5, 1);
> print(toString(nums));
> print(toString(diag(nums)));
> {code}
> gives:
> {code}
> 0.000
> 1.000
> 2.000
> 3.000
> 4.000
> 0.000 0.000 0.000 0.000 0.000
> 0.000 1.000 0.000 0.000 0.000
> 0.000 0.000 2.000 0.000 0.000
> 0.000 0.000 0.000 3.000 0.000
> 0.000 0.000 0.000 0.000 4.000
> {code}
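Below is a minimal sketch of the consistent behavior requested here, on a list-of-lists matrix; the function is illustrative, not SystemML's implementation. An Nx1 input always yields an NxN matrix, whether or not its values are zero. A square input yields the Nx1 column of its diagonal, as DML's diag does for square matrices.

```python
def diag(m):
    """diag() on a list-of-lists matrix, with the consistent semantics requested.

    - N x 1 input: N x N matrix with the input on the diagonal, even if all
      values are zero (the case this issue reports as returning N x 1).
    - N x N input: N x 1 column vector holding the diagonal.
    """
    nrow, ncol = len(m), len(m[0])
    if ncol == 1:
        out = [[0.0] * nrow for _ in range(nrow)]
        for i in range(nrow):
            out[i][i] = m[i][0]
        return out
    if nrow == ncol:
        return [[m[i][i]] for i in range(nrow)]
    raise ValueError("diag expects an N x 1 vector or a square matrix")


zeroes = [[0.0] for _ in range(5)]
d = diag(zeroes)
print(len(d), len(d[0]))  # 5 5
```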




[jira] [Resolved] (SYSTEMML-707) diag not generate square matrix if given Nx1 matrix of zeroes

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-707.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.11

> diag not generate square matrix if given Nx1 matrix of zeroes
> -
>
> Key: SYSTEMML-707
> URL: https://issues.apache.org/jira/browse/SYSTEMML-707
> Project: SystemML
>  Issue Type: Bug
>  Components: APIs
>Reporter: Deron Eriksson
>Assignee: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> Thank you, Matthew Plourde, for finding this!
> If an Nx1 matrix of 0's is passed to the diag() function, an Nx1 matrix of 0's 
> is returned. However, if the Nx1 matrix contains any nonzero value, an NxN 
> diagonal matrix is returned. This is inconsistent; diag() on an Nx1 matrix of 
> 0's should probably return an NxN matrix as well.
> Example 1:
> {code}
> zeroes=matrix(0, 5, 1);
> print(toString(zeroes));
> print(toString(diag(zeroes)));
> {code}
> gives:
> {code}
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> 0.000
> {code}
> Example 2:
> {code}
> ones=matrix(1, 5, 1);
> print(toString(ones));
> print(toString(diag(ones)));
> {code}
> gives:
> {code}
> 1.000
> 1.000
> 1.000
> 1.000
> 1.000
> 1.000 0.000 0.000 0.000 0.000
> 0.000 1.000 0.000 0.000 0.000
> 0.000 0.000 1.000 0.000 0.000
> 0.000 0.000 0.000 1.000 0.000
> 0.000 0.000 0.000 0.000 1.000
> {code}
> Example 3:
> {code}
> nums=matrix("0 1 2 3 4", 5, 1);
> print(toString(nums));
> print(toString(diag(nums)));
> {code}
> gives:
> {code}
> 0.000
> 1.000
> 2.000
> 3.000
> 4.000
> 0.000 0.000 0.000 0.000 0.000
> 0.000 1.000 0.000 0.000 0.000
> 0.000 0.000 2.000 0.000 0.000
> 0.000 0.000 0.000 3.000 0.000
> 0.000 0.000 0.000 0.000 4.000
> {code}




[jira] [Resolved] (SYSTEMML-747) Wrong in-memory csv reblock decision w/ unknowns

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-747.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.11

> Wrong in-memory csv reblock decision w/ unknowns
> 
>
> Key: SYSTEMML-747
> URL: https://issues.apache.org/jira/browse/SYSTEMML-747
> Project: SystemML
>  Issue Type: Bug
>  Components: Runtime
>Affects Versions: SystemML 0.10
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> The decision on a cp in-memory reblock for text input matrices is based on 
> the estimated in-memory size. For csv reblock, we support persistent reads 
> with unknown dimensions. With unknown dimensions, however, the memory 
> estimate is always negative, so the in-memory reblock is always chosen, which 
> either takes very long or runs out of memory.




[jira] [Resolved] (SYSTEMML-490) Runtime Platform Should Automatically Be Set To Hybrid_Spark When Executed On Spark

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-490.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.11

> Runtime Platform Should Automatically Be Set To Hybrid_Spark When Executed On 
> Spark
> ---
>
> Key: SYSTEMML-490
> URL: https://issues.apache.org/jira/browse/SYSTEMML-490
> Project: SystemML
>  Issue Type: Bug
>Reporter: Mike Dusenberry
>Assignee: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> Currently, the default runtime platform is set to "hybrid" mode, which is an 
> automatically optimized hybrid between single-node and Hadoop MR.  When 
> running on Spark, we should automatically detect and change the mode to the 
> correct setting of "hybrid_spark".  Of course, our {{sparkDML.sh}} script 
> appends this runtime mode explicitly, but a user shouldn't have to do this.




[jira] [Resolved] (SYSTEMML-749) Failed nrow call after spark removeEmpty operation

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-749.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.11

> Failed nrow call after spark removeEmpty operation
> --
>
> Key: SYSTEMML-749
> URL: https://issues.apache.org/jira/browse/SYSTEMML-749
> Project: SystemML
>  Issue Type: Bug
>  Components: Runtime
>Affects Versions: SystemML 0.10
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> In the special case of removeEmpty over a completely empty input matrix, the 
> spark removeEmpty instruction creates an invalid output of dimensions [0 x 
> ncol(in)] or [nrow(in) x 0], which SystemML does not support. Consequently, 
> any subsequent operation has undefined behavior; for metadata operations like 
> nrow or ncol, this leads to an explicit error with the following message: 
> "Invalid meta data returned by nrow: 0".




[jira] [Created] (SYSTEMML-753) Reference-based update-in-place

2016-06-03 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-753:
---

 Summary: Reference-based update-in-place 
 Key: SYSTEMML-753
 URL: https://issues.apache.org/jira/browse/SYSTEMML-753
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm







[jira] [Updated] (SYSTEMML-753) Reference-based update-in-place

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-753:

Component/s: Runtime

> Reference-based update-in-place 
> 
>
> Key: SYSTEMML-753
> URL: https://issues.apache.org/jira/browse/SYSTEMML-753
> Project: SystemML
>  Issue Type: Bug
>  Components: Runtime
>Reporter: Matthias Boehm
>





[jira] [Updated] (SYSTEMML-753) Basic loop update in-place leftindexing

2016-06-03 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-753:

Issue Type: Task  (was: Bug)
   Summary: Basic loop update in-place leftindexing  (was: Reference-based 
update-in-place )

> Basic loop update in-place leftindexing
> ---
>
> Key: SYSTEMML-753
> URL: https://issues.apache.org/jira/browse/SYSTEMML-753
> Project: SystemML
>  Issue Type: Task
>  Components: Runtime
>Reporter: Matthias Boehm
>





[jira] [Created] (SYSTEMML-754) Shallow bufferpool serialize sparse matrices via CSR conversion

2016-06-03 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-754:
---

 Summary: Shallow bufferpool serialize sparse matrices via CSR 
conversion
 Key: SYSTEMML-754
 URL: https://issues.apache.org/jira/browse/SYSTEMML-754
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm







[jira] [Commented] (SYSTEMML-752) Support for Lasso and ElasticNet?

2016-06-03 Thread Matthias Boehm (JIRA)


[ 
https://issues.apache.org/jira/browse/SYSTEMML-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315340#comment-15315340
 ] 

Matthias Boehm commented on SYSTEMML-752:
-

thanks [~MechCoder] - you might want to check out 
https://github.com/apache/incubator-systemml/blob/master/scripts/staging/regression/lasso/lasso.dml
 and use it as a starting point.

> Support for Lasso and ElasticNet?
> -
>
> Key: SYSTEMML-752
> URL: https://issues.apache.org/jira/browse/SYSTEMML-752
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Manoj Kumar
>





[jira] [Created] (SYSTEMML-755) Extended multi-threaded rand (column parallelization)

2016-06-05 Thread Matthias Boehm (JIRA)

Matthias Boehm created SYSTEMML-755:
---

 Summary: Extended multi-threaded rand (column parallelization) 
 Key: SYSTEMML-755
 URL: https://issues.apache.org/jira/browse/SYSTEMML-755
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm







[jira] [Updated] (SYSTEMML-755) Improvements multi-threaded rand (column parallelization)

2016-06-05 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-755:

Summary: Improvements multi-threaded rand (column parallelization)   (was: 
Extended multi-threaded rand (column parallelization) )

> Improvements multi-threaded rand (column parallelization) 
> --
>
> Key: SYSTEMML-755
> URL: https://issues.apache.org/jira/browse/SYSTEMML-755
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>





[jira] [Updated] (SYSTEMML-755) Improvements multi-threaded rand (column parallelization)

2016-06-05 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-755:

Description: So far, we parallelize rand over row blocks in order to avoid 
synchronization for sparse row updates. This leads to single-threaded execution 
if a matrix has fewer rows than the blocksize (i.e., 1K rows). For dense 
matrices this is unnecessary. This task accordingly aims to generalize these 
multi-threaded rand operations to parallelize optionally over columns if the 
output matrix is dense and if there are more column blocks than row blocks.   
(was: So far, we parallelize rand over row blocks in order to avoid 
synchronization for sparse row updates. This leads to single-threaded execution 
if a matrix has fewer rows than the blocksize (i.e., 1K rows). For dense 
matrices this is unnecessary. This task accordingly aims to generalize these 
multi-threaded rand operations to parallelize optionally over columns if the 
output matrix is dense and if there are more column than row blocks. )

> Improvements multi-threaded rand (column parallelization) 
> --
>
> Key: SYSTEMML-755
> URL: https://issues.apache.org/jira/browse/SYSTEMML-755
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>
> So far, we parallelize rand over row blocks in order to avoid synchronization 
> for sparse row updates. This leads to single-threaded execution if a matrix 
> has fewer rows than the blocksize (i.e., 1K rows). For dense matrices this is 
> unnecessary. This task accordingly aims to generalize these multi-threaded 
> rand operations to parallelize optionally over columns if the output matrix 
> is dense and if there are more column blocks than row blocks. 
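Below is a minimal sketch of the proposed task partitioning. It assumes the 1K block size mentioned above; the function names and the per-task seeding scheme are hypothetical. Sparse outputs are always split by row blocks. Dense outputs are split by column blocks when there are more column blocks than row blocks. Because a dense output is preallocated, each task writes a disjoint set of cells and needs no synchronization.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

BLOCKSIZE = 1000  # the 1K block size mentioned in the issue


def partition_rand_tasks(rows, cols, dense, blocksize=BLOCKSIZE):
    """Return ("rows" | "cols", [(lo, hi), ...]) half-open task ranges.

    Sparse outputs are always split by row blocks, so no two tasks touch the
    same sparse row. Dense outputs are split by column blocks when there are
    more column blocks than row blocks.
    """
    nrb = math.ceil(rows / blocksize)
    ncb = math.ceil(cols / blocksize)
    dim, n = ("cols", cols) if dense and ncb > nrb else ("rows", rows)
    return dim, [(lo, min(lo + blocksize, n)) for lo in range(0, n, blocksize)]


def rand_dense(rows, cols, seed=7, threads=4):
    """Fill a preallocated dense matrix with uniform [0, 1) values in parallel."""
    dim, ranges = partition_rand_tasks(rows, cols, dense=True)
    out = [[0.0] * cols for _ in range(rows)]

    def fill(task_id, lo, hi):
        rng = random.Random(seed + task_id)  # per-task seed (illustrative)
        if dim == "rows":
            cells = ((i, j) for i in range(lo, hi) for j in range(cols))
        else:
            cells = ((i, j) for i in range(rows) for j in range(lo, hi))
        for i, j in cells:  # disjoint cells per task: no locking needed
            out[i][j] = rng.random()

    with ThreadPoolExecutor(max_workers=threads) as ex:
        list(ex.map(lambda t: fill(t[0], *t[1]), enumerate(ranges)))
    return out


print(partition_rand_tasks(10, 5000, dense=True)[0])        # cols
print(len(partition_rand_tasks(10, 5000, dense=False)[1]))  # 1
```

With per-task seeds, the result is deterministic regardless of thread scheduling. A sparse 10 x 5000 output still yields a single row-block task, which is the single-threaded case the issue describes.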




[jira] [Updated] (SYSTEMML-755) Improvements multi-threaded rand (column parallelization)

2016-06-05 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-755:

Description: So far, we parallelize rand over row blocks in order to avoid 
synchronization for sparse row updates. This leads to single-threaded execution 
if a matrix has fewer rows than the blocksize (i.e., 1K rows). For dense 
matrices this is unnecessary. This task accordingly aims to generalize these 
multi-threaded rand operations to parallelize optionally over columns if the 
output matrix is dense and if there are more column than row blocks. 

> Improvements multi-threaded rand (column parallelization) 
> --
>
> Key: SYSTEMML-755
> URL: https://issues.apache.org/jira/browse/SYSTEMML-755
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>
> So far, we parallelize rand over row blocks in order to avoid synchronization 
> for sparse row updates. This leads to single-threaded execution if a matrix 
> has fewer rows than the blocksize (i.e., 1K rows). For dense matrices this is 
> unnecessary. This task accordingly aims to generalize these multi-threaded 
> rand operations to parallelize optionally over columns if the output matrix 
> is dense and if there are more column than row blocks. 




[jira] [Resolved] (SYSTEMML-755) Improvements multi-threaded rand (column parallelization)

2016-06-06 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-755.
-
   Resolution: Done
Fix Version/s: SystemML 0.11

> Improvements multi-threaded rand (column parallelization) 
> --
>
> Key: SYSTEMML-755
> URL: https://issues.apache.org/jira/browse/SYSTEMML-755
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> So far, we parallelize rand over row blocks in order to avoid synchronization 
> for sparse row updates. This leads to single-threaded execution if a matrix 
> has fewer rows than the blocksize (i.e., 1K rows). For dense matrices this is 
> unnecessary. This task accordingly aims to generalize these multi-threaded 
> rand operations to parallelize optionally over columns if the output matrix 
> is dense and if there are more column blocks than row blocks. 




[jira] [Closed] (SYSTEMML-755) Improvements multi-threaded rand (column parallelization)

2016-06-06 Thread Matthias Boehm (JIRA)


 [ 
https://issues.apache.org/jira/browse/SYSTEMML-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-755.
---
Assignee: Matthias Boehm

> Improvements multi-threaded rand (column parallelization) 
> --
>
> Key: SYSTEMML-755
> URL: https://issues.apache.org/jira/browse/SYSTEMML-755
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> So far, we parallelize rand over row blocks in order to avoid synchronization 
> for sparse row updates. This leads to single-threaded execution if a matrix 
> has fewer rows than the blocksize (i.e., 1K rows). For dense matrices this is 
> unnecessary. This task accordingly aims to generalize these multi-threaded 
> rand operations to parallelize optionally over columns if the output matrix 
> is dense and if there are more column blocks than row blocks. 



