[jira] [Resolved] (SYSTEMML-1818) Perftest: Kmeans train fails for 10Kx100, k=50, w/ forced singlenode

2017-07-29 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1818.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Perftest: Kmeans train fails for 10Kx100, k=50, w/ forced singlenode
> 
>
> Key: SYSTEMML-1818
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1818
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> The kmeans algorithm fails when forced to singlenode with the following 
> exception:
> {code}
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Invalid values for 
> matrix indexing: dimensions of the source matrix [451x10] do not match the 
> shape of the matrix specified by indices [1:10, 1:10].
> at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.leftIndexingOperations(MatrixBlock.java:3654)
> at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.leftIndexingOperations(MatrixBlock.java:3631)
> at 
> org.apache.sysml.runtime.instructions.cp.MatrixIndexingCPInstruction.processInstruction(MatrixIndexingCPInstruction.java:95)
> at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:286)
> {code}
> Furthermore, there seems to be an issue of unnecessary Spark context creation: 
> in hybrid_spark mode we do not instantiate the Spark context, whereas in 
> forced singlenode mode we do.
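The exception above is a left-indexing shape check: the source matrix must match the shape implied by the target index range. A hedged Python sketch of that check (names and signature are illustrative, not SystemML's actual implementation):

```python
import numpy as np

def left_index_assign(target, src, rl, ru, cl, cu):
    # 1-based, inclusive ranges, as in DML's X[rl:ru, cl:cu] = src
    rows, cols = ru - rl + 1, cu - cl + 1
    if src.shape != (rows, cols):
        raise ValueError(
            "Invalid values for matrix indexing: dimensions of the source "
            f"matrix [{src.shape[0]}x{src.shape[1]}] do not match the shape "
            f"of the matrix specified by indices [{rl}:{ru}, {cl}:{cu}].")
    out = target.copy()
    out[rl - 1:ru, cl - 1:cu] = src  # convert to 0-based, exclusive slices
    return out
```

In the reported failure, a 451x10 source block reached an assignment expecting a 10x10 region, which trips exactly this check.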



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1818) Perftest: Kmeans train fails for 10Kx100, k=50, w/ forced singlenode

2017-07-29 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1818.


> Perftest: Kmeans train fails for 10Kx100, k=50, w/ forced singlenode
> 
>
> Key: SYSTEMML-1818
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1818
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> The kmeans algorithm fails when forced to singlenode with the following 
> exception:
> {code}
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Invalid values for 
> matrix indexing: dimensions of the source matrix [451x10] do not match the 
> shape of the matrix specified by indices [1:10, 1:10].
> at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.leftIndexingOperations(MatrixBlock.java:3654)
> at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.leftIndexingOperations(MatrixBlock.java:3631)
> at 
> org.apache.sysml.runtime.instructions.cp.MatrixIndexingCPInstruction.processInstruction(MatrixIndexingCPInstruction.java:95)
> at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:286)
> {code}
> Furthermore, there seems to be an issue of unnecessary Spark context creation: 
> in hybrid_spark mode we do not instantiate the Spark context, whereas in 
> forced singlenode mode we do.





[jira] [Updated] (SYSTEMML-1824) Wrong transformapply output with col name specification and subset of cols

2017-08-01 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1824:
-
Description: 
Given a frame input with 3 rows and column names {{name1, name2}}, the 
following script produces incorrect results:
{code}
X = read("X.csv", data_type="frame", format="csv", header=TRUE);
spec = "{ids: false, recode: [ name1, name2 ]}";
[Y,M] = transformencode(target=X, spec=spec);
spec2 = "{ids: false, recode: [ name2 ]}";
Z = transformapply(target=X[,2], spec=spec2, meta=M)
print(toString(Z));
{code}

The output is supposed to be 3x1 and properly recoded but currently returns
{code}
NaN
NaN
NaN
{code}

  was:
Given a frame input with column names {{name1, name2}}, the following script 
produces incorrect results:
{code}
X = read("X.csv", data_type="frame", format="csv", header=TRUE);
spec = "{ids: false, recode: [ name1, name2 ]}";
[Y,M] = transformencode(target=X, spec=spec);
spec2 = "{ids: false, recode: [ name2 ]}";
Z = transformapply(target=X[,2], spec=spec2, meta=M)
print(toString(Z));
{code}

The output is supposed to be 1x3 and properly recoded but currently returns
{code}
NaN
NaN
NaN
{code}


> Wrong transformapply output with col name specification and subset of cols
> --
>
> Key: SYSTEMML-1824
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1824
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>
> Given a frame input with 3 rows and column names {{name1, name2}}, the 
> following script produces incorrect results:
> {code}
> X = read("X.csv", data_type="frame", format="csv", header=TRUE);
> spec = "{ids: false, recode: [ name1, name2 ]}";
> [Y,M] = transformencode(target=X, spec=spec);
> spec2 = "{ids: false, recode: [ name2 ]}";
> Z = transformapply(target=X[,2], spec=spec2, meta=M)
> print(toString(Z));
> {code}
> The output is supposed to be 3x1 and properly recoded but currently returns
> {code}
> NaN
> NaN
> NaN
> {code}
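The all-NaN symptom is consistent with recode-map lookups that miss: conceptually, transformapply replaces each value by its code from the recode meta data, and emits NaN for values it cannot find, so matching against the wrong column yields an all-NaN output. A hedged Python sketch of these semantics (names are illustrative):

```python
import math

def transform_apply_recode(column, colname, meta):
    # meta: column name -> {distinct value: recode id}, as built by
    # transformencode; values without a mapping become NaN, so a lookup
    # against the wrong column produces the all-NaN column shown above
    rmap = meta.get(colname, {})
    return [rmap.get(v, math.nan) for v in column]
```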





[jira] [Commented] (SYSTEMML-1822) java.lang.ArrayStoreException with for-loop inside UDF

2017-08-01 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110162#comment-16110162
 ] 

Matthias Boehm commented on SYSTEMML-1822:
--

ok, I just gave it a try and was not able to reproduce this. Could it be that 
https://github.com/apache/systemml/commit/b67f18641b74c08c7f30447eafe45c6e2945fd0f
 already fixed this?

Also, there seems to be a logging issue in your environment. This trace 
information is only created with log=TRACE - could it be that your log4j 
configuration is missing?

> java.lang.ArrayStoreException with for-loop inside UDF
> --
>
> Key: SYSTEMML-1822
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1822
> Project: SystemML
>  Issue Type: Bug
>Reporter: Gus Jenkins
>
> Hey, I'm working on something and came upon a bug. I boiled the problem down 
> to the following. If I run the first snippet below, it works fine, but if I 
> run the second one, with a simple for-loop in the UDF, I get an error. This 
> may actually be the case with any loop. Please see below. I also included 
> the error message.
> {code}
> discrete_samples = function(matrix[double] weights) return(double ix){
> ix = 0.00
> }
> weights = matrix(1, rows = 5, cols = 1)
> ix = discrete_samples(weights)
> print("Index here: " + ix)
> {code}
> {code}
> discrete_samples = function(matrix[double] weights) return(double ix){
> ix = 0.00
> for(j in 1:5){
> print("Hello")
> }
> }
> weights = matrix(1, rows = 5, cols = 1)
> ix = discrete_samples(weights)
> print("Index here: " + ix)
> {code}
> {code}
> stcs-mbp:spark-2.1.1-bin-hadoop2.7 stc$ spark-submit SystemML.jar -f 
> /Users/stc/Desktop/systemml/scripts/algorithms/test.dml 
> log4j:WARN No appenders could be found for logger 
> (org.apache.hadoop.util.Shell).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Exception in thread "main" org.apache.sysml.api.DMLException: 
> java.lang.ArrayStoreException: java.lang.Integer
> at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:533)
> at org.apache.sysml.api.DMLScript.main(DMLScript.java:233)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ArrayStoreException: java.lang.Integer
> at java.util.AbstractCollection.toArray(AbstractCollection.java:196)
> at 
> org.apache.sysml.hops.ipa.FunctionCallSizeInfo.toString(FunctionCallSizeInfo.java:306)
> at java.lang.String.valueOf(String.java:2994)
> at java.lang.StringBuilder.append(StringBuilder.java:131)
> at 
> org.apache.sysml.hops.ipa.InterProceduralAnalysis.analyzeProgram(InterProceduralAnalysis.java:181)
> at 
> org.apache.sysml.parser.DMLTranslator.rewriteHopsDAG(DMLTranslator.java:269)
> at org.apache.sysml.api.DMLScript.execute(DMLScript.java:761)
> at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:506)
> ... 10 more
> {code}





[jira] [Created] (SYSTEMML-1824) Wrong transformapply output with col name specification and subset of cols

2017-08-01 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1824:


 Summary: Wrong transformapply output with col name specification 
and subset of cols
 Key: SYSTEMML-1824
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1824
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


Given a frame input with column names {{name1, name2}}, the following script 
produces incorrect results:
{code}
X = read("X.csv", data_type="frame", format="csv", header=TRUE);
spec = "{ids: false, recode: [ name1, name2 ]}";
[Y,M] = transformencode(target=X, spec=spec);
spec2 = "{ids: false, recode: [ name2 ]}";
Z = transformapply(target=X[,2], spec=spec2, meta=M)
print(toString(Z));
{code}

The output is supposed to be 1x3 and properly recoded but currently returns
{code}
NaN
NaN
NaN
{code}





[jira] [Updated] (SYSTEMML-1823) Missing process exit due to active frame reader threadpool

2017-08-01 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1823:
-
Summary: Missing process exit due to active frame reader threadpool  (was: 
Missing process exit due to pending tasks in reader threadpools)

> Missing process exit due to active frame reader threadpool
> --
>
> Key: SYSTEMML-1823
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1823
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>






[jira] [Created] (SYSTEMML-1823) Missing process exit due to pending tasks in reader threadpools

2017-08-01 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1823:


 Summary: Missing process exit due to pending tasks in reader 
threadpools
 Key: SYSTEMML-1823
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1823
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm








[jira] [Resolved] (SYSTEMML-1839) NPE on parfor initialization w/o log4j configuration

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1839.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> NPE on parfor initialization w/o log4j configuration
> 
>
> Key: SYSTEMML-1839
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1839
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> When calling SystemML in embedded deployments (e.g., through JMLC), there is 
> not necessarily a log4j configuration in the classpath or JVM arguments. In 
> such environments, the static initialization of {{ParForStatementBlock}} fails 
> with a NullPointerException because we try to obtain the default log level 
> and convert it to a string, although this default might be null.





[jira] [Closed] (SYSTEMML-1840) Transform spec literals should be checked during validate

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1840.


> Transform spec literals should be checked during validate
> -
>
> Key: SYSTEMML-1840
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1840
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Currently, there is no validation of transform specifications during initial 
> compilation. This is very annoying, especially when encoding large files, 
> which take a while to read in, only to find out that the given transform 
> specification was invalid JSON. Here is an example:
> {code}
> Caused by: org.apache.wink.json4j.JSONException: Expecting '{' on line 1, 
> column 4 instead, obtained token: 'Token: String - 'ids''
> at org.apache.wink.json4j.internal.Parser.parseObject(Parser.java:193)
> at org.apache.wink.json4j.internal.Parser.parse(Parser.java:130)
> at org.apache.wink.json4j.internal.Parser.parse(Parser.java:95)
> at org.apache.wink.json4j.JSONObject.(JSONObject.java:138)
> at 
> org.apache.sysml.runtime.transform.encode.EncoderFactory.createEncoder(EncoderFactory.java:56)
> {code}
> This task aims to parse the transform specification, if it is available as a 
> literal string, during the language validation step.
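A compile-time validation step could look like the following sketch. This is an assumption-level illustration in Python using strict JSON; note that the actual parser (wink json4j) also tolerates unquoted keys, as in the spec examples elsewhere in this thread:

```python
import json

def validate_transform_spec(spec):
    # fail fast on a malformed transform spec literal, instead of
    # discovering the problem only after the (long) input read
    try:
        parsed = json.loads(spec)
    except ValueError as e:
        raise ValueError("invalid transform spec: %s" % e)
    if not isinstance(parsed, dict):
        raise ValueError("expecting '{' at the top level of the spec")
    return parsed
```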





[jira] [Closed] (SYSTEMML-1838) Performance issues sparse/ultra-sparse binary read

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1838.


> Performance issues sparse/ultra-sparse binary read
> --
>
> Key: SYSTEMML-1838
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1838
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Recent experiments with PageRank (20 iterations) on a 1M x 1M, sp=0.001 input 
> showed that the actual iterations are indeed very fast, at peak memory 
> bandwidth (i.e., ~500ms per iteration in CP only), but the initial read is 
> unnecessarily slow and thus dominates the entire execution time. For 
> example, in this scenario, the read took 41s. 
> This task aims to improve the read performance of sparse and ultra-sparse 
> matrices into CP.





[jira] [Resolved] (SYSTEMML-1836) Large GC overhead for scripts w/ row-wise generated operators.

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1836.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Large GC overhead for scripts w/ row-wise generated operators.
> --
>
> Key: SYSTEMML-1836
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1836
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to reduce the unnecessarily large garbage collection overhead 
> for scripts with many row-wise fused operators. For example, Kmeans and 
> Mlogreg over 10M x 10 inputs show GC overheads of 102s and 37s, respectively.





[jira] [Closed] (SYSTEMML-1292) Support spark codegen instructions w/ multiple RDD inputs

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1292.


> Support spark codegen instructions w/ multiple RDD inputs
> -
>
> Key: SYSTEMML-1292
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1292
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support spark codegen instructions (for all templates) over 
> multiple RDD inputs if not all side inputs fit into the local and remote 
> broadcast memory budgets. In detail, this might entail either (1) generating 
> custom RDD operations and functions for various combinations of input RDDs, 
> or (2) a generalization of the related spark instructions regarding the input 
> RDD construction and a generic function signature.





[jira] [Resolved] (SYSTEMML-1838) Performance issues sparse/ultra-sparse binary read

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1838.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Performance issues sparse/ultra-sparse binary read
> --
>
> Key: SYSTEMML-1838
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1838
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Recent experiments with PageRank (20 iterations) on a 1M x 1M, sp=0.001 input 
> showed that the actual iterations are indeed very fast, at peak memory 
> bandwidth (i.e., ~500ms per iteration in CP only), but the initial read is 
> unnecessarily slow and thus dominates the entire execution time. For 
> example, in this scenario, the read took 41s. 
> This task aims to improve the read performance of sparse and ultra-sparse 
> matrices into CP.





[jira] [Resolved] (SYSTEMML-1443) Handling of plan selection constraints (e.g., memory/blocksize)

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1443.
--
Resolution: Done
  Assignee: Matthias Boehm

> Handling of plan selection constraints (e.g., memory/blocksize)
> ---
>
> Key: SYSTEMML-1443
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1443
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>






[jira] [Resolved] (SYSTEMML-1292) Support spark codegen instructions w/ multiple RDD inputs

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1292.
--
Resolution: Done
  Assignee: Matthias Boehm

> Support spark codegen instructions w/ multiple RDD inputs
> -
>
> Key: SYSTEMML-1292
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1292
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support spark codegen instructions (for all templates) over 
> multiple RDD inputs if not all side inputs fit into the local and remote 
> broadcast memory budgets. In detail, this might entail either (1) generating 
> custom RDD operations and functions for various combinations of input RDDs, 
> or (2) a generalization of the related spark instructions regarding the input 
> RDD construction and a generic function signature.





[jira] [Closed] (SYSTEMML-1443) Handling of plan selection constraints (e.g., memory/blocksize)

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1443.


> Handling of plan selection constraints (e.g., memory/blocksize)
> ---
>
> Key: SYSTEMML-1443
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1443
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>






[jira] [Created] (SYSTEMML-1842) Compression decision lost after recompilation or codegen

2017-08-15 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1842:


 Summary: Compression decision lost after recompilation or codegen
 Key: SYSTEMML-1842
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1842
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


Even with forced compression (compressed.linalg=true), compression is currently 
not applied if the respective HOP DAG is recompiled or subject to code 
generation. The root cause is an incomplete deep copy of the HOP DAG which 
loses the compression flag.
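The failure mode can be illustrated with a generic deep copy: any auxiliary field that is not copied explicitly is silently lost. A minimal Python sketch (the Hop class and flag name here are hypothetical, not SystemML's actual fields):

```python
from dataclasses import dataclass, field

@dataclass
class Hop:
    op: str
    inputs: list = field(default_factory=list)
    requires_compression: bool = False  # the decision that must survive copies

def deep_copy(hop, memo=None):
    # a complete deep copy carries all auxiliary flags; copying only
    # op/inputs (the reported bug pattern) would drop the compression flag
    memo = {} if memo is None else memo
    if id(hop) in memo:
        return memo[id(hop)]
    c = Hop(hop.op, [], hop.requires_compression)
    memo[id(hop)] = c  # preserve DAG sharing across the copy
    c.inputs = [deep_copy(i, memo) for i in hop.inputs]
    return c
```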





[jira] [Created] (SYSTEMML-1843) Wrong loop update-in-place decisions

2017-08-15 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1843:


 Summary: Wrong loop update-in-place decisions 
 Key: SYSTEMML-1843
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1843
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


For special cases, where a matrix is simply reassigned in a loop, the rewrite 
for marking updated loop variables as update-in-place mistakenly flags these 
variables. For example, consider the following script:

{code}
...
for(i in 1:100) {
  q = as.matrix(sum(X * U%*%t(V)))
  print("at iteration "+i);
}
{code}

and the related hop explain output

{code}
FOR (lines 9-13) [in-place=[q]]
--GENERIC (lines 10-12) [recompile=true]
(46) TRead X [8026324,2330066,1000,1000,22507155] [0,0,1317 -> 1317MB], 
CP
(48) TRead U [8026324,10,1000,1000,80263240] [0,0,612 -> 612MB], CP
(49) TRead V [2330066,10,1000,1000,23300660] [0,0,178 -> 178MB], CP
(50) r(t) (49) [10,2330066,1000,1000,23300660] [178,0,178 -> 356MB], CP
(51) ba(+*) (48,50) [8026324,2330066,1000,1000,-1] 
[790,85611347,142683904 -> 228296041MB], SPARK
(52) b(*) (46,51) [8026324,2330066,1000,1000,-1] [142685221,0,1317 -> 
142686537MB], SPARK
(53) ua(+RC) (52) [0,0,-1,-1,-1] [1317,0,0 -> 1317MB], SPARK
(54) u(cast_as_matrix) (53) [1,1,1000,1000,-1] [0,0,0 -> 0MB]
(55) TWrite q (54) [1,1,1000,1000,-1] [0,0,0 -> 0MB], CP
(47) TRead i [0,0,0,0,-1] [0,0,0 -> 0MB], CP
(57) b(+) (47) [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
(58) u(print) (57) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
{code}

As can be seen above, variable q is mistakenly marked as update-in-place, which 
causes unnecessary copies and thus can negatively affect performance.
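The intended rule can be sketched as follows: a loop variable qualifies for update-in-place only if it is modified via left-indexing, whereas a full reassignment such as q = as.matrix(...) creates a new object and must disqualify it. A hedged Python sketch of that check (the statement representation is illustrative):

```python
def update_in_place_candidates(loop_body):
    # loop_body: list of (var, kind) pairs, kind in {"leftindex", "assign"}
    left_indexed, reassigned = set(), set()
    for var, kind in loop_body:
        (left_indexed if kind == "leftindex" else reassigned).add(var)
    # only variables that are exclusively updated via left-indexing qualify
    return left_indexed - reassigned
```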





[jira] [Created] (SYSTEMML-1841) Performance issue codegen outer over ultra-sparse matrices

2017-08-15 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1841:


 Summary: Performance issue codegen outer over ultra-sparse matrices
 Key: SYSTEMML-1841
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1841
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


Experiments with codegen outer operations over the Amazon Books review dataset 
(8,026,324 x 2,330,066, nnz=22,507,155, i.e., sparsity=10^(-6)) showed 
unnecessary overhead for this ultra-sparse data set. This task aims to remove 
this overhead.





[jira] [Created] (SYSTEMML-1839) NPE on parfor initialization w/o log4j configuration

2017-08-14 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1839:


 Summary: NPE on parfor initialization w/o log4j configuration
 Key: SYSTEMML-1839
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1839
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


When calling SystemML in embedded deployments (e.g., through JMLC), there is 
not necessarily a log4j configuration in the classpath or JVM arguments. In 
such environments, the static initialization of {{ParForStatementBlock}} fails 
with a NullPointerException because we try to obtain the default log level and 
convert it to a string, although this default might be null.





[jira] [Commented] (SYSTEMML-1811) Can we Implement X%*%t(X) in a better way?

2017-08-10 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121215#comment-16121215
 ] 

Matthias Boehm commented on SYSTEMML-1811:
--

That's fine - don't worry. I just closed it to cleanup the list of open issues.

> Can we Implement X%*%t(X) in a better way?
> --
>
> Key: SYSTEMML-1811
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1811
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Janardhan
>Assignee: Matthias Boehm
> Fix For: Not Applicable
>
>
> A matrix multiplied by its own transpose occurs frequently in many 
> algorithms. There should definitely be a way to exploit the special 
> properties of this matrix operation.
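One such property is symmetry: X %*% t(X) equals its own transpose, so roughly half the dot products can be skipped. A hedged Python sketch of that idea (not SystemML's actual transpose-self matrix multiply implementation):

```python
import numpy as np

def self_transpose_mm(X):
    # compute only the upper triangle of X @ X.T and mirror it
    n = X.shape[0]
    out = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            v = float(X[i] @ X[j])  # dot product of rows i and j
            out[i, j] = v
            out[j, i] = v
    return out
```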





[jira] [Created] (SYSTEMML-1837) Unary aggregate w/ corrections output to large physical blocks

2017-08-11 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1837:


 Summary: Unary aggregate w/ corrections output to large physical 
blocks
 Key: SYSTEMML-1837
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1837
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


Many unary aggregate operations store corrections in additional columns or 
rows. For example, {{rowSums(X)}} uses a two-column output to store sums and 
corrections. In CP, we drop these corrections immediately after the operations, 
while in MR and Spark these corrections are dropped after final aggregation. 
The issue is that {{MatrixBlock::dropLastRowsOrColums}} does not actually 
drop the corrections but simply shifts all values into the right starting 
positions. Hence, the physical output is actually larger than what the memory 
estimates represent. This leads to unnecessarily large memory consumption during 
subsequent operations and in the buffer pool, which can lead to OOMs. This task 
aims to fix {{MatrixBlock::dropLastRowsOrColums}}. 

In a subsequent task, we could also modify all unary aggregates to never 
allocate the multi-column/row output when executed in CP. However, this 
requires custom code paths for the different backends. 
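The pattern can be sketched in Python: a Kahan-style rowSums keeps a correction column, and the drop step must return a physically smaller block rather than a shifted view of the oversized one. Names and the two-column [sum, correction] layout follow the description above; this is an illustration, not SystemML's code:

```python
import numpy as np

def row_sums_with_correction(X):
    # column 0: running sums, column 1: Kahan corrections
    out = np.zeros((X.shape[0], 2))
    for j in range(X.shape[1]):
        y = X[:, j] - out[:, 1]
        t = out[:, 0] + y
        out[:, 1] = (t - out[:, 0]) - y
        out[:, 0] = t
    return out

def drop_last_columns(block, k):
    # copy() materializes a physically smaller array; a mere slice or
    # in-place shift would keep the oversized backing buffer alive
    return block[:, :block.shape[1] - k].copy()
```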





[jira] [Updated] (SYSTEMML-1292) Support spark codegen instructions w/ multiple RDD inputs

2017-08-11 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1292:
-
Summary: Support spark codegen instructions w/ multiple RDD inputs  (was: 
Generate n-ary rdd operations)

> Support spark codegen instructions w/ multiple RDD inputs
> -
>
> Key: SYSTEMML-1292
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1292
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support spark codegen instructions (for all templates) over 
> multiple RDD inputs if not all side inputs fit into the local and remote 
> broadcast memory budgets. In detail, this might entail either (1) generating 
> custom RDD operations and functions for various combinations of input RDDs, 
> or (2) a generalization of the related spark instructions regarding the input 
> RDD construction and a generic function signature.





[jira] [Updated] (SYSTEMML-1292) Generate n-ary rdd operations

2017-08-11 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1292:
-
Description: This task aims to support spark codegen instructions (for all 
templates) over multiple RDD inputs if not all side inputs fit into the local 
and remote broadcast memory budgets. In detail, this might entail either (1) 
generating custom RDD operations and functions for various combinations of 
input RDDs, or (2) a generalization of the related spark instructions regarding 
the input RDD construction and a generic function signature.

> Generate n-ary rdd operations
> -
>
> Key: SYSTEMML-1292
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1292
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support spark codegen instructions (for all templates) over 
> multiple RDD inputs if not all side inputs fit into the local and remote 
> broadcast memory budgets. In detail, this might entail either (1) generating 
> custom RDD operations and functions for various combinations of input RDDs, 
> or (2) a generalization of the related spark instructions regarding the input 
> RDD construction and a generic function signature.





[jira] [Created] (SYSTEMML-1846) Transformapply w/ column names fails with index-out-of-bounds

2017-08-16 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1846:


 Summary: Transformapply w/ column names fails with 
index-out-of-bounds
 Key: SYSTEMML-1846
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1846
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


Given a simple transformapply scenario as shown in the following script
{code}
spec = "{ids: false, recode: [ zipcode, district, view ]}";
[X, M] = transformencode(target=F, spec=spec);
spec2 = "{ids: false, recode: [ zipcode ]}";
X2 = transformapply(target=F[,1], spec=spec2, meta=M);
{code}

currently leads to index out-of-bounds exceptions because the column name 
zipcode is not found in the column names of the meta data frame. The root cause 
is a wrong assumption of sorted column names in the underlying implementation.
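A fix amounts to locating meta-data columns with an order-independent lookup instead of a search that assumes sorted names. A minimal sketch (function and error names are illustrative):

```python
def find_meta_column(meta_colnames, name):
    # linear scan (or a dict) instead of binary search: the meta frame's
    # column names are in insertion order, not sorted order
    for i, c in enumerate(meta_colnames):
        if c == name:
            return i
    raise IndexError("column '%s' not found in meta data frame" % name)
```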





[jira] [Resolved] (SYSTEMML-1750) Optional dynamic recompilation for JMLC training

2017-07-07 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1750.
--
   Resolution: Done
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Optional dynamic recompilation for JMLC training
> 
>
> Key: SYSTEMML-1750
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1750
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> There are scenarios where JMLC is used for training on moderately sized 
> input. Due to the use of prepared scripts (which are compiled without size 
> information) and forced singlenode execution type, this can lead to 
> performance problems caused by poor plan choices. This task aims to (1) 
> expose compiler configurations such as dynamic recompilation and 
> multi-threading at JMLC API and (2) rework the recompilation framework for 
> singlenode execution type.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1749) Single-threaded csv frame reader incorrect output w/ multiple splits

2017-07-07 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1749.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Single-threaded csv frame reader incorrect output w/ multiple splits
> 
>
> Key: SYSTEMML-1749
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1749
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> The single-threaded csv frame reader produces incorrect outputs if the used 
> record reader returns multiple splits, because each split's content is written 
> at starting position 0, which ultimately overwrites previously read content.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1745) Support rowwise cumsum operations

2017-07-07 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1745.


> Support rowwise cumsum operations
> -
>
> Key: SYSTEMML-1745
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1745
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1761) Sparsity-exploiting weighted squared loss w/o weights

2017-07-11 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1761:


 Summary: Sparsity-exploiting weighted squared loss w/o weights
 Key: SYSTEMML-1761
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1761
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm


There are existing rewrites and fused operators for weighted squared loss 
(wsloss). However, for the wsloss type {{NONE}}, i.e., without weights 
{{sum((X-(U%*%t(V)))^2)}}, the implementation is not sparsity-exploiting, 
leading to huge (unnecessary) computation overhead. As it turns out, this 
expression can be rewritten into a sparsity-exploiting form as follows:
{code}
sum ((X - U %*% t(V)) ^ 2)
-> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((U%*%t(V))^2)
-> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((t(U) %*% U) * (t(V) %*% V))
{code}

This task aims to change the block-level wsloss NONE implementation to exploit 
this logical rewrite by computing {{sum(X^2) - sum(2 * (X * (U%*%t(V))))}} in a 
sparsity-exploiting pass over non-zeros in X and a subsequent correction for 
{{+ sum((t(U) %*% U) * (t(V) %*% V))}} via two tsmm operations.
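The rewrite rests on the identity sum((X - U %*% t(V))^2) = sum(X^2) - 2*sum(X * (U %*% t(V))) + sum((t(U) %*% U) * (t(V) %*% V)), where the last term replaces sum((U %*% t(V))^2) via two small tsmm products. A small pure-Python check of this identity (illustration only, not the SystemML kernel):

```python
def matmul(A, B):
    """Plain list-of-lists matrix multiply."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def frob_sq_diff(X, U, V):
    """Direct evaluation: sum((X - U %*% t(V))^2)."""
    UVt = matmul(U, transpose(V))
    return sum((X[i][j] - UVt[i][j]) ** 2
               for i in range(len(X)) for j in range(len(X[0])))

def frob_sq_diff_rewritten(X, U, V):
    """Rewritten form: sum(X^2) - 2*sum(X * (U %*% t(V))) + sum((t(U)%*%U) * (t(V)%*%V)).
    The first two terms only touch entries of X (non-zeros in a sparse layout);
    the last term uses the two small tsmm results t(U)%*%U and t(V)%*%V."""
    UVt = matmul(U, transpose(V))
    t1 = sum(x * x for row in X for x in row)
    t2 = 2 * sum(X[i][j] * UVt[i][j]
                 for i in range(len(X)) for j in range(len(X[0])))
    UtU = matmul(transpose(U), U)
    VtV = matmul(transpose(V), V)
    t3 = sum(UtU[i][j] * VtV[i][j]
             for i in range(len(UtU)) for j in range(len(UtU[0])))
    return t1 - t2 + t3
```

Both forms agree on any factorization; the rewritten form only needs the non-zeros of X plus the two rank-by-rank tsmm matrices.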



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1756) Potential infinite recursion in Explain#explain(DMLProgram, Program, ExplainType)

2017-07-11 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083416#comment-16083416
 ] 

Matthias Boehm commented on SYSTEMML-1756:
--

[~tedyu] if it's not too much to ask could you please create a PR for this? 

> Potential infinite recursion in Explain#explain(DMLProgram, Program, 
> ExplainType)
> -
>
> Key: SYSTEMML-1756
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1756
> Project: SystemML
>  Issue Type: Bug
>Reporter: Ted Yu
>
> Here is related code:
> {code}
> public static String explain(DMLProgram prog, Program rtprog, 
> ExplainType type)
> throws HopsException, DMLRuntimeException, LanguageException {
> return explain(prog, rtprog, type);
> {code}
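For reference, delegating to oneself with unchanged arguments can never reach a different overload and recurses until the stack overflows; a minimal pure-Python illustration of this defect pattern (not the SystemML code):

```python
def explain(prog, rtprog, explain_type):
    # Defect pattern: "delegating" to itself with identical arguments.
    # No base case and no different overload is ever reached.
    return explain(prog, rtprog, explain_type)

def call_safely():
    """Invoke the broken delegation and report whether the stack overflowed."""
    try:
        explain("prog", "rtprog", "HOPS")
        return "returned"
    except RecursionError:
        return "stack overflow"
```

In Java the equivalent call overflows the JVM stack with a StackOverflowError; the fix is to delegate to the intended overload with a different argument list.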



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1759) make UDFs callable from expressions

2017-07-11 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083412#comment-16083412
 ] 

Matthias Boehm commented on SYSTEMML-1759:
--

yes, this should definitely be addressed; the same issue has been raised here: 
https://www.mail-archive.com/dev@systemml.apache.org/msg00095.html

> make UDFs callable from expressions
> ---
>
> Key: SYSTEMML-1759
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1759
> Project: SystemML
>  Issue Type: Improvement
>  Components: Parser
>Affects Versions: SystemML 1.0
> Environment: IBM DataScience Experience
>Reporter: Romeo Kienzer
>Priority: Minor
>
> The SystemML parser stops with the exception:
> only builtin functions allowed as part of
> expression
> But the following re-write could be automated during parsing
> delta3 = -(Y-yHat) * sigmoidPrime(z3)
> =>
> smp = sigmoidPrime(z3)
> delta3 = -(Y-yHat) * smp
> Please consider the following code:
> script = """
> 
> #
> sigmoid = function(matrix[double] z) return (matrix[double] z) {
> z = 1/(1+exp(-z))
> }
> 
> 
> sigmoidPrime = function(matrix[double] z) return (matrix[double] z) {
> #Gradient of sigmoid
> z = exp(-z)/(1+exp(-z))
> }
> 
> X=matrix("3 5 5 1 10 2", rows=3, cols=2) 
> inputLayerSize = 2
> outputLayerSize = 1
> hiddenLayerSize = 3
> 
> W1 = rand(rows=inputLayerSize,cols=hiddenLayerSize)
> W2 = rand(rows=hiddenLayerSize,cols=outputLayerSize)
> 
> feedForward = function (matrix[double] X,
> matrix[double] W1,
> matrix[double] W2) return (matrix[double] 
> z2,matrix[double] z3,matrix[double] Y) {
> z2 =  X %*% W1
> a2 =  sigmoid(z2)
> z3 = (a2 %*% W2)
> Y = sigmoid(z3)
> }
> 
> 
> gradient = function(matrix[double] X,
> matrix[double] W1,
> matrix[double] W2,
> matrix[double] Y) return (matrix[double] 
> dJdW1,matrix[double] dJdW1) {
> #Compute derivative with respect to W and W2 for a given X and y:
> [z2,z3,Yhat] = feedForward(X,W1,W2)
>
> delta3 = -(Y-yHat) * sigmoidPrime(z3)
> dJdW2 = t(a2) %*% delta3
> 
> delta2 = (delta3 %*% t(W2))*sigmoidPrime(z2)
> dJdW1 = t(X) %*% delta2  
> }
> 
> [z2,z3,Yhat]=feedForward(X,W1,W2)
> nrx = nrow(X)
> ncx = ncol(X)
> nrw1 = nrow(W1)
> ncw1 = ncol(W1)
> """
> I'm getting a parser exception saying that
> Caused by: org.apache.sysml.parser.ParseException: 
> -- The following 
> 2 parse issues were encountered:
> 1 [line 39:25] [Validation error] -> delta3 = -(Y-yHat) * 
> sigmoidPrime(z3)only builtin functions allowed as part of
> expression
> 2 [line 42:32] [Validation error] -> delta2 = (delta3 %*% > 
> t(W2))*sigmoidPrime(z2)only builtin functions allowed as part of
> expression



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1761) Sparsity-exploiting weighted squared loss w/o weights

2017-07-11 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1761:
-
Description: 
There are existing rewrites and fused operators for weighted squared loss 
(wsloss). However, for the wsloss type {{NONE}}, i.e., without weights 
{{sum((X-(U%*%t(V)))^2)}}, the implementation is not sparsity-exploiting 
leading to huge (unnecessary) computation overhead for the outer-product-like 
multiply of factors. As it turns out, this expression can be rewritten into a 
sparsity-exploiting form as follows:
{code}
sum ((X - U %*% t(V)) ^ 2)
-> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((U%*%t(V))^2)
-> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((t(U) %*% U) * (t(V) %*% V))
{code}

This task aims to change the block-level wsloss NONE implementation to exploit 
this logical rewrite by computing {{sum(X^2) - sum(2 * (X * (U%*%t(V))))}} in a 
sparsity-exploiting pass over non-zeros in X and a subsequent correction for 
{{+ sum ((t(U) %*% U) * (t(V) %*% V))}} via two tsmm operations.

  was:
There are existing rewrites and fused operators for weighted squared loss 
(wsloss). However, for the wsloss type {{NONE}}, i.e., without weights 
{{sum((X-(U%*%t(V)))^2)}}, the implementation is not sparsity-exploiting 
leading to huge (unnecessary) computation overhead for the outer-product-like 
multiply of factors. As it turns out, this expression can be rewritten into a 
sparsity-exploiting form as follows:
{code}
sum ((X - U %*% t(V)) ^ 2)
-> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((U%*%t(V))^2)
-> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((t(U) %*% U) * (t(V) %*% V))
{code}

This task aims to change the block-level wsloss NONE implementation to exploit 
this logical rewrite by computing {{sum(X^2) - sum(2 * (X * (U%*%t(V))))}} in a 
sparsity-exploiting pass over non-zeros in X and a subsequent correct for {{+ 
sum ((t(U) %*% U) * (t(V) %*% V))}} via two tsmm operations.


> Sparsity-exploiting weighted squared loss w/o weights
> -
>
> Key: SYSTEMML-1761
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1761
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>
> There are existing rewrites and fused operators for weighted squared loss 
> (wsloss). However, for the wsloss type {{NONE}}, i.e., without weights 
> {{sum((X-(U%*%t(V)))^2)}}, the implementation is not sparsity-exploiting 
> leading to huge (unnecessary) computation overhead for the outer-product-like 
> multiply of factors. As it turns out, this expression can be rewritten into a 
> sparsity-exploiting form as follows:
> {code}
> sum ((X - U %*% t(V)) ^ 2)
> -> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((U%*%t(V))^2)
> -> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((t(U) %*% U) * (t(V) %*% V))
> {code}
> This task aims to change the block-level wsloss NONE implementation to 
> exploit this logical rewrite by computing {{sum(X^2) - sum(2 * (X * 
> (U%*%t(V))))}} in a sparsity-exploiting pass over non-zeros in X and a 
> subsequent correction for {{+ sum ((t(U) %*% U) * (t(V) %*% V))}} via two 
> tsmm operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1761) Sparsity-exploiting weighted squared loss w/o weights

2017-07-11 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1761:
-
Description: 
There are existing rewrites and fused operators for weighted squared loss 
(wsloss). However, for the wsloss type {{NONE}}, i.e., without weights 
{{sum((X-(U%*%t(V)))^2)}}, the implementation is not sparsity-exploiting 
leading to huge (unnecessary) computation overhead for the outer-product-like 
multiply of factors. As it turns out, this expression can be rewritten into a 
sparsity-exploiting form as follows:
{code}
sum ((X - U %*% t(V)) ^ 2)
-> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((U%*%t(V))^2)
-> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((t(U) %*% U) * (t(V) %*% V))
{code}

This task aims to change the block-level wsloss NONE implementation to exploit 
this logical rewrite by computing {{sum(X^2) - sum(2 * (X * (U%*%t(V))))}} in a 
sparsity-exploiting pass over non-zeros in X and a subsequent correct for {{+ 
sum ((t(U) %*% U) * (t(V) %*% V))}} via two tsmm operations.

  was:
There are existing rewrites and fused operators for weighted squared loss 
(wsloss). However, for the wsloss type {{NONE}}, i.e., without weights 
{{sum((X-(U%*%t(V)))^2)}}, the implementation is not sparsity-exploiting 
leading huge (unnecessary) computation overhead. As it turns out this 
expression can be rewritten into a sparsity-exploiting form as follows:
{code}
sum ((X - U %*% t(V)) ^ 2)
-> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((U%*%t(V))^2)
-> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((t(U) %*% U) * (t(V) %*% V))
{code}

This task aims to change the block-level wsloss NONE implementation to exploit 
this logical rewrite by computing {{sum(X^2) - sum(2 * (X * (U%*%t(V))))}} in a 
sparsity-exploiting pass over non-zeros in X and a subsequent correct for {{+ 
sum ((t(U) %*% U) * (t(V) %*% V))}} via two tsmm operations.


> Sparsity-exploiting weighted squared loss w/o weights
> -
>
> Key: SYSTEMML-1761
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1761
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>
> There are existing rewrites and fused operators for weighted squared loss 
> (wsloss). However, for the wsloss type {{NONE}}, i.e., without weights 
> {{sum((X-(U%*%t(V)))^2)}}, the implementation is not sparsity-exploiting 
> leading to huge (unnecessary) computation overhead for the outer-product-like 
> multiply of factors. As it turns out, this expression can be rewritten into a 
> sparsity-exploiting form as follows:
> {code}
> sum ((X - U %*% t(V)) ^ 2)
> -> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((U%*%t(V))^2)
> -> sum(X^2) - sum(2 * (X * (U%*%t(V)))) + sum((t(U) %*% U) * (t(V) %*% V))
> {code}
> This task aims to change the block-level wsloss NONE implementation to 
> exploit this logical rewrite by computing {{sum(X^2) - sum(2 * (X * 
> (U%*%t(V))))}} in a sparsity-exploiting pass over non-zeros in X and a 
> subsequent correct for {{+ sum ((t(U) %*% U) * (t(V) %*% V))}} via two tsmm 
> operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1757) Optional selective function recompilation in JMLC

2017-07-10 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1757:


 Summary: Optional selective function recompilation in JMLC
 Key: SYSTEMML-1757
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1757
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1752) Cache-conscious mmchain matrix multiply for wide matrices

2017-07-08 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1752:
-
Description: 
The fused mmchain matrix multiply for patterns such as {{t(X) %*% (w * (X %*% 
v)) }} uses row-wise {{dotProduct}} and {{vectMultAdd}} operations, which works 
very well for the common case of tall matrices where individual rows fit 
into L1 cache. However, for graph and text scenarios with wide matrices this 
leads to cache thrashing on the input and output vectors.

This task aims to generalize these dense and sparse operations to perform the 
computation in a cache-conscious manner when necessary, by accessing fragments 
of the input and output vector for groups of rows. For dense this is trivial to 
realize while for sparse it requires a careful determination of the block sizes 
according to the input sparsity. 
 Issue Type: Task  (was: Bug)

> Cache-conscious mmchain matrix multiply for wide matrices
> -
>
> Key: SYSTEMML-1752
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1752
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>
> The fused mmchain matrix multiply for patterns such as {{t(X) %*% (w * (X %*% 
> v)) }} uses row-wise {{dotProduct}} and {{vectMultAdd}} operations, which 
> works very well for the common case of tall matrices where individual 
> rows fit into L1 cache. However, for graph and text scenarios with wide 
> matrices this leads to cache thrashing on the input and output vectors.
> This task aims to generalize these dense and sparse operations to perform the 
> computation in a cache-conscious manner when necessary, by accessing 
> fragments of the input and output vector for groups of rows. For dense this 
> is trivial to realize while for sparse it requires a careful determination of 
> the block sizes according to the input sparsity. 
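The intended generalization can be sketched in pure Python: rows are processed in groups, and the length-n vectors v and out are accessed fragment by fragment so that only a cache-sized piece is live at a time (illustrative sketch with hypothetical block sizes, not the SystemML kernel):

```python
def mmchain(X, w, v):
    """Reference: out = t(X) %*% (w * (X %*% v)), row-wise dot products."""
    m, n = len(X), len(X[0])
    out = [0.0] * n
    for i in range(m):
        q = w[i] * sum(X[i][j] * v[j] for j in range(n))  # dotProduct over full row
        for j in range(n):                                 # vectMultAdd over full row
            out[j] += q * X[i][j]
    return out

def mmchain_blocked(X, w, v, bcol=256, brow=64):
    """Cache-conscious variant: for each group of rows, sweep column fragments
    twice, once to accumulate the dot products and once to scatter into out,
    so only a fragment of v / out is touched at a time."""
    m, n = len(X), len(X[0])
    out = [0.0] * n
    for i0 in range(0, m, brow):
        rows = range(i0, min(i0 + brow, m))
        q = [0.0] * len(rows)
        for j0 in range(0, n, bcol):          # pass 1: fragment-wise dot products
            for r, i in enumerate(rows):
                q[r] += sum(X[i][j] * v[j] for j in range(j0, min(j0 + bcol, n)))
        for r, i in enumerate(rows):
            q[r] *= w[i]
        for j0 in range(0, n, bcol):          # pass 2: fragment-wise multAdd
            for r, i in enumerate(rows):
                for j in range(j0, min(j0 + bcol, n)):
                    out[j] += q[r] * X[i][j]
    return out
```

Both variants compute the same result; the blocked one confines each inner sweep to a fragment of v and out, which is the property that avoids thrashing for very wide rows.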



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-888) Add PNMF algorithm to SystemML

2017-07-08 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079357#comment-16079357
 ] 

Matthias Boehm commented on SYSTEMML-888:
-

a version of PNMF.dml is in {{scripts/staging/}} and the related R script can 
be found in {{src/test/scripts/functions/codegen}}.

> Add PNMF algorithm to SystemML
> --
>
> Key: SYSTEMML-888
> URL: https://issues.apache.org/jira/browse/SYSTEMML-888
> Project: SystemML
>  Issue Type: Task
>  Components: Algorithms
>Reporter: Deron Eriksson
>Assignee: Janardhan
>
> Add the Poisson Nonnegative Matrix Factorization algorithm to the SystemML 
> algorithms.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1754) Performance removeEmpty w/ shallow copy if unmodified

2017-07-08 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1754:


 Summary: Performance removeEmpty w/ shallow copy if unmodified
 Key: SYSTEMML-1754
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1754
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm


Often removeEmpty is used to guard against special cases with entirely empty 
rows or columns. In case of no removed rows or columns the full copy into the 
output is unnecessarily inefficient. This task aims to modify removeEmpty 
rows/cols for both dense and sparse matrices to leverage a shallow copy if 
the determined output dimensions match the input dimensions.
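The intended short-cut can be sketched as follows: first determine the surviving rows, and if none were removed, return the input itself instead of materializing a copy (pure-Python illustration for the rows case, not the SystemML code):

```python
def remove_empty_rows(X):
    """Drop all-zero rows; return X itself (shallow) when nothing is removed."""
    keep = [i for i, row in enumerate(X) if any(val != 0 for val in row)]
    if len(keep) == len(X):      # no empty rows: avoid the full copy
        return X
    return [X[i] for i in keep]
```

In the common guard-only case (no empty rows), the function is a metadata check plus a reference return, rather than an O(nnz) copy.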



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1768) Cleanup SystemML-config.xml

2017-07-12 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1768:


 Summary: Cleanup SystemML-config.xml
 Key: SYSTEMML-1768
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1768
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm


cp.parallel.matrixmult -> cp.parallel.ops
cp.parallel.textio -> cp.parallel.io



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1426) Rename builtin function ceil to ceiling

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1426:
-
Labels: beginner  (was: )

> Rename builtin function ceil to ceiling
> ---
>
> Key: SYSTEMML-1426
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1426
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Compiler, Runtime
>Reporter: Matthias Boehm
>  Labels: beginner
> Fix For: SystemML 1.0
>
>
> The builtin function ceil unnecessarily differs from R's ceiling, which might 
> cause confusion. Hence, this task aims to rename ceil to ceiling.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1762) Improve the matrix reshape function for the Spark mode

2017-07-13 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085249#comment-16085249
 ] 

Matthias Boehm commented on SYSTEMML-1762:
--

thanks [~Tenma] for catching this issue. As it turned out, this issue occurs in 
the special case if, for a given input block, we create at least three output 
blocks and the first and last output block have the same row index. For 
example, if we have an output matrix of 13 column blocks and we computed (1,12) 
and (1,1) as the first and last output block index, we missed the middle index 
(1,13).

> Improve the matrix reshape function for the Spark mode
> --
>
> Key: SYSTEMML-1762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1762
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, ParFor, Runtime
>Reporter: Fei Hu
>Assignee: Fei Hu
> Attachments: MNIST_Distrib_Sgd.scala
>
>
> When running the [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  it works well in the hybrid mode. But in the Spark mode, there are some 
> errors about
> {{java.lang.NullPointerException}}  and 
> {{java.lang.ArrayIndexOutOfBoundsException: 1000}} when reshaping the matrix. 
> The involved functions are 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeSparse}} and 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#reshapeDense}}. The 
> reason is that the output matrix index computed by 
> {{org.apache.sysml.runtime.matrix.data.LibMatrixReorg#computeResultBlockIndex}}
>  does not match the keys in the {{HashMap rix}}. 
> To reproduce the error, the attached scala file {{MNIST_Distrib_Sgd.scala}} 
> could be used to run the distributed MNIST example.  
> In addition, if some code is added to ignore the null output matrix block from 
> {{MatrixBlock out = rix.get(ixtmp)}}, the distributed MNIST example can run in 
> the Spark mode, but the result may not be correct. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1755) Failed instruction generation during dynamic recompilation

2017-07-14 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1755.


> Failed instruction generation during dynamic recompilation
> --
>
> Key: SYSTEMML-1755
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1755
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> {code}
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unable to recompile 
> program block.
> at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:159)
> at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
> ... 13 more
> Caused by: java.lang.NullPointerException
> at org.apache.sysml.lops.BinaryScalar.getOpcode(BinaryScalar.java:119)
> at 
> org.apache.sysml.lops.BinaryScalar.getInstructions(BinaryScalar.java:84)
> at 
> org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1405)
> at org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1175)
> at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:269)
> at 
> org.apache.sysml.hops.recompile.Recompiler.recompileHopsDag(Recompiler.java:240)
> at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:147)
> ... 14 more
> {code}
> The root cause was a simplification rewrite for binary matrix-scalar 
> operations which did not account for unsupported scalar operations such as 
> {{OpOp2.QUANTILE, OpOp2.CENTRALMOMENT, OpOp2.MINUS1_MULT, OpOp2.MINUS_NZ, 
> OpOp2.LOG_NZ}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1755) Failed instruction generation during dynamic recompilation

2017-07-14 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1755.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Failed instruction generation during dynamic recompilation
> --
>
> Key: SYSTEMML-1755
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1755
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> {code}
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unable to recompile 
> program block.
> at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:159)
> at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
> ... 13 more
> Caused by: java.lang.NullPointerException
> at org.apache.sysml.lops.BinaryScalar.getOpcode(BinaryScalar.java:119)
> at 
> org.apache.sysml.lops.BinaryScalar.getInstructions(BinaryScalar.java:84)
> at 
> org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1405)
> at org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1175)
> at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:269)
> at 
> org.apache.sysml.hops.recompile.Recompiler.recompileHopsDag(Recompiler.java:240)
> at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:147)
> ... 14 more
> {code}
> The root cause was a simplification rewrite for binary matrix-scalar 
> operations which did not account for unsupported scalar operations such as 
> {{OpOp2.QUANTILE, OpOp2.CENTRALMOMENT, OpOp2.MINUS1_MULT, OpOp2.MINUS_NZ, 
> OpOp2.LOG_NZ}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1768) Cleanup SystemML-config.xml

2017-07-14 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1768.


> Cleanup SystemML-config.xml
> ---
>
> Key: SYSTEMML-1768
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1768
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>  Labels: beginner
> Fix For: SystemML 1.0
>
>
> cp.parallel.matrixmult -> cp.parallel.ops
> cp.parallel.textio -> cp.parallel.io



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1765) Reading of dml scripts from object stores (main, mlcontext)

2017-07-14 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1765.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Reading of dml scripts from object stores (main, mlcontext)
> ---
>
> Key: SYSTEMML-1765
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1765
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1768) Cleanup SystemML-config.xml

2017-07-14 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1768.
--
Resolution: Fixed
  Assignee: Matthias Boehm

> Cleanup SystemML-config.xml
> ---
>
> Key: SYSTEMML-1768
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1768
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>  Labels: beginner
> Fix For: SystemML 1.0
>
>
> cp.parallel.matrixmult -> cp.parallel.ops
> cp.parallel.textio -> cp.parallel.io



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1767) Performance issues codegen rowwise (column aggregation) w/ wide matrices

2017-07-14 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1767.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Performance issues codegen rowwise (column aggregation) w/ wide matrices
> 
>
> Key: SYSTEMML-1767
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1767
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> On scenarios with wide matrices of millions of features, the codegen rowwise 
> template shows performance issues due to unnecessary multi-threading, which 
> requires additional memory per thread for partial aggregation and leads to 
> cache thrashing. Similar to the mmchain operator, we should establish a 
> threshold for maximum temporary results and fall back to sequential 
> operations if this threshold is exceeded.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1769) Potential null dereference in PreparedScript#enableFunctionRecompile

2017-07-14 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088339#comment-16088339
 ] 

Matthias Boehm commented on SYSTEMML-1769:
--

thanks for catching this [~tedyu] - may I ask which automated tool you're using 
to find these implementation issues?

> Potential null dereference in PreparedScript#enableFunctionRecompile
> 
>
> Key: SYSTEMML-1769
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1769
> Project: SystemML
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
> FunctionCallGraph fgraph = _prog.getProgramBlocks().isEmpty() ? null :
>   new 
> FunctionCallGraph(_prog.getProgramBlocks().get(0).getStatementBlock().getDMLProg());
> ...
>   if( !fgraph.isRecursiveFunction(fkey) ) {
> {code}
> The assignment indicates that fgraph may be null.
> In the for loop, we should check fgraph against null before dereferencing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1772) Perftest: MultiLogReg 100M x 1K, sparse fails with OOM

2017-07-14 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1772:


 Summary: Perftest: MultiLogReg 100M x 1K, sparse fails with OOM
 Key: SYSTEMML-1772
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1772
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


Our perftest MultiLogReg 100M x 1K, sparse fails with the following OOM when 
run with a 20GB driver budget. 

{code}
java.lang.OutOfMemoryError: GC overhead limit exceeded
17/07/14 13:42:04 WARN hdfs.BlockReaderFactory: I/O error constructing remote 
block reader.
java.io.EOFException: Premature EOF: no length prefix available
at 
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:423)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at 
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:673)
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
at 
org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1829)
at 
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1843)
at 
org.apache.sysml.runtime.io.ReaderBinaryBlockParallel$ReadFileTask.call(ReaderBinaryBlockParallel.java:150)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
{code}

Thanks for catching this issue [~acs_s]. The root cause can be seen in the 
following HOP characteristics and the generated runtime plan, which contains a 
CP mmchain operation for hop 456:

{code}
17/07/14 16:48:50 INFO recompile.Recompiler: EXPLAIN RECOMPILE 
GENERIC (lines 207-208):
--(432) TRead X [1,1000,1000,1000,78303] [0,0,23270 -> 23270MB], 
SPARK
--(439) r(t) (432) [1000,1,1000,1000,78303] [23270,0,11444 -> 
34714MB], SPARK
--(431) TRead P [1,2,1000,1000,2] [0,0,1526 -> 1526MB], CP
--(436) rix (431) [1,1,1000,1000,-1] [1526,0,763 -> 2289MB], CP
--(1276) u(sprop) (436) [1,1,1000,1000,-1] [763,0,763 -> 1526MB], CP
--(429) TRead ssX_V [1000,1,1000,1000,1000] [0,0,0 -> 0MB], CP
--(437) ba(+*) (432,429) [1,1,1000,1000,-1] [23270,0,763 -> 24033MB], 
SPARK
--(1275) b(*) (1276,437) [1,1,1000,1000,-1] [1526,0,763 -> 2289MB], CP
--(456) ba(+*) (439,1275) [1000,1,1000,1000,-1] [12207,0,0 -> 12207MB], CP
--(457) TWrite HV (456) [1000,1,1000,1000,-1] [0,0,0 -> 0MB], CP
{code}

The final matrix multiplication for {{t(X) tmp}} fits in CP and satisfies the 
mmchain pattern. However, mmchain avoids the transpose (assuming that X must fit 
into memory given that t(X) fits in memory). Given our MCSR and CSR 
representations, this is not necessarily true because each row carries a 
certain sparse-row overhead independent of its number of non-zeros.

We should consider this scenario during execution type selection and send the 
entire pattern to SPARK in these cases, which is anyway a good idea because the 
first matrix multiplication is already in SPARK. If the additional broadcast 
and blocksize constraints are met, we compile a SPARK mmchain; otherwise, two 
subsequent SPARK matrix multiplications.
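The sparse-row overhead argument can be made concrete with a back-of-the-envelope estimate. This is only an illustrative sketch: the class name and the per-row/per-entry constants below are assumptions for the sake of the arithmetic, not SystemML's actual MCSR bookkeeping.

```java
// Hypothetical memory estimate for an MCSR-like sparse matrix: every
// allocated row carries a fixed header overhead, independent of its
// non-zeros, so a tall X can blow the budget even though t(X) (same
// non-zeros, far fewer rows) fits comfortably.
public class SparseRowOverhead {
    // assumed constants: 16 bytes of per-row object overhead,
    // 12 bytes per non-zero (8-byte value + 4-byte column index)
    static final long ROW_OVERHEAD = 16;
    static final long BYTES_PER_NNZ = 12;

    public static long estimateMcsrBytes(long rows, long nnz) {
        return rows * ROW_OVERHEAD + nnz * BYTES_PER_NNZ;
    }

    public static void main(String[] args) {
        long nnz = 100_000_000L;
        // tall-skinny X vs. its short-wide transpose, same non-zeros
        long tall = estimateMcsrBytes(1_000_000_000L, nnz);
        long wide = estimateMcsrBytes(1_000L, nnz);
        System.out.println("X    ~ " + (tall >> 20) + " MB");
        System.out.println("t(X) ~ " + (wide >> 20) + " MB");
    }
}
```

Under these assumed constants, the per-row overhead dominates for the tall matrix, so "t(X) fits in memory" does not imply "X fits in memory".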



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1593) Performance issues rexpand to ultra-sparse matrix

2017-07-14 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1593.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Performance issues rexpand to ultra-sparse matrix
> -
>
> Key: SYSTEMML-1593
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1593
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> For a detailed description see 
> https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01741.html
> The issue is caused by (1) wrong input partitioning (a small vector input to 
> a huge output leverages only a small degree of parallelism), and (2) 
> unnecessary shuffle. 





[jira] [Resolved] (SYSTEMML-1772) Perftest: MultiLogReg 100M x 1K, sparse fails with OOM

2017-07-14 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1772.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Perftest: MultiLogReg 100M x 1K, sparse fails with OOM
> --
>
> Key: SYSTEMML-1772
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1772
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Our perftest MultiLogReg 100M x 1K, sparse fails with the following OOM when 
> run with a 20GB driver budget. 
> {code}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 17/07/14 13:42:04 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> java.io.EOFException: Premature EOF: no length prefix available
>   at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
>   at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:423)
>   at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
>   at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
>   at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
>   at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:673)
>   at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>   at java.io.DataInputStream.readFully(DataInputStream.java:195)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
>   at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
>   at org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1829)
>   at org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1843)
>   at org.apache.sysml.runtime.io.ReaderBinaryBlockParallel$ReadFileTask.call(ReaderBinaryBlockParallel.java:150)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Thanks for catching this issue, [~acs_s]. The root cause can be seen in the 
> following HOP characteristics and the generated runtime plan, which contains a 
> CP mmchain operation for hop 456:
> {code}
> 17/07/14 16:48:50 INFO recompile.Recompiler: EXPLAIN RECOMPILE 
> GENERIC (lines 207-208):
> --(432) TRead X [1,1000,1000,1000,78303] [0,0,23270 -> 23270MB], SPARK
> --(439) r(t) (432) [1000,1,1000,1000,78303] [23270,0,11444 -> 34714MB], SPARK
> --(431) TRead P [1,2,1000,1000,2] [0,0,1526 -> 1526MB], CP
> --(436) rix (431) [1,1,1000,1000,-1] [1526,0,763 -> 2289MB], CP
> --(1276) u(sprop) (436) [1,1,1000,1000,-1] [763,0,763 -> 1526MB], CP
> --(429) TRead ssX_V [1000,1,1000,1000,1000] [0,0,0 -> 0MB], CP
> --(437) ba(+*) (432,429) [1,1,1000,1000,-1] [23270,0,763 -> 24033MB], SPARK
> --(1275) b(*) (1276,437) [1,1,1000,1000,-1] [1526,0,763 -> 2289MB], CP
> --(456) ba(+*) (439,1275) [1000,1,1000,1000,-1] [12207,0,0 -> 12207MB], CP
> --(457) TWrite HV (456) [1000,1,1000,1000,-1] [0,0,0 -> 0MB], CP
> {code}
> The final matrix multiplication for {{t(X) tmp}} fits in CP and satisfies the 
> mmchain pattern. However, mmchain avoids the transpose (assuming that X must 
> fit into memory given that t(X) fits in memory). Given our MCSR and CSR 
> representations, this is not necessarily true because each row carries a 
> certain sparse-row overhead independent of its number of non-zeros.
> We should consider this scenario during execution type selection and send the 
> entire pattern to SPARK in these cases, which is anyway a good idea because 
> the first matrix multiplication is already in SPARK. If the additional 
> broadcast and blocksize constraints are met, we compile a SPARK mmchain; 
> otherwise, two subsequent SPARK matrix multiplications.





[jira] [Resolved] (SYSTEMML-1773) Improve JMLC error handling of invalid inputs

2017-07-14 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1773.
--
Resolution: Fixed
  Assignee: Matthias Boehm

> Improve JMLC error handling of invalid inputs
> -
>
> Key: SYSTEMML-1773
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1773
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> The JMLC API uses two different mechanisms for binding input parameters (aka 
> $ parameters) and input variables. We should exploit this for better error 
> handling in order to avoid silent errors if users, for example, omit the $ 
> prefix for input parameters.





[jira] [Commented] (SYSTEMML-1663) New simplification rewrite for binary multiplication chains

2017-07-15 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088495#comment-16088495
 ] 

Matthias Boehm commented on SYSTEMML-1663:
--

https://github.com/apache/systemml/commit/eca9dbbb85971af688e81c9254538c53fc429b30

> New simplification rewrite for binary multiplication chains
> ---
>
> Key: SYSTEMML-1663
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1663
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Dylan Hutchison
>  Labels: beginner
> Fix For: SystemML 1.0
>
>
> There are various scripts that use chains of binary element-wise 
> multiplications such as {{A * B * B}} or {{B * A * B}}, which are currently 
> compiled to {{(A * B) * B}} and {{(B * A) * B}}, respectively. We should 
> explicitly reason about and simplify this to expose the unary operation 
> {{B^2}}, which can be evaluated much more efficiently for both singlenode 
> and distributed operations. 
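The algebraic idea of this rewrite can be illustrated on plain arrays. This is only a sketch of the simplification itself; the class and method names below are hypothetical and do not reflect SystemML's rewrite framework.

```java
import java.util.Arrays;

// Sketch: the compiled chain (A * B) * B performs two binary
// elementwise passes, while the rewritten form A * (B^2) exposes
// the unary square of B and needs a single combined pass.
public class EwMultChainRewrite {
    // current plan: two binary elementwise multiplications
    static double[] chained(double[] a, double[] b) {
        double[] t = new double[a.length];
        for (int i = 0; i < a.length; i++)
            t[i] = a[i] * b[i];            // first pass: A * B
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++)
            r[i] = t[i] * b[i];            // second pass: (A * B) * B
        return r;
    }

    // rewritten plan: unary square fused with one binary multiply
    static double[] rewritten(double[] a, double[] b) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++)
            r[i] = a[i] * (b[i] * b[i]);   // A * (B^2)
        return r;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3}, b = {4, 5, 6};
        // both plans produce identical results
        System.out.println(Arrays.equals(chained(a, b), rewritten(a, b))); // true
    }
}
```

The rewrite is safe because elementwise multiplication is associative and commutative per cell, so the reordering cannot change results.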





[jira] [Created] (SYSTEMML-1776) Collapse subsequent assignvar and rmvar instructions

2017-07-17 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1776:


 Summary: Collapse subsequent assignvar and rmvar instructions
 Key: SYSTEMML-1776
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1776
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm








[jira] [Comment Edited] (SYSTEMML-1774) Improve Parfor parallelism for deep learning

2017-07-17 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090967#comment-16090967
 ] 

Matthias Boehm edited comment on SYSTEMML-1774 at 7/18/17 1:56 AM:
---

well, of course I'm happy to help here but let's separate the individual issues 
first.

1) NPE in ConvolutionCPInstruction: [~niketanpansare] could you please have a 
look into this issue? The compiled -1 parameter is a bit suspicious. Anyway, it 
should not throw a nullpointer. Also, why is there a 
ConvolutionUtils.scalarOperations - these convolution operations should call 
the existing scalar operations.

2) Parfor REMOTE_SPARK: Just to be clear running in spark execution mode and 
forcing REMOTE_SPARK is an invalid configuration. We have the mechanisms to 
force the recompile to CP for all instructions in the parfor body but this does 
not apply for conflicting configurations.

The real issue here is the need to force spark and/or remote_spark at all. No 
library of dml scripts should force REMOTE_SPARK (other than for testing) 
because it can create many issues such as unnecessary OOMs or 
counter-productive performance (e.g., in your configuration the driver has more 
virtual cores than your remote executor). If there are limitations of size 
propagation which prevent us from compiling this automatically if beneficial, 
we should fix the underlying root cause. [~Tenma] and [~dusenberrymw] could you 
please provide the configuration of a scenario where REMOTE_SPARK was 
beneficial but not automatically chosen, and I'll take care of it. 



was (Author: mboehm7):
well, of course I'm happy to help here but let's separate the individual issues 
first.

1) NPE in ConvolutionCPInstruction: [~niketanpansare] could you please have a 
look into this issue? The compiled -1 parameter is a bit suspicious. Anyway, it 
should not throw a nullpointer. Also, why is there a 
ConvolutionUtils.scalarOperations - these convolution operations should call 
the existing scalar operations.

2) Parfor REMOTE_SPARK: Just to be clear running in spark execution mode and 
forcing REMOTE_SPARK is an invalid configuration. We have the mechanisms to 
force the recompile to CP for all instructions in the parfor body but this does 
not apply for conflicting configurations.

The real issue here is the need to force spark and/or remote_spark at all. No 
library of dml scripts should force REMOTE_SPARK (other than for testing) 
because it can create many issues such as unnecessary OOMs or 
counter-productive performance (e.g., in your configuration the driver has more 
vcores than your remote executor). If there are limitations of size propagation 
which prevent us from compiling this automatically if beneficial, we should fix 
the underlying root cause. [~Tenma] and [~dusenberrymw] could you please 
provide the configuration of a scenario where REMOTE_SPARK was beneficial but 
not automatically chosen, and I'll take care of it. 


> Improve Parfor parallelism for deep learning
> 
>
> Key: SYSTEMML-1774
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Affects Versions: SystemML 1.0
>Reporter: Fei Hu
>  Labels: deeplearning
> Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, 
> Explain_For_Spark_Mode.txt, MNIST_Distrib_Sgd.scala, 
> mnist_lenet_distrib_sgd.dml
>
>
> When running the  [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We try to 
> force {{parfor (j in 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to be {{parfor (j in 
> 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}} use 
> {{REMOTE_SPARK}} mode, but got some errors about 
> {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
> of type other than CP instructions}} using the mode {{SPARK}}, and the error 
> {{java.lang.NullPointerException}} using the mode {{HYBRID_SPARK}}. More log 
> information can be found at the following comments. 





[jira] [Comment Edited] (SYSTEMML-1774) Improve Parfor parallelism for deep learning

2017-07-17 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090967#comment-16090967
 ] 

Matthias Boehm edited comment on SYSTEMML-1774 at 7/18/17 1:56 AM:
---

well, of course I'm happy to help here but let's separate the individual issues 
first.

1) NPE in ConvolutionCPInstruction: [~niketanpansare] could you please have a 
look into this issue? The compiled -1 parameter is a bit suspicious. Anyway, it 
should not throw a nullpointer. Also, why is there a 
ConvolutionUtils.scalarOperations - these convolution operations should call 
the existing scalar operations.

2) Parfor REMOTE_SPARK: Just to be clear running in spark execution mode and 
forcing REMOTE_SPARK is an invalid configuration. We have the mechanisms to 
force the recompile to CP for all instructions in the parfor body but this does 
not apply for conflicting configurations.

The real issue here is the need to force spark and/or remote_spark at all. No 
library of dml scripts should force REMOTE_SPARK (other than for testing) 
because it can create many issues such as unnecessary OOMs or 
counter-productive performance (e.g., in your configuration the driver has more 
vcores than your remote executor). If there are limitations of size propagation 
which prevent us from compiling this automatically if beneficial, we should fix 
the underlying root cause. [~Tenma] and [~dusenberrymw] could you please 
provide the configuration of a scenario where REMOTE_SPARK was beneficial but 
not automatically chosen, and I'll take care of it. 



was (Author: mboehm7):
well, of course I'm happy to help here but let's separate the individual issues 
first.

1) NPE in ConvolutionCPInstruction: [~niketanpansare] could you please have a 
look into this issue? The compiled -1 parameter is a bit suspicious. Anyway, it 
should not throw a nullpointer. Also, why is there a 
ConvolutionUtils.scalarOperations - these convolution operations should call 
the existing scalar operations.

2) Parfor REMOTE_SPARK: Just to be clear running in spark execution mode and 
forcing REMOTE_SPARK is an invalid configuration. We have the mechanisms to 
force the recompile to CP for all instructions in the parfor body but this does 
not apply for conflicting configurations.

The real issue here is the need to force spark and/or remote_spark at all. No 
library of dml scripts should force REMOTE_SPARK (other than for testing) 
because it can create many issues such as unnecessary OOMs. If there are 
limitations of size propagation which prevent us from compiling this 
automatically if beneficial, we should fix the underlying root cause. [~Tenma] 
and [~dusenberrymw] could you please provide the configuration of a scenario 
where REMOTE_SPARK was beneficial but not automatically chosen, and I'll take 
care of it. 


> Improve Parfor parallelism for deep learning
> 
>
> Key: SYSTEMML-1774
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Affects Versions: SystemML 1.0
>Reporter: Fei Hu
>  Labels: deeplearning
> Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, 
> Explain_For_Spark_Mode.txt, MNIST_Distrib_Sgd.scala, 
> mnist_lenet_distrib_sgd.dml
>
>
> When running the  [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We try to 
> force {{parfor (j in 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to be {{parfor (j in 
> 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}} use 
> {{REMOTE_SPARK}} mode, but got some errors about 
> {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
> of type other than CP instructions}} using the mode {{SPARK}}, and the error 
> {{java.lang.NullPointerException}} using the mode {{HYBRID_SPARK}}. More log 
> information can be found at the following comments. 





[jira] [Closed] (SYSTEMML-1753) OOM on parfor local in-memory result merge

2017-07-09 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1753.


> OOM on parfor local in-memory result merge
> --
>
> Key: SYSTEMML-1753
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1753
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Consider a scenario with a relatively large parfor result variable (e.g., many 
> iterations with a relatively large output vector each). There are conditions 
> under which the parfor local in-memory result merge runs unnecessarily out of 
> memory (as shown below) due to allocating the result in sparse format, 
> collecting all (sparse) outputs, and finally converting this result to dense. 
> For result merge without compare, the target number of non-zeros is exactly 
> known, allowing us to directly allocate the result in the correct format, 
> which reduces memory pressure by more than 2x (a dense matrix held in sparse 
> format consumes more than twice the memory).
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:362)
> at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.sparseToDense(MatrixBlock.java:1136)
> at 
> org.apache.sysml.runtime.matrix.data.MatrixBlock.examSparsity(MatrixBlock.java:1019)
> at 
> org.apache.sysml.runtime.controlprogram.parfor.ResultMergeLocalMemory.executeSerialMerge(ResultMergeLocalMemory.java:114)
> at 
> org.apache.sysml.runtime.controlprogram.ParForProgramBlock.consolidateAndCheckResults(ParForProgramBlock.java:1751)
> at 
> org.apache.sysml.runtime.controlprogram.ParForProgramBlock.executeLocalParFor(ParForProgramBlock.java:814)
> at 
> org.apache.sysml.runtime.controlprogram.ParForProgramBlock.execute(ParForProgramBlock.java:635)
> at 
> org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
> {code}
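The fix described above amounts to picking the output format up front once the non-zero count is known. The following is a minimal sketch of that decision; the class name and the sparsity threshold are assumptions, not SystemML's actual cutoff.

```java
// Sketch: for result merge without a compare matrix, the exact
// output non-zero count is known in advance, so the target format
// (dense vs. sparse) can be chosen directly instead of collecting
// in sparse and converting to dense afterwards.
public class ResultFormatChooser {
    static final double SPARSE_THRESHOLD = 0.4; // assumed cutoff

    public static boolean allocateSparse(long rows, long cols, long nnz) {
        double sparsity = (double) nnz / ((double) rows * cols);
        return sparsity < SPARSE_THRESHOLD;
    }

    public static void main(String[] args) {
        // a fully dense 10K x 1K merge result goes straight to dense
        System.out.println(allocateSparse(10_000, 1_000, 10_000_000L)); // false
    }
}
```

Allocating dense directly avoids ever holding both the sparse intermediate and the dense result at the same time, which is where the >2x memory pressure came from.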





[jira] [Resolved] (SYSTEMML-1754) Performance removeEmpty w/ shallow copy if unmodified

2017-07-09 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1754.
--
   Resolution: Done
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Performance removeEmpty w/ shallow copy if unmodified
> -
>
> Key: SYSTEMML-1754
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1754
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Often removeEmpty is used to guard against special cases with entirely empty 
> rows or columns. In the case of no removed rows or columns, the full copy into 
> the output is unnecessarily inefficient. This task aims to modify removeEmpty 
> rows/cols for both dense and sparse matrices to leverage a shallow copy if the 
> determined output dimensions match the input dimensions.
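The shallow-copy fast path can be sketched on a plain row-major array. This is an illustrative sketch only; the class and method names are hypothetical and the real implementation operates on matrix blocks in both dense and sparse formats.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: remove empty rows, but if no row turns out to be empty,
// return the input itself (shallow copy) instead of materializing
// a full deep copy of identical content.
public class RemoveEmptyRows {
    public static double[][] removeEmptyRows(double[][] in) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : in) {
            boolean empty = true;
            for (double v : row)
                if (v != 0) { empty = false; break; }
            if (!empty)
                kept.add(row);
        }
        if (kept.size() == in.length)
            return in; // output dims match input dims: shallow copy
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] m = { {1, 0}, {0, 2} };
        // no empty rows, so the very same object comes back
        System.out.println(removeEmptyRows(m) == m); // true
    }
}
```

The reference-equality check in `main` shows what "shallow copy" buys here: the common guard-only case becomes O(nnz) scanning with zero allocation.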





[jira] [Updated] (SYSTEMML-1755) Failed instruction generation during dynamic recompilation

2017-07-09 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1755:
-
Description: 
{code}
Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unable to recompile 
program block.
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:159)
at 
org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
... 13 more
Caused by: java.lang.NullPointerException
at org.apache.sysml.lops.BinaryScalar.getOpcode(BinaryScalar.java:119)
at 
org.apache.sysml.lops.BinaryScalar.getInstructions(BinaryScalar.java:84)
at 
org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1405)
at org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1175)
at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:269)
at 
org.apache.sysml.hops.recompile.Recompiler.recompileHopsDag(Recompiler.java:240)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:147)
... 14 more
{code}

The root cause was a simplification rewrite for binary matrix-scalar operations 
which did not account for unsupported scalar operations such as 
{{OpOp2.QUANTILE, OpOp2.CENTRALMOMENT, OpOp2.MINUS1_MULT, OpOp2.MINUS_NZ, 
OpOp2.LOG_NZ}}.

  was:
{code}
Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unable to recompile 
program block.
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:159)
at 
org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
... 13 more
Caused by: java.lang.NullPointerException
at org.apache.sysml.lops.BinaryScalar.getOpcode(BinaryScalar.java:119)
at 
org.apache.sysml.lops.BinaryScalar.getInstructions(BinaryScalar.java:84)
at 
org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1405)
at org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1175)
at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:269)
at 
org.apache.sysml.hops.recompile.Recompiler.recompileHopsDag(Recompiler.java:240)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:147)
... 14 more

{code}


> Failed instruction generation during dynamic recompilation
> --
>
> Key: SYSTEMML-1755
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1755
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>
> {code}
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unable to recompile 
> program block.
> at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:159)
> at 
> org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:115)
> ... 13 more
> Caused by: java.lang.NullPointerException
> at org.apache.sysml.lops.BinaryScalar.getOpcode(BinaryScalar.java:119)
> at 
> org.apache.sysml.lops.BinaryScalar.getInstructions(BinaryScalar.java:84)
> at 
> org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1405)
> at org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1175)
> at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:269)
> at 
> org.apache.sysml.hops.recompile.Recompiler.recompileHopsDag(Recompiler.java:240)
> at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:147)
> ... 14 more
> {code}
> The root cause was a simplification rewrite for binary matrix-scalar 
> operations which did not account for unsupported scalar operations such as 
> {{OpOp2.QUANTILE, OpOp2.CENTRALMOMENT, OpOp2.MINUS1_MULT, OpOp2.MINUS_NZ, 
> OpOp2.LOG_NZ}}.





[jira] [Created] (SYSTEMML-1792) Performance issue sparse-dense matrix multiply

2017-07-20 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1792:


 Summary: Performance issue sparse-dense matrix multiply
 Key: SYSTEMML-1792
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1792
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


Our sparse-dense matrix multiply is already cache conscious but used very small 
static block sizes, which were optimized for moderate sparsity. However, for 
cases with very sparse matrices (and skinny right-hand-side matrices), the 
small block sizes add substantial overhead of more than an order of magnitude. 
This task aims to make these block sizes adaptive, consistent with our 
cache-conscious implementations of sparsity-exploiting matrix multiply 
operators such as wsloss.
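The adaptive choice described above can be sketched as a simple sparsity-tiered block-size selection. The class name, tier boundaries, and block sizes below are illustrative assumptions; the actual SystemML heuristic may differ.

```java
// Sketch: pick the column block size for sparse-dense multiply
// based on input sparsity. Moderately sparse inputs keep small
// cache-friendly blocks; ultra-sparse inputs get much larger
// blocks so the fixed per-block row-scan overhead is amortized.
public class AdaptiveBlockSize {
    public static int chooseColBlock(double sparsity, int cols) {
        int bs = (sparsity > 0.1)   ? 64    // moderate sparsity
               : (sparsity > 0.001) ? 256   // sparse
               : 4096;                      // ultra-sparse
        return Math.min(bs, cols);          // never exceed matrix width
    }

    public static void main(String[] args) {
        System.out.println(chooseColBlock(0.5, 100_000));  // 64
        System.out.println(chooseColBlock(1e-5, 100_000)); // 4096
    }
}
```

With an ultra-sparse left-hand side, each block pass touches very few non-zeros, so tiny static blocks spend most of their time re-scanning row structure; larger blocks recover that order of magnitude.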





[jira] [Created] (SYSTEMML-1791) Performance features frame blocks

2017-07-19 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1791:


 Summary: Performance features frame blocks
 Key: SYSTEMML-1791
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1791
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm


Recent experiments have shown that there are unnecessary overheads in various 
frame block operations. This task is an umbrella for all related performance 
improvements. In detail, this includes:

* Shallow copy for column indexing
* Bidirectional reuse of recode maps in meta data frames
* Avoid unnecessary long-string-double parsing on transformapply
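The first item, shallow copy for column indexing, can be sketched on a column-major layout. This is only an illustration; the class and method names are hypothetical stand-ins for the frame block internals.

```java
// Sketch: on a column-major frame block, selecting a single column
// can return a reference to the existing column array instead of
// copying every cell into a new block.
public class FrameColumnIndexing {
    public static Object[] sliceColumn(Object[][] columns, int colIndex) {
        return columns[colIndex]; // shallow: no per-element copy
    }

    public static void main(String[] args) {
        Object[][] frame = { {"a", "b"}, {1, 2} }; // two columns, two rows
        // the slice is the same array object, not a copy
        System.out.println(sliceColumn(frame, 0) == frame[0]); // true
    }
}
```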





[jira] [Commented] (SYSTEMML-1774) Improve Parfor parallelism for deep learning

2017-07-18 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091976#comment-16091976
 ] 

Matthias Boehm commented on SYSTEMML-1774:
--

ad 2) Forced spark execution mode together with parfor REMOTE_SPARK is invalid 
because it would require running all operations as distributed spark operations 
as well as the surrounding parfor as a distributed spark operation. It is 
invalid because there are no nested spark/mapreduce operations (i.e., RDD 
operations that call another RDD operation), since this could lead to 
deadlocks. By specifying spark (and thus forcing local parfor), you 
effectively run multiple concurrent distributed operations on the cluster, 
which leads to full cluster utilization on small data.

> Improve Parfor parallelism for deep learning
> 
>
> Key: SYSTEMML-1774
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Affects Versions: SystemML 1.0
>Reporter: Fei Hu
>  Labels: deeplearning
> Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, 
> Explain_For_Spark_Mode.txt, MNIST_Distrib_Sgd.scala, 
> mnist_lenet_distrib_sgd.dml
>
>
> When running the  [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We try to 
> force {{parfor (j in 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to be {{parfor (j in 
> 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}} use 
> {{REMOTE_SPARK}} mode, but got some errors about 
> {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
> of type other than CP instructions}} using the mode {{SPARK}}, and the error 
> {{java.lang.NullPointerException}} using the mode {{HYBRID_SPARK}}. More log 
> information can be found at the following comments. 





[jira] [Comment Edited] (SYSTEMML-1774) Improve Parfor parallelism for deep learning

2017-07-18 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091976#comment-16091976
 ] 

Matthias Boehm edited comment on SYSTEMML-1774 at 7/18/17 6:43 PM:
---

ad 2) Forced spark execution mode together with parfor REMOTE_SPARK is invalid 
because it would require running all operations as distributed spark operations 
as well as the surrounding parfor as a distributed spark operation. It is 
invalid because there are no nested spark/mapreduce operations (i.e., RDD 
operations that call another RDD operation), since this could lead to 
deadlocks. By specifying spark (and thus forcing local parfor), you effectively 
run multiple concurrent distributed operations on the cluster, which leads to 
full cluster utilization on small data.


was (Author: mboehm7):
ad 2) Forced spark execution mode together with parfor REMOTE_SPARK is invalid 
because it would require running all operations as distributed spark operations 
as well as the surrounding parfor as a distributed spark operation. It is 
invalid because there are no nested spark/mapreduce operations (i.e., RDD 
operations that call another RDD operation), since this could lead to 
deadlocks. By specifying spark (and thus forcing local parfor), you effectively 
run multiple concurrent distributed operations on the cluster, which leads to 
full cluster utilization on small data.

> Improve Parfor parallelism for deep learning
> 
>
> Key: SYSTEMML-1774
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Affects Versions: SystemML 1.0
>Reporter: Fei Hu
>  Labels: deeplearning
> Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, 
> Explain_For_Spark_Mode.txt, MNIST_Distrib_Sgd.scala, 
> mnist_lenet_distrib_sgd.dml
>
>
> When running the  [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We try to 
> force {{parfor (j in 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to be {{parfor (j in 
> 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}} to use 
> {{REMOTE_SPARK}} mode, but got some errors about 
> {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
> of type other than CP instructions}} using the mode {{SPARK}}, and the error 
> {{java.lang.NullPointerException}} using the mode {{HYBRID_SPARK}}. More log 
> information can be found at the following comments. 





[jira] [Comment Edited] (SYSTEMML-1774) Improve Parfor parallelism for deep learning

2017-07-18 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092458#comment-16092458
 ] 

Matthias Boehm edited comment on SYSTEMML-1774 at 7/19/17 1:40 AM:
---

here are a couple of guesses: (1) the expensive operations still run in CP 
because distributed operations are globally disabled for any convolution ops 
(because they are experimental), (2) running concurrent spark operations fully 
exploits your cluster and not just a single node, and (3) potentially fewer 
evictions, given the very small driver and Spark's lazy evaluation. 

For your experiments, I would recommend running with reasonable driver sizes 
and a cluster of multiple nodes.


was (Author: mboehm7):
here are a couple of guesses: (1) the expensive operations still run in CP 
because distributed operations are globally disabled for all convolution ops 
(because they are experimental), (2) running concurrent Spark operations fully 
exploits your cluster rather than just a single node, and (3) potentially fewer 
evictions, given the very small driver and Spark's lazy evaluation. I can spend 
a couple of hours later this week to profile this.

> Improve Parfor parallelism for deep learning
> 
>
> Key: SYSTEMML-1774
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Affects Versions: SystemML 1.0
>Reporter: Fei Hu
>  Labels: deeplearning
> Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, 
> Explain_For_Spark_Mode.txt, MNIST_Distrib_Sgd.scala, 
> mnist_lenet_distrib_sgd.dml
>
>
> When running the  [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We try to 
> force {{parfor (j in 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to be {{parfor (j in 
> 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}} to use 
> {{REMOTE_SPARK}} mode, but got some errors about 
> {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
> of type other than CP instructions}} using the mode {{SPARK}}, and the error 
> {{java.lang.NullPointerException}} using the mode {{HYBRID_SPARK}}. More log 
> information can be found at the following comments. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1774) Improve Parfor parallelism for deep learning

2017-07-19 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093561#comment-16093561
 ] 

Matthias Boehm commented on SYSTEMML-1774:
--

well, I added SYSTEMML-1782, which will solve this issue and generally improve 
the handling of indexing, both in parfor contexts and in general.

> Improve Parfor parallelism for deep learning
> 
>
> Key: SYSTEMML-1774
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Affects Versions: SystemML 1.0
>Reporter: Fei Hu
>  Labels: deeplearning
> Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, 
> Explain_For_Spark_Mode.txt, MNIST_Distrib_Sgd.scala, 
> mnist_lenet_distrib_sgd.dml
>
>
> When running the  [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We try to 
> force {{parfor (j in 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to be {{parfor (j in 
> 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}} to use 
> {{REMOTE_SPARK}} mode, but got some errors about 
> {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
> of type other than CP instructions}} using the mode {{SPARK}}, and the error 
> {{java.lang.NullPointerException}} using the mode {{HYBRID_SPARK}}. More log 
> information can be found at the following comments. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1778) Extend runtime plan cost model for spark instructions

2017-07-18 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1778:
-
Description: This task aims to extend the existing costing of runtime plans 
(see {{CostEstimatorStaticRuntime}}) with the ability to cost spark instructions. 
In detail, this entails (1) adding FLOP estimates for all missing spark 
instruction opcodes, and (2) the handling of distributed caching (initial read, 
read from cache, including reasoning about aggregate memory).
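The intended costing can be sketched as follows; all constants and names are illustrative assumptions, not the actual {{CostEstimatorStaticRuntime}} API:

```python
# Hypothetical sketch: an instruction's cost is a FLOP-based compute estimate
# plus an I/O term that distinguishes an initial distributed read from a
# re-read out of the cache.  All constants are assumed, not measured.

FLOPS_PER_SEC = 2e9    # assumed aggregate floating-point throughput
HDFS_READ_BW = 150e6   # assumed bytes/sec for the initial distributed read
CACHE_READ_BW = 1.5e9  # assumed bytes/sec for a read from the RDD cache

def cost_spark_instruction(flops, input_bytes, cached, aggregate_cache_bytes):
    """Return an estimated runtime (seconds) for one Spark instruction."""
    compute = flops / FLOPS_PER_SEC
    # Reasoning about aggregate memory: a cached re-read is only cheap if
    # the input actually fits into the cluster-wide cache.
    fits = input_bytes <= aggregate_cache_bytes
    bw = CACHE_READ_BW if (cached and fits) else HDFS_READ_BW
    return compute + input_bytes / bw
```

A first access would be costed with {{cached=False}}, subsequent accesses of the same cached input with {{cached=True}}.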

> Extend runtime plan cost model for spark instructions
> -
>
> Key: SYSTEMML-1778
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1778
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>
> This task aims to extend the existing costing of runtime plans (see 
> {{CostEstimatorStaticRuntime}}) with the ability to cost spark instructions. In 
> detail, this entails (1) adding FLOP estimates for all missing spark 
> instruction opcodes, and (2) the handling of distributed caching (initial 
> read, read from cache, including reasoning about aggregate memory).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1780) Extend resource optimizer for cloud resources

2017-07-18 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1780:


 Summary: Extend resource optimizer for cloud resources
 Key: SYSTEMML-1780
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1780
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm


This task aims to extend the existing resource optimizer, which currently only 
minimizes time without over-provisioning in a bounded YARN cluster by reasoning 
about CP/MR memory budgets. In detail, we want to include the number of nodes 
and node types with multiple different optimization criteria: e.g., minimize 
time under monetary cost constraints, or minimize monetary cost under time 
constraints. 

Note that for iterative algorithms it is impossible to satisfy hard constraints 
on total execution time without knowing the number of iterations until 
convergence. However, providing these optimization objectives with 
well-documented heuristics for an unknown number of iterations (e.g., fixed 
constants) would still be very valuable for users who struggle with resource 
provisioning in cloud environments.
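A minimal sketch of the "minimize monetary cost under time constraints" objective, assuming a user-supplied time model that already embeds an iteration-count heuristic (all names hypothetical):

```python
def optimize_resources(node_types, max_nodes, est_time, time_budget):
    """Enumerate (node type, node count) pairs and return the cheapest
    configuration whose estimated runtime meets the time constraint.

    node_types: list of (name, price_per_node_hour) pairs.
    est_time(name, n): estimated runtime in hours on n nodes; for iterative
    algorithms this model must embed a heuristic number of iterations."""
    best = None
    for name, price in node_types:
        for n in range(1, max_nodes + 1):
            t = est_time(name, n)
            if t > time_budget:
                continue  # violates the hard time constraint
            cost = n * price * t  # monetary cost = node-hours * price
            if best is None or cost < best[0]:
                best = (cost, name, n)
    return best  # None if no configuration satisfies the constraint
```

Minimizing time under a monetary cost budget is the symmetric variant: filter on cost and minimize {{t}}.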



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SYSTEMML-1778) Extend runtime plan cost model for spark instructions

2017-07-18 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1778:
-
Summary: Extend runtime plan cost model for spark instructions  (was: 
Extended runtime plan cost model for spark instructions)

> Extend runtime plan cost model for spark instructions
> -
>
> Key: SYSTEMML-1778
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1778
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1781) Enable external calls to the resource optimizer

2017-07-18 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1781:


 Summary: Enable external calls to the resource optimizer
 Key: SYSTEMML-1781
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1781
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm


So far, the resource optimizer is only called from within SystemML in order to 
decide on a CP/MR memory configuration for YARN environments. This task aims to 
enable external calls (e.g., from the command line) to the resource optimizer in 
order to embed it into automated scripts for determining and allocating the 
necessary cloud resources.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1774) Improve Parfor parallelism for deep learning

2017-07-18 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092508#comment-16092508
 ] 

Matthias Boehm commented on SYSTEMML-1774:
--

ok, after some initial debugging with {{hybrid_spark + parfor}} and driver Xmx 
4g, it seems that the parfor optimizer decided on a degree of parallelism of 1 
(single-threaded, which caused the slowdown) due to the following (unknown) 
memory estimates:
{code}
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=BIAS_ADD, name=26_out, memest=7.635730732E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=MAX_POOLING, name=28_out, memest=7.63573052E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=BIAS_ADD, name=29_out, memest=7.635730988E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=MAX_POOLING, name=31_out, memest=7.63573052E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=MAX_POOLING_BACKWARD, name=42_dX, memest=1.1453595736E10).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=DIRECT_CONV2D_BACKWARD_DATA, name=45_dX, 
memest=7.636140164E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=DIRECT_CONV2D_BACKWARD_FILTER, name=45_dW, 
memest=7.636140164E9).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=MAX_POOLING_BACKWARD, name=46_dX, memest=1.1453595736E10).
17/07/18 19:50:15 WARN opt.CostEstimator: Memory estimate larger than budget 
but CP exec type (op=DIRECT_CONV2D_BACKWARD_FILTER, name=48_dW, 
memest=3.819088816E9).
{code}

For more evidence, here is a fragment of the parfor plan with {{hybrid_spark + 
parfor}}:

{code}

 EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=122)

--PARFOR (lines 137-213), exec=CP, k=1, dp=NONE, tp=FACTORING, rm=REMOTE_SPARK
GENERIC (lines 139-162), exec=CP, k=1
--rix, exec=CP, k=1
--b(+), exec=CP, k=1
--b(%%), exec=CP, k=1
--b(*), exec=CP, k=1
--b(-), exec=CP, k=1
--u(nrow), exec=CP, k=1
--b(min), exec=CP, k=1
--b(-), exec=CP, k=1
--b(+), exec=CP, k=1
--rix, exec=CP, k=1
--BIAS_ADD, exec=CP, k=16
--DIRECT_CONV2D, exec=CP, k=16
{code}

In contrast, the parfor plan with {{spark + parfor}} looks as follows:

{code}

 EXPLAIN OPT TREE (type=ABSTRACT_PLAN, size=122)

--PARFOR (lines 137-213), exec=CP, k=4, dp=NONE, tp=NAIVE, rm=REMOTE_SPARK
GENERIC (lines 139-162), exec=CP, k=1
--rix, exec=SPARK, k=1
--b(+), exec=SPARK, k=1
--b(%%), exec=SPARK, k=1
--b(*), exec=SPARK, k=1
--b(-), exec=SPARK, k=1
--u(nrow), exec=CP, k=1
--b(min), exec=SPARK, k=1
--b(-), exec=SPARK, k=1
--b(+), exec=SPARK, k=1
--rix, exec=SPARK, k=1
--BIAS_ADD, exec=CP, k=4
--DIRECT_CONV2D, exec=CP, k=4
{code}

Note that the degree of parallelism of 4 is actually incorrect given the 
unknown memory estimates of the convolution ops above. This requires some deeper 
analysis.

So the bottom line is that the real issue originates from size propagation 
problems, and there are two action items here: (1) address the size propagation 
issue, and (2) fix the bug of potentially incorrect handling of memory estimates 
for convolution ops with forced spark execution mode.

> Improve Parfor parallelism for deep learning
> 
>
> Key: SYSTEMML-1774
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
> Project: SystemML
>  Issue Type: Improvement
>  Components: Algorithms, Compiler, ParFor
>Affects Versions: SystemML 1.0
>Reporter: Fei Hu
>  Labels: deeplearning
> Attachments: Explain_For_HYBRID_SPARK_Mode_With_ErrorInfo.txt, 
> Explain_For_Spark_Mode.txt, MNIST_Distrib_Sgd.scala, 
> mnist_lenet_distrib_sgd.dml
>
>
> When running the  [distributed MNIST LeNet example | 
> https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml],
>  each mini-batch could ideally run in parallel without interaction. We try to 
> force {{parfor (j in 1:parallel_batches)}} at line 137 of 
> {{nn/examples/mnist_lenet_distrib_sgd.dml}} to be {{parfor (j in 
> 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED)}} to use 
> {{REMOTE_SPARK}} mode, but got some errors about 
> {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions 
> of type other than CP instructions}} using the mode {{SPARK}}, and the error 
> {{java.lang.NullPointerException}} using the mode {{HYBRID_SPARK}}. More log 
> information can 

[jira] [Created] (SYSTEMML-1778) Extended runtime plan cost model for spark instructions

2017-07-18 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1778:


 Summary: Extended runtime plan cost model for spark instructions
 Key: SYSTEMML-1778
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1778
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1791) Performance features frame blocks

2017-07-20 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1791.
--
   Resolution: Done
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Performance features frame blocks
> -
>
> Key: SYSTEMML-1791
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1791
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Recent experiments have shown that there are unnecessary overheads in various 
> frame block operations. This task is an umbrella for all related performance 
> improvements. In detail, this includes:
> * Shallow copy for column indexing
> * Bidirectional reuse of recode maps in meta data frames
> * Avoid unnecessary long-string-double parsing on transformapply



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1792) Performance issue sparse-dense matrix multiply

2017-07-20 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1792.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Performance issue sparse-dense matrix multiply
> --
>
> Key: SYSTEMML-1792
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1792
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Our sparse-dense matrix multiply is already cache conscious but used very 
> small static block sizes, which were optimized for moderate sparsity. 
> However, for cases with very sparse matrices (and skinny right-hand-side 
> matrices), the small block sizes add substantial overhead of more than an 
> order of magnitude. This task aims to make these block sizes adaptive, 
> consistent with our cache-conscious implementations of sparsity-exploiting 
> matrix multiply operators such as wsloss.
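The adaptivity described above can be sketched as follows; the constants and scaling factor are illustrative assumptions, not SystemML's actual tuning:

```python
def adaptive_block_size(sparsity, rhs_cols, cache_bytes=256 * 1024):
    """Row-block size of the dense right-hand side for cache-conscious
    sparse-dense matrix multiply.  At moderate sparsity, small blocks keep
    the dense panel cache-resident; for very sparse inputs, each block
    touches only a few rows, so much larger blocks amortize the per-block
    overhead.  Threshold (1e-3) and scale factor (32) are assumed values."""
    # rows of the rhs panel that fit into the cache (8 bytes per double)
    block = max(1, cache_bytes // (8 * max(1, rhs_cols)))
    if sparsity < 1e-3:
        block *= 32  # very sparse: scale the block up substantially
    return block
```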



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1790) FrameBlock reset fails with ArrayIndexOutOfBoundsException

2017-07-20 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1790.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> FrameBlock reset fails with ArrayIndexOutOfBoundsException 
> ---
>
> Key: SYSTEMML-1790
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1790
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> A FrameBlock reset, e.g., on feeding the same reuse frame block multiple 
> times into slice with different data sizes, currently does not work properly, 
> leading to an ArrayIndexOutOfBoundsException on the actual data copy if the 
> target is larger than the previously allocated block.
> {code}
> java.lang.ArrayIndexOutOfBoundsException
> at java.lang.System.arraycopy(Native Method)
> at 
> org.apache.sysml.runtime.matrix.data.FrameBlock$StringArray.set(FrameBlock.java:1280)
> at 
> org.apache.sysml.runtime.matrix.data.FrameBlock.sliceOperations(FrameBlock.java:884)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1791) Performance features frame blocks

2017-07-20 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1791.


> Performance features frame blocks
> -
>
> Key: SYSTEMML-1791
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1791
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Recent experiments have shown that there are unnecessary overheads in various 
> frame block operations. This task is an umbrella for all related performance 
> improvements. In detail, this includes:
> * Shallow copy for column indexing
> * Bidirectional reuse of recode maps in meta data frames
> * Avoid unnecessary long-string-double parsing on transformapply



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1800) Matrix/frame block reader utils from streams

2017-07-21 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1800:


 Summary: Matrix/frame block reader utils from streams
 Key: SYSTEMML-1800
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1800
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm
Priority: Minor


In JMLC deployments, models and meta data are often read from resource streams 
of packaged artifacts. This task aims to add some util functions for 
deserialization of matrix and frame blocks directly from such input streams in 
order to avoid the expensive code path of reading text formats from streams.
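The idea of deserializing blocks directly from a stream can be sketched as follows; the header-plus-doubles format here is illustrative, not SystemML's binary layout:

```python
import io
import struct

def write_block(out, rows, cols, values):
    """Serialize a dense block: a (rows, cols) header followed by the cell
    values as row-major big-endian doubles (an assumed, simplified format)."""
    out.write(struct.pack(">ii", rows, cols))
    out.write(struct.pack(">%dd" % (rows * cols), *values))

def read_block(inp):
    """Deserialize a block directly from an input stream -- the cheap code
    path the task proposes, with no text parsing involved."""
    rows, cols = struct.unpack(">ii", inp.read(8))
    values = struct.unpack(">%dd" % (rows * cols), inp.read(8 * rows * cols))
    return rows, cols, list(values)
```

In a JMLC-style deployment the input stream would come from a packaged resource, e.g. {{Class.getResourceAsStream}} on the Java side.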



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1444) UDFs w/ single output in expressions

2017-07-25 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099769#comment-16099769
 ] 

Matthias Boehm commented on SYSTEMML-1444:
--

thanks again for taking this over [~return_01]. There are three different 
integration approaches: (1) split expressions at the parser/language level, (2) 
special handling of functions with a single output (but leave the current 
multi-output handling unchanged), and (3) full multi-output integration of 
functions into the HOP/LOP compiler (which requires quite involved 
modifications from the language all the way down to the instruction 
generation). 

While writing up the steps for (3), I came to the conclusion that its 
integration complexity might not be justified. Hence, I'm currently in favor of 
(2), which would be rather straightforward, achieve the goals, and allow us to 
later extend this to (3). I'd like to think a little more about the 
implications. Once we've settled on the general approach, I'll help define the 
individual subtasks. 

> UDFs w/ single output in expressions
> 
>
> Key: SYSTEMML-1444
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1444
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Janardhan
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1800) Matrix/frame block reader utils from streams

2017-07-22 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1800.
--
   Resolution: Done
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Matrix/frame block reader utils from streams
> 
>
> Key: SYSTEMML-1800
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1800
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>Priority: Minor
> Fix For: SystemML 1.0
>
>
> In JMLC deployments, models and meta data are often read from resource streams 
> of packaged artifacts. This task aims to add some util functions for 
> deserialization of matrix and frame blocks directly from such input streams 
> in order to avoid the expensive code path of reading text formats from 
> streams.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1800) Matrix/frame block reader utils from streams

2017-07-22 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1800.


> Matrix/frame block reader utils from streams
> 
>
> Key: SYSTEMML-1800
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1800
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>Priority: Minor
> Fix For: SystemML 1.0
>
>
> In JMLC deployments, models and meta data are often read from resource streams 
> of packaged artifacts. This task aims to add some util functions for 
> deserialization of matrix and frame blocks directly from such input streams 
> in order to avoid the expensive code path of reading text formats from 
> streams.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1787) Column-range indexing in rowwise templates

2017-07-22 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1787.


> Column-range indexing in rowwise templates
> --
>
> Key: SYSTEMML-1787
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1787
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1788) Column aggregation in cellwise templates

2017-07-22 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1788.


> Column aggregation in cellwise templates
> 
>
> Key: SYSTEMML-1788
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1788
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1788) Column aggregation in cellwise templates

2017-07-22 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1788.
--
   Resolution: Done
 Assignee: Matthias Boehm
Fix Version/s: (was: SystemML 0.14)
   SystemML 1.0

> Column aggregation in cellwise templates
> 
>
> Key: SYSTEMML-1788
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1788
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1506) Codegen only supported through dmlscript (spark_submit, hadoop)

2017-07-22 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1506.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Codegen only supported through dmlscript (spark_submit, hadoop)
> ---
>
> Key: SYSTEMML-1506
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1506
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support codegen through all APIs, i.e., in addition to 
> DMLScript also through MLContext and JMLC.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (SYSTEMML-1787) Column-range indexing in rowwise templates

2017-07-22 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1787.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: (was: SystemML 0.14)
   SystemML 1.0

> Column-range indexing in rowwise templates
> --
>
> Key: SYSTEMML-1787
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1787
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (SYSTEMML-1506) Codegen only supported through dmlscript (spark_submit, hadoop)

2017-07-22 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1506.


> Codegen only supported through dmlscript (spark_submit, hadoop)
> ---
>
> Key: SYSTEMML-1506
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1506
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support codegen through all APIs, i.e., in addition to 
> DMLScript also through MLContext and JMLC.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1444) UDFs w/ single output in expressions

2017-07-24 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098985#comment-16098985
 ] 

Matthias Boehm commented on SYSTEMML-1444:
--

I'll comment on the first question with more details later today but here is 
already the link to the commit you asked for
https://github.com/apache/systemml/commit/91ef325969684f41473a25e308216237922e70f1

> UDFs w/ single output in expressions
> 
>
> Key: SYSTEMML-1444
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1444
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Janardhan
> Fix For: SystemML 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1801) Incomplete codegen candidate exploration

2017-07-23 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1801:


 Summary: Incomplete codegen candidate exploration
 Key: SYSTEMML-1801
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1801
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


The code generation candidate exploration via open-fuse-merge-close showed 
incomplete partial fusion plans for complex DAG structures. This task aims to 
resolve these issues, including (1) better debug output of memo table entries, 
(2) fixes of the plan enumeration, and (3) avoiding overly eager pruning.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SYSTEMML-1811) Can we Implement X%*%t(X) in a better way?

2017-07-26 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102314#comment-16102314
 ] 

Matthias Boehm commented on SYSTEMML-1811:
--

For all backends (CP, SPARK, MR), we actually have a physical operator called 
tsmm (transpose-self matrix multiplication) for t(X)%*%X and X%*%t(X). At the 
block level, there are also specialized implementations of this operation:

https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixMult.java#L1648
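The core tsmm trick is that the result of t(X)%*%X (or X%*%t(X)) is symmetric, so only one triangle needs to be computed. A minimal sketch of the idea follows; the actual LibMatrixMult implementation is additionally cache-blocked and multi-threaded:

```python
def tsmm_left(X):
    """Compute t(X) %*% X for a row-major matrix X, exploiting symmetry:
    only the upper triangle is computed (~half the dot products), then
    mirrored into the lower triangle (illustrative sketch only)."""
    m, n = len(X), len(X[0])
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # upper triangle only
            s = sum(X[k][i] * X[k][j] for k in range(m))
            C[i][j] = C[j][i] = s  # mirror across the diagonal
    return C
```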

> Can we Implement X%*%t(X) in a better way?
> --
>
> Key: SYSTEMML-1811
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1811
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Janardhan
>
> A matrix multiplied by its own transpose is a frequent occurrence in many 
> algorithms. There is definitely a way to take the special properties of this 
> matrix operation into consideration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1812) Rework codegen candidate exploration algorithm

2017-07-26 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1812:


 Summary: Rework codegen candidate exploration algorithm
 Key: SYSTEMML-1812
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1812
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm


This task aims to simplify the existing codegen candidate exploration algorithm 
and improve its efficiency by checking fusion conditions per distinct template 
type rather than per memo table entry.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1779) Obtain cloud resource meta data

2017-07-18 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1779:


 Summary: Obtain cloud resource meta data
 Key: SYSTEMML-1779
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1779
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm


This task aims to automatically collect meta data about available cloud 
resources, including node types, constraints on the maximum number of nodes, 
the number of virtual cores, available memory, available local storage, and 
monetary cost.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1787) Column-range indexing in rowwise templates

2017-07-19 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1787:


 Summary: Column-range indexing in rowwise templates
 Key: SYSTEMML-1787
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1787
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1788) Column aggregation in cellwise templates

2017-07-19 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1788:


 Summary: Column aggregation in cellwise templates
 Key: SYSTEMML-1788
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1788
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1790) FrameBlock reset fails with ArrayIndexOutOfBoundsException

2017-07-19 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1790:


 Summary: FrameBlock reset fails with 
ArrayIndexOutOfBoundsException 
 Key: SYSTEMML-1790
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1790
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


A FrameBlock reset, e.g., on feeding the same reuse frame block multiple times 
into slice with different data sizes, currently does not work properly, leading 
to an ArrayIndexOutOfBoundsException on the actual data copy if the target is 
larger than the previously allocated block.

{code}
java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at 
org.apache.sysml.runtime.matrix.data.FrameBlock$StringArray.set(FrameBlock.java:1280)
at 
org.apache.sysml.runtime.matrix.data.FrameBlock.sliceOperations(FrameBlock.java:884)
{code}
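The fix amounts to growing the backing array on reset whenever the new target exceeds the previously allocated capacity, as in this hypothetical sketch (not the actual FrameBlock code):

```python
class StringColumn:
    """Minimal sketch of a reusable column array, mirroring the idea of
    FrameBlock's per-column storage (structure and names are assumed)."""
    def __init__(self, capacity):
        self._data = [None] * capacity

    def reset(self, new_size):
        if new_size > len(self._data):
            # The reported bug: copying into the old, smaller array here
            # raised ArrayIndexOutOfBoundsException -- reallocate instead.
            self._data = [None] * new_size
        else:
            # Target fits: just clear the reused prefix in place.
            for i in range(new_size):
                self._data[i] = None

    def set(self, i, v):
        self._data[i] = v
```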



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1765) Reading of dml scripts from object stores (main, mlcontext)

2017-07-12 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1765:


 Summary: Reading of dml scripts from object stores (main, 
mlcontext)
 Key: SYSTEMML-1765
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1765
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm








[jira] [Resolved] (SYSTEMML-1319) Statistical estimates over compressed matrix blocks

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1319.
--
Resolution: Done
  Assignee: Matthias Boehm

> Statistical estimates over compressed matrix blocks
> ---
>
> Key: SYSTEMML-1319
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1319
> Project: SystemML
>  Issue Type: Sub-task
>  Components: APIs, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Statistical estimates like moment, cov, aggregate, table, median, and 
> quantiles can be efficiently computed over compressed matrix blocks by 
> mapping distinct items + counts to weighted statistical estimates.





[jira] [Resolved] (SYSTEMML-1538) Improved dynamic recompilation (size update after rewrites)

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1538.
--
Resolution: Done
  Assignee: Matthias Boehm

> Improved dynamic recompilation (size update after rewrites)
> ---
>
> Key: SYSTEMML-1538
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1538
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Dynamic recompilation currently first updates matrix characteristics and 
> subsequently applies dynamic rewrites and operator selection, which depend on 
> the updated stats. However, there are various scenarios where applied 
> rewrites simplify the propagation of statistics. Hence, we should 
> additionally update statistics after rewrites in order to increase the 
> potential of subsequent operator selection and code generation.





[jira] [Resolved] (SYSTEMML-1289) Support compressed matrix blocks

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1289.
--
Resolution: Done
  Assignee: Matthias Boehm

> Support compressed matrix blocks
> 
>
> Key: SYSTEMML-1289
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1289
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support all fused operator templates over compressed matrix 
> blocks, without decompression.
> 1) Cellwise and multi-aggregate operator templates (column-wise processing)
> 2) Row-wise operator templates (row decompression)
> 3) Outer-product operator templates (column-wise processing)
> 4) Exploitation of distinct tuples whenever safe to do so.
> 5) Side input handling with partial decompression (e.g., leverage random 
> access of DDC groups) 





[jira] [Closed] (SYSTEMML-1538) Improved dynamic recompilation (size update after rewrites)

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1538.


> Improved dynamic recompilation (size update after rewrites)
> ---
>
> Key: SYSTEMML-1538
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1538
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Dynamic recompilation currently first updates matrix characteristics and 
> subsequently applies dynamic rewrites and operator selection, which depend on 
> the updated stats. However, there are various scenarios where applied 
> rewrites simplify the propagation of statistics. Hence, we should 
> additionally update statistics after rewrites in order to increase the 
> potential of subsequent operator selection and code generation.





[jira] [Resolved] (SYSTEMML-1555) Decouple literal replacement from in-place recompilation

2017-07-12 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1555.
--
Resolution: Done
  Assignee: Matthias Boehm

> Decouple literal replacement from in-place recompilation
> 
>
> Key: SYSTEMML-1555
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1555
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> The current literal replacement framework contains basic scalar literal 
> replacement as well as the replacement of small matrix operations with their 
> literal results. If this framework is invoked with temporary matrix objects 
> created during size propagation, any matrix operation would obviously fail. So 
> far, this created no problems because literal replacement was tied to 
> recompilations that are not in-place, i.e., recompilations that create a deep 
> copy of the HOP DAG, which in turn only happens for single-DAG recompilations.
> This task aims to decouple the literal replacement from in-place 
> recompilations in order to increase the literal replacement potential and 
> allow for a more flexible use of this literal replacement framework.





[jira] [Updated] (SYSTEMML-1741) Rework codegen cost-based plan selector (opt V2)

2017-06-30 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1741:
-
Description: 
So far, our cost-based codegen plan selector considers all materialization 
points per connected component of partial fusion plan candidates, as well as 
multi-aggregates and various cleanups, in independent steps. This task aims to 
rework it into a full-fledged cost-based optimizer to address poor plan choices 
encountered in various complex DAGs.

In detail, the new cost-based plan selector needs to address the following in 
a holistic manner:
* Potential materialization points (operators with multiple consumers), decided 
on a per-consumer basis
* Sparsity exploitation (in cost model and template flagging), incl. ordering of 
inputs
* Decisions on (overlapping) template types
* Multi-aggregates for cell- and row-templates
* Constraints and costs for distributed operations (see SYSTEMML-1443)

  was:
So far, our cost-based codegen plan selector considers all materialization 
points per connected component of partial fusion plan candidates, as well as 
multi-aggregates and various cleanups, in independent steps. This task aims to 
rework it into a full-fledged cost-based optimizer to address poor plan choices 
encountered in various complex DAGs.

In detail, the new cost-based plan selector needs to address the following in 
a holistic manner:
* Potential materialization points (operators with multiple consumers), decided 
on a per-consumer basis
* Sparsity exploitation (in cost model and template flagging), incl. ordering of 
inputs
* Decisions on (overlapping) template types
* Multi-aggregates for cell- and row-templates
* Constraints and costs for distributed operations


> Rework codegen cost-based plan selector (opt V2)
> 
>
> Key: SYSTEMML-1741
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1741
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
> Fix For: SystemML 0.14
>
>
> So far, our cost-based codegen plan selector considers all materialization 
> points per connected component of partial fusion plan candidates, as well as 
> multi-aggregates and various cleanups, in independent steps. This task aims to 
> rework it into a full-fledged cost-based optimizer to address poor plan 
> choices encountered in various complex DAGs.
> In detail, the new cost-based plan selector needs to address the following in 
> a holistic manner:
> * Potential materialization points (operators with multiple consumers), 
> decided on a per-consumer basis
> * Sparsity exploitation (in cost model and template flagging), incl. ordering 
> of inputs
> * Decisions on (overlapping) template types
> * Multi-aggregates for cell- and row-templates
> * Constraints and costs for distributed operations (see SYSTEMML-1443)





[jira] [Created] (SYSTEMML-1741) Rework codegen cost-based plan selector (opt V2)

2017-06-30 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1741:


 Summary: Rework codegen cost-based plan selector (opt V2)
 Key: SYSTEMML-1741
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1741
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm


So far, our cost-based codegen plan selector considers all materialization 
points per connected component of partial fusion plan candidates, as well as 
multi-aggregates and various cleanups, in independent steps. This task aims to 
rework it into a full-fledged cost-based optimizer to address poor plan choices 
encountered in various complex DAGs.

In detail, the new cost-based plan selector needs to address the following in 
a holistic manner:
* Potential materialization points (operators with multiple consumers), decided 
on a per-consumer basis
* Sparsity exploitation (in cost model and template flagging), incl. ordering of 
inputs
* Decisions on (overlapping) template types
* Multi-aggregates for cell- and row-templates
* Constraints and costs for distributed operations




