[jira] [Created] (SYSTEMML-1843) Wrong loop update-in-place decisions

2017-08-15 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1843:


 Summary: Wrong loop update-in-place decisions 
 Key: SYSTEMML-1843
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1843
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


In the special case where a matrix is simply reassigned in a loop, the rewrite 
that marks updated loop variables as update-in-place mistakenly flags such 
variables. For example, consider the following script:

{code}
...
for(i in 1:100) {
  q = as.matrix(sum(X * U%*%t(V)))
  print("at iteration "+i);
}
{code}

and the related HOP explain output:

{code}
FOR (lines 9-13) [in-place=[q]]
--GENERIC (lines 10-12) [recompile=true]
(46) TRead X [8026324,2330066,1000,1000,22507155] [0,0,1317 -> 1317MB], CP
(48) TRead U [8026324,10,1000,1000,80263240] [0,0,612 -> 612MB], CP
(49) TRead V [2330066,10,1000,1000,23300660] [0,0,178 -> 178MB], CP
(50) r(t) (49) [10,2330066,1000,1000,23300660] [178,0,178 -> 356MB], CP
(51) ba(+*) (48,50) [8026324,2330066,1000,1000,-1] [790,85611347,142683904 -> 228296041MB], SPARK
(52) b(*) (46,51) [8026324,2330066,1000,1000,-1] [142685221,0,1317 -> 142686537MB], SPARK
(53) ua(+RC) (52) [0,0,-1,-1,-1] [1317,0,0 -> 1317MB], SPARK
(54) u(cast_as_matrix) (53) [1,1,1000,1000,-1] [0,0,0 -> 0MB]
(55) TWrite q (54) [1,1,1000,1000,-1] [0,0,0 -> 0MB], CP
(47) TRead i [0,0,0,0,-1] [0,0,0 -> 0MB], CP
(57) b(+) (47) [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
(58) u(print) (57) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
{code}

As can be seen above, variable q is mistakenly marked as update-in-place, which 
causes unnecessary copies and can thus negatively affect performance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SYSTEMML-1842) Compression decision lost after recompilation or codegen

2017-08-15 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1842:


 Summary: Compression decision lost after recompilation or codegen
 Key: SYSTEMML-1842
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1842
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


Even with forced compression (compressed.linalg=true), compression is currently 
not applied if the respective HOP DAG is recompiled or subject to code 
generation. The root cause is an incomplete deep copy of the HOP DAG, which 
loses the compression flag.
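
The failure pattern described above can be illustrated with a minimal Python sketch. This is not SystemML code: the `Hop` class, the `requires_compression` field, and both copy functions are hypothetical names chosen for illustration, and the copy is tree-style (a real DAG copy would also need to memoize shared nodes).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hop:
    """Toy stand-in for a DAG node; all names here are hypothetical."""
    op: str
    inputs: List["Hop"] = field(default_factory=list)
    requires_compression: bool = False  # flag assumed for illustration


def deep_copy_incomplete(hop: Hop) -> Hop:
    # Copies structure only, so the flag silently falls back to its default.
    return Hop(hop.op, [deep_copy_incomplete(h) for h in hop.inputs])


def deep_copy_complete(hop: Hop) -> Hop:
    # Copies structure and the node-level flag.
    return Hop(hop.op,
               [deep_copy_complete(h) for h in hop.inputs],
               hop.requires_compression)


x = Hop("TRead X", requires_compression=True)
root = Hop("ba(+*)", [x])

lost = deep_copy_incomplete(root).inputs[0].requires_compression
kept = deep_copy_complete(root).inputs[0].requires_compression
print(lost, kept)
```

Any step that works on the copy (here standing in for recompilation or code generation) only sees the flag if the copy carries it over.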





[jira] [Created] (SYSTEMML-1841) Performance issue codegen outer over ultra-sparse matrices

2017-08-15 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-1841:


 Summary: Performance issue codegen outer over ultra-sparse matrices
 Key: SYSTEMML-1841
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1841
 Project: SystemML
  Issue Type: Bug
Reporter: Matthias Boehm


Experiments with codegen outer operations over the Amazon Books review dataset 
(8,026,324 x 2,330,066, nnz=22,507,155, i.e., a sparsity of roughly 10^(-6)) 
showed unnecessary overhead for this ultra-sparse dataset. This task aims to 
remove this overhead.
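
As a quick check of the sparsity figure, the ratio of non-zeros to total cells can be computed directly from the dimensions quoted above. This is plain arithmetic in Python; the variable names are illustrative only.

```python
# Dimensions and non-zero count as quoted in the issue text.
rows, cols, nnz = 8_026_324, 2_330_066, 22_507_155

# Sparsity = fraction of cells that are non-zero.
sparsity = nnz / (rows * cols)
print(f"{sparsity:.2e}")
```

The result is about 1.2e-06, which is consistent with the order of magnitude 10^(-6) stated in the issue.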





[jira] [Resolved] (SYSTEMML-1443) Handling of plan selection constraints (e.g., memory/blocksize)

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1443.
--
Resolution: Done
  Assignee: Matthias Boehm

> Handling of plan selection constraints (e.g., memory/blocksize)
> ---
>
> Key: SYSTEMML-1443
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1443
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>






[jira] [Resolved] (SYSTEMML-1292) Support spark codegen instructions w/ multiple RDD inputs

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1292.
--
Resolution: Done
  Assignee: Matthias Boehm

> Support spark codegen instructions w/ multiple RDD inputs
> -
>
> Key: SYSTEMML-1292
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1292
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support spark codegen instructions (for all templates) over 
> multiple RDD inputs if not all side inputs fit into the local and remote 
> broadcast memory budgets. In detail, this might entail either (1) generating 
> custom RDD operations and functions for various combinations of input RDDs, 
> or (2) a generalization of the related spark instructions regarding the input 
> RDD construction and a generic function signature.





[jira] [Closed] (SYSTEMML-1443) Handling of plan selection constraints (e.g., memory/blocksize)

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1443.


> Handling of plan selection constraints (e.g., memory/blocksize)
> ---
>
> Key: SYSTEMML-1443
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1443
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>






[jira] [Closed] (SYSTEMML-1292) Support spark codegen instructions w/ multiple RDD inputs

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1292.


> Support spark codegen instructions w/ multiple RDD inputs
> -
>
> Key: SYSTEMML-1292
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1292
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to support spark codegen instructions (for all templates) over 
> multiple RDD inputs if not all side inputs fit into the local and remote 
> broadcast memory budgets. In detail, this might entail either (1) generating 
> custom RDD operations and functions for various combinations of input RDDs, 
> or (2) a generalization of the related spark instructions regarding the input 
> RDD construction and a generic function signature.





[jira] [Resolved] (SYSTEMML-1838) Performance issues sparse/ultra-sparse binary read

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1838.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Performance issues sparse/ultra-sparse binary read
> --
>
> Key: SYSTEMML-1838
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1838
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Recent experiments with PageRank (20 iterations) on a 1M x 1M, sp=0.001 input 
> showed that the actual iterations are indeed very fast, running at peak memory 
> bandwidth (i.e., ~500ms per iteration in CP only), but the initial read is 
> unnecessarily slow and thus dominates the entire execution time. For 
> example, in this scenario, the read took 41s. 
> This task aims to improve the read performance of sparse and ultra-sparse 
> matrices into CP.
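
A rough back-of-the-envelope calculation shows how strongly the read dominates. The sketch below uses only the numbers quoted in the issue and assumes, for simplicity, that total time consists of the read plus the 20 iterations (other overheads are ignored).

```python
iterations = 20
seconds_per_iteration = 0.5  # "~500ms per iteration" from the report
read_seconds = 41.0          # initial read time from the report

# Assumption: total time = read + iterations, nothing else.
compute_seconds = iterations * seconds_per_iteration
read_fraction = read_seconds / (read_seconds + compute_seconds)
print(compute_seconds, round(read_fraction, 2))
```

Under this assumption, the iterations take about 10s in total, so the read accounts for roughly 80% of the runtime.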





[jira] [Closed] (SYSTEMML-1838) Performance issues sparse/ultra-sparse binary read

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1838.


> Performance issues sparse/ultra-sparse binary read
> --
>
> Key: SYSTEMML-1838
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1838
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Recent experiments with PageRank (20 iterations) on a 1M x 1M, sp=0.001 input 
> showed that the actual iterations are indeed very fast, running at peak memory 
> bandwidth (i.e., ~500ms per iteration in CP only), but the initial read is 
> unnecessarily slow and thus dominates the entire execution time. For 
> example, in this scenario, the read took 41s. 
> This task aims to improve the read performance of sparse and ultra-sparse 
> matrices into CP.





[jira] [Resolved] (SYSTEMML-1836) Large GC overhead for scripts w/ row-wise generated operators.

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1836.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Large GC overhead for scripts w/ row-wise generated operators.
> --
>
> Key: SYSTEMML-1836
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1836
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> This task aims to reduce the unnecessarily large garbage collection overhead 
> for scripts with many row-wise fused operators. For example, Kmeans and 
> Mlogreg over 10M x 10 inputs show GC overheads of 102s and 37s, respectively.





[jira] [Resolved] (SYSTEMML-1839) NPE on parfor initialization w/o log4j configuration

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1839.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> NPE on parfor initialization w/o log4j configuration
> 
>
> Key: SYSTEMML-1839
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1839
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> When calling SystemML in embedded deployments (e.g., through JMLC), there is 
> not necessarily a log4j configuration in the classpath or JVM arguments. In 
> such environments, the static initialization of {{ParForStatementBlock}} fails 
> with a NullPointerException because we try to obtain the default log level 
> and convert it to a string, although this default might be null.
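
The failure pattern (calling a conversion on a value that may be missing) and a guarded alternative can be sketched as a Python analogy. This is not the SystemML fix: the function names and the `"INFO"` fallback are hypothetical, and Python's `None` and `AttributeError` stand in for Java's null and NullPointerException.

```python
from typing import Optional

FALLBACK_LEVEL = "INFO"  # hypothetical fallback, not taken from the issue


def level_name_unsafe(default_level: Optional[str]) -> str:
    # Mirrors the failure: a method is called on a possibly missing value.
    return default_level.upper()


def level_name_safe(default_level: Optional[str]) -> str:
    # Guard against the missing default before converting it.
    if default_level is None:
        return FALLBACK_LEVEL
    return default_level.upper()


print(level_name_safe(None), level_name_safe("debug"))
```

Whether a fallback level or skipping the conversion entirely is the better choice depends on how the level is used afterwards, which the issue does not specify.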





[jira] [Closed] (SYSTEMML-1840) Transform spec literals should be checked during validate

2017-08-15 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-1840.


> Transform spec literals should be checked during validate
> -
>
> Key: SYSTEMML-1840
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1840
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
>
> Currently, transform specifications are not validated during initial 
> compilation. This is very annoying, especially when trying to encode large 
> files, which take a while to read in, only to find out that the given 
> transform specification was invalid JSON. Here is an example:
> {code}
> Caused by: org.apache.wink.json4j.JSONException: Expecting '{' on line 1, column 4 instead, obtained token: 'Token: String - 'ids''
> at org.apache.wink.json4j.internal.Parser.parseObject(Parser.java:193)
> at org.apache.wink.json4j.internal.Parser.parse(Parser.java:130)
> at org.apache.wink.json4j.internal.Parser.parse(Parser.java:95)
> at org.apache.wink.json4j.JSONObject.<init>(JSONObject.java:138)
> at org.apache.sysml.runtime.transform.encode.EncoderFactory.createEncoder(EncoderFactory.java:56)
> {code}
> This task aims to parse the transform specification during the language 
> validation step if it is available as a literal string.
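
The idea of checking a literal specification before any data is read can be sketched in Python with the standard `json` module. This is only an illustration of early validation, not the SystemML implementation: `validate_spec_literal` is a hypothetical helper, and the two example specs are made up for this sketch.

```python
import json
from typing import Optional


def validate_spec_literal(spec: str) -> Optional[str]:
    """Return None if spec parses as a JSON object, else an error message."""
    try:
        parsed = json.loads(spec)
    except json.JSONDecodeError as err:
        return (f"invalid transform spec: {err.msg} "
                f"(line {err.lineno}, column {err.colno})")
    if not isinstance(parsed, dict):
        return "invalid transform spec: top-level JSON value must be an object"
    return None


# Illustrative specs: the second one is missing its enclosing braces.
good_spec = '{"ids": true, "recode": [1, 2]}'
bad_spec = '"ids": true, "recode": [1, 2]'

print(validate_spec_literal(good_spec))
print(validate_spec_literal(bad_spec))
```

Running such a check at validation time surfaces a malformed specification immediately, rather than after a long read of the input file.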


