[jira] [Created] (SYSTEMML-1843) Wrong loop update-in-place decisions
Matthias Boehm created SYSTEMML-1843:

Summary: Wrong loop update-in-place decisions
Key: SYSTEMML-1843
URL: https://issues.apache.org/jira/browse/SYSTEMML-1843
Project: SystemML
Issue Type: Bug
Reporter: Matthias Boehm

For special cases where a matrix is simply updated in a loop, the rewrite that marks updated loop variables as update-in-place mistakenly flags these variables. For example, consider the following script:
{code}
...
for(i in 1:100) {
  q = as.matrix(sum(X * U%*%t(V)))
  print("at iteration "+i);
}
{code}
and the corresponding HOP explain output:
{code}
FOR (lines 9-13) [in-place=[q]]
--GENERIC (lines 10-12) [recompile=true]
    (46) TRead X [8026324,2330066,1000,1000,22507155] [0,0,1317 -> 1317MB], CP
    (48) TRead U [8026324,10,1000,1000,80263240] [0,0,612 -> 612MB], CP
    (49) TRead V [2330066,10,1000,1000,23300660] [0,0,178 -> 178MB], CP
    (50) r(t) (49) [10,2330066,1000,1000,23300660] [178,0,178 -> 356MB], CP
    (51) ba(+*) (48,50) [8026324,2330066,1000,1000,-1] [790,85611347,142683904 -> 228296041MB], SPARK
    (52) b(*) (46,51) [8026324,2330066,1000,1000,-1] [142685221,0,1317 -> 142686537MB], SPARK
    (53) ua(+RC) (52) [0,0,-1,-1,-1] [1317,0,0 -> 1317MB], SPARK
    (54) u(cast_as_matrix) (53) [1,1,1000,1000,-1] [0,0,0 -> 0MB]
    (55) TWrite q (54) [1,1,1000,1000,-1] [0,0,0 -> 0MB], CP
    (47) TRead i [0,0,0,0,-1] [0,0,0 -> 0MB], CP
    (57) b(+) (47) [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP
    (58) u(print) (57) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
{code}
As can be seen above, variable q is mistakenly marked as update-in-place ([in-place=[q]]), which causes unnecessary copies and can thus negatively affect performance.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
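The intended decision can be illustrated with a toy model. It assumes, as a simplification rather than SystemML's actual rewrite logic, that a loop variable should qualify for update-in-place only if every write to it inside the loop body is a left-indexing update (e.g., R[i,] = ...). Under that assumption, a full reassignment such as q = as.matrix(...) in the script above disqualifies q. The Write record and update_in_place_candidates function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Write:
    """Hypothetical record of one assignment inside a loop body."""
    target: str
    left_indexing: bool  # True for R[i, j] = ..., False for a full reassignment R = ...


def update_in_place_candidates(body: List[Write]) -> Set[str]:
    """Return the variables whose every write in the loop body is a left-indexing update.

    Simplified rule (assumption): a single full reassignment disqualifies a variable,
    because the loop then produces a new matrix rather than updating the old one.
    """
    targets = {w.target for w in body}
    return {
        t for t in targets
        if all(w.left_indexing for w in body if w.target == t)
    }


# The loop from SYSTEMML-1843: q is only ever reassigned, so it must not be flagged.
print(update_in_place_candidates([Write("q", False)]))  # set()

# A loop that fills R row by row qualifies, while q still does not.
print(update_in_place_candidates([Write("R", True), Write("q", False)]))  # {'R'}
```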
[jira] [Created] (SYSTEMML-1842) Compression decision lost after recompilation or codegen
Matthias Boehm created SYSTEMML-1842:

Summary: Compression decision lost after recompilation or codegen
Key: SYSTEMML-1842
URL: https://issues.apache.org/jira/browse/SYSTEMML-1842
Project: SystemML
Issue Type: Bug
Reporter: Matthias Boehm

Even with forced compression (compressed.linalg=true), compression is currently not applied if the respective HOP DAG is recompiled or subject to code generation. The root cause is an incomplete deep copy of the HOP DAG that loses the compression flag.
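The failure mode can be illustrated with a simplified HOP node. The Hop class, its requires_compression field, and copy_hop_dag are hypothetical names, not SystemML's API. A DAG copy must carry over every node flag, and it should memoize visited nodes so that shared inputs stay shared. Constructing the clone from the name and inputs alone would silently reset the flag, which matches the root cause described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity semantics, like object references in a HOP DAG
class Hop:
    """Hypothetical, simplified HOP node."""
    name: str
    inputs: List["Hop"] = field(default_factory=list)
    requires_compression: bool = False


def copy_hop_dag(hop: Hop, memo: Optional[Dict[int, Hop]] = None) -> Hop:
    """Deep-copy a HOP DAG, preserving shared inputs and all node flags."""
    if memo is None:
        memo = {}
    if id(hop) in memo:  # shared input already copied: reuse its clone
        return memo[id(hop)]
    # Bug pattern: Hop(hop.name) alone would reset requires_compression to False.
    clone = Hop(hop.name, requires_compression=hop.requires_compression)
    memo[id(hop)] = clone
    clone.inputs = [copy_hop_dag(i, memo) for i in hop.inputs]
    return clone


X = Hop("TRead X", requires_compression=True)
root = Hop("ba(+*)", [X, X])  # X is consumed twice
copied = copy_hop_dag(root)
print(copied.inputs[0].requires_compression)  # True
print(copied.inputs[0] is copied.inputs[1])   # True
```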
[jira] [Created] (SYSTEMML-1841) Performance issue codegen outer over ultra-sparse matrices
Matthias Boehm created SYSTEMML-1841:

Summary: Performance issue codegen outer over ultra-sparse matrices
Key: SYSTEMML-1841
URL: https://issues.apache.org/jira/browse/SYSTEMML-1841
Project: SystemML
Issue Type: Bug
Reporter: Matthias Boehm

Experiments with codegen outer operations over the Amazon Books review dataset (8,026,324 x 2,330,066, nnz=22,507,155, i.e., sparsity ≈ 1.2 x 10^(-6)) showed unnecessary overhead for this ultra-sparse dataset. This task aims to remove this overhead.
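The quoted sparsity follows directly from the dataset dimensions. The sketch below also computes the size of a dense double-precision matrix of the same shape, i.e., the output of the outer product U %*% t(V) if it were materialized. It assumes 8 bytes per cell and MB meaning 2^20 bytes. The result agrees with the output estimate of hop (51) in the SYSTEMML-1843 explain output, which illustrates why outer operations over this data should only touch the cells at non-zero positions of the sparse input.

```python
rows, cols, nnz = 8_026_324, 2_330_066, 22_507_155

# Fraction of non-zero cells.
sparsity = nnz / (rows * cols)
print(f"sparsity = {sparsity:.3e}")  # sparsity = 1.203e-06

# Size of a dense rows x cols matrix of 8-byte doubles, in MB (2^20 bytes).
dense_mb = round(rows * cols * 8 / 2**20)
print(f"dense size = {dense_mb:,} MB")  # dense size = 142,683,904 MB

# Cells a sparsity-exploiting outer operator needs to compute: one per non-zero.
print(f"cells needed = {nnz:,} of {rows * cols:,}")
```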
[jira] [Resolved] (SYSTEMML-1443) Handling of plan selection constraints (e.g., memory/blocksize)
[ https://issues.apache.org/jira/browse/SYSTEMML-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-1443.
--
Resolution: Done
Assignee: Matthias Boehm

> Handling of plan selection constraints (e.g., memory/blocksize)
> ---
>
> Key: SYSTEMML-1443
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1443
> Project: SystemML
> Issue Type: Sub-task
> Components: Compiler, Runtime
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 1.0
[jira] [Resolved] (SYSTEMML-1292) Support spark codegen instructions w/ multiple RDD inputs
[ https://issues.apache.org/jira/browse/SYSTEMML-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-1292.
--
Resolution: Done
Assignee: Matthias Boehm

> Support spark codegen instructions w/ multiple RDD inputs
> ---
>
> Key: SYSTEMML-1292
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1292
> Project: SystemML
> Issue Type: Sub-task
> Components: Compiler, Runtime
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
> This task aims to support Spark codegen instructions (for all templates) over multiple RDD inputs if not all side inputs fit into the local and remote broadcast memory budgets. In detail, this might entail either (1) generating custom RDD operations and functions for various combinations of input RDDs, or (2) a generalization of the related Spark instructions regarding the input RDD construction and a generic function signature.
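To make the decision point concrete, here is a hypothetical sketch of how side inputs might be partitioned into broadcasts and additional RDD inputs. It assumes a single effective budget (the smaller of the local and remote broadcast budgets) and a greedy smallest-first policy. Neither the function name split_side_inputs nor the policy is taken from SystemML.

```python
from typing import Dict, List, Tuple


def split_side_inputs(
    sizes_mb: Dict[str, float],
    local_budget_mb: float,
    remote_budget_mb: float,
) -> Tuple[List[str], List[str]]:
    """Partition side inputs into (broadcast, rdd) lists under a memory budget.

    Assumption: a broadcast must fit both the local (driver) and remote
    (executor) budgets, so the smaller of the two is the effective limit.
    """
    budget = min(local_budget_mb, remote_budget_mb)
    broadcast: List[str] = []
    rdd: List[str] = []
    used = 0.0
    # Greedy: broadcast the smallest inputs first to maximize their number.
    for name, size in sorted(sizes_mb.items(), key=lambda kv: kv[1]):
        if used + size <= budget:
            broadcast.append(name)
            used += size
        else:
            rdd.append(name)  # does not fit: pass as an additional RDD input
    return broadcast, rdd


print(split_side_inputs({"A": 612, "B": 178, "C": 1317}, 2048, 1024))
# (['B', 'A'], ['C'])
```

With all side inputs fitting, the rdd list is empty and the instruction can keep its single-RDD form; only the overflow case needs the multi-RDD generalization described above.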
[jira] [Closed] (SYSTEMML-1443) Handling of plan selection constraints (e.g., memory/blocksize)
[ https://issues.apache.org/jira/browse/SYSTEMML-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm closed SYSTEMML-1443.

> Handling of plan selection constraints (e.g., memory/blocksize)
> ---
>
> Key: SYSTEMML-1443
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1443
> Project: SystemML
> Issue Type: Sub-task
> Components: Compiler, Runtime
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 1.0
[jira] [Closed] (SYSTEMML-1292) Support spark codegen instructions w/ multiple RDD inputs
[ https://issues.apache.org/jira/browse/SYSTEMML-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm closed SYSTEMML-1292.

> Support spark codegen instructions w/ multiple RDD inputs
> ---
>
> Key: SYSTEMML-1292
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1292
> Project: SystemML
> Issue Type: Sub-task
> Components: Compiler, Runtime
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
> This task aims to support Spark codegen instructions (for all templates) over multiple RDD inputs if not all side inputs fit into the local and remote broadcast memory budgets. In detail, this might entail either (1) generating custom RDD operations and functions for various combinations of input RDDs, or (2) a generalization of the related Spark instructions regarding the input RDD construction and a generic function signature.
[jira] [Resolved] (SYSTEMML-1838) Performance issues sparse/ultra-sparse binary read
[ https://issues.apache.org/jira/browse/SYSTEMML-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-1838.
--
Resolution: Fixed
Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Performance issues sparse/ultra-sparse binary read
> ---
>
> Key: SYSTEMML-1838
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1838
> Project: SystemML
> Issue Type: Bug
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
> Recent experiments with PageRank (20 iterations) on a 1M x 1M, sp=0.001 input showed that the actual iterations are indeed very fast, at peak memory bandwidth (i.e., ~500ms per iteration in CP only), but the initial read is unnecessarily slow and thus dominates the entire execution time. For example, in this scenario, the read took 41s.
> This task aims to improve the read performance of sparse and ultra-sparse matrices into CP.
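A back-of-the-envelope calculation shows why ~500ms per iteration is consistent with memory-bandwidth-bound execution and why the read dominates. It assumes a CSR-like layout with an 8-byte double value and a 4-byte integer column index per non-zero (row pointers ignored), and that total runtime is roughly read time plus iteration time. Both are simplifying assumptions, not measured figures.

```python
rows = cols = 1_000_000
nnz = rows * cols // 1000  # sp=0.001 -> 10^9 non-zeros (integer math avoids float rounding)

# Assumption: 8-byte value + 4-byte column index per non-zero (row pointers ignored).
bytes_per_nnz = 8 + 4
size_gb = nnz * bytes_per_nnz / 1e9
print(f"in-memory size ~ {size_gb:.0f} GB")  # in-memory size ~ 12 GB

# Scanning the matrix once per iteration in ~0.5s implies this bandwidth.
iter_s = 0.5
print(f"effective bandwidth ~ {size_gb / iter_s:.0f} GB/s")  # effective bandwidth ~ 24 GB/s

# 20 iterations vs. the 41s read.
compute_s = 20 * iter_s
read_s = 41.0
share = read_s / (read_s + compute_s)
print(f"compute {compute_s:.0f}s, read {read_s:.0f}s, read share {share:.0%}")
# compute 10s, read 41s, read share 80%
```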
[jira] [Closed] (SYSTEMML-1838) Performance issues sparse/ultra-sparse binary read
[ https://issues.apache.org/jira/browse/SYSTEMML-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm closed SYSTEMML-1838.

> Performance issues sparse/ultra-sparse binary read
> ---
>
> Key: SYSTEMML-1838
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1838
> Project: SystemML
> Issue Type: Bug
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
> Recent experiments with PageRank (20 iterations) on a 1M x 1M, sp=0.001 input showed that the actual iterations are indeed very fast, at peak memory bandwidth (i.e., ~500ms per iteration in CP only), but the initial read is unnecessarily slow and thus dominates the entire execution time. For example, in this scenario, the read took 41s.
> This task aims to improve the read performance of sparse and ultra-sparse matrices into CP.
[jira] [Resolved] (SYSTEMML-1836) Large GC overhead for scripts w/ row-wise generated operators.
[ https://issues.apache.org/jira/browse/SYSTEMML-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-1836.
--
Resolution: Fixed
Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> Large GC overhead for scripts w/ row-wise generated operators.
> ---
>
> Key: SYSTEMML-1836
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1836
> Project: SystemML
> Issue Type: Bug
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
> This task aims to reduce the unnecessarily large garbage collection overhead for scripts with many row-wise fused operators. For example, Kmeans and Mlogreg over 10M x 10 inputs show GC overheads of 102s and 37s, respectively.
[jira] [Resolved] (SYSTEMML-1839) NPE on parfor initialization w/o log4j configuration
[ https://issues.apache.org/jira/browse/SYSTEMML-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-1839.
--
Resolution: Fixed
Assignee: Matthias Boehm
Fix Version/s: SystemML 1.0

> NPE on parfor initialization w/o log4j configuration
> ---
>
> Key: SYSTEMML-1839
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1839
> Project: SystemML
> Issue Type: Bug
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
> When calling SystemML in embedded deployments (e.g., through JMLC), there is not necessarily a log4j configuration in the classpath or JVM arguments. In such environments, the static initialization of {{ParForStatementBlock}} fails with a NullPointerException because we try to obtain the default log level and convert it to a string, although this default might be null.
[jira] [Closed] (SYSTEMML-1840) Transform spec literals should be checked during validate
[ https://issues.apache.org/jira/browse/SYSTEMML-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm closed SYSTEMML-1840.

> Transform spec literals should be checked during validate
> ---
>
> Key: SYSTEMML-1840
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1840
> Project: SystemML
> Issue Type: Bug
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 1.0
>
> Currently, no validation happens for transform specifications during initial compilation. This is very annoying, especially when encoding large files, which take a while to read in, only to find out that the given transform specification is invalid JSON. Here is an example:
> {code}
> Caused by: org.apache.wink.json4j.JSONException: Expecting '{' on line 1, column 4 instead, obtained token: 'Token: String - 'ids''
>     at org.apache.wink.json4j.internal.Parser.parseObject(Parser.java:193)
>     at org.apache.wink.json4j.internal.Parser.parse(Parser.java:130)
>     at org.apache.wink.json4j.internal.Parser.parse(Parser.java:95)
>     at org.apache.wink.json4j.JSONObject.<init>(JSONObject.java:138)
>     at org.apache.sysml.runtime.transform.encode.EncoderFactory.createEncoder(EncoderFactory.java:56)
> {code}
> This task aims to parse the transform specification during the language validation step if it is available as a literal string.
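A minimal sketch of such a fail-fast check, using Python's standard json module in place of the json4j parser from the stack trace: if the transform specification is a string literal, parse it at validation time and reject anything that is not a JSON object, reporting line and column. The function name validate_transform_spec is hypothetical, and the check covers only JSON well-formedness, not the semantics of individual spec keys.

```python
import json
from typing import Any, Dict


def validate_transform_spec(spec: str) -> Dict[str, Any]:
    """Parse a literal transform spec at compile time; raise ValueError if invalid."""
    try:
        parsed = json.loads(spec)
    except json.JSONDecodeError as e:
        raise ValueError(
            f"invalid transform spec (line {e.lineno}, column {e.colno}): {e.msg}"
        ) from e
    if not isinstance(parsed, dict):
        # e.g. a bare string or list: the encoder expects a JSON object
        raise ValueError("invalid transform spec: expected a JSON object")
    return parsed


print(validate_transform_spec('{"ids": true, "recode": [1, 2]}'))
try:
    validate_transform_spec('"ids": true, "recode": [1, 2]}')  # missing opening brace
except ValueError as err:
    print(err)  # invalid transform spec (line 1, column 6): Extra data
```

Because the literal is available before any data is read, the error surfaces during validation instead of after the (possibly long) read of the input file.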