[GitHub] [systemds] skogler commented on pull request #993: [SYSTEMDS-265] Entity resolution pipelines and primitives.

2020-07-19 Thread GitBox
skogler commented on pull request #993: URL: https://github.com/apache/systemds/pull/993#issuecomment-660726427 Okay, it's a problem with the combination of Maven Surefire and JUnit parameterized tests. Running the test from IDEA works fine. Removing the parametrization also makes

[GitHub] [systemds] skogler commented on pull request #993: [SYSTEMDS-265] Entity resolution pipelines and primitives.

2020-07-19 Thread GitBox
skogler commented on pull request #993: URL: https://github.com/apache/systemds/pull/993#issuecomment-660731660 Yeah, the only way I can find to get the tests to run reliably is to set the maven surefire plugin option `parallel` to `none`.
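
For readers following the thread, the workaround described above might look roughly like the fragment below. This is only a sketch: the plugin coordinates are the standard Maven Surefire ones, and where the block sits in the project's actual pom.xml (and which other settings surround it) is an assumption not shown in the message.

```xml
<!-- Sketch only: the surrounding pom.xml is not shown in the thread, and any
     other Surefire settings the project uses are omitted here. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Disable Surefire's parallel test execution, as the comment suggests. -->
    <parallel>none</parallel>
  </configuration>
</plugin>
```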

[GitHub] [systemds] j143 commented on pull request #997: [SYSTEMDS-1863] Full MLContext test for LinearReg

2020-07-20 Thread GitBox
j143 commented on pull request #997: URL: https://github.com/apache/systemds/pull/997#issuecomment-660851656 This can be of interest to @mboehm7 @phaniarnab . Open for feedback from all the devs. :smile: This is an

[GitHub] [systemds] Baunsgaard commented on a change in pull request #993: [SYSTEMDS-265] Entity resolution pipelines and primitives.

2020-07-20 Thread GitBox
Baunsgaard commented on a change in pull request #993: URL: https://github.com/apache/systemds/pull/993#discussion_r457212235 ## File path: scripts/staging/entity-resolution/README.md ## @@ -0,0 +1,99 @@ +# Entity Resolution + +## Pipeline design and primitives + +We provide

[GitHub] [systemds] j143 opened a new pull request #997: [SYSTEMDS-1863] Full MLContext test for LinearReg

2020-07-20 Thread GitBox
j143 opened a new pull request #997: URL: https://github.com/apache/systemds/pull/997 * Takes advantage of existing R algorithm scripts used for codegen testing. * This would improve the testing by allowing us to provide all the necessary inputs into the script.

[GitHub] [systemds] skogler commented on a change in pull request #993: [SYSTEMDS-265] Entity resolution pipelines and primitives.

2020-07-20 Thread GitBox
skogler commented on a change in pull request #993: URL: https://github.com/apache/systemds/pull/993#discussion_r457618489 ## File path: pom.xml ## @@ -257,12 +257,6 @@ 3.0.0-M4

[GitHub] [systemds] skogler commented on a change in pull request #993: [SYSTEMDS-265] Entity resolution pipelines and primitives.

2020-07-20 Thread GitBox
skogler commented on a change in pull request #993: URL: https://github.com/apache/systemds/pull/993#discussion_r457302869 ## File path: src/test/java/org/apache/sysds/test/applications/EntityResolutionBinaryTest.java ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache

[GitHub] [systemds] skogler commented on a change in pull request #993: [SYSTEMDS-265] Entity resolution pipelines and primitives.

2020-07-20 Thread GitBox
skogler commented on a change in pull request #993: URL: https://github.com/apache/systemds/pull/993#discussion_r457312560 ## File path: scripts/staging/entity-resolution/README.md ## @@ -0,0 +1,99 @@ +# Entity Resolution + +## Pipeline design and primitives + +We provide two

[GitHub] [systemds] phaniarnab commented on pull request #997: [SYSTEMDS-1863] Full MLContext test for LinearReg

2020-07-21 Thread GitBox
phaniarnab commented on pull request #997: URL: https://github.com/apache/systemds/pull/997#issuecomment-661749819 This is good. LGTM. This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [systemds] phaniarnab commented on pull request #992: [SYSTEMDS-2523] Update to Spark 2.4.6

2020-07-21 Thread GitBox
phaniarnab commented on pull request #992: URL: https://github.com/apache/systemds/pull/992#issuecomment-661754419 I didn't go through all the comments in the cited PRs, but I'm curious to see how this Spark/Hadoop upgrade impacts the performance. It might not improve anything, but also

[GitHub] [systemds] Baunsgaard commented on a change in pull request #993: [SYSTEMDS-265] Entity resolution pipelines and primitives.

2020-07-21 Thread GitBox
Baunsgaard commented on a change in pull request #993: URL: https://github.com/apache/systemds/pull/993#discussion_r457960775 ## File path: scripts/staging/entity-resolution/entity-clustering.dml ## @@ -0,0 +1,119 @@

[GitHub] [systemds] Baunsgaard commented on pull request #993: [SYSTEMDS-265] Entity resolution pipelines and primitives.

2020-07-21 Thread GitBox
Baunsgaard commented on pull request #993: URL: https://github.com/apache/systemds/pull/993#issuecomment-661746853 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [systemds] kev-inn commented on a change in pull request #984: [SYSTEMDS-2540] Stop Exception and Test improvements

2020-07-21 Thread GitBox
kev-inn commented on a change in pull request #984: URL: https://github.com/apache/systemds/pull/984#discussion_r458131888 ## File path: src/test/java/org/apache/sysds/test/functions/misc/DataTypeCastingTest.java ## @@ -85,10 +85,10 @@ public void testMatrixToMatrix()

[GitHub] [systemds] j143 commented on pull request #997: [SYSTEMDS-1863] Full MLContext test for LinearReg

2020-07-21 Thread GitBox
j143 commented on pull request #997: URL: https://github.com/apache/systemds/pull/997#issuecomment-661855002 Thank you. This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [systemds] Shafaq-Siddiqi commented on pull request #988: [SYSTEMDS-353] Synthetic Minority Over-sampling Technique (SMOTE)

2020-07-23 Thread GitBox
Shafaq-Siddiqi commented on pull request #988: URL: https://github.com/apache/systemds/pull/988#issuecomment-663111606 > I know this is in progress, but still commenting: is it possible to replace the `for` blocks with `parfor`? I tried but there are some matrix dependencies

[GitHub] [systemds] mboehm7 commented on pull request #996: Implementation of Hyperband

2020-07-23 Thread GitBox
mboehm7 commented on pull request #996: URL: https://github.com/apache/systemds/pull/996#issuecomment-663176078 Just for closure - now all raised bugs have been fixed in master. @OsChri this was a great catch. This is an

[GitHub] [systemds] kev-inn commented on a change in pull request #984: [SYSTEMDS-2540] Stop Exception and Test improvements

2020-07-22 Thread GitBox
kev-inn commented on a change in pull request #984: URL: https://github.com/apache/systemds/pull/984#discussion_r458937066 ## File path: src/main/java/org/apache/sysds/hops/rewrite/RewriteConstantFolding.java ## @@ -98,13 +98,7 @@ private Hop rConstantFoldingExpression( Hop

[GitHub] [systemds] kev-inn commented on a change in pull request #984: [SYSTEMDS-2540] Stop Exception and Test improvements

2020-07-22 Thread GitBox
kev-inn commented on a change in pull request #984: URL: https://github.com/apache/systemds/pull/984#discussion_r458943856 ## File path: src/test/java/org/apache/sysds/test/functions/io/csv/ReadCSVTest1.java ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[GitHub] [systemds] kev-inn commented on pull request #980: [SYSTEMDS-135] log4j.properties simplified

2020-07-22 Thread GitBox
kev-inn commented on pull request #980: URL: https://github.com/apache/systemds/pull/980#issuecomment-662562801 Thanks for the explanation, sounds good. Leaving configurations up to the user is better in my opinion and I also agree with your sentiment on the template files. LGTM :+1:

[GitHub] [systemds] kev-inn commented on a change in pull request #984: [SYSTEMDS-2540] Stop Exception and Test improvements

2020-07-22 Thread GitBox
kev-inn commented on a change in pull request #984: URL: https://github.com/apache/systemds/pull/984#discussion_r458946615 ## File path: src/test/java/org/apache/sysds/test/AutomatedTestBase.java ## @@ -978,19 +989,26 @@ protected void runRScript(boolean newWay) {

[GitHub] [systemds] j143 opened a new pull request #1000: Notebook for SystemDS MLContext on databricks

2020-07-22 Thread GitBox
j143 opened a new pull request #1000: URL: https://github.com/apache/systemds/pull/1000 * Runs SystemDS with MLContext on a cluster that has the library loaded. * This notebook uses Scala. This is an automated message from the Apache Git

[GitHub] [systemds] j143 opened a new pull request #999: Notebook for SystemDS on colab for developers

2020-07-22 Thread GitBox
j143 opened a new pull request #999: URL: https://github.com/apache/systemds/pull/999 * Creates a workspace with all the dependencies for project build. * Helps prototype the DML code in browser. This is an automated

[GitHub] [systemds] j143 commented on pull request #1000: Notebook for SystemDS MLContext on databricks

2020-07-22 Thread GitBox
j143 commented on pull request #1000: URL: https://github.com/apache/systemds/pull/1000#issuecomment-662627140 Protip: (setting up databricks cluster) **Step 1:** ![image](https://user-images.githubusercontent.com/53068787/88215390-3295ab00-cc79-11ea-8fe2-f6c748db649f.png)

[GitHub] [systemds] j143 opened a new pull request #1001: Verify the examples run correctly in the documentation

2020-07-22 Thread GitBox
j143 opened a new pull request #1001: URL: https://github.com/apache/systemds/pull/1001 - Contains some changes needed to make the code work. This is an automated message from the Apache Git Service. To respond to

[GitHub] [systemds] mandadipavan opened a new pull request #995: Updated command to run Univa-Stats.dml algotithm.

2020-07-18 Thread GitBox
mandadipavan opened a new pull request #995: URL: https://github.com/apache/systemds/pull/995 In the Univariate Statistics example, the first command refers to the file runStandaloneSystemDS.sh, which is missing. It should be changed to systemds.

[GitHub] [systemds] mboehm7 commented on pull request #991: [MINOR] Docsupdate 1 Remove timestamp java

2020-07-20 Thread GitBox
mboehm7 commented on pull request #991: URL: https://github.com/apache/systemds/pull/991#issuecomment-661178461 LGTM - thanks @Baunsgaard This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [systemds] asfgit closed pull request #991: [MINOR] Docsupdate 1 Remove timestamp java

2020-07-20 Thread GitBox
asfgit closed pull request #991: URL: https://github.com/apache/systemds/pull/991 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [systemds] mandadipavan closed pull request #995: Updated command to run Univar-Stats.dml algotithm.

2020-07-18 Thread GitBox
mandadipavan closed pull request #995: URL: https://github.com/apache/systemds/pull/995 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [systemds] mboehm7 commented on pull request #996: Implementation of Hyperband

2020-07-20 Thread GitBox
mboehm7 commented on pull request #996: URL: https://github.com/apache/systemds/pull/996#issuecomment-661209935 Also thanks for catching the `eval` issues - I'll fix them in subsequent commits. This is an automated message

[GitHub] [systemds] mboehm7 commented on pull request #996: Implementation of Hyperband

2020-07-20 Thread GitBox
mboehm7 commented on pull request #996: URL: https://github.com/apache/systemds/pull/996#issuecomment-661208460 LGTM - thanks for this great new builtin @OsChri. I just slightly changed the test to use fixed seeds and replaced the for loops for joint sorting of two matrices with

[GitHub] [systemds] skogler commented on a change in pull request #993: [SYSTEMDS-265] Entity resolution pipelines and primitives.

2020-07-20 Thread GitBox
skogler commented on a change in pull request #993: URL: https://github.com/apache/systemds/pull/993#discussion_r457582462 ## File path: src/test/java/org/apache/sysds/test/applications/EntityResolutionBinaryTest.java ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache

[GitHub] [systemds] mboehm7 commented on pull request #984: [SYSTEMDS-2540] Stop Exception and Test improvements

2020-07-25 Thread GitBox
mboehm7 commented on pull request #984: URL: https://github.com/apache/systemds/pull/984#issuecomment-663911394 Thanks for the initiative @Baunsgaard. Some of these changes are very good, on others I'm kind of split. I would recommend we merge it in (with the changes I made) and see how

[GitHub] [systemds] mboehm7 edited a comment on pull request #984: [SYSTEMDS-2540] Stop Exception and Test improvements

2020-07-25 Thread GitBox
mboehm7 edited a comment on pull request #984: URL: https://github.com/apache/systemds/pull/984#issuecomment-663911394 Thanks for the initiative @Baunsgaard. Some of these changes are very good, on others I'm kind of split. I would recommend we merge it in (with the changes I made) and

[GitHub] [systemds] asfgit closed pull request #984: [SYSTEMDS-2540] Stop Exception and Test improvements

2020-07-25 Thread GitBox
asfgit closed pull request #984: URL: https://github.com/apache/systemds/pull/984 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [systemds] Baunsgaard opened a new pull request #1021: [MINOR] Remove Overwrite Logging Level

2020-08-14 Thread GitBox
Baunsgaard opened a new pull request #1021: URL: https://github.com/apache/systemds/pull/1021 This PR removes the local in file overwriting of logging level, where found. This is an automated message from the Apache Git

[GitHub] [systemds] Shafaq-Siddiqi commented on pull request #1146: CorrectTypos Builtin Script

2021-01-11 Thread GitBox
Shafaq-Siddiqi commented on pull request #1146: URL: https://github.com/apache/systemds/pull/1146#issuecomment-758283020 Hello @AlexanderErtl, Thank you for your contribution. If you are still working on the Spark functionality then could you please mark your PR as "WIP" to save it from

[GitHub] [systemds] Baunsgaard opened a new pull request #1151: [SYSTEMDS-2792] Sparse Overlapping Matrix

2021-01-12 Thread GitBox
Baunsgaard opened a new pull request #1151: URL: https://github.com/apache/systemds/pull/1151 This commit changes the Overlapping matrix to drastically reduce decompression time in cases of right-hand-side sparse matrix multiplication. Other than this, many of the methods are cleaned

[GitHub] [systemds] juliale-15 opened a new pull request #1135: Design Document for Python Script Generator

2020-12-26 Thread GitBox
juliale-15 opened a new pull request #1135: URL: https://github.com/apache/systemds/pull/1135 @A-Postl and I created this first version of a design document for the python script generator. We would appreciate feedback if our planned approach could work like this or not.

[GitHub] [systemds] gPathpp opened a new pull request #1145: Decision Tree Feature

2021-01-07 Thread GitBox
gPathpp opened a new pull request #1145: URL: https://github.com/apache/systemds/pull/1145 Work in progress. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [systemds] AlexanderErtl opened a new pull request #1146: CorrectTypos Builtin Script

2021-01-07 Thread GitBox
AlexanderErtl opened a new pull request #1146: URL: https://github.com/apache/systemds/pull/1146 CorrectTypos builtin script for ExecType.CP (ExecType.SPARK not currently functional) This is an automated message from the

[GitHub] [systemds] Baunsgaard commented on a change in pull request #1112: [SYSTEMDS-2738] Federated rdiag instruction

2020-11-27 Thread GitBox
Baunsgaard commented on a change in pull request #1112: URL: https://github.com/apache/systemds/pull/1112#discussion_r531578960 ## File path: src/main/java/org/apache/sysds/runtime/instructions/fed/ReorgFEDInstruction.java ## @@ -50,20 +77,196 @@ public static

[GitHub] [systemds] Shafaq-Siddiqi commented on a change in pull request #1117: [WIP] Builtin function statsNA

2020-11-27 Thread GitBox
Shafaq-Siddiqi commented on a change in pull request #1117: URL: https://github.com/apache/systemds/pull/1117#discussion_r531556439 ## File path: scripts/builtin/statsNA.dml ## @@ -0,0 +1,212 @@ +#- +# +# Licensed to

[GitHub] [systemds] Shafaq-Siddiqi commented on a change in pull request #1117: [WIP] Builtin function statsNA

2020-11-27 Thread GitBox
Shafaq-Siddiqi commented on a change in pull request #1117: URL: https://github.com/apache/systemds/pull/1117#discussion_r531554010 ## File path: scripts/builtin/statsNA.dml ## @@ -0,0 +1,212 @@ +#- +# +# Licensed to

[GitHub] [systemds] Shafaq-Siddiqi commented on a change in pull request #1117: [WIP] Builtin function statsNA

2020-11-27 Thread GitBox
Shafaq-Siddiqi commented on a change in pull request #1117: URL: https://github.com/apache/systemds/pull/1117#discussion_r531568712 ## File path: scripts/builtin/statsNA.dml ## @@ -0,0 +1,212 @@ +#- +# +# Licensed to

[GitHub] [systemds] Shafaq-Siddiqi commented on a change in pull request #1117: [WIP] Builtin function statsNA

2020-11-27 Thread GitBox
Shafaq-Siddiqi commented on a change in pull request #1117: URL: https://github.com/apache/systemds/pull/1117#discussion_r531558217 ## File path: scripts/builtin/statsNA.dml ## @@ -0,0 +1,212 @@ +#- +# +# Licensed to

[GitHub] [systemds] Shafaq-Siddiqi commented on a change in pull request #1117: [WIP] Builtin function statsNA

2020-11-27 Thread GitBox
Shafaq-Siddiqi commented on a change in pull request #1117: URL: https://github.com/apache/systemds/pull/1117#discussion_r531566877 ## File path: scripts/builtin/statsNA.dml ## @@ -0,0 +1,212 @@ +#- +# +# Licensed to

[GitHub] [systemds] haubitzer opened a new pull request #1117: [WIP] Builtin function statsNA

2020-11-26 Thread GitBox
haubitzer opened a new pull request #1117: URL: https://github.com/apache/systemds/pull/1117 **Work in progress** * no test implemented yet * open question marked with "TODO" This is an automated message from the

[GitHub] [systemds] Baunsgaard merged pull request #1114: [SYSTEMDS-2696] Overlapping relational operations

2020-11-24 Thread GitBox
Baunsgaard merged pull request #1114: URL: https://github.com/apache/systemds/pull/1114 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [systemds] tobiasrieger commented on pull request #1113: [WIP][SYSTEMDS-2550] Shuffle data partitioner

2020-11-24 Thread GitBox
tobiasrieger commented on pull request #1113: URL: https://github.com/apache/systemds/pull/1113#issuecomment-732810912 I'm talking about the read, that originally reads the input. I've already asked Sebastian B. to take a look, as I don't think the issue is with the parameter server.

[GitHub] [systemds] haubitzer opened a new pull request #1121: Testcase stats na

2020-12-04 Thread GitBox
haubitzer opened a new pull request #1121: URL: https://github.com/apache/systemds/pull/1121 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [systemds] haubitzer closed pull request #1121: Testcase stats na

2020-12-04 Thread GitBox
haubitzer closed pull request #1121: URL: https://github.com/apache/systemds/pull/1121 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [systemds] Baunsgaard merged pull request #1116: [SYSTEMDS-2741] Compressed overlapping unary aggregates

2020-11-24 Thread GitBox
Baunsgaard merged pull request #1116: URL: https://github.com/apache/systemds/pull/1116 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [systemds] Baunsgaard opened a new pull request #1116: [SYSTEMDS-2741] Compressed overlapping unary aggregates

2020-11-24 Thread GitBox
Baunsgaard opened a new pull request #1116: URL: https://github.com/apache/systemds/pull/1116 This commit adds the functionality to do unary aggregates on the compressed overlapping matrices. This is an automated message from

[GitHub] [systemds] Baunsgaard merged pull request #1107: [SYSTEMDS-2704] Extra tests federated read

2020-11-23 Thread GitBox
Baunsgaard merged pull request #1107: URL: https://github.com/apache/systemds/pull/1107 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [systemds] sebwrede commented on pull request #1113: [WIP][SYSTEMDS-2550] Shuffle data partitioner

2020-11-23 Thread GitBox
sebwrede commented on pull request #1113: URL: https://github.com/apache/systemds/pull/1113#issuecomment-732130997 **Generally, I think this PR looks good.** In AutomatedTestBase:640, you create a new PrivacyConstraint. This means that it writes the privacy level to the metadata file.

[GitHub] [systemds] Baunsgaard commented on pull request #1114: [SYSTEMDS-2696] Overlapping relational operations

2020-11-23 Thread GitBox
Baunsgaard commented on pull request #1114: URL: https://github.com/apache/systemds/pull/1114#issuecomment-732284324 The earlier measurements were wrong; the actual results are: ```code cla , 3.1200 lcla, 0.4000 mkl , 62.3300 cla ,

[GitHub] [systemds] Baunsgaard opened a new pull request #1114: [SYSTEMDS-2696] Overlapping relational operations

2020-11-23 Thread GitBox
Baunsgaard opened a new pull request #1114: URL: https://github.com/apache/systemds/pull/1114 This commit adds support for relational operations (<, >, <=, etc.) in the compressed space, for overlapping matrices. If the relational expression returns a constant matrix, the

[GitHub] [systemds] Shafaq-Siddiqi opened a new pull request #1115: [SYSTEMDS-2661, 2662]: Data Cleaning Pipelines Version 1

2020-11-23 Thread GitBox
Shafaq-Siddiqi opened a new pull request #1115: URL: https://github.com/apache/systemds/pull/1115 Pipelines Optimizer and various minor built-ins This commit contains, 1. Optimizer for cleaning pipelines 2. Minor built-ins imputeByMean, imputeByMedian, frameSort,

[GitHub] [systemds] Baunsgaard opened a new pull request #1118: [SYSTEMDS-2695 + 2743] Compressed Row parallel left mult & optimized divide

2020-11-28 Thread GitBox
Baunsgaard opened a new pull request #1118: URL: https://github.com/apache/systemds/pull/1118 This PR contains re-enabling parallel left multiplication for sparse matrices, plus row based parallelization of dense. Furthermore, it also contains optimization of Binary and scalar divide,

[GitHub] [systemds] sebwrede opened a new pull request #1120: [MINOR] Fine-Grained Constraints in Privacy Monitor

2020-12-01 Thread GitBox
sebwrede opened a new pull request #1120: URL: https://github.com/apache/systemds/pull/1120 Refactor of the handling of fine-grained privacy constraints in PrivacyMonitor. This also removes some code not needed anymore.

[GitHub] [systemds] Shafaq-Siddiqi closed pull request #1115: [SYSTEMDS-2661, 2662]: Data Cleaning Pipelines Version 1

2020-11-25 Thread GitBox
Shafaq-Siddiqi closed pull request #1115: URL: https://github.com/apache/systemds/pull/1115 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [systemds] Shafaq-Siddiqi closed pull request #1055: [SYSTEMDS-2661, 2662]: Pipelines Optimizer and various minor built-ins

2020-11-23 Thread GitBox
Shafaq-Siddiqi closed pull request #1055: URL: https://github.com/apache/systemds/pull/1055 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [systemds] mboehm7 commented on pull request #1113: [WIP][SYSTEMDS-2550] Shuffle data partitioner

2020-11-28 Thread GitBox
mboehm7 commented on pull request #1113: URL: https://github.com/apache/systemds/pull/1113#issuecomment-735300632 ok, this invalid data consolidation issue is now fixed in master - please rebase. This is an automated

[GitHub] [systemds] Vulturemox opened a new pull request #1119: [WIP] DataWig Design Document

2020-11-29 Thread GitBox
Vulturemox opened a new pull request #1119: URL: https://github.com/apache/systemds/pull/1119 **Work in Progress** This is our first Design Document for the project of implementing DataWig in SystemDS, we would appreciate feedback whether this approach is actually reasonable.

[GitHub] [systemds] Shafaq-Siddiqi commented on a change in pull request #1119: [WIP] DataWig Design Document

2020-12-02 Thread GitBox
Shafaq-Siddiqi commented on a change in pull request #1119: URL: https://github.com/apache/systemds/pull/1119#discussion_r534075365 ## File path: scripts/staging/datawig/DesignDocument.md ## @@ -0,0 +1,52 @@ +# DataWig Design Document +Julian Rakuschek, Noah Ruhmer +### Basic

[GitHub] [systemds] Shafaq-Siddiqi commented on a change in pull request #1119: [WIP] DataWig Design Document

2020-12-01 Thread GitBox
Shafaq-Siddiqi commented on a change in pull request #1119: URL: https://github.com/apache/systemds/pull/1119#discussion_r533299065 ## File path: scripts/staging/datawig/DesignDocument.md ## @@ -0,0 +1,52 @@ +# DataWig Design Document +Julian Rakuschek, Noah Ruhmer +### Basic

[GitHub] [systemds] Vulturemox commented on a change in pull request #1119: [WIP] DataWig Design Document

2020-12-01 Thread GitBox
Vulturemox commented on a change in pull request #1119: URL: https://github.com/apache/systemds/pull/1119#discussion_r533320131 ## File path: scripts/staging/datawig/DesignDocument.md ## @@ -0,0 +1,52 @@ +# DataWig Design Document +Julian Rakuschek, Noah Ruhmer +### Basic Idea

[GitHub] [systemds] Shafaq-Siddiqi commented on a change in pull request #1119: [WIP] DataWig Design Document

2020-12-01 Thread GitBox
Shafaq-Siddiqi commented on a change in pull request #1119: URL: https://github.com/apache/systemds/pull/1119#discussion_r533325224 ## File path: scripts/staging/datawig/DesignDocument.md ## @@ -0,0 +1,52 @@ +# DataWig Design Document +Julian Rakuschek, Noah Ruhmer +### Basic

[GitHub] [systemds] Vulturemox commented on a change in pull request #1119: [WIP] DataWig Design Document

2020-12-01 Thread GitBox
Vulturemox commented on a change in pull request #1119: URL: https://github.com/apache/systemds/pull/1119#discussion_r533365636 ## File path: scripts/staging/datawig/DesignDocument.md ## @@ -0,0 +1,52 @@ +# DataWig Design Document +Julian Rakuschek, Noah Ruhmer +### Basic Idea

[GitHub] [systemds] Shafaq-Siddiqi commented on a change in pull request #1119: [WIP] DataWig Design Document

2020-12-01 Thread GitBox
Shafaq-Siddiqi commented on a change in pull request #1119: URL: https://github.com/apache/systemds/pull/1119#discussion_r533287586 ## File path: scripts/staging/datawig/DesignDocument.md ## @@ -0,0 +1,52 @@ +# DataWig Design Document +Julian Rakuschek, Noah Ruhmer +### Basic

[GitHub] [systemds] Vulturemox commented on a change in pull request #1119: [WIP] DataWig Design Document

2020-12-01 Thread GitBox
Vulturemox commented on a change in pull request #1119: URL: https://github.com/apache/systemds/pull/1119#discussion_r533320677 ## File path: scripts/staging/datawig/DesignDocument.md ## @@ -0,0 +1,52 @@ +# DataWig Design Document +Julian Rakuschek, Noah Ruhmer +### Basic Idea

[GitHub] [systemds] Baunsgaard closed pull request #1118: [SYSTEMDS-2695 + 2743] Compressed Row parallel left mult & optimized divide

2020-12-01 Thread GitBox
Baunsgaard closed pull request #1118: URL: https://github.com/apache/systemds/pull/1118 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [systemds] Shafaq-Siddiqi commented on pull request #1125: [WIP] Scikit-learn converter

2020-12-17 Thread GitBox
Shafaq-Siddiqi commented on pull request #1125: URL: https://github.com/apache/systemds/pull/1125#issuecomment-747368122 Hi, I appreciate the design draft, it is a good effort. I would suggest doing the mapping of Scikit-learn algorithms to DML and vice versa. Keep it simple you only

[GitHub] [systemds] Baunsgaard commented on pull request #1127: [SYSTEMDS-2760] Transpose micro benchmark

2020-12-17 Thread GitBox
Baunsgaard commented on pull request #1127: URL: https://github.com/apache/systemds/pull/1127#issuecomment-747426098 > I'll have a look tonight and see what we can do. Airline was dense, right? Yes, airline is dense, and I don't seem to be able to reproduce the bad performance

[GitHub] [systemds] mboehm7 commented on pull request #1127: [SYSTEMDS-2760] Transpose micro benchmark

2020-12-17 Thread GitBox
mboehm7 commented on pull request #1127: URL: https://github.com/apache/systemds/pull/1127#issuecomment-747424448 I'll have a look tonight and see what we can do. Airline was dense, right? This is an automated message from

[GitHub] [systemds] mboehm7 commented on pull request #1126: [BUGFIX] Federated LMCG Bug

2020-12-17 Thread GitBox
mboehm7 commented on pull request #1126: URL: https://github.com/apache/systemds/pull/1126#issuecomment-747405712 LGTM - thanks for the test @sebwrede. I now added explicit error handling for inconsistent federated data characteristics, and fixed the test accordingly (the underlying

[GitHub] [systemds] Baunsgaard commented on pull request #1127: [SYSTEMDS-2760] Transpose micro benchmark

2020-12-17 Thread GitBox
Baunsgaard commented on pull request #1127: URL: https://github.com/apache/systemds/pull/1127#issuecomment-747424955 The large 15 mil case seems to have little to no difference. But there still is a bug somewhere. XPS: ```bash scripts/perftest/results/transpose-large.log

[GitHub] [systemds] Baunsgaard edited a comment on pull request #1127: [SYSTEMDS-2760] Transpose micro benchmark

2020-12-17 Thread GitBox
Baunsgaard edited a comment on pull request #1127: URL: https://github.com/apache/systemds/pull/1127#issuecomment-747426098 > I'll have a look tonight and see what we can do. Airline was dense, right? Yes, airline is dense, and I don't seem to be able to reproduce the bad performance

[GitHub] [systemds] asfgit closed pull request #1126: [BUGFIX] Federated LMCG Bug

2020-12-17 Thread GitBox
asfgit closed pull request #1126: URL: https://github.com/apache/systemds/pull/1126 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [systemds] Baunsgaard opened a new pull request #1127: [SYSTEMDS-2760] Transpose micro benchmark

2020-12-17 Thread GitBox
Baunsgaard opened a new pull request #1127: URL: https://github.com/apache/systemds/pull/1127 This PR contains a simple addition to the micro benchmarks. This time transpose of a matrix is measured. 3 basic cases: "skinny" with 2.5mil rows 50 cols "wide" with 50 cols and

[GitHub] [systemds] mboehm7 commented on pull request #1125: [WIP] Scikit-learn converter

2020-12-17 Thread GitBox
mboehm7 commented on pull request #1125: URL: https://github.com/apache/systemds/pull/1125#issuecomment-747535307 In general, that's a good starting point. We had another use case of importing sk-learn pipelines in mind, but adding the sklearn-onnx-dml model converter is also an

[GitHub] [systemds] mboehm7 commented on pull request #1127: [SYSTEMDS-2760] Transpose micro benchmark

2020-12-17 Thread GitBox
mboehm7 commented on pull request #1127: URL: https://github.com/apache/systemds/pull/1127#issuecomment-747736880 ok, I just pushed some minor performance improvements for sparse-sparse transpose operations which reduced the execution time of ten 2.5M x 50 (sparsity=0.1, seed=12)

[GitHub] [systemds] Baunsgaard commented on pull request #1127: [SYSTEMDS-2760] Transpose micro benchmark

2020-12-18 Thread GitBox
Baunsgaard commented on pull request #1127: URL: https://github.com/apache/systemds/pull/1127#issuecomment-748005848 When looking at before and after (I tested it by dropping the transpose commit from the history), it looks like I might have done something wrong in the initial

[GitHub] [systemds] mboehm7 commented on pull request #1123: [SYSTEMDS-2756] Scale and PCA builtin update

2020-12-12 Thread GitBox
mboehm7 commented on pull request #1123: URL: https://github.com/apache/systemds/pull/1123#issuecomment-743759054 Thanks @Baunsgaard for eliminating the unnecessary colMeans in case of center and scale. However, please refrain from unnecessary changes of APIs and external behavior. I'll

[GitHub] [systemds] Baunsgaard opened a new pull request #1124: [SYSTEMDS-2757] PCA Predict and Inverse

2020-12-12 Thread GitBox
Baunsgaard opened a new pull request #1124: URL: https://github.com/apache/systemds/pull/1124 Adds a predict function for PCA and an inverse function. The predict is for unseen data that the PCA was not trained on, just like our predict functions for other algorithms. The

[GitHub] [systemds] Baunsgaard commented on pull request #1123: [SYSTEMDS-2756] Scale and PCA builtin update

2020-12-12 Thread GitBox
Baunsgaard commented on pull request #1123: URL: https://github.com/apache/systemds/pull/1123#issuecomment-743778523 > but you **wanted** to do this Oops, logic fine ... execution wrong. Great catch, thanks!

[GitHub] [systemds] asfgit closed pull request #1123: [SYSTEMDS-2756] Scale and PCA builtin update

2020-12-12 Thread GitBox
asfgit closed pull request #1123: URL: https://github.com/apache/systemds/pull/1123

[GitHub] [systemds] mboehm7 commented on pull request #1123: [SYSTEMDS-2756] Scale and PCA builtin update

2020-12-12 Thread GitBox
mboehm7 commented on pull request #1123: URL: https://github.com/apache/systemds/pull/1123#issuecomment-743766561 ad 1) besides the changed overall behavior, the comment referred to `replace(target=ScaleFactor, pattern=NaN, replacement=1e-16);`, which would need to replace zero as this

[GitHub] [systemds] Baunsgaard commented on a change in pull request #1124: [SYSTEMDS-2757] PCA Predict and Inverse

2020-12-12 Thread GitBox
Baunsgaard commented on a change in pull request #1124: URL: https://github.com/apache/systemds/pull/1124#discussion_r541622750 ## File path: scripts/builtin/scale.dml ## @@ -19,29 +19,48 @@ # #- -# Scale and

[GitHub] [systemds] Baunsgaard commented on pull request #1123: [SYSTEMDS-2756] Scale and PCA builtin update

2020-12-12 Thread GitBox
Baunsgaard commented on pull request #1123: URL: https://github.com/apache/systemds/pull/1123#issuecomment-743764247 1. I changed the replace NaN because the NaN would be introduced in cases of division by zero. Therefore, it made sense to change the replacement on the scale factor. This

[GitHub] [systemds] mboehm7 commented on pull request #1123: [SYSTEMDS-2756] Scale and PCA builtin update

2020-12-12 Thread GitBox
mboehm7 commented on pull request #1123: URL: https://github.com/apache/systemds/pull/1123#issuecomment-743777089 ad 1) there is a mismatch between what you wanted to do and what your code actually did; the comment just pointed that out. The PR did this `replace(target=ScaleFactor,

[GitHub] [systemds] Baunsgaard commented on pull request #1123: [SYSTEMDS-2756] Scale and PCA builtin update

2020-12-12 Thread GitBox
Baunsgaard commented on pull request #1123: URL: https://github.com/apache/systemds/pull/1123#issuecomment-743772534 > ad 1) besides the changed overall behavior, the comment referred to `replace(target=ScaleFactor, pattern=NaN, replacement=1e-16);`, which would need to replace zero as

[GitHub] [systemds] Baunsgaard opened a new pull request #1122: [SYSTEMDS-2748+2756] Compressed Sparse MM + TSMM + Scale builtin docs

2020-12-11 Thread GitBox
Baunsgaard opened a new pull request #1122: URL: https://github.com/apache/systemds/pull/1122 This commit contains various changes: 1. Compressed sparse matrix multiplication 2. Modified matrix multiplication to push down information of transposing to the ba+* op. to allow

[GitHub] [systemds] ywcb00 opened a new pull request #1133: [SYSTEMDS-2747] Federated Quaternary Operation WCeMM

2020-12-23 Thread GitBox
ywcb00 opened a new pull request #1133: URL: https://github.com/apache/systemds/pull/1133 This is a PR for adding WCeMM as a first federated quaternary operation. The PR contains the implementations for parsing and processing the instruction, as well as tests for the instruction.

[GitHub] [systemds] phaniarnab opened a new pull request #1134: Fix lineage cache eviction test

2020-12-23 Thread GitBox
phaniarnab opened a new pull request #1134: URL: https://github.com/apache/systemds/pull/1134 PR to run tests.

[GitHub] [systemds] tobiasrieger closed pull request #1113: [SYSTEMDS-2550] Shuffle data partitioner

2020-12-18 Thread GitBox
tobiasrieger closed pull request #1113: URL: https://github.com/apache/systemds/pull/1113

[GitHub] [systemds] tobiasrieger opened a new pull request #1131: [SYSTEMDS-2550] Balancing and data partitioning

2020-12-18 Thread GitBox
tobiasrieger opened a new pull request #1131: URL: https://github.com/apache/systemds/pull/1131 This PR includes the closed PR #1113 and all changes proposed in its comments. It was rebased on master and consolidated to make it easier to merge. Changes list: - Added four new
