[GitHub] [systemds] Baunsgaard closed pull request #992: [SYSTEMDS-2523] Update to Spark 2.4.6

2020-09-06 Thread GitBox


Baunsgaard closed pull request #992:
URL: https://github.com/apache/systemds/pull/992


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [systemds] Baunsgaard commented on pull request #992: [SYSTEMDS-2523] Update to Spark 2.4.6

2020-09-06 Thread GitBox


Baunsgaard commented on pull request #992:
URL: https://github.com/apache/systemds/pull/992#issuecomment-687754049


   Closing, because we plan to update the Spark version after the 2.0 release of SystemDS.
   
   







[GitHub] [systemds] Baunsgaard closed pull request #1016: [SYSTEMDS-2611] Compressed mean

2020-09-06 Thread GitBox


Baunsgaard closed pull request #1016:
URL: https://github.com/apache/systemds/pull/1016


   







[GitHub] [systemds] Baunsgaard commented on pull request #1047: [SYSTEMDS-2651] Add faster multiple federated workers startup

2020-09-06 Thread GitBox


Baunsgaard commented on pull request #1047:
URL: https://github.com/apache/systemds/pull/1047#issuecomment-687735766


   agree! also really like the PR!







[GitHub] [systemds] Baunsgaard opened a new pull request #1049: [MINOR] Python API hardening, and stability

2020-09-06 Thread GitBox


Baunsgaard opened a new pull request #1049:
URL: https://github.com/apache/systemds/pull/1049


   Minor changes to the Python API startup: if SystemDS is installed somewhere else, the Python API will use that installation.
   In practice, if the SystemDS home is set, Python uses that SystemDS; if it is not set, it falls back to the jar files installed by the pip package.
   
   This was debated in #992, where it was argued that things would be harder for users if the pip package did not contain the jar files.
   
   This new PR supports both.







[GitHub] [systemds] Baunsgaard commented on pull request #1047: [SYSTEMDS-2651] Add faster multiple federated workers startup

2020-09-06 Thread GitBox


Baunsgaard commented on pull request #1047:
URL: https://github.com/apache/systemds/pull/1047#issuecomment-687736854


   But the issue is that it is hard or impossible to debug separate processes, while debugging is possible inside the same JVM using threads. In that context, if we really want to make it smart, we would need to change the system so that it does not produce these static variables and objects, which potentially also lead to more bugs in other parts of the program.







[GitHub] [systemds] mboehm7 commented on pull request #1036: [SYSTEMDS-2635] Builtin for missing value imputation using forward an…

2020-09-06 Thread GitBox


mboehm7 commented on pull request #1036:
URL: https://github.com/apache/systemds/pull/1036#issuecomment-687819591


   LGTM - thanks for the additional builtin function @Shafaq-Siddiqi. I only 
slightly modified the formatting and changed the for loops over columns to 
parfor loops. 







[GitHub] [systemds] asfgit closed pull request #1036: [SYSTEMDS-2635] Builtin for missing value imputation using forward an…

2020-09-06 Thread GitBox


asfgit closed pull request #1036:
URL: https://github.com/apache/systemds/pull/1036


   







[GitHub] [systemds] mboehm7 commented on pull request #1046: [SYSTEMDS-2556,2560] Add federated Encoder impute support and improve Omit

2020-09-06 Thread GitBox


mboehm7 commented on pull request #1046:
URL: https://github.com/apache/systemds/pull/1046#issuecomment-687847007


   LGTM - thanks @kev-inn for the added impute support and the general cleanup. I modified the Encoder interface, though, to keep it independent of data structures from the federated backend. Besides formatting, I also left another TODO in Omit (but low priority).







[GitHub] [systemds] asfgit closed pull request #1046: [SYSTEMDS-2556,2560] Add federated Encoder impute support and improve Omit

2020-09-06 Thread GitBox


asfgit closed pull request #1046:
URL: https://github.com/apache/systemds/pull/1046


   







[GitHub] [systemds] kev-inn opened a new pull request #1047: [SYSTEMDS-2651] Add faster multiple federated workers startup

2020-09-02 Thread GitBox


kev-inn opened a new pull request #1047:
URL: https://github.com/apache/systemds/pull/1047


   Adds a new function to start multiple federated workers. The function first 
starts multiple processes and then waits for all of them to be ready for a 
connection by pinging (by sending a `CLEAR`) them.
   
   This method cannot be used for threads (running federated workers), because 
they share static variables, therefore their startup interferes with each 
other. Switching purely to processes should definitely be considered.
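   
   A rough sketch of the start-then-wait pattern described above. This is not the code from the PR: the class and method names are hypothetical, and readiness is approximated by a plain TCP connection attempt instead of sending a `CLEAR` request.
   
```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not SystemDS code.
class WorkerStartupSketch {

    /** Starts every configured process first, without waiting in between. */
    static List<Process> startAll(List<ProcessBuilder> builders) throws IOException {
        List<Process> started = new ArrayList<>();
        for (ProcessBuilder builder : builders)
            started.add(builder.start());
        return started;
    }

    /** Polls host:port until a TCP connection succeeds or the timeout elapses. */
    static boolean waitUntilReachable(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                // Assumption: a successful connect stands in for the CLEAR ping.
                socket.connect(new InetSocketAddress(host, port), 200);
                return true;
            } catch (IOException notReadyYet) {
                Thread.sleep(50); // back off briefly, then retry
            }
        }
        return false;
    }

    /** Returns true only if every port became reachable within its timeout. */
    static boolean waitForAll(String host, int[] ports, long timeoutMillisPerWorker)
            throws InterruptedException {
        for (int port : ports)
            if (!waitUntilReachable(host, port, timeoutMillisPerWorker))
                return false;
        return true;
    }
}
```
   
   Starting all processes before waiting on any of them means the individual startup times overlap instead of adding up.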







[GitHub] [systemds] Shafaq-Siddiqi closed pull request #988: [SYSTEMDS-2658] Synthetic Minority Over-sampling Technique (SMOTE)

2020-09-02 Thread GitBox


Shafaq-Siddiqi closed pull request #988:
URL: https://github.com/apache/systemds/pull/988


   







[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486439575



##
File path: src/main/java/org/apache/sysds/parser/DataExpression.java
##
@@ -2239,9 +2251,44 @@ public boolean isRead()
 * Sets privacy of identifier if privacy variable parameter is set.  
 */
private void setPrivacy(){
-   Expression eprivacy = getVarParam("privacy");
-   if ( eprivacy != null ){
-   getOutput().setPrivacy(PrivacyLevel.valueOf(eprivacy.toString()));
+   Expression eprivacy = getVarParam(PRIVACY);

Review comment:
   Is it easier to read in the current version, @Baunsgaard ?









[GitHub] [systemds] Baunsgaard opened a new pull request #1052: [MINOR] Fix OLE skipList

2020-09-10 Thread GitBox


Baunsgaard opened a new pull request #1052:
URL: https://github.com/apache/systemds/pull/1052


   Fix a bug in OLE right multiplication where execution would break
   when the skip list was disabled.







[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486439247



##
File path: src/main/java/org/apache/sysds/runtime/instructions/cp/MultiReturnBuiltinCPInstruction.java
##
@@ -47,6 +47,10 @@ public CPOperand getOutput(int i) {
return _outputs.get(i);
}
 
+   public ArrayList getOutputs(){

Review comment:
   I have changed it now. 









[GitHub] [systemds] sebwrede opened a new pull request #1053: [MINOR] Move Privacy Handling of UDFs

2020-09-11 Thread GitBox


sebwrede opened a new pull request #1053:
URL: https://github.com/apache/systemds/pull/1053


   The privacy handling of UDFs is moved from the individual instruction 
classes to the FederatedWorkerHandler.
   This will handle the privacy constraints of the input before any execution 
of the UDF and it will be easier to maintain since the call to the 
PrivacyMonitor does not have to be written in every "execute" implementation of 
FederatedUDF. 







[GitHub] [systemds] sebwrede opened a new pull request #1054: [MINOR] Remove Exceptions From FederatedResponse

2020-09-11 Thread GitBox


sebwrede opened a new pull request #1054:
URL: https://github.com/apache/systemds/pull/1054


   Exceptions added to the FederatedResponse risk exposing data from the 
federated worker. Exceptions need to be caught and then a new exception could 
be created and added to the FederatedResponse without all details from the 
original exception. 
   An example of how to add an exception to the FederatedResponse without 
exposing data can be seen in "executeCommand" in FederatedWorkerHandler.java:
   
   `catch (DMLPrivacyException | FederatedWorkerHandlerException ex) {return new FederatedResponse(ResponseType.ERROR, ex);}`
   
   This code catches DMLPrivacyExceptions and FederatedWorkerHandlerExceptions 
and any other type of exception is handled by a different catch-block where the 
name of the exception class is put into the message of a 
FederatedWorkerHandlerException. In this way, we can throw DMLPrivacyExceptions 
and FederatedWorkerHandlerExceptions without including any private data in the 
exception message and all other exceptions will not be returned in the 
FederatedResponse. 
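   
   To illustrate the idea outside of SystemDS, here is a minimal, self-contained sketch. All names in it are hypothetical stand-ins: `ControlledException` plays the role of the exception types whose messages we author ourselves, and `SketchResponse` plays the role of the response object.
   
```java
import java.util.function.Supplier;

// Hypothetical stand-in for exception types whose messages we fully control.
class ControlledException extends RuntimeException {
    ControlledException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for a response sent back to the coordinator.
class SketchResponse {
    final boolean error;
    final Object payload;

    SketchResponse(boolean error, Object payload) {
        this.error = error;
        this.payload = payload;
    }
}

class ResponseSanitizerSketch {
    static SketchResponse execute(Supplier<Object> command) {
        try {
            return new SketchResponse(false, command.get());
        } catch (ControlledException ex) {
            // We wrote this message ourselves, so it is returned as-is.
            return new SketchResponse(true, ex);
        } catch (Exception ex) {
            // Unknown origin: keep only the class name and drop both the
            // original message and the cause, which may carry private data.
            return new SketchResponse(true, new ControlledException(
                "Exception of type " + ex.getClass() + " thrown when processing request"));
        }
    }
}
```
   
   The order of the catch clauses matters: the controlled type has to come before the generic `Exception` clause, otherwise it would be wrapped like everything else.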







[GitHub] [systemds] Baunsgaard merged pull request #1052: [MINOR] Fix OLE skipList

2020-09-11 Thread GitBox


Baunsgaard merged pull request #1052:
URL: https://github.com/apache/systemds/pull/1052


   







[GitHub] [systemds] sebwrede opened a new pull request #1051: Fine-Grained Privacy Constraints 2 Rebased

2020-09-09 Thread GitBox


sebwrede opened a new pull request #1051:
URL: https://github.com/apache/systemds/pull/1051


   This PR replaces the [Fine-Grained Privacy Constraints 2 
PR](https://github.com/apache/systemds/pull/985). 
   This PR adds fine-grained privacy constraints and adapts the privacy 
constraint handling to the new federated architecture. 







[GitHub] [systemds] sebwrede closed pull request #985: Fine-Grained Privacy Constraints 2

2020-09-09 Thread GitBox


sebwrede closed pull request #985:
URL: https://github.com/apache/systemds/pull/985


   







[GitHub] [systemds] sebwrede commented on pull request #985: Fine-Grained Privacy Constraints 2

2020-09-09 Thread GitBox


sebwrede commented on pull request #985:
URL: https://github.com/apache/systemds/pull/985#issuecomment-689635208


   This PR has been replaced by https://github.com/apache/systemds/pull/1051.







[GitHub] [systemds] tobiasrieger opened a new pull request #1050: [WIP][SYSTEMDS-2550] implemented Externalizable for ListObjects

2020-09-08 Thread GitBox


tobiasrieger opened a new pull request #1050:
URL: https://github.com/apache/systemds/pull/1050


   To transmit a ListObject parameter of MatrixObjects to the federated workers, lists must be serializable. This pull request also contains a unit test that exercises the recursive case by testing a list inside a list. In addition, old code in the test was changed to a parameterized variant for brevity.
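   
   For readers unfamiliar with `Externalizable`, the following is a minimal sketch of a list type that serializes itself and handles the list-in-a-list case. `NestedListSketch` is a hypothetical class for illustration only and is not the ListObject implementation from this PR.
   
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the SystemDS ListObject.
class NestedListSketch implements Externalizable {
    private List<Object> items = new ArrayList<>();

    // Externalizable requires a public no-argument constructor.
    public NestedListSketch() {}

    NestedListSketch add(Object item) {
        items.add(item);
        return this;
    }

    Object get(int index) {
        return items.get(index);
    }

    int size() {
        return items.size();
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(items.size());
        // writeObject re-enters writeExternal for nested NestedListSketch
        // elements, which is what makes the list-in-a-list case work.
        for (Object item : items)
            out.writeObject(item);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        int n = in.readInt();
        items = new ArrayList<>(n);
        for (int i = 0; i < n; i++)
            items.add(in.readObject());
    }

    /** Serializes to a byte array and reads the list back. */
    static NestedListSketch roundTrip(NestedListSketch list)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(list);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (NestedListSketch) in.readObject();
        }
    }
}
```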







[GitHub] [systemds] kev-inn commented on pull request #1047: [SYSTEMDS-2651] Add faster multiple federated workers startup

2020-09-07 Thread GitBox


kev-inn commented on pull request #1047:
URL: https://github.com/apache/systemds/pull/1047#issuecomment-688257975


   > But the issue is that it is hard or impossible to debug separate processes, while debugging is possible inside the same JVM using threads. In that context, if we really want to make it smart, we would need to change the system so that it does not produce these static variables and objects, which potentially also lead to more bugs in other parts of the program.
   
   Yes, that is the one major downside. IntelliJ supports attaching the 
debugger to a process, but I am not sure if that can be automated. 
[`ThreadLocal`](https://docs.oracle.com/javase/8/docs/api/java/lang/ThreadLocal.html)
 would be another option for using threads, but I believe that would lead to 
performance loss, even though we only get debug improvements.
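   
   As a rough illustration of what the `ThreadLocal` option means, here is a small sketch. The class and field names are hypothetical and have nothing to do with the actual static state in SystemDS; the point is only that each thread sees its own copy of the value.
   
```java
// Hypothetical sketch, not SystemDS code.
class ThreadLocalSketch {
    // Unlike a plain static field, every thread gets its own counter,
    // initialized to 0 on first access.
    static final ThreadLocal<Integer> localCounter = ThreadLocal.withInitial(() -> 0);

    /** Increments the calling thread's own counter and returns its value. */
    static int incrementLocal(int times) {
        for (int i = 0; i < times; i++)
            localCounter.set(localCounter.get() + 1);
        return localCounter.get();
    }

    /** Runs the same increments in two threads and collects both results. */
    static int[] runInTwoThreads(int times) throws InterruptedException {
        int[] results = new int[2];
        Thread first = new Thread(() -> results[0] = incrementLocal(times));
        Thread second = new Thread(() -> results[1] = incrementLocal(times));
        first.start();
        second.start();
        first.join();
        second.join();
        return results;
    }
}
```
   
   Because neither thread touches the other's counter, both end up with exactly `times`, with no locking. The price is an extra lookup on every access, which is where the performance concern comes from.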







[GitHub] [systemds] kev-inn edited a comment on pull request #1047: [SYSTEMDS-2651] Add faster multiple federated workers startup

2020-09-07 Thread GitBox


kev-inn edited a comment on pull request #1047:
URL: https://github.com/apache/systemds/pull/1047#issuecomment-688257975


   > But the issue is that it is hard or impossible to debug separate processes, while debugging is possible inside the same JVM using threads. In that context, if we really want to make it smart, we would need to change the system so that it does not produce these static variables and objects, which potentially also lead to more bugs in other parts of the program.
   
   Yes, that is the one major downside. IntelliJ supports attaching the 
debugger to a process, but I am not sure if that operation can be automated. 
[`ThreadLocal`](https://docs.oracle.com/javase/8/docs/api/java/lang/ThreadLocal.html)
 would be another option for using threads, but I believe that would lead to 
performance loss, even though we only get debug improvements.







[GitHub] [systemds] Baunsgaard merged pull request #1052: [MINOR] Fix OLE skipList

2020-09-12 Thread GitBox


Baunsgaard merged pull request #1052:
URL: https://github.com/apache/systemds/pull/1052











[GitHub] [systemds] sebwrede commented on a change in pull request #1054: [MINOR] Remove Exceptions From FederatedResponse

2020-09-14 Thread GitBox


sebwrede commented on a change in pull request #1054:
URL: https://github.com/apache/systemds/pull/1054#discussion_r487786099



##
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedWorkerHandler.java
##
@@ -155,7 +155,7 @@ private FederatedResponse executeCommand(FederatedRequest request) {
catch (Exception ex) {
return new FederatedResponse(ResponseType.ERROR,
new FederatedWorkerHandlerException("Exception of type "
-   + ex.getClass() + " thrown when processing request", ex));
+   + ex.getClass() + " thrown when processing request"));

Review comment:
   We need to send the DMLPrivacyExceptions since we want the coordinator 
to understand why a FederatedRequest was rejected by the worker. In that way, 
it would later be possible for the coordinator to mitigate the problem by 
sending a different request which does not violate the privacy constraints.
   The DMLPrivacyException class is only used when checking privacy 
constraints, hence we control which information is put in the instances of the 
class. This means that we can safely return the exception to the coordinator 
without worrying about exposing private data. The same idea applies to the 
FederatedWorkerHandlerException: we create instances of this class and we are 
therefore able to control the information in the exception instances. 
   This is different from the other exceptions because they could be thrown 
from anywhere, be of any exception type, and contain any kind of information. 
This means that it is safer to catch such exceptions and return our own 
exception type with a message we control without including the original 
exception in the federated response. 
   We could add the catch clause for DMLPrivacyExceptions in other places, but 
generally I try to throw exceptions early and catch them late. If I catch 
DMLPrivacyExceptions in other places it should be to rethrow them and only 
catch them in executeCommand so that we only create the FederatedResponse once 
instead of having that logic in several different places. Also, the 
DMLPrivacyException is only thrown in some places which means that it should 
never be caught in for instance the "execClear" method. However, we can still 
add it so that the exception is caught if the code is changed to throw the 
privacy exception in the future. 
   I have changed the code now so that it has the catch clause for 
DMLPrivacyException and FederatedWorkerHandlerException before the Exception 
catch clauses. 









[GitHub] [systemds] Shafaq-Siddiqi opened a new pull request #1055: [SYSTEMDS-2661, 2662]: Pipelines Optimizer and various minor built-ins

2020-09-14 Thread GitBox


Shafaq-Siddiqi opened a new pull request #1055:
URL: https://github.com/apache/systemds/pull/1055


   This commit contains:
   1. Optimizer for cleaning pipelines
   2. Minor built-ins imputeByMean, imputeByMedian, frameSort, vectorToCsv.dml
   3. Minor fixes for resolving warnings in different DML scripts







[GitHub] [systemds] asfgit closed pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-13 Thread GitBox


asfgit closed pull request #1051:
URL: https://github.com/apache/systemds/pull/1051


   







[GitHub] [systemds] mboehm7 commented on pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-13 Thread GitBox


mboehm7 commented on pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#issuecomment-691663906


   LGTM. Thanks for the patch @sebwrede and reconciling the tests with the new 
federated backend. I only made some minor changes: removed warnings, fixed a few 
formatting issues, renamed new package to `finegrained`, and moved static 
methods for constraint parsing from `DataExpression` to `PrivacyUtils`.







[GitHub] [systemds] mboehm7 commented on pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-13 Thread GitBox


mboehm7 commented on pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#issuecomment-691664020


   Also thanks to @Baunsgaard for the earlier review.







[GitHub] [systemds] mboehm7 commented on pull request #1053: [MINOR] Move Privacy Handling of UDFs

2020-09-13 Thread GitBox


mboehm7 commented on pull request #1053:
URL: https://github.com/apache/systemds/pull/1053#issuecomment-691666709


   LGTM - thanks @sebwrede 







[GitHub] [systemds] asfgit closed pull request #1053: [MINOR] Move Privacy Handling of UDFs

2020-09-13 Thread GitBox


asfgit closed pull request #1053:
URL: https://github.com/apache/systemds/pull/1053


   







[GitHub] [systemds] mboehm7 commented on a change in pull request #1054: [MINOR] Remove Exceptions From FederatedResponse

2020-09-13 Thread GitBox


mboehm7 commented on a change in pull request #1054:
URL: https://github.com/apache/systemds/pull/1054#discussion_r487525755



##
File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedWorkerHandler.java
##
@@ -155,7 +155,7 @@ private FederatedResponse executeCommand(FederatedRequest request) {
catch (Exception ex) {
return new FederatedResponse(ResponseType.ERROR,
new FederatedWorkerHandlerException("Exception of type "
-   + ex.getClass() + " thrown when processing request", ex));
+   + ex.getClass() + " thrown when processing request"));

Review comment:
   Could you elaborate why we send the exception for privacy exceptions but 
NOT for all other exceptions? Shouldn't it be the other way around? In that 
spirit I would also add the catch clause for PrivacyExceptions in the other 
places, while in general send the exceptions to allow for debugging at the 
coordinator.









[GitHub] [systemds] phaniarnab merged pull request #1048: [MINOR] Fix bugs in the release scripts

2020-09-04 Thread GitBox


phaniarnab merged pull request #1048:
URL: https://github.com/apache/systemds/pull/1048


   







[GitHub] [systemds] phaniarnab opened a new pull request #1048: [MINOR] Fix bugs in the release scripts

2020-09-04 Thread GitBox


phaniarnab opened a new pull request #1048:
URL: https://github.com/apache/systemds/pull/1048


   PR to check if tests are passing.







[GitHub] [systemds] kev-inn opened a new pull request #1046: [SYSTEMDS-2556,2560] Add federated Encoder impute support and improve Omit

2020-08-31 Thread GitBox


kev-inn opened a new pull request #1046:
URL: https://github.com/apache/systemds/pull/1046


   Adds support for federated execution for the final encoder 
`EncoderMVImpute`. This should finish support for federated transform 
operations (perf and improvements being TODO).
   
   ## `EncoderMVImpute`
   
   Note that I removed quite a bit of code from `EncoderMVImpute` as it seems 
to not be in use at all, please confirm if this is fine.
   
   ## `EncoderOmit`
   
   I added the perf improvement of the TODO, since omit anyway had some 
problems which needed a fix.







[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486407377



##
File path: src/main/java/org/apache/sysds/runtime/instructions/spark/SPInstruction.java
##
@@ -41,16 +41,15 @@
}
 
protected final SPType _sptype;
-   protected final Operator _optr;
protected final boolean _requiresLabelUpdate;
 
protected SPInstruction(SPType type, String opcode, String istr) {
this(type, null, opcode, istr);
}
 
protected SPInstruction(SPType type, Operator op, String opcode, String istr) {
+   super(op);

Review comment:
   Because they all have the same field, but it was written in each of the 
classes. I placed it into their super class (Instruction) and then decided that 
calling the super constructor is a good idea, since we would then be able to 
change the way the operator is set for all subclasses by changing the 
Instruction constructor. 
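   
   A stripped-down sketch of that refactoring, with hypothetical class names in place of the real Instruction hierarchy: the operator field lives once in the base class and every subclass hands it over through `super(op)`.
   
```java
// Hypothetical stand-in for the Operator type.
class OperatorSketch {
    final String name;

    OperatorSketch(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for the common base class.
abstract class InstructionSketch {
    // Declared once here instead of once per subclass.
    protected final OperatorSketch _optr;

    protected InstructionSketch(OperatorSketch op) {
        // Single place to change how the operator is set for all subclasses.
        _optr = op;
    }

    OperatorSketch getOperator() {
        return _optr;
    }
}

// Hypothetical subclasses: neither redeclares the field.
class SparkInstructionSketch extends InstructionSketch {
    SparkInstructionSketch(OperatorSketch op) {
        super(op);
    }
}

class CpInstructionSketch extends InstructionSketch {
    CpInstructionSketch(OperatorSketch op) {
        super(op);
    }
}
```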









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486373173



##
File path: src/main/java/org/apache/sysds/runtime/instructions/Instruction.java
##
@@ -38,7 +41,21 @@
FEDERATED
}

-   private static final Log LOG = LogFactory.getLog(Instruction.class.getName());
+   protected static final Log LOG = LogFactory.getLog(Instruction.class.getName());

Review comment:
   Private is also good. I do not need the "protected" to be there. 









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486376414



##
File path: src/main/java/org/apache/sysds/parser/DataExpression.java
##
@@ -2239,9 +2251,44 @@ public boolean isRead()
 * Sets privacy of identifier if privacy variable parameter is set.  
 */
private void setPrivacy(){
-   Expression eprivacy = getVarParam("privacy");
-   if ( eprivacy != null ){
-   getOutput().setPrivacy(PrivacyLevel.valueOf(eprivacy.toString()));
+   Expression eprivacy = getVarParam(PRIVACY);

Review comment:
   Yes, I also considered splitting it up, but the current implementation 
actually fits the general style of this class :stuck_out_tongue: .
   I can change it now, if it is too confusing like this. 









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486379692



##
File path: 
src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedWorkerHandler.java
##
@@ -280,8 +281,9 @@ private FederatedResponse execInstruction(FederatedRequest 
request) {
ExecutionContext ec = _ecm.get(request.getTID());
BasicProgramBlock pb = new BasicProgramBlock(null);
pb.getInstructions().clear();
-   pb.getInstructions().add(InstructionParser
-   .parseSingleInstruction((String)request.getParam(0)));
+   Instruction receivedInstruction = InstructionParser
+   
.parseSingleInstruction((String)request.getParam(0));
+   pb.getInstructions().add(receivedInstruction);

Review comment:
   It does not do anything differently. I thought at some point that I 
would need to handle the receivedInstruction, but then I decided against it. I 
could change it back like it was before and the code would work the same way, 
so right now it is just a question of readability: is the new version easier to 
read for you, @Baunsgaard ?









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486383020



##
File path: 
src/main/java/org/apache/sysds/runtime/instructions/cp/MultiReturnBuiltinCPInstruction.java
##
@@ -47,6 +47,10 @@ public CPOperand getOutput(int i) {
return _outputs.get(i);
}
 
+   public ArrayList getOutputs(){

Review comment:
   I also thought about doing that. It will always be an ArrayList, so in 
the end it would not make much of a difference. It is more an API-design thing. 
Do you prefer the List type? 









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486385202



##
File path: 
src/main/java/org/apache/sysds/runtime/instructions/spark/SPInstruction.java
##
@@ -41,16 +41,15 @@
}
 
protected final SPType _sptype;
-   protected final Operator _optr;
protected final boolean _requiresLabelUpdate;
 
protected SPInstruction(SPType type, String opcode, String istr) {
this(type, null, opcode, istr);
}
 
protected SPInstruction(SPType type, Operator op, String opcode, String 
istr) {
+   super(op);

Review comment:
   It calls the super class constructor, which sets the Operator field 
(which is in the super class). 
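
As a rough illustration of what hoisting the field means, here is a minimal sketch. `BaseInstruction` and `SparkLikeInstruction` are hypothetical stand-ins (with `String` in place of `Operator` and `SPType`), not the actual SystemDS classes.

```java
// Hypothetical stand-ins; the real classes carry more state.
class BaseInstruction {
    // The operator field now lives once, in the super class.
    protected final String _optr;

    protected BaseInstruction(String optr) {
        this._optr = optr;
    }
}

class SparkLikeInstruction extends BaseInstruction {
    protected final String _sptype;

    protected SparkLikeInstruction(String type, String op) {
        super(op);            // sets the inherited _optr field
        this._sptype = type;  // subclass keeps only its own fields
    }

    String describe() {
        return _sptype + ":" + _optr;
    }
}
```

Under this assumption, every backend-specific subclass would pass the operator up instead of declaring its own copy of the field.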









[GitHub] [systemds] Baunsgaard commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


Baunsgaard commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486391285



##
File path: 
src/main/java/org/apache/sysds/runtime/instructions/spark/SPInstruction.java
##
@@ -41,16 +41,15 @@
}
 
protected final SPType _sptype;
-   protected final Operator _optr;
protected final boolean _requiresLabelUpdate;
 
protected SPInstruction(SPType type, String opcode, String istr) {
this(type, null, opcode, istr);
}
 
protected SPInstruction(SPType type, Operator op, String opcode, String 
istr) {
+   super(op);

Review comment:
   Yes, but why do you need to change it? This change is made in CP, SP, 
Fed, and GPU, and I don't see why.









[GitHub] [systemds] Baunsgaard commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


Baunsgaard commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486393184



##
File path: 
src/main/java/org/apache/sysds/runtime/instructions/cp/MultiReturnBuiltinCPInstruction.java
##
@@ -47,6 +47,10 @@ public CPOperand getOutput(int i) {
return _outputs.get(i);
}
 
+   public ArrayList getOutputs(){

Review comment:
   Well, it's a matter of opinion. I like to return the highest abstraction; I 
think this was mentioned as a point in a course we had.
   Software Engineering... :+1: 
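
To make the point about returning the highest useful abstraction concrete, here is a minimal sketch. `OutputHolder` and its `String` elements are hypothetical stand-ins, not the actual `MultiReturnBuiltinCPInstruction` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a class that keeps its outputs in an ArrayList.
class OutputHolder {
    private final ArrayList<String> _outputs = new ArrayList<>();

    void addOutput(String name) {
        _outputs.add(name);
    }

    // Declared as List: callers depend only on the interface, so the
    // backing collection could change later without touching call sites.
    List<String> getOutputs() {
        return _outputs;
    }
}
```

The object returned is still an `ArrayList` at runtime; only the declared type in the signature is wider.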













[GitHub] [systemds] Baunsgaard commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


Baunsgaard commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486393950



##
File path: src/main/java/org/apache/sysds/parser/DataExpression.java
##
@@ -2239,9 +2251,44 @@ public boolean isRead()
 * Sets privacy of identifier if privacy variable parameter is set.  
 */
private void setPrivacy(){
-   Expression eprivacy = getVarParam("privacy");
-   if ( eprivacy != null ){
-   
getOutput().setPrivacy(PrivacyLevel.valueOf(eprivacy.toString()));
+   Expression eprivacy = getVarParam(PRIVACY);

Review comment:
   I only ask because, when I was going through all the code, this stood 
out and was ... too complicated to review.









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486393369



##
File path: 
src/main/java/org/apache/sysds/runtime/privacy/FineGrained/FineGrainedPrivacyList.java
##
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.runtime.privacy.FineGrained;
+
+import java.io.Serializable;
+import java.util.AbstractMap;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+import org.apache.sysds.runtime.privacy.PrivacyConstraint.PrivacyLevel;
+
+/**
+ * Simple implementation of retrieving fine-grained privacy constraints
+ * based on pairs in an ArrayList.
+ */
+public class FineGrainedPrivacyList implements FineGrainedPrivacy {
+
+   private ArrayList> 
constraintCollection = new ArrayList<>();
+
+   @Override
+   public void put(DataRange dataRange, PrivacyLevel privacyLevel) {
+   constraintCollection.add(new AbstractMap.SimpleEntry(dataRange, privacyLevel));
+   }
+
+   @Override
+   public Map getPrivacyLevel(DataRange 
searchRange) {
+   Map matches = new LinkedHashMap<>();

Review comment:
   This depends on how it is used. In this case, the "containsValue" is the 
most used call on the returned map. 
   Which map type would you suggest?
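
For context on the map-type question, here is a minimal sketch. `Level` is a hypothetical stand-in for `PrivacyLevel`, and `String` keys stand in for `DataRange`. In both `HashMap` and `LinkedHashMap`, `containsValue` is a linear scan over the entries, so the practical difference is that `LinkedHashMap` also keeps insertion order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ContainsValueSketch {
    // Hypothetical stand-in for PrivacyLevel.
    enum Level { NONE, PRIVATE_AGGREGATION, PRIVATE }

    // Builds the matches in insertion order; String keys stand in for DataRange.
    static Map<String, Level> matches() {
        Map<String, Level> m = new LinkedHashMap<>();
        m.put("rows 0-9", Level.NONE);
        m.put("rows 5-20", Level.PRIVATE);
        return m;
    }

    // Linear scan over the values, regardless of the map implementation.
    static boolean hasPrivate(Map<String, Level> m) {
        return m.containsValue(Level.PRIVATE);
    }
}
```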









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486396715



##
File path: src/main/java/org/apache/sysds/runtime/privacy/PrivacyPropagator.java
##
@@ -19,15 +19,19 @@
 
 package org.apache.sysds.runtime.privacy;
 
+import java.util.*;

Review comment:
   OK









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486396082



##
File path: 
src/main/java/org/apache/sysds/runtime/privacy/FineGrained/FineGrainedPrivacyList.java
##
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.runtime.privacy.FineGrained;
+
+import java.io.Serializable;
+import java.util.AbstractMap;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+import org.apache.sysds.runtime.privacy.PrivacyConstraint.PrivacyLevel;
+
+/**
+ * Simple implementation of retrieving fine-grained privacy constraints
+ * based on pairs in an ArrayList.
+ */
+public class FineGrainedPrivacyList implements FineGrainedPrivacy {
+
+   private ArrayList> 
constraintCollection = new ArrayList<>();
+
+   @Override
+   public void put(DataRange dataRange, PrivacyLevel privacyLevel) {
+   constraintCollection.add(new AbstractMap.SimpleEntry(dataRange, privacyLevel));
+   }
+
+   @Override
+   public Map getPrivacyLevel(DataRange 
searchRange) {
+   Map matches = new LinkedHashMap<>();
+   for ( Map.Entry constraint : 
constraintCollection ){
+   if ( constraint.getKey().overlaps(searchRange) ) 
+   matches.put(constraint.getKey(), 
constraint.getValue());
+   }
+   return matches;
+   }
+
+   @Override
+   public Map getPrivacyLevelOfElement(long[] 
searchIndex) {
+   Map matches = new LinkedHashMap<>();
+   constraintCollection.forEach( constraint -> { 
+   if (constraint.getKey().contains(searchIndex)) 
+   matches.put(constraint.getKey(), 
constraint.getValue()); 
+   } );
+   return matches;
+   }
+
+   @Override
+   public DataRange[] getDataRangesOfPrivacyLevel(PrivacyLevel 
privacyLevel) {
+   ArrayList matches = new ArrayList<>();
+   constraintCollection.forEach(constraint -> { if 
(constraint.getValue() == privacyLevel) matches.add(constraint.getKey()); } );
+   return matches.toArray(new DataRange[0]);
+   }
+
+   @Override
+   public void removeAllConstraints() {
+   constraintCollection.clear();
+   }
+
+   @Override
+   public boolean hasConstraints() {
+   return !constraintCollection.isEmpty();
+   }
+
+   @Override
+   public Map getAllConstraints() {

Review comment:
   This method is currently not in use. At some point, I thought I would 
use it to serialize the fine-grained privacy constraints. I have not deleted 
it yet in case I need it later. Should I delete it or keep it as it is? 









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486399379



##
File path: src/main/java/org/apache/sysds/runtime/privacy/PrivacyPropagator.java
##
@@ -414,4 +591,25 @@ else if ( inst instanceof SqlCPInstruction )
instructionOutputNames = new 
String[]{((SqlCPInstruction) inst).getOutputVariableName()};
return instructionOutputNames;
}
+
+   private static ArrayList getOutputOperands(Instruction inst){
+   // The order of the following statements is important
+   if ( inst instanceof 
MultiReturnParameterizedBuiltinCPInstruction )
+   return ((MultiReturnParameterizedBuiltinCPInstruction) 
inst).getOutputs();
+   else if ( inst instanceof MultiReturnBuiltinCPInstruction )
+   return ((MultiReturnBuiltinCPInstruction) 
inst).getOutputs();
+   else if ( inst instanceof ComputationCPInstruction )
+   return getSingletonList(((ComputationCPInstruction) 
inst).getOutput());
+   else if ( inst instanceof VariableCPInstruction )
+   return getSingletonList(((VariableCPInstruction) 
inst).getOutput());
+   else if ( inst instanceof SqlCPInstruction )
+   return getSingletonList(((SqlCPInstruction) 
inst).getOutput());
+   return new ArrayList();

Review comment:
   The alternative is to have a method in Instruction called "getOutputs" 
and then implement it in the subclasses. This is however not the general design 
we use at the moment, but it would make much of the code cleaner to look at. 
For instance, most of the switch statements in this class could also be avoided 
if we changed the design and used Java inheritance instead. 
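
The inheritance-based alternative described above could look roughly like the following sketch. All class names here are hypothetical, and `String` stands in for `CPOperand`; the real `Instruction` class has no such `getOutputs` method at this point.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical base class with a default for instructions without outputs.
abstract class SketchInstruction {
    List<String> getOutputs() {
        return Collections.emptyList();
    }
}

class SingleOutputInstruction extends SketchInstruction {
    private final String output;

    SingleOutputInstruction(String output) {
        this.output = output;
    }

    @Override
    List<String> getOutputs() {
        return Collections.singletonList(output);
    }
}

class MultiOutputInstruction extends SketchInstruction {
    private final List<String> outputs;

    MultiOutputInstruction(String... outputs) {
        this.outputs = Arrays.asList(outputs);
    }

    @Override
    List<String> getOutputs() {
        return outputs;
    }
}

class OutputLookup {
    // No instanceof chain: dynamic dispatch picks the right implementation,
    // so the order of checks no longer matters.
    static List<String> getOutputOperands(SketchInstruction inst) {
        return inst.getOutputs();
    }
}
```

With this shape, adding a new instruction type means overriding one method rather than extending every `instanceof` chain.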









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486400993



##
File path: 
src/test/java/org/apache/sysds/test/functions/privacy/BuiltinGLMTest.java
##
@@ -228,6 +229,9 @@ public void runtestGLM(PrivacyConstraint privacyConstraint, 
Class expectedExc
{  100,   10,  2,  1.0,  2,  0.0,  3.0,   0.0,  
2.0,  2.5 },   // Binomial two-column.logit
{  200,   10,  2,  1.0,  3,  0.0,  3.0,   0.0,  
2.0,  2.5 },   // Binomial two-column.probit
};
-   return Arrays.asList(data);
+   if ( runAll )
+   return Arrays.asList(data);
+   else
+   return Arrays.asList(new Object[][]{data[0]});

Review comment:
   Yes, it is very slow. The data has now also been reduced, so it is way 
faster now. If we later need the different versions of GLM to test different 
use cases of the privacy constraints, then we can remove the above code and 
thereby include all test cases, but for now it is not necessary. 
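
The filter can be reduced to the following minimal sketch. `ParamFilterSketch` and its two data rows are hypothetical; only the `runAll` switch and the `Arrays.asList(new Object[][]{data[0]})` idiom come from the diff.

```java
import java.util.Arrays;
import java.util.Collection;

class ParamFilterSketch {
    static Collection<Object[]> data(boolean runAll) {
        Object[][] data = new Object[][] {
            { 100, 10, 2 },
            { 200, 10, 2 },
        };
        if (runAll)
            return Arrays.asList(data);
        // Wrap the first row in a one-element 2D array so that asList
        // yields a single Object[] parameter set, not the row's elements.
        return Arrays.asList(new Object[][] { data[0] });
    }
}
```

Wrapping `data[0]` in a new `Object[][]` matters: calling `Arrays.asList(data[0])` directly would produce a list of the individual values instead of a list containing one parameter row.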









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486402441



##
File path: src/test/java/org/apache/sysds/test/functions/privacy/GLMTest.java
##
@@ -154,7 +155,10 @@ public GLMTest (int numRecords_, int numFeatures_, int 
distFamilyType_, double d
{  100,  10,  2, -1.0,  4,  0.0,  0.01, 3.0,  -2.0,  
1.0,  1.0, GLMType.Bernoullicloglog1 }, // Bernoulli {-1, 1}.cloglog
{  200,  10,  2, -1.0,  5,  0.0,  0.01, 3.0,   0.0,  
2.0,  1.0, GLMType.Bernoullicauchit },  // Bernoulli {-1, 1}.cauchit
};
-   return Arrays.asList(data);
+   if ( runAll )
+   return Arrays.asList(data);
+   else 
+   return Arrays.asList( new Object[][]{data[0]} );

Review comment:
   Indeed :smiley: 









[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486403288



##
File path: 
src/main/java/org/apache/sysds/runtime/instructions/cp/MultiReturnBuiltinCPInstruction.java
##
@@ -47,6 +47,10 @@ public CPOperand getOutput(int i) {
return _outputs.get(i);
}
 
+   public ArrayList getOutputs(){

Review comment:
   Yes, I usually also go for that option. 









[GitHub] [systemds] Baunsgaard commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


Baunsgaard commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486337594



##
File path: src/main/java/org/apache/sysds/runtime/instructions/Instruction.java
##
@@ -38,7 +41,21 @@
FEDERATED
}

-   private static final Log LOG = 
LogFactory.getLog(Instruction.class.getName());
+   protected static final Log LOG = 
LogFactory.getLog(Instruction.class.getName());

Review comment:
   I made this private to ensure that the LOG is located in the same file 
it is called from.
   If you want to make use of it somewhere else make a new Log object in that 
file.
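
A minimal sketch of the one-logger-per-file convention follows. It uses `java.util.logging` in place of the commons-logging `Log`/`LogFactory` from the diff so that it runs without extra dependencies, and `ParentTask`/`ChildTask` are hypothetical classes.

```java
import java.util.logging.Logger;

class ParentTask {
    // private: only code in this class logs under this name
    private static final Logger LOG = Logger.getLogger(ParentTask.class.getName());

    void run() {
        LOG.fine("parent running");
    }

    static String loggerName() {
        return LOG.getName();
    }
}

class ChildTask extends ParentTask {
    // The subclass declares its own logger instead of inheriting a protected one,
    // so its messages are attributed to ChildTask.
    private static final Logger LOG = Logger.getLogger(ChildTask.class.getName());

    @Override
    void run() {
        LOG.fine("child running");
    }

    static String loggerName() {
        return LOG.getName();
    }
}
```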

##
File path: 
src/main/java/org/apache/sysds/runtime/instructions/spark/SPInstruction.java
##
@@ -41,16 +41,15 @@
}
 
protected final SPType _sptype;
-   protected final Operator _optr;
protected final boolean _requiresLabelUpdate;
 
protected SPInstruction(SPType type, String opcode, String istr) {
this(type, null, opcode, istr);
}
 
protected SPInstruction(SPType type, Operator op, String opcode, String 
istr) {
+   super(op);

Review comment:
   Can you explain this super class call to me?

##
File path: src/test/java/org/apache/sysds/test/functions/privacy/GLMTest.java
##
@@ -154,7 +155,10 @@ public GLMTest (int numRecords_, int numFeatures_, int 
distFamilyType_, double d
{  100,  10,  2, -1.0,  4,  0.0,  0.01, 3.0,  -2.0,  
1.0,  1.0, GLMType.Bernoullicloglog1 }, // Bernoulli {-1, 1}.cloglog
{  200,  10,  2, -1.0,  5,  0.0,  0.01, 3.0,   0.0,  
2.0,  1.0, GLMType.Bernoullicauchit },  // Bernoulli {-1, 1}.cauchit
};
-   return Arrays.asList(data);
+   if ( runAll )
+   return Arrays.asList(data);
+   else 
+   return Arrays.asList( new Object[][]{data[0]} );

Review comment:
   again the filter?

##
File path: src/main/java/org/apache/sysds/runtime/instructions/Instruction.java
##
@@ -38,7 +41,21 @@
FEDERATED
}

-   private static final Log LOG = 
LogFactory.getLog(Instruction.class.getName());
+   protected static final Log LOG = 
LogFactory.getLog(Instruction.class.getName());
+   protected final Operator _optr;
+
+   protected Instruction(Operator _optr){
+   this._optr = _optr;
+   }
+
+   // local flag for debug output
+   private static final boolean LTRACE = false;
+   static {
+   // for internal debugging only
+   if( LTRACE ) {
+   
Logger.getLogger("org.apache.sysds.runtime.instructions.Instruction").setLevel(Level.TRACE);

Review comment:
   We removed all instances of this kind of logging override.
   Please use the file in:
   /src/test/resources/log4j.properties
   
   if you want to debug something specific.
   

##
File path: 
src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederatedWorkerHandler.java
##
@@ -280,8 +281,9 @@ private FederatedResponse execInstruction(FederatedRequest 
request) {
ExecutionContext ec = _ecm.get(request.getTID());
BasicProgramBlock pb = new BasicProgramBlock(null);
pb.getInstructions().clear();
-   pb.getInstructions().add(InstructionParser
-   .parseSingleInstruction((String)request.getParam(0)));
+   Instruction receivedInstruction = InstructionParser
+   
.parseSingleInstruction((String)request.getParam(0));
+   pb.getInstructions().add(receivedInstruction);

Review comment:
   I don't see what this change does.

##
File path: 
src/main/java/org/apache/sysds/runtime/privacy/FineGrained/FineGrainedPrivacyList.java
##
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.runtime.privacy.FineGrained;
+
+import java.io.Serializable;
+import java.util.AbstractMap;
+import java.util.ArrayList;
+import 

[GitHub] [systemds] sebwrede commented on a change in pull request #1051: [SYSTEMDS-2605] Fine-Grained Privacy Constraints 2 Rebased

2020-09-10 Thread GitBox


sebwrede commented on a change in pull request #1051:
URL: https://github.com/apache/systemds/pull/1051#discussion_r486369426



##
File path: src/main/java/org/apache/sysds/runtime/instructions/Instruction.java
##
@@ -38,7 +41,21 @@
FEDERATED
}

-   private static final Log LOG = 
LogFactory.getLog(Instruction.class.getName());
+   protected static final Log LOG = 
LogFactory.getLog(Instruction.class.getName());
+   protected final Operator _optr;
+
+   protected Instruction(Operator _optr){
+   this._optr = _optr;
+   }
+
+   // local flag for debug output
+   private static final boolean LTRACE = false;
+   static {
+   // for internal debugging only
+   if( LTRACE ) {
+   
Logger.getLogger("org.apache.sysds.runtime.instructions.Instruction").setLevel(Level.TRACE);

Review comment:
   I am not using it, so I just removed it. 









[GitHub] [systemds] Baunsgaard commented on a change in pull request #1050: [WIP][SYSTEMDS-2550] implemented Externalizable for ListObjects

2020-09-10 Thread GitBox


Baunsgaard commented on a change in pull request #1050:
URL: https://github.com/apache/systemds/pull/1050#discussion_r486332722



##
File path: 
src/test/java/org/apache/sysds/test/component/paramserv/SerializationTest.java
##
@@ -19,7 +19,9 @@
 
 package org.apache.sysds.test.component.paramserv;
 
+import java.io.*;

Review comment:
   no wildcard imports









[GitHub] [systemds] tobiasrieger commented on a change in pull request #1050: [WIP][SYSTEMDS-2550] implemented Externalizable for ListObjects

2020-09-10 Thread GitBox


tobiasrieger commented on a change in pull request #1050:
URL: https://github.com/apache/systemds/pull/1050#discussion_r486364637



##
File path: 
src/test/java/org/apache/sysds/test/component/paramserv/SerializationTest.java
##
@@ -19,7 +19,9 @@
 
 package org.apache.sysds.test.component.paramserv;
 
+import java.io.*;

Review comment:
   thank you, fixed









[GitHub] [systemds] OlgaOvcharenko commented on pull request #1040: Built-in function for bivariate statistics

2020-10-09 Thread GitBox


OlgaOvcharenko commented on pull request #1040:
URL: https://github.com/apache/systemds/pull/1040#issuecomment-706165102


   @Baunsgaard I also pushed a modified FederationUtils (with fed min, max, sum, 
mean), because it seems the privacy tests fail without these changes.







[GitHub] [systemds] mboehm7 commented on pull request #1079: fix bug in CostEstimator.java

2020-10-18 Thread GitBox


mboehm7 commented on pull request #1079:
URL: https://github.com/apache/systemds/pull/1079#issuecomment-711149500


   LGTM - thanks @XorSum for catching this issue and for the related fix. I'll 
merge this in, as we indeed removed unnecessary entries in this instruction (in 
the past this instruction included both row and column blocksize). However, 
over the last years we mostly switched away from runtime plan costing to 
analytical cost models at hop level. This is the reason why this code path was 
also not tested. If you're doing a project where you need the costs, I would 
recommend using something like our cost-based operator fusion optimizer: 
   
https://github.com/apache/systemds/blob/master/src/main/java/org/apache/sysds/hops/codegen/opt/PlanSelectionFuseCostBasedV2.java#L1014
 







[GitHub] [systemds] asfgit closed pull request #1079: fix bug in CostEstimator.java

2020-10-18 Thread GitBox


asfgit closed pull request #1079:
URL: https://github.com/apache/systemds/pull/1079


   







[GitHub] [systemds] XorSum opened a new pull request #1079: fix bug in CostEstimator.java

2020-10-15 Thread GitBox


XorSum opened a new pull request #1079:
URL: https://github.com/apache/systemds/pull/1079


   When I use `CostEstimator` to estimate the time cost of a program block, an 
error occurs:
   
   ``` java
   java.lang.NumberFormatException: For input string: "copy"
       at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
       at java.base/java.lang.Long.parseLong(Long.java:692)
       at java.base/java.lang.Long.parseLong(Long.java:817)
       at org.apache.sysds.hops.cost.CostEstimator.maintainCPInstVariableStatistics(CostEstimator.java:218)
       at org.apache.sysds.hops.cost.CostEstimator.rGetTimeEstimate(CostEstimator.java:138)
       at org.apache.sysds.hops.cost.CostEstimator.getTimeEstimate(CostEstimator.java:89)
       at org.apache.sysds.hops.cost.CostEstimationWrapper.getTimeEstimate(CostEstimationWrapper.java:79)
   ```
   
   Look at the source code in  `org.apache.sysds.hops.cost.CostEstimator.java`  
from line 204 to line 221:
   
   ``` java
   private static void maintainCPInstVariableStatistics( CPInstruction inst, HashMap stats )
   {
       if( inst instanceof VariableCPInstruction )
       {
           String optype = inst.getOpcode();
           String[] parts = InstructionUtils.getInstructionParts(inst.toString());

           if( optype.equals("createvar") ) {
               if( parts.length<10 )
                   return;
               String varname = parts[1];
               long rlen = Long.parseLong(parts[6]);
               long clen = Long.parseLong(parts[7]);
               int blen = Integer.parseInt(parts[8]);
               long nnz = Long.parseLong(parts[10]);
               VarStats vs = new VarStats(rlen, clen, blen, nnz, false);
               stats.put(varname, vs);
           }
           // ..
       }
   }
   ```
   
   Here is an example of such an instruction:
   
   ``` 
   CP createvar _mVar9 scratch_space//_p18828_192.168.1.151//_t0/temp0 true MATRIX binary -1 1 1000 -1 copy
   ```
   
   The source reads nnz from index 10, which holds `copy` in this instruction.
   
   Actually, the index of nnz is 9. 
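
The index mismatch can be reproduced outside SystemDS with a minimal sketch. It assumes the operands are whitespace-delimited as printed above (the real delimiter may differ) and that the leading execution-type token `CP` is dropped before indexing, which is what the indices in the source suggest. `CreatevarIndexSketch`, `instructionParts`, and `parseAt` are hypothetical names, not SystemDS API.

```java
import java.util.Arrays;

class CreatevarIndexSketch {
    // Hypothetical stand-in for InstructionUtils.getInstructionParts:
    // assumes whitespace-delimited operands and drops the leading
    // execution-type token ("CP"), so parts[0] is the opcode.
    static String[] instructionParts(String inst) {
        String[] tokens = inst.trim().split("\\s+");
        return Arrays.copyOfRange(tokens, 1, tokens.length);
    }

    // Parses the operand at the given index as a long, as the source does for nnz.
    static long parseAt(String inst, int index) {
        return Long.parseLong(instructionParts(inst)[index]);
    }

    public static void main(String[] args) {
        String inst = "CP createvar _mVar9 scratch_space//_p18828_192.168.1.151//_t0/temp0 "
            + "true MATRIX binary -1 1 1000 -1 copy";
        String[] parts = instructionParts(inst);
        System.out.println("parts.length = " + parts.length);
        System.out.println("parts[9]  = " + parts[9]);
        System.out.println("parts[10] = " + parts[10]);
        System.out.println("nnz read from index 9 = " + parseAt(inst, 9));
        try {
            parseAt(inst, 10);
        } catch (NumberFormatException e) {
            // Same failure as in the stack trace above.
            System.out.println("index 10 fails: " + e.getMessage());
        }
    }
}
```

Under these assumptions the instruction has 11 parts, so the `parts.length<10` guard passes, index 9 parses as `-1`, and index 10 throws the `NumberFormatException` for `copy`.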
   
   







[GitHub] [systemds] Baunsgaard merged pull request #1080: [SYSTEMDS-2686] Compressed overlapping column groups

2020-10-19 Thread GitBox


Baunsgaard merged pull request #1080:
URL: https://github.com/apache/systemds/pull/1080


   







[GitHub] [systemds] Baunsgaard closed pull request #1076: Update install.md

2020-10-19 Thread GitBox


Baunsgaard closed pull request #1076:
URL: https://github.com/apache/systemds/pull/1076


   







[GitHub] [systemds] Baunsgaard commented on pull request #1076: Update install.md

2020-10-19 Thread GitBox


Baunsgaard commented on pull request #1076:
URL: https://github.com/apache/systemds/pull/1076#issuecomment-712105055


   Hi @manushree635 
   
   Thanks for the guide. I have taken a look and decided to change the install 
to use Brew.
   Therefore I changed it completely, while also following the structure of the 
Ubuntu guide.
   Unfortunately, while changing it, I forgot to set you as the author. You are 
welcome to open another PR, and I will merge it properly under your name next time.
   
   You are welcome to reach out if there is any issue with the guide!
   







[GitHub] [systemds] OlgaOvcharenko opened a new pull request #1081: [SYSTEMDS-2548] Federated right indexing

2020-10-19 Thread GitBox


OlgaOvcharenko opened a new pull request #1081:
URL: https://github.com/apache/systemds/pull/1081


   This PR adds federated right indexing.
   







[GitHub] [systemds] Baunsgaard merged pull request #1077: [SYSTEMDS-2613-2614] Sparse & dense compressed Matrix Mult

2020-10-19 Thread GitBox


Baunsgaard merged pull request #1077:
URL: https://github.com/apache/systemds/pull/1077


   







[GitHub] [systemds] Baunsgaard commented on pull request #1078: [MINOR] MKL native MM setNonZeros to number of cells

2020-10-18 Thread GitBox


Baunsgaard commented on pull request #1078:
URL: https://github.com/apache/systemds/pull/1078#issuecomment-711303063


   Closing based on comments







[GitHub] [systemds] Baunsgaard opened a new pull request #1080: [SYSTEMDS-2686] Compressed overlapping column groups

2020-10-18 Thread GitBox


Baunsgaard opened a new pull request #1080:
URL: https://github.com/apache/systemds/pull/1080


[SYSTEMDS-2686] Compressed overlapping column groups
   
   This commit changes the compressed right multiplication to perform in
   compressed space, resulting in significantly faster execution.
   The technique employed results in overlapping column groups that contain
   partial results of the matrix multiplication.
   The downside is that some operations that were previously possible in
   compressed space no longer work for the overlapping column groups.
   
   To still support all operations, decompression is used for
   cases where it is impossible to execute on the compressed matrices.
   
   Another addition is that statistics now contain compression and
   decompression times if compression is enabled.







[GitHub] [systemds] Baunsgaard opened a new pull request #1082: [SYSTEMDS-2689] Decompress Lop Operation

2020-10-19 Thread GitBox


Baunsgaard opened a new pull request #1082:
URL: https://github.com/apache/systemds/pull/1082


   







[GitHub] [systemds] Baunsgaard commented on pull request #1040: Built-in function for bivariate statistics

2020-10-10 Thread GitBox


Baunsgaard commented on pull request #1040:
URL: https://github.com/apache/systemds/pull/1040#issuecomment-706551018


   @OlgaOvcharenko Thanks for the PR. :1st_place_medal: 
   
   While merging, I fixed the indentation in the test files and reduced the 
number of tests run on the RowCol aggregate. Each test requires starting up every 
worker, which takes about 1 second each, so the test time grows fast. Thanks again.







[GitHub] [systemds] Baunsgaard opened a new pull request #1078: [MINOR] MKL native MM setNonZeros to number of cells

2020-10-10 Thread GitBox


Baunsgaard opened a new pull request #1078:
URL: https://github.com/apache/systemds/pull/1078


   This PR is sort of a question.
   
   Is it okay for our MKL to assume that the output matrix is fully dense with 
all values != 0? 
   
   If so, then we have a potential performance improvement.
   
   I have observed up to 30% in some cases (most likely due to inter-run 
variance) but around 10% on average. The reason is that the native call to 
matrix multiplication has to transfer the matrix out of and then back into Java, 
and afterwards it loops through all values to count zeros, forcing 
one more iteration through the matrix.







[GitHub] [systemds] Baunsgaard opened a new pull request #1077: [SYSTEMDS-2613-2614] Sparse & dense compressed Matrix Mult

2020-10-10 Thread GitBox


Baunsgaard opened a new pull request #1077:
URL: https://github.com/apache/systemds/pull/1077


   







[GitHub] [systemds] manushree635 opened a new pull request #1076: Update install.md

2020-10-10 Thread GitBox


manushree635 opened a new pull request #1076:
URL: https://github.com/apache/systemds/pull/1076


   







[GitHub] [systemds] Baunsgaard closed pull request #1040: Built-in function for bivariate statistics

2020-10-10 Thread GitBox


Baunsgaard closed pull request #1040:
URL: https://github.com/apache/systemds/pull/1040


   







[GitHub] [systemds] mboehm7 commented on pull request #1070: Rewriting ALS functions

2020-10-10 Thread GitBox


mboehm7 commented on pull request #1070:
URL: https://github.com/apache/systemds/pull/1070#issuecomment-706555293


   LGTM - thanks for the new builtin functions @gabrielaozegovic. This 
completes the AMLS project. 
   
   Just for the record, during the merge, I made the following modifications: 
(1) added an als function, (2) renamed als_cg and als_ds to alsCG and alsDS 
(for consistency with the lm builtin functions), (3) merged the tests, (4) fixed 
the test formatting (tabs over spaces in java code), (5) removed stats/explain 
from tests, (6) fixed the function input arguments (e.g., unnecessary format, 
reg for alsDS) and comments, (7) fixed the output of alsDS (the reason why it 
was failing was that the output R was not transposed because it was written to 
Rt instead of R), and (8) made some minor cleanups within the algorithms.







[GitHub] [systemds] asfgit closed pull request #1070: Rewriting ALS functions

2020-10-10 Thread GitBox


asfgit closed pull request #1070:
URL: https://github.com/apache/systemds/pull/1070


   







[GitHub] [systemds] mboehm7 commented on pull request #1078: [MINOR] MKL native MM setNonZeros to number of cells

2020-10-10 Thread GitBox


mboehm7 commented on pull request #1078:
URL: https://github.com/apache/systemds/pull/1078#issuecomment-706566376


   The current runtime/compiler requires the non-zeros to be exact or unknown 
(-1) - so we cannot simply set it to fully dense. This information could be 
mistakenly used (e.g., when rewriting `sum(X!=0)` to `nnz(X)`)
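
To illustrate why the metadata has to be exact or unknown, here is a minimal sketch. It assumes a simplified rewrite that answers `sum(X!=0)` directly from the nnz metadata whenever that metadata is not -1. `NnzRewriteSketch` and its methods are hypothetical and not SystemDS code.

```java
class NnzRewriteSketch {
    static final long UNKNOWN = -1;

    // Reference semantics of sum(X != 0): count non-zero cells by scanning.
    static long countNonZeros(double[][] x) {
        long count = 0;
        for (double[] row : x)
            for (double v : row)
                if (v != 0) count++;
        return count;
    }

    // Hypothetical rewritten plan: trust the nnz metadata when it is known,
    // and fall back to scanning only when it is unknown (-1).
    static long sumNotEqualZero(double[][] x, long nnzMeta) {
        return nnzMeta != UNKNOWN ? nnzMeta : countNonZeros(x);
    }

    public static void main(String[] args) {
        double[][] x = {{1, 0}, {0, 2}};
        long cells = (long) x.length * x[0].length;
        System.out.println("exact metadata:   " + sumNotEqualZero(x, countNonZeros(x)));
        System.out.println("unknown metadata: " + sumNotEqualZero(x, UNKNOWN));
        System.out.println("assumed dense:    " + sumNotEqualZero(x, cells));
    }
}
```

With exact or unknown metadata the result is 2 for this matrix. With nnz set to the number of cells, the rewritten plan returns 4, which is wrong.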







[GitHub] [systemds] asfgit closed pull request #1071: Lasso and PPCA

2020-10-10 Thread GitBox


asfgit closed pull request #1071:
URL: https://github.com/apache/systemds/pull/1071


   







[GitHub] [systemds] gabrielaozegovic commented on pull request #1070: Rewriting ALS functions

2020-10-09 Thread GitBox


gabrielaozegovic commented on pull request #1070:
URL: https://github.com/apache/systemds/pull/1070#issuecomment-706074537


   I pushed my latest changes to the branch.
   I did output verification, which I think should be fine since it's matrix 
factorization. 
   For some reason, one test is failing again, even though I did not change 
anything else.
   Thank you!







[GitHub] [systemds] mboehm7 commented on pull request #1078: [MINOR] MKL native MM setNonZeros to number of cells

2020-10-11 Thread GitBox


mboehm7 commented on pull request #1078:
URL: https://github.com/apache/systemds/pull/1078#issuecomment-706680533


   Well, for now I would recommend properly maintaining the non-zeros. The counting 
can only become a moderate overhead if the output is large compared to the compute 
(cubic compute and linear write vs. linear counting). What usually helps is to 
do the nnz maintenance at the end of each multi-threaded computation task, 
which gives locality.  
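
The task-local maintenance described above can be sketched as follows. This is a plain Java illustration with dense `double[][]` matrices and a fixed thread pool, not the SystemDS or MKL code path, and `TaskLocalNnzSketch` is a hypothetical name. Each task multiplies a block of rows and counts the non-zeros of a row right after writing it, so no separate pass over the full output is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TaskLocalNnzSketch {
    // Computes rows [rl, ru) of C = A * B and returns the non-zero count of those rows.
    // Assumes c is zero-initialized.
    static long multiplyRows(double[][] a, double[][] b, double[][] c, int rl, int ru) {
        int n = b.length, m = b[0].length;
        long nnz = 0;
        for (int i = rl; i < ru; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];
                if (aik == 0) continue;
                for (int j = 0; j < m; j++)
                    c[i][j] += aik * b[k][j];
            }
            // Count right after writing the row, while it is likely still in cache.
            for (int j = 0; j < m; j++)
                if (c[i][j] != 0) nnz++;
        }
        return nnz;
    }

    // Multi-threaded C = A * B; the exact nnz of C is the sum of the per-task counts.
    static long multiply(double[][] a, double[][] b, double[][] c, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            int rows = a.length;
            int blk = (rows + numThreads - 1) / numThreads;
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int rl = 0; rl < rows; rl += blk) {
                final int lo = rl, hi = Math.min(rl + blk, rows);
                tasks.add(() -> multiplyRows(a, b, c, lo, hi));
            }
            long nnz = 0;
            for (Future<Long> f : pool.invokeAll(tasks))
                nnz += f.get();
            return nnz;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[][] a = {{1, 0}, {0, 0}, {2, 3}};
        double[][] b = {{1, 2}, {0, 4}};
        double[][] c = new double[3][2];
        System.out.println("nnz(C) = " + multiply(a, b, c, 2));
    }
}
```

Tasks write disjoint row ranges, so the counts can be summed without synchronization beyond waiting on the futures.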







[GitHub] [systemds] OlgaOvcharenko commented on pull request #1040: Built-in function for bivariate statistics

2020-10-09 Thread GitBox


OlgaOvcharenko commented on pull request #1040:
URL: https://github.com/apache/systemds/pull/1040#issuecomment-706165102


   @Baunsgaard I also pushed the modified FederationUtils (with fed min, max, sum, 
mean), because it seems like the privacy tests are failing without these changes.







[GitHub] [systemds] phaniarnab closed pull request #1074: Postprocess Privacy Constraints Before Lineage Cache

2020-10-14 Thread GitBox


phaniarnab closed pull request #1074:
URL: https://github.com/apache/systemds/pull/1074


   







[GitHub] [systemds] Baunsgaard closed pull request #1038: [MINOR] Fixed a bug in the start script

2020-08-25 Thread GitBox


Baunsgaard closed pull request #1038:
URL: https://github.com/apache/systemds/pull/1038


   







[GitHub] [systemds] Shafaq-Siddiqi opened a new pull request #1034: [SYSTEMDS-2633] Builtin "dml_map()" will execute the given Java code …

2020-08-23 Thread GitBox


Shafaq-Siddiqi opened a new pull request #1034:
URL: https://github.com/apache/systemds/pull/1034


   …on a Frame.
   
   The built-in uses the Janino compiler for run-time code generation and 
compilation. It accepts a frame and a string (containing Java code) as input, 
executes the code in the string on the frame, and returns the output frame.







[GitHub] [systemds] OlgaOvcharenko opened a new pull request #1035: Univar and bivar builtins

2020-08-23 Thread GitBox


OlgaOvcharenko opened a new pull request #1035:
URL: https://github.com/apache/systemds/pull/1035


   New univar and bivar builtins. Also fixed federated min and max.







[GitHub] [systemds] OlgaOvcharenko closed pull request #1039: Builtin bivar

2020-08-26 Thread GitBox


OlgaOvcharenko closed pull request #1039:
URL: https://github.com/apache/systemds/pull/1039


   







[GitHub] [systemds] kev-inn opened a new pull request #1027: [SYSTEMDS-2558][SYSTEMDS-2554][SYSTEMDS-2561] Fed frame recode transform (decode) support

2020-08-18 Thread GitBox


kev-inn opened a new pull request #1027:
URL: https://github.com/apache/systemds/pull/1027


   Adds decode support for recode, pass-through and composite (containing only 
those two).







[GitHub] [systemds] asfgit closed pull request #1028: [SYSTEMDS-2600,2626] Fix federated backend request interference

2020-08-18 Thread GitBox


asfgit closed pull request #1028:
URL: https://github.com/apache/systemds/pull/1028


   







[GitHub] [systemds] mboehm7 opened a new pull request #1028: [SYSTEMDS-2600,2626] Fix federated backend request interference

2020-08-18 Thread GitBox


mboehm7 opened a new pull request #1028:
URL: https://github.com/apache/systemds/pull/1028


   This patch fixes two major issues of request interference from multiple
   coordinator threads.
   
   First, we now properly maintain separate execution contexts at the
   federated site for different request streams from parfor workers, which
   otherwise could interfere (e.g., on rmvar instructions for shared input
   variables).
   
   Second, even within a stream, federated requests (e.g., execute and
   cleanup) could overtake each other if there are no data dependencies
   or synchronization between them. We now added barriers for federated
   requests wherever this was necessary.
   
   Last, this patch also fixes unnecessary warning messages of the parfor
   optimizer, specifically in a setting with forced singlenode execution.







[GitHub] [systemds] Baunsgaard opened a new pull request #1029: cache specific for tests

2020-08-19 Thread GitBox


Baunsgaard opened a new pull request #1029:
URL: https://github.com/apache/systemds/pull/1029


   Trials for caching in tests using custom actions







[GitHub] [systemds] Baunsgaard opened a new pull request #1042: [SYSTEMDS-2645] Python API K-Means algorithm

2020-08-28 Thread GitBox


Baunsgaard opened a new pull request #1042:
URL: https://github.com/apache/systemds/pull/1042


   







[GitHub] [systemds] Baunsgaard opened a new pull request #1044: [SYSTEMDS-2647] Python API MultiLogReg Algorithm

2020-08-28 Thread GitBox


Baunsgaard opened a new pull request #1044:
URL: https://github.com/apache/systemds/pull/1044


   







[GitHub] [systemds] Baunsgaard merged pull request #1042: [SYSTEMDS-2645] Python API K-Means algorithm

2020-08-28 Thread GitBox


Baunsgaard merged pull request #1042:
URL: https://github.com/apache/systemds/pull/1042


   







[GitHub] [systemds] Baunsgaard closed pull request #1043: [SYSTEMDS-2646] Python API PCA algorithm

2020-08-28 Thread GitBox


Baunsgaard closed pull request #1043:
URL: https://github.com/apache/systemds/pull/1043


   







[GitHub] [systemds] Baunsgaard commented on pull request #1043: [SYSTEMDS-2646] Python API PCA algorithm

2020-08-28 Thread GitBox


Baunsgaard commented on pull request #1043:
URL: https://github.com/apache/systemds/pull/1043#issuecomment-682534142


   merged







[GitHub] [systemds] Baunsgaard opened a new pull request #1043: [SYSTEMDS-2646] Python API PCA algorithm

2020-08-28 Thread GitBox


Baunsgaard opened a new pull request #1043:
URL: https://github.com/apache/systemds/pull/1043


   







[GitHub] [systemds] Baunsgaard merged pull request #1044: [SYSTEMDS-2647] Python API MultiLogReg Algorithm

2020-08-28 Thread GitBox


Baunsgaard merged pull request #1044:
URL: https://github.com/apache/systemds/pull/1044


   







[GitHub] [systemds] mboehm7 commented on pull request #1041: [WIP][top-k slicing] current update for review

2020-08-28 Thread GitBox


mboehm7 commented on pull request #1041:
URL: https://github.com/apache/systemds/pull/1041#issuecomment-683141558


   LGTM - thanks @gilgenbergg for the update and tests, as well as 
incorporating the licenses.







[GitHub] [systemds] asfgit closed pull request #1031: [SYSTEMDS-2555] Federated transform dummycoding encoding decoding

2020-08-28 Thread GitBox


asfgit closed pull request #1031:
URL: https://github.com/apache/systemds/pull/1031


   






