[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-12-09 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r355813464
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/test/java/org/apache/flink/ml/common/utils/OutputColsHelperTest.java
 ##
 @@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+/**
+ * Unit test for OutputColsHelper.
+ */
+public class OutputColsHelperTest {
+
+   private TableSchema tableSchema = new TableSchema(
+   new String[]{"f0", "f1", "f2"},
+   new TypeInformation[]{
+   TypeInformation.of(String.class),
+   TypeInformation.of(Long.class),
+   TypeInformation.of(Integer.class)
+   }
+   );
+   private String[] reservedColNames = new String[]{"f0"};
+   private Row row = Row.of("a", 1L, 1);
+
+   @Test
+   public void testResultSchema() {
+   TableSchema expectSchema = new TableSchema(
+   new String[]{"f0", "f1", "f2", "res"},
+   new TypeInformation[]{
+   TypeInformation.of(String.class),
+   TypeInformation.of(Long.class),
+   TypeInformation.of(Integer.class),
+   TypeInformation.of(String.class)
+   }
+   );
+   OutputColsHelper helper = new OutputColsHelper(
+   tableSchema, "res",
+   TypeInformation.of(String.class)
+   );
+   Assert.assertEquals(helper.getResultSchema(), expectSchema);
 
 Review comment:
   @xuyang1706 according to the tradition of `junit`, the expected value should 
be the first in the `assertXXX` calls. could you please fix this? 
   See: http://junit.sourceforge.net/javadoc/org/junit/Assert.html


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-12-09 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r355579619
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Utils for merging input data with output data.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
+ *
+ * Output:
+ * 1)The result data schema. The result data is a combination of the preserved 
columns and the operator's
+ * output columns.
+ *
+ * Several rules are followed:
+ * 1) If reserved columns are not given, then all columns of input data is 
reserved.
+ * 2)The reserved columns are arranged ahead of the operator's output columns 
in the final output.
+ * 3) If some of the reserved column names overlap with those of operator's 
output columns, then the operator's
+ * output columns override the conflicting reserved columns.
+ * 4) The reserved columns in the result table preserve their orders as in the 
input table.
+ *
+ * For example, if we have input data schema of ["id":INT, "f1":FLOAT, 
"f2":DOUBLE], and the operator outputs
+ * a column "label" with type STRING, and we want to preserve the column "id", 
then we get the result
+ * schema of ["id":INT, "label":STRING].
+ *
+ * end user should not directly interact with this helper class. instead it 
will be indirectly used via concrete algorithms.
+ */
+public class OutputColsHelper implements Serializable {
+   private String[] inputColNames;
+   private TypeInformation[] inputColTypes;
 
 Review comment:
   Also added the type generic here `TypeInformation[]`. see 
walterddr@802fa4c


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-12-08 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r355205368
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Utils for merging input data with output data.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
+ *
+ * Output:
+ * 1)The result data schema. The result data is a combination of the preserved 
columns and the operator's
+ * output columns.
+ *
+ * Several rules are followed:
+ * 1) If reserved columns are not given, then all columns of input data is 
reserved.
+ * 2)The reserved columns are arranged ahead of the operator's output columns 
in the final output.
+ * 3) If some of the reserved column names overlap with those of operator's 
output columns, then the operator's
+ * output columns override the conflicting reserved columns.
+ * 4) The reserved columns in the result table preserve their orders as in the 
input table.
+ *
+ * For example, if we have input data schema of ["id":INT, "f1":FLOAT, 
"f2":DOUBLE], and the operator outputs
+ * a column "label" with type STRING, and we want to preserve the column "id", 
then we get the result
+ * schema of ["id":INT, "label":STRING].
+ *
+ * end user should not directly interact with this helper class. instead it 
will be indirectly used via concrete algorithms.
+ */
+public class OutputColsHelper implements Serializable {
+   private String[] inputColNames;
+   private TypeInformation[] inputColTypes;
+   private String[] outputColNames;
+   private TypeInformation[] outputColTypes;
+
+   /**
+* Column indices in the input data that would be forward to the result.
+*/
+   private int[] reservedCols;
+
+   /**
+* The positions of reserved columns in the result.
+*/
+   private int[] reservedColsPosInResult;
+
+   /**
+* The positions of output columns in the result.
+*/
+   private int[] outputColsPosInResult;
+
+   public OutputColsHelper(TableSchema inputSchema, String outputColName, 
TypeInformation outputColType) {
+   this(inputSchema, outputColName, outputColType, 
inputSchema.getFieldNames());
+   }
+
+   public OutputColsHelper(TableSchema inputSchema, String outputColName, 
TypeInformation outputColType,
+   String[] 
reservedColNames) {
+   this(inputSchema, new String[]{outputColName}, new 
TypeInformation[]{outputColType}, reservedColNames);
+   }
+
+   public OutputColsHelper(TableSchema inputSchema, String[] 
outputColNames, TypeInformation[] outputColTypes) {
+   this(inputSchema, outputColNames, outputColTypes, 
inputSchema.getFieldNames());
+   }
+
+   /**
+* The constructor.
+*
+* @param inputSchema  Schema of input data being predicted or 
transformed.
+* @param outputColNames   Output column names of the 
prediction/transformation operator.
+* @param outputColTypes   Output column types of the 
prediction/transformation operator.
+* @param reservedColNames Reserved column names, which is a subset of 
input data's column names that we want to preserve.
+*/
+   public OutputColsHelper(TableSchema inputSchema, String[] 
outputColNames, TypeInformation[] outputColTypes,
+ 

[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-12-08 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r355205395
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Utils for merging input data with output data.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
 
 Review comment:
   nit: list via ``: see 
https://github.com/walterddr/flink/commit/802fa4c6530d90b6322c7be548b0953f17cbf003


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-12-04 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r354100028
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema when doing prediction or transformation.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
+ *
+ * Output:
+ * 1)The result data schema. The result data is a combination of the preserved 
columns and the operator's
+ * output columns.
+ *
+ * Several rules are followed:
+ * 1) If reserved columns are not given, then all columns of input data is 
reserved.
+ * 2)The reserved columns are arranged ahead of the operator's output columns 
in the final output.
+ * 3) If some of the reserved column names overlap with those of operator's 
output columns, then the operator's
+ * output columns override the conflicting reserved columns.
+ *
+ * For example, if we have input data schema of ["id":INT, "f1":FLOAT, 
"f2":DOUBLE], and the operator outputs
+ * a column "label" with type STRING, and we want to preserve the column "id", 
then we get the output
+ * schema of ["id":INT, "label":STRING].
+ */
+public class OutputColsHelper implements Serializable {
 
 Review comment:
   In response to my previous comment. I would add a paragraph to make clear in 
the Javadoc that: end user should not directly interact with this helper class. 
instead it will be indirectly used via concrete algorithms. 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-12-04 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r354099798
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema when doing prediction or transformation.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
+ *
+ * Output:
+ * 1)The result data schema. The result data is a combination of the preserved 
columns and the operator's
+ * output columns.
+ *
+ * Several rules are followed:
+ * 1) If reserved columns are not given, then all columns of input data is 
reserved.
+ * 2)The reserved columns are arranged ahead of the operator's output columns 
in the final output.
+ * 3) If some of the reserved column names overlap with those of operator's 
output columns, then the operator's
 
 Review comment:
   I see your point. I still reserves my opinion. but I think one point you 
made was very crucial: since this is an internal tooling (e.g. end-user does 
not directly interact with this class) I am ok with this behavior. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-11-10 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r344502293
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema when doing prediction or transformation.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
+ *
+ * Output:
+ * 1)The result data schema. The result data is a combination of the preserved 
columns and the operator's
+ * output columns.
+ *
+ * Several rules are followed:
+ * 1) If reserved columns are not given, then all columns of input data is 
reserved.
+ * 2)The reserved columns are arranged ahead of the operator's output columns 
in the final output.
+ * 3) If some of the reserved column names overlap with those of operator's 
output columns, then the operator's
+ * output columns override the conflicting reserved columns.
+ *
+ * For example, if we have input data schema of ["id":INT, "f1":FLOAT, 
"f2":DOUBLE], and the operator outputs
+ * a column "label" with type STRING, and we want to preserve the column "id", 
then we get the output
+ * schema of ["id":INT, "label":STRING].
+ */
+public class OutputColsHelper implements Serializable {
 
 Review comment:
   I would suggest putting this as part of the Javadoc
   --> Unlike Scala Java has no module private modifier like `private[flink]`, 
but it is still good to point out IMO


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-11-10 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r344502198
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema when doing prediction or transformation.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
+ *
+ * Output:
+ * 1)The result data schema. The result data is a combination of the preserved 
columns and the operator's
+ * output columns.
+ *
+ * Several rules are followed:
+ * 1) If reserved columns are not given, then all columns of input data is 
reserved.
 
 Review comment:
   Please see my comments below:


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-11-10 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r344502174
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema when doing prediction or transformation.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
+ *
+ * Output:
+ * 1)The result data schema. The result data is a combination of the preserved 
columns and the operator's
+ * output columns.
+ *
+ * Several rules are followed:
+ * 1) If reserved columns are not given, then all columns of input data is 
reserved.
+ * 2)The reserved columns are arranged ahead of the operator's output columns 
in the final output.
+ * 3) If some of the reserved column names overlap with those of operator's 
output columns, then the operator's
 
 Review comment:
   replying to both comments above: the logic IMO assumes too much user 
behavior: the default reserving all columns in the input does seems convenient. 
but do you think adding a flag from the input argument (e.g. 
`reseverAllColumns`) makes more sense?
   
   As a framework I think the responsibility is to strictly track the final 
result column name uniqueness. By silently drop an input column that users who 
write a transformation might not have the full knowledge of seems to be an 
unnecessary responsibility to hand it over to the user.
   
   I would propose:
   1. by default reserves no input columns (this is also the standard behavior 
of a transform in dataset/datastream - makes it consistent with user 
expectation.
   2. create a `inputAsPassthrough` boolean flag when initializing 
`OutputColsHelper` so that users explicitly have to say "I want all input 
columns reserved"
   3. when `inputAsPassthrough` flag enables; we will take `reservedColumn...` 
arguments into consideration. 
   
   nit: I prefer the word `passthroughColumn...` over `reserve` makes more 
sense to me. It would've been nice if some native English speaker can provide 
more insights on this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-11-04 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r342379204
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema when doing prediction or transformation.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
+ *
+ * Output:
+ * 1)The result data schema. The result data is a combination of the preserved 
columns and the operator's
+ * output columns.
+ *
+ * Several rules are followed:
+ * 1) If reserved columns are not given, then all columns of input data is 
reserved.
 
 Review comment:
   I think this is actually the one default that confuses me: why if reserved 
columns are not given then preserves all input data? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-11-04 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r342379344
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema when doing prediction or transformation.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
+ *
+ * Output:
+ * 1)The result data schema. The result data is a combination of the preserved 
columns and the operator's
+ * output columns.
+ *
+ * Several rules are followed:
+ * 1) If reserved columns are not given, then all columns of input data is 
reserved.
+ * 2)The reserved columns are arranged ahead of the operator's output columns 
in the final output.
+ * 3) If some of the reserved column names overlap with those of operator's 
output columns, then the operator's
 
 Review comment:
   I think we should throw an exception when output column names and the 
user-defined reserved column names have conflict (ultimately both are defined 
by user, this shouldn't occur on the users' perspective)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-11-04 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r342377733
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema when doing prediction or transformation.
+ *
+ * Input:
+ * 1) Schema of input data being predicted or transformed.
+ * 2) Output column names of the prediction/transformation operator.
+ * 3) Output column types of the prediction/transformation operator.
+ * 4) Reserved column names, which is a subset of input data's column names 
that we want to preserve.
+ *
+ * Output:
+ * 1)The result data schema. The result data is a combination of the preserved 
columns and the operator's
+ * output columns.
+ *
+ * Several rules are followed:
+ * 1) If reserved columns are not given, then all columns of input data is 
reserved.
+ * 2)The reserved columns are arranged ahead of the operator's output columns 
in the final output.
+ * 3) If some of the reserved column names overlap with those of operator's 
output columns, then the operator's
+ * output columns override the conflicting reserved columns.
+ *
+ * For example, if we have input data schema of ["id":INT, "f1":FLOAT, 
"f2":DOUBLE], and the operator outputs
+ * a column "label" with type STRING, and we want to preserve the column "id", 
then we get the output
+ * schema of ["id":INT, "label":STRING].
+ */
+public class OutputColsHelper implements Serializable {
 
 Review comment:
   based on the usage #9413 and #9523 . this can be private to flink, yes? (I 
do not foresee user directly interacting with this Helper function) 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-10-07 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r332207840
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema.
+ *
+ * Need following information:
+ * 1) data schema
+ * 2) output column names
+ * 3) output column types
+ * 4) keep column names
+ *
+ * The following roles are followed:
+ * 1)If reserved columns is null, then reserve all columns from the origin 
dataSet.
+ * 2)If some of the reserved column names are the same as output column names, 
then they are
 
 Review comment:
   hmm in this case the javadoc is confusing to me still (especially with the 
3rd line). Let's use an example:
   assume the output row is `["a":1, "b":2, "c":true, "d":3.4]`, is my 
assumptions below correct?

   there's only one option of the `dataSchema`:
   ```
   dataSchema: ["a":INT, "b":INT, "c":BOOL, "d":FLOAT]
   ```
   2. `outputColumnName` should be a subset of `{"a", "b", "c", "d"}`, but 
preserves the ordering defined.
   3. `outputColumnType` should match exactly the size of `outputColumnName` 
and should match the exact type at the corresponding index of `dataSchema` - 
(if point3 is true, why do we need outputColumnType?)
   4. `reserveColumnName` should be a subset of `{"a", "b", "c", "d"}`, but 
ordering follows how `dataSchema` defines.
   5. I am assuming you will output the result by first putting 
`reservedColumn`s then `outputColumn`s. (if point 2,4,5 are true, won't you 
have duplicate naming in your final output?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-10-07 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r332207840
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema.
+ *
+ * Need following information:
+ * 1) data schema
+ * 2) output column names
+ * 3) output column types
+ * 4) keep column names
+ *
+ * The following roles are followed:
+ * 1)If reserved columns is null, then reserve all columns from the origin 
dataSet.
+ * 2)If some of the reserved column names are the same as output column names, 
then they are
 
 Review comment:
   hmm in this case the javadoc is confusing to me still (especially with point 
#3). Let's use an example:
   assume the output row is `["a":1, "b":2, "c":true, "d":3.4]`, is my 
assumptions below correct?

   there's only one option of the `dataSchema`:
   ```
   dataSchema: ["a":INT, "b":INT, "c":BOOL, "d":FLOAT]
   ```
   2. `outputColumnName` should be a subset of `{"a", "b", "c", "d"}`, but 
preserves the ordering defined.
   3. `outputColumnType` should match exactly the size of `outputColumnName` 
and should match the exact type at the corresponding index of `dataSchema` - 
(if point3 is true, why do we need outputColumnType?)
   4. `reserveColumnName` should be a subset of `{"a", "b", "c", "d"}`, but 
ordering follows how `dataSchema` defines.
   5. I am assuming you will output the result by first putting 
`reservedColumn`s then `outputColumn`s. (if point 2,4,5 are true, won't you 
have duplicate naming in your final output?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-09-29 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r329372704
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema.
+ *
+ * Need following information:
+ * 1) data schema
+ * 2) output column names
+ * 3) output column types
+ * 4) keep column names
+ *
+ * The following roles are followed:
+ * 1)If reserved columns is null, then reserve all columns from the origin 
dataSet.
+ * 2)If some of the reserved column names are the same as output column names, 
then they are
 
 Review comment:
   does this means you select a particular subset of columns to output? (and if 
null, output all from `outputColumnName` ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-09-29 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r329372857
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema.
+ *
+ * Need following information:
+ * 1) data schema
+ * 2) output column names
+ * 3) output column types
+ * 4) keep column names
+ *
+ * The following roles are followed:
+ * 1)If reserved columns is null, then reserve all columns from the origin 
dataSet.
+ * 2)If some of the reserved column names are the same as output column names, 
then they are
+ * replaced by the output at value and type, but with the kept column names 
have their order kept.
+ * 3)[result columns] = ([reserve columns] subtract [output columns]) + 
[output columns]
+ *
+ */
+public class OutputColsHelper implements Serializable {
+   private transient String[] dataColNames;
+   private transient TypeInformation[] dataColTypes;
+   private transient String[] outputColNames;
+   private transient TypeInformation[] outputColTypes;
+
+   private int resultLength;
+   private int[] reservedColIndices;
+   private int[] reservedToResultIndices;
+   private int[] outputToResultIndices;
+
+   public OutputColsHelper(TableSchema dataSchema, String outputColName, 
TypeInformation outputColType) {
+   this(dataSchema, outputColName, outputColType, null);
+   }
+
+   public OutputColsHelper(TableSchema dataSchema, String outputColName, 
TypeInformation outputColType,
+   String[] 
reservedColNames) {
+   this(dataSchema, new String[] {outputColName}, new 
TypeInformation[] {outputColType}, reservedColNames);
+   }
+
+   public OutputColsHelper(TableSchema dataSchema, String[] 
outputColNames, TypeInformation[] outputColTypes) {
+   this(dataSchema, outputColNames, outputColTypes, null);
+   }
+
+   public OutputColsHelper(TableSchema dataSchema, String[] 
outputColNames, TypeInformation[] outputColTypes,
+   String[] 
reservedColNames) {
+   this.dataColNames = dataSchema.getFieldNames();
+   this.dataColTypes = dataSchema.getFieldTypes();
+   this.outputColNames = outputColNames;
+   this.outputColTypes = outputColTypes;
+
+   HashSet  toReservedCols = new HashSet <>(
+   Arrays.asList(
+   reservedColNames == null ? this.dataColNames : 
reservedColNames
+   )
+   );
+
+   ArrayList  reservedColIndices = new ArrayList 
<>(toReservedCols.size());
+   ArrayList  reservedColToResultIndex = new ArrayList 
<>(toReservedCols.size());
+   outputToResultIndices = new int[outputColNames.length];
+   Arrays.fill(outputToResultIndices, -1);
+   int index = 0;
+   for (int i = 0; i < dataColNames.length; i++) {
+   int key = ArrayUtils.indexOf(outputColNames, 
dataColNames[i]);
+   if (key >= 0) {
+   outputToResultIndices[key] = index++;
+   continue;
+   }
+   if (toReservedCols.contains(dataColNames[i])) {
+   reservedColIndices.add(i);
+   reservedColToResultIndex.add(index++);
+   }
+   }
+   for (int i = 0; i < outputToResultIndices.length; i++) {
+   

[GitHub] [flink] walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an util class to build result row and generate …

2019-09-29 Thread GitBox
walterddr commented on a change in pull request #9355: [FLINK-13577][ml] Add an 
util class to build result row and generate …
URL: https://github.com/apache/flink/pull/9355#discussion_r329372657
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/common/utils/OutputColsHelper.java
 ##
 @@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.utils;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+
+/**
+ * Util for generating output schema.
+ *
+ * Need following information:
+ * 1) data schema
+ * 2) output column names
+ * 3) output column types
+ * 4) keep column names
 
 Review comment:
   this should either be `reserved column names` or the following java doc 
should change `reserved column` to `keep column`. 
   
   also if we are using the word `keep` I think it is better to use `kept`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services