[GitHub] [flink-table-store] JingsongLi commented on a change in pull request #67: [FLINK-26911] Introduce parallelism setter for table store

2022-04-01 Thread GitBox


JingsongLi commented on a change in pull request #67:
URL: https://github.com/apache/flink-table-store/pull/67#discussion_r841016700



##
File path: 
flink-table-store-connector/src/test/java/org/apache/flink/table/store/connector/ReadWriteTableITCase.java
##
@@ -1713,4 +1748,74 @@ private static StreamExecutionEnvironment 
buildBatchEnv() {
 env.setParallelism(2);
 return env;
 }
+
+@Test
+public void testSourceParallelism() throws Exception {
+String managedTable = createSourceAndManagedTable(false, false, false, 
false, false).f1;
+
+// without hint
+String query = prepareSimpleSelectQuery(managedTable, 
Collections.emptyMap());
+assertThat(sourceParallelism(query)).isEqualTo(env.getParallelism());
+
+// with hint
+query =
+prepareSimpleSelectQuery(
+managedTable, 
Collections.singletonMap(SCAN_PARALLELISM.key(), "66"));
+System.out.println(query);
+assertThat(sourceParallelism(query)).isEqualTo(66);
+}
+
+private int sourceParallelism(String sql) {
+DataStream stream = tEnv.toChangelogStream(tEnv.sqlQuery(sql));
+return stream.getParallelism();
+}
+
+@Test
+public void testSinkParallelism() {
+testSinkParallelism(null, env.getParallelism());
+testSinkParallelism(23, 23);
+}
+

Review comment:
   this is for testing both default and hint




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-table-store] JingsongLi commented on a change in pull request #67: [FLINK-26911] Introduce parallelism setter for table store

2022-04-01 Thread GitBox


JingsongLi commented on a change in pull request #67:
URL: https://github.com/apache/flink-table-store/pull/67#discussion_r841016680



##
File path: 
flink-table-store-connector/src/test/java/org/apache/flink/table/store/connector/ReadWriteTableITCase.java
##
@@ -1713,4 +1748,74 @@ private static StreamExecutionEnvironment 
buildBatchEnv() {
 env.setParallelism(2);
 return env;
 }
+
+@Test
+public void testSourceParallelism() throws Exception {

Review comment:
   test both default and hint




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] ZhangChaoming commented on pull request #19325: [FLINK-26943][table] Add DATE_ADD supported in SQL & Table API

2022-04-01 Thread GitBox


ZhangChaoming commented on pull request #19325:
URL: https://github.com/apache/flink/pull/19325#issuecomment-1086557594


   @slinkydeveloper OK, I try use new stack.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-table-store] JingsongLi commented on a change in pull request #67: [FLINK-26911] Introduce parallelism setter for table store

2022-04-01 Thread GitBox


JingsongLi commented on a change in pull request #67:
URL: https://github.com/apache/flink-table-store/pull/67#discussion_r841016341



##
File path: 
flink-table-store-connector/src/main/java/org/apache/flink/table/store/connector/source/TableStoreSource.java
##
@@ -130,15 +135,29 @@ public DataStructureConverter 
createDataStructureConverter(
 },
 projectFields);
 }
-TableStore.SourceBuilder builder =
+
+TableStore.SourceBuilder sourceBuilder =
 tableStore
 .sourceBuilder()
 .withContinuousMode(streaming)
 .withLogSourceProvider(logSourceProvider)
 .withProjection(projectFields)
 
.withPartitionPredicate(PredicateConverter.convert(partitionFilters))
-
.withFieldPredicate(PredicateConverter.convert(fieldFilters));
-return SourceProvider.of(builder.build());
+
.withFieldPredicate(PredicateConverter.convert(fieldFilters))
+
.withParallelism(tableStore.options().get(SCAN_PARALLELISM));
+
+return new DataStreamScanProvider() {
+@Override
+public DataStream produceDataStream(
+ProviderContext providerContext, 
StreamExecutionEnvironment env) {
+return sourceBuilder.withEnv(env).build();
+}
+
+@Override
+public boolean isBounded() {
+return !streaming;

Review comment:
   hybrid mode must be streaming.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] ZhangChaoming commented on a change in pull request #18386: [FLINK-25684][table] Support enhanced `show databases` syntax

2022-04-01 Thread GitBox


ZhangChaoming commented on a change in pull request #18386:
URL: https://github.com/apache/flink/pull/18386#discussion_r841015721



##
File path: 
flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/api/TableEnvironmentTest.scala
##
@@ -656,13 +656,48 @@ class TableEnvironmentTest {
   def testExecuteSqlWithShowDatabases(): Unit = {
 val tableResult1 = tableEnv.executeSql("CREATE DATABASE db1 COMMENT 
'db1_comment'")
 assertEquals(ResultKind.SUCCESS, tableResult1.getResultKind)
-val tableResult2 = tableEnv.executeSql("SHOW DATABASES")
+val tableResult2 = tableEnv.executeSql("CREATE DATABASE db2 COMMENT 
'db2_comment'")
+assertEquals(ResultKind.SUCCESS, tableResult2.getResultKind)

Review comment:
   @slinkydeveloper I can not find a suitable method to comparing 
enumeration variables by using `org.assertj.core.api.Assertions.assertThat`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] lindong28 commented on a change in pull request #73: [FLINK-26626] Add Transformer and Estimator for StandardScaler

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #73:
URL: https://github.com/apache/flink-ml/pull/73#discussion_r841012351



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/standardscaler/StandardScalerParams.java
##
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature.standardscaler;
+
+import org.apache.flink.ml.common.param.HasFeaturesCol;
+import org.apache.flink.ml.common.param.HasPredictionCol;
+import org.apache.flink.ml.param.BooleanParam;
+import org.apache.flink.ml.param.Param;
+
+/**
+ * Params for {@link StandardScaler}.
+ *
+ * @param  The class type of this instance.
+ */
+public interface StandardScalerParams extends HasFeaturesCol, 
HasPredictionCol {

Review comment:
   Spark's `StandardScaler` uses `inputCol` and `outputCol` as its 
parameters. Would it be better to follow the same approach here?
   
   Note that this `StandardScaler` is used to transform features rather than 
making predictions. Thus it does not seem intuitive to name its output as 
`predictionCol`. We can submit a followup PR to fix this problem.
   
   `HasFeaturesCol` should probably be renamed to `HasFeatureCol` as it 
represents just one column.
   
   And `HasInputCol`  is more consistent with `HasOutputCol` than 
`HasFeatureCol` does.

##
File path: 
flink-ml-lib/src/test/java/org/apache/flink/ml/feature/StandardScalerTest.java
##
@@ -0,0 +1,279 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature;
+
+import org.apache.flink.api.common.restartstrategy.RestartStrategies;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.ml.feature.standardscaler.StandardScaler;
+import org.apache.flink.ml.feature.standardscaler.StandardScalerModel;
+import org.apache.flink.ml.feature.standardscaler.StandardScalerModelData;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.Vector;
+import org.apache.flink.ml.linalg.Vectors;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.ml.util.StageTestUtils;
+import 
org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.test.util.AbstractTestBase;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.collections.IteratorUtils;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
+/** Tests {@link StandardScaler} and {@link StandardScalerModel}. */
+public class StandardScalerTest extends AbstractTestBase {
+@Rule public final TemporaryFolder tempFolder = new TemporaryFolder();
+private StreamExecutionEnvironment env;
+private StreamTableEnvironment tEnv;
+private Table denseTable;
+
+private final List denseInput =
+Arrays.asList(
+Row.of(Vectors.dense(-2.5, 9, 1)),
+Row.of(Vectors.dense(1.4, -5, 1)),
+Row.of(Vectors.dense(2, -1, 

[jira] [Commented] (FLINK-25738) Translate/fix translations for "FileSystem" connector page of "Connectors > DataStream Connectors"

2022-04-01 Thread RocMarshal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516222#comment-17516222
 ] 

RocMarshal commented on FLINK-25738:


Hi [~martijnvisser] , I notice that the related PRs were completed. Would we 
close the ticket now?

> Translate/fix translations for "FileSystem" connector page of "Connectors > 
> DataStream Connectors" 
> ---
>
> Key: FLINK-25738
> URL: https://issues.apache.org/jira/browse/FLINK-25738
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Martijn Visser
>Assignee: RocMarshal
>Priority: Major
>  Labels: chinese-translation
>
> After the merge of https://github.com/apache/flink/pull/18288 to resolve 
> https://issues.apache.org/jira/browse/FLINK-20188 multiple pages needs to be 
> translated or changed documentation needs to be reviewed, translated and 
> corrected where possible.
> It involves the following pages from the documentation:
> * docs/content.zh/docs/connectors/datastream/filesystem.md (This has a 
> partial Chinese translation but since it was a complete overhaul, I've copied 
> the English text in)
> * docs/content.zh/docs/connectors/datastream/formats/avro.md
> * docs/content.zh/docs/connectors/datastream/formats/azure_table_storage.md
> * docs/content.zh/docs/connectors/datastream/formats/hadoop.md
> * docs/content.zh/docs/connectors/datastream/formats/mongodb.md
> * docs/content.zh/docs/connectors/datastream/formats/overview.md
> * docs/content.zh/docs/connectors/datastream/formats/parquet.md
> * docs/content.zh/docs/connectors/datastream/formats/text_files.md
> * docs/content.zh/docs/connectors/table/filesystem.md (This has a partial 
> Chinese translation but since it was a complete overhaul, I've copied the 
> English text in)
> * docs/content.zh/docs/deployment/filesystems/s3.md (Just needs a check, it 
> should only be link updates)
> * docs/content.zh/docs/dev/datastream/execution_mode.md (Just needs a check, 
> it should only be link updates)
> * 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27001) Support to specify the resource of the operator

2022-04-01 Thread Aitozi (Jira)
Aitozi created FLINK-27001:
--

 Summary: Support to specify the resource of the operator 
 Key: FLINK-27001
 URL: https://issues.apache.org/jira/browse/FLINK-27001
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Aitozi


Supporting to specify the operator resource requirements and limits



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27000) Support to set JVM args for operator

2022-04-01 Thread Aitozi (Jira)
Aitozi created FLINK-27000:
--

 Summary: Support to set JVM args for operator
 Key: FLINK-27000
 URL: https://issues.apache.org/jira/browse/FLINK-27000
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Aitozi


In production we often need to set the JVM option to operator



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-26982) strike a balance between reuse the same RelNode and project/filter/limit/partition push down

2022-04-01 Thread zoucao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516221#comment-17516221
 ] 

zoucao edited comment on FLINK-26982 at 4/2/22 5:25 AM:


I think we can optimize it in 2 ways:
1). add a new method in `Supports project/filter/limit/partition PushDown 
interface`, which is called `boolean enablePushDown()`, then add a new config 
`source.enforce-reuse` whose default value is false, if users want to reuse the 
source relNode, they can invoke the source table by add hints /*+ 
options('source.enforce-reuse'='true') */. In this situation, the method 
enablePushDown will return false to ensure the source table can be reused.

2). scan all the source relNode and calculate the number of the same source 
relNode before doing optimization, if the number is more than the threshold 
specified by users, then we do not push project/filter/limit/partition down to 
ensure reuse.

cc [~jark], [~godfreyhe], What do you think about this?


was (Author: zoucao):
I think we can optimize it in 2 ways:
1). add a new method in Supports project/filter/limit/partition PushDown 
interface, which is called `boolean enablePushDown`, then add a new config 
`source.enforce-reuse` whose default value is false, if users want to reuse the 
source relNode, they can invoke the source table by add hints /*+ 
options('source.enforce-reuse'='true') */. In this situation, the method 
enablePushDown will return false to ensure the source table can be reused.

2). scan all the source relNode and calculate the number of the same source 
relNode before doing optimization, if the number is more than the threshold 
specified by users, then we do not push project/filter/limit/partition down to 
ensure reuse.

cc [~jark], [~godfreyhe], What do you think about this?

>  strike a balance between reuse the same RelNode and 
> project/filter/limit/partition push down
> -
>
> Key: FLINK-26982
> URL: https://issues.apache.org/jira/browse/FLINK-26982
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Reporter: zoucao
>Priority: Major
>
> Now, Flink has effective reuse logic to reuse the same RelNode and subplan, 
> but it will lose efficacy in some situations, like 
> project/filter/limit/partition push down, if one of them is enabled, the new 
> source is not the same as old one, such that the new source can not be reused 
> anymore. 
> For some complicated SQL, many views will be created from the same source 
> table, and the scan RelNode can not be reused, such that many of the same 
> threads about reading source data will be created in one task, which will 
> cause the memory problem and sometimes will cause reading amplification.
> Should we do something to enforce reusing some specific relNode decided by 
> users themselves?
> The following SQL shows the situation proposed above.
> {code:java}
> create table fs(
> a int,
> b string,
> c  bigint
> ) PARTITIONED by ( c )with (
> 'connector' = 'filesystem',
> 'format' = 'csv',
> 'path' = 'file:///tmp/test'
> );
> select * from
>(select * from fs limit 1)
> union all
>(select * from fs where a = 2)
> union all
>(select 1, b, c from fs)
> union all
>(select 1, b, c from fs where c = 1)
> {code}
> == Optimized Execution Plan ==
> {code:java}
> Union(all=[true], union=[a, b, c])
> :- Union(all=[true], union=[a, b, c])
> :  :- Union(all=[true], union=[a, b, c])
> :  :  :- Limit(offset=[0], fetch=[1])
> :  :  :  +- Exchange(distribution=[single])
> :  :  : +- TableSourceScan(table=[[default_catalog, default_database, fs, 
> limit=[1]]], fields=[a, b, c])
> :  :  +- Calc(select=[CAST(2 AS INTEGER) AS a, b, c], where=[(a = 2)])
> :  : +- TableSourceScan(table=[[default_catalog, default_database, fs, 
> filter=[=(a, 2)]]], fields=[a, b, c])
> :  +- Calc(select=[CAST(1 AS INTEGER) AS EXPR$0, b, c])
> : +- TableSourceScan(table=[[default_catalog, default_database, fs, 
> project=[b, c], metadata=[]]], fields=[b, c])
> +- Calc(select=[CAST(1 AS INTEGER) AS EXPR$0, b, CAST(1 AS BIGINT) AS c])
>+- TableSourceScan(table=[[default_catalog, default_database, fs, 
> partitions=[{c=1}], project=[b], metadata=[]]], fields=[b])
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26982) strike a balance between reuse the same RelNode and project/filter/limit/partition push down

2022-04-01 Thread zoucao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516221#comment-17516221
 ] 

zoucao commented on FLINK-26982:


I think we can optimize it in 2 ways:
1). add a new method in Supports project/filter/limit/partition PushDown 
interface, which is called `boolean enablePushDown`, then add a new config 
`source.enforce-reuse` whose default value is false, if users want to reuse the 
source relNode, they can invoke the source table by add hints /*+ 
options('source.enforce-reuse'='true') */. In this situation, the method 
enablePushDown will return false to ensure the source table can be reused.

2). scan all the source relNode and calculate the number of the same source 
relNode before doing optimization, if the number is more than the threshold 
specified by users, then we do not push project/filter/limit/partition down to 
ensure reuse.

cc [~jark], [~godfreyhe], What do you think about this?

>  strike a balance between reuse the same RelNode and 
> project/filter/limit/partition push down
> -
>
> Key: FLINK-26982
> URL: https://issues.apache.org/jira/browse/FLINK-26982
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Reporter: zoucao
>Priority: Major
>
> Now, Flink has effective reuse logic to reuse the same RelNode and subplan, 
> but it will lose efficacy in some situations, like 
> project/filter/limit/partition push down, if one of them is enabled, the new 
> source is not the same as old one, such that the new source can not be reused 
> anymore. 
> For some complicated SQL, many views will be created from the same source 
> table, and the scan RelNode can not be reused, such that many of the same 
> threads about reading source data will be created in one task, which will 
> cause the memory problem and sometimes will cause reading amplification.
> Should we do something to enforce reusing some specific relNode decided by 
> users themselves?
> The following SQL shows the situation proposed above.
> {code:java}
> create table fs(
> a int,
> b string,
> c  bigint
> ) PARTITIONED by ( c )with (
> 'connector' = 'filesystem',
> 'format' = 'csv',
> 'path' = 'file:///tmp/test'
> );
> select * from
>(select * from fs limit 1)
> union all
>(select * from fs where a = 2)
> union all
>(select 1, b, c from fs)
> union all
>(select 1, b, c from fs where c = 1)
> {code}
> == Optimized Execution Plan ==
> {code:java}
> Union(all=[true], union=[a, b, c])
> :- Union(all=[true], union=[a, b, c])
> :  :- Union(all=[true], union=[a, b, c])
> :  :  :- Limit(offset=[0], fetch=[1])
> :  :  :  +- Exchange(distribution=[single])
> :  :  : +- TableSourceScan(table=[[default_catalog, default_database, fs, 
> limit=[1]]], fields=[a, b, c])
> :  :  +- Calc(select=[CAST(2 AS INTEGER) AS a, b, c], where=[(a = 2)])
> :  : +- TableSourceScan(table=[[default_catalog, default_database, fs, 
> filter=[=(a, 2)]]], fields=[a, b, c])
> :  +- Calc(select=[CAST(1 AS INTEGER) AS EXPR$0, b, c])
> : +- TableSourceScan(table=[[default_catalog, default_database, fs, 
> project=[b, c], metadata=[]]], fields=[b, c])
> +- Calc(select=[CAST(1 AS INTEGER) AS EXPR$0, b, CAST(1 AS BIGINT) AS c])
>+- TableSourceScan(table=[[default_catalog, default_database, fs, 
> partitions=[{c=1}], project=[b], metadata=[]]], fields=[b])
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-20808) Remove redundant checkstyle rules

2022-04-01 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-20808:
---
Attachment: (was: image-2022-04-02-12-46-11-005.png)

> Remove redundant checkstyle rules
> -
>
> Key: FLINK-20808
> URL: https://issues.apache.org/jira/browse/FLINK-20808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: image-2022-04-02-12-46-28-065.png
>
>
> There are probably a few checkstyle rules that are now enforced by spotless, 
> and we could remove these to clarify the responsibilities of each tool.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26999) Introduce ClickHouse Connector

2022-04-01 Thread RocMarshal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516218#comment-17516218
 ] 

RocMarshal commented on FLINK-26999:


[~monster#12]  Thanks for the operation.


Recently, the community  try to migrate some related connector to external repo.

Before the ES repo migration, we would consider not opening this ticket for the 
time being, because it needs to be discussed. You can discuss it on the flip 
page. 

 I will open this flip discussion and copy it to you later.

Any idea is appreciated.

cc [~MartijnVisser] 

> Introduce ClickHouse Connector
> --
>
> Key: FLINK-26999
> URL: https://issues.apache.org/jira/browse/FLINK-26999
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Common
>Affects Versions: 1.15.0
>Reporter: ZhuoYu Chen
>Priority: Major
>
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-202%3A+Introduce+ClickHouse+Connector]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19335:
URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086


   
   ## CI report:
   
   * d42501cf46bfae3cd30b2927aa4ce23fffd0d81f Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34147)
 
   * f5f269e5c65f55d2d132677f3ce23c148c00bcf4 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34149)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19335:
URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086


   
   ## CI report:
   
   * d42501cf46bfae3cd30b2927aa4ce23fffd0d81f Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34147)
 
   * f5f269e5c65f55d2d132677f3ce23c148c00bcf4 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20808) Remove redundant checkstyle rules

2022-04-01 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516216#comment-17516216
 ] 

Aitozi commented on FLINK-20808:


No need to answer, after some search, I found it can be solved by set the IDE 
line separator to {{LF :)}}

!image-2022-04-02-12-46-28-065.png|width=354,height=85!

> Remove redundant checkstyle rules
> -
>
> Key: FLINK-20808
> URL: https://issues.apache.org/jira/browse/FLINK-20808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: image-2022-04-02-12-46-11-005.png, 
> image-2022-04-02-12-46-28-065.png
>
>
> There are probably a few checkstyle rules that are now enforced by spotless, 
> and we could remove these to clarify the responsibilities of each tool.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19335:
URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086


   
   ## CI report:
   
   * d42501cf46bfae3cd30b2927aa4ce23fffd0d81f Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34147)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-20808) Remove redundant checkstyle rules

2022-04-01 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-20808:
---
Attachment: image-2022-04-02-12-46-28-065.png

> Remove redundant checkstyle rules
> -
>
> Key: FLINK-20808
> URL: https://issues.apache.org/jira/browse/FLINK-20808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: image-2022-04-02-12-46-11-005.png, 
> image-2022-04-02-12-46-28-065.png
>
>
> There are probably a few checkstyle rules that are now enforced by spotless, 
> and we could remove these to clarify the responsibilities of each tool.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-20808) Remove redundant checkstyle rules

2022-04-01 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-20808:
---
Attachment: image-2022-04-02-12-46-11-005.png

> Remove redundant checkstyle rules
> -
>
> Key: FLINK-20808
> URL: https://issues.apache.org/jira/browse/FLINK-20808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: image-2022-04-02-12-46-11-005.png, 
> image-2022-04-02-12-46-28-065.png
>
>
> There are probably a few checkstyle rules that are now enforced by spotless, 
> and we could remove these to clarify the responsibilities of each tool.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19335:
URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086


   
   ## CI report:
   
   * d42501cf46bfae3cd30b2927aa4ce23fffd0d81f Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34147)
 
   * f5f269e5c65f55d2d132677f3ce23c148c00bcf4 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-20808) Remove redundant checkstyle rules

2022-04-01 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516211#comment-17516211
 ] 

Aitozi commented on FLINK-20808:


Hi [~chesnay]  sorry to bother you here, I run into a case that: I follow the 
development doc to use the save action and google-java-format to automatically 
format the code. But the formatted code can not pass the checksytyle rule 
[NewlineAtEndOfFile] But I check the unpassed code,It has the end line 
actually. Is there some bug for checkstyle or I miss some configuration for 
development?

> Remove redundant checkstyle rules
> -
>
> Key: FLINK-20808
> URL: https://issues.apache.org/jira/browse/FLINK-20808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> There are probably a few checkstyle rules that are now enforced by spotless, 
> and we could remove these to clarify the responsibilities of each tool.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-table-store] LadyForest commented on a change in pull request #67: [FLINK-26911] Introduce parallelism setter for table store

2022-04-01 Thread GitBox


LadyForest commented on a change in pull request #67:
URL: https://github.com/apache/flink-table-store/pull/67#discussion_r841007101



##
File path: 
flink-table-store-connector/src/test/java/org/apache/flink/table/store/connector/ReadWriteTableITCase.java
##
@@ -1713,4 +1748,74 @@ private static StreamExecutionEnvironment 
buildBatchEnv() {
 env.setParallelism(2);
 return env;
 }
+
+@Test
+public void testSourceParallelism() throws Exception {
+String managedTable = createSourceAndManagedTable(false, false, false, 
false, false).f1;
+
+// without hint
+String query = prepareSimpleSelectQuery(managedTable, 
Collections.emptyMap());
+assertThat(sourceParallelism(query)).isEqualTo(env.getParallelism());
+
+// with hint
+query =
+prepareSimpleSelectQuery(
+managedTable, 
Collections.singletonMap(SCAN_PARALLELISM.key(), "66"));
+System.out.println(query);
+assertThat(sourceParallelism(query)).isEqualTo(66);
+}
+
+private int sourceParallelism(String sql) {
+DataStream stream = tEnv.toChangelogStream(tEnv.sqlQuery(sql));
+return stream.getParallelism();
+}
+
+@Test
+public void testSinkParallelism() {
+testSinkParallelism(null, env.getParallelism());
+testSinkParallelism(23, 23);
+}
+

Review comment:
   Nit: group test methods and util methods together?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #19333: [FLINK-26190][python] Remove getTableConfig from ExecNodeConfiguration

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19333:
URL: https://github.com/apache/flink/pull/19333#issuecomment-1086447112


   
   ## CI report:
   
   * 0ae998978c02b08d84eb814e3c9fd95b15883f3b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34141)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19335:
URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086


   
   ## CI report:
   
   * d42501cf46bfae3cd30b2927aa4ce23fffd0d81f Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34147)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19335:
URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086


   
   ## CI report:
   
   * d42501cf46bfae3cd30b2927aa4ce23fffd0d81f Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34147)
 
   * f5f269e5c65f55d2d132677f3ce23c148c00bcf4 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19335:
URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086


   
   ## CI report:
   
   * d42501cf46bfae3cd30b2927aa4ce23fffd0d81f Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34147)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #19329: [FLINK-22318][table] Support RENAME column name for ALTER TABLE statement

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19329:
URL: https://github.com/apache/flink/pull/19329#issuecomment-1085879092


   
   ## CI report:
   
   * ae35684951a5222b165af15c667c66ac09076958 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34140)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-table-store] LadyForest commented on a change in pull request #67: [FLINK-26911] Introduce parallelism setter for table store

2022-04-01 Thread GitBox


LadyForest commented on a change in pull request #67:
URL: https://github.com/apache/flink-table-store/pull/67#discussion_r841006084



##
File path: 
flink-table-store-connector/src/test/java/org/apache/flink/table/store/connector/ReadWriteTableITCase.java
##
@@ -1713,4 +1748,74 @@ private static StreamExecutionEnvironment 
buildBatchEnv() {
 env.setParallelism(2);
 return env;
 }
+
+@Test
+public void testSourceParallelism() throws Exception {

Review comment:
   `testSetSourceParallelismByHint`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19335:
URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086


   
   ## CI report:
   
   * d42501cf46bfae3cd30b2927aa4ce23fffd0d81f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34147)
 
   * f5f269e5c65f55d2d132677f3ce23c148c00bcf4 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-table-store] LadyForest commented on a change in pull request #67: [FLINK-26911] Introduce parallelism setter for table store

2022-04-01 Thread GitBox


LadyForest commented on a change in pull request #67:
URL: https://github.com/apache/flink-table-store/pull/67#discussion_r841005753



##
File path: 
flink-table-store-connector/src/test/java/org/apache/flink/table/store/connector/ReadWriteTableITCase.java
##
@@ -1713,4 +1748,74 @@ private static StreamExecutionEnvironment 
buildBatchEnv() {
 env.setParallelism(2);
 return env;
 }
+
+@Test
+public void testSourceParallelism() throws Exception {
+String managedTable = createSourceAndManagedTable(false, false, false, 
false, false).f1;
+
+// without hint
+String query = prepareSimpleSelectQuery(managedTable, 
Collections.emptyMap());
+assertThat(sourceParallelism(query)).isEqualTo(env.getParallelism());
+
+// with hint
+query =
+prepareSimpleSelectQuery(
+managedTable, 
Collections.singletonMap(SCAN_PARALLELISM.key(), "66"));
+System.out.println(query);

Review comment:
   remove `System.out.println(query);` 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] syhily commented on pull request #19285: [FLINK-26931][Connector/pulsar] Make the producer name and consumer name unique in Pulsar

2022-04-01 Thread GitBox


syhily commented on pull request #19285:
URL: https://github.com/apache/flink/pull/19285#issuecomment-1086542049


   > LGTM. Do you want to backport to 1.15?
   
   Yep, backport to 1.15 is required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-table-store] LadyForest commented on a change in pull request #67: [FLINK-26911] Introduce parallelism setter for table store

2022-04-01 Thread GitBox


LadyForest commented on a change in pull request #67:
URL: https://github.com/apache/flink-table-store/pull/67#discussion_r841005052



##
File path: 
flink-table-store-connector/src/main/java/org/apache/flink/table/store/connector/source/TableStoreSource.java
##
@@ -130,15 +135,29 @@ public DataStructureConverter 
createDataStructureConverter(
 },
 projectFields);
 }
-TableStore.SourceBuilder builder =
+
+TableStore.SourceBuilder sourceBuilder =
 tableStore
 .sourceBuilder()
 .withContinuousMode(streaming)
 .withLogSourceProvider(logSourceProvider)
 .withProjection(projectFields)
 
.withPartitionPredicate(PredicateConverter.convert(partitionFilters))
-
.withFieldPredicate(PredicateConverter.convert(fieldFilters));
-return SourceProvider.of(builder.build());
+
.withFieldPredicate(PredicateConverter.convert(fieldFilters))
+
.withParallelism(tableStore.options().get(SCAN_PARALLELISM));
+
+return new DataStreamScanProvider() {
+@Override
+public DataStream produceDataStream(
+ProviderContext providerContext, 
StreamExecutionEnvironment env) {
+return sourceBuilder.withEnv(env).build();
+}
+
+@Override
+public boolean isBounded() {
+return !streaming;

Review comment:
   I'm not very sure about this. Does the file store still serve as a 
bounded scan if it is a hybrid reading with a full scan?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-26999) Introduce ClickHouse Connector

2022-04-01 Thread ZhuoYu Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhuoYu Chen updated FLINK-26999:

Description: 
[https://cwiki.apache.org/confluence/display/FLINK/FLIP-202%3A+Introduce+ClickHouse+Connector]

 

> Introduce ClickHouse Connector
> --
>
> Key: FLINK-26999
> URL: https://issues.apache.org/jira/browse/FLINK-26999
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Common
>Affects Versions: 1.15.0
>Reporter: ZhuoYu Chen
>Priority: Major
>
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-202%3A+Introduce+ClickHouse+Connector]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r841004650



##
File path: 
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/BenchmarkUtils.java
##
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.benchmark;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.ml.api.AlgoOperator;
+import org.apache.flink.ml.api.Estimator;
+import org.apache.flink.ml.api.Model;
+import org.apache.flink.ml.api.Stage;
+import org.apache.flink.ml.benchmark.generator.DataGenerator;
+import org.apache.flink.ml.common.datastream.TableUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.util.Preconditions;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+/** Utility methods for benchmarks. */
+public class BenchmarkUtils {
+/**
+ * Instantiates a benchmark from its parameter map and executes the 
benchmark in the provided
+ * environment.
+ *
+ * @return Results of the executed benchmark.
+ */
+@SuppressWarnings({"unchecked", "rawtypes"})
+public static BenchmarkResult runBenchmark(
+StreamTableEnvironment tEnv, String name, Map params) 
throws Exception {
+Stage stage = ReadWriteUtils.instantiateWithParams((Map) 
params.get("stage"));

Review comment:
   Sounds good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-26999) Introduce ClickHouse Connector

2022-04-01 Thread ZhuoYu Chen (Jira)
ZhuoYu Chen created FLINK-26999:
---

 Summary: Introduce ClickHouse Connector
 Key: FLINK-26999
 URL: https://issues.apache.org/jira/browse/FLINK-26999
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Common
Affects Versions: 1.15.0
Reporter: ZhuoYu Chen






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26999) Introduce ClickHouse Connector

2022-04-01 Thread ZhuoYu Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhuoYu Chen updated FLINK-26999:

Issue Type: New Feature  (was: Improvement)

> Introduce ClickHouse Connector
> --
>
> Key: FLINK-26999
> URL: https://issues.apache.org/jira/browse/FLINK-26999
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Common
>Affects Versions: 1.15.0
>Reporter: ZhuoYu Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r841001187



##
File path: 
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/data/DataGenerator.java
##
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.benchmark.data;
+
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+
+/** Interface for generating data as table arrays. */
+public interface DataGenerator> extends 
CommonDataGeneratorParams {
+/**
+ * Gets an array of Tables containing the data to be generated in the 
provided stream table

Review comment:
   nits: it does not sound right to say `containing the data to be 
generated ...`.
   
   How about changing it to `containing the data generated ...`?

##
File path: 
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/clustering/kmeans/KMeansModelDataGenerator.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.benchmark.clustering.kmeans;
+
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.ml.benchmark.data.DataGenerator;
+import org.apache.flink.ml.benchmark.data.DenseVectorArrayGenerator;
+import org.apache.flink.ml.benchmark.data.DenseVectorArrayGeneratorParams;
+import org.apache.flink.ml.clustering.kmeans.KMeansModelData;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Class that generates table arrays containing model data for {@link
+ * org.apache.flink.ml.clustering.kmeans.KMeansModel}.
+ */
+public class KMeansModelDataGenerator
+implements DataGenerator,
+DenseVectorArrayGeneratorParams {
+private final Map, Object> paramMap = new HashMap<>();
+
+public KMeansModelDataGenerator() {
+ParamUtils.initializeMapWithDefaultValues(paramMap, this);
+}
+
+@Override
+public Table[] getData(StreamTableEnvironment tEnv) {
+DataGenerator vectorArrayGenerator = new 
DenseVectorArrayGenerator();

Review comment:
   nits: The code looks busy. Could we add line break in the function, e.g.:
   
   ```
   DataGenerator vectorArrayGenerator = new DenseVectorArrayGenerator();
   ReadWriteUtils.updateExistingParams(vectorArrayGenerator, paramMap);
   
   Table vectorArrayTable = vectorArrayGenerator.getData(tEnv)[0];
   DataStream modelDataStream = tEnv
   .toDataStream(vectorArrayTable, DenseVector[].class)
   .map(new GenerateKMeansModelDataFunction());
   
   return new Table[] {tEnv.fromDataStream(modelDataStream)};
   ```

##
File path: 
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/data/DenseVectorArrayGeneratorParams.java
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright 

[GitHub] [flink] flinkbot edited a comment on pull request #19336: [hive] support select null literal

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19336:
URL: https://github.com/apache/flink/pull/19336#issuecomment-1086539420


   
   ## CI report:
   
   * 61021e38b76698a321b365c648060b78465a3754 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34148)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-table-store] LadyForest commented on a change in pull request #67: [FLINK-26911] Introduce parallelism setter for table store

2022-04-01 Thread GitBox


LadyForest commented on a change in pull request #67:
URL: https://github.com/apache/flink-table-store/pull/67#discussion_r841004556



##
File path: 
flink-table-store-connector/src/main/java/org/apache/flink/table/store/connector/TableStore.java
##
@@ -298,17 +316,26 @@ private FileStoreSource buildFileSource(
 }
 }
 
-public DataStreamSource build(StreamExecutionEnvironment env) 
{
+public DataStreamSource build() {
+if (env == null) {
+throw new IllegalArgumentException("Env should not be null.");

Review comment:
   I think we'd better clarify the `Env` as`StreamExecutionEnvironment`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Myasuka commented on a change in pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


Myasuka commented on a change in pull request #19335:
URL: https://github.com/apache/flink/pull/19335#discussion_r841004243



##
File path: 
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/PredefinedOptions.java
##
@@ -60,9 +59,7 @@
  *   setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)

Review comment:
   We should also remove such comments.
   The same for all other options defined in this file.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #19336: [hive] support select null literal

2022-04-01 Thread GitBox


flinkbot commented on pull request #19336:
URL: https://github.com/apache/flink/pull/19336#issuecomment-1086539420


   
   ## CI report:
   
   * 61021e38b76698a321b365c648060b78465a3754 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-26998) Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang reassigned FLINK-26998:


Assignee: Zakelly Lan

> Remove log-related configuration from predefined options in RocksDB state 
> backend 
> --
>
> Key: FLINK-26998
> URL: https://issues.apache.org/jira/browse/FLINK-26998
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.15.0
>Reporter: Zakelly Lan
>Assignee: Zakelly Lan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> After FLINK-23791, the RocksDB log should be enabled by default. However, the 
> log-related configuration remains in _PredefinedOptions_, which will override 
> the default value provided in _RocksDBConfigurableOptions_.
> We could remove the values in _PredefinedOptions_ and let 
> _RocksDBConfigurableOptions_ take control of the default value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26998) Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread Yun Tang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Tang updated FLINK-26998:
-
Fix Version/s: 1.16.0

> Remove log-related configuration from predefined options in RocksDB state 
> backend 
> --
>
> Key: FLINK-26998
> URL: https://issues.apache.org/jira/browse/FLINK-26998
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.15.0
>Reporter: Zakelly Lan
>Assignee: Zakelly Lan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0, 1.16.0
>
>
> After FLINK-23791, the RocksDB log should be enabled by default. However, the 
> log-related configuration remains in _PredefinedOptions_, which will override 
> the default value provided in _RocksDBConfigurableOptions_.
> We could remove the values in _PredefinedOptions_ and let 
> _RocksDBConfigurableOptions_ take control of the default value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] luoyuxia commented on pull request #19012: [FLINK-26540][hive] Support handle join involving complex types in on condition

2022-04-01 Thread GitBox


luoyuxia commented on pull request #19012:
URL: https://github.com/apache/flink/pull/19012#issuecomment-1086538955


   @beyond1920 Hi, could you please help merge?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-table-store] LadyForest commented on a change in pull request #67: [FLINK-26911] Introduce parallelism setter for table store

2022-04-01 Thread GitBox


LadyForest commented on a change in pull request #67:
URL: https://github.com/apache/flink-table-store/pull/67#discussion_r841003912



##
File path: 
flink-table-store-connector/src/main/java/org/apache/flink/table/store/connector/TableStore.java
##
@@ -260,6 +273,11 @@ public SourceBuilder 
withLogSourceProvider(LogSourceProvider logSourceProvider)
 return this;
 }
 
+public SourceBuilder withParallelism(Integer parallelism) {

Review comment:
   Add a `@Nullable` annotation here, since `SCAN_PARALLELISM` has no 
default value.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] luoyuxia opened a new pull request #19336: [hive] support select null literal

2022-04-01 Thread GitBox


luoyuxia opened a new pull request #19336:
URL: https://github.com/apache/flink/pull/19336


   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19335:
URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086


   
   ## CI report:
   
   * d42501cf46bfae3cd30b2927aa4ce23fffd0d81f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34147)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


flinkbot commented on pull request #19335:
URL: https://github.com/apache/flink/pull/19335#issuecomment-1086537086


   
   ## CI report:
   
   * d42501cf46bfae3cd30b2927aa4ce23fffd0d81f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-26998) Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-26998:
---
Labels: pull-request-available  (was: )

> Remove log-related configuration from predefined options in RocksDB state 
> backend 
> --
>
> Key: FLINK-26998
> URL: https://issues.apache.org/jira/browse/FLINK-26998
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.15.0
>Reporter: Zakelly Lan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> After FLINK-23791, the RocksDB log should be enabled by default. However, the 
> log-related configuration remains in _PredefinedOptions_, which will override 
> the default value provided in _RocksDBConfigurableOptions_.
> We could remove the values in _PredefinedOptions_ and let 
> _RocksDBConfigurableOptions_ take control of the default value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] Zakelly opened a new pull request #19335: [FLINK-26998] Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread GitBox


Zakelly opened a new pull request #19335:
URL: https://github.com/apache/flink/pull/19335


   ## What is the purpose of the change
   
   After FLINK-23791, the RocksDB log should be enabled by default. However, 
the log-related configuration remains in ```PredefinedOptions```, which will 
override the default value provided in ```RocksDBConfigurableOptions```.
   We could remove the related values in ```PredefinedOptions``` and let 
```RocksDBConfigurableOptions``` take control of the default value.
   
   ## Brief change log
   
 -  Remove the log related values in 
```org/apache/flink/contrib/streaming/state/PredefinedOptions.java```
   
   ## Verifying this change
   
   This change modified test RocksDBStateBackendConfigTest.testDefaultDbLogDir
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
 - The S3 file system connector: no

   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] LB-Yu commented on pull request #16859: [FLINK-23719] [connectors/hbase] Support switch WAL in Flink SQL DDL options for HBase sink

2022-04-01 Thread GitBox


LB-Yu commented on pull request #16859:
URL: https://github.com/apache/flink/pull/16859#issuecomment-1086533939


   Hi @wuchong, could you review the PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-26998) Remove log-related configuration from predefined options in RocksDB state backend

2022-04-01 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-26998:
---

 Summary: Remove log-related configuration from predefined options 
in RocksDB state backend 
 Key: FLINK-26998
 URL: https://issues.apache.org/jira/browse/FLINK-26998
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.15.0
Reporter: Zakelly Lan
 Fix For: 1.15.0


After FLINK-23791, the RocksDB log should be enabled by default. However, the 
log-related configuration remains in _PredefinedOptions_, which will override 
the default value provided in _RocksDBConfigurableOptions_.
We could remove the values in _PredefinedOptions_ and let 
_RocksDBConfigurableOptions_ take control of the default value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] flinkbot edited a comment on pull request #19334: [FLINK-26858][client]Add error reporting information and correct the …

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19334:
URL: https://github.com/apache/flink/pull/19334#issuecomment-1086523291


   
   ## CI report:
   
   * 5d62f31b0e3c15fff9bd7873a0310ee46bdbcbe9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34146)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r840999157



##
File path: 
flink-ml-benchmark/src/test/java/org/apache/flink/ml/benchmark/BenchmarkTest.java
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.benchmark;
+
+import org.apache.flink.ml.benchmark.clustering.kmeans.KMeansInputsGenerator;
+import org.apache.flink.ml.clustering.kmeans.KMeans;
+import org.apache.flink.ml.clustering.kmeans.KMeansModel;
+import org.apache.flink.ml.param.WithParams;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.test.util.AbstractTestBase;
+
+import org.apache.commons.io.FileUtils;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.ByteArrayOutputStream;
+import java.io.File;
+import java.io.InputStream;
+import java.io.PrintStream;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import static org.apache.flink.ml.util.ReadWriteUtils.OBJECT_MAPPER;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+/** Tests benchmarks. */
+@SuppressWarnings("unchecked")
+public class BenchmarkTest extends AbstractTestBase {
+@Rule public final TemporaryFolder tempFolder = new TemporaryFolder();
+
+private final ByteArrayOutputStream outContent = new 
ByteArrayOutputStream();
+private final PrintStream originalOut = System.out;
+
+@Before
+public void before() {
+System.setOut(new PrintStream(outContent));
+}
+
+@After
+public void after() {
+System.setOut(originalOut);

Review comment:
   Thanks for the explanation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r840999097



##
File path: 
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/clustering/kmeans/KMeansModelDataGenerator.java
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.benchmark.clustering.kmeans;
+
+import org.apache.flink.ml.benchmark.generator.DataGenerator;
+import org.apache.flink.ml.benchmark.generator.GeneratorUtils;
+import org.apache.flink.ml.clustering.kmeans.KMeansModelData;
+import org.apache.flink.ml.clustering.kmeans.KMeansParams;
+import org.apache.flink.ml.common.datastream.TableUtils;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Class that generates table arrays containing model data for {@link
+ * org.apache.flink.ml.clustering.kmeans.KMeansModel}.
+ */
+public class KMeansModelDataGenerator

Review comment:
   Sounds good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r840998966



##
File path: 
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/clustering/kmeans/KMeansInputsGenerator.java
##
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.benchmark.clustering.kmeans;
+
+import org.apache.flink.ml.benchmark.generator.DataGenerator;
+import org.apache.flink.ml.benchmark.generator.GeneratorUtils;
+import org.apache.flink.ml.clustering.kmeans.KMeansParams;
+import org.apache.flink.ml.common.datastream.TableUtils;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.api.Schema;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * Class that generates table arrays containing inputs for {@link
+ * org.apache.flink.ml.clustering.kmeans.KMeans} and {@link
+ * org.apache.flink.ml.clustering.kmeans.KMeansModel}.
+ */
+public class KMeansInputsGenerator

Review comment:
   Sounds good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #19334: [FLINK-26858][client]Add error reporting information and correct the …

2022-04-01 Thread GitBox


flinkbot commented on pull request #19334:
URL: https://github.com/apache/flink/pull/19334#issuecomment-1086523291


   
   ## CI report:
   
   * 5d62f31b0e3c15fff9bd7873a0310ee46bdbcbe9 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r840998630



##
File path: 
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/Benchmark.java
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.benchmark;
+
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.util.Preconditions;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileInputStream;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/** Entry class for benchmark execution. */
+public class Benchmark {
+private static final Logger LOG = LoggerFactory.getLogger(Benchmark.class);
+
+@SuppressWarnings("unchecked")
+public static void main(String[] args) throws Exception {

Review comment:
   Thanks for the improvement. The current behavior of the script looks 
good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-26997) Add table store links to flink web

2022-04-01 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-26997.

Resolution: Fixed

asf-site:

fcdd0c9fb58629ed7fd66677089cc6942236a7d2

97d9f8b04a306c106ef1cab6404793719029ccf7

> Add table store links to flink web
> --
>
> Key: FLINK-26997
> URL: https://issues.apache.org/jira/browse/FLINK-26997
> Project: Flink
>  Issue Type: Sub-task
>  Components: Project Website, Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r840998428



##
File path: 
flink-ml-core/src/main/java/org/apache/flink/ml/util/ReadWriteUtils.java
##
@@ -94,25 +95,29 @@
  */
 public static void saveMetadata(Stage stage, String path, Map extraMetadata)
 throws IOException {
-// Creates parent directories if not already created.
-FileSystem fs = mkdirs(path);
-
 Map metadata = new HashMap<>(extraMetadata);
 metadata.put("className", stage.getClass().getName());
 metadata.put("timestamp", System.currentTimeMillis());
 metadata.put("paramMap", jsonEncode(stage.getParamMap()));
 // TODO: add version in the metadata.
 String metadataStr = OBJECT_MAPPER.writeValueAsString(metadata);
 
-Path metadataPath = new Path(path, "metadata");
-if (fs.exists(metadataPath)) {
-throw new IOException("File " + metadataPath + " already exists.");
+saveToFile(new Path(path, "metadata"), metadataStr);
+}
+
+/** Saves a given string to the specified file. */
+public static void saveToFile(Path path, String content) throws 
IOException {
+// Creates parent directories if not already created.
+FileSystem fs = mkdirs(path.getParent().toString());
+
+if (fs.exists(path)) {
+throw new IOException("File " + path + " already exists.");

Review comment:
   Spark ML don't overwrite model data files by default. I think we can 
follow the same practice for model data files.
   
   Based on my past experience, I believe we typically allow overwriting output 
files specified on the command line tool. But I don't have an example (or 
counter-example) currently.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-26997) Add table store links to flink web

2022-04-01 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee updated FLINK-26997:
-
Summary: Add table store links to flink web  (was: Add table store links to 
Community page)

> Add table store links to flink web
> --
>
> Key: FLINK-26997
> URL: https://issues.apache.org/jira/browse/FLINK-26997
> Project: Flink
>  Issue Type: Sub-task
>  Components: Project Website, Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] dianfu commented on a change in pull request #19328: [FLINK-26969][Examples] Write operation examples of tumbling window, sliding window, session window and count window based on pyfl

2022-04-01 Thread GitBox


dianfu commented on a change in pull request #19328:
URL: https://github.com/apache/flink/pull/19328#discussion_r840997492



##
File path: 
flink-python/pyflink/examples/datastream/windowing/sliding_windowing.py
##
@@ -0,0 +1,70 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+import sys
+
+import argparse
+from typing import Iterable
+
+from pyflink.common import Types, WatermarkStrategy, Time
+from pyflink.common.watermark_strategy import TimestampAssigner
+from pyflink.datastream import StreamExecutionEnvironment, 
ProcessWindowFunction
+from pyflink.datastream.window import CountWindow, SlidingEventTimeWindows
+
+
+class TimestampAssigner(TimestampAssigner):
+def extract_timestamp(self, value, record_timestamp) -> int:
+return int(value[1])
+
+
+class CountWindowProcessFunction(ProcessWindowFunction[tuple, tuple, str, 
CountWindow]):
+def process(self,
+key: str,
+content: ProcessWindowFunction.Context,
+elements: Iterable[tuple]) -> Iterable[tuple]:
+return [(key, len([e for e in elements]))]
+
+def clear(self, context: ProcessWindowFunction.Context) -> None:
+pass
+
+
+if __name__ == '__main__':
+logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
format="%(message)s")
+
+parser = argparse.ArgumentParser()
+parser.add_argument(
+'--output',
+dest='output',
+required=False,
+help='Output file to write results to.')
+
+env = StreamExecutionEnvironment.get_execution_environment()
+env.set_parallelism(1)
+data_stream = env.from_collection([
+('hi', 1), ('hi', 2), ('hi', 3), ('hi', 4), ('hi', 5), ('hi', 8), 
('hi', 9), ('hi', 15)],
+type_info=Types.TUPLE([Types.STRING(), Types.INT()]))  # type: 
DataStream
+watermark_strategy = WatermarkStrategy.for_monotonous_timestamps() \
+.with_timestamp_assigner(TimestampAssigner())
+
+data_stream.assign_timestamps_and_watermarks(watermark_strategy) \
+.key_by(lambda x: x[0], key_type=Types.STRING()) \
+.window(SlidingEventTimeWindows.of(Time.milliseconds(5), 
Time.milliseconds(2), Time.seconds(0))) \
+.process(CountWindowProcessFunction(), Types.TUPLE([Types.STRING(), 
Types.INT()])) \

Review comment:
   What about also outputs the window start and window end to make the 
output more readable?

##
File path: flink-python/pyflink/examples/datastream/windowing/count_windowing.py
##
@@ -0,0 +1,57 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+import sys
+
+import argparse
+from typing import Iterable
+
+from pyflink.common import Types
+from pyflink.datastream import StreamExecutionEnvironment, WindowFunction
+from pyflink.datastream.window import CountWindow
+

Review comment:
   What about rename the file to count_window to keep the naming conversion 
consistent with the Table API examples? 

##
File path: 
flink-python/pyflink/examples/datastream/windowing/sliding_windowing.py
##
@@ -0,0 +1,70 @@

[GitHub] [flink] defineqq opened a new pull request #19334: [FLINK-26858][client]Add error reporting information and correct the …

2022-04-01 Thread GitBox


defineqq opened a new pull request #19334:
URL: https://github.com/apache/flink/pull/19334


   …error reporting guidance when misleading the command line
   
   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-26997) Add table store links to Community page

2022-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-26997:
---
Labels: pull-request-available  (was: )

> Add table store links to Community page
> ---
>
> Key: FLINK-26997
> URL: https://issues.apache.org/jira/browse/FLINK-26997
> Project: Flink
>  Issue Type: Sub-task
>  Components: Project Website, Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-web] JingsongLi merged pull request #523: [FLINK-26997] Add table store links to Community page

2022-04-01 Thread GitBox


JingsongLi merged pull request #523:
URL: https://github.com/apache/flink-web/pull/523


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] weibozhao commented on a change in pull request #56: [FLINK-25616] Add Transformer for VectorAssembler

2022-04-01 Thread GitBox


weibozhao commented on a change in pull request #56:
URL: https://github.com/apache/flink-ml/pull/56#discussion_r840997787



##
File path: 
flink-ml-lib/src/test/java/org/apache/flink/ml/feature/VectorAssemblerTest.java
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature;
+
+import org.apache.flink.api.common.restartstrategy.RestartStrategies;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.ml.common.param.HasHandleInvalid;
+import org.apache.flink.ml.feature.vectorassembler.VectorAssembler;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.SparseVector;
+import org.apache.flink.ml.linalg.Vectors;
+import org.apache.flink.ml.util.StageTestUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import 
org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.test.util.AbstractTestBase;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.collections.IteratorUtils;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNull;
+
+/** Tests VectorAssembler. */
+public class VectorAssemblerTest extends AbstractTestBase {
+
+private StreamTableEnvironment tEnv;
+private Table inputDataTable;
+
+private static final List INPUT_DATA =
+Arrays.asList(
+Row.of(
+0,
+Vectors.dense(2.1, 3.1),
+1.0,
+Vectors.sparse(5, new int[] {3}, new double[] 
{1.0})),
+Row.of(
+1,
+Vectors.dense(2.1, 3.1),
+1.0,
+Vectors.sparse(
+5, new int[] {4, 2, 3, 1}, new double[] 
{4.0, 2.0, 3.0, 1.0})),
+Row.of(2, null, 1.0, null));
+
+private static final SparseVector EXPECTED_OUTPUT_DATA_1 =
+Vectors.sparse(8, new int[] {0, 1, 2, 6}, new double[] {2.1, 3.1, 
1.0, 1.0});
+private static final DenseVector EXPECTED_OUTPUT_DATA_2 =
+Vectors.dense(2.1, 3.1, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0);
+
+@Before
+public void before() {
+Configuration config = new Configuration();
+
config.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH, 
true);
+StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment(config);
+env.setParallelism(4);
+env.enableCheckpointing(100);
+env.setRestartStrategy(RestartStrategies.noRestart());
+tEnv = StreamTableEnvironment.create(env);
+DataStream dataStream = env.fromCollection(INPUT_DATA);
+inputDataTable = tEnv.fromDataStream(dataStream).as("id", "vec", 
"num", "sparseVec");
+}
+
+private void verifyOutputResult(Table output, String outputCol, int 
outputSize)
+throws Exception {
+DataStream dataStream = tEnv.toDataStream(output);
+List results = 
IteratorUtils.toList(dataStream.executeAndCollect());
+assertEquals(outputSize, results.size());
+for (Row result : results) {
+if (result.getField(0) == (Object) 0) {
+assertEquals(EXPECTED_OUTPUT_DATA_1, 
result.getField(outputCol));
+} else if (result.getField(0) == (Object) 1) {
+assertEquals(EXPECTED_OUTPUT_DATA_2, 
result.getField(outputCol));
+} else {
+assertNull(result.getField(outputCol));
+}
+}
+}
+
+@Test
+public void testParam() {
+VectorAssembler vectorAssembler = new VectorAssembler();
+assertEquals(HasHandleInvalid.ERROR_INVALID, 

[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r840997700



##
File path: flink-ml-benchmark/README.md
##
@@ -0,0 +1,169 @@
+# Flink ML Benchmark Getting Started
+
+This document provides instructions about how to run benchmarks on Flink ML's
+stages in a Linux/MacOS environment.
+
+## Prerequisites
+
+### Installing Flink
+
+Please make sure Flink 1.14 or higher version has been installed in your local
+environment. You can refer to the [local
+installation](https://nightlies.apache.org/flink/flink-docs-master/docs/try-flink/local_installation/)
+instruction on Flink's document website for how to achieve this.
+
+### Setting Up Flink Environment Variables
+
+After having installed Flink, please register `$FLINK_HOME` as an environment
+variable into your local environment, and add `$FLINK_HOME` into your `$PATH`
+variable. This can be completed by running the following commands in the 
Flink's
+folder.
+
+```bash
+export FLINK_HOME=`pwd`

Review comment:
   Thanks. I will comment on the updated PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r840995675



##
File path: flink-ml-benchmark/README.md
##
@@ -0,0 +1,172 @@
+# Flink ML Benchmark Getting Started
+
+This document provides instructions on how to run benchmarks on Flink ML's
+stages in a Linux/MacOS environment.
+
+## Prerequisites
+
+### Install Flink
+
+Please make sure Flink 1.14 or higher version has been installed in your local
+environment. You can refer to the [local
+installation](https://nightlies.apache.org/flink/flink-docs-master/docs/try-flink/local_installation/)
+instruction on Flink's document website for how to achieve this.
+
+### Set Up Flink Environment Variables
+
+After having installed Flink, please register `$FLINK_HOME` as an environment
+variable into your local environment. For example, suppose you have downloaded
+Flink 1.14.0 and placed Flink's binary folder under `/usr/local/`, then you 
need
+to run the following command:
+
+```bash
+export FLINK_HOME=`/usr/local/flink-1.14.0`
+```
+
+Then please run the following command. If this command returns 1.14.0 or a
+higher version, then it means that the required Flink environment has been
+successfully installed and registered in your local environment.
+
+```bash
+$FLINK_HOME/bin/flink --version
+```
+
+### Acquire Flink ML Binary Distribution
+
+In order to use Flink ML's CLI you need to have the latest binary distribution
+of Flink ML. You can acquire the distribution by building Flink ML's source 
code
+locally, which means to execute the following command in Flink ML repository's
+root directory.
+
+```bash
+mvn clean package -DskipTests
+cd ./flink-ml-dist/target/flink-ml-*-bin/flink-ml*/

Review comment:
   How about adding the `path_to_flink_ml` like below:
   
   ```
   cd ${path_to_flink_ml}/flink-ml-dist/target/flink-ml-*-bin/flink-ml*/
   ```

##
File path: flink-ml-benchmark/README.md
##
@@ -0,0 +1,261 @@
+# Flink ML Benchmark Guideline
+
+This document provides instructions about how to run benchmarks on Flink ML's
+stages.
+
+## Write Benchmark Programs
+
+### Add Maven Dependencies
+
+In order to write Flink ML's java benchmark programs, first make sure that the
+following dependencies have been added to your maven project's `pom.xml`.
+
+```xml
+
+  org.apache.flink
+  flink-ml-core_${scala.binary.version}
+  ${flink.ml.version}
+
+
+
+  org.apache.flink
+  flink-ml-iteration_${scala.binary.version}
+  ${flink.ml.version}
+
+
+
+  org.apache.flink
+  flink-ml-lib_${scala.binary.version}
+  ${flink.ml.version}
+
+
+
+  org.apache.flink
+  flink-ml-benchmark_${scala.binary.version}
+  ${flink.ml.version}
+
+
+
+  org.apache.flink
+  statefun-flink-core
+  3.1.0
+  
+
+  org.apache.flink
+  flink-streaming-java_2.12
+
+  
+
+
+
+  org.apache.flink
+  flink-streaming-java_${scala.binary.version}
+  ${flink.version}
+
+
+
+  org.apache.flink
+  flink-table-api-java-bridge_${scala.binary.version}
+  ${flink.version}
+
+
+
+  org.apache.flink
+  flink-table-planner_${scala.binary.version}
+  ${flink.version}
+
+
+
+  org.apache.flink
+  flink-clients_${scala.binary.version}
+  ${flink.version}
+
+```
+
+### Write Java Program
+
+Then you can write a program as follows to run benchmark on Flink ML stages. 
The
+example code below tests the performance of Flink ML's KMeans algorithm, with
+the default configuration parameters used.
+
+```java
+public class Main {
+public static void main(String[] args) throws Exception {
+StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
+StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
+
+KMeans kMeans = new KMeans();
+KMeansInputsGenerator inputsGenerator = new KMeansInputsGenerator();
+
+BenchmarkResult result =
+BenchmarkUtils.runBenchmark("exampleBenchmark", tEnv, kMeans, 
inputsGenerator);
+
+BenchmarkUtils.printResult(result);
+}
+}
+```
+
+### Execute Benchmark Program
+
+After executing the `main()` method above, you will see benchmark results
+printed out in your terminal. An example of the printed content is as follows.
+
+```
+Benchmark Name: exampleBenchmark
+Total Execution Time(ms): 828.0
+```
+
+### Configure Benchmark Parameters
+
+If you want to run benchmark on customed configuration parameters, you can set
+them with Flink ML's `WithParams` API as follows.
+
+```java
+KMeans kMeans = new KMeans()
+  .setK(5)
+  .setMaxIter(50);
+KMeansInputsGenerator inputsGenerator = new KMeansInputsGenerator()
+  .setDims(3)
+  .setDataSize(1);
+```
+
+## Execute Benchmark through Command-Line Interface (CLI)
+
+You can also configure and execute benchmarks through Command-Line Interface
+(CLI) without writing java programs.
+
+### Prerequisites
+
+Before using Flink ML's CLI, make sure you have installed Flink 1.14 in your
+local environment, and that you have started a 

[GitHub] [flink-ml] weibozhao commented on a change in pull request #56: [FLINK-25616] Add Transformer for VectorAssembler

2022-04-01 Thread GitBox


weibozhao commented on a change in pull request #56:
URL: https://github.com/apache/flink-ml/pull/56#discussion_r840996885



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/vectorassembler/VectorAssembler.java
##
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature.vectorassembler;
+
+import org.apache.flink.api.common.functions.FlatMapFunction;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.ml.api.Transformer;
+import org.apache.flink.ml.common.datastream.TableUtils;
+import org.apache.flink.ml.common.param.HasHandleInvalid;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.SparseVector;
+import org.apache.flink.ml.linalg.Vector;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.internal.TableImpl;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.Collector;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+/**
+ * A feature transformer that combines a given list of input columns into a 
vector column. Types of
+ * input columns must be either vector or numerical value.
+ */
+public class VectorAssembler
+implements Transformer, 
VectorAssemblerParams {
+private final Map, Object> paramMap = new HashMap<>();
+private static final double RATIO = 1.5;
+
+public VectorAssembler() {
+ParamUtils.initializeMapWithDefaultValues(paramMap, this);
+}
+
+@Override
+public Table[] transform(Table... inputs) {
+Preconditions.checkArgument(inputs.length == 1);
+
+StreamTableEnvironment tEnv =
+(StreamTableEnvironment) ((TableImpl) 
inputs[0]).getTableEnvironment();
+RowTypeInfo inputTypeInfo = 
TableUtils.getRowTypeInfo(inputs[0].getResolvedSchema());
+RowTypeInfo outputTypeInfo =
+new RowTypeInfo(
+ArrayUtils.addAll(
+inputTypeInfo.getFieldTypes(), 
TypeInformation.of(Vector.class)),
+ArrayUtils.addAll(inputTypeInfo.getFieldNames(), 
getOutputCol()));
+DataStream output =
+tEnv.toDataStream(inputs[0])
+.flatMap(
+new AssemblerFunc(getInputCols(), 
getHandleInvalid()),
+outputTypeInfo);
+Table outputTable = tEnv.fromDataStream(output);
+return new Table[] {outputTable};
+}
+
+private static class AssemblerFunc implements FlatMapFunction {
+private final String[] inputCols;
+private final String handleInvalid;
+
+public AssemblerFunc(String[] inputCols, String handleInvalid) {
+this.inputCols = inputCols;
+this.handleInvalid = handleInvalid;
+}
+
+@Override
+public void flatMap(Row value, Collector out) throws Exception {
+Object[] objects = new Object[inputCols.length];
+for (int i = 0; i < objects.length; ++i) {
+objects[i] = value.getField(inputCols[i]);
+}
+Vector assembledVector = null;
+try {
+assembledVector = assemble(objects);
+} catch (Exception e) {
+switch (handleInvalid) {
+case HasHandleInvalid.ERROR_INVALID:
+throw e;
+case HasHandleInvalid.SKIP_INVALID:
+return;
+case HasHandleInvalid.KEEP_INVALID:

Review comment:
   Here,keep is just keep this input row values,the outputColumn has no 
meaning,so I give a null.
   
   I have refine code for keep 

[GitHub] [flink] SteNicholas commented on pull request #17155: [FLINK-19883][table] Support 'IF EXISTS' in DDL for ALTER TABLE

2022-04-01 Thread GitBox


SteNicholas commented on pull request #17155:
URL: https://github.com/apache/flink/pull/17155#issuecomment-1086509438


   > Thanks for your contribution @SteNicholas, Can you continue to finish this 
work?
   
   @lsyldliu, thanks for the reminder. I will update this pull request this 
week. Sorry for the delay to update.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #19323: [FLINK-26858][client]When the startup mode in the command line submis…

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19323:
URL: https://github.com/apache/flink/pull/19323#issuecomment-1085715458


   
   ## CI report:
   
   * be9d917e79455bf9a80fa43ff9fb978384f264a5 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34099)
 
   * f115508b521cc17953e8d807b0d2e9043272b2ae UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] lsyldliu commented on pull request #17155: [FLINK-19883][table] Support 'IF EXISTS' in DDL for ALTER TABLE

2022-04-01 Thread GitBox


lsyldliu commented on pull request #17155:
URL: https://github.com/apache/flink/pull/17155#issuecomment-1086498370


   Thanks for your contribution @SteNicholas, Can you continue to finish this 
work?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #19333: [FLINK-26190][python] Remove getTableConfig from ExecNodeConfiguration

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19333:
URL: https://github.com/apache/flink/pull/19333#issuecomment-1086447112


   
   ## CI report:
   
   * 84ab13731ec4deaa375549d5d63afb2020ccd476 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34138)
 
   * 0ae998978c02b08d84eb814e3c9fd95b15883f3b Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34141)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] luoyuxia commented on pull request #18958: [FLINK-15854][hive] Use the new type inference for Hive UDTF

2022-04-01 Thread GitBox


luoyuxia commented on pull request #18958:
URL: https://github.com/apache/flink/pull/18958#issuecomment-1086496321


   @twalthr Sorry to bother you, but do you have time to review this pr? Then I 
can continue the new type inference for Hive UDAF.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-26997) Add table store links to Community page

2022-04-01 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-26997:


 Summary: Add table store links to Community page
 Key: FLINK-26997
 URL: https://issues.apache.org/jira/browse/FLINK-26997
 Project: Flink
  Issue Type: Sub-task
  Components: Project Website, Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] flinkbot edited a comment on pull request #19333: [FLINK-26190][python] Remove getTableConfig from ExecNodeConfiguration

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19333:
URL: https://github.com/apache/flink/pull/19333#issuecomment-1086447112


   
   ## CI report:
   
   * 84ab13731ec4deaa375549d5d63afb2020ccd476 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34138)
 
   * 0ae998978c02b08d84eb814e3c9fd95b15883f3b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-22317) Support DROP column/constraint/watermark for ALTER TABLE statement

2022-04-01 Thread dalongliu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516198#comment-17516198
 ] 

dalongliu commented on FLINK-22317:
---

I will take over this issue, cc [~jark] 

> Support DROP column/constraint/watermark for ALTER TABLE statement
> --
>
> Key: FLINK-22317
> URL: https://issues.apache.org/jira/browse/FLINK-22317
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Jark Wu
>Assignee: Aiden Gong
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] flinkbot edited a comment on pull request #19329: [FLINK-22318][table] Support RENAME column name for ALTER TABLE statement

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19329:
URL: https://github.com/apache/flink/pull/19329#issuecomment-1085879092


   
   ## CI report:
   
   * fae9d6fee2e0bfe4066b185cabab170b91ed4564 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34116)
 
   * ae35684951a5222b165af15c667c66ac09076958 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34140)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-26945) Add DATE_SUB supported in SQL & Table API

2022-04-01 Thread zoucao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516197#comment-17516197
 ] 

zoucao commented on FLINK-26945:


Looking forward to it, It is very useful.

> Add DATE_SUB supported in SQL & Table API
> -
>
> Key: FLINK-26945
> URL: https://issues.apache.org/jira/browse/FLINK-26945
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: dalongliu
>Priority: Major
> Fix For: 1.16.0
>
>
> Returns the date {{numDays}} before {{{}startDate{}}}.
> Syntax:
> {code:java}
> date_sub(startDate, numDays) {code}
> Arguments:
>  * {{{}startDate{}}}: A DATE expression.
>  * {{{}numDays{}}}: An INTEGER expression.
> Returns:
> A DATE.
> If {{numDays}} is negative abs(num_days) are added to {{{}startDate{}}}.
> If the result date overflows the date range the function raises an error.
> Examples:
> {code:java}
> > SELECT date_sub('2016-07-30', 1);
>  2016-07-29 {code}
> See more:
>  * 
> [Spark|https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html#date-and-timestamp-functions]
>  * [Hive|https://cwiki.apache.org/confluence/display/hive/languagemanual+udf]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] flinkbot edited a comment on pull request #19329: [FLINK-22318][table] Support RENAME column name for ALTER TABLE statement

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19329:
URL: https://github.com/apache/flink/pull/19329#issuecomment-1085879092


   
   ## CI report:
   
   * fae9d6fee2e0bfe4066b185cabab170b91ed4564 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34116)
 
   * ae35684951a5222b165af15c667c66ac09076958 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-26986) Remove deprecated string expressions in Python Table API

2022-04-01 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-26986:

Affects Version/s: (was: 1.15.0)

> Remove deprecated string expressions in Python Table API
> 
>
> Key: FLINK-26986
> URL: https://issues.apache.org/jira/browse/FLINK-26986
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> In FLINK-26704, it has removed the string expressions in Table API. However, 
> there are still some APIs still using string expressions in Python Table API, 
> however, they should not work any more as the string expressions have already 
> been removed in the Java Table API. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26986) Remove deprecated string expressions in Python Table API

2022-04-01 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516192#comment-17516192
 ] 

Dian Fu commented on FLINK-26986:
-

Just found that FLINK-26704 is resolved in 1.16 and so revert the changes in 
release-1.15 via 1bfbdc9c76de3b3fab4d70058257b7a307269570

> Remove deprecated string expressions in Python Table API
> 
>
> Key: FLINK-26986
> URL: https://issues.apache.org/jira/browse/FLINK-26986
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> In FLINK-26704, it has removed the string expressions in Table API. However, 
> there are still some APIs still using string expressions in Python Table API, 
> however, they should not work any more as the string expressions have already 
> been removed in the Java Table API. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26986) Remove deprecated string expressions in Python Table API

2022-04-01 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-26986:

Fix Version/s: 1.16.0
   (was: 1.15.0)

> Remove deprecated string expressions in Python Table API
> 
>
> Key: FLINK-26986
> URL: https://issues.apache.org/jira/browse/FLINK-26986
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.15.0
>Reporter: Dian Fu
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> In FLINK-26704, it has removed the string expressions in Table API. However, 
> there are still some APIs still using string expressions in Python Table API, 
> however, they should not work any more as the string expressions have already 
> been removed in the Java Table API. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-ml] weibozhao commented on a change in pull request #56: [FLINK-25616] Add Transformer for VectorAssembler

2022-04-01 Thread GitBox


weibozhao commented on a change in pull request #56:
URL: https://github.com/apache/flink-ml/pull/56#discussion_r840992626



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/common/param/HasHandleInvalid.java
##
@@ -37,13 +37,14 @@
 public interface HasHandleInvalid extends WithParams {
 String ERROR_INVALID = "error";
 String SKIP_INVALID = "skip";
+String KEEP_INVALID = "keep";

Review comment:
   I get your idea. I think it's better defining this parameter in the 
VectorAssemblerParams, for keep has no meaning in OnehotEncoder fit and 
transform. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] zhipeng93 commented on a change in pull request #73: [FLINK-26626] Add Transformer and Estimator for StandardScaler

2022-04-01 Thread GitBox


zhipeng93 commented on a change in pull request #73:
URL: https://github.com/apache/flink-ml/pull/73#discussion_r840992081



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/standardscaler/StandardScaler.java
##
@@ -0,0 +1,288 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature.standardscaler;
+
+import org.apache.flink.api.common.state.ListState;
+import org.apache.flink.api.common.state.ListStateDescriptor;
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.api.java.typeutils.TupleTypeInfo;
+import org.apache.flink.iteration.operator.OperatorStateUtils;
+import org.apache.flink.ml.api.Estimator;
+import org.apache.flink.ml.linalg.BLAS;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.Vector;
+import org.apache.flink.ml.linalg.Vectors;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.runtime.state.StateInitializationContext;
+import org.apache.flink.runtime.state.StateSnapshotContext;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
+import org.apache.flink.streaming.api.operators.BoundedOneInput;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.internal.TableImpl;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.Preconditions;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * An Estimator which implements the standard scaling algorithm.
+ *
+ * Standardization is a common requirement for machine learning training 
because they may behave
+ * badly if the individual features of a input do not look like standard 
normally distributed data
+ * (e.g. Gaussian with 0 mean and unit variance).
+ *
+ * This estimator standardizes the input features by removing the mean and 
scaling each dimension
+ * to unit variance.
+ */
+public class StandardScaler
+implements Estimator,
+StandardScalerParams {
+private final Map, Object> paramMap = new HashMap<>();
+
+public StandardScaler() {
+ParamUtils.initializeMapWithDefaultValues(paramMap, this);
+}
+
+@Override
+public StandardScalerModel fit(Table... inputs) {
+Preconditions.checkArgument(inputs.length == 1);
+StreamTableEnvironment tEnv =
+(StreamTableEnvironment) ((TableImpl) 
inputs[0]).getTableEnvironment();
+DataStream> 
sumAndSquaredSumAndWeight =
+tEnv.toDataStream(inputs[0])
+.transform(
+"computeMeta",
+new TupleTypeInfo<>(
+TypeInformation.of(DenseVector.class),
+TypeInformation.of(DenseVector.class),
+BasicTypeInfo.LONG_TYPE_INFO),
+new ComputeMetaOperator(getFeaturesCol()));
+
+DataStream modelData =
+sumAndSquaredSumAndWeight
+.transform(
+"buildModel",
+
TypeInformation.of(StandardScalerModelData.class),
+new BuildModelOperator())
+.setParallelism(1);
+
+StandardScalerModel model =
+new 
StandardScalerModel().setModelData(tEnv.fromDataStream(modelData));
+ReadWriteUtils.updateExistingParams(model, paramMap);
+return model;
+}
+
+/**
+ * Builds the {@link 

[GitHub] [flink-ml] weibozhao commented on a change in pull request #56: [FLINK-25616] Add Transformer for VectorAssembler

2022-04-01 Thread GitBox


weibozhao commented on a change in pull request #56:
URL: https://github.com/apache/flink-ml/pull/56#discussion_r840991670



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/vectorassembler/VectorAssembler.java
##
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature.vectorassembler;
+
+import org.apache.flink.api.common.functions.FlatMapFunction;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.ml.api.Transformer;
+import org.apache.flink.ml.common.datastream.TableUtils;
+import org.apache.flink.ml.common.param.HasHandleInvalid;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.SparseVector;
+import org.apache.flink.ml.linalg.Vector;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.internal.TableImpl;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.Collector;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+/**
+ * A feature transformer that combines a given list of input columns into a 
vector column. Types of
+ * input columns must be either vector or numerical value.
+ */
+public class VectorAssembler
+implements Transformer, 
VectorAssemblerParams {
+private final Map, Object> paramMap = new HashMap<>();
+private static final double RATIO = 1.5;
+
+public VectorAssembler() {
+ParamUtils.initializeMapWithDefaultValues(paramMap, this);
+}
+
+@Override
+public Table[] transform(Table... inputs) {
+Preconditions.checkArgument(inputs.length == 1);
+
+StreamTableEnvironment tEnv =
+(StreamTableEnvironment) ((TableImpl) 
inputs[0]).getTableEnvironment();
+RowTypeInfo inputTypeInfo = 
TableUtils.getRowTypeInfo(inputs[0].getResolvedSchema());
+RowTypeInfo outputTypeInfo =
+new RowTypeInfo(
+ArrayUtils.addAll(
+inputTypeInfo.getFieldTypes(), 
TypeInformation.of(Vector.class)),
+ArrayUtils.addAll(inputTypeInfo.getFieldNames(), 
getOutputCol()));
+DataStream output =
+tEnv.toDataStream(inputs[0])
+.flatMap(
+new AssemblerFunc(getInputCols(), 
getHandleInvalid()),
+outputTypeInfo);
+Table outputTable = tEnv.fromDataStream(output);
+return new Table[] {outputTable};
+}
+
+private static class AssemblerFunc implements FlatMapFunction {
+private final String[] inputCols;
+private final String handleInvalid;
+
+public AssemblerFunc(String[] inputCols, String handleInvalid) {
+this.inputCols = inputCols;
+this.handleInvalid = handleInvalid;
+}
+
+@Override
+public void flatMap(Row value, Collector out) throws Exception {
+Object[] objects = new Object[inputCols.length];
+for (int i = 0; i < objects.length; ++i) {
+objects[i] = value.getField(inputCols[i]);
+}
+Vector assembledVector = null;
+try {
+assembledVector = assemble(objects);
+} catch (Exception e) {
+switch (handleInvalid) {
+case HasHandleInvalid.ERROR_INVALID:
+throw e;
+case HasHandleInvalid.SKIP_INVALID:
+return;
+case HasHandleInvalid.KEEP_INVALID:
+out.collect(Row.join(value, Row.of(assembledVector)));
+return;
+default:

Review comment:
 

[jira] [Updated] (FLINK-26996) Break the reconcile after first create session cluster

2022-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-26996:
---
Labels: pull-request-available  (was: )

> Break the reconcile after first create session cluster
> --
>
> Key: FLINK-26996
> URL: https://issues.apache.org/jira/browse/FLINK-26996
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Reporter: Aitozi
>Priority: Major
>  Labels: pull-request-available
>
> When I test session cluster, I found that it will always start twice for the 
> session cluster. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-ml] lindong28 commented on a change in pull request #71: [FLINK-26443] Add benchmark framework

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r840990551



##
File path: 
flink-ml-benchmark/src/main/java/org/apache/flink/ml/benchmark/BenchmarkResult.java
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.benchmark;
+
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+/** The result of executing a benchmark. */
+public class BenchmarkResult {
+/** The name of the benchmark. */
+public String name;

Review comment:
   hmm... in the future when we add `latencyP99Ms`, we can just add a new 
constructor and update the existing constructors to provide a default value for 
`latencyP99Ms`. We want need to change all existing usages, right?
   
   Note that if we need to change existing usages, it should be because we want 
to provide a valid non-default value for `latencyP99Ms`. We will also need to 
do these changes if we use `Builder` in this case.
   
   The usages of multiple constructors is a common pattern across Flink to 
handle scenarios like this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-26996) Break the reconcile after first create session cluster

2022-04-01 Thread Aitozi (Jira)
Aitozi created FLINK-26996:
--

 Summary: Break the reconcile after first create session cluster
 Key: FLINK-26996
 URL: https://issues.apache.org/jira/browse/FLINK-26996
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Aitozi


When I test session cluster, I found that it will always start twice for the 
session cluster. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink-ml] lindong28 commented on a change in pull request #56: [FLINK-25616] Add Transformer for VectorAssembler

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #56:
URL: https://github.com/apache/flink-ml/pull/56#discussion_r840982214



##
File path: 
flink-ml-lib/src/test/java/org/apache/flink/ml/feature/VectorAssemblerTest.java
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature;
+
+import org.apache.flink.api.common.restartstrategy.RestartStrategies;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.ml.common.param.HasHandleInvalid;
+import org.apache.flink.ml.feature.vectorassembler.VectorAssembler;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.SparseVector;
+import org.apache.flink.ml.linalg.Vectors;
+import org.apache.flink.ml.util.StageTestUtils;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import 
org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.test.util.AbstractTestBase;
+import org.apache.flink.types.Row;
+
+import org.apache.commons.collections.IteratorUtils;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNull;
+
+/** Tests VectorAssembler. */
+public class VectorAssemblerTest extends AbstractTestBase {
+
+private StreamTableEnvironment tEnv;
+private Table inputDataTable;
+
+private static final List INPUT_DATA =
+Arrays.asList(
+Row.of(
+0,
+Vectors.dense(2.1, 3.1),
+1.0,
+Vectors.sparse(5, new int[] {3}, new double[] 
{1.0})),
+Row.of(
+1,
+Vectors.dense(2.1, 3.1),
+1.0,
+Vectors.sparse(
+5, new int[] {4, 2, 3, 1}, new double[] 
{4.0, 2.0, 3.0, 1.0})),
+Row.of(2, null, 1.0, null));
+
+private static final SparseVector EXPECTED_OUTPUT_DATA_1 =
+Vectors.sparse(8, new int[] {0, 1, 2, 6}, new double[] {2.1, 3.1, 
1.0, 1.0});
+private static final DenseVector EXPECTED_OUTPUT_DATA_2 =
+Vectors.dense(2.1, 3.1, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0);
+
+@Before
+public void before() {
+Configuration config = new Configuration();
+
config.set(ExecutionCheckpointingOptions.ENABLE_CHECKPOINTS_AFTER_TASKS_FINISH, 
true);
+StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment(config);
+env.setParallelism(4);
+env.enableCheckpointing(100);
+env.setRestartStrategy(RestartStrategies.noRestart());
+tEnv = StreamTableEnvironment.create(env);
+DataStream dataStream = env.fromCollection(INPUT_DATA);
+inputDataTable = tEnv.fromDataStream(dataStream).as("id", "vec", 
"num", "sparseVec");
+}
+
+private void verifyOutputResult(Table output, String outputCol, int 
outputSize)
+throws Exception {
+DataStream dataStream = tEnv.toDataStream(output);
+List results = 
IteratorUtils.toList(dataStream.executeAndCollect());
+assertEquals(outputSize, results.size());
+for (Row result : results) {
+if (result.getField(0) == (Object) 0) {
+assertEquals(EXPECTED_OUTPUT_DATA_1, 
result.getField(outputCol));
+} else if (result.getField(0) == (Object) 1) {
+assertEquals(EXPECTED_OUTPUT_DATA_2, 
result.getField(outputCol));
+} else {
+assertNull(result.getField(outputCol));
+}
+}
+}
+
+@Test
+public void testParam() {
+VectorAssembler vectorAssembler = new VectorAssembler();
+assertEquals(HasHandleInvalid.ERROR_INVALID, 

[GitHub] [flink] flinkbot edited a comment on pull request #19333: [FLINK-26190][python] Remove getTableConfig from ExecNodeConfiguration

2022-04-01 Thread GitBox


flinkbot edited a comment on pull request #19333:
URL: https://github.com/apache/flink/pull/19333#issuecomment-1086447112


   
   ## CI report:
   
   * 84ab13731ec4deaa375549d5d63afb2020ccd476 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34138)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #19333: [FLINK-26190][python] Remove getTableConfig from ExecNodeConfiguration

2022-04-01 Thread GitBox


flinkbot commented on pull request #19333:
URL: https://github.com/apache/flink/pull/19333#issuecomment-1086447112


   
   ## CI report:
   
   * 84ab13731ec4deaa375549d5d63afb2020ccd476 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-26190) Remove getTableConfig from ExecNodeConfiguration

2022-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-26190:
---
Labels: pull-request-available  (was: )

> Remove getTableConfig from ExecNodeConfiguration
> 
>
> Key: FLINK-26190
> URL: https://issues.apache.org/jira/browse/FLINK-26190
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Marios Trivyzas
>Assignee: Dian Fu
>Priority: Major
>  Labels: pull-request-available
>
> Currently, *ExecNodeConfig* holds *TableConfig* instead of *ReadableConfig* 
> for the configuration coming from the planner, because it's used by
> *CommonPythonUtil#getMergedConfig.* This should be fixed, so that 
> *CommonPythonUtil#getMergedConfig* cam use a *ReadableConfig* instead, and 
> then we can pass the *ExecNodeConfig* which holds the complete view of 
> {*}Planner{*}'s *TableConfig* + the {*}ExecNode{*}'s {*}persistedConfig{*}.
>  
> To achieve that the *getMergedConfig* methods of *PythonConfigUtil* must be 
> changed, and also the temp solution in 
> *PythonFunctionFactory#getPythonFunction* must be changed as well:
> {noformat}
> if (config instanceof TableConfig) {
> PythonDependencyUtils.merge(mergedConfig, ((TableConfig) 
> config).getConfiguration());
> } else {
> PythonDependencyUtils.merge(mergedConfig, (Configuration) config);
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] dianfu opened a new pull request #19333: [FLINK-26190][python] Remove getTableConfig from ExecNodeConfiguration

2022-04-01 Thread GitBox


dianfu opened a new pull request #19333:
URL: https://github.com/apache/flink/pull/19333


   
   ## What is the purpose of the change
   
   *This pull request tries to remove getTableConfig from 
ExecNodeConfiguration. It has refactored the PythonConfig and many other places 
as a result.*
   
   ## Verifying this change
   
   This change is a code refactor work and has already covered by existing 
tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-26731) Support max_by operation in KeyedStream

2022-04-01 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-26731.
---
Resolution: Duplicate

> Support max_by operation in KeyedStream
> ---
>
> Key: FLINK-26731
> URL: https://issues.apache.org/jira/browse/FLINK-26731
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: CaoYu
>Assignee: CaoYu
>Priority: Major
>
> Support max_by operation in KeyedStream



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26730) Support max operation in KeyedStream

2022-04-01 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-26730.
---
Resolution: Duplicate

> Support max operation in KeyedStream
> 
>
> Key: FLINK-26730
> URL: https://issues.apache.org/jira/browse/FLINK-26730
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: CaoYu
>Assignee: CaoYu
>Priority: Major
>
> Support max operation in KeyedStream.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26729) Support min_by operation in KeyedStream

2022-04-01 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-26729.
---
Resolution: Duplicate

> Support min_by operation in KeyedStream
> ---
>
> Key: FLINK-26729
> URL: https://issues.apache.org/jira/browse/FLINK-26729
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Affects Versions: 1.14.3
>Reporter: CaoYu
>Assignee: CaoYu
>Priority: Major
>
> Support min_by operation in KeyedStream.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26728) Support min max min_by max_by operation in KeyedStream

2022-04-01 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-26728.
---
Fix Version/s: 1.16.0
   Resolution: Fixed

Merged to master via 1e1a7417f7814b0abc977fd4d5e5295c27fe3aef

> Support min max min_by max_by operation in KeyedStream
> --
>
> Key: FLINK-26728
> URL: https://issues.apache.org/jira/browse/FLINK-26728
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Python
>Reporter: CaoYu
>Assignee: CaoYu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Support min max min_by max_by operation in KeyedStream
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [flink] dianfu closed pull request #19242: [FLINK-26728][python]Support min/max/min_by/max_by operation in KeyedStream

2022-04-01 Thread GitBox


dianfu closed pull request #19242:
URL: https://github.com/apache/flink/pull/19242


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-ml] lindong28 commented on a change in pull request #73: [FLINK-26626] Add Transformer and Estimator for StandardScaler

2022-04-01 Thread GitBox


lindong28 commented on a change in pull request #73:
URL: https://github.com/apache/flink-ml/pull/73#discussion_r840980952



##
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/standardscaler/StandardScaler.java
##
@@ -0,0 +1,288 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature.standardscaler;
+
+import org.apache.flink.api.common.state.ListState;
+import org.apache.flink.api.common.state.ListStateDescriptor;
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.api.java.typeutils.TupleTypeInfo;
+import org.apache.flink.iteration.operator.OperatorStateUtils;
+import org.apache.flink.ml.api.Estimator;
+import org.apache.flink.ml.linalg.BLAS;
+import org.apache.flink.ml.linalg.DenseVector;
+import org.apache.flink.ml.linalg.Vector;
+import org.apache.flink.ml.linalg.Vectors;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.util.ParamUtils;
+import org.apache.flink.ml.util.ReadWriteUtils;
+import org.apache.flink.runtime.state.StateInitializationContext;
+import org.apache.flink.runtime.state.StateSnapshotContext;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
+import org.apache.flink.streaming.api.operators.BoundedOneInput;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.internal.TableImpl;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.Preconditions;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * An Estimator which implements the standard scaling algorithm.
+ *
+ * Standardization is a common requirement for machine learning training 
because they may behave
+ * badly if the individual features of a input do not look like standard 
normally distributed data
+ * (e.g. Gaussian with 0 mean and unit variance).
+ *
+ * This estimator standardizes the input features by removing the mean and 
scaling each dimension
+ * to unit variance.
+ */
+public class StandardScaler
+implements Estimator,
+StandardScalerParams {
+private final Map, Object> paramMap = new HashMap<>();
+
+public StandardScaler() {
+ParamUtils.initializeMapWithDefaultValues(paramMap, this);
+}
+
+@Override
+public StandardScalerModel fit(Table... inputs) {
+Preconditions.checkArgument(inputs.length == 1);
+StreamTableEnvironment tEnv =
+(StreamTableEnvironment) ((TableImpl) 
inputs[0]).getTableEnvironment();
+DataStream> 
sumAndSquaredSumAndWeight =
+tEnv.toDataStream(inputs[0])
+.transform(
+"computeMeta",
+new TupleTypeInfo<>(
+TypeInformation.of(DenseVector.class),
+TypeInformation.of(DenseVector.class),
+BasicTypeInfo.LONG_TYPE_INFO),
+new ComputeMetaOperator(getFeaturesCol()));
+
+DataStream modelData =
+sumAndSquaredSumAndWeight
+.transform(
+"buildModel",
+
TypeInformation.of(StandardScalerModelData.class),
+new BuildModelOperator())
+.setParallelism(1);
+
+StandardScalerModel model =
+new 
StandardScalerModel().setModelData(tEnv.fromDataStream(modelData));
+ReadWriteUtils.updateExistingParams(model, paramMap);
+return model;
+}
+
+/**
+ * Builds the {@link 

  1   2   3   4   5   6   >