[GitHub] [flink-table-store] JingsongLi commented on pull request #569: [FLINK-31269] Split hive connector to each module of each version

2023-03-02 Thread via GitHub


JingsongLi commented on PR #569:
URL: https://github.com/apache/flink-table-store/pull/569#issuecomment-1453120013

   I'm mainly worried about whether Hive-catalog can work properly. Have you 
tested all Hive versions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #426: [FLINK-30323] Support table statistics in table store

2023-03-02 Thread via GitHub


JingsongLi commented on code in PR #426:
URL: https://github.com/apache/flink-table-store/pull/426#discussion_r1124125253


##
flink-table-store-flink/flink-table-store-flink-common/src/main/java/org/apache/flink/table/store/connector/source/TableStoreSource.java:
##
@@ -209,4 +233,19 @@ public LookupRuntimeProvider getLookupRuntimeProvider(LookupContext context) {
 return TableFunctionProvider.of(
 new FileStoreLookupFunction(table, projection, joinKey, predicate));
 }
+
+@Override
+public TableStats reportStatistics() {

Review Comment:
   Can we document this? This may not always be a positive path. Flink SQL provides `table.optimizer.source.report-statistics-enabled` to disable this.
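For reference, the switch mentioned in the comment can be flipped like any other table option. A minimal, hedged example of turning reporting off from the Flink SQL client (the option name is taken directly from the comment above; using the standard SQL client `SET` syntax is an assumption about where it would be configured):

```sql
-- Disable source statistics reporting for the optimizer
SET 'table.optimizer.source.report-statistics-enabled' = 'false';
```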



##
flink-table-store-flink/flink-table-store-flink-1.15/src/main/java/org/apache/flink/table/store/connector/source/TableStoreSource.java:
##
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.store.connector.source;
+
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.table.catalog.ObjectIdentifier;
+import org.apache.flink.table.connector.ChangelogMode;
+import org.apache.flink.table.connector.source.DynamicTableSource;
+import org.apache.flink.table.connector.source.LookupTableSource;
+import org.apache.flink.table.connector.source.TableFunctionProvider;
+import org.apache.flink.table.connector.source.abilities.SupportsWatermarkPushDown;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.factories.DynamicTableFactory;
+import org.apache.flink.table.store.CoreOptions.ChangelogProducer;
+import org.apache.flink.table.store.CoreOptions.LogChangelogMode;
+import org.apache.flink.table.store.CoreOptions.LogConsistency;
+import org.apache.flink.table.store.annotation.VisibleForTesting;
+import org.apache.flink.table.store.connector.FlinkConnectorOptions;
+import org.apache.flink.table.store.connector.TableStoreDataStreamScanProvider;
+import org.apache.flink.table.store.connector.lookup.FileStoreLookupFunction;
+import org.apache.flink.table.store.file.predicate.Predicate;
+import org.apache.flink.table.store.log.LogSourceProvider;
+import org.apache.flink.table.store.log.LogStoreTableFactory;
+import org.apache.flink.table.store.options.Options;
+import org.apache.flink.table.store.table.AppendOnlyFileStoreTable;
+import org.apache.flink.table.store.table.ChangelogValueCountFileStoreTable;
+import org.apache.flink.table.store.table.ChangelogWithKeyFileStoreTable;
+import org.apache.flink.table.store.table.FileStoreTable;
+import org.apache.flink.table.store.utils.Projection;
+
+import javax.annotation.Nullable;
+
+import java.util.stream.IntStream;
+
+import static org.apache.flink.table.store.CoreOptions.CHANGELOG_PRODUCER;
+import static org.apache.flink.table.store.CoreOptions.LOG_CHANGELOG_MODE;
+import static org.apache.flink.table.store.CoreOptions.LOG_CONSISTENCY;
+import static org.apache.flink.table.store.CoreOptions.LOG_SCAN_REMOVE_NORMALIZE;
+
+/**
+ * Table source to create {@link StaticFileStoreSource} or {@link ContinuousFileStoreSource} under
+ * batch mode or change-tracking is disabled. For streaming mode with change-tracking enabled and
+ * FULL scan mode, it will create a {@code HybridSource} of {@link StaticFileStoreSource} and kafka
+ * log source created by {@link LogSourceProvider}.
+ */
+public class TableStoreSource extends FlinkTableSource

Review Comment:
   Can we just introduce a `StatisticTableSource` in flink-common? This can avoid copying a large amount of code.






[GitHub] [flink] SinBex commented on a diff in pull request #21634: [FLINK-30480][runtime] Add benchmarks for adaptive batch scheduler.

2023-03-02 Thread via GitHub


SinBex commented on code in PR #21634:
URL: https://github.com/apache/flink/pull/21634#discussion_r1124119471


##
flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/benchmark/deploying/DeployingDownstreamTasksInBatchJobBenchmarkTest.java:
##
@@ -31,7 +31,7 @@
 public class DeployingDownstreamTasksInBatchJobBenchmarkTest extends TestLogger {

Review Comment:
   Agree with you, it's better to open a new commit to fix this, thanks.









[jira] [Closed] (FLINK-31302) Split spark modules according to version

2023-03-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-31302.

Fix Version/s: table-store-0.4.0
 Assignee: yuzelin
   Resolution: Fixed

master: 77fd8ecfd599c9d6c2b706bee4684558a8ba2f40

> Split spark modules according to version
> 
>
> Key: FLINK-31302
> URL: https://issues.apache.org/jira/browse/FLINK-31302
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Affects Versions: table-store-0.4.0
>Reporter: yuzelin
>Assignee: yuzelin
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31302) Split spark modules according to version

2023-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31302:
---
Labels: pull-request-available  (was: )

> Split spark modules according to version
> 
>
> Key: FLINK-31302
> URL: https://issues.apache.org/jira/browse/FLINK-31302
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Affects Versions: table-store-0.4.0
>Reporter: yuzelin
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [flink-table-store] JingsongLi merged pull request #570: [FLINK-31302] Split spark modules according to version

2023-03-02 Thread via GitHub


JingsongLi merged PR #570:
URL: https://github.com/apache/flink-table-store/pull/570





[GitHub] [flink] reswqa commented on pull request #22076: [BP-1.16][FLINK-31288][runtime] Disable overdraft buffer for non pipelined result partition.

2023-03-02 Thread via GitHub


reswqa commented on PR #22076:
URL: https://github.com/apache/flink/pull/22076#issuecomment-1453109144

   @flinkbot run azure





[GitHub] [flink] SinBex commented on a diff in pull request #21634: [FLINK-30480][runtime] Add benchmarks for adaptive batch scheduler.

2023-03-02 Thread via GitHub


SinBex commented on code in PR #21634:
URL: https://github.com/apache/flink/pull/21634#discussion_r1124116656


##
flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptivebatch/AdaptiveBatchSchedulerTest.java:
##
@@ -73,7 +74,7 @@
 import static org.assertj.core.api.Assertions.assertThat;
 
 /** Test for {@link AdaptiveBatchScheduler}. */
-class AdaptiveBatchSchedulerTest {
+public class AdaptiveBatchSchedulerTest {

Review Comment:
   Fixed this, thanks






[GitHub] [flink] SinBex commented on a diff in pull request #21634: [FLINK-30480][runtime] Add benchmarks for adaptive batch scheduler.

2023-03-02 Thread via GitHub


SinBex commented on code in PR #21634:
URL: https://github.com/apache/flink/pull/21634#discussion_r1124116265


##
flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/adaptivebatch/AdaptiveBatchSchedulerTest.java:
##
@@ -400,4 +401,10 @@ private SchedulerBase createScheduler(
 .setDefaultMaxParallelism(defaultMaxParallelism)
 .buildAdaptiveBatchJobScheduler();
 }
+
+public static void initializeVerticesIfPossible(DefaultScheduler scheduler) {
+if (scheduler instanceof AdaptiveBatchScheduler) {
+((AdaptiveBatchScheduler) scheduler).initializeVerticesIfPossible();
+}
+}

Review Comment:
   Yes, it's better. Fixed this









[GitHub] [flink] gaoyunhaii commented on pull request #21736: [FLINK-27518][tests] Refactor migration tests to support version update automatically

2023-03-02 Thread via GitHub


gaoyunhaii commented on PR #21736:
URL: https://github.com/apache/flink/pull/21736#issuecomment-1453107730

   > "Especially, the test data generation should be covered in a separate 
commit."
   
   Hi @XComp, I have some concerns about this point: for the formal process, if we 
want to split out the generated files, it will still require a lot of manual 
operations, since there are no explicit mappings from the generated files to the 
migration classes. 
   
   Do we think it is necessary to split the commits? It now looks to me that 
these are auto-managed generated binary files, and developers seem to have no 
need to check their history (in fact, there will only be one log entry showing 
that the files were added). 
   
   If we still think it is necessary to split the commits, I think we might 
need to extend this functionality to support some kind of post-processing 
action that could execute external scripts to commit the files for each 
migrated test class. 
   
   





[jira] [Commented] (FLINK-31092) Hive ITCases fail with OutOfMemoryError

2023-03-02 Thread Shengkai Fang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696063#comment-17696063
 ] 

Shengkai Fang commented on FLINK-31092:
---

Merged into release-1.17: 
594010624f8084efd99d6d405b5310ab24013aeb
86e12eb3fcec54d234154e59f8cb0557fc616494

> Hive ITCases fail with OutOfMemoryError
> ---
>
> Key: FLINK-31092
> URL: https://issues.apache.org/jira/browse/FLINK-31092
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Assignee: Shengkai Fang
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Attachments: 
> -__w-2-s-flink-connectors-flink-connector-hive-target-surefire-reports-2023-02-15T05-01-18_982-jvmRun4.dump,
>  VisualVM-FLINK-31092.png
>
>
> We're experiencing an OutOfMemoryError where the heap space reaches the upper 
> limit:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46161=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=23142
> {code}
> Feb 15 05:05:14 [INFO] Running 
> org.apache.flink.table.catalog.hive.HiveCatalogITCase
> Feb 15 05:05:17 [INFO] java.lang.OutOfMemoryError: Java heap space
> Feb 15 05:05:17 [INFO] Dumping heap to java_pid9669.hprof ...
> Feb 15 05:05:28 [INFO] Heap dump file created [1957090051 bytes in 11.718 
> secs]
> java.lang.OutOfMemoryError: Java heap space
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.cancelPingScheduler(ForkedBooter.java:209)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.acknowledgedExit(ForkedBooter.java:419)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:186)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
> {code}





[GitHub] [flink] fsk119 closed pull request #22072: [FLINK-31092][sql-gateway][table-common] Fix Hive ITCase fail with OutOfMemoryError

2023-03-02 Thread via GitHub


fsk119 closed pull request #22072: [FLINK-31092][sql-gateway][table-common] Fix 
Hive ITCase fail with OutOfMemoryError
URL: https://github.com/apache/flink/pull/22072





[jira] [Comment Edited] (FLINK-30238) Unified Sink committer does not clean up state on final savepoint

2023-03-02 Thread Neha Aggarwal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696060#comment-17696060
 ] 

Neha Aggarwal edited comment on FLINK-30238 at 3/3/23 7:36 AM:
---

This issue is not limited to stop-with-savepoint; it is the same for a triggered 
savepoint as well.


was (Author: JIRAUSER299078):
This issue is not limited to stop-with-savepoint, It is same for trigger 
savepoint as well.

> Unified Sink committer does not clean up state on final savepoint
> -
>
> Key: FLINK-30238
> URL: https://issues.apache.org/jira/browse/FLINK-30238
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.17.0, 1.15.3, 1.16.1
>Reporter: Fabian Paul
>Priority: Critical
>
> During stop-with-savepoint the committer only commits the pending 
> committables on notifyCheckpointComplete.
> This has several downsides.
>  * Last committableSummary has checkpoint id LONG.MAX and is never cleared 
> from the state leading to that stop-with-savepoint does not work when the 
> pipeline recovers from a savepoint 
>  * While the committables are committed during stop-with-savepoint they are 
> not forwarded to post-commit topology, potentially losing data and preventing 
> to close open transactions.





[jira] [Commented] (FLINK-30238) Unified Sink committer does not clean up state on final savepoint

2023-03-02 Thread Neha Aggarwal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696060#comment-17696060
 ] 

Neha Aggarwal commented on FLINK-30238:
---

This issue is not limited to stop-with-savepoint; it is the same for a triggered 
savepoint as well.

> Unified Sink committer does not clean up state on final savepoint
> -
>
> Key: FLINK-30238
> URL: https://issues.apache.org/jira/browse/FLINK-30238
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.17.0, 1.15.3, 1.16.1
>Reporter: Fabian Paul
>Priority: Critical
>
> During stop-with-savepoint the committer only commits the pending 
> committables on notifyCheckpointComplete.
> This has several downsides.
>  * Last committableSummary has checkpoint id LONG.MAX and is never cleared 
> from the state leading to that stop-with-savepoint does not work when the 
> pipeline recovers from a savepoint 
>  * While the committables are committed during stop-with-savepoint they are 
> not forwarded to post-commit topology, potentially losing data and preventing 
> to close open transactions.





[GitHub] [flink-table-store] xuzhiwen1255 commented on a diff in pull request #577: [FLINK-31287]Modify the default value of changelog-producer.compactio…

2023-03-02 Thread via GitHub


xuzhiwen1255 commented on code in PR #577:
URL: https://github.com/apache/flink-table-store/pull/577#discussion_r1124097472


##
flink-table-store-flink/flink-table-store-flink-common/src/main/java/org/apache/flink/table/store/connector/FlinkConnectorOptions.java:
##
@@ -102,7 +102,7 @@ public class FlinkConnectorOptions {
 public static final ConfigOption<Duration> CHANGELOG_PRODUCER_FULL_COMPACTION_TRIGGER_INTERVAL =
 key("changelog-producer.compaction-interval")
 .durationType()
-.defaultValue(Duration.ofMinutes(30))
+.defaultValue(Duration.ofSeconds(0))

Review Comment:
   Yes, change the default so that each checkpoint triggers a full compaction.






[jira] [Commented] (FLINK-31310) Force clear directory no matter what situation in HiveCatalog.dropTable

2023-03-02 Thread Nicholas Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696048#comment-17696048
 ] 

Nicholas Jiang commented on FLINK-31310:


[~lzljs3620320], could you assign this to me?

> Force clear directory no matter what situation in HiveCatalog.dropTable
> ---
>
> Key: FLINK-31310
> URL: https://issues.apache.org/jira/browse/FLINK-31310
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> Currently, if the table does not exist in Hive, the table directory is not cleared.
> We should clear the table directory in any situation.





[GitHub] [flink-table-store] SteNicholas commented on a diff in pull request #577: [FLINK-31287]Modify the default value of changelog-producer.compactio…

2023-03-02 Thread via GitHub


SteNicholas commented on code in PR #577:
URL: https://github.com/apache/flink-table-store/pull/577#discussion_r1124094970


##
flink-table-store-flink/flink-table-store-flink-common/src/main/java/org/apache/flink/table/store/connector/FlinkConnectorOptions.java:
##
@@ -102,7 +102,7 @@ public class FlinkConnectorOptions {
 public static final ConfigOption<Duration> CHANGELOG_PRODUCER_FULL_COMPACTION_TRIGGER_INTERVAL =
 key("changelog-producer.compaction-interval")
 .durationType()
-.defaultValue(Duration.ofMinutes(30))
+.defaultValue(Duration.ofSeconds(0))

Review Comment:
   Could `changelog-producer.compaction-interval` have no default value? If the 
value of `changelog-producer.compaction-interval` is null or 0, the default 
behavior would be that each checkpoint performs a full compaction and generates 
a changelog.
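A minimal sketch of the semantics proposed in the comment above (this is a hypothetical helper written for illustration, not the actual option-handling code): a null or zero interval would mean "full compaction, and a changelog, at every checkpoint".

```java
import java.time.Duration;

// Hypothetical sketch of the proposed default behavior: a null or zero
// changelog-producer.compaction-interval means "trigger a full compaction
// (and emit a changelog) at every checkpoint".
class CompactionIntervalSemantics {
    static boolean compactEveryCheckpoint(Duration interval) {
        return interval == null || interval.isZero();
    }
}
```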






[jira] [Commented] (FLINK-25217) FLIP-190: Support Version Upgrades for Table API & SQL Programs

2023-03-02 Thread ConradJam (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696047#comment-17696047
 ] 

ConradJam commented on FLINK-25217:
---

I can help with Parts 1 and 2, and I'm slowly learning about the FLIP right 
now. Is there any more information you can provide?

> FLIP-190: Support Version Upgrades for Table API & SQL Programs
> ---
>
> Key: FLINK-25217
> URL: https://issues.apache.org/jira/browse/FLINK-25217
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / API, Table SQL / Planner
>Reporter: Timo Walther
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> Nowadays, the Table & SQL API is as important to Flink as the DataStream API. 
> It is one of the main abstractions for expressing pipelines that perform 
> stateful stream processing. Users expect the same backwards compatibility 
> guarantees when upgrading to a newer Flink version as with the DataStream API.
> In particular, this means:
> * once the operator topology is defined, it remains static and does not 
> change between Flink versions, unless resulting in better performance,
> * business logic (defined using expressions and functions in queries) behaves 
> identical as before the version upgrade,
> * the state of a Table & SQL API program can be restored from a savepoint of 
> a previous version,
> * adding or removing stateful operators should be made possible in the 
> DataStream API.
> The same query can remain up and running after upgrades.
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489





[GitHub] [flink-table-store] liming30 opened a new pull request, #578: [FLINK-31291] Close the Committer when the CommitterOperator closes.

2023-03-02 Thread via GitHub


liming30 opened a new pull request, #578:
URL: https://github.com/apache/flink-table-store/pull/578

   Close the Committer when the CommitterOperator closes.





[jira] [Closed] (FLINK-31252) Improve StaticFileStoreSplitEnumerator to assign batch splits

2023-03-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-31252.

Resolution: Fixed

master: 4296d7c1cca7ff8fb5525401b1ef1659aae5879a

> Improve StaticFileStoreSplitEnumerator to assign batch splits
> -
>
> Key: FLINK-31252
> URL: https://issues.apache.org/jira/browse/FLINK-31252
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> The batch assignment operation serves two purposes:
> 1. Splits can be evenly distributed during batch reading, avoiding scheduling 
> problems (for example, when current resources can only schedule part of the 
> tasks) that cause some tasks to fail to read data.
> 2. For reads with a limit: if splits are assigned one by one, the task may 
> repeatedly create SplitFetchers. After a SplitFetcher is created, it is found 
> to be idle and is closed; then, when a new split arrives, a new SplitFetcher 
> is created and the limited rows are read again (the limit state lives in the 
> SplitFetcher).
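To illustrate point 1 above, here is a hedged, self-contained sketch (not the actual `StaticFileStoreSplitEnumerator` code; names are illustrative) of assigning all splits up front, spread evenly across subtasks, instead of handing them out one at a time:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: distribute every split round-robin across the
// reader subtasks in one batch, so no subtask is starved and no SplitFetcher
// has to be repeatedly created and torn down for late-arriving splits.
class BatchSplitAssignment {
    static Map<Integer, List<String>> assign(List<String> splits, int parallelism) {
        Map<Integer, List<String>> batches = new HashMap<>();
        for (int i = 0; i < splits.size(); i++) {
            batches.computeIfAbsent(i % parallelism, k -> new ArrayList<>())
                    .add(splits.get(i));
        }
        return batches;
    }
}
```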





[GitHub] [flink-table-store] JingsongLi merged pull request #563: [FLINK-31252] Improve StaticFileStoreSplitEnumerator to assign batch splits

2023-03-02 Thread via GitHub


JingsongLi merged PR #563:
URL: https://github.com/apache/flink-table-store/pull/563





[jira] [Commented] (FLINK-25986) Add FLIP-190 new API methods to python

2023-03-02 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696044#comment-17696044
 ] 

Dian Fu commented on FLINK-25986:
-

Thanks [~qingyue] for taking this issue and also thanks [~nicholasjiang] for 
the confirmation~

> Add FLIP-190 new API methods to python
> --
>
> Key: FLINK-25986
> URL: https://issues.apache.org/jira/browse/FLINK-25986
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Table SQL / API
>Reporter: Francesco Guardiani
>Assignee: Jane Chan
>Priority: Major
>






[jira] [Assigned] (FLINK-25986) Add FLIP-190 new API methods to python

2023-03-02 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-25986:
---

Assignee: Jane Chan  (was: Nicholas Jiang)

> Add FLIP-190 new API methods to python
> --
>
> Key: FLINK-25986
> URL: https://issues.apache.org/jira/browse/FLINK-25986
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Table SQL / API
>Reporter: Francesco Guardiani
>Assignee: Jane Chan
>Priority: Major
>






[GitHub] [flink] pgaref commented on a diff in pull request #21923: FLINK-13871: Consolidate volatile status fields in StreamTask

2023-03-02 Thread via GitHub


pgaref commented on code in PR #21923:
URL: https://github.com/apache/flink/pull/21923#discussion_r1124085401


##
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java:
##
@@ -255,23 +258,52 @@
 
 private final StreamTaskAsyncExceptionHandler asyncExceptionHandler;
 
-/**
- * Flag to mark the task "in operation", in which case check needs to be initialized to true, so
- * that early cancel() before invoke() behaves correctly.
- */
-private volatile boolean isRunning;
-
-/** Flag to mark the task at restoring duration in {@link #restore()}. */
-private volatile boolean isRestoring;
-
-/** Flag to mark this task as canceled. */
-private volatile boolean canceled;
+/** Possible states of a Task. */
+private static class TaskState {

Review Comment:
   Ack. moved to the bottom of the class
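As a hedged illustration of the refactoring direction shown in this diff (a simplified sketch, not Flink's actual `StreamTask` code): the separate volatile booleans collapse into one volatile enum-valued field, so every state transition is a single atomic write and readers always observe a consistent combination of "restoring"/"running"/"canceled".

```java
// Simplified sketch of consolidating several volatile flags into one volatile
// status field. Names are illustrative; the real StreamTask.TaskState differs.
class TaskStatusSketch {
    enum Status { CREATED, RESTORING, RUNNING, CANCELED, FAILED, FINISHED }

    // Replaces `volatile boolean isRunning`, `isRestoring`, and `canceled`.
    private volatile Status status = Status.CREATED;

    void restoring() { status = Status.RESTORING; }
    void running()   { status = Status.RUNNING; }
    void cancel()    { status = Status.CANCELED; }

    boolean isRunning()   { return status == Status.RUNNING; }
    boolean isRestoring() { return status == Status.RESTORING; }
    boolean isCanceled()  { return status == Status.CANCELED; }
}
```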






[GitHub] [flink] fsk119 commented on pull request #22072: [FLINK-31092][sql-gateway][table-common] Fix Hive ITCase fail with OutOfMemoryError

2023-03-02 Thread via GitHub


fsk119 commented on PR #22072:
URL: https://github.com/apache/flink/pull/22072#issuecomment-1453063763

   All tests pass in my pipeline: 
https://dev.azure.com/1059623455/Flink/_build/results?buildId=565=results





[GitHub] [flink] pgaref commented on a diff in pull request #21923: FLINK-13871: Consolidate volatile status fields in StreamTask

2023-03-02 Thread via GitHub


pgaref commented on code in PR #21923:
URL: https://github.com/apache/flink/pull/21923#discussion_r1124078380


##
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java:
##
@@ -733,8 +764,7 @@ void restoreInternal() throws Exception {
 // needed
 channelIOExecutor.shutdown();
 
-isRunning = true;
-isRestoring = false;
+taskState.status = TaskState.Status.RUNNING;

Review Comment:
   Hey Roman, very good point, since TaskCanceler is asynchronous in nature.
   However, the only way to invoke [Task 
cancelation](https://github.com/apache/flink/blob/8be94e6663d8ac6e3d74bf4cd5f540cc96c8289e/flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java#L1662)
 is through the [invokable 
method](https://github.com/apache/flink/pull/21923/files#diff-0c5fe245445b932fa83fdaf5c4802dbe671b73e8d76f39058c6eaaaffd9639faL982-L983)
 that simultaneously marks the task as **not** Running and as Canceled. So I 
don't think there is (or should be) currently a way for the Task to be both 
cancelled and running.
   
   In that sense, it's not the same situation as failing, and we should be safe 
here.
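
   For illustration, a minimal, self-contained sketch of the consolidation being discussed (not Flink's actual StreamTask code; class and method names here are invented): the separate volatile flags `isRunning`, `isRestoring`, and `canceled` are folded into a single status field, so a task can never be observed as both running and cancelled.

   ```java
   import java.util.concurrent.atomic.AtomicReference;

   final class TaskStateSketch {
       enum Status { CREATED, RESTORING, RUNNING, CANCELED, FAILED, FINISHED }

       // One atomic status replaces three independently updated volatile booleans,
       // making contradictory combinations (running AND canceled) unrepresentable.
       private final AtomicReference<Status> status = new AtomicReference<>(Status.CREATED);

       void restore() { status.set(Status.RESTORING); }
       void run()     { status.set(Status.RUNNING); }   // was: isRunning = true; isRestoring = false;
       void cancel()  { status.set(Status.CANCELED); }  // was: isRunning = false; canceled = true;

       boolean isRestoring() { return status.get() == Status.RESTORING; }
       boolean isRunning()   { return status.get() == Status.RUNNING; }
       boolean isCanceled()  { return status.get() == Status.CANCELED; }
   }
   ```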








[GitHub] [flink] pgaref commented on a diff in pull request #21923: FLINK-13871: Consolidate volatile status fields in StreamTask

2023-03-02 Thread via GitHub


pgaref commented on code in PR #21923:
URL: https://github.com/apache/flink/pull/21923#discussion_r1124081750


##
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java:
##
@@ -1019,16 +1049,24 @@ public CanEmitBatchOfRecordsChecker 
getCanEmitBatchOfRecords() {
 return () -> !this.mailboxProcessor.hasMail() && taskIsAvailable();
 }
 
+public final boolean isInitialized() {

Review Comment:
   Ack, moved to TaskState class







[GitHub] [flink-table-store] SteNicholas commented on a diff in pull request #577: [FLINK-31287]Modify the default value of changelog-producer.compactio…

2023-03-02 Thread via GitHub


SteNicholas commented on code in PR #577:
URL: https://github.com/apache/flink-table-store/pull/577#discussion_r1124073557


##
docs/content/docs/concepts/primary-key-table.md:
##
@@ -150,7 +150,7 @@ If your input can’t produce a complete changelog but you 
still want to get rid
 
 By specifying `'changelog-producer' = 'full-compaction'`, Table Store will 
compare the results between full compactions and produce the differences as 
changelog. The latency of changelog is affected by the frequency of full 
compactions.
 
-By specifying `changelog-producer.compaction-interval` table property (default 
value `30min`), users can define the maximum interval between two full 
compactions to ensure latency. This table property does not affect normal 
compactions and they may still be performed once in a while by writers to 
reduce reader costs.
+By specifying `changelog-producer.compaction-interval` table property (default 
value `0s`), users can define the maximum interval between two full compactions 
to ensure latency. This table property does not affect normal compactions and 
they may still be performed once in a while by writers to reduce reader costs.

Review Comment:
   Could this describe the default behavior for the default value of 
`changelog-producer.compaction-interval`, i.e. that each checkpoint triggers a 
full compaction and generates a changelog? This could be added either here or 
to the description of the `changelog-producer.compaction-interval` option.






[GitHub] [flink-table-store] xuzhiwen1255 commented on a diff in pull request #577: [FLINK-31287]Modify the default value of changelog-producer.compactio…

2023-03-02 Thread via GitHub


xuzhiwen1255 commented on code in PR #577:
URL: https://github.com/apache/flink-table-store/pull/577#discussion_r1124074482


##
docs/content/docs/concepts/primary-key-table.md:
##
@@ -150,7 +150,7 @@ If your input can’t produce a complete changelog but you 
still want to get rid
 
 By specifying `'changelog-producer' = 'full-compaction'`, Table Store will 
compare the results between full compactions and produce the differences as 
changelog. The latency of changelog is affected by the frequency of full 
compactions.
 
-By specifying `changelog-producer.compaction-interval` table property (default 
value `30min`), users can define the maximum interval between two full 
compactions to ensure latency. This table property does not affect normal 
compactions and they may still be performed once in a while by writers to 
reduce reader costs.
+By specifying `changelog-producer.compaction-interval` table property (default 
value `0s`), users can define the maximum interval between two full compactions 
to ensure latency. This table property does not affect normal compactions and 
they may still be performed once in a while by writers to reduce reader costs.

Review Comment:
   @SteNicholas Thanks for the reminder, I was sloppy.







[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #575: [FLINK-31309] Delete DFS schema when create table failed in HiveCatalog

2023-03-02 Thread via GitHub


JingsongLi commented on code in PR #575:
URL: https://github.com/apache/flink-table-store/pull/575#discussion_r1124069564


##
flink-table-store-hive/flink-table-store-hive-catalog/src/main/java/org/apache/flink/table/store/hive/HiveCatalog.java:
##
@@ -241,6 +244,12 @@ public void createTable(Identifier identifier, Schema 
schema, boolean ignoreIfEx
 try {
 client.createTable(table);
 } catch (TException e) {
+Path path = getDataTableLocation(identifier);
+try {
+fileIO.delete(path, true);

Review Comment:
   Maybe introduce a method in `FileIO`, `deleteDirectoryQuietly`?
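
   A `deleteDirectoryQuietly` helper could look roughly like this. This is only a sketch against `java.nio.file` rather than Table Store's actual `FileIO` interface; the method name comes from the comment above, and the boolean return convention is an assumption:

   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.Comparator;
   import java.util.stream.Stream;

   final class FileIOSketch {

       /**
        * Recursively deletes a directory, swallowing I/O errors instead of
        * propagating them (useful for best-effort cleanup after a failed
        * createTable). Returns true if the directory is gone afterwards.
        */
       static boolean deleteDirectoryQuietly(Path dir) {
           try (Stream<Path> walk = Files.walk(dir)) {
               // Reverse order so children are deleted before their parents.
               walk.sorted(Comparator.reverseOrder())
                   .forEach(p -> {
                       try {
                           Files.deleteIfExists(p);
                       } catch (IOException ignored) {
                           // quiet: best-effort cleanup only
                       }
                   });
           } catch (IOException | UncheckedIOException ignored) {
               // directory may not exist, or listing failed; fall through
           }
           return !Files.exists(dir);
       }
   }
   ```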



##
flink-table-store-hive/flink-table-store-hive-catalog/src/test/java/org/apache/flink/table/store/hive/HiveCatalogITCase.java:
##
@@ -527,6 +530,38 @@ public void testQuickPathInShowTables() throws Exception {
 Assert.assertEquals("[]", tables.toString());
 }
 
+@Test
+public void testCreateExistTableInHive() throws Exception {
+tEnv.executeSql(
+String.join(
+"\n",
+"CREATE CATALOG my_hive_custom_client WITH (",
+"  'type' = 'table-store',",
+"  'metastore' = 'hive',",
+"  'uri' = '',",
+"  'warehouse' = '" + path + "',",
+"  'metastore.client.class' = '"
++ TestHiveMetaStoreClient.class.getName()

Review Comment:
   Can you create a new class `FailMetaStoreClient`?






[jira] [Updated] (FLINK-31287) Default value of 'changelog-producer.compaction-interval' can be zero

2023-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31287:
---
Labels: pull-request-available  (was: )

> Default value of 'changelog-producer.compaction-interval' can be zero
> -
>
> Key: FLINK-31287
> URL: https://issues.apache.org/jira/browse/FLINK-31287
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: xzw0223
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> At present, the 30-minute interval is too conservative. We can set it to 0 by 
> default, so that each checkpoint will have a full-compaction and generate a 
> changelog.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [flink-table-store] xuzhiwen1255 opened a new pull request, #577: [FLINK-31287]Modify the default value of changelog-producer.compactio…

2023-03-02 Thread via GitHub


xuzhiwen1255 opened a new pull request, #577:
URL: https://github.com/apache/flink-table-store/pull/577

   …n-interval to 0s





[jira] [Assigned] (FLINK-31144) Slow scheduling on large-scale batch jobs

2023-03-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-31144:
---

Assignee: Junrui Li

> Slow scheduling on large-scale batch jobs 
> --
>
> Key: FLINK-31144
> URL: https://issues.apache.org/jira/browse/FLINK-31144
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Julien Tournay
>Assignee: Junrui Li
>Priority: Major
> Attachments: flink-1.17-snapshot-1676473798013.nps, 
> image-2023-02-21-10-29-49-388.png
>
>
> When executing a complex job graph at high parallelism 
> `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` can 
> get slow and cause long pauses where the JobManager becomes unresponsive and 
> all the taskmanagers just wait. I've attached a VisualVM snapshot to 
> illustrate the problem.[^flink-1.17-snapshot-1676473798013.nps]
> At Spotify we have complex jobs where this issue can cause batch "pauses" of 
> 40+ minutes and make the overall execution 30% slower or more.
> More importantly, this prevents us from running said jobs on larger clusters, 
> as adding resources to the cluster worsens the issue.
> We have successfully tested a modified Flink version where 
> `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` was 
> completely commented out and simply returns an empty collection, and confirmed 
> it solves the issue.
> In the same spirit as a recent change 
> ([https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L98-L102)]
>  there could be a mechanism in place to detect when Flink runs into this 
> specific issue and just skip the call to `getInputLocationFutures`  
> [https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L105-L108.]
> I'm not familiar enough with the internals of Flink to propose a more 
> advanced fix; however, it seems like a configurable threshold on the number of 
> consumer vertices above which the preferred location is not computed would 
> do. If this solution is good enough, I'd be happy to submit a PR.
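
The configurable threshold proposed above can be illustrated with a toy sketch. The constant name, its value, and the string-based signature are all invented for illustration; the real retriever works on Flink's ExecutionVertexID and CompletableFuture types:

{code:java}
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class PreferredLocationsSketch {
    // Assumed config knob; Flink's actual retriever has no such constant.
    static final int MAX_CONSUMERS_FOR_LOCATION_PREFERENCE = 8;

    static Collection<String> preferredLocationsBasedOnInputs(
            List<String> producerLocations, int numConsumersOfInput) {
        if (numConsumersOfInput > MAX_CONSUMERS_FOR_LOCATION_PREFERENCE) {
            // Large fan-out: computing preferences is O(producers x consumers)
            // and yields little benefit, so return "no preference" and let the
            // scheduler place the task anywhere.
            return Collections.emptyList();
        }
        return producerLocations;
    }
}
{code}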





[jira] [Commented] (FLINK-31144) Slow scheduling on large-scale batch jobs

2023-03-02 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696033#comment-17696033
 ] 

Zhu Zhu commented on FLINK-31144:
-

Thanks for volunteering! [~JunRuiLi]
I have assigned you the ticket.

> Slow scheduling on large-scale batch jobs 
> --
>
> Key: FLINK-31144
> URL: https://issues.apache.org/jira/browse/FLINK-31144
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Julien Tournay
>Assignee: Junrui Li
>Priority: Major
> Attachments: flink-1.17-snapshot-1676473798013.nps, 
> image-2023-02-21-10-29-49-388.png
>
>
> When executing a complex job graph at high parallelism 
> `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` can 
> get slow and cause long pauses where the JobManager becomes unresponsive and 
> all the taskmanagers just wait. I've attached a VisualVM snapshot to 
> illustrate the problem.[^flink-1.17-snapshot-1676473798013.nps]
> At Spotify we have complex jobs where this issue can cause batch "pauses" of 
> 40+ minutes and make the overall execution 30% slower or more.
> More importantly, this prevents us from running said jobs on larger clusters, 
> as adding resources to the cluster worsens the issue.
> We have successfully tested a modified Flink version where 
> `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` was 
> completely commented out and simply returns an empty collection, and confirmed 
> it solves the issue.
> In the same spirit as a recent change 
> ([https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L98-L102)]
>  there could be a mechanism in place to detect when Flink runs into this 
> specific issue and just skip the call to `getInputLocationFutures`  
> [https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L105-L108.]
> I'm not familiar enough with the internals of Flink to propose a more 
> advanced fix; however, it seems like a configurable threshold on the number of 
> consumer vertices above which the preferred location is not computed would 
> do. If this solution is good enough, I'd be happy to submit a PR.





[jira] [Commented] (FLINK-31144) Slow scheduling on large-scale batch jobs

2023-03-02 Thread Junrui Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696030#comment-17696030
 ] 

Junrui Li commented on FLINK-31144:
---

Hi, [~zhuzh]. I'd like to fix this issue, could you help to assign this ticket 
to me? 

> Slow scheduling on large-scale batch jobs 
> --
>
> Key: FLINK-31144
> URL: https://issues.apache.org/jira/browse/FLINK-31144
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Julien Tournay
>Priority: Major
> Attachments: flink-1.17-snapshot-1676473798013.nps, 
> image-2023-02-21-10-29-49-388.png
>
>
> When executing a complex job graph at high parallelism 
> `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` can 
> get slow and cause long pauses where the JobManager becomes unresponsive and 
> all the taskmanagers just wait. I've attached a VisualVM snapshot to 
> illustrate the problem.[^flink-1.17-snapshot-1676473798013.nps]
> At Spotify we have complex jobs where this issue can cause batch "pauses" of 
> 40+ minutes and make the overall execution 30% slower or more.
> More importantly, this prevents us from running said jobs on larger clusters, 
> as adding resources to the cluster worsens the issue.
> We have successfully tested a modified Flink version where 
> `DefaultPreferredLocationsRetriever.getPreferredLocationsBasedOnInputs` was 
> completely commented out and simply returns an empty collection, and confirmed 
> it solves the issue.
> In the same spirit as a recent change 
> ([https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L98-L102)]
>  there could be a mechanism in place to detect when Flink runs into this 
> specific issue and just skip the call to `getInputLocationFutures`  
> [https://github.com/apache/flink/blob/43f419d0eccba86ecc8040fa6f521148f1e358ff/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultPreferredLocationsRetriever.java#L105-L108.]
> I'm not familiar enough with the internals of Flink to propose a more 
> advanced fix; however, it seems like a configurable threshold on the number of 
> consumer vertices above which the preferred location is not computed would 
> do. If this solution is good enough, I'd be happy to submit a PR.





[jira] [Commented] (FLINK-25986) Add FLIP-190 new API methods to python

2023-03-02 Thread Nicholas Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696029#comment-17696029
 ] 

Nicholas Jiang commented on FLINK-25986:


[~dianfu], help to assign to [~qingyue].

> Add FLIP-190 new API methods to python
> --
>
> Key: FLINK-25986
> URL: https://issues.apache.org/jira/browse/FLINK-25986
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Table SQL / API
>Reporter: Francesco Guardiani
>Assignee: Nicholas Jiang
>Priority: Major
>






[jira] [Closed] (FLINK-31087) Introduce MergeIntoAction.

2023-03-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-31087.

Fix Version/s: table-store-0.4.0
 Assignee: yuzelin
   Resolution: Fixed

master: 7080cfae9c9cce9db54b75e607a3d43cefc19ca7

> Introduce MergeIntoAction.
> --
>
> Key: FLINK-31087
> URL: https://issues.apache.org/jira/browse/FLINK-31087
> Project: Flink
>  Issue Type: New Feature
>  Components: Table Store
>Affects Versions: table-store-0.4.0
>Reporter: yuzelin
>Assignee: yuzelin
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> This action simulates the 'MERGE INTO' syntax.





[GitHub] [flink-table-store] JingsongLi merged pull request #540: [FLINK-31087] Introduce merge-into action

2023-03-02 Thread via GitHub


JingsongLi merged PR #540:
URL: https://github.com/apache/flink-table-store/pull/540





[jira] [Updated] (FLINK-31312) EnableObjectReuse cause different behaviors

2023-03-02 Thread Jiang Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiang Xin updated FLINK-31312:
--
Description: 
I have the following test code, which fails with the exception `Accessing a 
field by name is not supported in position-based field mode`; however, if I 
remove the `enableObjectReuse`, it works.
{code:java}
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);

    // The test fails with enableObjectReuse
    env.getConfig().enableObjectReuse();

    final SourceFunction<Row> rowGenerator =
            new SourceFunction<Row>() {
                @Override
                public final void run(SourceContext<Row> ctx) throws Exception {
                    Row row = new Row(1);
                    row.setField(0, "a");
                    ctx.collect(row);
                }

                @Override
                public void cancel() {}
            };

    final RowTypeInfo typeInfo =
            new RowTypeInfo(new TypeInformation[] {Types.STRING}, new String[] {"col1"});

    DataStream<Row> dataStream = env.addSource(rowGenerator, typeInfo);

    DataStream<Row> transformedDataStream =
            dataStream.map(
                    (MapFunction<Row, Row>) value -> Row.of(value.getField("col1")), typeInfo);

    transformedDataStream.addSink(new PrintSinkFunction<>());
    env.execute("Mini Test");
}
{code}
The `SourceFunction` generates rows without field names, but the return type 
info is assigned by `env.addSource(rowGenerator, typeInfo)`.

With object reuse enabled, rows are passed to the MapFunction directly, so 
the exception is raised. If object reuse is disabled, rows are reconstructed 
and given field names when passed to the next operator, and the test works 
well.

  was:
I have the following test code which works well, however, if I remove the 
`enableObjectReuse`, the test case would fail with the exception `Accessing a 
field by name is not supported in position-based field mode`.
{code:java}
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);

    // The test fails with enableObjectReuse
    env.getConfig().enableObjectReuse();

    final SourceFunction<Row> rowGenerator =
            new SourceFunction<Row>() {
                @Override
                public final void run(SourceContext<Row> ctx) throws Exception {
                    Row row = new Row(1);
                    row.setField(0, "a");
                    ctx.collect(row);
                }

                @Override
                public void cancel() {}
            };

    final RowTypeInfo typeInfo =
            new RowTypeInfo(new TypeInformation[] {Types.STRING}, new String[] {"col1"});

    DataStream<Row> dataStream = env.addSource(rowGenerator, typeInfo);

    DataStream<Row> transformedDataStream =
            dataStream.map(
                    (MapFunction<Row, Row>) value -> Row.of(value.getField("col1")), typeInfo);

    transformedDataStream.addSink(new PrintSinkFunction<>());
    env.execute("Mini Test");
}
{code}
The `SourceFunction` generates rows without field names, but the return type 
info is assigned by `env.addSource(rowGenerator, typeInfo)`.

With object-reuse enabled, rows would be passed to the mapFunction directly, so 
the exception raises. While if the object-reuse is disabled,  rows would be 
reconstructed and given field names when passing to the next operator, so the 
test case works well.


> EnableObjectReuse cause different behaviors
> ---
>
> Key: FLINK-31312
> URL: https://issues.apache.org/jira/browse/FLINK-31312
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Reporter: Jiang Xin
>Priority: Major
>
> I have the following test code which fails with the exception `Accessing a 
> field by name is not supported in position-based field mode`, however, if I 
> remove the `enableObjectReuse`, it works.
> {code:java}
> public static void main(String[] args) throws Exception {
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> // The test fails with enableObjectReuse
> env.getConfig().enableObjectReuse();
> final SourceFunction<Row> rowGenerator =
> new SourceFunction<Row>() {
> @Override
> public final void run(SourceContext<Row> ctx) throws 
> Exception {
> Row row = new Row(1);
> row.setField(0, "a");
> ctx.collect(row);
> }
> @Override
> public void cancel() {}
> };
> final RowTypeInfo typeInfo =
>

[jira] [Commented] (FLINK-31211) Flink chk files not delete automic after 1.13

2023-03-02 Thread leo.zhi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696017#comment-17696017
 ] 

leo.zhi commented on FLINK-31211:
-

[~martijnvisser] Hi Martijn, I have upgraded to 1.16 and used the claim 
configuration, but it does not help.

The checkpoint files are still retained.

I have no idea what is wrong; could you help me?

Thanks

!image-2023-03-03-13-50-32-795.png!

!image-2023-03-03-13-49-07-262.png!

> Flink chk files not delete automic after 1.13
> -
>
> Key: FLINK-31211
> URL: https://issues.apache.org/jira/browse/FLINK-31211
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.15.0, 1.16.0
> Environment: Flink 1.14
>Reporter: leo.zhi
>Priority: Major
> Attachments: 11.png, image-2023-02-28-15-07-38-155.png, 
> image-2023-02-28-15-07-51-406.png, image-2023-03-03-13-48-30-510.png, 
> image-2023-03-03-13-49-07-262.png, image-2023-03-03-13-50-32-795.png
>
>
> Checkpoint chk files were deleted automatically in Flink 1.13, but after we 
> upgraded to 1.14/1.15/1.16 this no longer works and every chk file is retained.
> chk-1
> chk-2
> chk-3
> ...
> By the way, we use flink on k8s.





[GitHub] [flink] fsk119 commented on pull request #22072: [FLINK-31092][sql-gateway][table-common] Fix Hive ITCase fail with OutOfMemoryError

2023-03-02 Thread via GitHub


fsk119 commented on PR #22072:
URL: https://github.com/apache/flink/pull/22072#issuecomment-1453011294

   I have verified the sql-gateway module in the JDK 11 environment with the command
   
   ```
   mvn clean verify -Pjava11
   ```
   
   and all tests pass.





[jira] [Updated] (FLINK-31211) Flink chk files not delete automic after 1.13

2023-03-02 Thread leo.zhi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leo.zhi updated FLINK-31211:

Attachment: image-2023-03-03-13-50-32-795.png

> Flink chk files not delete automic after 1.13
> -
>
> Key: FLINK-31211
> URL: https://issues.apache.org/jira/browse/FLINK-31211
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.15.0, 1.16.0
> Environment: Flink 1.14
>Reporter: leo.zhi
>Priority: Major
> Attachments: 11.png, image-2023-02-28-15-07-38-155.png, 
> image-2023-02-28-15-07-51-406.png, image-2023-03-03-13-48-30-510.png, 
> image-2023-03-03-13-49-07-262.png, image-2023-03-03-13-50-32-795.png
>
>
> Checkpoint chk files were deleted automatically in Flink 1.13, but after we 
> upgraded to 1.14/1.15/1.16 this stopped working and every chk file is retained.
> chk-1
> chk-2
> chk-3
> ...
> By the way, we use flink on k8s.





[jira] [Updated] (FLINK-31211) Flink chk files not delete automic after 1.13

2023-03-02 Thread leo.zhi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leo.zhi updated FLINK-31211:

Attachment: image-2023-03-03-13-49-07-262.png

> Flink chk files not delete automic after 1.13
> -
>
> Key: FLINK-31211
> URL: https://issues.apache.org/jira/browse/FLINK-31211
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.15.0, 1.16.0
> Environment: Flink 1.14
>Reporter: leo.zhi
>Priority: Major
> Attachments: 11.png, image-2023-02-28-15-07-38-155.png, 
> image-2023-02-28-15-07-51-406.png, image-2023-03-03-13-48-30-510.png, 
> image-2023-03-03-13-49-07-262.png
>
>
> Checkpoint chk files were deleted automatically in Flink 1.13, but after we 
> upgraded to 1.14/1.15/1.16 this stopped working and every chk file is retained.
> chk-1
> chk-2
> chk-3
> ...
> By the way, we use flink on k8s.





[jira] [Updated] (FLINK-31211) Flink chk files not delete automic after 1.13

2023-03-02 Thread leo.zhi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leo.zhi updated FLINK-31211:

Attachment: image-2023-03-03-13-48-30-510.png

> Flink chk files not delete automic after 1.13
> -
>
> Key: FLINK-31211
> URL: https://issues.apache.org/jira/browse/FLINK-31211
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.0, 1.15.0, 1.16.0
> Environment: Flink 1.14
>Reporter: leo.zhi
>Priority: Major
> Attachments: 11.png, image-2023-02-28-15-07-38-155.png, 
> image-2023-02-28-15-07-51-406.png, image-2023-03-03-13-48-30-510.png
>
>
> Checkpoint chk files were deleted automatically in Flink 1.13, but after we 
> upgraded to 1.14/1.15/1.16 this stopped working and every chk file is retained.
> chk-1
> chk-2
> chk-3
> ...
> By the way, we use flink on k8s.





[jira] [Created] (FLINK-31312) EnableObjectReuse cause different behaviors

2023-03-02 Thread Jiang Xin (Jira)
Jiang Xin created FLINK-31312:
-

 Summary: EnableObjectReuse cause different behaviors
 Key: FLINK-31312
 URL: https://issues.apache.org/jira/browse/FLINK-31312
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Reporter: Jiang Xin


I have the following test code, which fails with the exception `Accessing a 
field by name is not supported in position-based field mode`; if I remove the 
`enableObjectReuse` call, the test passes.
{code:java}
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);

    // The test fails with enableObjectReuse
    env.getConfig().enableObjectReuse();

    final SourceFunction<Row> rowGenerator =
            new SourceFunction<Row>() {
                @Override
                public final void run(SourceContext<Row> ctx) throws Exception {
                    Row row = new Row(1);
                    row.setField(0, "a");
                    ctx.collect(row);
                }

                @Override
                public void cancel() {}
            };

    final RowTypeInfo typeInfo =
            new RowTypeInfo(new TypeInformation[] {Types.STRING}, new String[] {"col1"});

    DataStream<Row> dataStream = env.addSource(rowGenerator, typeInfo);

    DataStream<Row> transformedDataStream =
            dataStream.map(
                    (MapFunction<Row, Row>) value -> Row.of(value.getField("col1")), typeInfo);

    transformedDataStream.addSink(new PrintSinkFunction<>());
    env.execute("Mini Test");
} {code}
The `SourceFunction` generates rows without field names, but the return type 
info is assigned by `env.addSource(rowGenerator, typeInfo)`.

With object reuse enabled, rows are passed to the map function directly, so the 
exception is raised. If object reuse is disabled, rows are reconstructed and 
given field names when passed to the next operator, so the test case works.





[jira] [Closed] (FLINK-31291) Document table.exec.sink.upsert-materialize to none

2023-03-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-31291.

Resolution: Fixed

master: 2d1da3d8780b46740a1962e118a2dbd780791480

> Document table.exec.sink.upsert-materialize to none
> ---
>
> Key: FLINK-31291
> URL: https://issues.apache.org/jira/browse/FLINK-31291
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Guojun Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> The table store has the ability to correct disorder, such as:
> [https://nightlies.apache.org/flink/flink-table-store-docs-master/docs/concepts/primary-key-table/#sequence-field]
> But Flink SQL's default sink materialization results in strange behavior, in 
> particular when writing to an aggregation table of the table store.
> We should document this: always set table.exec.sink.upsert-materialize to 
> none, and set 'sequence.field' on the table in case of disorder.
>  





[GitHub] [flink-table-store] JingsongLi merged pull request #571: [FLINK-31291] Document "table.exec.sink.upsert-materialize" to "none" to avoid strange behavior

2023-03-02 Thread via GitHub


JingsongLi merged PR #571:
URL: https://github.com/apache/flink-table-store/pull/571





[jira] [Updated] (FLINK-31311) Supports Bounded Watermark streaming read

2023-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31311:
---
Labels: pull-request-available  (was: )

> Supports Bounded Watermark streaming read
> -
>
> Key: FLINK-31311
> URL: https://issues.apache.org/jira/browse/FLINK-31311
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> There are some bounded stream scenarios that require that stream reading can 
> end. Generally speaking, ending at an event time is preferable.
> So in this ticket, we support writing the watermark to the snapshot and 
> specifying an ending watermark when reading the stream.





[GitHub] [flink-table-store] JingsongLi opened a new pull request, #576: [FLINK-31311] Supports Bounded Watermark streaming read

2023-03-02 Thread via GitHub


JingsongLi opened a new pull request, #576:
URL: https://github.com/apache/flink-table-store/pull/576

   There are some bounded stream scenarios that require that stream reading can 
be ended. Generally speaking, ending at an event time is preferable.
   
   So in this ticket, we support writing the watermark to the snapshot and 
specifying an ending watermark when reading the stream.





[GitHub] [flink-table-store] JingsongLi merged pull request #574: [hotfix] Remove ignoreEmptyCommit in StreamTableCommit

2023-03-02 Thread via GitHub


JingsongLi merged PR #574:
URL: https://github.com/apache/flink-table-store/pull/574





[jira] [Created] (FLINK-31311) Supports Bounded Watermark streaming read

2023-03-02 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31311:


 Summary: Supports Bounded Watermark streaming read
 Key: FLINK-31311
 URL: https://issues.apache.org/jira/browse/FLINK-31311
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.4.0


There are some bounded stream scenarios that require that stream reading can 
end. Generally speaking, ending at an event time is preferable.

So in this ticket, we support writing the watermark to the snapshot and 
specifying an ending watermark when reading the stream.
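A minimal sketch of the idea, assuming a hypothetical snapshot list with per-snapshot watermarks (the helper name and snapshot layout are illustrative, not the actual table store API):

```python
def bounded_stream_read(snapshots, end_watermark):
    """Yield records snapshot by snapshot until the snapshot watermark
    reaches the configured ending watermark."""
    for snapshot in snapshots:
        watermark = snapshot.get("watermark")
        yield from snapshot["records"]
        if watermark is not None and watermark >= end_watermark:
            return  # the "unbounded" stream ends here

snapshots = [
    {"watermark": 10, "records": ["a"]},
    {"watermark": 20, "records": ["b"]},
    {"watermark": 35, "records": ["c"]},
]
# Reading with end_watermark=20 consumes snapshots up to watermark 20.
assert list(bounded_stream_read(snapshots, end_watermark=20)) == ["a", "b"]
```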





[jira] [Commented] (FLINK-31309) Rollback DFS schema if hive sync fail in HiveCatalog.createTable

2023-03-02 Thread Shammon (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696007#comment-17696007
 ] 

Shammon commented on FLINK-31309:
-

[~lzljs3620320] Please assign this issue to me, thanks.

> Rollback DFS schema if hive sync fail in HiveCatalog.createTable
> 
>
> Key: FLINK-31309
> URL: https://issues.apache.org/jira/browse/FLINK-31309
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> Avoid schema residue on DFS.





[GitHub] [flink] TanYuxin-tyx commented on a diff in pull request #21937: [FLINK-30983][runtime] Support configured ssl algorithms for external REST SSL

2023-03-02 Thread via GitHub


TanYuxin-tyx commented on code in PR #21937:
URL: https://github.com/apache/flink/pull/21937#discussion_r1124026276


##
flink-runtime/src/test/java/org/apache/flink/runtime/net/SSLUtilsTest.java:
##
@@ -408,20 +433,19 @@ public void testCreateSSLEngineFactory() throws Exception {
 final SslHandler sslHandler =
 serverSSLHandlerFactory.createNettySSLHandler(UnpooledByteBufAllocator.DEFAULT);
 
-assertEquals(expectedSslProtocols.length, sslHandler.engine().getEnabledProtocols().length);
-assertThat(
-sslHandler.engine().getEnabledProtocols(),
-arrayContainingInAnyOrder(expectedSslProtocols));
+assertThat(sslHandler.engine().getEnabledProtocols().length)
+.isEqualTo(expectedSslProtocols.length);
+assertThat(sslHandler.engine().getEnabledProtocols()).contains(expectedSslProtocols);
 
-assertEquals(sslAlgorithms.length, sslHandler.engine().getEnabledCipherSuites().length);
-assertThat(
-sslHandler.engine().getEnabledCipherSuites(),
-arrayContainingInAnyOrder(sslAlgorithms));
+assertThat(sslHandler.engine().getEnabledCipherSuites().length)
+.isEqualTo(sslAlgorithms.length);

Review Comment:
   Fixed.






[GitHub] [flink] TanYuxin-tyx commented on a diff in pull request #21937: [FLINK-30983][runtime] Support configured ssl algorithms for external REST SSL

2023-03-02 Thread via GitHub


TanYuxin-tyx commented on code in PR #21937:
URL: https://github.com/apache/flink/pull/21937#discussion_r1124026035


##
flink-runtime/src/test/java/org/apache/flink/runtime/net/SSLUtilsTest.java:
##
@@ -432,21 +456,21 @@ public void testInvalidFingerprintParsing() throws Exception {
 SSLUtils.createInternalServerSSLEngineFactory(config);
 fail("expected exception");

Review Comment:
   Fixed, removed the fail method.






[GitHub] [flink] TanYuxin-tyx commented on pull request #21937: [FLINK-30983][runtime] Support configured ssl algorithms for external REST SSL

2023-03-02 Thread via GitHub


TanYuxin-tyx commented on PR #21937:
URL: https://github.com/apache/flink/pull/21937#issuecomment-1452972780

   Thanks for reviewing the change, @reswqa. The IT case 
RestServerEndpointITCase#setup covers the modified method.





[GitHub] [flink] TanYuxin-tyx commented on pull request #21937: [FLINK-30983][runtime] Support configured ssl algorithms for external REST SSL

2023-03-02 Thread via GitHub


TanYuxin-tyx commented on PR #21937:
URL: https://github.com/apache/flink/pull/21937#issuecomment-1452972416

   > Thanks @TanYuxin-tyx, this changes overall look good to me. I only left 
some comments about tests, PTAL~ BTW, I think we also need to introduce an IT 
case for this, maybe `RestServerSSLAuthITCase` is a good place.
   Thanks for reviewing the change, @reswqa. The IT case 
RestServerEndpointITCase#setup covers the modified method.
   





[jira] [Updated] (FLINK-31309) Rollback DFS schema if hive sync fail in HiveCatalog.createTable

2023-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31309:
---
Labels: pull-request-available  (was: )

> Rollback DFS schema if hive sync fail in HiveCatalog.createTable
> 
>
> Key: FLINK-31309
> URL: https://issues.apache.org/jira/browse/FLINK-31309
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.4.0
>
>
> Avoid schema residue on DFS.





[GitHub] [flink-table-store] FangYongs opened a new pull request, #575: [FLINK-31309] Delete DFS schema when create table failed in HiveCatalog

2023-03-02 Thread via GitHub


FangYongs opened a new pull request, #575:
URL: https://github.com/apache/flink-table-store/pull/575

   Delete the DFS schema when creating a table fails in HiveCatalog.
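The rollback this PR implements can be sketched as follows; the helper names and the dict standing in for DFS and Hive are illustrative assumptions, not the actual HiveCatalog API:

```python
def create_table(schemas, table, hive_sync):
    """Write the schema to "DFS" first, then sync to Hive; on failure,
    delete the just-written schema so nothing is left behind."""
    schemas[table] = {"fields": ["k", "v"]}  # stands in for writing to DFS
    try:
        hive_sync(table)
    except Exception:
        schemas.pop(table, None)  # roll back: avoid schema residue on DFS
        raise

def failing_sync(table):
    raise RuntimeError("hive metastore unavailable")

schemas = {}
try:
    create_table(schemas, "t1", failing_sync)
except RuntimeError:
    pass
assert "t1" not in schemas  # no residue after the failed create
```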





[GitHub] [flink] Vancior commented on a diff in pull request #22003: [FLINK-31185][Python] Support side-output in broadcast processing

2023-03-02 Thread via GitHub


Vancior commented on code in PR #22003:
URL: https://github.com/apache/flink/pull/22003#discussion_r1124011456


##
flink-python/pyflink/datastream/tests/test_data_stream.py:
##
@@ -589,6 +589,40 @@ def process_element2(self, value, ctx: 
'CoProcessFunction.Context'):
 side_expected = ['0', '0', '1', '1', '2', '3']
 self.assert_equals_sorted(side_expected, side_sink.get_results())
 
+def test_co_broadcast_side_output(self):
+tag = OutputTag("side", Types.INT())
+
+class MyBroadcastProcessFunction(BroadcastProcessFunction):
+
+def process_element(self, value, ctx):
+yield value[0]
+yield tag, value[1]
+
+def process_broadcast_element(self, value, ctx):
+yield value[1]
+yield tag, value[0]
+
+self.env.set_parallelism(2)

Review Comment:
   This is kind of an "explicit reminder" that says we expect the output to 
match parallelism=2, with some elements duplicated.






[GitHub] [flink] Vancior commented on a diff in pull request #22003: [FLINK-31185][Python] Support side-output in broadcast processing

2023-03-02 Thread via GitHub


Vancior commented on code in PR #22003:
URL: https://github.com/apache/flink/pull/22003#discussion_r1124010607


##
flink-python/src/main/java/org/apache/flink/streaming/api/transformations/python/DelegateOperatorTransformation.java:
##
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.api.transformations.python;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.python.env.PythonEnvironmentManager;
+import 
org.apache.flink.streaming.api.functions.python.DataStreamPythonFunctionInfo;
+import org.apache.flink.streaming.api.operators.SimpleOperatorFactory;
+import 
org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator;
+import 
org.apache.flink.streaming.api.operators.python.DataStreamPythonFunctionOperator;
+import org.apache.flink.util.OutputTag;
+
+import javax.annotation.Nullable;
+
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * For those {@link org.apache.flink.api.dag.Transformation} that don't have 
an operator entity,
+ * {@link DelegateOperatorTransformation} provides a {@link 
SimpleOperatorFactory} containing a
+ * {@link DelegateOperator} , which can hold special configurations during 
transformation
+ * preprocessing for Python jobs, and later be queried at translation stage. 
Currently, those
+ * configurations include {@link OutputTag}s, {@code numPartitions} and 
general {@link
+ * Configuration}.
+ */
+public interface DelegateOperatorTransformation {
+
+SimpleOperatorFactory getOperatorFactory();
+
+static void configureDelegatedOperator(
+DelegateOperatorTransformation transformation,
+AbstractPythonFunctionOperator operator) {
+DelegateOperator delegateOperator =
+(DelegateOperator) 
transformation.getOperatorFactory().getOperator();
+
+
operator.getConfiguration().addAll(delegateOperator.getConfiguration());
+
+if (operator instanceof DataStreamPythonFunctionOperator) {
+DataStreamPythonFunctionOperator dataStreamOperator =
+(DataStreamPythonFunctionOperator) operator;
+
dataStreamOperator.addSideOutputTags(delegateOperator.getSideOutputTags());
+if (delegateOperator.getNumPartitions() != null) {
+
dataStreamOperator.setNumPartitions(delegateOperator.getNumPartitions());
+}
+}
+}
+
+/**
+ * {@link DelegateOperator} holds configurations, e.g. {@link OutputTag}s, 
which will be applied
+ * to the actual python operator at translation stage.
+ */
+class DelegateOperator extends AbstractPythonFunctionOperator
+implements DataStreamPythonFunctionOperator {
+
+private final Map> sideOutputTags = new 
HashMap<>();
+private @Nullable Integer numPartitions = null;
+
+public DelegateOperator() {
+super(new Configuration());
+}
+
+@Override
+public void addSideOutputTags(Collection> outputTags) {
+for (OutputTag outputTag : outputTags) {
+sideOutputTags.put(outputTag.getId(), outputTag);
+}
+}
+
+@Override
+public Collection> getSideOutputTags() {
+return sideOutputTags.values();
+}
+
+@Override
+public void setNumPartitions(int numPartitions) {
+this.numPartitions = numPartitions;
+}
+
+@Nullable
+public Integer getNumPartitions() {
+return numPartitions;
+}
+
+@Override
+public TypeInformation getProducedType() {
+throw new RuntimeException("This should not be invoked on a 
DelegateOperator!");
+}
+
+@Override
+public DataStreamPythonFunctionInfo getPythonFunctionInfo() {
+throw new RuntimeException("This should not be invoked on a 
DelegateOperator!");
+}
+
+@Override
+public  DataStreamPythonFunctionOperator copy(
+DataStreamPythonFunctionInfo pythonFunctionInfo,
+TypeInformation 

[GitHub] [flink] Aitozi commented on pull request #22069: [FLINK-31296][table-planner] Add JoinConditionEqualityTransferRule to…

2023-03-02 Thread via GitHub


Aitozi commented on PR #22069:
URL: https://github.com/apache/flink/pull/22069#issuecomment-1452944267

   @godfreyhe Could you help review this PR?





[GitHub] [flink-table-store] JingsongLi merged pull request #573: [hotfix] Fix ChangelogWithKeyFileStoreTableTest#testAggMergeFunc

2023-03-02 Thread via GitHub


JingsongLi merged PR #573:
URL: https://github.com/apache/flink-table-store/pull/573





[GitHub] [flink-table-store] JingsongLi commented on pull request #573: [hotfix] Fix ChangelogWithKeyFileStoreTableTest#testAggMergeFunc

2023-03-02 Thread via GitHub


JingsongLi commented on PR #573:
URL: 
https://github.com/apache/flink-table-store/pull/573#issuecomment-1452939890

   +1





[jira] [Commented] (FLINK-31249) Checkpoint timeout mechanism fails when finalizeCheckpoint is stuck

2023-03-02 Thread renxiang zhou (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695993#comment-17695993
 ] 

renxiang zhou commented on FLINK-31249:
---

[~roman] 

When it takes too long to finalize the last checkpoint, should the checkpoint 
timeout mechanism cancel it?

Currently I haven't observed this issue in a non-mocked setup, but I think it 
could happen when finalizing a checkpoint gets stuck writing metadata to the 
DFS due to a DFS failure, such as an HDFS NameNode failure.
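A toy threading model of the failure mode in this issue (finalization holding the coordinator lock, so the timeout handler can never run); all names and timings are illustrative assumptions, not Flink internals:

```python
import threading
import time

coordinator_lock = threading.Lock()
cancelled = []

def finalize_checkpoint():
    # The JM finalizes the pending checkpoint while holding the lock;
    # the sleep stands in for a metadata write stuck on a DFS failure.
    with coordinator_lock:
        time.sleep(0.5)

def checkpoint_timer():
    # The timeout handler needs the same lock, so it cannot run until
    # finalization finishes; the stuck checkpoint is never cancelled.
    if coordinator_lock.acquire(timeout=0.1):
        cancelled.append("checkpoint cancelled")
        coordinator_lock.release()

finalizer = threading.Thread(target=finalize_checkpoint)
finalizer.start()
time.sleep(0.05)          # let the finalizer grab the lock first
checkpoint_timer()
finalizer.join()
assert cancelled == []    # the timeout event never executed
```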

> Checkpoint timeout mechanism fails when finalizeCheckpoint is stuck
> ---
>
> Key: FLINK-31249
> URL: https://issues.apache.org/jira/browse/FLINK-31249
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.6, 1.16.0
>Reporter: renxiang zhou
>Priority: Major
> Fix For: 1.18.0
>
> Attachments: image-2023-02-28-11-25-03-637.png, 
> image-2023-02-28-12-04-35-178.png, image-2023-02-28-12-17-19-607.png
>
>
> When the jobmanager receives all ACKs from the tasks, it finalizes the pending 
> checkpoint into a completed checkpoint. Currently the JM finalizes the pending 
> checkpoint while holding the checkpoint coordinator lock.
> When a DFS failure occurs, the {{jobmanager-future}} thread may be blocked at 
> finalizing the pending checkpoint.
> !image-2023-02-28-12-17-19-607.png|width=1010,height=244!
> And then the next checkpoint is triggered, and the {{Checkpoint Timer}} thread 
> waits for the lock to be released. 
> !image-2023-02-28-11-25-03-637.png|width=1144,height=248!
> If the previous checkpoint times out, the {{Checkpoint Timer}} will not 
> execute the timeout event since it is blocked waiting for the lock. As a 
> result, the previous checkpoint cannot be cancelled.





[jira] [Updated] (FLINK-31307) RocksDB:java.lang.UnsatisfiedLinkError

2023-03-02 Thread Wujunzhe (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wujunzhe updated FLINK-31307:
-
Attachment: image-2023-03-03-11-28-54-674.png

> RocksDB:java.lang.UnsatisfiedLinkError
> --
>
> Key: FLINK-31307
> URL: https://issues.apache.org/jira/browse/FLINK-31307
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.5
>Reporter: Wujunzhe
>Priority: Major
> Attachments: image-2023-03-03-10-27-04-810.png, 
> image-2023-03-03-10-29-27-477.png, image-2023-03-03-11-28-54-674.png
>
>
> When I use RocksDB like this: 
> !image-2023-03-03-10-27-04-810.png!
>  
>  I get an exception I cannot resolve: 
> !image-2023-03-03-10-29-27-477.png!
>  What can I do to troubleshoot or solve this problem? 
>  





[jira] [Commented] (FLINK-31307) RocksDB:java.lang.UnsatisfiedLinkError

2023-03-02 Thread Wujunzhe (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695990#comment-17695990
 ] 

Wujunzhe commented on FLINK-31307:
--

How do I check how much memory my JVM process has left? What I can confirm is 
that there is enough memory on the virtual machine, and there is also enough 
memory for the JM and TM. The same configuration parameters run normally on our 
test server, but not on the development server. The only difference is that one 
is a physical machine and the other is a virtual machine. [~yunta] 

!image-2023-03-03-11-28-54-674.png!

> RocksDB:java.lang.UnsatisfiedLinkError
> --
>
> Key: FLINK-31307
> URL: https://issues.apache.org/jira/browse/FLINK-31307
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.5
>Reporter: Wujunzhe
>Priority: Major
> Attachments: image-2023-03-03-10-27-04-810.png, 
> image-2023-03-03-10-29-27-477.png, image-2023-03-03-11-28-54-674.png
>
>
> When I use RocksDB like this: 
> !image-2023-03-03-10-27-04-810.png!
>  
>  I get an exception I cannot resolve: 
> !image-2023-03-03-10-29-27-477.png!
>  What can I do to troubleshoot or solve this problem? 
>  





[GitHub] [flink-table-store] tsreaper opened a new pull request, #573: [hotfix] Fix ChangelogWithKeyFileStoreTableTest#testAggMergeFunc

2023-03-02 Thread via GitHub


tsreaper opened a new pull request, #573:
URL: https://github.com/apache/flink-table-store/pull/573

   Currently the result of 
`ChangelogWithKeyFileStoreTableTest#testAggMergeFunc` in branch release-0.3 is 
not correct. This PR fixes the testing method.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-31310) Force clear directory no matter what situation in HiveCatalog.dropTable

2023-03-02 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31310:


 Summary: Force clear directory no matter what situation in 
HiveCatalog.dropTable
 Key: FLINK-31310
 URL: https://issues.apache.org/jira/browse/FLINK-31310
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.4.0


Currently, if the table does not exist in Hive, the table directory will not be 
cleared.

We should clear the table directory in any situation.
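The intended behavior can be sketched like this; the `drop_table` helper, the metastore dict, and the directory layout are hypothetical stand-ins, not the actual HiveCatalog code:

```python
import os
import shutil
import tempfile

def drop_table(metastore, table, table_dir):
    """Drop the table from the (toy) metastore, then clear its directory
    unconditionally, even if the table was already missing."""
    metastore.pop(table, None)                    # tolerate "table not found"
    shutil.rmtree(table_dir, ignore_errors=True)  # always clear the directory

table_dir = tempfile.mkdtemp()
open(os.path.join(table_dir, "schema-0"), "w").close()

drop_table({}, "orders", table_dir)   # table absent from the metastore
assert not os.path.exists(table_dir)  # directory cleared anyway
```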





[GitHub] [flink] lsyldliu commented on pull request #22031: [FLINK-31239][hive] Fix native sum function can't get the corrected value when the argument type is string

2023-03-02 Thread via GitHub


lsyldliu commented on PR #22031:
URL: https://github.com/apache/flink/pull/22031#issuecomment-1452896206

   @godfreyhe Thanks for your review; I've addressed your comments. Could you 
keep the two commits separate when you merge?





[jira] [Created] (FLINK-31309) Rollback DFS schema if hive sync fail in HiveCatalog.createTable

2023-03-02 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-31309:


 Summary: Rollback DFS schema if hive sync fail in 
HiveCatalog.createTable
 Key: FLINK-31309
 URL: https://issues.apache.org/jira/browse/FLINK-31309
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.4.0


Avoid schema residue on DFS.





[jira] [Assigned] (FLINK-31294) CommitterOperator forgot to close Committer when closing.

2023-03-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reassigned FLINK-31294:


Assignee: Ming Li

> CommitterOperator forgot to close Committer when closing.
> -
>
> Key: FLINK-31294
> URL: https://issues.apache.org/jira/browse/FLINK-31294
> Project: Flink
>  Issue Type: Bug
>  Components: Table Store
>Reporter: Ming Li
>Assignee: Ming Li
>Priority: Major
>
> {{CommitterOperator}} does not close the {{Committer}} when it closes, which 
> may lead to resource leaks.





[jira] [Assigned] (FLINK-31287) Default value of 'changelog-producer.compaction-interval' can be zero

2023-03-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reassigned FLINK-31287:


Assignee: xzw0223

> Default value of 'changelog-producer.compaction-interval' can be zero
> -
>
> Key: FLINK-31287
> URL: https://issues.apache.org/jira/browse/FLINK-31287
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: xzw0223
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> At present, the 30-minute interval is too conservative. We can set it to 0 by 
> default, so that each checkpoint will have a full-compaction and generate a 
> changelog.





[jira] [Commented] (FLINK-31287) Default value of 'changelog-producer.compaction-interval' can be zero

2023-03-02 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695980#comment-17695980
 ] 

Jingsong Lee commented on FLINK-31287:
--

[~xzw0223] Thanks!

> Default value of 'changelog-producer.compaction-interval' can be zero
> -
>
> Key: FLINK-31287
> URL: https://issues.apache.org/jira/browse/FLINK-31287
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: xzw0223
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> At present, the 30-minute interval is too conservative. We can set it to 0 by 
> default, so that each checkpoint will have a full-compaction and generate a 
> changelog.





[jira] [Commented] (FLINK-31292) Use HadoopUtils to get Configuration in CatalogContext

2023-03-02 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695979#comment-17695979
 ] 

Jingsong Lee commented on FLINK-31292:
--

[~liyubin117]  Thanks!

> Use HadoopUtils to get Configuration in CatalogContext
> ---
>
> Key: FLINK-31292
> URL: https://issues.apache.org/jira/browse/FLINK-31292
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Yubin Li
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> At present, if HadoopConf is not passed in the CatalogContext, a new 
> HadoopConf will be directly generated, which may not have the required 
> parameters.
> We can refer to HadoopUtils to obtain hadoopConf from the configuration and 
> environment variables.
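A rough sketch of the proposed lookup order (illustrative only; the real HadoopUtils parses *-site.xml files into an org.apache.hadoop.conf.Configuration, which is stubbed out here with a plain map): prefer an explicitly passed configuration, then fall back to directories named by well-known environment variables, and only then to an empty configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class HadoopConfDiscovery {

    // Stand-in for org.apache.hadoop.conf.Configuration, to keep this
    // runnable without a Hadoop dependency.
    static Map<String, String> loadFromDir(String confDir) {
        Map<String, String> conf = new HashMap<>();
        conf.put("loaded.from", confDir); // real code would parse *-site.xml here
        return conf;
    }

    static Map<String, String> resolve(Map<String, String> explicit, Map<String, String> env) {
        if (explicit != null) {
            return explicit; // a caller-provided configuration always wins
        }
        // Fall back to config directories named by common environment variables.
        for (String var : new String[] {"HADOOP_CONF_DIR", "HADOOP_HOME"}) {
            String dir = env.get(var);
            if (dir != null) {
                return loadFromDir(dir);
            }
        }
        return new HashMap<>(); // empty conf: may be missing required parameters
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of("HADOOP_CONF_DIR", "/etc/hadoop/conf");
        System.out.println(resolve(null, env).get("loaded.from")); // /etc/hadoop/conf
    }
}
```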





[jira] [Assigned] (FLINK-31292) Use HadoopUtils to get Configuration in CatalogContext

2023-03-02 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reassigned FLINK-31292:


Assignee: Yubin Li

> Use HadoopUtils to get Configuration in CatalogContext
> ---
>
> Key: FLINK-31292
> URL: https://issues.apache.org/jira/browse/FLINK-31292
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Yubin Li
>Priority: Major
> Fix For: table-store-0.4.0
>
>
> At present, if HadoopConf is not passed in the CatalogContext, a new 
> HadoopConf will be directly generated, which may not have the required 
> parameters.
> We can refer to HadoopUtils to obtain hadoopConf from the configuration and 
> environment variables.





[jira] [Commented] (FLINK-31294) CommitterOperator forgot to close Committer when closing.

2023-03-02 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695978#comment-17695978
 ] 

Jingsong Lee commented on FLINK-31294:
--

[~Ming Li] Thanks!

> CommitterOperator forgot to close Committer when closing.
> -
>
> Key: FLINK-31294
> URL: https://issues.apache.org/jira/browse/FLINK-31294
> Project: Flink
>  Issue Type: Bug
>  Components: Table Store
>Reporter: Ming Li
>Assignee: Ming Li
>Priority: Major
>
> {{CommitterOperator}} does not close the {{Committer}} when it closes, which 
> may lead to resource leaks.





[jira] [Commented] (FLINK-31307) RocksDB:java.lang.UnsatisfiedLinkError

2023-03-02 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695976#comment-17695976
 ] 

Yun Tang commented on FLINK-31307:
--

This is the basic usage of RocksDB TTL and the method existed in 
[FlinkCompactionFilter|https://github.com/ververica/frocksdb/blob/d6f50f33064f1d24480dfb3c586a7bd7a7dbac01/java/src/main/java/org/rocksdb/FlinkCompactionFilter.java#L34
 ].

From my understanding, this should not happen. Did your JVM process page some 
classes out to disk because very little memory was left?

> RocksDB:java.lang.UnsatisfiedLinkError
> --
>
> Key: FLINK-31307
> URL: https://issues.apache.org/jira/browse/FLINK-31307
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.14.5
>Reporter: Wujunzhe
>Priority: Major
> Attachments: image-2023-03-03-10-27-04-810.png, 
> image-2023-03-03-10-29-27-477.png
>
>
> When I use RocksDB as shown: 
> !image-2023-03-03-10-27-04-810.png!
>  
>  I get an exception that I cannot resolve: 
> !image-2023-03-03-10-29-27-477.png!
>  What can I do to troubleshoot or solve this problem? 
>  





[GitHub] [flink] slfan1989 commented on a diff in pull request #22028: [FLINK-31230] Improve YarnClusterDescriptor memory unit display.

2023-03-02 Thread via GitHub


slfan1989 commented on code in PR #22028:
URL: https://github.com/apache/flink/pull/22028#discussion_r1123982178


##
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java:
##
@@ -1441,13 +1444,17 @@ public String getClusterDescription() {
 totalMemory += res.getMemory();
 totalCores += res.getVirtualCores();
 ps.format(format, "NodeID", rep.getNodeId());
-ps.format(format, "Memory", res.getMemory() + " MB");
+ps.format(format, "Memory", getDisplayMemory(res.getMemory()));

Review Comment:
   I will modify the code.






[GitHub] [flink] slfan1989 commented on a diff in pull request #22028: [FLINK-31230] Improve YarnClusterDescriptor memory unit display.

2023-03-02 Thread via GitHub


slfan1989 commented on code in PR #22028:
URL: https://github.com/apache/flink/pull/22028#discussion_r1123981772


##
flink-yarn/src/test/java/org/apache/flink/yarn/YarnClusterDescriptorTest.java:
##
@@ -912,4 +913,17 @@ private Map getTestMasterEnv(
 appId.toString());
 }
 }
+
+@Test
+public void testByteDesc() {
+long bytesInMB = 1024 * 1024;
+// 128 MB
+assertThat(StringUtils.byteDesc(bytesInMB * 128)).isEqualTo("128 MB");
+// 512 MB
+assertThat(StringUtils.byteDesc(bytesInMB * 512)).isEqualTo("512 MB");
+// 1 GB
+assertThat(StringUtils.byteDesc(bytesInMB * 1024)).isEqualTo("1 GB");
+// 128 GB
+assertThat(StringUtils.byteDesc(bytesInMB * 131072)).isEqualTo("128 GB");
+}

Review Comment:
   Thanks for your suggestion! I will modify the code.






[jira] [Commented] (FLINK-31298) ConnectionUtilsTest.testFindConnectingAddressWhenGetLocalHostThrows swallows IllegalArgumentException

2023-03-02 Thread Wencong Liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695974#comment-17695974
 ] 

Wencong Liu commented on FLINK-31298:
-

Hello [~mapohl], I'd like to take this ticket. SocketOptions.SO_TIMEOUT should 
be set to 0.

> ConnectionUtilsTest.testFindConnectingAddressWhenGetLocalHostThrows swallows 
> IllegalArgumentException
> -
>
> Key: FLINK-31298
> URL: https://issues.apache.org/jira/browse/FLINK-31298
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.17.0, 1.15.3, 1.16.1
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: starter, test-stability
>
> FLINK-24156 introduced {{NetUtils.acceptWithoutTimeout}}, which caused the 
> test to print the stacktrace of an {{IllegalArgumentException}}:
> {code}
> Exception in thread "Thread-0" java.lang.IllegalArgumentException: 
> serverSocket SO_TIMEOUT option must be 0
>   at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:138)
>   at 
> org.apache.flink.util.NetUtils.acceptWithoutTimeout(NetUtils.java:139)
>   at 
> org.apache.flink.runtime.net.ConnectionUtilsTest$1.run(ConnectionUtilsTest.java:83)
>   at java.lang.Thread.run(Thread.java:750)
> {code}
> This is also shown in the Maven output of CI runs and might cause confusion. 
> The test should be fixed.





[GitHub] [flink] slfan1989 commented on a diff in pull request #22028: [FLINK-31230] Improve YarnClusterDescriptor memory unit display.

2023-03-02 Thread via GitHub


slfan1989 commented on code in PR #22028:
URL: https://github.com/apache/flink/pull/22028#discussion_r1123981772


##
flink-yarn/src/test/java/org/apache/flink/yarn/YarnClusterDescriptorTest.java:
##
@@ -912,4 +913,17 @@ private Map getTestMasterEnv(
 appId.toString());
 }
 }
+
+@Test
+public void testByteDesc() {
+long bytesInMB = 1024 * 1024;
+// 128 MB
+assertThat(StringUtils.byteDesc(bytesInMB * 128)).isEqualTo("128 MB");
+// 512 MB
+assertThat(StringUtils.byteDesc(bytesInMB * 512)).isEqualTo("512 MB");
+// 1 GB
+assertThat(StringUtils.byteDesc(bytesInMB * 1024)).isEqualTo("1 GB");
+// 128 GB
+assertThat(StringUtils.byteDesc(bytesInMB * 131072)).isEqualTo("128 GB");
+}

Review Comment:
   Thank you for your suggestion!






[jira] [Commented] (FLINK-28691) Improve cache hit rate of generated class

2023-03-02 Thread Shammon (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695973#comment-17695973
 ] 

Shammon commented on FLINK-28691:
-

Hi [~jark], we found the metaspace full-GC problem in codegen, and [~FrankZou] 
fixed it in our internal branch. The fix may involve both codegen and the 
planner; what do you think? Other users have also reported the same problem, 
e.g. https://issues.apache.org/jira/browse/FLINK-31308

> Improve cache hit rate of generated class
> -
>
> Key: FLINK-28691
> URL: https://issues.apache.org/jira/browse/FLINK-28691
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Reporter: Zou
>Priority: Major
>
> In OLAP scenarios, compiling generated classes happens very frequently; it 
> consumes a lot of CPU, and the large number of generated classes also takes 
> up a lot of space in the metaspace, which leads to frequent full GCs.
> Because we use a self-incrementing counter in CodeGenUtils#newName, we can 
> never get the same generated class for two queries, even when they are 
> exactly the same. If we could share the same generated class between 
> different queries that have the same logic, it would benefit both job 
> latency and resource consumption.





[jira] [Commented] (FLINK-31308) JobManager's metaspace out-of-memory when submit a flinksessionjobs

2023-03-02 Thread Shammon (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695968#comment-17695968
 ] 

Shammon commented on FLINK-31308:
-

I think it's the same kind of issue as 
https://issues.apache.org/jira/browse/FLINK-28691. cc [~FrankZou]

> JobManager's metaspace out-of-memory when submit a flinksessionjobs
> ---
>
> Key: FLINK-31308
> URL: https://issues.apache.org/jira/browse/FLINK-31308
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator, Table SQL / API
>Affects Versions: 1.16.1, kubernetes-operator-1.4.0
>Reporter: tanjialiang
>Priority: Major
> Attachments: image-2023-03-03-10-34-46-681.png
>
>
> Hello teams, when I repeatedly submit FlinkSessionJobs via the Flink 
> operator, the JobManager's metaspace runs out of memory. My job contains 
> some Flink SQL logic; is the user classloader not being closed? Or maybe it 
> is because of Flink SQL's codegen? By the way, this does not happen when I 
> submit via flink-sql-gateway.
>  





[GitHub] [flink-table-store] tsreaper opened a new pull request, #572: [FLINK-30608] support rename table

2023-03-02 Thread via GitHub


tsreaper opened a new pull request, #572:
URL: https://github.com/apache/flink-table-store/pull/572

   This PR is cherry-picked from #471.





[jira] [Created] (FLINK-31308) JobManager's metaspace out-of-memory when submit a flinksessionjobs

2023-03-02 Thread tanjialiang (Jira)
tanjialiang created FLINK-31308:
---

 Summary: JobManager's metaspace out-of-memory when submit a 
flinksessionjobs
 Key: FLINK-31308
 URL: https://issues.apache.org/jira/browse/FLINK-31308
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator, Table SQL / API
Affects Versions: kubernetes-operator-1.4.0, 1.16.1
Reporter: tanjialiang
 Attachments: image-2023-03-03-10-34-46-681.png

Hello teams, when I repeatedly submit FlinkSessionJobs via the Flink operator, 
the JobManager's metaspace runs out of memory. My job contains some Flink SQL 
logic; is the user classloader not being closed? Or maybe it is because of 
Flink SQL's codegen? By the way, this does not happen when I submit via 
flink-sql-gateway.

 





[jira] [Created] (FLINK-31307) RocksDB:java.lang.UnsatisfiedLinkError

2023-03-02 Thread Wujunzhe (Jira)
Wujunzhe created FLINK-31307:


 Summary: RocksDB:java.lang.UnsatisfiedLinkError
 Key: FLINK-31307
 URL: https://issues.apache.org/jira/browse/FLINK-31307
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.14.5
Reporter: Wujunzhe
 Attachments: image-2023-03-03-10-27-04-810.png, 
image-2023-03-03-10-29-27-477.png

When I use RocksDB as shown: 

!image-2023-03-03-10-27-04-810.png!

I get an exception that I cannot resolve: 

!image-2023-03-03-10-29-27-477.png!

What can I do to troubleshoot or solve this problem? 

 





[GitHub] [flink] 1996fanrui commented on a diff in pull request #22028: [FLINK-31230] Improve YarnClusterDescriptor memory unit display.

2023-03-02 Thread via GitHub


1996fanrui commented on code in PR #22028:
URL: https://github.com/apache/flink/pull/22028#discussion_r1123963326


##
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java:
##
@@ -1441,13 +1444,17 @@ public String getClusterDescription() {
 totalMemory += res.getMemory();
 totalCores += res.getVirtualCores();
 ps.format(format, "NodeID", rep.getNodeId());
-ps.format(format, "Memory", res.getMemory() + " MB");
+ps.format(format, "Memory", getDisplayMemory(res.getMemory()));

Review Comment:
   I prefer using Flink's `MemorySize` instead of Hadoop's 
`StringUtils.byteDesc`, because Flink already has a similar feature. WDYT?
   
   ```suggestion
   ps.format(format, "Memory", MemorySize.ofMebiBytes(128).toHumanReadableString());
   ```
   
   And I have a demo here.
   
   
![image](https://user-images.githubusercontent.com/38427477/222614772-35e3c26d-3406-4172-ade6-5a62a465c654.png)
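For illustration, human-readable memory formatting along the lines of `toHumanReadableString` can be sketched in plain Java. This is a simplified, self-contained sketch, not Flink's actual `MemorySize` implementation; the class name and unit strings below are illustrative only.

```java
import java.util.Locale;

// Simplified sketch of human-readable memory formatting; not Flink's actual
// MemorySize, just the general idea behind it.
public class MemoryDisplay {
    static final String[] UNITS = {"bytes", "kb", "mb", "gb", "tb"};

    static String toHumanReadable(long bytes) {
        double value = bytes;
        int unit = 0;
        // Divide down by 1024 until the value fits the next-larger unit.
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        // Drop the fraction for whole values, e.g. "128mb" instead of "128.000mb".
        if (value == Math.rint(value)) {
            return (long) value + UNITS[unit];
        }
        return String.format(Locale.ROOT, "%.3f%s", value, UNITS[unit]);
    }

    public static void main(String[] args) {
        System.out.println(toHumanReadable(128L * 1024 * 1024)); // 128mb
        System.out.println(toHumanReadable(1024L * 1024 * 1024)); // 1gb
    }
}
```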
   



##
flink-yarn/src/test/java/org/apache/flink/yarn/YarnClusterDescriptorTest.java:
##
@@ -912,4 +913,17 @@ private Map getTestMasterEnv(
 appId.toString());
 }
 }
+
+@Test
+public void testByteDesc() {
+long bytesInMB = 1024 * 1024;
+// 128 MB
+assertThat(StringUtils.byteDesc(bytesInMB * 128)).isEqualTo("128 MB");
+// 512 MB
+assertThat(StringUtils.byteDesc(bytesInMB * 512)).isEqualTo("512 MB");
+// 1 GB
+assertThat(StringUtils.byteDesc(bytesInMB * 1024)).isEqualTo("1 GB");
+// 128 GB
+assertThat(StringUtils.byteDesc(bytesInMB * 131072)).isEqualTo("128 GB");
+}

Review Comment:
   Flink unit tests shouldn't test features of other services, such as YARN, 
HDFS, ZooKeeper, or the JDK.
   
   Flink unit tests should test features on the Flink side.






[GitHub] [flink] lsyldliu commented on a diff in pull request #22031: [FLINK-31239][hive] Fix native sum function can't get the corrected value when the argument type is string

2023-03-02 Thread via GitHub


lsyldliu commented on code in PR #22031:
URL: https://github.com/apache/flink/pull/22031#discussion_r1123961288


##
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/functions/hive/HiveSumAggFunction.java:
##
@@ -88,27 +97,27 @@ public Expression[] retractExpressions() {
 
 @Override
 public Expression[] mergeExpressions() {
+Expression coalesceSum = coalesce(sum, zero);
 return new Expression[] {
 /* sum = */ ifThenElse(
 isNull(mergeOperand(sum)),
-sum,
-ifThenElse(
-isNull(sum),
-mergeOperand(sum),
-adjustedPlus(getResultType(), sum, 
mergeOperand(sum
+coalesceSum,
+adjustedPlus(getResultType(), coalesceSum, 
mergeOperand(sum))),
+and(isEmpty, mergeOperand(isEmpty))
 };
 }
 
 @Override
 public Expression getValueExpression() {
-return sum;
+return ifThenElse(isTrue(isEmpty), nullOf(getResultType()), sum);

Review Comment:
   Hive code as following: 
   ```
   @AggregationType(estimable = true)
   static class SumLongAgg extends SumAgg {
 @Override
 public int estimate() { return JavaDataModel.PRIMITIVES1 + 
JavaDataModel.PRIMITIVES2; }
   }
   
   @Override
   public AggregationBuffer getNewAggregationBuffer() throws HiveException {
 SumLongAgg result = new SumLongAgg();
 reset(result);
 return result;
   }
   
   @Override
   public void reset(AggregationBuffer agg) throws HiveException {
 SumLongAgg myagg = (SumLongAgg) agg;
 myagg.empty = true;
 myagg.sum = 0L;
 myagg.uniqueObjects = new HashSet();
   }
   
   private boolean warned = false;
   
   @Override
   public void iterate(AggregationBuffer agg, Object[] parameters) throws 
HiveException {
 assert (parameters.length == 1);
 try {
   if (isEligibleValue((SumLongAgg) agg, parameters[0])) {
 ((SumLongAgg)agg).empty = false;
 ((SumLongAgg)agg).sum += 
PrimitiveObjectInspectorUtils.getLong(parameters[0], inputOI);
   }
 } catch (NumberFormatException e) {
   if (!warned) {
 warned = true;
 LOG.warn(getClass().getSimpleName() + " "
 + StringUtils.stringifyException(e));
   }
 }
   }
   
   @Override
   public void merge(AggregationBuffer agg, Object partial) throws 
HiveException {
 if (partial != null) {
   SumLongAgg myagg = (SumLongAgg) agg;
   myagg.empty = false;
   if (isWindowingDistinct()) {
 throw new HiveException("Distinct windowing UDAF doesn't support 
merge and terminatePartial");
   } else {
   myagg.sum += PrimitiveObjectInspectorUtils.getLong(partial, 
inputOI);
   }
 }
   }
   
   @Override
   public Object terminate(AggregationBuffer agg) throws HiveException {
 SumLongAgg myagg = (SumLongAgg) agg;
 if (myagg.empty) {
   return null;
 }
 result.set(myagg.sum);
 return result;
   }
   ```
   It returns a null value if all elements are null.
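The all-null-returns-null behavior can be condensed into a small self-contained sketch (plain Java, not Hive's or Flink's actual classes): an `empty` flag tracks whether any non-null value was seen, and `merge` keeps the result empty only when both sides are empty, which mirrors the `and(isEmpty, mergeOperand(isEmpty))` expression in the diff above.

```java
// Sketch of the "empty" flag pattern for SUM: the result is SQL NULL when
// every input value was null. Names are illustrative, not Hive's UDAF API.
public class NullAwareSum {
    static class SumAgg {
        boolean empty = true;
        long sum = 0L;

        void iterate(Long value) {
            if (value != null) {
                empty = false;
                sum += value;
            }
        }

        void merge(SumAgg partial) {
            // Stay empty only if both sides are empty.
            if (!partial.empty) {
                empty = false;
                sum += partial.sum;
            }
        }

        Long terminate() {
            return empty ? null : sum; // all-null input yields SQL NULL
        }
    }

    static Long sumOf(Long... values) {
        SumAgg agg = new SumAgg();
        for (Long v : values) {
            agg.iterate(v);
        }
        return agg.terminate();
    }

    public static void main(String[] args) {
        System.out.println(sumOf(1L, null, 2L)); // 3
        System.out.println(sumOf((Long) null, null)); // null
    }
}
```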






[GitHub] [flink-ml] jiangxin369 opened a new pull request, #219: Add Servable for Logistic Regression

2023-03-02 Thread via GitHub


jiangxin369 opened a new pull request, #219:
URL: https://github.com/apache/flink-ml/pull/219

   
   
   ## What is the purpose of the change
   
   Add Servable for Logistic Regression.
   
   ## Brief change log
   
 - Adds Servable for Logistic Regression
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (JavaDocs)
   





[jira] [Updated] (FLINK-31306) Add Servable for PipelineModel

2023-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31306:
---
Labels: pull-request-available  (was: )

> Add Servable for PipelineModel
> --
>
> Key: FLINK-31306
> URL: https://issues.apache.org/jira/browse/FLINK-31306
> Project: Flink
>  Issue Type: Improvement
>  Components: Library / Machine Learning
>Reporter: Jiang Xin
>Priority: Major
>  Labels: pull-request-available
> Fix For: ml-2.2.0
>
>
> Add servable for PipelineModel based on flip-289.





[GitHub] [flink-ml] jiangxin369 opened a new pull request, #218: [FLINK-31306] Add Servable for PipelineModel

2023-03-02 Thread via GitHub


jiangxin369 opened a new pull request, #218:
URL: https://github.com/apache/flink-ml/pull/218

   
   
   ## What is the purpose of the change
   
   Add Servable for PipelineModel.
   
   ## Brief change log
   
 - Refactor code structure of utility classes.
 - Add class `Datatypes` to simplify the usage of data types.
 - Add `PipelineModelServable` to support chaining a sequence of Servables.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (JavaDocs)
   





[jira] [Created] (FLINK-31306) Add Servable for PipelineModel

2023-03-02 Thread Jiang Xin (Jira)
Jiang Xin created FLINK-31306:
-

 Summary: Add Servable for PipelineModel
 Key: FLINK-31306
 URL: https://issues.apache.org/jira/browse/FLINK-31306
 Project: Flink
  Issue Type: Improvement
  Components: Library / Machine Learning
Reporter: Jiang Xin
 Fix For: ml-2.2.0


Add servable for PipelineModel based on flip-289.





[jira] [Comment Edited] (FLINK-30501) Update Flink build instruction to deprecate Java 8 instead of requiring Java 11

2023-03-02 Thread Dong Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695954#comment-17695954
 ] 

Dong Lin edited comment on FLINK-30501 at 3/3/23 1:28 AM:
--

[~martijnvisser] I agree we can say we recommend Java 11. How about saying 
this: "Flink requires **Java 8 (deprecated) or Java 11 (recommended)** to 
build"?

Here is the reason I am inclined to explicitly specify "Java 8 (deprecated)". 
Today, many users are still using Java 8 and it is reasonable for users to ask 
whether Flink supports Java 8. Instead of requiring users to ask this question 
on mailing list, we probably should provide answer on the Flink website so that 
it is easy for users to find the right answer by themselves.

I understand we want to encourage users to use Java 11. I just think we should 
provide the right information to users and let users make their own choice.


was (Author: lindong):
[~martijnvisser] I agree we can say we recommend Java 11. How about saying 
this: "Flink requires **Java 8 (deprecated) or Java 11** to build"?

Here is the reason I am inclined to explicitly specify "Java 8 (deprecated)". 
Today, many users are still using Java 8 and it is reasonable for users to ask 
whether Flink supports Java 8. Instead of requiring users to ask this question 
on mailing list, we probably should provide answer on the Flink website so that 
it is easy for users to find the right answer by themselves.

I understand we want to encourage users to use Java 11. I just think we should 
provide the right information to users and let users make their own choice.

> Update Flink build instruction to deprecate Java 8 instead of requiring Java 
> 11
> ---
>
> Key: FLINK-30501
> URL: https://issues.apache.org/jira/browse/FLINK-30501
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / CI
>Reporter: Dong Lin
>Assignee: Dong Lin
>Priority: Major
>  Labels: pull-request-available
>
> Flink 1.15 and later versions require at least Java 11 to build from sources 
> [1], whereas the pom.xml specifies the source/target is 1.8. This 
> inconsistency confuses users.
> As mentioned in the FLINK-25247 title, the goal of that ticket is to "Inform 
> users about deprecation". It will be better to inform users that "Java 8 is 
> deprecated" instead of saying "Flink requires at least Java 11 to build", so 
> that users have the right information to make the right choice for themselves.
> Also note that Flink community is regularly running flink-ml benchmark for 
> both Java 8 and Java 11 [2], which suggests that we are practically ensuring 
> Java 8 is supported.
> [1] 
> [https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/flinkdev/building/]
> [2] 
> [http://codespeed.dak8s.net:8000/timeline/?ben=mapSink.F27_UNBOUNDED=2]





[jira] [Commented] (FLINK-30501) Update Flink build instruction to deprecate Java 8 instead of requiring Java 11

2023-03-02 Thread Dong Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695955#comment-17695955
 ] 

Dong Lin commented on FLINK-30501:
--

Note that the latest [Spark version 
3.3.2|https://spark.apache.org/docs/latest/building-spark.html] still supports 
Java 8. And the [latest Kafka version 3.3.x|http://example.com] also supports 
Java 8. Both projects are widely used and they explicitly list the supported 
Java versions on their official doc website.

And it is explicitly mentioned on the Kafka website that "Java 8, Java 11, and 
Java 17 are supported. Note that Java 8 support has been deprecated since 
Apache Kafka 3.0 and will be removed in Apache Kafka 4.0".

Maybe we should follow their approach regarding whether to specify Java 8 
support and how to encourage users to use Java 11.

> Update Flink build instruction to deprecate Java 8 instead of requiring Java 
> 11
> ---
>
> Key: FLINK-30501
> URL: https://issues.apache.org/jira/browse/FLINK-30501
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / CI
>Reporter: Dong Lin
>Assignee: Dong Lin
>Priority: Major
>  Labels: pull-request-available
>
> Flink 1.15 and later versions require at least Java 11 to build from sources 
> [1], whereas the pom.xml specifies the source/target is 1.8. This 
> inconsistency confuses users.
> As mentioned in the FLINK-25247 title, the goal of that ticket is to "Inform 
> users about deprecation". It will be better to inform users that "Java 8 is 
> deprecated" instead of saying "Flink requires at least Java 11 to build", so 
> that users have the right information to make the right choice for themselves.
> Also note that Flink community is regularly running flink-ml benchmark for 
> both Java 8 and Java 11 [2], which suggests that we are practically ensuring 
> Java 8 is supported.
> [1] 
> [https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/flinkdev/building/]
> [2] 
> [http://codespeed.dak8s.net:8000/timeline/?ben=mapSink.F27_UNBOUNDED=2]




