[jira] [Commented] (FLINK-28160) org.apache.flink.api.common.InvalidProgramException: Table program cannot be compiled.

2022-06-21 Thread Timo Walther (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557255#comment-17557255
 ] 

Timo Walther commented on FLINK-28160:
--

[~cjh] please add a reproducible example. Otherwise, exceptions of this kind 
are hard to debug without the original SQL query/expression.
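
For reference, a self-contained snippet of the following shape is usually 
enough; the table and query below are hypothetical placeholders rather than the 
reporter's actual program, and the mini-batch options mirror the 
MiniBatchLocalGroupAggFunction visible in the stack trace:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CompileRepro {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // Mini-batch aggregation is the code path shown in the trace below.
        tEnv.getConfig().set("table.exec.mini-batch.enabled", "true");
        tEnv.getConfig().set("table.exec.mini-batch.allow-latency", "1 s");
        tEnv.getConfig().set("table.exec.mini-batch.size", "1000");
        tEnv.executeSql(
                "CREATE TABLE src (k STRING, v BIGINT) WITH ('connector' = 'datagen')");
        // Replace this with the actual query/expression that fails to compile.
        tEnv.executeSql("SELECT k, SUM(v) FROM src GROUP BY k").print();
    }
}
{code}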

> org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. 
> ---
>
> Key: FLINK-28160
> URL: https://issues.apache.org/jira/browse/FLINK-28160
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.15.0
> Environment: flink 1.15.0
>Reporter: cjh
>Priority: Major
>
> 2022-06-21 11:29:34,914 WARN  
> org.apache.flink.table.runtime.generated.GeneratedClass      [] - Failed to 
> compile split code, falling back to original code
> org.apache.flink.util.FlinkRuntimeException: 
> org.apache.flink.api.common.InvalidProgramException: Table program cannot 
> be compiled. This is a bug. Please file an issue.
>     at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:94)
>  ~[flink-table-runtime-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:97)
>  ~[flink-table-runtime-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:68)
>  ~[flink-table-runtime-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.table.runtime.operators.aggregate.MiniBatchLocalGroupAggFunction.open(MiniBatchLocalGroupAggFunction.java:59)
>  ~[flink-table-runtime-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.table.runtime.operators.bundle.AbstractMapBundleOperator.open(AbstractMapBundleOperator.java:82)
>  ~[flink-table-runtime-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
>  ~[flink-dist-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
>  ~[flink-dist-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>  ~[flink-dist-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
>  ~[flink-dist-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
>  ~[flink-dist-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>  [flink-dist-1.15.0.jar:1.15.0]
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917) 
> [flink-dist-1.15.0.jar:1.15.0]
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) 
> [flink-dist-1.15.0.jar:1.15.0]
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) 
> [flink-dist-1.15.0.jar:1.15.0]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
> Caused by: 
> org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException:
>  org.apache.flink.api.common.InvalidProgramException: Table program cannot be 
> compiled. This is a bug. Please file an issue.
>     at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
>  ~[flink-shaded-guava-30.1.1-jre-15.0.jar:30.1.1-jre-15.0]
>     at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
>  ~[flink-shaded-guava-30.1.1-jre-15.0.jar:30.1.1-jre-15.0]
>     at 
> org.apache.flink.shaded.guava30.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4859)
>  ~[flink-shaded-guava-30.1.1-jre-15.0.jar:30.1.1-jre-15.0]
>     at 
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:92)
>  ~[flink-table-runtime-1.15.0.jar:1.15.0]
>     ... 14 more



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Closed] (FLINK-28148) Unable to load jar connector to a Python Table API app

2022-06-21 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu closed FLINK-28148.
---
Fix Version/s: FLINK-28002
   Resolution: Duplicate

[~CrynetLogistics] I have closed this ticket as a duplicate, since it appears 
to be a bug that has already been fixed in FLINK-28002. Feel free to reopen it 
if the fix in FLINK-28002 doesn't solve your issue.

> Unable to load jar connector to a Python Table API app
> --
>
> Key: FLINK-28148
> URL: https://issues.apache.org/jira/browse/FLINK-28148
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Connectors / Common, Table SQL / API
>Affects Versions: 1.16.0
>Reporter: Zichen Liu
>Priority: Major
>  Labels: connector, jar, python, table-api
> Fix For: FLINK-28002
>
>
> h2. Background
> Users are currently unable to build & install the latest PyFlink and then 
> load jars. The jar loading mechanism was introduced in FLINK-16943.
> h2. Reproduction steps
>  * Clone the latest Flink from the master branch.
>  * Follow the Flink [recommended 
> steps|https://nightlies.apache.org/flink/flink-docs-master/docs/flinkdev/building/]
>  to build Flink & install PyFlink. Notes: Tutorial recommended Maven 3.2.x, 
> Python 3.6-3.9, reproduced with: Maven 3.2.5, Python 3.7.
>  * Create a new Python Table API app that loads in a jar, similar to:
> {code:java}
> from pyflink.table import TableEnvironment, StreamTableEnvironment, 
> EnvironmentSettings
> env_settings = EnvironmentSettings.in_streaming_mode()
> t_env = StreamTableEnvironment.create(environment_settings=env_settings)
> t_env.get_config().set("pipeline.classpaths", "file:///path/to/your/jar.jar") 
> {code}
>  
>  * The following alternative way of loading jars produces a similar issue:
> {code:java}
> table_env.get_config().get_configuration().set_string("pipeline.jars", 
> "file:///path/to/your/jar.jar") {code}
>  
>  * The jar loaded here can be any jar, and the following message will appear:
> {code:java}
> Traceback (most recent call last):
>   File "pyflink_table_api_firehose.py", line 48, in 
> log_processing()
>   File "pyflink_table_api_firehose.py", line 14, in log_processing
> t_env.get_config().set("pipeline.classpaths", 
> "file:///home/YOUR_USER/pyflink-table-api/flink/flink-connectors/flink-sql-connector-aws-kinesis-firehose/target/flink-sql-connector-aws-kinesis-firehose-1.16-SNAPSHOT.jar")
>   File 
> "/home/YOUR_USER/.local/lib/python3.7/site-packages/pyflink/table/table_config.py",
>  line 109, in set
> add_jars_to_context_class_loader(value.split(";"))
>   File 
> "/home/YOUR_USER/.local/lib/python3.7/site-packages/pyflink/util/java_utils.py",
>  line 169, in add_jars_to_context_class_loader
> addURL.invoke(loader, to_jarray(get_gateway().jvm.Object, [url]))
>   File 
> "/home/YOUR_USER/.local/lib/python3.7/site-packages/py4j/java_gateway.py", 
> line 1322, in __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File 
> "/home/YOUR_USER/.local/lib/python3.7/site-packages/pyflink/util/exceptions.py",
>  line 146, in deco
> return f(*a, **kw)
>   File "/home/YOUR_USER/.local/lib/python3.7/site-packages/py4j/protocol.py", 
> line 328, in get_return_value
> format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling o45.invoke.
> : java.lang.IllegalArgumentException: object is not an instance of declaring 
> class
>at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
>at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
>at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
>  

[jira] [Updated] (FLINK-28148) Unable to load jar connector to a Python Table API app

2022-06-21 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-28148:

Fix Version/s: (was: FLINK-28002)

> Unable to load jar connector to a Python Table API app
> --
>
> Key: FLINK-28148
> URL: https://issues.apache.org/jira/browse/FLINK-28148
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Connectors / Common, Table SQL / API
>Affects Versions: 1.16.0
>Reporter: Zichen Liu
>Priority: Major
>  Labels: connector, jar, python, table-api
>
> h2. Background
> Users are currently unable to build & install the latest PyFlink and then 
> load jars. The jar loading mechanism was introduced in FLINK-16943.
> h2. Reproduction steps
>  * Clone the latest Flink from the master branch.
>  * Follow the Flink [recommended 
> steps|https://nightlies.apache.org/flink/flink-docs-master/docs/flinkdev/building/]
>  to build Flink & install PyFlink. Notes: Tutorial recommended Maven 3.2.x, 
> Python 3.6-3.9, reproduced with: Maven 3.2.5, Python 3.7.
>  * Create a new Python Table API app that loads in a jar, similar to:
> {code:java}
> from pyflink.table import TableEnvironment, StreamTableEnvironment, 
> EnvironmentSettings
> env_settings = EnvironmentSettings.in_streaming_mode()
> t_env = StreamTableEnvironment.create(environment_settings=env_settings)
> t_env.get_config().set("pipeline.classpaths", "file:///path/to/your/jar.jar") 
> {code}
>  
>  * The following alternative way of loading jars produces a similar issue:
> {code:java}
> table_env.get_config().get_configuration().set_string("pipeline.jars", 
> "file:///path/to/your/jar.jar") {code}
>  
>  * The jar loaded here can be any jar, and the following message will appear:
> {code:java}
> Traceback (most recent call last):
>   File "pyflink_table_api_firehose.py", line 48, in 
> log_processing()
>   File "pyflink_table_api_firehose.py", line 14, in log_processing
> t_env.get_config().set("pipeline.classpaths", 
> "file:///home/YOUR_USER/pyflink-table-api/flink/flink-connectors/flink-sql-connector-aws-kinesis-firehose/target/flink-sql-connector-aws-kinesis-firehose-1.16-SNAPSHOT.jar")
>   File 
> "/home/YOUR_USER/.local/lib/python3.7/site-packages/pyflink/table/table_config.py",
>  line 109, in set
> add_jars_to_context_class_loader(value.split(";"))
>   File 
> "/home/YOUR_USER/.local/lib/python3.7/site-packages/pyflink/util/java_utils.py",
>  line 169, in add_jars_to_context_class_loader
> addURL.invoke(loader, to_jarray(get_gateway().jvm.Object, [url]))
>   File 
> "/home/YOUR_USER/.local/lib/python3.7/site-packages/py4j/java_gateway.py", 
> line 1322, in __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File 
> "/home/YOUR_USER/.local/lib/python3.7/site-packages/pyflink/util/exceptions.py",
>  line 146, in deco
> return f(*a, **kw)
>   File "/home/YOUR_USER/.local/lib/python3.7/site-packages/py4j/protocol.py", 
> line 328, in get_return_value
> format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling o45.invoke.
> : java.lang.IllegalArgumentException: object is not an instance of declaring 
> class
>at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
>at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
>at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
>at java.base/java.lang.Thread.run(Thread.java:829) {code}
>  
>  * Next do:
> {code:java}
> pip uninstall apache-flink
> pip install apache-flink{code}
> ...to downgrade to the 1.15 release.
> The loading of the jar should be successful. 

[jira] [Assigned] (FLINK-28182) Support Avro generic record decoder in PyFlink

2022-06-21 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-28182:
---

Assignee: Juntao Hu

> Support Avro generic record decoder in PyFlink
> --
>
> Key: FLINK-28182
> URL: https://issues.apache.org/jira/browse/FLINK-28182
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Affects Versions: 1.15.0
>Reporter: Juntao Hu
>Assignee: Juntao Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> An Avro generic record decoder is useful for formats like parquet-avro; it 
> enables PyFlink users to read Parquet files into Python native objects with 
> a given Avro schema.
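
For context, a sketch of the Java-side building blocks such a decoder can wrap 
(the schema and path here are placeholders; the actual PyFlink API surface 
added by this ticket is not shown in this thread):

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetReaders;

public class GenericRecordSourceSketch {
    // Builds a source that decodes parquet-avro files into Avro GenericRecords;
    // the PyFlink decoder exposes the same records as Python-side objects.
    public static FileSource<GenericRecord> build(Schema schema, String path) {
        return FileSource.forRecordStreamFormat(
                        AvroParquetReaders.forGenericRecord(schema), new Path(path))
                .build();
    }
}
{code}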



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-28182) Support Avro generic record decoder in PyFlink

2022-06-21 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-28182:

Affects Version/s: (was: 1.15.0)

> Support Avro generic record decoder in PyFlink
> --
>
> Key: FLINK-28182
> URL: https://issues.apache.org/jira/browse/FLINK-28182
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Python
>Reporter: Juntao Hu
>Assignee: Juntao Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> An Avro generic record decoder is useful for formats like parquet-avro; it 
> enables PyFlink users to read Parquet files into Python native objects with 
> a given Avro schema.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink] masteryhx commented on a diff in pull request #19679: [FLINK-23143][state/changelog] Support state migration for ChangelogS…

2022-06-21 Thread GitBox


masteryhx commented on code in PR #19679:
URL: https://github.com/apache/flink/pull/19679#discussion_r903278313


##
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java:
##
@@ -477,6 +478,50 @@ KeyGroupedInternalPriorityQueue create(
 }
 }
 
+@Override
+public <N, S extends State, SV> S upgradeKeyedState(
+        TypeSerializer<N> namespaceSerializer, StateDescriptor<S, SV> stateDescriptor)
+        throws Exception {
+    StateFactory stateFactory = getStateFactory(stateDescriptor);
+    Tuple2<ColumnFamilyHandle, RegisteredKeyValueStateBackendMetaInfo<N, SV>> registerResult =
+            tryRegisterKvStateInformation(stateDescriptor, namespaceSerializer, noTransform());
+    Preconditions.checkState(kvStateInformation.containsKey(stateDescriptor.getName()));
+    kvStateInformation.computeIfPresent(
+            stateDescriptor.getName(),
+            (stateName, kvStateInfo) ->
+                    new RocksDbKvStateInfo(
+                            kvStateInfo.columnFamilyHandle,
+                            new RegisteredKeyValueStateBackendMetaInfo<>(
+                                    kvStateInfo.metaInfo.snapshot())));
+    return stateFactory.createState(
+            stateDescriptor, registerResult, RocksDBKeyedStateBackend.this);

Review Comment:
   Thanks a lot for the suggestion! I also think the 2nd option seems better.
   Besides this problem, it could also solve the next problem you mentioned 
below.
   What concerned me in the next problem is how to update the serializer when 
one serializer is COMPATIBLE_AS_IS with the other.
   I will try to modify it using the 2nd option.
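
   For reference, the COMPATIBLE_AS_IS case is resolved through Flink's 
standard serializer-compatibility API; a minimal sketch, with the surrounding 
state-backend wiring assumed:

```java
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.api.common.typeutils.TypeSerializerSchemaCompatibility;
import org.apache.flink.api.common.typeutils.TypeSerializerSnapshot;

public final class SerializerUpgradeSketch {
    // Decide which serializer to keep when re-registering state against a
    // previously snapshotted serializer.
    static <T> TypeSerializer<T> resolve(
            TypeSerializerSnapshot<T> restoredSnapshot, TypeSerializer<T> newSerializer) {
        TypeSerializerSchemaCompatibility<T> compat =
                restoredSnapshot.resolveSchemaCompatibility(newSerializer);
        if (compat.isCompatibleAsIs()) {
            return newSerializer; // no migration needed
        }
        if (compat.isCompatibleAfterMigration()) {
            return newSerializer; // existing state must be rewritten first
        }
        if (compat.isCompatibleWithReconfiguredSerializer()) {
            return compat.getReconfiguredSerializer();
        }
        throw new IllegalStateException("Incompatible serializer change");
    }
}
```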



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-table-store] complone commented on pull request #161: [FLINK-27845] support Column Type Schema Convert

2022-06-21 Thread GitBox


complone commented on PR #161:
URL: 
https://github.com/apache/flink-table-store/pull/161#issuecomment-1162636161

   > Hi @complone, before submitting a PR, it is best to discuss clearly in 
   > JIRA what needs to be done and what the general approach is. I will close 
   > this; if we (you and I) make further progress, we will continue working 
   > on the code.
   
   I will try to sort out some source code analysis articles about 
flink-table-store and discuss it later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-28179) LeafPredicate accepts multiple literals

2022-06-21 Thread Jingsong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee closed FLINK-28179.

Resolution: Fixed

master: cceca50c778a0c450f959d279c69f98a5960ad3f

> LeafPredicate accepts multiple literals
> ---
>
> Key: FLINK-28179
> URL: https://issues.apache.org/jira/browse/FLINK-28179
> Project: Flink
>  Issue Type: Improvement
>  Components: Table Store
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: table-store-0.2.0
>
>
> LeafPredicate should accept multiple literals for `IN` and other functions.
> So in this Jira, do some refactoring:
>  * Remove the Literal class (whose value is never null), since the literal 
> can be null in `IN`.
>  * Introduce LeafFunction, LeafBinaryFunction, and LeafUnaryFunction.
>  * PredicateBuilder should no longer be a singleton class.
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink-table-store] JingsongLi merged pull request #168: [FLINK-28179] LeafPredicate accepts multiple literals

2022-06-21 Thread GitBox


JingsongLi merged PR #168:
URL: https://github.com/apache/flink-table-store/pull/168


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-28189) The taskmanager dynamically created by flink native k8s creates pods directly, why not create a deployment?

2022-06-21 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557217#comment-17557217
 ] 

Yang Wang commented on FLINK-28189:
---

BTW, the JIRA ticket is not a good place to ask such questions. You could join 
the Flink Slack workspace or post it to the user mailing list.

> The taskmanager dynamically created by flink native k8s creates pods 
> directly, why not create a deployment?
> ---
>
> Key: FLINK-28189
> URL: https://issues.apache.org/jira/browse/FLINK-28189
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.14.4
>Reporter: hob007
>Priority: Major
>
> I am using the Flink native K8s deployment. The pods are dynamically 
> created, but there is a problem with K8s cluster monitoring: I can't monitor 
> the TaskManager pods' status, because these pods have no deployment.
>  
> So my question is:
> The taskmanager dynamically created by flink native k8s creates pods 
> directly, why not create a deployment?
>  
> For example:
> Define a deployment and using replicates to control taskmanager's pods.
>  
> Chinese description reference:
> https://issues.apache.org/jira/browse/FLINK-28167



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-28189) The taskmanager dynamically created by flink native k8s creates pods directly, why not create a deployment?

2022-06-21 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557216#comment-17557216
 ] 

Yang Wang commented on FLINK-28189:
---

First, I believe you can also monitor a pod's status when it is created by the 
Flink ResourceManager rather than by a deployment. The monitoring should be 
independent of how the pods are created.

 

Using naked pods has more advantages than a K8s deployment for TaskManagers:
 * Flink can easily delete a specific naked TaskManager pod, which is more 
difficult with a K8s deployment. Imagine that some TaskManager pods in a 
session cluster are idle and we want to release them.
 * Flink currently cannot work normally if TaskManager pods are allowed to 
restart. For example, a TaskManager with the same hostname:port will fail to 
register with the ResourceManager more than once.
 * For fine-grained resource management, we might have various TaskManager pods 
with different specs (2G, 4G, etc.).

> The taskmanager dynamically created by flink native k8s creates pods 
> directly, why not create a deployment?
> ---
>
> Key: FLINK-28189
> URL: https://issues.apache.org/jira/browse/FLINK-28189
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.14.4
>Reporter: hob007
>Priority: Major
>
> I am using the Flink native K8s deployment. The pods are dynamically 
> created, but there is a problem with K8s cluster monitoring: I can't monitor 
> the TaskManager pods' status, because these pods have no deployment.
>  
> So my question is:
> The taskmanager dynamically created by flink native k8s creates pods 
> directly, why not create a deployment?
>  
> For example:
> Define a deployment and using replicates to control taskmanager's pods.
>  
> Chinese description reference:
> https://issues.apache.org/jira/browse/FLINK-28167



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-28191) [flink-runtime] Achieve metrics summary and unified active reporting

2022-06-21 Thread fengjk (Jira)
fengjk created FLINK-28191:
--

 Summary: [flink-runtime] Achieve metrics summary and unified active 
reporting
 Key: FLINK-28191
 URL: https://issues.apache.org/jira/browse/FLINK-28191
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Metrics
Affects Versions: 1.14.3
 Environment: Flink version: 1.11.1, 1.14.3 

Java version : 8
Reporter: fengjk


Currently we use the flink-metrics-http plugin to report metrics, which has 
the following problems:
 # Different components invoke the reporting interface independently and in 
parallel, calling the interface frequently.
 # Data can only be aggregated downstream.
 # The number of downstream data rows is large, consuming storage resources.

 

Plan: metrics data for all components is summarized and sent downstream.
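
As an illustration of the plan, a hedged sketch of a reporter that batches all 
registered metrics into a single periodic payload (sendBatch() is a 
hypothetical hook for the downstream HTTP endpoint, not an existing Flink API):

{code:java}
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.Metric;
import org.apache.flink.metrics.MetricConfig;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.reporter.MetricReporter;
import org.apache.flink.metrics.reporter.Scheduled;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AggregatingHttpReporter implements MetricReporter, Scheduled {
    private final Map<String, Metric> metrics = new ConcurrentHashMap<>();

    @Override public void open(MetricConfig config) {}
    @Override public void close() {}

    @Override
    public void notifyOfAddedMetric(Metric metric, String name, MetricGroup group) {
        metrics.put(group.getMetricIdentifier(name), metric);
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String name, MetricGroup group) {
        metrics.remove(group.getMetricIdentifier(name));
    }

    @Override
    public void report() {
        // Summarize all components' metrics into one payload per interval.
        StringBuilder batch = new StringBuilder();
        metrics.forEach((id, metric) -> {
            if (metric instanceof Counter) {
                batch.append(id).append('=').append(((Counter) metric).getCount()).append('\n');
            } else if (metric instanceof Gauge) {
                batch.append(id).append('=').append(((Gauge<?>) metric).getValue()).append('\n');
            }
        });
        sendBatch(batch.toString()); // hypothetical: one POST downstream
    }

    private void sendBatch(String payload) { /* send to the HTTP endpoint */ }
}
{code}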



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-28190) NullPointerException is thrown if the intermediate result of nesting UDFs is used

2022-06-21 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-28190:
---

 Summary: NullPointerException is thrown if the intermediate result 
of nesting UDFs is used
 Key: FLINK-28190
 URL: https://issues.apache.org/jira/browse/FLINK-28190
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.14.5, 1.15.0
Reporter: Caizhi Weng


Add the following test case to {{TableEnvironmentITCase}} to reproduce this bug.

{code:scala}
@Test
def myTest(): Unit = {
  tEnv.executeSql("create temporary function myfun1 as 'MyFun1'")
  tEnv.executeSql("create temporary function myfun2 as 'MyFun2'")

  val data: Seq[Row] = Seq(
Row.of("Hi", "Hello")
  )
  tEnv.executeSql(
s"""
  |create table T (
  |  a string,
  |  b string
  |) with (
  |  'connector' = 'values',
  |  'data-id' = '${TestValuesTableFactory.registerData(data)}',
  |  'bounded' = 'true'
  |)
  |""".stripMargin)
  tEnv.executeSql("create temporary view my_view as select myfun1(a, b) as mp 
from T")
  tEnv.executeSql("select myfun2(mp), mp['Hi'] from my_view").print()
}
{code}

UDF classes are
{code:java}
import org.apache.flink.table.functions.ScalarFunction;

import java.util.HashMap;
import java.util.Map;

public class MyFun1 extends ScalarFunction {

    public Map<String, String> eval(String k, String v) {
        Map<String, String> returnMap = new HashMap<>();
        returnMap.put(k, v);
        return returnMap;
    }
}
{code}

{code:java}
import org.apache.flink.table.functions.ScalarFunction;

import java.util.Map;

public class MyFun2 extends ScalarFunction {

    public String eval(Map<String, String> input) {
        return String.valueOf(input);
    }
}
{code}

The exception stack is
{code}
Caused by: java.lang.NullPointerException
at StreamExecCalc$25.processElement_split1(Unknown Source)
at StreamExecCalc$25.processElement(Unknown Source)
at 
org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:99)
at 
org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:80)
at 
org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:418)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:513)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$SwitchingOnClose.collect(StreamSourceContexts.java:103)
at 
org.apache.flink.table.planner.factories.TestValuesRuntimeFunctions$FromElementSourceFunction.run(TestValuesRuntimeFunctions.java:530)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:332)
{code}

The generated code is
{code}
public class ToBinary$0 implements 
org.apache.flink.table.runtime.generated.Projection {

  org.apache.flink.table.data.binary.BinaryRowData out = new 
org.apache.flink.table.data.binary.BinaryRowData(2);
org.apache.flink.table.data.writer.BinaryRowWriter outWriter = new 
org.apache.flink.table.data.writer.BinaryRowWriter(out);

  public ToBinary$0(Object[] references) throws Exception {

  }

  @Override
  public org.apache.flink.table.data.binary.BinaryRowData 
apply(org.apache.flink.table.data.RowData in1) {

if (in1 instanceof org.apache.flink.table.data.binary.BinaryRowData) {
  return ((org.apache.flink.table.data.binary.BinaryRowData) in1);
}

innerApply(in1);
return out;
  }

  /* Fit into JavaCodeSplitter's void function limitation. */
  private void innerApply(org.apache.flink.table.data.RowData in1) {


outWriter.reset();


if (in1.isNullAt(0)) {
  outWriter.setNullAt(0);
} else {
  outWriter.writeString(0, 
((org.apache.flink.table.data.binary.BinaryStringData) in1.getString(0)));
}
 


if (in1.isNullAt(1)) {
  outWriter.setNullAt(1);
} else {
  outWriter.writeString(1, 
((org.apache.flink.table.data.binary.BinaryStringData) in1.getString(1)));
}
 
outWriter.complete();
out.setRowKind(in1.getRowKind());
  }
}


public class ToBinary$1 implements 
org.apache.flink.table.runtime.generated.Projection {

  org.apache.flink.table.data.binary.BinaryRowData out = new 
org.apache.flink.table.data.binary.BinaryRowData(2);
org.apache.flink.table.data.writer.BinaryRowWriter outWriter = new 

[jira] [Commented] (FLINK-27832) SplitAggregateITCase tests failed with Could not acquire the minimum required resources

2022-06-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557207#comment-17557207
 ] 

Zhu Zhu commented on FLINK-27832:
-

https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_apis/build/builds/37010/logs/104

> SplitAggregateITCase tests failed with Could not acquire the minimum required 
> resources
> ---
>
> Key: FLINK-27832
> URL: https://issues.apache.org/jira/browse/FLINK-27832
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.16.0
>Reporter: Huang Xingbo
>Assignee: godfrey he
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> 2022-05-27T17:42:59.4814999Z May 27 17:42:59 [ERROR] Tests run: 64, Failures: 
> 23, Errors: 1, Skipped: 0, Time elapsed: 305.5 s <<< FAILURE! - in 
> org.apache.flink.table.planner.runtime.stream.sql.SplitAggregateITCase
> 2022-05-27T17:42:59.4815983Z May 27 17:42:59 [ERROR] 
> SplitAggregateITCase.testAggWithJoin  Time elapsed: 278.742 s  <<< ERROR!
> 2022-05-27T17:42:59.4816608Z May 27 17:42:59 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2022-05-27T17:42:59.4819182Z May 27 17:42:59  at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> 2022-05-27T17:42:59.4820363Z May 27 17:42:59  at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> 2022-05-27T17:42:59.4821463Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> 2022-05-27T17:42:59.4822292Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> 2022-05-27T17:42:59.4823317Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2022-05-27T17:42:59.4824210Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2022-05-27T17:42:59.4825081Z May 27 17:42:59  at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:268)
> 2022-05-27T17:42:59.4825927Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2022-05-27T17:42:59.4826748Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2022-05-27T17:42:59.4827596Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2022-05-27T17:42:59.4828416Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2022-05-27T17:42:59.4829284Z May 27 17:42:59  at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1277)
> 2022-05-27T17:42:59.4830111Z May 27 17:42:59  at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
> 2022-05-27T17:42:59.4831015Z May 27 17:42:59  at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> 2022-05-27T17:42:59.483Z May 27 17:42:59  at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> 2022-05-27T17:42:59.4833162Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2022-05-27T17:42:59.4834250Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2022-05-27T17:42:59.4835236Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2022-05-27T17:42:59.4836035Z May 27 17:42:59  at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> 2022-05-27T17:42:59.4836872Z May 27 17:42:59  at 
> org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
> 2022-05-27T17:42:59.4837630Z May 27 17:42:59  at 
> akka.dispatch.OnComplete.internal(Future.scala:300)
> 2022-05-27T17:42:59.4838394Z May 27 17:42:59  at 
> akka.dispatch.OnComplete.internal(Future.scala:297)
> 2022-05-27T17:42:59.4839044Z May 27 17:42:59  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
> 2022-05-27T17:42:59.4839748Z May 27 17:42:59  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
> 2022-05-27T17:42:59.4840463Z May 27 17:42:59  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
> 2022-05-27T17:42:59.4841313Z May 27 17:42:59  at 
> 

[GitHub] [flink-web] PatrickRen merged pull request #551: Add Qingsheng to the committer list

2022-06-21 Thread GitBox


PatrickRen merged PR #551:
URL: https://github.com/apache/flink-web/pull/551


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] cason0126 commented on pull request #19964: [FLINK-27291][docs-zh] Translate the "List of Data Types" section of…

2022-06-21 Thread GitBox


cason0126 commented on PR #19964:
URL: https://github.com/apache/flink/pull/19964#issuecomment-1162569104

   > translating '(both inclusive)' will catch the Chinese reader's attention 
   > more than just using '[' and ']'.
   
   I agree with that. It catches the reader's attention.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-26372) Allow to configure Changelog Storage per program

2022-06-21 Thread Yanfei Lei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557193#comment-17557193
 ] 

Yanfei Lei commented on FLINK-26372:


Hi [~Feifan Wang], I have a tentative suggestion: maybe you can consider 
updating the constant {{ENABLE_CHANGE_LOG_STATE_BACKEND}} along the way? See 
this [discussion|https://github.com/apache/flink/pull/19907#discussion_r902485780] 
for more details. 
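
For context, the per-job toggle that exists today is just a boolean (a sketch; 
the per-program changelog *storage* settings requested by this ticket have no 
such API yet):

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChangelogToggleSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Overrides state.backend.changelog.enabled for this job only.
        env.enableChangelogStateBackend(true);
    }
}
{code}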

> Allow to configure Changelog Storage per program
> 
>
> Key: FLINK-26372
> URL: https://issues.apache.org/jira/browse/FLINK-26372
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration, Runtime / State Backends
>Affects Versions: 1.15.0
>Reporter: Roman Khachatryan
>Assignee: Feifan Wang
>Priority: Major
> Fix For: 1.16.0
>
>
> It's currently possible to override state.backend.changelog.enabled per job, 
> but it's not possible to override Changelog storage settings (i.e. writer 
> settings).
> There should be 1) an API and 2) runtime support for that.
> See this 
> [discussion|https://github.com/apache/flink/pull/16341#discussion_r663749681] 
> and the corresponding 
> [TODO|https://github.com/apache/flink/pull/16341/files#diff-2c21555dcab689ec27c0ab981852a2bfa787695fb2fe04b24c22b89c63d98b73R680].
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink-table-store] LadyForest commented on a diff in pull request #168: [FLINK-28179] LeafPredicate accepts multiple literals

2022-06-21 Thread GitBox


LadyForest commented on code in PR #168:
URL: https://github.com/apache/flink-table-store/pull/168#discussion_r903225265


##
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/predicate/LeafPredicate.java:
##
@@ -18,70 +18,107 @@
 
 package org.apache.flink.table.store.file.predicate;
 
+import org.apache.flink.api.common.typeutils.base.ListSerializer;
+import org.apache.flink.api.java.typeutils.runtime.NullableSerializer;
+import org.apache.flink.core.memory.DataInputViewStreamWrapper;
+import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
+import org.apache.flink.table.runtime.typeutils.InternalSerializers;
 import org.apache.flink.table.store.file.stats.FieldStats;
+import org.apache.flink.table.types.logical.LogicalType;
 
-import java.io.Serializable;
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.util.List;
 import java.util.Objects;
 import java.util.Optional;
 
 /** Leaf node of a {@link Predicate} tree. Compares a field in the row with an 
{@link Literal}. */

Review Comment:
   ```suggestion
   /** Leaf node of a {@link Predicate} tree. Compares a field in the row with 
literals. */
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-19358) when submit job on application mode with HA,the jobid will be 0000000000

2022-06-21 Thread Yangze Guo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557192#comment-17557192
 ] 

Yangze Guo commented on FLINK-19358:


Hi there. I'd like to revive this topic for FLIP-241. In the current 
implementation, the constant {{jobId}} will confuse the HistoryServer because 
it assumes the JobID is unique.

After an offline discussion with [~wangyang0918], we propose to generate a 
random JobID at the Flink client when users do not configure one in HA mode. In 
this way:
- we keep the consistency of the JobID across failovers;
- we do not need to introduce special cases in the recovery process;
- we do not have potential conflicts with the existing design for supporting 
multi-execution applications.
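
A minimal sketch of the proposed client-side resolution 
(PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID is used here purely to 
illustrate where a user-configured id would come from):

{code:java}
import org.apache.flink.api.common.JobID;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.PipelineOptionsInternal;

public final class ClientJobIdSketch {
    // Honor an explicitly configured job id; otherwise generate a random one
    // on the client instead of falling back to the all-zero constant.
    static JobID resolve(Configuration conf) {
        return conf.getOptional(PipelineOptionsInternal.PIPELINE_FIXED_JOB_ID)
                .map(JobID::fromHexString)
                .orElseGet(JobID::new);
    }
}
{code}

Because the id is fixed on the client before submission, it stays the same 
across JobManager failovers.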


> when submit job on application mode with HA,the jobid will be 00
> 
>
> Key: FLINK-19358
> URL: https://issues.apache.org/jira/browse/FLINK-19358
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Jun Zhang
>Assignee: Yangze Guo
>Priority: Minor
>  Labels: auto-deprioritized-major, usability
>
> When submitting a Flink job in application mode with HA, the Flink job ID 
> will be all zeros. When I have many jobs, they all have the same job ID, 
> which will lead to a checkpoint error.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #167: [FLINK-28129] Add documentation for rescale bucket

2022-06-21 Thread GitBox


JingsongLi commented on code in PR #167:
URL: https://github.com/apache/flink-table-store/pull/167#discussion_r903221176


##
docs/content/docs/development/scale-bucket.md:
##
@@ -0,0 +1,116 @@
+---
+title: "Scale Bucket"

Review Comment:
   Why is scale instead of rescale?



##
docs/content/docs/development/scale-bucket.md:
##
@@ -0,0 +1,116 @@
+---
+title: "Scale Bucket"
+weight: 5
+type: docs
+aliases:
+- /development/scale-bucket.html
+---
+
+
+# Scale Bucket
+
+Since the number of total buckets dramatically influences the performance, 
Table Store allows users to 
+tune bucket numbers by `ALTER TABLE` command and reorganize data layout by 
`INSERT OVERWRITE` 
+without recreating the table/partition. When executing overwrite jobs, the 
framework will automatically 
+scan the data with the bucket number recorded in manifest file and hash the 
record according to the current bucket numbers.

Review Comment:
   `manifest file` is an internal concept.
   Here we can simply say: reorganize the data according to the latest bucket 
number.
   (number instead of numbers?)



##
docs/content/docs/development/scale-bucket.md:
##
@@ -0,0 +1,116 @@
+---
+title: "Scale Bucket"
+weight: 5
+type: docs
+aliases:
+- /development/scale-bucket.html
+---
+
+
+# Scale Bucket
+
+Since the number of total buckets dramatically influences the performance, 
Table Store allows users to 
+tune bucket numbers by `ALTER TABLE` command and reorganize data layout by 
`INSERT OVERWRITE` 
+without recreating the table/partition. When executing overwrite jobs, the 
framework will automatically 
+scan the data with the bucket number recorded in manifest file and hash the 
record according to the current bucket numbers.
+
+## Rescale Overwrite
+```sql
+-- scale number of total buckets
+ALTER TABLE table_identifier SET ('bucket' = '...')
+
+-- reorganize data layout of table/partition
+INSERT OVERWRITE table_identifier [PARTITION (part_spec)]
+SELECT ... 
+FROM table_identifier
+[WHERE part_spec]
+``` 
+
+Please note that
+- `ALTER TABLE` only modifies the table's metadata and will **NOT** reorganize 
or reformat existing data. 
+  Reorganizing existing data must be achieved by `INSERT OVERWRITE`.
+- Scaling the bucket number does not influence read jobs or running write jobs.
+- Once the bucket number is changed, any new `INSERT INTO` job that does not 
reorganize the table/partition 
+  will throw a `TableException` with a message like 
+  ```text
+  Try to write table/partition ... with a new bucket num ..., 
+  but the previous bucket num is ... Please switch to batch mode, 
+  and perform INSERT OVERWRITE to rescale current data layout first.
+  ```
+
+
+{{< hint info >}}
+__Note:__ For the table which enables log system(*e.g.* Kafka), please scale 
the topic's partition as well to keep consistency.
+{{< /hint >}}
+
+## Use Case
+
+Suppose there is a daily streaming ETL task to sync transaction data. The 
table's DDL and pipeline
+are listed as follows.
+
+```sql
+-- table DDL
+CREATE TABLE verified_orders (
+trade_order_id BIGINT,
+item_id BIGINT,
+item_price DOUBLE,
+dt STRING,
+PRIMARY KEY (dt, trade_order_id, item_id) NOT ENFORCED 
+) PARTITIONED BY (dt)
+WITH (
+'bucket' = '16'
+);
+
+-- streaming insert as bucket num = 16
+INSERT INTO verified_orders
+SELECT trade_order_id,
+   item_id,
+   item_price,
+   DATE_FORMAT(gmt_create, 'yyyy-MM-dd') AS dt
+FROM raw_orders
+WHERE order_status = 'verified'
+```
+The pipeline has been running well for the past four weeks. However, the data 
volume has grown fast recently, 
+and the job's latency keeps increasing. A possible workaround is to create a 
new table with a larger bucket number 
+(thus the parallelism can be increased accordingly) and sync data to this new 
table.
+
+However, there is a better solution with four steps.
+
+- First, suspend the streaming job with savepoint.

Review Comment:
   Can we be more detailed? For example, what is the savepoint API for 
TableEnvironment? Is there a link to the corresponding Flink documentation?
   Even we developers have a hard time figuring out how to take a savepoint 
and do the corresponding restore.



##
docs/content/docs/development/scale-bucket.md:
##
@@ -0,0 +1,116 @@
+---
+title: "Scale Bucket"
+weight: 5
+type: docs
+aliases:
+- /development/scale-bucket.html
+---
+
+
+# Scale Bucket
+
+Since the number of total buckets dramatically influences the performance, 
Table Store allows users to 
+tune bucket numbers by `ALTER TABLE` command and reorganize data layout by 
`INSERT OVERWRITE` 
+without recreating the table/partition. When executing overwrite jobs, the 
framework will automatically 
+scan the data with the bucket number recorded in manifest file and hash the 
record according to the current bucket numbers.
+
+## Rescale Overwrite
+```sql
+-- scale number of total buckets
+ALTER TABLE table_identifier SET ('bucket' = '...')
+
+-- reorganize data 

[GitHub] [flink-web] HuangXingBo closed pull request #550: Release Flink 1.14.5

2022-06-21 Thread GitBox


HuangXingBo closed pull request #550: Release Flink 1.14.5
URL: https://github.com/apache/flink-web/pull/550


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-28173) Multiple Parquet format tests are failing with NoSuchMethodError

2022-06-21 Thread jia liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557183#comment-17557183
 ] 

jia liu edited comment on FLINK-28173 at 6/22/22 2:26 AM:
--

[~martijnvisser] Please assign this ticket to me.
I will add `hadoop3-tests` profile to cover default dependencies with guava 
exclusion.


was (Author: sonice_lj):
[~martijnvisser] Please assign this ticket to me.
I will add `hadoop3-tests` profile to cover default dependencies with guava 
exlclusion.

> Multiple Parquet format tests are failing with NoSuchMethodError
> 
>
> Key: FLINK-28173
> URL: https://issues.apache.org/jira/browse/FLINK-28173
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> Jun 21 02:44:38 java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.readFields(Configuration.java:3798)
> Jun 21 02:44:38   at 
> org.apache.flink.formats.parquet.utils.SerializableConfiguration.readObject(SerializableConfiguration.java:50)
> Jun 21 02:44:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 21 02:44:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> {code:java}
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testProject
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testReachEnd
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateGenericReader:161->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateReflectReader:133->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateSpecificReader:118->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReadWithRestoreGenericReader:203->restoreReader:293
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReflectReadFromGenericRecords:147->createReader:269
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   ParquetRowDataWriterTest.testCompression:126 » 
> NoSuchMethod com.google.common
> Jun 21 02:44:42 [ERROR]   
> ParquetRowDataWriterTest.testTypes:117->innerTest:168 » NoSuchMethod 
> com.googl...
> Jun 21 02:44:42 [ERROR]   SerializableConfigurationTest.testResource:45 » 
> NoSuchMethod com.google.common...
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] Tests run: 31, Failures: 0, Errors: 24, Skipped: 0
> Jun 21 02:44:42 [INFO] 
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36979=logs=7e3d33c3-a462-5ea8-98b8-27e1aafe4ceb=ef77f8d1-44c8-5ee2-f175-1c88f61de8c0=16375



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (FLINK-19358) when submit job on application mode with HA,the jobid will be 0000000000

2022-06-21 Thread Yangze Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yangze Guo reassigned FLINK-19358:
--

Assignee: Yangze Guo

> when submit job on application mode with HA,the jobid will be 00
> 
>
> Key: FLINK-19358
> URL: https://issues.apache.org/jira/browse/FLINK-19358
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0
>Reporter: Jun Zhang
>Assignee: Yangze Guo
>Priority: Minor
>  Labels: auto-deprioritized-major, usability
>
> When submitting a Flink job in application mode with HA, the Flink job ID 
> will be all zeros. When I have many jobs, they all have the same job ID, 
> which will lead to a checkpoint error.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink-table-store] JingsongLi merged pull request #169: [hotfix] Fix SqlParserEOFException in AlterTableCompactITCase

2022-06-21 Thread GitBox


JingsongLi merged PR #169:
URL: https://github.com/apache/flink-table-store/pull/169


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-28173) Multiple Parquet format tests are failing with NoSuchMethodError

2022-06-21 Thread jia liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557183#comment-17557183
 ] 

jia liu edited comment on FLINK-28173 at 6/22/22 2:08 AM:
--

[~martijnvisser] Please assign this ticket to me.
I will add `hadoop3-tests` profile to cover default dependencies with guava 
exlclusion.


was (Author: sonice_lj):
[~martijnvisser] Please assign this ticket to me.
I will add `hadoop3-tests` profile to cover default dependencies with latest 
guava version.

> Multiple Parquet format tests are failing with NoSuchMethodError
> 
>
> Key: FLINK-28173
> URL: https://issues.apache.org/jira/browse/FLINK-28173
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> Jun 21 02:44:38 java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.readFields(Configuration.java:3798)
> Jun 21 02:44:38   at 
> org.apache.flink.formats.parquet.utils.SerializableConfiguration.readObject(SerializableConfiguration.java:50)
> Jun 21 02:44:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 21 02:44:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> {code:java}
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testProject
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testReachEnd
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateGenericReader:161->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateReflectReader:133->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateSpecificReader:118->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReadWithRestoreGenericReader:203->restoreReader:293
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReflectReadFromGenericRecords:147->createReader:269
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   ParquetRowDataWriterTest.testCompression:126 » 
> NoSuchMethod com.google.common
> Jun 21 02:44:42 [ERROR]   
> ParquetRowDataWriterTest.testTypes:117->innerTest:168 » NoSuchMethod 
> com.googl...
> Jun 21 02:44:42 [ERROR]   SerializableConfigurationTest.testResource:45 » 
> NoSuchMethod com.google.common...
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] Tests run: 31, Failures: 0, Errors: 24, Skipped: 0
> Jun 21 02:44:42 [INFO] 
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36979=logs=7e3d33c3-a462-5ea8-98b8-27e1aafe4ceb=ef77f8d1-44c8-5ee2-f175-1c88f61de8c0=16375



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink-table-store] LadyForest closed pull request #167: [FLINK-28129] Add documentation for rescale bucket

2022-06-21 Thread GitBox


LadyForest closed pull request #167: [FLINK-28129] Add documentation for 
rescale bucket
URL: https://github.com/apache/flink-table-store/pull/167


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-28173) Multiple Parquet format tests are failing with NoSuchMethodError

2022-06-21 Thread jia liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557183#comment-17557183
 ] 

jia liu commented on FLINK-28173:
-

[~martijnvisser] Please assign this ticket to me.
I will add `hadoop3-tests` profile to cover default dependencies with latest 
guava version.

> Multiple Parquet format tests are failing with NoSuchMethodError
> 
>
> Key: FLINK-28173
> URL: https://issues.apache.org/jira/browse/FLINK-28173
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> Jun 21 02:44:38 java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.readFields(Configuration.java:3798)
> Jun 21 02:44:38   at 
> org.apache.flink.formats.parquet.utils.SerializableConfiguration.readObject(SerializableConfiguration.java:50)
> Jun 21 02:44:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 21 02:44:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> {code:java}
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testProject
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testReachEnd
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateGenericReader:161->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateReflectReader:133->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateSpecificReader:118->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReadWithRestoreGenericReader:203->restoreReader:293
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReflectReadFromGenericRecords:147->createReader:269
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   ParquetRowDataWriterTest.testCompression:126 » 
> NoSuchMethod com.google.common
> Jun 21 02:44:42 [ERROR]   
> ParquetRowDataWriterTest.testTypes:117->innerTest:168 » NoSuchMethod 
> com.googl...
> Jun 21 02:44:42 [ERROR]   SerializableConfigurationTest.testResource:45 » 
> NoSuchMethod com.google.common...
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] Tests run: 31, Failures: 0, Errors: 24, Skipped: 0
> Jun 21 02:44:42 [INFO] 
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36979&view=logs&j=7e3d33c3-a462-5ea8-98b8-27e1aafe4ceb&t=ef77f8d1-44c8-5ee2-f175-1c88f61de8c0&l=16375



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-28053) Introduce queue to execute request in sequence

2022-06-21 Thread Shengkai Fang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shengkai Fang updated FLINK-28053:
--
Description: 
There are two kinds of parallelism we need to consider:

1. Parallelism among all the operations of the same session. For example, a 
user may submit one SQL statement to create a table and, in parallel, another 
that modifies a different table's schema.
2. Parallelism around the Operation itself. It is possible that one thread is 
reading data from the Operation while another closes the Operation in 
parallel. 

We may introduce a queue to execute these requests in sequence. This brings 
the benefit of simplifying the logic in the Operation and OperationManager and 
moving all locks into the handover, but it may cause a performance regression. 
Therefore, it's better to start this issue only after all other components are 
finished.
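As a rough illustration of the proposed design (class names are made up, not 
the actual gateway code): all requests of one session go through a 
single-threaded executor, which serializes them and removes the need for 
fine-grained locking inside the Operation.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative sketch: executes all operations of one session in order. */
class SessionOperationQueue {

    // A single worker thread means requests are handled strictly in sequence.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Submitting returns a Future, so callers are decoupled from execution. */
    <T> Future<T> submit(Callable<T> operation) {
        return worker.submit(operation);
    }

    void close() {
        worker.shutdownNow();
    }
}
{code}

The trade-off mentioned above follows directly: the queue gives ordering for 
free, but a single worker per session caps throughput, hence the possible 
performance regression.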





> Introduce queue to execute request in sequence
> --
>
> Key: FLINK-28053
> URL: https://issues.apache.org/jira/browse/FLINK-28053
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Gateway
>Affects Versions: 1.16.0
>Reporter: Shengkai Fang
>Assignee: Shengkai Fang
>Priority: Major
>
> There are two kinds of parallelism we need to consider:
> 1. Parallelism among all the operations of the same session. For example, a 
> user may submit one SQL statement to create a table and, in parallel, another 
> that modifies a different table's schema.
> 2. Parallelism around the Operation itself. It is possible that one thread is 
> reading data from the Operation while another closes the Operation in 
> parallel. 
> We may introduce a queue to execute these requests in sequence. This brings 
> the benefit of simplifying the logic in the Operation and OperationManager 
> and moving all locks into the handover, but it may cause a performance 
> regression. Therefore, it's better to start this issue only after all other 
> components are finished.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-28189) The taskmanager dynamically created by flink native k8s creates pods directly, why not create a deployment?

2022-06-21 Thread hob007 (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hob007 updated FLINK-28189:
---
Description: 
I am using a Flink native K8s deployment. The pods are created dynamically, 
but there is a problem with K8s cluster monitoring: I can't monitor the 
taskmanager pods' status, because these pods have no deployment.

 

So my question is:

The taskmanager dynamically created by flink native k8s creates pods directly, 
why not create a deployment?

 

For example:

Define a deployment and use replicas to control the taskmanager's pods.

 

Chinese description reference:

https://issues.apache.org/jira/browse/FLINK-28167

  was:
I am using Flink native k8s deployment. The pods is dynamically created. But 
there's some problem with K8S cluster monitor. I can't monitor taskmanager pods 
status, because these pod without deployment.

 

So my question is:

The taskmanager dynamically created by flink native k8s creates pods directly, 
why not create a deployment?

 

For example:

Define a deployment and using replicates to control taskmanager's pods.

 

Chinese description reference:

https://issues.apache.org/jira/browse/FLINK-28167


> The taskmanager dynamically created by flink native k8s creates pods 
> directly, why not create a deployment?
> ---
>
> Key: FLINK-28189
> URL: https://issues.apache.org/jira/browse/FLINK-28189
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.14.4
>Reporter: hob007
>Priority: Major
>
> I am using a Flink native K8s deployment. The pods are created dynamically, 
> but there is a problem with K8s cluster monitoring: I can't monitor the 
> taskmanager pods' status, because these pods have no deployment.
>  
> So my question is:
> The taskmanager dynamically created by flink native k8s creates pods 
> directly, why not create a deployment?
>  
> For example:
> Define a deployment and use replicas to control the taskmanager's pods.
>  
> Chinese description reference:
> https://issues.apache.org/jira/browse/FLINK-28167
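For illustration, the suggestion amounts to something like the following 
fabric8 kubernetes-client sketch (all names and labels are made up; this is 
not how Flink's native mode works today, which manages bare pods so the 
ResourceManager can add and remove individual TaskManagers):

{code:java}
import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.api.model.apps.DeploymentBuilder;

public class TaskManagerDeploymentSketch {
    public static void main(String[] args) {
        // Hypothetical: a Deployment whose replica count controls the
        // TaskManager pods, so Deployment-based K8s monitoring sees them.
        Deployment tmDeployment = new DeploymentBuilder()
                .withNewMetadata()
                    .withName("flink-taskmanager") // made-up name
                .endMetadata()
                .withNewSpec()
                    .withReplicas(3) // number of TaskManagers
                    .withNewSelector()
                        .addToMatchLabels("component", "taskmanager")
                    .endSelector()
                    .withNewTemplate()
                        .withNewMetadata()
                            .addToLabels("component", "taskmanager")
                        .endMetadata()
                        .withNewSpec()
                            .addNewContainer()
                                .withName("taskmanager")
                                .withImage("flink:1.14.4")
                                .withArgs("taskmanager")
                            .endContainer()
                        .endSpec()
                    .endTemplate()
                .endSpec()
                .build();
        System.out.println(tmDeployment.getMetadata().getName());
    }
}
{code}

One likely reason for the current design: with bare pods the JobManager decides 
the lifecycle of each TaskManager individually, whereas a Deployment's replica 
controller would immediately recreate any pod Flink releases.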



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-28189) The taskmanager dynamically created by flink native k8s creates pods directly, why not create a deployment?

2022-06-21 Thread hob007 (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hob007 updated FLINK-28189:
---
Description: 
I am using a Flink native K8s deployment. The pods are created dynamically, 
but there is a problem with K8s cluster monitoring: I can't monitor the 
taskmanager pods' status, because these pods have no deployment.

 

So my question is:

The taskmanager dynamically created by flink native k8s creates pods directly, 
why not create a deployment?

 

For example:

Define a deployment and use replicas to control the taskmanager's pods.

 

Chinese description reference:

https://issues.apache.org/jira/browse/FLINK-28167

  was:
I am using Flink native k8s deployment. The pods is dynamically created. But 
there's some problem with K8S cluster monitor. I can't monitor taskmanager pods 
status, because these pod without deployment.

 

So my question is:

The taskmanager dynamically created by flink native k8s creates pods directly, 
why not create a deployment?

 

For example:

Define a deployment and using replicates to control taskmanager's pods.


> The taskmanager dynamically created by flink native k8s creates pods 
> directly, why not create a deployment?
> ---
>
> Key: FLINK-28189
> URL: https://issues.apache.org/jira/browse/FLINK-28189
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.14.4
>Reporter: hob007
>Priority: Major
>
> I am using a Flink native K8s deployment. The pods are created dynamically, 
> but there is a problem with K8s cluster monitoring: I can't monitor the 
> taskmanager pods' status, because these pods have no deployment.
>  
> So my question is:
> The taskmanager dynamically created by flink native k8s creates pods 
> directly, why not create a deployment?
>  
> For example:
> Define a deployment and use replicas to control the taskmanager's pods.
>  
> Chinese description reference:
> https://issues.apache.org/jira/browse/FLINK-28167



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-28189) The taskmanager dynamically created by flink native k8s creates pods directly, why not create a deployment?

2022-06-21 Thread hob007 (Jira)
hob007 created FLINK-28189:
--

 Summary: The taskmanager dynamically created by flink native k8s 
creates pods directly, why not create a deployment?
 Key: FLINK-28189
 URL: https://issues.apache.org/jira/browse/FLINK-28189
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Affects Versions: 1.14.4
Reporter: hob007


I am using a Flink native K8s deployment. The pods are created dynamically, 
but there is a problem with K8s cluster monitoring: I can't monitor the 
taskmanager pods' status, because these pods have no deployment.

 

So my question is:

The taskmanager dynamically created by flink native k8s creates pods directly, 
why not create a deployment?

 

For example:

Define a deployment and use replicas to control the taskmanager's pods.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-28122) Translate "Overview " and "Project Configuration" in "User-defined Sources & Sinks" page

2022-06-21 Thread chenzihao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557180#comment-17557180
 ] 

chenzihao commented on FLINK-28122:
---

[~martijnvisser] Hi Martijn, could you help confirm this ticket? I can do this 
work if needed.

> Translate "Overview " and "Project Configuration" in "User-defined Sources & 
> Sinks" page 
> -
>
> Key: FLINK-28122
> URL: https://issues.apache.org/jira/browse/FLINK-28122
> Project: Flink
>  Issue Type: Improvement
>  Components: chinese-translation, Documentation
>Reporter: Chengkai Yang
>Priority: Minor
>
> The links are
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#overview
> and 
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#project-configuration



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-28171) Adjust Job and Task manager port definitions to work with Istio+mTLS

2022-06-21 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557178#comment-17557178
 ] 

Yang Wang commented on FLINK-28171:
---

I second [~martijnvisser]'s suggestion that we should consider 
{{appProtocol}} first and make sure it does not break on older K8s versions.

> Adjust Job and Task manager port definitions to work with Istio+mTLS
> 
>
> Key: FLINK-28171
> URL: https://issues.apache.org/jira/browse/FLINK-28171
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.14.4
> Environment: flink-kubernetes-operator 1.0.0
> Flink 1.14-java11
> Kubernetes v1.19.5
> Istio 1.7.6
>Reporter: Moshe Elisha
>Priority: Major
>
> Hello,
>  
> We are launching Flink deployments using the [Flink Kubernetes 
> Operator|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/]
>  on a Kubernetes cluster with Istio and mTLS enabled.
>  
> We found that the TaskManager is unable to communicate with the JobManager on 
> the jobmanager-rpc port:
>  
> {{2022-06-15 15:25:40,508 WARN  akka.remote.ReliableDeliverySupervisor        
>                [] - Association with remote system 
> [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]
>  has failed, address is now gated for [50] ms. Reason: [Association failed 
> with 
> [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]]
>  Caused by: [The remote system explicitly disassociated (reason unknown).]}}
>  
> The reason for the issue is that the JobManager service port definitions are 
> not following the Istio guidelines 
> [https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/]
>  (see example below).
>  
> There was also an email discussion around this topic on the user mailing 
> list under the subject "Flink Kubernetes Operator with K8S + Istio + mTLS - 
> port definitions".
> With the help of the community we were able to work around the issue, but it 
> was very hard and forced us to bypass the Istio proxy, which is not ideal.
>  
> We would like you to consider changing the default port definitions, either:
>  # Rename the ports – I understand it is an Istio-specific guideline, but 
> maybe it is better to at least be aligned with one (popular) vendor's 
> guideline instead of none at all.
>  # Add the “appProtocol” property[1], which is not specific to any vendor but 
> requires Kubernetes >= 1.19, where it was introduced as beta; it moved to 
> stable in >= 1.20. The option to set the appProtocol property was added to 
> the fabric8 client only in 
> [https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0] with 
> [#3570|https://github.com/fabric8io/kubernetes-client/issues/3570].
>  # Or allow a way to override the defaults.
>  
> [https://kubernetes.io/docs/concepts/services-networking/_print/#application-protocol]
>  
>  
> {{# k get service inference-results-to-analytics-engine -o yaml}}
> {{apiVersion: v1}}
> {{kind: Service}}
> {{...}}
> {{spec:}}
> {{  clusterIP: None}}
> {{  ports:}}
> {{  - name: jobmanager-rpc *# should start with “tcp-“ or add "appProtocol" 
> property*}}
> {{    port: 6123}}
> {{    protocol: TCP}}
> {{    targetPort: 6123}}
> {{  - name: blobserver *# should start with "tcp-" or add "appProtocol" 
> property*}}
> {{    port: 6124}}
> {{    protocol: TCP}}
> {{    targetPort: 6124}}
> {{...}}
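For illustration, both variants expressed with the fabric8 kubernetes-client 
that the operator builds on (a sketch assuming fabric8 >= 5.10.0, where 
appProtocol support landed; the port values mirror the example above):

{code:java}
import io.fabric8.kubernetes.api.model.IntOrString;
import io.fabric8.kubernetes.api.model.ServicePort;
import io.fabric8.kubernetes.api.model.ServicePortBuilder;

public class IstioFriendlyPorts {
    public static void main(String[] args) {
        // Option 1: follow Istio's "<protocol>-<suffix>" naming convention.
        ServicePort renamed = new ServicePortBuilder()
                .withName("tcp-jobmanager-rpc")
                .withPort(6123)
                .withProtocol("TCP")
                .withTargetPort(new IntOrString(6123))
                .build();

        // Option 2: keep the existing name and declare appProtocol instead
        // (requires Kubernetes >= 1.19).
        ServicePort declared = new ServicePortBuilder()
                .withName("jobmanager-rpc")
                .withAppProtocol("tcp")
                .withPort(6123)
                .withProtocol("TCP")
                .withTargetPort(new IntOrString(6123))
                .build();

        System.out.println(renamed.getName() + " / " + declared.getAppProtocol());
    }
}
{code}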



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink] zhougit86 commented on pull request #20033: [FLINK-27944][runtime] Move input metrics out of the inputGate loop, …

2022-06-21 Thread GitBox


zhougit86 commented on PR #20033:
URL: https://github.com/apache/flink/pull/20033#issuecomment-1162532156

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-28167) The taskmanager dynamically created by flink native k8s creates pods directly, why not create a deployment?

2022-06-21 Thread hob007 (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557177#comment-17557177
 ] 

hob007 commented on FLINK-28167:


OK, I will create a new ticket.

> The taskmanager dynamically created by flink native k8s creates pods 
> directly, why not create a deployment?
> -
>
> Key: FLINK-28167
> URL: https://issues.apache.org/jira/browse/FLINK-28167
> Project: Flink
>  Issue Type: Bug
> Environment: Flink 1.14
>Reporter: hob007
>Priority: Major
>
> The taskmanager dynamically created by flink native k8s creates pods 
> directly, why not create a deployment?
>  
> Reason: without a deployment, the taskmanagers are invisible to K8s' default 
> deployment-based monitoring; only the jobmanager's deployment can be seen.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink] 1996fanrui commented on pull request #20038: [FLINK-26762][docs] Document overdraft buffers

2022-06-21 Thread GitBox


1996fanrui commented on PR #20038:
URL: https://github.com/apache/flink/pull/20038#issuecomment-1162517634

   > Ehhh... I've forgotten to squash the commits before merging :/
   
   Could you revert and remerge it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-28188) Documentation: Wrong default value for classloader.parent-first-patterns.default

2022-06-21 Thread Arseniy Tashoyan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arseniy Tashoyan closed FLINK-28188.

Release Note: 
This applies to plugin.classloader.parent-first-patterns.default, not to 
classloader.parent-first-patterns.default.
But plugin.classloader.parent-first-patterns.default is not documented at all.
  Resolution: Invalid

> Documentation: Wrong default value for 
> classloader.parent-first-patterns.default
> 
>
> Key: FLINK-28188
> URL: https://issues.apache.org/jira/browse/FLINK-28188
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Arseniy Tashoyan
>Priority: Major
>
> The documentation provides a wrong value:
> [https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/config/#classloader-parent-first-patterns-default]
> "java.;scala.;org.apache.flink.;com.esotericsoftware.kryo;org.apache.hadoop.;javax.annotation.;org.slf4j;org.apache.log4j;org.apache.logging;org.apache.commons.logging;ch.qos.logback;org.xml;javax.xml;org.apache.xerces;org.w3c"
> The actual value is:
> "java.", "org.apache.flink.", "javax.annotation.",
> "org.slf4j",
> "org.apache.log4j",
>  "org.apache.logging",
>  "org.apache.commons.logging",
>  "ch.qos.logback"
> (from CoreOptions.java:187)
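A quick way to check the actual defaults from code, assuming the option 
constants are named as in recent Flink versions (a sketch, not taken from the 
docs):

{code:java}
import org.apache.flink.configuration.CoreOptions;

public class PrintParentFirstDefaults {
    public static void main(String[] args) {
        // Default of classloader.parent-first-patterns.default
        System.out.println(CoreOptions.ALWAYS_PARENT_FIRST_LOADER_PATTERNS.defaultValue());
        // Default of plugin.classloader.parent-first-patterns.default
        System.out.println(CoreOptions.PLUGIN_ALWAYS_PARENT_FIRST_LOADER_PATTERNS.defaultValue());
    }
}
{code}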



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-28188) Documentation: Wrong default value for classloader.parent-first-patterns.default

2022-06-21 Thread Arseniy Tashoyan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arseniy Tashoyan updated FLINK-28188:
-
Description: 
The documentation provides a wrong value:
[https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/config/#classloader-parent-first-patterns-default]

"java.;scala.;org.apache.flink.;com.esotericsoftware.kryo;org.apache.hadoop.;javax.annotation.;org.slf4j;org.apache.log4j;org.apache.logging;org.apache.commons.logging;ch.qos.logback;org.xml;javax.xml;org.apache.xerces;org.w3c"

The actual value is:

"java.", "org.apache.flink.", "javax.annotation.",
"org.slf4j",
"org.apache.log4j",
 "org.apache.logging",
 "org.apache.commons.logging",
 "ch.qos.logback"
(from CoreOptions.java:187)

  was:
The documentation provides a wrong value:
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/config/#classloader-parent-first-patterns-default

"java.;scala.;org.apache.flink.;com.esotericsoftware.kryo;org.apache.hadoop.;javax.annotation.;org.slf4j;org.apache.log4j;org.apache.logging;org.apache.commons.logging;ch.qos.logback;org.xml;javax.xml;org.apache.xerces;org.w3c"

The actual value is: "java.", "org.apache.flink.", "javax.annotation."
(from CoreOptions.java:187)


> Documentation: Wrong default value for 
> classloader.parent-first-patterns.default
> 
>
> Key: FLINK-28188
> URL: https://issues.apache.org/jira/browse/FLINK-28188
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Arseniy Tashoyan
>Priority: Major
>
> The documentation provides a wrong value:
> [https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/config/#classloader-parent-first-patterns-default]
> "java.;scala.;org.apache.flink.;com.esotericsoftware.kryo;org.apache.hadoop.;javax.annotation.;org.slf4j;org.apache.log4j;org.apache.logging;org.apache.commons.logging;ch.qos.logback;org.xml;javax.xml;org.apache.xerces;org.w3c"
> The actual value is:
> "java.", "org.apache.flink.", "javax.annotation.",
> "org.slf4j",
> "org.apache.log4j",
>  "org.apache.logging",
>  "org.apache.commons.logging",
>  "ch.qos.logback"
> (from CoreOptions.java:187)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-28188) Documentation: Wrong default value for classloader.parent-first-patterns.default

2022-06-21 Thread Arseniy Tashoyan (Jira)
Arseniy Tashoyan created FLINK-28188:


 Summary: Documentation: Wrong default value for 
classloader.parent-first-patterns.default
 Key: FLINK-28188
 URL: https://issues.apache.org/jira/browse/FLINK-28188
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.15.0
Reporter: Arseniy Tashoyan


The documentation provides a wrong value:
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/config/#classloader-parent-first-patterns-default

"java.;scala.;org.apache.flink.;com.esotericsoftware.kryo;org.apache.hadoop.;javax.annotation.;org.slf4j;org.apache.log4j;org.apache.logging;org.apache.commons.logging;ch.qos.logback;org.xml;javax.xml;org.apache.xerces;org.w3c"

The actual value is: "java.", "org.apache.flink.", "javax.annotation."
(from CoreOptions.java:187)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink] reswqa commented on a diff in pull request #20039: [FLINK-28133][runtime] Rework DefaultScheduler to directly deploy executions

2022-06-21 Thread GitBox


reswqa commented on code in PR #20039:
URL: https://github.com/apache/flink/pull/20039#discussion_r902880992


##
flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/DefaultExecutionDeployerTest.java:
##
@@ -0,0 +1,388 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.runtime.scheduler;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import 
org.apache.flink.runtime.concurrent.ComponentMainThreadExecutorServiceAdapter;
+import org.apache.flink.runtime.executiongraph.Execution;
+import org.apache.flink.runtime.executiongraph.ExecutionAttemptID;
+import org.apache.flink.runtime.executiongraph.ExecutionGraph;
+import org.apache.flink.runtime.executiongraph.ExecutionVertex;
+import 
org.apache.flink.runtime.executiongraph.TestingDefaultExecutionGraphBuilder;
+import org.apache.flink.runtime.io.network.partition.ResultPartitionID;
+import org.apache.flink.runtime.io.network.partition.ResultPartitionType;
+import 
org.apache.flink.runtime.io.network.partition.TestingJobMasterPartitionTracker;
+import org.apache.flink.runtime.jobgraph.DistributionPattern;
+import org.apache.flink.runtime.jobgraph.JobGraph;
+import org.apache.flink.runtime.jobgraph.JobGraphTestUtils;
+import org.apache.flink.runtime.jobgraph.JobVertex;
+import org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID;
+import org.apache.flink.runtime.shuffle.TestingShuffleMaster;
+import org.apache.flink.runtime.testtasks.NoOpInvokable;
+import org.apache.flink.util.ExecutorUtils;
+import org.apache.flink.util.IterableUtils;
+import org.apache.flink.util.TestLogger;
+
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.assertj.core.api.Fail.fail;
+
+/** Tests for {@link DefaultExecutionDeployer}. */
+public class DefaultExecutionDeployerTest extends TestLogger {
+
+private ScheduledExecutorService executor;
+private ComponentMainThreadExecutor mainThreadExecutor;
+private TestExecutionOperationsDecorator testExecutionOperations;
+private ExecutionVertexVersioner executionVertexVersioner;
+private TestExecutionSlotAllocator testExecutionSlotAllocator;
+private TestingShuffleMaster shuffleMaster;
+private TestingJobMasterPartitionTracker partitionTracker;
+private Time partitionRegistrationTimeout;
+
+@BeforeEach
+public void setUp() throws Exception {

Review Comment:
   ```suggestion
   void setUp() throws Exception {
   ```



##
flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/DefaultExecutionDeployerTest.java:
##
@@ -0,0 +1,388 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.runtime.scheduler;
+
+import org.apache.flink.api.common.time.Time;
+import org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor;
+import 
org.apache.flink.runtime.concurrent.ComponentMainThreadExecutorServiceAdapter;
+import 

[jira] [Comment Edited] (FLINK-28060) Kafka Commit on checkpointing fails repeatedly after a broker restart

2022-06-21 Thread Mason Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557005#comment-17557005
 ] 

Mason Chen edited comment on FLINK-28060 at 6/21/22 4:38 PM:
-

[~renqs] [~martijnvisser] were we able to reproduce this issue in Flink CI/unit 
test?

+1, I think we need to closely monitor KAFKA-13840; I'm not sure if bumping to 
3.1.1 really fixed the issue. A user states that they still see the issue 
after upgrading to 3.1.1.


was (Author: mason6345):
[~renqs] [~martijnvisser] were we able to reproduce this issue in Flink CI/unit 
test? 

+1, I think we need to closely monitor KAFKA-13840, not sure if bumping to 
3.1.1 really fixed the issue.

> Kafka Commit on checkpointing fails repeatedly after a broker restart
> -
>
> Key: FLINK-28060
> URL: https://issues.apache.org/jira/browse/FLINK-28060
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Connectors / Kafka
>Affects Versions: 1.15.0
> Environment: Reproduced on MacOS and Linux.
> Using java 8, Flink 1.15.0, Kafka 2.8.1.
>Reporter: Christian Lorenz
>Priority: Major
>  Labels: pull-request-available
> Attachments: flink-kafka-testjob.zip
>
>
> When Kafka offset committing is enabled and done on Flink's checkpointing, 
> an error might occur if one Kafka broker is shut down and happens to be the 
> leader of that partition in Kafka's internal __consumer_offsets topic.
> This is expected behaviour. But once the broker is started up again, the 
> next checkpoint issued by Flink should commit the offsets processed in the 
> meantime back to Kafka. In Flink 1.15.0 this no longer seems to happen 
> reliably, and offset committing stays broken. A warning like the following 
> will be logged on each checkpoint:
> {code}
> [info] 14:33:13.684 WARN  [Source Data Fetcher for Source: input-kafka-source 
> -> Sink: output-stdout-sink (1/1)#1] o.a.f.c.k.s.reader.KafkaSourceReader - 
> Failed to commit consumer offsets for checkpoint 35
> [info] org.apache.kafka.clients.consumer.RetriableCommitFailedException: 
> Offset commit failed with a retriable exception. You should retry committing 
> the latest consumed offsets.
> [info] Caused by: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.
> {code}
> To reproduce this I've attached a small Flink job program. To execute it, 
> Java 8, Scala sbt, and docker/docker-compose are required. Also see readme.md 
> for more details.
> The job can be run with `sbt run`; the Kafka cluster is started by 
> `docker-compose up`. If the Kafka brokers are then restarted gracefully, e.g. 
> `docker-compose stop kafka1` and `docker-compose start kafka1`, with kafka2 
> and kafka3 afterwards, this warning will occur and no offsets will be 
> committed to Kafka.
> This is not reproducible in Flink 1.14.4.
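For readers reproducing this, committing offsets on checkpoints corresponds to 
a job configured roughly as follows (a sketch of the 1.15 KafkaSource API; 
broker, topic, and group names are placeholders, not the attached test job's 
actual code):

{code:java}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaCommitOnCheckpoint {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Offsets are committed back to Kafka on each completed checkpoint.
        env.enableCheckpointing(10_000);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092") // placeholder
                .setTopics("input")                    // placeholder
                .setGroupId("test-group")              // placeholder
                .setStartingOffsets(OffsetsInitializer.earliest())
                // Default is already "true"; shown explicitly for clarity.
                .setProperty("commit.offsets.on.checkpoint", "true")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "input-kafka-source")
                .print();
        env.execute();
    }
}
{code}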



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-27788) Adding annotation to k8 operator Pod

2022-06-21 Thread Jaganathan Asokan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557007#comment-17557007
 ] 

Jaganathan Asokan commented on FLINK-27788:
---

[~mbalassi] do we see any issue in merging this into the 1.0 release? We are 
using the stable 1.0 release internally, and it would help us to have this 
change in the 1.0 branch as well.

> Adding annotation to k8 operator Pod
> 
>
> Key: FLINK-27788
> URL: https://issues.apache.org/jira/browse/FLINK-27788
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-0.1.0, kubernetes-operator-1.1.0
>Reporter: Jaganathan Asokan
>Assignee: Jaganathan Asokan
>Priority: Minor
>  Labels: pull-request-available
>
> Currently we lack the option to natively add annotations to Flink operator 
> pods. Providing this feature directly in our existing Helm chart could be 
> useful. One potential use case for allowing annotations on the Pod is to 
> enable scraping of operator metrics by monitoring infrastructure like 
> Prometheus, Datadog, etc.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (FLINK-28060) Kafka Commit on checkpointing fails repeatedly after a broker restart

2022-06-21 Thread Mason Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557005#comment-17557005
 ] 

Mason Chen edited comment on FLINK-28060 at 6/21/22 4:34 PM:
-

[~renqs] [~martijnvisser] were we able to reproduce this issue in Flink CI/unit 
test? 

+1, I think we need to closely monitor KAFKA-13840, not sure if bumping to 
3.1.1 really fixed the issue.


was (Author: mason6345):
[~renqs] [~martijnvisser] were we able to reproduce this issue in Flink CI/unit 
test? +1, I think we need to closely monitor KAFKA-13840, not sure if bumping 
to 3.1.1 really fixed the issue.

> Kafka Commit on checkpointing fails repeatedly after a broker restart
> -
>
> Key: FLINK-28060
> URL: https://issues.apache.org/jira/browse/FLINK-28060
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Connectors / Kafka
>Affects Versions: 1.15.0
> Environment: Reproduced on MacOS and Linux.
> Using java 8, Flink 1.15.0, Kafka 2.8.1.
>Reporter: Christian Lorenz
>Priority: Major
>  Labels: pull-request-available
> Attachments: flink-kafka-testjob.zip
>
>
> When Kafka offset committing is enabled and done on Flink's checkpointing, 
> an error might occur if one Kafka broker is shut down and happens to be the 
> leader of that partition in Kafka's internal __consumer_offsets topic.
> This is expected behaviour. But once the broker is started up again, the 
> next checkpoint issued by Flink should commit the offsets processed in the 
> meantime back to Kafka. In Flink 1.15.0 this no longer seems to happen 
> reliably, and offset committing stays broken. A warning like the following 
> will be logged on each checkpoint:
> {code}
> [info] 14:33:13.684 WARN  [Source Data Fetcher for Source: input-kafka-source 
> -> Sink: output-stdout-sink (1/1)#1] o.a.f.c.k.s.reader.KafkaSourceReader - 
> Failed to commit consumer offsets for checkpoint 35
> [info] org.apache.kafka.clients.consumer.RetriableCommitFailedException: 
> Offset commit failed with a retriable exception. You should retry committing 
> the latest consumed offsets.
> [info] Caused by: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.
> {code}
> To reproduce this I've attached a small Flink job program. To execute it, 
> Java 8, Scala sbt, and docker/docker-compose are required. Also see readme.md 
> for more details.
> The job can be run with `sbt run`; the Kafka cluster is started by 
> `docker-compose up`. If the Kafka brokers are then restarted gracefully, e.g. 
> `docker-compose stop kafka1` and `docker-compose start kafka1`, with kafka2 
> and kafka3 afterwards, this warning will occur and no offsets will be 
> committed to Kafka.
> This is not reproducible in Flink 1.14.4.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (FLINK-28060) Kafka Commit on checkpointing fails repeatedly after a broker restart

2022-06-21 Thread Mason Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557005#comment-17557005
 ] 

Mason Chen edited comment on FLINK-28060 at 6/21/22 4:34 PM:
-

[~renqs] [~martijnvisser] were we able to reproduce this issue in Flink CI/unit 
test? +1, I think we need to closely monitor KAFKA-13840, not sure if bumping 
to 3.1.1 really fixed the issue.


was (Author: mason6345):
[~renqs] [~martijnvisser] were we able to reproduce this issue in Flink CI/unit 
test? I think we need to closely monitor 
[KAFKA-13840](https://issues.apache.org/jira/browse/KAFKA-13840), not sure if 
bumping to 3.1.1 really fixed the issue.

> Kafka Commit on checkpointing fails repeatedly after a broker restart
> -
>
> Key: FLINK-28060
> URL: https://issues.apache.org/jira/browse/FLINK-28060
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Connectors / Kafka
>Affects Versions: 1.15.0
> Environment: Reproduced on MacOS and Linux.
> Using java 8, Flink 1.15.0, Kafka 2.8.1.
>Reporter: Christian Lorenz
>Priority: Major
>  Labels: pull-request-available
> Attachments: flink-kafka-testjob.zip
>
>
> When Kafka offset committing is enabled and done on Flink's checkpointing, 
> an error might occur if one Kafka broker is shut down and happens to be the 
> leader of that partition in Kafka's internal __consumer_offsets topic.
> This is expected behaviour. But once the broker is started up again, the 
> next checkpoint issued by Flink should commit the offsets processed in the 
> meantime back to Kafka. In Flink 1.15.0 this no longer seems to happen 
> reliably, and offset committing stays broken. A warning like the following 
> will be logged on each checkpoint:
> {code}
> [info] 14:33:13.684 WARN  [Source Data Fetcher for Source: input-kafka-source 
> -> Sink: output-stdout-sink (1/1)#1] o.a.f.c.k.s.reader.KafkaSourceReader - 
> Failed to commit consumer offsets for checkpoint 35
> [info] org.apache.kafka.clients.consumer.RetriableCommitFailedException: 
> Offset commit failed with a retriable exception. You should retry committing 
> the latest consumed offsets.
> [info] Caused by: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.
> {code}
> To reproduce this I've attached a small Flink job program. To execute it, 
> Java 8, Scala sbt, and docker/docker-compose are required. Also see readme.md 
> for more details.
> The job can be run with `sbt run`; the Kafka cluster is started by 
> `docker-compose up`. If the Kafka brokers are then restarted gracefully, e.g. 
> `docker-compose stop kafka1` and `docker-compose start kafka1`, with kafka2 
> and kafka3 afterwards, this warning will occur and no offsets will be 
> committed to Kafka.
> This is not reproducible in Flink 1.14.4.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-28060) Kafka Commit on checkpointing fails repeatedly after a broker restart

2022-06-21 Thread Mason Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557005#comment-17557005
 ] 

Mason Chen commented on FLINK-28060:


[~renqs] [~martijnvisser] were we able to reproduce this issue in Flink CI/unit 
test? I think we need to closely monitor 
[KAFKA-13840](https://issues.apache.org/jira/browse/KAFKA-13840), not sure if 
bumping to 3.1.1 really fixed the issue.

> Kafka Commit on checkpointing fails repeatedly after a broker restart
> -
>
> Key: FLINK-28060
> URL: https://issues.apache.org/jira/browse/FLINK-28060
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Connectors / Kafka
>Affects Versions: 1.15.0
> Environment: Reproduced on MacOS and Linux.
> Using java 8, Flink 1.15.0, Kafka 2.8.1.
>Reporter: Christian Lorenz
>Priority: Major
>  Labels: pull-request-available
> Attachments: flink-kafka-testjob.zip
>
>
> When Kafka offset committing is enabled and done on Flink's checkpointing, 
> an error might occur if one Kafka broker is shut down and happens to be the 
> leader of that partition in Kafka's internal __consumer_offsets topic.
> This is expected behaviour. But once the broker is started up again, the 
> next checkpoint issued by Flink should commit the offsets processed in the 
> meantime back to Kafka. In Flink 1.15.0 this no longer seems to happen 
> reliably, and offset committing stays broken. A warning like the following 
> will be logged on each checkpoint:
> {code}
> [info] 14:33:13.684 WARN  [Source Data Fetcher for Source: input-kafka-source 
> -> Sink: output-stdout-sink (1/1)#1] o.a.f.c.k.s.reader.KafkaSourceReader - 
> Failed to commit consumer offsets for checkpoint 35
> [info] org.apache.kafka.clients.consumer.RetriableCommitFailedException: 
> Offset commit failed with a retriable exception. You should retry committing 
> the latest consumed offsets.
> [info] Caused by: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.
> {code}
> To reproduce this I've attached a small Flink job program. To execute it, 
> Java 8, Scala sbt, and docker/docker-compose are required. Also see readme.md 
> for more details.
> The job can be run with `sbt run`; the Kafka cluster is started by 
> `docker-compose up`. If the Kafka brokers are then restarted gracefully, e.g. 
> `docker-compose stop kafka1` and `docker-compose start kafka1`, with kafka2 
> and kafka3 afterwards, this warning will occur and no offsets will be 
> committed to Kafka.
> This is not reproducible in Flink 1.14.4.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink] rkhachatryan commented on a diff in pull request #19679: [FLINK-23143][state/changelog] Support state migration for ChangelogS…

2022-06-21 Thread GitBox


rkhachatryan commented on code in PR #19679:
URL: https://github.com/apache/flink/pull/19679#discussion_r902812486


##
flink-state-backends/flink-statebackend-changelog/src/main/java/org/apache/flink/state/changelog/restore/ChangelogRestoreTarget.java:
##
@@ -21,52 +21,143 @@
 import org.apache.flink.api.common.state.State;
 import org.apache.flink.api.common.state.StateDescriptor;
 import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.runtime.state.AbstractKeyedStateBackend;
 import org.apache.flink.runtime.state.CheckpointableKeyedStateBackend;
 import org.apache.flink.runtime.state.KeyGroupRange;
 import org.apache.flink.runtime.state.KeyGroupedInternalPriorityQueue;
 import org.apache.flink.runtime.state.Keyed;
-import org.apache.flink.runtime.state.KeyedStateBackend;
 import org.apache.flink.runtime.state.PriorityComparable;
-import org.apache.flink.runtime.state.PriorityQueueSetFactory;
 import org.apache.flink.runtime.state.heap.HeapPriorityQueueElement;
+import org.apache.flink.runtime.state.internal.InternalKvState;
+import org.apache.flink.runtime.state.metainfo.StateMetaInfoSnapshot;
 import 
org.apache.flink.runtime.state.metainfo.StateMetaInfoSnapshot.BackendStateType;
+import org.apache.flink.state.changelog.ChangelogKeyGroupedPriorityQueue;
 import org.apache.flink.state.changelog.ChangelogState;
+import org.apache.flink.state.changelog.ChangelogStateFactory;
+import org.apache.flink.state.changelog.KvStateChangeLogger;
+import org.apache.flink.state.changelog.StateChangeLogger;
 
 import javax.annotation.Nonnull;
 
 /** Maintains metadata operation related to Changelog recovery. */
 @Internal
-public interface ChangelogRestoreTarget {
+public abstract class ChangelogRestoreTarget {
+
+protected final AbstractKeyedStateBackend keyedStateBackend;
+
+protected final ChangelogStateFactory changelogStateFactory;
+
+protected final FunctionDelegationHelper functionDelegationHelper;
+
+public ChangelogRestoreTarget(
+AbstractKeyedStateBackend keyedStateBackend,
+ChangelogStateFactory changelogStateFactory) {
+this.keyedStateBackend = keyedStateBackend;
+this.changelogStateFactory = changelogStateFactory;
+this.functionDelegationHelper = new FunctionDelegationHelper();
+}
 
 /** Returns the key groups which this restore procedure covers. */
-KeyGroupRange getKeyGroupRange();
+public KeyGroupRange getKeyGroupRange() {
+return keyedStateBackend.getKeyGroupRange();
+}
+
+/**
+ * Returns the existing state created by {@link 
#createKeyedState(TypeSerializer,
+ * StateDescriptor)} or {@link #createPqState(String, TypeSerializer)} in 
the restore procedure.
+ */
+public ChangelogState getExistingState(
+String name, StateMetaInfoSnapshot.BackendStateType type) {
+return changelogStateFactory.getExistingState(name, type);
+}
 
 /**
  * Creates a keyed state which could be retrieved by {@link 
#getExistingState(String,
- * BackendStateType)} in the restore procedure. The interface comes from 
{@link
- * KeyedStateBackend#getOrCreateKeyedState(TypeSerializer, 
StateDescriptor)}.
+ * BackendStateType)} in the restore procedure.
  */
- S createKeyedState(
+@SuppressWarnings("unchecked")
+public  S createKeyedState(
 TypeSerializer namespaceSerializer, StateDescriptor 
stateDescriptor)
-throws Exception;
+throws Exception {
+ChangelogState existingState =
+changelogStateFactory.getExistingState(
+stateDescriptor.getName(),
+StateMetaInfoSnapshot.BackendStateType.KEY_VALUE);
+if (existingState == null
+|| !isCompleteCompatible(
+(InternalKvState) existingState,
+namespaceSerializer,
+stateDescriptor)) {
+S keyedState =
+keyedStateBackend.upgradeKeyedState(namespaceSerializer, 
stateDescriptor);
+functionDelegationHelper.addOrUpdate(stateDescriptor);
+final InternalKvState kvState = (InternalKvState) keyedState;
+return (S)
+changelogStateFactory.create(
+stateDescriptor,
+kvState,
+getKvStateChangeLogger(kvState, stateDescriptor),
+keyedStateBackend);
+}
+return (S) existingState;
+}
+
+private  boolean isCompleteCompatible(
+InternalKvState existingState,
+TypeSerializer namespaceSerializer,
+StateDescriptor stateDescriptor) {
+return isCompleteCompatible(existingState.getNamespaceSerializer(), 
namespaceSerializer)
+&& isCompleteCompatible(
+existingState.getValueSerializer(), 

[GitHub] [flink] rkhachatryan commented on a diff in pull request #19679: [FLINK-23143][state/changelog] Support state migration for ChangelogS…

2022-06-21 Thread GitBox


rkhachatryan commented on code in PR #19679:
URL: https://github.com/apache/flink/pull/19679#discussion_r902807955


##
flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java:
##
@@ -477,6 +478,50 @@ KeyGroupedInternalPriorityQueue create(
 }
 }
 
+@Override
+public  S upgradeKeyedState(
+TypeSerializer namespaceSerializer, StateDescriptor 
stateDescriptor)
+throws Exception {
+StateFactory stateFactory = getStateFactory(stateDescriptor);
+Tuple2> registerResult =
+tryRegisterKvStateInformation(stateDescriptor, 
namespaceSerializer, noTransform());
+
Preconditions.checkState(kvStateInformation.containsKey(stateDescriptor.getName()));
+kvStateInformation.computeIfPresent(
+stateDescriptor.getName(),
+(stateName, kvStateInfo) ->
+new RocksDbKvStateInfo(
+kvStateInfo.columnFamilyHandle,
+new RegisteredKeyValueStateBackendMetaInfo<>(
+kvStateInfo.metaInfo.snapshot(;
+return stateFactory.createState(
+stateDescriptor, registerResult, 
RocksDBKeyedStateBackend.this);

Review Comment:
   I see the following options:
   1. Create new objects (as it is done currently in the PR)
   2. Facilitate upgrade by state objects (e.g. `RocksDBValueState`) - i.e. add 
methods to update the serializer (and make the field mutable)
   3. Use getter (`RegisteredKeyValueStateBackendMetaInfo.getStateSerializer`) 
in state objects (e.g. ` RocksDBValueState`) instead of injecting it
   4. Wrap serializer to delegate all accesses to it and allow updating the 
actual serializer "externally" (similar to what was initially described in the 
ticket)
   
   The 2nd option looks better to me because it is explicit, simpler than the 
others, and doesn't incur any overhead.
   WDYT?
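   For illustration, option 2 would look roughly like this (names are made up, 
not the actual state classes):
   
   ```java
   import org.apache.flink.api.common.typeutils.TypeSerializer;
   
   // Sketch of option 2: the state object owns a *mutable* serializer field
   // and exposes an explicit update hook for state migration.
   class ValueStateSketch<V> {
   
       private TypeSerializer<V> valueSerializer; // mutable, not final
   
       ValueStateSketch(TypeSerializer<V> valueSerializer) {
           this.valueSerializer = valueSerializer;
       }
   
       /** Called by the backend once migration registered a new serializer. */
       void updateValueSerializer(TypeSerializer<V> newSerializer) {
           this.valueSerializer = newSerializer;
       }
   }
   ```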



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-28187) Duplicate job submission for FlinkSessionJob

2022-06-21 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora updated FLINK-28187:
---
Priority: Critical  (was: Major)

> Duplicate job submission for FlinkSessionJob
> 
>
> Key: FLINK-28187
> URL: https://issues.apache.org/jira/browse/FLINK-28187
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.0.0
>Reporter: Jeesmon Jacob
>Priority: Critical
> Attachments: flink-operator-log.txt
>
>
> During a session job submission, if a deployment error (e.g. 
> concurrent.TimeoutException) is hit, the operator will submit the job again. 
> But the first submission could have succeeded on the JobManager side, so the 
> second submission can result in a duplicate job. Operator log attached.
> Per [~gyfora]:
> The problem is that in case a deployment error was hit, the 
> SessionJobObserver will not be able to tell whether it has submitted the job 
> or not, so it will simply try to submit it again. We have to find a mechanism 
> to correlate jobs on the cluster with the SessionJob CR itself. Maybe we 
> could override the job name itself for this purpose or something like that.
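One direction for the fix, as the comment suggests: derive the submitted job's 
name deterministically from the SessionJob CR, so that a retry can first look 
up whether the previous attempt already succeeded. Purely illustrative, not 
the operator's actual code:

{code:java}
import java.util.Set;

/** Illustrative sketch: idempotent session job submission via derived names. */
class SessionJobDeduper {

    /** Deterministic name per CR generation, e.g. "ns.my-job.7". */
    static String jobNameFor(String namespace, String crName, long generation) {
        return namespace + "." + crName + "." + generation;
    }

    /** True if a job with the CR-derived name already runs on the cluster. */
    static boolean alreadySubmitted(
            Set<String> runningJobNames, String namespace, String crName, long generation) {
        return runningJobNames.contains(jobNameFor(namespace, crName, generation));
    }
}
{code}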



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (FLINK-26762) Add the overdraft buffer in BufferPool to reduce unaligned checkpoint being blocked

2022-06-21 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556873#comment-17556873
 ] 

Piotr Nowojski edited comment on FLINK-26762 at 6/21/22 3:48 PM:
-

Feature merged commit 0b60ee8 into apache:master
documentation merged as a17d824e179 into apache:master


was (Author: pnowojski):
Feature merged commit 0b60ee8 into apache:master

> Add the overdraft buffer in BufferPool to reduce unaligned checkpoint being 
> blocked
> ---
>
> Key: FLINK-26762
> URL: https://issues.apache.org/jira/browse/FLINK-26762
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Checkpointing, Runtime / Network
>Affects Versions: 1.13.0, 1.14.0, 1.15.0
>Reporter: fanrui
>Assignee: fanrui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
> Attachments: image-2022-04-18-11-45-14-700.png, 
> image-2022-04-18-11-46-03-895.png
>
>
> In some past unaligned-checkpoint JIRAs, the community added
> recordWriter.isAvailable() to reduce blocking for single-record writes. But 
> large records, flatMap, or broadcasting watermarks may need more buffers.
> Can we add an overdraft buffer to the BufferPool to reduce how often 
> unaligned checkpoints are blocked? 
> h2. Overdraft Buffer mechanism
> Add the configuration 
> 'taskmanager.network.memory.overdraft-buffers-per-gate=5'. 
> When requestMemory is called and the bufferPool is insufficient, the 
> bufferPool will allow the Task to overdraw up to 5 MemorySegments, and the 
> bufferPool will be unavailable until all overdrawn buffers are consumed by 
> downstream tasks. The task then waits for the bufferPool to become available.
> From the above, we get the following benefits:
>  * For scenarios that require multiple buffers, the Task releases the 
> checkpoint lock, so the unaligned checkpoint can complete quickly.
>  * We can control the memory usage to prevent memory leaks.
>  * It needs only a little memory and can improve the stability of the Task 
> under back pressure.
>  * Users can increase overdraft-buffers to adapt to scenarios that 
> require more buffers.
>  
> Masters, please correct me if I'm wrong, thanks a lot.
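For reference, opting in would then be a single setting; a sketch assuming the 
key proposed above is what finally lands:

{code:java}
import org.apache.flink.configuration.Configuration;

public class OverdraftConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Allow each gate to overdraw up to 5 MemorySegments (proposed key).
        conf.setInteger("taskmanager.network.memory.overdraft-buffers-per-gate", 5);
        System.out.println(conf);
    }
}
{code}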



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink] pnowojski commented on pull request #20038: [FLINK-26762][docs] Document overdraft buffers

2022-06-21 Thread GitBox


pnowojski commented on PR #20038:
URL: https://github.com/apache/flink/pull/20038#issuecomment-1161940289

   Ehhh... I've forgotten to squash the commits before merging :/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pnowojski merged pull request #20038: [FLINK-26762][docs] Document overdraft buffers

2022-06-21 Thread GitBox


pnowojski merged PR #20038:
URL: https://github.com/apache/flink/pull/20038


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] pnowojski commented on pull request #20038: [FLINK-26762][docs] Document overdraft buffers

2022-06-21 Thread GitBox


pnowojski commented on PR #20038:
URL: https://github.com/apache/flink/pull/20038#issuecomment-1161936600

   Thanks for the review and suggestions :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] ChengkaiYang2022 commented on pull request #20017: [FLINK-27290][docs-zh]Translate the "Data Type" section of "Data Types" in to Chinese.

2022-06-21 Thread GitBox


ChengkaiYang2022 commented on PR #20017:
URL: https://github.com/apache/flink/pull/20017#issuecomment-1161935774

   Anyway, I will correct the commit message by adding [docs-zh] when I rebase.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-28122) Translate "Overview " and "Project Configuration" in "User-defined Sources & Sinks" page

2022-06-21 Thread Chengkai Yang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556978#comment-17556978
 ] 

Chengkai Yang edited comment on FLINK-28122 at 6/21/22 3:38 PM:


Sorry, [~chenzihao] I don't have the authority to assign it to you. Maybe you 
can ask someone who has the authority to that.

But I'm willing to help review it after you pull the request on Github.


was (Author: JIRAUSER282569):
Sorry, [~chenzihao] I don't have the authority to assign you. Maybe you can ask 
someone who has the authority to assign this to you.

But I'm willing to help review it after you pull the request on Github.

> Translate "Overview " and "Project Configuration" in "User-defined Sources & 
> Sinks" page 
> -
>
> Key: FLINK-28122
> URL: https://issues.apache.org/jira/browse/FLINK-28122
> Project: Flink
>  Issue Type: Improvement
>  Components: chinese-translation, Documentation
>Reporter: Chengkai Yang
>Priority: Minor
>
> The links are
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#overview
> and 
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#project-configuration



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (FLINK-28122) Translate "Overview " and "Project Configuration" in "User-defined Sources & Sinks" page

2022-06-21 Thread Chengkai Yang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556978#comment-17556978
 ] 

Chengkai Yang edited comment on FLINK-28122 at 6/21/22 3:38 PM:


Sorry, [~chenzihao] I don't have the authority to assign it to you. Maybe you 
can ask someone who has the authority to do that.

But I'm willing to help review it after you pull the request on Github.


was (Author: JIRAUSER282569):
Sorry, [~chenzihao] I don't have the authority to assign it to you. Maybe you 
can ask someone who has the authority to that.

But I'm willing to help review it after you pull the request on Github.

> Translate "Overview " and "Project Configuration" in "User-defined Sources & 
> Sinks" page 
> -
>
> Key: FLINK-28122
> URL: https://issues.apache.org/jira/browse/FLINK-28122
> Project: Flink
>  Issue Type: Improvement
>  Components: chinese-translation, Documentation
>Reporter: Chengkai Yang
>Priority: Minor
>
> The links are
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#overview
> and 
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#project-configuration



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-28122) Translate "Overview " and "Project Configuration" in "User-defined Sources & Sinks" page

2022-06-21 Thread Chengkai Yang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556978#comment-17556978
 ] 

Chengkai Yang commented on FLINK-28122:
---

Sorry, [~chenzihao] I don't have the authority to assign you. Maybe you can ask 
someone who has the authority to assign this to you.

But I'm willing to help review it after you pull the request on Github.

> Translate "Overview " and "Project Configuration" in "User-defined Sources & 
> Sinks" page 
> -
>
> Key: FLINK-28122
> URL: https://issues.apache.org/jira/browse/FLINK-28122
> Project: Flink
>  Issue Type: Improvement
>  Components: chinese-translation, Documentation
>Reporter: Chengkai Yang
>Priority: Minor
>
> The links are
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#overview
> and 
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#project-configuration



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-28187) Duplicate job submission for FlinkSessionJob

2022-06-21 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556975#comment-17556975
 ] 

Gyula Fora commented on FLINK-28187:


The proper mechanism for this seems to be implemented already in 
https://issues.apache.org/jira/browse/FLINK-11544

We can set a custom fixed jobId for the deployment itself. We should use a 
combination of the resource name and generation, similar to 
[https://github.com/apache/flink-kubernetes-operator/commit/ab59d6eb980512775590d0d01e697fe0c28d1b3b]

This way the observer can robustly detect already submitted jobs.
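
As a rough illustration of that idea, a stable id can be derived from the CR name and generation. The sketch below uses plain java.util.UUID and hypothetical names; it is not the operator's actual code:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class SessionJobIdSketch {

    // The same (name, generation) pair always yields the same id, so a retry
    // after a deployment error maps to a job id the cluster may already know.
    static UUID deterministicJobId(String resourceName, long generation) {
        String key = resourceName + "-" + generation;
        return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(deterministicJobId("my-session-job", 3));
    }
}
{code}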

> Duplicate job submission for FlinkSessionJob
> 
>
> Key: FLINK-28187
> URL: https://issues.apache.org/jira/browse/FLINK-28187
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.0.0
>Reporter: Jeesmon Jacob
>Priority: Major
> Attachments: flink-operator-log.txt
>
>
> During a session job submission, if a deployment error (e.g. 
> concurrent.TimeoutException) is hit, the operator will submit the job again. But 
> the first submission could have succeeded on the JobManager side, and the second 
> submission could result in a duplicate job. Operator log attached.
> Per [~gyfora]:
> The problem is that, in case a deployment error was hit, the 
> SessionJobObserver will not be able to tell whether it has submitted the job 
> or not, so it will simply try to submit it again. We have to find a mechanism 
> to correlate jobs on the cluster with the SessionJob CR itself. Maybe we 
> could override the job name itself for this purpose, or something like that.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Closed] (FLINK-27788) Adding annotation to k8 operator Pod

2022-06-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi closed FLINK-27788.
--
Resolution: Fixed

91753ec in main

> Adding annotation to k8 operator Pod
> 
>
> Key: FLINK-27788
> URL: https://issues.apache.org/jira/browse/FLINK-27788
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-0.1.0, kubernetes-operator-1.1.0
>Reporter: Jaganathan Asokan
>Assignee: Jaganathan Asokan
>Priority: Minor
>  Labels: pull-request-available
>
> Currently we lack the option to natively add annotations to Flink operator 
> pods. Providing this feature directly in our existing Helm chart could be 
> useful. One potential use case for allowing annotations on the pod is to enable 
> scraping of operator metrics by monitoring infrastructure like Prometheus, 
> Datadog, etc.  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink-kubernetes-operator] mbalassi merged pull request #246: [FLINK-27788] Adding annotation to k8 operator Pod

2022-06-21 Thread GitBox


mbalassi merged PR #246:
URL: https://github.com/apache/flink-kubernetes-operator/pull/246


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] 1996fanrui commented on a diff in pull request #20038: [FLINK-26762][docs] Document overdraft buffers

2022-06-21 Thread GitBox


1996fanrui commented on code in PR #20038:
URL: https://github.com/apache/flink/pull/20038#discussion_r902756197


##
docs/content/docs/deployment/memory/network_mem_tuning.md:
##
@@ -120,6 +120,19 @@ In order to avoid excessive data skew, the number of 
buffers for each subpartiti
 
 Unlike the input buffer pool, the configured amount of exclusive buffers and 
floating buffers is only treated as recommended values. If there are not enough 
buffers available, Flink can make progress with only a single exclusive buffer 
per output subpartition and zero floating buffers.
 
+ Overdraft buffers
+
+For each output subtask can also request up to 
`taskmanager.network.memory.max-overdraft-buffers-per-gate` (by default 5) 
extra overdraft buffers.
+Those buffers are only used, if despite presence of a backpressure, Flink can 
not stop producing more records to the output.

Review Comment:
   LGTM



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-28173) Multiple Parquet format tests are failing with NoSuchMethodError

2022-06-21 Thread jia liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556965#comment-17556965
 ] 

jia liu edited comment on FLINK-28173 at 6/21/22 3:18 PM:
--

[~chesnay] Thanks for your information. 


was (Author: sonice_lj):
[~chesnay] Thanks for you information. 

> Multiple Parquet format tests are failing with NoSuchMethodError
> 
>
> Key: FLINK-28173
> URL: https://issues.apache.org/jira/browse/FLINK-28173
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> Jun 21 02:44:38 java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.readFields(Configuration.java:3798)
> Jun 21 02:44:38   at 
> org.apache.flink.formats.parquet.utils.SerializableConfiguration.readObject(SerializableConfiguration.java:50)
> Jun 21 02:44:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 21 02:44:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> {code:java}
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testProject
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testReachEnd
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateGenericReader:161->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateReflectReader:133->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateSpecificReader:118->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReadWithRestoreGenericReader:203->restoreReader:293
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReflectReadFromGenericRecords:147->createReader:269
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   ParquetRowDataWriterTest.testCompression:126 » 
> NoSuchMethod com.google.common
> Jun 21 02:44:42 [ERROR]   
> ParquetRowDataWriterTest.testTypes:117->innerTest:168 » NoSuchMethod 
> com.googl...
> Jun 21 02:44:42 [ERROR]   SerializableConfigurationTest.testResource:45 » 
> NoSuchMethod com.google.common...
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] Tests run: 31, Failures: 0, Errors: 24, Skipped: 0
> Jun 21 02:44:42 [INFO] 
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36979=logs=7e3d33c3-a462-5ea8-98b8-27e1aafe4ceb=ef77f8d1-44c8-5ee2-f175-1c88f61de8c0=16375



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-28187) Duplicate job submission for FlinkSessionJob

2022-06-21 Thread Gyula Fora (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556970#comment-17556970
 ] 

Gyula Fora commented on FLINK-28187:


cc [~aitozi] 

> Duplicate job submission for FlinkSessionJob
> 
>
> Key: FLINK-28187
> URL: https://issues.apache.org/jira/browse/FLINK-28187
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.0.0
>Reporter: Jeesmon Jacob
>Priority: Major
> Attachments: flink-operator-log.txt
>
>
> During a session job submission, if a deployment error (e.g. 
> concurrent.TimeoutException) is hit, the operator will submit the job again. But 
> the first submission could have succeeded on the JobManager side, and the second 
> submission could result in a duplicate job. Operator log attached.
> Per [~gyfora]:
> The problem is that, in case a deployment error was hit, the 
> SessionJobObserver will not be able to tell whether it has submitted the job 
> or not, so it will simply try to submit it again. We have to find a mechanism 
> to correlate jobs on the cluster with the SessionJob CR itself. Maybe we 
> could override the job name itself for this purpose, or something like that.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink] wuchong commented on pull request #20003: [FLINK-28080][runtime] Introduce MutableURLClassLoader as parent class of FlinkUserClassLoader and SafetyNetWrapperClassLoader

2022-06-21 Thread GitBox


wuchong commented on PR #20003:
URL: https://github.com/apache/flink/pull/20003#issuecomment-1161906232

   > That depends on how the class-loading order is set up and how you actually use it.
   > If you load everything parent-first within the added sub-tree this problem will not occur.
   
   If we force parent-first mode, then the classloader behavior is 
inconsistent between local job compilation and distributed execution. Say the user wants to 
use `child-first` for distributed execution to resolve a class conflict between 
the user jar and the Flink core jars. However, the job can't even be compiled, because the 
client forces parent-first, ignores the user's 
`classloader.resolve-order` configuration, and causes NoSuchMethod exceptions.
   
   > If we start removing URLs however this very much changes.
   Yes. But we don't and won't support removing URLs/JARs. 
   
   > Can you clarify on whether the jars are accessed in between addUrl calls?
   > Would it be technically feasible to first determine all the required jars 
before creating the first user CL?
   
   Ah, let me clarify the background of this pull request. The motivation is that we 
would like to support `ADD JAR` and `CREATE FUNCTION ... USING JAR` 
([FLIP-214](https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL))
 statements in the table ecosystem, especially in the SQL CLI and SQL Gateway. I 
will explain a use case of SQL Gateway + ADD JAR. A SQL Gateway (FLIP-91) is a 
long-running service that many users can connect to via REST/JDBC/Beeline... 
Each user has a separate environment (e.g. classloader) and can submit SQL 
statements interactively. For example: 
   
   ```sql
   -- start a SQL CLI and connect to the SQL Gateway which is serving at 
10.10.11.2:8083 
   bin/sql-client.sh --endpoint 10.10.11.2:8083 
   
   -- A new session is opened for the current user, and a clean classloader is 
prepared for the user
   
   -- query is executed without additional user jars
   Flink SQL> SELECT * FROM T;
   
   -- the user jar is added to the user classloader of the current session 
   Flink SQL> ADD JAR '/path/to/aaa.jar';
   
   -- register a user-defined function (UDF) that is loaded from the previously 
added jar
   Flink SQL> CREATE TEMPORARY FUNCTION lower AS 'org.apache.flink.udf.Lower';
   
   -- query is executed with the added jar and the UDF in the jar.  
   Flink SQL> SELECT id, lower(name) FROM T;
   
   -- SQL Gateway downloads the jar from HDFS to local disk, and adds the jar to 
the user classloader of the current session.
   -- So the user classloader should contain both aaa.jar and bbb.jar
   -- And registers the UDF that is loaded from bbb.jar
   Flink SQL> CREATE TEMPORARY FUNCTION upper AS 'me.wuchong.Upper' USING JAR 
'hdfs:///path/to/bbb.jar';
   
   -- query is executed with the added jars and the UDFs in the jar.  
   Flink SQL> SELECT id, lower(name), upper(name) FROM T;
   
   -- session is closed, and the user classloader for this session is released 
in SQL Gateway 
   Flink SQL> exit;
   ```
   
   So, yes, the jars are accessed in between addUrl calls, and we can't 
determine all the required jars before creating the first user CL. Because this 
is an interactive process, the jars are added dynamically, and we don't know 
which jars will be added at which point in time. 
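
   To make the discussion concrete, the core of a mutable classloader is very small. Here is a minimal sketch of the idea (illustrative only, not the actual MutableURLClassLoader introduced in this PR):
   
   ```java
   import java.net.URL;
   import java.net.URLClassLoader;
   
   // A classloader whose URL set can grow while a session is running, so that
   // ADD JAR can make new classes visible to subsequent statements.
   public class MutableURLClassLoaderSketch extends URLClassLoader {
   
       public MutableURLClassLoaderSketch(URL[] urls, ClassLoader parent) {
           super(urls, parent);
       }
   
       @Override
       public void addURL(URL url) { // widen visibility from protected to public
           super.addURL(url);
       }
   }
   ```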
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] 1996fanrui commented on pull request #20038: [FLINK-26762][docs] Document overdraft buffers

2022-06-21 Thread GitBox


1996fanrui commented on PR #20038:
URL: https://github.com/apache/flink/pull/20038#issuecomment-1161905389

   LGTM, thanks for your contribution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-28187) Duplicate job submission for FlinkSessionJob

2022-06-21 Thread Jeesmon Jacob (Jira)
Jeesmon Jacob created FLINK-28187:
-

 Summary: Duplicate job submission for FlinkSessionJob
 Key: FLINK-28187
 URL: https://issues.apache.org/jira/browse/FLINK-28187
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Jeesmon Jacob
 Attachments: flink-operator-log.txt

During a session job submission, if a deployment error (e.g. 
concurrent.TimeoutException) is hit, the operator will submit the job again. But 
the first submission could have succeeded on the JobManager side, and the second 
submission could result in a duplicate job. Operator log attached.

Per [~gyfora]:

The problem is that, in case a deployment error was hit, the SessionJobObserver 
will not be able to tell whether it has submitted the job or not, so it will 
simply try to submit it again. We have to find a mechanism to correlate jobs on 
the cluster with the SessionJob CR itself. Maybe we could override the job name 
itself for this purpose, or something like that.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (FLINK-28173) Multiple Parquet format tests are failing with NoSuchMethodError

2022-06-21 Thread jia liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556965#comment-17556965
 ] 

jia liu edited comment on FLINK-28173 at 6/21/22 3:13 PM:
--

[~chesnay] Thanks for you information. 


was (Author: sonice_lj):
[~chesnay] Thanks for you information. 
{code:java}
-Dinclude_hadoop_aws -Dhadoop.version=3.1.3 -Phadoop3-tests,hive3{code}
Are these maven flags enghou for my local testing?

> Multiple Parquet format tests are failing with NoSuchMethodError
> 
>
> Key: FLINK-28173
> URL: https://issues.apache.org/jira/browse/FLINK-28173
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> Jun 21 02:44:38 java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.readFields(Configuration.java:3798)
> Jun 21 02:44:38   at 
> org.apache.flink.formats.parquet.utils.SerializableConfiguration.readObject(SerializableConfiguration.java:50)
> Jun 21 02:44:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 21 02:44:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> {code:java}
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testProject
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testReachEnd
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateGenericReader:161->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateReflectReader:133->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateSpecificReader:118->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReadWithRestoreGenericReader:203->restoreReader:293
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReflectReadFromGenericRecords:147->createReader:269
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   ParquetRowDataWriterTest.testCompression:126 » 
> NoSuchMethod com.google.common
> Jun 21 02:44:42 [ERROR]   
> ParquetRowDataWriterTest.testTypes:117->innerTest:168 » NoSuchMethod 
> com.googl...
> Jun 21 02:44:42 [ERROR]   SerializableConfigurationTest.testResource:45 » 
> NoSuchMethod com.google.common...
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] Tests run: 31, Failures: 0, Errors: 24, Skipped: 0
> Jun 21 02:44:42 [INFO] 
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36979=logs=7e3d33c3-a462-5ea8-98b8-27e1aafe4ceb=ef77f8d1-44c8-5ee2-f175-1c88f61de8c0=16375



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink] pnowojski commented on a diff in pull request #20038: [FLINK-26762][docs] Document overdraft buffers

2022-06-21 Thread GitBox


pnowojski commented on code in PR #20038:
URL: https://github.com/apache/flink/pull/20038#discussion_r902748533


##
docs/content/docs/deployment/memory/network_mem_tuning.md:
##
@@ -120,6 +120,19 @@ In order to avoid excessive data skew, the number of 
buffers for each subpartiti
 
 Unlike the input buffer pool, the configured amount of exclusive buffers and 
floating buffers is only treated as recommended values. If there are not enough 
buffers available, Flink can make progress with only a single exclusive buffer 
per output subpartition and zero floating buffers.
 
+ Overdraft buffers
+
+For each output subtask can also request up to 
`taskmanager.network.memory.max-overdraft-buffers-per-gate` (by default 5) 
extra overdraft buffers.
+Those buffers are only used, if despite presence of a backpressure, Flink can 
not stop producing more records to the output.

Review Comment:
   What about:
   > Those buffers are only used, if the subtask is backpressured
   > by downstream subtasks and the subtask requires more than a single network 
buffer to finish what it's
   > currently doing. This can happen in situations like:
   
   ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] MartijnVisser commented on pull request #20012: [FLINK-27667][yarn-tests] Stabilize flaky YARN integration tests

2022-06-21 Thread GitBox


MartijnVisser commented on PR #20012:
URL: https://github.com/apache/flink/pull/20012#issuecomment-1161884244

   If they look good, we should merge it, IMHO. Getting stable tests makes it 
worthwhile. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-28173) Multiple Parquet format tests are failing with NoSuchMethodError

2022-06-21 Thread jia liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556965#comment-17556965
 ] 

jia liu commented on FLINK-28173:
-

[~chesnay] Thanks for you information. 
{code:java}
-Dinclude_hadoop_aws -Dhadoop.version=3.1.3 -Phadoop3-tests,hive3{code}
Are these maven flags enghou for my local testing?

> Multiple Parquet format tests are failing with NoSuchMethodError
> 
>
> Key: FLINK-28173
> URL: https://issues.apache.org/jira/browse/FLINK-28173
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> Jun 21 02:44:38 java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.readFields(Configuration.java:3798)
> Jun 21 02:44:38   at 
> org.apache.flink.formats.parquet.utils.SerializableConfiguration.readObject(SerializableConfiguration.java:50)
> Jun 21 02:44:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 21 02:44:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> {code:java}
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testProject
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testReachEnd
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateGenericReader:161->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateReflectReader:133->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateSpecificReader:118->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReadWithRestoreGenericReader:203->restoreReader:293
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReflectReadFromGenericRecords:147->createReader:269
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   ParquetRowDataWriterTest.testCompression:126 » 
> NoSuchMethod com.google.common
> Jun 21 02:44:42 [ERROR]   
> ParquetRowDataWriterTest.testTypes:117->innerTest:168 » NoSuchMethod 
> com.googl...
> Jun 21 02:44:42 [ERROR]   SerializableConfigurationTest.testResource:45 » 
> NoSuchMethod com.google.common...
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] Tests run: 31, Failures: 0, Errors: 24, Skipped: 0
> Jun 21 02:44:42 [INFO] 
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36979=logs=7e3d33c3-a462-5ea8-98b8-27e1aafe4ceb=ef77f8d1-44c8-5ee2-f175-1c88f61de8c0=16375



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-28078) ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers runs into timeout

2022-06-21 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556962#comment-17556962
 ] 

Matthias Pohl commented on FLINK-28078:
---

It's not yet clear to me why we're not seeing the second ElectionDriver 
taking over the leadership. I'd expect the test to get through the second iteration 
too, because leadership should be obtainable for the second ElectionDriver 
as well. 

> ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
>  runs into timeout
> --
>
> Key: FLINK-28078
> URL: https://issues.apache.org/jira/browse/FLINK-28078
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.16.0
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> [Build 
> #36189|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36189=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=10455]
>  got stuck in 
> {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}
> {code}
> "ForkJoinPool-45-worker-25" #525 daemon prio=5 os_prio=0 
> tid=0x7fc74d9e3800 nid=0x62c8 waiting on condition [0x7fc6ff2f2000]
> May 30 16:36:10java.lang.Thread.State: WAITING (parking)
> May 30 16:36:10   at sun.misc.Unsafe.park(Native Method)
> May 30 16:36:10   - parking to wait for  <0xc2571b80> (a 
> java.util.concurrent.CompletableFuture$Signaller)
> May 30 16:36:10   at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> May 30 16:36:10   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> May 30 16:36:10   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
> May 30 16:36:10   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> May 30 16:36:10   at 
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> May 30 16:36:10   at 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256)
> May 30 16:36:10   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> May 30 16:36:10   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> May 30 16:36:10   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 30 16:36:10   at java.lang.reflect.Method.invoke(Method.java:498)
> [...]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-26621) flink-tests failed on azure due to Error occurred in starting fork

2022-06-21 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556960#comment-17556960
 ] 

Chesnay Schepler commented on FLINK-26621:
--

I have a theory that it is related to some RocksDB checkpoint ITCase, but I wasn't 
able to confirm it yet.

> flink-tests failed on azure due to Error occurred in starting fork
> --
>
> Key: FLINK-26621
> URL: https://issues.apache.org/jira/browse/FLINK-26621
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Affects Versions: 1.15.0, 1.14.4, 1.16.0
>Reporter: Yun Gao
>Assignee: Chesnay Schepler
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> 2022-03-11T16:20:12.6929558Z Mar 11 16:20:12 [WARNING] The requested profile 
> "skip-webui-build" could not be activated because it does not exist.
> 2022-03-11T16:20:12.6939269Z Mar 11 16:20:12 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test 
> (integration-tests) on project flink-tests: There are test failures.
> 2022-03-11T16:20:12.6940062Z Mar 11 16:20:12 [ERROR] 
> 2022-03-11T16:20:12.6940954Z Mar 11 16:20:12 [ERROR] Please refer to 
> /__w/2/s/flink-tests/target/surefire-reports for the individual test results.
> 2022-03-11T16:20:12.6941875Z Mar 11 16:20:12 [ERROR] Please refer to dump 
> files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
> 2022-03-11T16:20:12.6942966Z Mar 11 16:20:12 [ERROR] ExecutionException Error 
> occurred in starting fork, check output in log
> 2022-03-11T16:20:12.6943919Z Mar 11 16:20:12 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException Error occurred in starting fork, check output in log
> 2022-03-11T16:20:12.6945023Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:532)
> 2022-03-11T16:20:12.6945878Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:479)
> 2022-03-11T16:20:12.6946761Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:322)
> 2022-03-11T16:20:12.6947532Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:266)
> 2022-03-11T16:20:12.6953051Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1314)
> 2022-03-11T16:20:12.6954035Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1159)
> 2022-03-11T16:20:12.6954917Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:932)
> 2022-03-11T16:20:12.6955749Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
> 2022-03-11T16:20:12.6956542Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
> 2022-03-11T16:20:12.6957456Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
> 2022-03-11T16:20:12.6958232Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
> 2022-03-11T16:20:12.6959038Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
> 2022-03-11T16:20:12.6960553Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
> 2022-03-11T16:20:12.6962116Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
> 2022-03-11T16:20:12.6963009Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
> 2022-03-11T16:20:12.6963737Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
> 2022-03-11T16:20:12.6964644Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
> 2022-03-11T16:20:12.6965647Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
> 2022-03-11T16:20:12.6966732Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
> 2022-03-11T16:20:12.6967818Z Mar 11 16:20:12 [ERROR] at 
> org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
> 2022-03-11T16:20:12.6968857Z Mar 11 16:20:12 [ERROR] at 
> 

[GitHub] [flink] ChengkaiYang2022 commented on pull request #19964: [FLINK-27291][docs-zh] Translate the "List of Data Types" section of…

2022-06-21 Thread GitBox


ChengkaiYang2022 commented on PR #19964:
URL: https://github.com/apache/flink/pull/19964#issuecomment-1161860720

   > @ChengkaiYang2022 Thanks for correcting. I will fix them; there is one 
part I am confused about: using (both inclusive) instead of using '[' and ']'. 
I think using `[` and `]` means the same as (both inclusive) while using fewer 
characters. Is it this way that creates ambiguity?
   
   Using '[' and ']' does use fewer characters, but I think translating 
'(both inclusive)' into Chinese would be more specific, since most Chinese 
readers are more familiar with the text '开闭区间' than with the characters '[' and 
']'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-28173) Multiple Parquet format tests are failing with NoSuchMethodError

2022-06-21 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556956#comment-17556956
 ] 

Chesnay Schepler commented on FLINK-28173:
--

[~sonice_lj] This only fails on the cron builds where we use Hadoop 3.

> Multiple Parquet format tests are failing with NoSuchMethodError
> 
>
> Key: FLINK-28173
> URL: https://issues.apache.org/jira/browse/FLINK-28173
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> Jun 21 02:44:38 java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.readFields(Configuration.java:3798)
> Jun 21 02:44:38   at 
> org.apache.flink.formats.parquet.utils.SerializableConfiguration.readObject(SerializableConfiguration.java:50)
> Jun 21 02:44:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 21 02:44:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> {code:java}
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testProject
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testReachEnd
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateGenericReader:161->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateReflectReader:133->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateSpecificReader:118->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReadWithRestoreGenericReader:203->restoreReader:293
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReflectReadFromGenericRecords:147->createReader:269
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   ParquetRowDataWriterTest.testCompression:126 » 
> NoSuchMethod com.google.common
> Jun 21 02:44:42 [ERROR]   
> ParquetRowDataWriterTest.testTypes:117->innerTest:168 » NoSuchMethod 
> com.googl...
> Jun 21 02:44:42 [ERROR]   SerializableConfigurationTest.testResource:45 » 
> NoSuchMethod com.google.common...
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] Tests run: 31, Failures: 0, Errors: 24, Skipped: 0
> Jun 21 02:44:42 [INFO] 
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36979=logs=7e3d33c3-a462-5ea8-98b8-27e1aafe4ceb=ef77f8d1-44c8-5ee2-f175-1c88f61de8c0=16375



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-28173) Multiple Parquet format tests are failing with NoSuchMethodError

2022-06-21 Thread jia liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556951#comment-17556951
 ] 

jia liu commented on FLINK-28173:
-

[~martijnvisser] If this commit is blocking other PRs, please revert it. I'll 
set up a brand-new local environment to test my commit tomorrow.

> Multiple Parquet format tests are failing with NoSuchMethodError
> 
>
> Key: FLINK-28173
> URL: https://issues.apache.org/jira/browse/FLINK-28173
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> Jun 21 02:44:38 java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.readFields(Configuration.java:3798)
> Jun 21 02:44:38   at 
> org.apache.flink.formats.parquet.utils.SerializableConfiguration.readObject(SerializableConfiguration.java:50)
> Jun 21 02:44:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 21 02:44:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> {code:java}
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testProject
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testReachEnd
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateGenericReader:161->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateReflectReader:133->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateSpecificReader:118->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReadWithRestoreGenericReader:203->restoreReader:293
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReflectReadFromGenericRecords:147->createReader:269
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   ParquetRowDataWriterTest.testCompression:126 » 
> NoSuchMethod com.google.common
> Jun 21 02:44:42 [ERROR]   
> ParquetRowDataWriterTest.testTypes:117->innerTest:168 » NoSuchMethod 
> com.googl...
> Jun 21 02:44:42 [ERROR]   SerializableConfigurationTest.testResource:45 » 
> NoSuchMethod com.google.common...
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] Tests run: 31, Failures: 0, Errors: 24, Skipped: 0
> Jun 21 02:44:42 [INFO] 
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36979=logs=7e3d33c3-a462-5ea8-98b8-27e1aafe4ceb=ef77f8d1-44c8-5ee2-f175-1c88f61de8c0=16375



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-27667) YARNHighAvailabilityITCase fails with "Failed to delete temp directory /tmp/junit1681"

2022-06-21 Thread Ferenc Csaky (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556949#comment-17556949
 ] 

Ferenc Csaky commented on FLINK-27667:
--

[~martijnvisser] [~bgeng777] FYI: I ended up incorporating those changes as 
well in the #20012 PR.

> YARNHighAvailabilityITCase fails with "Failed to delete temp directory 
> /tmp/junit1681"
> --
>
> Key: FLINK-27667
> URL: https://issues.apache.org/jira/browse/FLINK-27667
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Assignee: Ferenc Csaky
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=35733=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=29208
>  
> {code:bash}
> May 17 08:36:22 [INFO] Results: 
> May 17 08:36:22 [INFO] 
> May 17 08:36:22 [ERROR] Errors: 
> May 17 08:36:22 [ERROR] YARNHighAvailabilityITCase » IO Failed to delete temp 
> directory /tmp/junit1681... 
> May 17 08:36:22 [INFO] 
> May 17 08:36:22 [ERROR] Tests run: 28, Failures: 0, Errors: 1, Skipped: 0 
> May 17 08:36:22 [INFO] 
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink] liuzhuang2017 commented on pull request #19966: [hotfix][docs] Fix the Intellij key nouns.

2022-06-21 Thread GitBox


liuzhuang2017 commented on PR #19966:
URL: https://github.com/apache/flink/pull/19966#issuecomment-1161840497

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] mbalassi commented on pull request #20037: [hotfix][docs] Fix the inaccessible documentation link.

2022-06-21 Thread GitBox


mbalassi commented on PR #20037:
URL: https://github.com/apache/flink/pull/20037#issuecomment-1161837644

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (FLINK-28078) ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers runs into timeout

2022-06-21 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556863#comment-17556863
 ] 

Matthias Pohl edited comment on FLINK-28078 at 6/21/22 2:36 PM:


{code}
16:17:07,802 [ForkJoinPool-45-worker-25] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl
 [] - Starting
16:17:07,804 [ForkJoinPool-45-worker-25] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl
 [] - Default schema
16:17:07,814 [ForkJoinPool-45-worker-25-EventThread] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager
 [] - State change: CONNECTED
16:17:07,817 [ForkJoinPool-45-worker-25-EventThread] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker
 [] - New config event received: {}
16:17:07,824 [Curator-ConnectionStateManager-0] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Connected to ZooKeeper quorum. Leader election can start.
16:17:07,824 [Curator-ConnectionStateManager-0] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Connected to ZooKeeper quorum. Leader election can start.
16:17:07,826 [ForkJoinPool-45-worker-25-EventThread] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker
 [] - New config event received: {}
16:17:07,848 [ForkJoinPool-45-worker-25-EventThread] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
16:17:07,860 [ForkJoinPool-45-worker-25] INFO  
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
{code}

The test itself usually creates three {{ElectionDriver}} instances and removes 
them one by one through a for loop. The logs of the failed test reveal that 
only two out of the three have the quorum connection established (i.e. the log 
message {{Connected to ZooKeeper quorum. Leader election can start.}} is 
printed). The first iteration picks the first instance, checks its leadership 
and closes it. 

The {{anyOf}} call in the next iteration should actually still succeed because 
there's one {{ElectionDriver}} that has an established connection. But the 
resulting {{anyOf}} composite future doesn't complete, i.e. none of the remaining 
leadership futures completes, resulting in the test getting stuck in the 
subsequent {{join}} call.
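
For reference, a minimal, self-contained illustration of the {{anyOf}}/{{join}} pattern the test relies on (plain JDK code, not the test itself):

{code:java}
import java.util.concurrent.CompletableFuture;

public class AnyOfDemo {
    public static void main(String[] args) {
        CompletableFuture<Void> first = new CompletableFuture<>();
        CompletableFuture<Void> second = new CompletableFuture<>();

        // anyOf completes as soon as any input completes; if none of the
        // inputs ever completes, join() blocks forever - the observed hang.
        CompletableFuture<Object> any = CompletableFuture.anyOf(first, second);
        second.complete(null); // one leadership future completes ...
        any.join();            // ... so the composite future unblocks
        System.out.println("done");
    }
}
{code}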


was (Author: mapohl):
{code}
16:17:07,802 [ForkJoinPool-45-worker-25] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl
 [] - Starting
16:17:07,804 [ForkJoinPool-45-worker-25] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl
 [] - Default schema
16:17:07,814 [ForkJoinPool-45-worker-25-EventThread] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager
 [] - State change: CONNECTED
16:17:07,817 [ForkJoinPool-45-worker-25-EventThread] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker
 [] - New config event received: {}
16:17:07,824 [Curator-ConnectionStateManager-0] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Connected to ZooKeeper quorum. Leader election can start.
16:17:07,824 [Curator-ConnectionStateManager-0] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Connected to ZooKeeper quorum. Leader election can start.
16:17:07,826 [ForkJoinPool-45-worker-25-EventThread] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker
 [] - New config event received: {}
16:17:07,848 [ForkJoinPool-45-worker-25-EventThread] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
16:17:07,860 [ForkJoinPool-45-worker-25] INFO  
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
{code}

The test itself usually creates three {{ElectionDriver}} instances and removes 
them one by one through a for loop. The logs of the failed test reveal that 
only two out of the three have the quorum connection established (i.e. the log 
message {{Connected to ZooKeeper quorum. Leader election can start.}} is 
printed). The first iteration picks the first instance, checks its leadership 
and closes it. It looks like the second iteration picks the instance for which 
the quorum connection is still not established. The leadership future could 

[jira] [Commented] (FLINK-28173) Multiple Parquet format tests are failing with NoSuchMethodError

2022-06-21 Thread jia liu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556939#comment-17556939
 ] 

jia liu commented on FLINK-28173:
-

[~martijnvisser] I'm wondering why my PR could pass all CI test cases in Azure and 
then cause the error above.

> Multiple Parquet format tests are failing with NoSuchMethodError
> 
>
> Key: FLINK-28173
> URL: https://issues.apache.org/jira/browse/FLINK-28173
> Project: Flink
>  Issue Type: Bug
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.16.0
>Reporter: Martijn Visser
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> Jun 21 02:44:38 java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
> Jun 21 02:44:38   at 
> org.apache.hadoop.conf.Configuration.readFields(Configuration.java:3798)
> Jun 21 02:44:38   at 
> org.apache.flink.formats.parquet.utils.SerializableConfiguration.readObject(SerializableConfiguration.java:50)
> Jun 21 02:44:38   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jun 21 02:44:38   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> {code:java}
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testProject
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] ParquetColumnarRowSplitReaderTest.testReachEnd
> Jun 21 02:44:42 [ERROR]   Run 1: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [ERROR]   Run 2: 
> com.google.common.base.Preconditions.checkState(ZLjava/lang/String;I)V
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateGenericReader:161->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateReflectReader:133->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testCreateSpecificReader:118->createReader:269 » 
> NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReadWithRestoreGenericReader:203->restoreReader:293
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   
> AvroParquetRecordFormatTest.testReflectReadFromGenericRecords:147->createReader:269
>  » NoSuchMethod
> Jun 21 02:44:42 [ERROR]   ParquetRowDataWriterTest.testCompression:126 » 
> NoSuchMethod com.google.common
> Jun 21 02:44:42 [ERROR]   
> ParquetRowDataWriterTest.testTypes:117->innerTest:168 » NoSuchMethod 
> com.googl...
> Jun 21 02:44:42 [ERROR]   SerializableConfigurationTest.testResource:45 » 
> NoSuchMethod com.google.common...
> Jun 21 02:44:42 [INFO] 
> Jun 21 02:44:42 [ERROR] Tests run: 31, Failures: 0, Errors: 24, Skipped: 0
> Jun 21 02:44:42 [INFO] 
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36979=logs=7e3d33c3-a462-5ea8-98b8-27e1aafe4ceb=ef77f8d1-44c8-5ee2-f175-1c88f61de8c0=16375



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (FLINK-28115) Flink 1.15.0 Parallelism Rebalance causes flink job failure

2022-06-21 Thread Huameng Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556929#comment-17556929
 ] 

Huameng Li edited comment on FLINK-28115 at 6/21/22 2:28 PM:
-

Hi Jing and Qingsheng, thanks for looking into this parallelism rebalance 
issue.

The exception stack trace logging is from the task manager (TM).

This issue exists only in Flink 1.15.0. It occurs right after job submission, 
whenever the parallelism of the map/process function differs from that of the 
source or sink.

The issue is not related to Flink Kafka connector. The TaskManager is still 
alive when the issue occurs.

*Job topology:*

*kafkaSource (parallelism 4) -> map/processor (parallelism 8) -> kafkaSink 
(parallelism 4)*

*Our dev flink cluster has 8 hosts,* *each host has 25 task managers alive.* 
*Each TM has 2 task slots*

Interestingly, if a job uses the same parallelism (no rebalance) for *source* 
(parallelism {*}6{*}) -> *map* (parallelism {*}6{*}) -> *sink* (parallelism 
{*}6{*}), the job runs fine with no issue. 
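
For reference, a minimal sketch of a job with this shape (the source, sink and 
mapper names are placeholders, not our production code); the parallelism 
changes are what introduce the network exchanges on which the partition 
request then fails:

{code:java}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "kafkaSource")
        .setParallelism(4)            // source: parallelism 4
        .map(new MyMapper())          // placeholder MapFunction
        .setParallelism(8)            // 4 -> 8 inserts a rebalance exchange
        .sinkTo(kafkaSink)
        .setParallelism(4);           // 8 -> 4 inserts another rebalance

env.execute("rebalance-repro");
{code}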


was (Author: JIRAUSER291148):
Hi Jing and Qingsheng, thanks for looking into this parallelism rebalance 
issue.

The exception stack trace logging is from the task manager (TM).

This issue exists only in Flink 1.15.0. It occurs right after job submission, 
whenever the parallelism of the map/process function differs from that of the 
source or sink.

The issue is not related to Flink Kafka connector. The TaskManager is still 
alive when the issue occurs.

 If we use the same parallelism (no rebalance) for *source* (parallelism 
{*}4{*}) -> *map* (parallelism {*}4{*}) -> *sink* (parallelism {*}4{*}), the 
job runs fine without issue. 

> Flink 1.15.0 Parallelism Rebalance causes flink job failure
> ---
>
> Key: FLINK-28115
> URL: https://issues.apache.org/jira/browse/FLINK-28115
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.0
> Environment: Flink 1.15.0 session cluster with 8 hosts.
>Reporter: Huameng Li
>Priority: Major
> Attachments: image-2022-06-17-13-01-08-992.png
>
>
> {color:#de350b}*Issue:*{color}
> *Flink 1.15.0 Parallelism Rebalance causes flink job failure.* The same issue 
> was not present in Flink 1.14.4.
> {color:#de350b}*Exceptions:*{color}
> *1 of the 8 re-balance parallelism task slots failed due to* 
> *{color:#de350b}org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  finishConnect(..) failed: Connection refused: /127.0.0.1:43354{color}*
> *{color:#de350b}Caused by: java.net.ConnectException: finishConnect(..) 
> failed: Connection refused{color}*
>  
> *Job topology:*
> *kafkaSource (parallelism 4) -> map/processor (parallelism 8) -> kafkaSink 
> (parallelism 4)*
> *Our dev flink cluster has 8 hosts,* *each host has 25 task managers alive.* 
> *Each TM has 2 task slots*
> *!image-2022-06-17-13-01-08-992.png!*
>  
>  
> *Error stack trace:*
> 2022-06-17 12:54:38.563 WARN  [Framework] [Map (3/8)#5|#5] 
> org.apache.flink.runtime.taskmanager.Task  - Map (3/8)#5 
> (69a82f741d68fd7161d7b13de48c6c4b) switched from RUNNING to FAILED with 
> failure cause: 
> org.apache.flink.runtime.io.network.partition.consumer.PartitionConnectionException:
>  Connection for partition 
> 2064424258b3b74fdc349607017f1029#1@a94744160d7d5e85101881e2d783dcd2 not 
> reachable.
>     at 
> org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:190)
>     at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.internalRequestPartitions(SingleInputGate.java:342)
>     at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:312)
>     at 
> org.apache.flink.runtime.taskmanager.InputGateWithMetrics.requestPartitions(InputGateWithMetrics.java:115)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>     at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>     at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353)
>     at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
>     at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> 

[jira] [Updated] (FLINK-28115) Flink 1.15.0 Parallelism Rebalance causes flink job failure

2022-06-21 Thread Huameng Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huameng Li updated FLINK-28115:
---
Description: 
{color:#de350b}*Issue:*{color}

*Flink 1.15.0 Parallelism Rebalance causes flink job failure.* The same issue 
was not present in Flink 1.14.4.

{color:#de350b}*Exceptions:*{color}

*1 of the 8 re-balance parallelism task slots failed due to* 
*{color:#de350b}org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 finishConnect(..) failed: Connection refused: /127.0.0.1:43354{color}*
*{color:#de350b}Caused by: java.net.ConnectException: finishConnect(..) failed: 
Connection refused{color}*

 

*Job topology:*

*kafkaSource (parallelism 4) -> map/processor (parallelism 8) -> kafkaSink 
(parallelism 4)*

*Our dev flink cluster has 8 hosts,* *each host has 25 task managers alive.* 
*Each TM has 2 task slots*

*!image-2022-06-17-13-01-08-992.png!*

 
 
*Error stack trace:*
2022-06-17 12:54:38.563 WARN  [Framework] [Map (3/8)#5|#5] 
org.apache.flink.runtime.taskmanager.Task  - Map (3/8)#5 
(69a82f741d68fd7161d7b13de48c6c4b) switched from RUNNING to FAILED with failure 
cause: 
org.apache.flink.runtime.io.network.partition.consumer.PartitionConnectionException:
 Connection for partition 
2064424258b3b74fdc349607017f1029#1@a94744160d7d5e85101881e2d783dcd2 not 
reachable.
    at 
org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:190)
    at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.internalRequestPartitions(SingleInputGate.java:342)
    at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:312)
    at 
org.apache.flink.runtime.taskmanager.InputGateWithMetrics.requestPartitions(InputGateWithMetrics.java:115)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353)
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
    at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
    at java.lang.Thread.run(Thread.java:748)
Caused by: 
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: 
Connecting to remote task manager '/127.0.0.1:43354' has failed. This might 
indicate that the remote task manager has been lost.
    at 
org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:169)
    at 
org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connectWithRetries(PartitionRequestClientFactory.java:135)
    at 
org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:96)
    at 
org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:95)
    at 
org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:186)
    ... 15 more
Caused by: 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 finishConnect(..) failed: Connection refused: /127.0.0.1:43354
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection 
refused
    at 
org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors.newConnectException0(Errors.java:155)
    at 
org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:128)
    at 
org.apache.flink.shaded.netty4.io.netty.channel.unix.Socket.finishConnect(Socket.java:320)
    at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:710)
    at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:687)
    at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
    at 

[jira] [Commented] (FLINK-28115) Flink 1.15.0 Parallelism Rebalance causes flink job failure

2022-06-21 Thread Huameng Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556929#comment-17556929
 ] 

Huameng Li commented on FLINK-28115:


Hi Jing and Qingsheng, thanks for looking into this parallelism rebalance 
issue.

The exception stack trace logging is from the task manager (TM).

This issue exists only in Flink 1.15.0. It occurs right after job submission, 
whenever the parallelism of the map/process function differs from that of the 
source or sink.

The issue is not related to Flink Kafka connector. The TaskManager is still 
alive when the issue occurs.

 If we use the same parallelism (no rebalance) for *source* (parallelism 
{*}4{*}) -> *map* (parallelism {*}4{*}) -> *sink* (parallelism {*}4{*}), the 
job runs fine without issue. 

> Flink 1.15.0 Parallelism Rebalance causes flink job failure
> ---
>
> Key: FLINK-28115
> URL: https://issues.apache.org/jira/browse/FLINK-28115
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.0
> Environment: Flink 1.15.0 session cluster with 8 hosts.
>Reporter: Huameng Li
>Priority: Major
> Attachments: image-2022-06-17-13-01-08-992.png
>
>
> {color:#de350b}*Issue:*{color}
> *Flink 1.15.0 Parallelism Rebalance causes flink job failure.* The same issue 
> was not present in Flink 1.14.4.
> {color:#de350b}*Exceptions:*{color}
> *1 of the 8 re-balance parallelism task slots failed due to* 
> *{color:#de350b}org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  finishConnect(..) failed: Connection refused: /127.0.0.1:43354{color}*
> *{color:#de350b}Caused by: java.net.ConnectException: finishConnect(..) 
> failed: Connection refused{color}*
>  
> *Job topology:*
> *kafkaSource (parallelism 4) -> map/processor (parallelism 8) -> kafkaSink 
> (parallelism 4)*
>  
> *!image-2022-06-17-13-01-08-992.png!*
>  
>  
> *Error stack trace:*
> 2022-06-17 12:54:38.563 WARN  [Framework] [Map (3/8)#5|#5] 
> org.apache.flink.runtime.taskmanager.Task  - Map (3/8)#5 
> (69a82f741d68fd7161d7b13de48c6c4b) switched from RUNNING to FAILED with 
> failure cause: 
> org.apache.flink.runtime.io.network.partition.consumer.PartitionConnectionException:
>  Connection for partition 
> 2064424258b3b74fdc349607017f1029#1@a94744160d7d5e85101881e2d783dcd2 not 
> reachable.
>     at 
> org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:190)
>     at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.internalRequestPartitions(SingleInputGate.java:342)
>     at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:312)
>     at 
> org.apache.flink.runtime.taskmanager.InputGateWithMetrics.requestPartitions(InputGateWithMetrics.java:115)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
>     at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
>     at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353)
>     at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
>     at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
>     at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
>     at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>     at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: 
> Connecting to remote task manager '/127.0.0.1:43354' has failed. This might 
> indicate that the remote task manager has been lost.
>     at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connect(PartitionRequestClientFactory.java:169)
>     at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.connectWithRetries(PartitionRequestClientFactory.java:135)
>     at 
> org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:96)
>     at 
> org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:95)
>     at 
> 

[GitHub] [flink-table-store] LadyForest opened a new pull request, #169: [hotfix] Fix SqlParserEOFException in AlterTableCompactITCase

2022-06-21 Thread GitBox


LadyForest opened a new pull request, #169:
URL: https://github.com/apache/flink-table-store/pull/169

   The IT case is unstable, see 
https://github.com/apache/flink-table-store/runs/6985604079?check_suite_focus=true
   
   The exception is caused by an empty list returned by `generateData`.
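   
   A minimal sketch of the intended guard (assuming `generateData` is the 
test's random-row helper, and not necessarily the actual patch):

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.flink.table.data.RowData;

// Hedged sketch: keep generating until the random data is non-empty, so the
// INSERT statement built from it never has an empty VALUES clause that the
// parser would reject with a SqlParserEOFException.
private List<RowData> generateNonEmptyData(ThreadLocalRandom random) {
    List<RowData> rows;
    do {
        rows = generateData(random.nextInt(100)); // assumed existing test helper
    } while (rows.isEmpty());
    return rows;
}
```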


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (FLINK-25710) Multiple Kafka IT cases fail with "ContainerLaunch Container startup failed"

2022-06-21 Thread Roman Khachatryan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan closed FLINK-25710.
-
Fix Version/s: (was: 1.16.0)
   Resolution: Cannot Reproduce

> Multiple Kafka IT cases fail with "ContainerLaunch Container startup failed"
> 
>
> Key: FLINK-25710
> URL: https://issues.apache.org/jira/browse/FLINK-25710
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Roman Khachatryan
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=29731=logs=c5f0071e-1851-543e-9a45-9ac140befc32=15a22db7-8faa-5b34-3920-d33c9f0ca23c=35454
> {code}
> 2022-01-19T18:17:40.3503774Z Jan 19 18:17:40 [INFO] 
> ---
> 2022-01-19T18:17:42.3992027Z Jan 19 18:17:42 [ERROR] Picked up 
> JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> 2022-01-19T18:17:42.9262342Z Jan 19 18:17:42 [INFO] Running 
> org.apache.flink.connector.kafka.sink.KafkaTransactionLogITCase
> 2022-01-19T18:18:47.9992530Z Jan 19 18:18:47 [ERROR] Tests run: 1, Failures: 
> 0, Errors: 1, Skipped: 0, Time elapsed: 65.053 s <<< FAILURE! - in 
> org.apache.flink.connector.kafka.sink.KafkaTransactionLogITCase
> 2022-01-19T18:18:47.9993836Z Jan 19 18:18:47 [ERROR] 
> org.apache.flink.connector.kafka.sink.KafkaTransactionLogITCase  Time 
> elapsed: 65.053 s  <<<  ERROR!
> 2022-01-19T18:18:47.9994507Z Jan 19 18:18:47 
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed
> ...
> 2022-01-19T18:18:48.0038449Z Jan 19 18:18:47 Caused by: 
> org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with 
> exception
> 2022-01-19T18:18:48.0039451Z Jan 19 18:18:47at 
> org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:88)
> 2022-01-19T18:18:48.0040449Z Jan 19 18:18:47at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:329)
> 2022-01-19T18:18:48.0041204Z Jan 19 18:18:47... 27 more
> 2022-01-19T18:18:48.0041993Z Jan 19 18:18:47 Caused by: 
> org.testcontainers.containers.ContainerLaunchException: Could not 
> create/start container
> 2022-01-19T18:18:48.0043007Z Jan 19 18:18:47at 
> org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525)
> 2022-01-19T18:18:48.0044020Z Jan 19 18:18:47at 
> org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:331)
> 2022-01-19T18:18:48.0045158Z Jan 19 18:18:47at 
> org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
> 2022-01-19T18:18:48.0046043Z Jan 19 18:18:47... 28 more
> 2022-01-19T18:18:48.0047026Z Jan 19 18:18:47 Caused by: 
> org.testcontainers.containers.ContainerLaunchException: Timed out waiting for 
> container port to open (172.17.0.1 ports: [56218, 56219] should be listening)
> 2022-01-19T18:18:48.0048320Z Jan 19 18:18:47at 
> org.testcontainers.containers.wait.strategy.HostPortWaitStrategy.waitUntilReady(HostPortWaitStrategy.java:90)
> 2022-01-19T18:18:48.0049465Z Jan 19 18:18:47at 
> org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:51)
> 2022-01-19T18:18:48.0050585Z Jan 19 18:18:47at 
> org.testcontainers.containers.GenericContainer.waitUntilContainerStarted(GenericContainer.java:929)
> 2022-01-19T18:18:48.0051628Z Jan 19 18:18:47at 
> org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:468)
> 2022-01-19T18:18:48.0052380Z Jan 19 18:18:47... 30 more
> ...
> 2022-01-19T18:40:37.7197924Z Jan 19 18:40:37 [INFO] Results:
> 2022-01-19T18:40:37.7198526Z Jan 19 18:40:37 [INFO]
> 2022-01-19T18:40:37.7199093Z Jan 19 18:40:37 [ERROR] Errors:
> 2022-01-19T18:40:37.7200602Z Jan 19 18:40:37 [ERROR]   KafkaSinkITCase » 
> ContainerLaunch Container startup failed
> 2022-01-19T18:40:37.7201683Z Jan 19 18:40:37 [ERROR]   
> KafkaTransactionLogITCase » ContainerLaunch Container startup failed
> 2022-01-19T18:40:37.7204632Z Jan 19 18:40:37 [ERROR]   
> KafkaWriterITCase.beforeAll:99 » ContainerLaunch Container startup failed
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-25710) Multiple Kafka IT cases fail with "ContainerLaunch Container startup failed"

2022-06-21 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556923#comment-17556923
 ] 

Roman Khachatryan commented on FLINK-25710:
---

Thanks for looking into it [~shubham.bansal]
I think we can close it for now and re-open if the issue pops up again in the 
future.
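
For anyone who hits this again, a hedged sketch of making such an IT case more 
tolerant of slow CI hosts (the container and image names are illustrative, not 
the test's actual setup):

{code:java}
import java.time.Duration;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;

// Retry container creation a few times and allow more time for the mapped
// ports to open before Testcontainers raises "Container startup failed".
KafkaContainer kafka =
        new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.0.1"))
                .withStartupAttempts(3)
                .withStartupTimeout(Duration.ofMinutes(5));
kafka.start();
{code}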

> Multiple Kafka IT cases fail with "ContainerLaunch Container startup failed"
> 
>
> Key: FLINK-25710
> URL: https://issues.apache.org/jira/browse/FLINK-25710
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Tests
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Roman Khachatryan
>Priority: Major
>  Labels: test-stability
> Fix For: 1.16.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=29731=logs=c5f0071e-1851-543e-9a45-9ac140befc32=15a22db7-8faa-5b34-3920-d33c9f0ca23c=35454
> {code}
> 2022-01-19T18:17:40.3503774Z Jan 19 18:17:40 [INFO] 
> ---
> 2022-01-19T18:17:42.3992027Z Jan 19 18:17:42 [ERROR] Picked up 
> JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
> 2022-01-19T18:17:42.9262342Z Jan 19 18:17:42 [INFO] Running 
> org.apache.flink.connector.kafka.sink.KafkaTransactionLogITCase
> 2022-01-19T18:18:47.9992530Z Jan 19 18:18:47 [ERROR] Tests run: 1, Failures: 
> 0, Errors: 1, Skipped: 0, Time elapsed: 65.053 s <<< FAILURE! - in 
> org.apache.flink.connector.kafka.sink.KafkaTransactionLogITCase
> 2022-01-19T18:18:47.9993836Z Jan 19 18:18:47 [ERROR] 
> org.apache.flink.connector.kafka.sink.KafkaTransactionLogITCase  Time 
> elapsed: 65.053 s  <<<  ERROR!
> 2022-01-19T18:18:47.9994507Z Jan 19 18:18:47 
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed
> ...
> 2022-01-19T18:18:48.0038449Z Jan 19 18:18:47 Caused by: 
> org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with 
> exception
> 2022-01-19T18:18:48.0039451Z Jan 19 18:18:47at 
> org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:88)
> 2022-01-19T18:18:48.0040449Z Jan 19 18:18:47at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:329)
> 2022-01-19T18:18:48.0041204Z Jan 19 18:18:47... 27 more
> 2022-01-19T18:18:48.0041993Z Jan 19 18:18:47 Caused by: 
> org.testcontainers.containers.ContainerLaunchException: Could not 
> create/start container
> 2022-01-19T18:18:48.0043007Z Jan 19 18:18:47at 
> org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525)
> 2022-01-19T18:18:48.0044020Z Jan 19 18:18:47at 
> org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:331)
> 2022-01-19T18:18:48.0045158Z Jan 19 18:18:47at 
> org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
> 2022-01-19T18:18:48.0046043Z Jan 19 18:18:47... 28 more
> 2022-01-19T18:18:48.0047026Z Jan 19 18:18:47 Caused by: 
> org.testcontainers.containers.ContainerLaunchException: Timed out waiting for 
> container port to open (172.17.0.1 ports: [56218, 56219] should be listening)
> 2022-01-19T18:18:48.0048320Z Jan 19 18:18:47at 
> org.testcontainers.containers.wait.strategy.HostPortWaitStrategy.waitUntilReady(HostPortWaitStrategy.java:90)
> 2022-01-19T18:18:48.0049465Z Jan 19 18:18:47at 
> org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:51)
> 2022-01-19T18:18:48.0050585Z Jan 19 18:18:47at 
> org.testcontainers.containers.GenericContainer.waitUntilContainerStarted(GenericContainer.java:929)
> 2022-01-19T18:18:48.0051628Z Jan 19 18:18:47at 
> org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:468)
> 2022-01-19T18:18:48.0052380Z Jan 19 18:18:47... 30 more
> ...
> 2022-01-19T18:40:37.7197924Z Jan 19 18:40:37 [INFO] Results:
> 2022-01-19T18:40:37.7198526Z Jan 19 18:40:37 [INFO]
> 2022-01-19T18:40:37.7199093Z Jan 19 18:40:37 [ERROR] Errors:
> 2022-01-19T18:40:37.7200602Z Jan 19 18:40:37 [ERROR]   KafkaSinkITCase » 
> ContainerLaunch Container startup failed
> 2022-01-19T18:40:37.7201683Z Jan 19 18:40:37 [ERROR]   
> KafkaTransactionLogITCase » ContainerLaunch Container startup failed
> 2022-01-19T18:40:37.7204632Z Jan 19 18:40:37 [ERROR]   
> KafkaWriterITCase.beforeAll:99 » ContainerLaunch Container startup failed
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink-kubernetes-operator] zeus1ammon commented on pull request #246: [FLINK-27788] Adding annotation to k8 operator Pod

2022-06-21 Thread GitBox


zeus1ammon commented on PR #246:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/246#issuecomment-1161799750

   > Looking forward to it @zeus1ammon.
   
   @mbalassi I have addressed the comments, please have a look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-28185) "Invalid negative offset" when using OffsetsInitializer.timestamp(.)

2022-06-21 Thread Peter Schrott (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schrott updated FLINK-28185:
--
Description: 
When using {{OffsetsInitializer.timestamp(.)}} on a topic with empty 
partitions – little traffic + low retention – an {{IllegalArgumentException: 
Invalid negative offset}} occurs. See the stacktrace below.

The problem is that the admin client returns -1 as timestamp and offset for 
empty partitions in {{KafkaAdminClient.listOffsets(.)}} [1] (compare the 
attached screenshot). The exception is thrown when an {{OffsetAndTimestamp}} 
object is created from the admin client response.
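
A sketch of where a guard could go (hypothetical code, not necessarily the 
connector's fix): partitions for which the admin client reports -1 would be 
skipped, mirroring what {{KafkaConsumer#offsetsForTimes}} does for partitions 
without matching records:

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

// resultInfos: the Map<TopicPartition, ListOffsetsResultInfo> obtained from
// KafkaAdminClient.listOffsets(...).all().get()
Map<TopicPartition, OffsetAndTimestamp> offsets = new HashMap<>();
resultInfos.forEach(
        (tp, info) -> {
            if (info.offset() >= 0) {
                offsets.put(tp, new OffsetAndTimestamp(info.offset(), info.timestamp()));
            } else {
                // -1 means "no record at/after the requested timestamp";
                // map it to null (as KafkaConsumer#offsetsForTimes does)
                // instead of constructing an invalid OffsetAndTimestamp.
                offsets.put(tp, null);
            }
        });
{code}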
{code:java}
org.apache.flink.util.FlinkRuntimeException: Failed to initialize partition 
splits due to 
    at 
org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator.handlePartitionSplitChanges(KafkaSourceEnumerator.java:299)
    at 
org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$null$1(ExecutorNotifier.java:83)
    at 
org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:40)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
    at java.util.concurrent.FutureTask.run(FutureTask.java)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: Invalid negative offset
    at 
org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.OffsetAndTimestamp.<init>(OffsetAndTimestamp.java:36)
    at 
org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator$PartitionOffsetsRetrieverImpl.lambda$offsetsForTimes$8(KafkaSourceEnumerator.java:622)
    at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)
    at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
    at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1723)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at 
org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator$PartitionOffsetsRetrieverImpl.offsetsForTimes(KafkaSourceEnumerator.java:615)
    at 
org.apache.flink.connector.kafka.source.enumerator.initializer.TimestampOffsetsInitializer.getPartitionOffsets(TimestampOffsetsInitializer.java:57)
    at 
org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator.initializePartitionSplits(KafkaSourceEnumerator.java:272)
    at 
org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator.lambda$checkPartitionChanges$0(KafkaSourceEnumerator.java:242)
    at 
org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$notifyReadyAsync$2(ExecutorNotifier.java:80)
    ... 8 common frames omitted
15:25:58.025 INFO  [flink-akka.actor.default-dispatcher-11] 
o.a.f.runtime.jobmaster.JobMaster - Trying to recover from a global failure.
org.apache.flink.util.FlinkException: Global failure triggered by 
OperatorCoordinator for 'Source: XXX -> YYY -> Sink: ZZZ' (operator 
351e440289835f2ff3e6fee31bf6e13c).
    at 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:556)
    at 
org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:231)
    at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.failJob(SourceCoordinatorContext.java:316)
    at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.handleUncaughtExceptionFromAsyncCall(SourceCoordinatorContext.java:329)
    at 
org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:42)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
    at java.util.concurrent.FutureTask.run(FutureTask.java)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at 

[jira] [Closed] (FLINK-28044) Add hadoop filesystems configuration possibility to all deployment targets

2022-06-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-28044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi closed FLINK-28044.
--
Resolution: Fixed

f8f54bc in master.

> Add hadoop filesystems configuration possibility to all deployment targets
> --
>
> Key: FLINK-28044
> URL: https://issues.apache.org/jira/browse/FLINK-28044
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Common
>Affects Versions: 1.16.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>  Labels: pull-request-available, security
>
> At the moment only YARN supports delegation tokens for Hadoop filesystems 
> with the following config:
> {code:java}
> yarn.security.kerberos.additionalFileSystems
> {code}
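
For reference, a hypothetical flink-conf.yaml entry showing how this option is 
used (the URI is illustrative):

{code}
# Obtain delegation tokens for these extra Hadoop filesystems as well,
# e.g. a second HDFS namespace next to the default one:
yarn.security.kerberos.additionalFileSystems: hdfs://namenode2:8020
{code}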



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [flink] gaborgsomogyi commented on pull request #19953: [FLINK-28044][runtime][security] Make hadoop filesystems configuration available to all deployment targets

2022-06-21 Thread GitBox


gaborgsomogyi commented on PR #19953:
URL: https://github.com/apache/flink/pull/19953#issuecomment-1161791791

   Thank you @mbalassi for taking care!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] asfgit closed pull request #19953: [FLINK-28044][runtime][security] Make hadoop filesystems configuration available to all deployment targets

2022-06-21 Thread GitBox


asfgit closed pull request #19953: [FLINK-28044][runtime][security] Make hadoop 
filesystems configuration available to all deployment targets
URL: https://github.com/apache/flink/pull/19953


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] zentol commented on pull request #20003: [FLINK-28080][runtime] Introduce MutableURLClassLoader as parent class of FlinkUserClassLoader and SafetyNetWrapperClassLoader

2022-06-21 Thread GitBox


zentol commented on PR #20003:
URL: https://github.com/apache/flink/pull/20003#issuecomment-1161787438

   > What's more, the same class will be loaded by different classloaders, 
there might be many class cast exceptions (X cannot be cast to X) even if the 
classes are in the same version.
   
   That depends on how the class-loading order is set up and how you actually 
use it.
   If you load everything parent-first within the added sub-tree this problem 
will not occur.
   
   I actually have to correct myself here; the addition of another jar should 
have no impact on the loading of a previously loaded class, because the 
URLClassLoader _should_ (because why not) access the jars in the order they 
were passed. Hence the most recently added jar is checked last. It should 
behave like parent-first within the sub-tree.
   
   If we start removing URLs, however, this changes considerably.
   
   > Introducing a new classloader mechanism (each user jar in a new CL level) 
different from the existing one (all user jars are in the same CL level) will 
lead to inconsistent and unexpected behavior.
   > it is more friendly to fail fast before distributed execution.
   > it's important to keep the same classloader mechanism between local job 
compiling and distributed execution
   
   These are good points, but doesn't that mean that the _full_ user CL should 
be built eagerly in the CLI before any user-code is called? (and forbid us from 
breaking that pattern)
   Because that will be the actual behavior at runtime.
   I.e., we collect all the jars we need, then build the user CL, and only then 
start using the jars.
   
   Can you clarify whether the jars are accessed in between `addUrl` calls?
   Would it be technically feasible to first determine all the required jars 
before creating the first user CL?
   
   Are there no plans to isolate the usage of a jar to a specific operator?
   
   > Besides, there are no differences in this case when creating a new 
classloader to wrap the existing classloader.
   
   It is indeed technically similar; but I already listed the benefits compared 
to the mutable CL in my previous comment.
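   
   For readers following along: the ordering argument relies on `URLClassLoader` 
searching its URLs in insertion order. A minimal sketch of a mutable 
classloader of the kind under discussion (a stand-in, not the PR's code):

```java
import java.net.URL;
import java.net.URLClassLoader;

// URLClassLoader consults its URLs in the order they were added, so a jar
// appended later is checked last and cannot shadow a class that an earlier
// jar already provides - "parent-first within the sub-tree".
public class MutableURLClassLoader extends URLClassLoader {
    public MutableURLClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    public void addURL(URL url) { // widen the protected method to allow mutation
        super.addURL(url);
    }
}
```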


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-28082) Support end to end encryption on Pulsar connector.

2022-06-21 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser reassigned FLINK-28082:
--

Assignee: Yufan Sheng

> Support end to end encryption on Pulsar connector.
> --
>
> Key: FLINK-28082
> URL: https://issues.apache.org/jira/browse/FLINK-28082
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.16.0
>Reporter: Yufan Sheng
>Assignee: Yufan Sheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Add this Pulsar encryption support:
> https://pulsar.apache.org/docs/security-encryption/
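
For reference, end-to-end encryption with the plain Pulsar client looks 
roughly like this (a hedged sketch with illustrative topic, key names and 
paths; the connector would need to expose equivalent options):

{code:java}
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.impl.DefaultCryptoKeyReader;

PulsarClient client =
        PulsarClient.builder().serviceUrl("pulsar://localhost:6650").build();

Producer<byte[]> producer =
        client.newProducer()
                .topic("my-topic")
                // name of the key used to encrypt the per-message data key
                .addEncryptionKey("my-app-key")
                .cryptoKeyReader(
                        DefaultCryptoKeyReader.builder()
                                .defaultPublicKey("file:///path/to/public.pem")
                                .defaultPrivateKey("file:///path/to/private.pem")
                                .build())
                .create();
{code}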



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (FLINK-28085) Close all the pending Pulsar transactions when flink shutdown the pipeline.

2022-06-21 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser reassigned FLINK-28085:
--

Assignee: Yufan Sheng

> Close all the pending Pulsar transactions when flink shutdown the pipeline.
> ---
>
> Key: FLINK-28085
> URL: https://issues.apache.org/jira/browse/FLINK-28085
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Pulsar
>Affects Versions: 1.15.0, 1.14.4
>Reporter: Yufan Sheng
>Assignee: Yufan Sheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.16.0
>
>
> Currently the transactionId is not persisted. After a job restart we lose the 
> handle to transactions that are still not aborted on the Pulsar broker. The 
> broker will abort these hanging transactions after a timeout, but this is not 
> desirable. We need to close all pending transactions.
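
A hedged sketch of the intended shutdown behavior (the bookkeeping collection 
and logger are assumptions, not the connector's actual fields):

{code:java}
import org.apache.pulsar.client.api.transaction.Transaction;

// Abort every transaction that is still open when the writer closes, instead
// of leaving the cleanup to the broker-side transaction timeout.
@Override
public void close() throws Exception {
    for (Transaction txn : pendingTransactions) { // assumed bookkeeping list
        try {
            txn.abort().get(); // Transaction#abort returns CompletableFuture<Void>
        } catch (Exception e) {
            LOG.warn("Failed to abort pending Pulsar transaction {}", txn, e);
        }
    }
    pendingTransactions.clear();
}
{code}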



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (FLINK-28186) Trigger Operator Events on Configuration Changes

2022-06-21 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora reassigned FLINK-28186:
--

Assignee: Matyas Orhidi

> Trigger Operator Events on Configuration Changes
> 
>
> Key: FLINK-28186
> URL: https://issues.apache.org/jira/browse/FLINK-28186
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: kubernetes-operator-1.1.0
>Reporter: Matyas Orhidi
>Assignee: Matyas Orhidi
>Priority: Major
> Fix For: kubernetes-operator-1.1.0
>
>
> The Operator can already emit K8s Events related to the CRs it manages, but it 
> needs to emit events on important Operator-related changes too, e.g. config 
> updates, dynamic namespace changes, etc.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-28186) Trigger Operator Events on Configuration Changes

2022-06-21 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28186:
-

 Summary: Trigger Operator Events on Configuration Changes
 Key: FLINK-28186
 URL: https://issues.apache.org/jira/browse/FLINK-28186
 Project: Flink
  Issue Type: Improvement
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0


The Operator can already emit K8s Events related to the CRs it manages, but it 
needs to emit events on important Operator-related changes too, e.g. config 
updates, dynamic namespace changes, etc.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-28185) "Invalid negative offset" when using OffsetsInitializer.timestamp(.)

2022-06-21 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556917#comment-17556917
 ] 

Martijn Visser commented on FLINK-28185:


[~renqs] Any thoughts on this? 

> "Invalid negative offset" when using OffsetsInitializer.timestamp(.)
> 
>
> Key: FLINK-28185
> URL: https://issues.apache.org/jira/browse/FLINK-28185
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.15.0
> Environment: Flink 1.15.0
> Kafka 2.8.1
>Reporter: Peter Schrott
>Priority: Minor
> Attachments: Bildschirmfoto 2022-06-21 um 15.24.58-1.png
>
>
> When using {{OffsetsInitializer.timestamp(.)}} on a topic with empty 
> partitions – little traffic + low retention – an {{IllegalArgumentException: 
> Invalid negative offset}} occurs. See the stacktrace below.
> The problem is that the admin client returns -1 as timestamp and offset for 
> empty partitions in {{KafkaAdminClient.listOffsets(.)}} [1] (compare the 
> attached screenshot). The exception is thrown when an {{OffsetAndTimestamp}} 
> object is created from the admin client response.
> {code:java}
> org.apache.flink.util.FlinkRuntimeException: Failed to initialize partition 
> splits due to 
>     at 
> org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator.handlePartitionSplitChanges(KafkaSourceEnumerator.java:299)
>     at 
> org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$null$1(ExecutorNotifier.java:83)
>     at 
> org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:40)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>     at java.util.concurrent.FutureTask.run(FutureTask.java)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.IllegalArgumentException: Invalid negative offset
>     at 
> org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.OffsetAndTimestamp.<init>(OffsetAndTimestamp.java:36)
>     at 
> org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator$PartitionOffsetsRetrieverImpl.lambda$offsetsForTimes$8(KafkaSourceEnumerator.java:622)
>     at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)
>     at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
>     at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1723)
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>     at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>     at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
>     at 
> org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator$PartitionOffsetsRetrieverImpl.offsetsForTimes(KafkaSourceEnumerator.java:615)
>     at 
> org.apache.flink.connector.kafka.source.enumerator.initializer.TimestampOffsetsInitializer.getPartitionOffsets(TimestampOffsetsInitializer.java:57)
>     at 
> org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator.initializePartitionSplits(KafkaSourceEnumerator.java:272)
>     at 
> org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator.lambda$checkPartitionChanges$0(KafkaSourceEnumerator.java:242)
>     at 
> org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$notifyReadyAsync$2(ExecutorNotifier.java:80)
>     ... 8 common frames omitted
> 15:25:58.025 INFO  [flink-akka.actor.default-dispatcher-11] 
> o.a.f.runtime.jobmaster.JobMaster - Trying to recover from a global failure.
> org.apache.flink.util.FlinkException: Global failure triggered by 
> OperatorCoordinator for 'Source: XXX -> YYY -> Sink: ZZZ' (operator 
> 351e440289835f2ff3e6fee31bf6e13c).
>     at 
> org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:556)
>     at 
> org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:231)
>     at 
> 

[jira] [Created] (FLINK-28185) "Invalid negative offset" when using OffsetsInitializer.timestamp(.)

2022-06-21 Thread Peter Schrott (Jira)
Peter Schrott created FLINK-28185:
-

 Summary: "Invalid negative offset" when using 
OffsetsInitializer.timestamp(.)
 Key: FLINK-28185
 URL: https://issues.apache.org/jira/browse/FLINK-28185
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.15.0
 Environment: Flink 1.15.0
Kafka 2.8.1
Reporter: Peter Schrott
 Attachments: Bildschirmfoto 2022-06-21 um 15.24.58-1.png

When using {{OffsetsInitializer.timestamp(.)}} on a topic with empty 
partitions – little traffic + low retention – an {{IllegalArgumentException: 
Invalid negative offset}} occurs. See the stacktrace below.

The problem is that the admin client returns -1 as timestamp and offset for 
empty partitions in {{KafkaAdminClient.listOffsets(.)}} [1] (compare the 
attached screenshot). The exception is thrown when an {{OffsetAndTimestamp}} 
object is created from the admin client response.
{code:java}
org.apache.flink.util.FlinkRuntimeException: Failed to initialize partition 
splits due to 
    at 
org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator.handlePartitionSplitChanges(KafkaSourceEnumerator.java:299)
    at 
org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$null$1(ExecutorNotifier.java:83)
    at 
org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:40)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
    at java.util.concurrent.FutureTask.run(FutureTask.java)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: Invalid negative offset
    at 
org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.OffsetAndTimestamp.<init>(OffsetAndTimestamp.java:36)
    at 
org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator$PartitionOffsetsRetrieverImpl.lambda$offsetsForTimes$8(KafkaSourceEnumerator.java:622)
    at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)
    at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
    at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1723)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at 
org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator$PartitionOffsetsRetrieverImpl.offsetsForTimes(KafkaSourceEnumerator.java:615)
    at 
org.apache.flink.connector.kafka.source.enumerator.initializer.TimestampOffsetsInitializer.getPartitionOffsets(TimestampOffsetsInitializer.java:57)
    at 
org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator.initializePartitionSplits(KafkaSourceEnumerator.java:272)
    at 
org.apache.flink.connector.kafka.source.enumerator.KafkaSourceEnumerator.lambda$checkPartitionChanges$0(KafkaSourceEnumerator.java:242)
    at 
org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$notifyReadyAsync$2(ExecutorNotifier.java:80)
    ... 8 common frames omitted
15:25:58.025 INFO  [flink-akka.actor.default-dispatcher-11] 
o.a.f.runtime.jobmaster.JobMaster - Trying to recover from a global failure.
org.apache.flink.util.FlinkException: Global failure triggered by 
OperatorCoordinator for 'Source: XXX -> YYY -> Sink: ZZZ' (operator 
351e440289835f2ff3e6fee31bf6e13c).
    at 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:556)
    at 
org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:231)
    at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.failJob(SourceCoordinatorContext.java:316)
    at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.handleUncaughtExceptionFromAsyncCall(SourceCoordinatorContext.java:329)
    at 
org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:42)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at 

[GitHub] [flink] flinkbot commented on pull request #20040: [FLINK-28182][python][format] Support Avro generic record decoder

2022-06-21 Thread GitBox


flinkbot commented on PR #20040:
URL: https://github.com/apache/flink/pull/20040#issuecomment-1161755705

   
   ## CI report:
   
   * 66350366c30a2c20e5d233a4d53720f391935737 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink] fredia commented on pull request #19448: [FLINK-25872][state] Support restore from non-changelog checkpoint with changelog enabled in CLAIM mode

2022-06-21 Thread GitBox


fredia commented on PR #19448:
URL: https://github.com/apache/flink/pull/19448#issuecomment-1161751361

   Thanks a lot for your detailed review @rkhachatryan. And I squashed and 
rebased the commits.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


