[jira] [Assigned] (FLINK-29814) Upgrade Stateful Functions to use Flink 1.15
[ https://issues.apache.org/jira/browse/FLINK-29814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-29814: - Assignee: Galen Warren > Upgrade Stateful Functions to use Flink 1.15 > > > Key: FLINK-29814 > URL: https://issues.apache.org/jira/browse/FLINK-29814 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Fil Karnicki >Assignee: Galen Warren >Priority: Major > > Upgrade Statefun to use the latest Flink 1.15.x > > There already exists a pull request without a jira. Perhaps it can be > reviewed/amended as part of this task? > [https://github.com/apache/flink-statefun/pull/314] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-28747) "target_id can not be missing" in HTTP statefun request
[ https://issues.apache.org/jira/browse/FLINK-28747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574752#comment-17574752 ] Till Rohrmann commented on FLINK-28747: --- I think this is Protobuf's behaviour. If Protobuf sees that a field has the default value assigned, then it won't serialize this field. On the receiving end where the message is deserialized, Protobuf will return the default value for a missing field (for a string type it is the empty string). > "target_id can not be missing" in HTTP statefun request > --- > > Key: FLINK-28747 > URL: https://issues.apache.org/jira/browse/FLINK-28747 > Project: Flink > Issue Type: Bug > Components: Stateful Functions >Affects Versions: statefun-3.0.0, statefun-3.2.0, statefun-3.1.1 >Reporter: Stephan Weinwurm >Priority: Major > > Hi all, > We've suddenly started to see the following exception in our HTTP statefun > functions endpoints: > {code}Traceback (most recent call last): > File > "/src/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", > line 403, in run_asgi > result = await app(self.scope, self.receive, self.send) > File > "/src/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", > line 78, in __call__ > return await self.app(scope, receive, send) > File "/src/worker/baseplate_asgi/asgi/baseplate_asgi_middleware.py", line > 37, in __call__ > await span_processor.execute() > File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line > 61, in execute > raise e > File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line > 57, in execute > await self.app(self.scope, self.receive, self.send) > File "/src/.venv/lib/python3.9/site-packages/starlette/applications.py", > line 124, in __call__ > await self.middleware_stack(scope, receive, send) > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line > 184, in __call__ > raise exc > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line > 162, in __call__ > await self.app(scope, receive, _send) > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", > line 75, in __call__ > raise exc > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", > line 64, in __call__ > await self.app(scope, receive, sender) > File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line > 680, in __call__ > await route.handle(scope, receive, send) > File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line > 275, in handle > await self.app(scope, receive, send) > File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line > 65, in app > response = await func(request) > File "/src/worker/baseplate_statefun/server/asgi/make_statefun_handler.py", > line 25, in statefun_handler > result = await handler.handle_async(request_body) > File "/src/.venv/lib/python3.9/site-packages/statefun/request_reply_v3.py", > line 262, in handle_async > msg = Message(target_typename=sdk_address.typename, > target_id=sdk_address.id, > File "/src/.venv/lib/python3.9/site-packages/statefun/messages.py", line > 42, in __init__ > raise ValueError("target_id can not be missing"){code} > Interestingly, this has started to happen in three separate Flink deployments > at the very same time. The only thing in common between the three deployments > is that they consume the same Kafka topics. > No deployments have happened when the issue started happening which was on > July 28th 3:05PM. We have since been continuously seeing the error. > We were also able to extract the request that Flink sends to the HTTP > statefun endpoint: > {code}{'invocation': {'target': {'namespace': 'com.x.dummy', 'type': > 'dummy'}, 'invocations': [{'argument': {'typename': > 'type.googleapis.com/v2_event.Event', 'has_value': True, 'value': > '-redicated-'}}]}} > {code} > As you can see, no `id` field is present in the `invocation.target` object or > the `target_id` was an empty string. > > This is our module.yaml from one of the Flink deployments: > > {code} > version: "3.0" > module: > meta: > type: remote > spec: > endpoints: > - endpoint: > meta: > kind: io.statefun.endpoints.v1/http > spec: > functions: com.x.dummy/dummy > urlPathTemplate: [http://x-worker-dummy.x-functions:9090/statefun] > timeouts: > call: 2 min > read: 2 min > write: 2 min > maxNumBatchRequests: 100 > ingresses: > - ingress: > meta: > type: io.statefun.kafka/ingress > id: com.x/ingress > spec: > address: x-kafka-0.x.ue1.x.net:9092 > consumerGroupId: x-worker-dummy > topics: > - topic: v2_post_events > valueType: type.googleapis.
[jira] [Comment Edited] (FLINK-28747) "target_id can not be missing" in HTTP statefun request
[ https://issues.apache.org/jira/browse/FLINK-28747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574267#comment-17574267 ] Till Rohrmann edited comment on FLINK-28747 at 8/2/22 2:57 PM: --- Thanks for reporting this issue [~stepweiwu]. Without knowing the exact details of your StateFun job, I suspect that it could be caused by Kafka messages that only have an empty string as a key specified. This could also explain why you are seeing the problem appear across different services consuming from the same topic. For some context, StateFun uses the key of a Kafka message as the target id for the StatefulFunction invocation. The target id identifies an instance of the StatefulFunction so that the runtime can associate state with it. Empty strings should be valid keys, so I believe that this is a bug in the Python SDK where we check {{if not target_id: raise ValueError}} that is executed if the key is empty (https://github.com/apache/flink-statefun/blob/master/statefun-sdk-python/statefun/messages.py#L41). As a workaround (if you can control it), you could require your Kafka message to require a non-empty key. The proper fix would be to fix the Python SDK. was (Author: till.rohrmann): Thanks for reporting this issue [~stepweiwu]. Without knowing the exact details of your StateFun job, I suspect that it could be caused by Kafka messages that only have an empty string as a key specified. This could also explain why you are seeing the problem appear across different services consuming from the same topic. Empty strings should be valid keys, so I believe that this is a bug in the Python SDK where we check {{if not target_id: raise ValueError}} that is executed if the key is empty (https://github.com/apache/flink-statefun/blob/master/statefun-sdk-python/statefun/messages.py#L41). As a workaround (if you can control it), you could require your Kafka message to require a non-empty key. The proper fix would be to fix the Python SDK. > "target_id can not be missing" in HTTP statefun request > --- > > Key: FLINK-28747 > URL: https://issues.apache.org/jira/browse/FLINK-28747 > Project: Flink > Issue Type: Bug > Components: Stateful Functions >Affects Versions: statefun-3.0.0, statefun-3.2.0, statefun-3.1.1 >Reporter: Stephan Weinwurm >Priority: Major > > Hi all, > We've suddenly started to see the following exception in our HTTP statefun > functions endpoints: > {code}Traceback (most recent call last): > File > "/src/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", > line 403, in run_asgi > result = await app(self.scope, self.receive, self.send) > File > "/src/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", > line 78, in __call__ > return await self.app(scope, receive, send) > File "/src/worker/baseplate_asgi/asgi/baseplate_asgi_middleware.py", line > 37, in __call__ > await span_processor.execute() > File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line > 61, in execute > raise e > File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line > 57, in execute > await self.app(self.scope, self.receive, self.send) > File "/src/.venv/lib/python3.9/site-packages/starlette/applications.py", > line 124, in __call__ > await self.middleware_stack(scope, receive, send) > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line > 184, in __call__ > raise exc > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line > 162, in __call__ > await self.app(scope, receive, _send) > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", > line 75, in __call__ > raise exc > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", > line 64, in __call__ > await self.app(scope, receive, sender) > File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line > 680, in __call__ > await route.handle(scope, receive, send) > File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line > 275, in handle > await self.app(scope, receive, send) > File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line > 65, in app > response = await func(request) > File "/src/worker/baseplate_statefun/server/asgi/make_statefun_handler.py", > line 25, in statefun_handler > result = await handler.handle_async(request_body) > File "/src/.venv/lib/python3.9/site-packages/statefun/request_reply_v3.py", > line 262, in handle_async > msg = Message(target_typename=sdk_address.typename, > target_id=sdk_address.id, > File "/src/.venv/lib/python3.9/site-packages/sta
[jira] [Commented] (FLINK-28747) "target_id can not be missing" in HTTP statefun request
[ https://issues.apache.org/jira/browse/FLINK-28747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574267#comment-17574267 ] Till Rohrmann commented on FLINK-28747: --- Thanks for reporting this issue [~stepweiwu]. Without knowing the exact details of your StateFun job, I suspect that it could be caused by Kafka messages that only have an empty string as a key specified. This could also explain why you are seeing the problem appear across different services consuming from the same topic. Empty strings should be valid keys, so I believe that this is a bug in the Python SDK where we check {{if not target_id: raise ValueError}} that is executed if the key is empty (https://github.com/apache/flink-statefun/blob/master/statefun-sdk-python/statefun/messages.py#L41). As a workaround (if you can control it), you could require your Kafka message to require a non-empty key. The proper fix would be to fix the Python SDK. > "target_id can not be missing" in HTTP statefun request > --- > > Key: FLINK-28747 > URL: https://issues.apache.org/jira/browse/FLINK-28747 > Project: Flink > Issue Type: Bug > Components: Stateful Functions >Affects Versions: statefun-3.0.0, statefun-3.2.0, statefun-3.1.1 >Reporter: Stephan Weinwurm >Priority: Major > > Hi all, > We've suddenly started to see the following exception in our HTTP statefun > functions endpoints: > {code}Traceback (most recent call last): > File > "/src/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", > line 403, in run_asgi > result = await app(self.scope, self.receive, self.send) > File > "/src/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", > line 78, in __call__ > return await self.app(scope, receive, send) > File "/src/worker/baseplate_asgi/asgi/baseplate_asgi_middleware.py", line > 37, in __call__ > await span_processor.execute() > File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line > 61, in execute > raise e > File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line > 57, in execute > await self.app(self.scope, self.receive, self.send) > File "/src/.venv/lib/python3.9/site-packages/starlette/applications.py", > line 124, in __call__ > await self.middleware_stack(scope, receive, send) > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line > 184, in __call__ > raise exc > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line > 162, in __call__ > await self.app(scope, receive, _send) > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", > line 75, in __call__ > raise exc > File > "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", > line 64, in __call__ > await self.app(scope, receive, sender) > File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line > 680, in __call__ > await route.handle(scope, receive, send) > File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line > 275, in handle > await self.app(scope, receive, send) > File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line > 65, in app > response = await func(request) > File "/src/worker/baseplate_statefun/server/asgi/make_statefun_handler.py", > line 25, in statefun_handler > result = await handler.handle_async(request_body) > File "/src/.venv/lib/python3.9/site-packages/statefun/request_reply_v3.py", > line 262, in handle_async > msg = Message(target_typename=sdk_address.typename, > target_id=sdk_address.id, > File "/src/.venv/lib/python3.9/site-packages/statefun/messages.py", line > 42, in __init__ > raise ValueError("target_id can not be missing"){code} > Interestingly, this has started to happen in three separate Flink deployments > at the very same time. The only thing in common between the three deployments > is that they consume the same Kafka topics. > No deployments have happened when the issue started happening which was on > July 28th 3:05PM. We have since been continuously seeing the error. > We were also able to extract the request that Flink sends to the HTTP > statefun endpoint: > {code}{'invocation': {'target': {'namespace': 'com.x.dummy', 'type': > 'dummy'}, 'invocations': [{'argument': {'typename': > 'type.googleapis.com/v2_event.Event', 'has_value': True, 'value': > '-redicated-'}}]}} > {code} > As you can see, no `id` field is present in the `invocation.target` object or > the `target_id` was an empty string. > > This is our module.yaml from one of the Flink deployments: > > {code} > version: "3.0" > module: > meta: > type: remote > spec: > endpoints: > - endpoint: > me
[jira] [Commented] (FLINK-26845) Document testing guidelines
[ https://issues.apache.org/jira/browse/FLINK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514598#comment-17514598 ] Till Rohrmann commented on FLINK-26845: --- Hi [~Alcántara], I think a good start would be to document and give examples for what is already there. We could create for each SDK a sub task to do this. Based on this we will pretty quickly realize which SDK is ahead in terms os testability and what needs to be added to the other SDKs. Then, as you've proposed, we should tackle the e2e testing utilities. I would probably make this a separate ticket as well. I think that the e2e testing might be less SDK specific and could be covered in a generic fashion. The main part is probably the deployment of a SF cluster which should overlap a lot with the general deployment documentation of SF. Would you be interested in helping with the first part of documenting what is already there? > Document testing guidelines > --- > > Key: FLINK-26845 > URL: https://issues.apache.org/jira/browse/FLINK-26845 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Salva >Priority: Minor > > As of now, there seems to be not much guidance on how to approach testing. > Although it's true [~sjwiesman] that > {quote}Unlike other flink apis, the sdk doesn’t pull in the runtime so > testing should really look like any other code. There isn’t much statefun > specific. > {quote} > I think that at the very least testing should be mentioned as part of the > docs, even if it is only to stress the above fact. Indeed, as reported by > [~trohrmann], there seems that there are already some test utilities in place > but they are not well-documented, plus some potential ideas on how to improve > on that front > {quote}Testing tools is definitely one area where we want to improve > significantly. Also the documentation for how to do things needs to be > updated. There are some testing utils that you can already use today: > [https://github.com/apache/flink-statefun/blob/master/statefun-sdk-java/src/main/java/org/apache/flink/statefun/sdk/java/testing/TestContext.java…|https://t.co/WGjyMA510b]. > However, it is not well documented. > {quote} > Once the overall guidelines are in place for different testing strategies, > tests could be added to the examples in the > {_}[playground|https://github.com/apache/flink-statefun-playground]{_}. > {_}Note{_}: Issue originally reported in Twitter: > [https://twitter.com/salvalcantara/status/1505834101026267136?s=20&t=Go2IHP6iP4ZmIyVLmIeD3g] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26617) Pass Kafka headers to remote functions and egresses
[ https://issues.apache.org/jira/browse/FLINK-26617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514596#comment-17514596 ] Till Rohrmann commented on FLINK-26617: --- If this is a requirement, then I guess we need to send the span context to the remote functions. I think this should be doable as this is not a lot of information. At the moment I am not 100% sure what things need to change in order to fully support tracing in SF. To explore this and the requirements, I'd be fine if you explored it using the example of Kafka records with spans attached. Based on that we could think about how to generalize the mechanism so that it also works with other ingresses/egresses. > Pass Kafka headers to remote functions and egresses > --- > > Key: FLINK-26617 > URL: https://issues.apache.org/jira/browse/FLINK-26617 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Fil Karnicki >Priority: Minor > > Typically OpenTelemetry (FLINK-22390) tracing spans get passed in kafka > headers. We could be passing not only the Kafka ConsumerRecord value, but > also the headers to remote functions, if the user configures their kafka > ingress to do so > Similarly, kafka egresses could be configurable so that headers get passed on > via the KafkaProducerRecord proto to kafka -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26570) Remote module configuration interpolation
[ https://issues.apache.org/jira/browse/FLINK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513271#comment-17513271 ] Till Rohrmann commented on FLINK-26570: --- Thanks for opening this PR [~Fil Karnicki]. I will take a look at it. > Remote module configuration interpolation > - > > Key: FLINK-26570 > URL: https://issues.apache.org/jira/browse/FLINK-26570 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Fil Karnicki >Assignee: Fil Karnicki >Priority: Major > Labels: pull-request-available > > Add the ability for users to provide placeholders in module.yaml, e.g. > {code:java} > kind: com.foo.bar/test > spec: > something: ${REPLACE_ME} > transport: > password: ${REPLACE_ME_WITH_A_SECRET} > array: > - ${REPLACE_ME} > - sthElse {code} > These placeholders would be resolved in > org.apache.flink.statefun.flink.core.jsonmodule.RemoteModule#bindComponent > using > {code:java} > ParameterTool.fromSystemProperties().mergeWith(ParameterTool.fromMap(globalConfiguration) > {code} > by traversing the ComponentJsonObject.specJsonNode() and replacing values > that contain placeholders with values from the combined system+globalConfig > map -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26628) Use REST program arguments in StatefulFunctionsJob
[ https://issues.apache.org/jira/browse/FLINK-26628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513229#comment-17513229 ] Till Rohrmann commented on FLINK-26628: --- I think [~igal] is right here [~Fil Karnicki]. Flink's web submission has some limitations and this is one of them if I am not mistaken. In general, the recommendation in the Flink community is to not use the web submission but rather to use the cli. Hence, I believe that this issue is a Flink and not a SF issue. > Use REST program arguments in StatefulFunctionsJob > -- > > Key: FLINK-26628 > URL: https://issues.apache.org/jira/browse/FLINK-26628 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Fil Karnicki >Priority: Major > > Currently the Program Arguments passed into the REST api don't get used by > StreamingExecutionEnvironment in StatefulFunctionsJob (in this case the > checkpointing will not be set) > {code:java} > --execution.checkpointing.interval 1000 --state.backend rocksdb > --state.checkpoint-storage filesystem --state.checkpoints.dir file:///tmp/ > --statefun.embedded true {code} > Conversely, Flink CLI params *do* get used by the > StreamingExecutionEnvironment in statefun jobs > {code:java} > flink run -Dexecution.checkpointing.interval=1000 -Dstate.backend=rocksdb > -Dstate.checkpoint-storage=filesystem -Dstate.checkpoints.dir=file:///tmp/ > -Dstatefun.embedded=true myjar.jar{code} > > To reproduce, > # clone and run mvn package on > [https://github.com/FilKarnicki/statefun-flinkjob/tree/argsNotUsedViaRest] > # run docker-compose up in flinkjob/docker-compose > # observe checkpointing happening for this job > # go to [http://localhost:8081/#/submit] and submit > flinkjob-1.0-SNAPSHOT.jar again manually from target with program arguments > {code:java} > --execution.checkpointing.interval 1000 --state.backend rocksdb > --state.checkpoint-storage filesystem --state.checkpoints.dir file:///tmp/ > --statefun.embedded true {code} > 5. observe no checkpointing happening for the second job -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26617) Pass Kafka headers to remote functions and egresses
[ https://issues.apache.org/jira/browse/FLINK-26617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513227#comment-17513227 ] Till Rohrmann commented on FLINK-26617: --- Thanks for creating this ticket [~Fil Karnicki]. I think the topic of tracing is very interesting as it opens SF up for better understanding what is going on. Ideally, we can solve this problem generically as tracing is also interesting for other ingresses and SF calls in general. I think what we would need to add to the runtime is support for creating/deriving spans and propagating their contexts. I am not sure whether this information really needs to be forwarded to a remote function unless the remote function can spawn external calls that need the context as well. That way we don't need SDK specific handling logic. However, we would need to adjust the ingresses and egresses to understand the span/context field of a message. > Pass Kafka headers to remote functions and egresses > --- > > Key: FLINK-26617 > URL: https://issues.apache.org/jira/browse/FLINK-26617 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Fil Karnicki >Priority: Minor > > Typically OpenTelemetry (FLINK-22390) tracing spans get passed in kafka > headers. We could be passing not only the Kafka ConsumerRecord value, but > also the headers to remote functions, if the user configures their kafka > ingress to do so > Similarly, kafka egresses could be configurable so that headers get passed on > via the KafkaProducerRecord proto to kafka -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26424) RequestTimeoutException
[ https://issues.apache.org/jira/browse/FLINK-26424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513212#comment-17513212 ] Till Rohrmann commented on FLINK-26424: --- Yes, such a feature will definitely be an opt-in feature. Thanks for your feedback [~Fil Karnicki]. > RequestTimeoutException > > > Key: FLINK-26424 > URL: https://issues.apache.org/jira/browse/FLINK-26424 > Project: Flink > Issue Type: Bug > Components: Stateful Functions >Affects Versions: statefun-3.2.0 >Reporter: xinchenyuan >Priority: Major > > there is no max retries, all I got is the call timeout > as doc said, [Transport Spec > |https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/modules/http-endpoint/#transport-1]call > will be failed after timeout. > but when expcetion raised, runtime restart, I'm confused why a function > internal error will cause such a big problem, will MAX RETRIES be a > configurable param? > > 2022-02-28 17:58:32 > org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException: > An error occurred when attempting to invoke function FunctionType(tendoc, > AlertNotificationIngressCkafka). > at > org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50) > at > org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:73) > at > org.apache.flink.statefun.flink.core.functions.FunctionActivation.applyNextPendingEnvelope(FunctionActivation.java:50) > at > org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:61) > at > org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164) > at > org.apache.flink.statefun.flink.core.functions.AsyncSink.drainOnOperatorThread(AsyncSink.java:119) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) > at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:86) > at > org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:88) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:108) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:89) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) > at > org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:186) > at > org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205) > at > org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134) > at > org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105) > at > org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) > at java.base/java.lang.Thread.run(Unknown Source) > Caused by: java.lang.IllegalStateException: Failure forwarding a message to a > remote function Address(tendoc, AlertNotificationIngressCkafka, cls-message) > at > org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.onAsyncResult(RequestReplyFunction.java:170) > at > org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.invoke(RequestReplyFunction.java:1
[jira] [Commented] (FLINK-26570) Remote module configuration interpolation
[ https://issues.apache.org/jira/browse/FLINK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504762#comment-17504762 ] Till Rohrmann commented on FLINK-26570: --- Thanks for creating this ticket [~Fil Karnicki]. I like this idea. Could environment variables be another mechanism to supply placeholder values? > Remote module configuration interpolation > - > > Key: FLINK-26570 > URL: https://issues.apache.org/jira/browse/FLINK-26570 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Fil Karnicki >Priority: Major > > Add the ability for users to provide placeholders in module.yaml, e.g. > {code:java} > kind: com.foo.bar/test > spec: > something: ${REPLACE_ME} > transport: > password: ${REPLACE_ME_WITH_A_SECRET} > array: > - ${REPLACE_ME} > - sthElse {code} > These placeholders would be resolved in > org.apache.flink.statefun.flink.core.jsonmodule.RemoteModule#bindComponent > using > {code:java} > ParameterTool.fromSystemProperties().mergeWith(ParameterTool.fromMap(globalConfiguration) > {code} > by traversing the ComponentJsonObject.specJsonNode() and replacing values > that contain placeholders with values from the combined system+globalConfig > map -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-23190) Make task-slot allocation much more evenly
[ https://issues.apache.org/jira/browse/FLINK-23190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502807#comment-17502807 ] Till Rohrmann commented on FLINK-23190: --- Sorry for not getting back to your PR [~loyi]. Unfortunately, I have left my previous company and am no longer working very actively on the Flink project. The best thing would be to reach out to the Flink community and look for a new shepherd. Sorry for the inconveniences I've caused here. > Make task-slot allocation much more evenly > -- > > Key: FLINK-23190 > URL: https://issues.apache.org/jira/browse/FLINK-23190 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.12.3, 1.13.1 >Reporter: loyi >Assignee: loyi >Priority: Major > Labels: pull-request-available, stale-assigned > Attachments: image-2021-07-16-10-34-30-700.png > > > FLINK-12122 only guarantees spreading out tasks across the set of TMs which > are registered at the time of scheduling, but our jobs are all runing on > active yarn mode, the job with smaller source parallelism offen cause > load-balance issues. > > For this job: > {code:java} > // -ys 4 means 10 taskmanagers > env.addSource(...).name("A").setParallelism(10). > map(...).name("B").setParallelism(30) > .map(...).name("C").setParallelism(40) > .addSink(...).name("D").setParallelism(20); > {code} > > Flink-1.12.3 task allocation: > ||operator||tm1 ||tm2||tm3||tm4||tm5||5m6||tm7||tm8||tm9||tm10|| > |A| > 1|{color:#de350b}2{color}|{color:#de350b}2{color}|1|1|{color:#de350b}3{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|{color:#de350b}0{color}| > |B|3|3|3|3|3|3|3|3|{color:#de350b}2{color}|{color:#de350b}4{color}| > |C|4|4|4|4|4|4|4|4|4|4| > |D|2|2|2|2|2|{color:#de350b}1{color}|{color:#de350b}1{color}|2|2|{color:#de350b}4{color}| > > Suggestions: > When TaskManger start register slots to slotManager , current processing > logic will choose the first pendingSlot which meet its resource > requirements. The "random" strategy usually causes uneven task allocation > when source-operator's parallelism is significantly below process-operator's. > A simple feasible idea is {color:#de350b}partition{color} the current > "{color:#de350b}pendingSlots{color}" by their "JobVertexIds" (such as let > AllocationID bring the detail) , then allocate the slots proportionally to > each JobVertexGroup. > > For above case, the 40 pendingSlots could be divided into 4 groups: > [ABCD]: 10 // A、B、C、D reprents {color:#de350b}jobVertexId{color} > [BCD]: 10 > [CD]: 10 > [D]: 10 > > Every taskmanager will provide 4 slots one time, and each group will get 1 > slot according their proportion (1/4), the final allocation result is below: > [ABCD] : deploye on 10 different taskmangers > [BCD]: deploye on 10 different taskmangers > [CD]: deploye on 10 different taskmangers > [D]: deploye on 10 different taskmangers > > I have implement a [concept > code|https://github.com/saddays/flink/commit/dc82e60a7c7599fbcb58c14f8e3445bc8d07ace1] > based on Flink-1.12.3 , the patch version has {color:#de350b}fully > evenly{color} task allocation , and works well on my workload . Are there > other point that have not been considered or does it conflict with future > plans? Sorry for my poor english. > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26424) RequestTimeoutException
[ https://issues.apache.org/jira/browse/FLINK-26424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502804#comment-17502804 ] Till Rohrmann commented on FLINK-26424: --- I think we will soon solve this problem. There is no PR for it yet, though. > RequestTimeoutException > > > Key: FLINK-26424 > URL: https://issues.apache.org/jira/browse/FLINK-26424 > Project: Flink > Issue Type: Bug > Components: Stateful Functions >Affects Versions: statefun-3.2.0 >Reporter: xinchenyuan >Priority: Major > > there is no max retries, all I got is the call timeout > as doc said, [Transport Spec > |https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/modules/http-endpoint/#transport-1]call > will be failed after timeout. > but when expcetion raised, runtime restart, I'm confused why a function > internal error will cause such a big problem, will MAX RETRIES be a > configurable param? > > 2022-02-28 17:58:32 > org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException: > An error occurred when attempting to invoke function FunctionType(tendoc, > AlertNotificationIngressCkafka). > at > org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50) > at > org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:73) > at > org.apache.flink.statefun.flink.core.functions.FunctionActivation.applyNextPendingEnvelope(FunctionActivation.java:50) > at > org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:61) > at > org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164) > at > org.apache.flink.statefun.flink.core.functions.AsyncSink.drainOnOperatorThread(AsyncSink.java:119) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) > at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:86) > at > org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:88) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:108) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:89) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) > at > org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:186) > at > org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205) > at > org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134) > at > org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105) > at > org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) > at java.base/java.lang.Thread.run(Unknown Source) > Caused by: java.lang.IllegalStateException: Failure forwarding a message to a > remote function Address(tendoc, AlertNotificationIngressCkafka, cls-message) > at > org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.onAsyncResult(RequestReplyFunction.java:170) > at > org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.invoke(RequestReplyFunction.java:124) > at > org.apach
[jira] [Commented] (FLINK-26424) RequestTimeoutException
[ https://issues.apache.org/jira/browse/FLINK-26424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502111#comment-17502111 ] Till Rohrmann commented on FLINK-26424: --- I think the idea would be that you can define an egress to which the dead letter box outputs the dead letters. That way the user can monitor this egress for observing the failed messages. > RequestTimeoutException > > > Key: FLINK-26424 > URL: https://issues.apache.org/jira/browse/FLINK-26424 > Project: Flink > Issue Type: Bug > Components: Stateful Functions >Affects Versions: statefun-3.2.0 >Reporter: xinchenyuan >Priority: Major > > there is no max retries, all I got is the call timeout > as doc said, [Transport Spec > |https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/modules/http-endpoint/#transport-1]call > will be failed after timeout. > but when expcetion raised, runtime restart, I'm confused why a function > internal error will cause such a big problem, will MAX RETRIES be a > configurable param? > > 2022-02-28 17:58:32 > org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException: > An error occurred when attempting to invoke function FunctionType(tendoc, > AlertNotificationIngressCkafka). > at > org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50) > at > org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:73) > at > org.apache.flink.statefun.flink.core.functions.FunctionActivation.applyNextPendingEnvelope(FunctionActivation.java:50) > at > org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:61) > at > org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164) > at > org.apache.flink.statefun.flink.core.functions.AsyncSink.drainOnOperatorThread(AsyncSink.java:119) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) > at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:86) > at > org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:88) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:108) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:89) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) > at > org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:186) > at > org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205) > at > org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134) > at > org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105) > at > org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) > at java.base/java.lang.Thread.run(Unknown Source) > Caused by: java.lang.IllegalStateException: Failure forwarding a message to a > remote function Address(tendoc, AlertNotificationIngressCkafka, cls-message) > at > org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.onAsyncResult(RequestReplyFunction.java:170) > at > org.apache.f
[jira] [Commented] (FLINK-26424) RequestTimeoutException
[ https://issues.apache.org/jira/browse/FLINK-26424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500597#comment-17500597 ] Till Rohrmann commented on FLINK-26424: --- I agree that a single failed remote function call should not affect the other function calls by restarting the whole topology. Could something like a dead letter box help in your situation? The system could send messages that can't be invoked or that failed for some reason to a special endpoint that either outputs the messages to somewhere or drops them. > RequestTimeoutException > > > Key: FLINK-26424 > URL: https://issues.apache.org/jira/browse/FLINK-26424 > Project: Flink > Issue Type: Bug > Components: Stateful Functions >Affects Versions: statefun-3.2.0 >Reporter: xinchenyuan >Priority: Major > > there is no max retries, all I got is the call timeout > as doc said, [Transport Spec > |https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/modules/http-endpoint/#transport-1]call > will be failed after timeout. > but when expcetion raised, runtime restart, I'm confused why a function > internal error will cause such a big problem, will MAX RETRIES be a > configurable param? > > 2022-02-28 17:58:32 > org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException: > An error occurred when attempting to invoke function FunctionType(tendoc, > AlertNotificationIngressCkafka). > at > org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50) > at > org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:73) > at > org.apache.flink.statefun.flink.core.functions.FunctionActivation.applyNextPendingEnvelope(FunctionActivation.java:50) > at > org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:61) > at > org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164) > at > org.apache.flink.statefun.flink.core.functions.AsyncSink.drainOnOperatorThread(AsyncSink.java:119) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) > at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:86) > at > org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:88) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:108) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:89) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) > at > org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:186) > at > org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205) > at > org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134) > at > org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105) > at > org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) > at java.base/java.lang.Thread.run(Unknown Source) > Caused by: java.lang.IllegalStateException: Failure forwarding a message to a > remote function Address(tendoc, AlertNotificationIngre
[jira] [Commented] (FLINK-26424) RequestTimeoutException
[ https://issues.apache.org/jira/browse/FLINK-26424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499964#comment-17499964 ] Till Rohrmann commented on FLINK-26424: --- Hi [~ychensha], I assume you are using StateFun's asynchronous {{RequestReplyClient}}, right? The request will fail with the {{RequestTimeoutException}} if it could not be completed within the call timeout. During this time, the client will retry the request with an exponential backoff strategy. Hence, I would recommend trying to increase {{spec.transports.timeout.call}} to something higher. > RequestTimeoutException > > > Key: FLINK-26424 > URL: https://issues.apache.org/jira/browse/FLINK-26424 > Project: Flink > Issue Type: Bug > Components: Stateful Functions >Affects Versions: statefun-3.2.0 >Reporter: xinchenyuan >Priority: Major > > there is no max retries, all I got is the call timeout > as doc said, [Transport Spec > |https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/modules/http-endpoint/#transport-1]call > will be failed after timeout. > but when expcetion raised, runtime restart, I'm confused why a function > internal error will cause such a big problem, will MAX RETRIES be a > configurable param? > > 2022-02-28 17:58:32 > org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException: > An error occurred when attempting to invoke function FunctionType(tendoc, > AlertNotificationIngressCkafka). > at > org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50) > at > org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:73) > at > org.apache.flink.statefun.flink.core.functions.FunctionActivation.applyNextPendingEnvelope(FunctionActivation.java:50) > at > org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:61) > at > org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164) > at > org.apache.flink.statefun.flink.core.functions.AsyncSink.drainOnOperatorThread(AsyncSink.java:119) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) > at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:86) > at > org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:88) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:108) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:89) > at > org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50) > at > org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28) > at > org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:186) > at > org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205) > at > org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134) > at > org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105) > at > org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) > at java.base/java.lang.Thread.run(Unknown Source) > Caused by: java.lang.IllegalStateException: Failure forwarding a message to a > remote function
[jira] [Commented] (FLINK-26086) fixed some causing ambiguities including the 'shit' comment
[ https://issues.apache.org/jira/browse/FLINK-26086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497256#comment-17497256 ] Till Rohrmann commented on FLINK-26086: --- Thanks for fixing this issue [~nyingping]. > fixed some causing ambiguities including the 'shit' comment > --- > > Key: FLINK-26086 > URL: https://issues.apache.org/jira/browse/FLINK-26086 > Project: Flink > Issue Type: Improvement >Reporter: nyingping >Assignee: nyingping >Priority: Minor > Labels: pull-request-available > Attachments: image-2022-02-11-16-38-12-674.png > > > such as > [`@param shiftTimeZone the shit timezone of the > window`|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/window/buffers/WindowBuffer.java#L96] > !image-2022-02-11-16-38-12-674.png! -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-26086) fixed some causing ambiguities including the 'shit' comment
[ https://issues.apache.org/jira/browse/FLINK-26086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-26086: - Assignee: nyingping > fixed some causing ambiguities including the 'shit' comment > --- > > Key: FLINK-26086 > URL: https://issues.apache.org/jira/browse/FLINK-26086 > Project: Flink > Issue Type: Improvement >Reporter: nyingping >Assignee: nyingping >Priority: Minor > Labels: pull-request-available > Attachments: image-2022-02-11-16-38-12-674.png > > > such as > [`@param shiftTimeZone the shit timezone of the > window`|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/window/buffers/WindowBuffer.java#L96] > !image-2022-02-11-16-38-12-674.png! -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-26340) Add ability in Golang SDK to create new statefun.Context from existing one, but with a new underlying context.Context
[ https://issues.apache.org/jira/browse/FLINK-26340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-26340: - Assignee: Galen Warren > Add ability in Golang SDK to create new statefun.Context from existing one, > but with a new underlying context.Context > - > > Key: FLINK-26340 > URL: https://issues.apache.org/jira/browse/FLINK-26340 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Affects Versions: statefun-3.3.0 >Reporter: Galen Warren >Assignee: Galen Warren >Priority: Minor > Labels: pull-request-available > Original Estimate: 72h > Remaining Estimate: 72h > > In the Golang SDK, statefun.Context embeds the context.Context interface and > is implemented by the statefunContext struct, which embeds a context.Context. > To support common patterns in Golang related to adding values to context, it > would be useful to be able to create a derived statefun.Context that is > equivalent to the original in terms of statefun functionality but which wraps > a different context.Context. > The proposal is to add a: > WithContext(ctx context.Context) statefun.Context > ... method to the statefun.Context interface and implement it on > statefunContext. This method would return the derived statefun context. > This is a breaking change to statefun.Context, but, given its purpose, we do > not expect there to be implementations of this interface outside the Golang > SDK. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-18356) Exit code 137 returned from process
[ https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495941#comment-17495941 ] Till Rohrmann commented on FLINK-18356: --- What is the state of this ticket [~chesnay]? > Exit code 137 returned from process > --- > > Key: FLINK-18356 > URL: https://issues.apache.org/jira/browse/FLINK-18356 > Project: Flink > Issue Type: Bug > Components: Build System / Azure Pipelines, Tests >Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0 >Reporter: Piotr Nowojski >Assignee: Chesnay Schepler >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.15.0 > > Attachments: 1234.jpg, app-profiling_4.gif > > > {noformat} > = test session starts > == > platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1 > cachedir: .tox/py37-cython/.pytest_cache > rootdir: /__w/3/s/flink-python > collected 568 items > pyflink/common/tests/test_configuration.py ..[ > 1%] > pyflink/common/tests/test_execution_config.py ...[ > 5%] > pyflink/dataset/tests/test_execution_environment.py . > ##[error]Exit code 137 returned from process: file name '/bin/docker', > arguments 'exec -i -u 1002 > 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb > /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'. > Finishing: Test - python > {noformat} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-26274) Test local recovery works across TaskManager process restarts
[ https://issues.apache.org/jira/browse/FLINK-26274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-26274: - Assignee: Konstantin Knauf > Test local recovery works across TaskManager process restarts > - > > Key: FLINK-26274 > URL: https://issues.apache.org/jira/browse/FLINK-26274 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.15.0 >Reporter: Till Rohrmann >Assignee: Konstantin Knauf >Priority: Blocker > Labels: release-testing > Fix For: 1.15.0 > > > This ticket is a testing task for > [FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw]. > When enabling local recovery and configuring a working directory that can be > re-read after a process failure, Flink should now be able to recover locally. > We should test whether this is the case. Please take a look at the > documentation [1, 2] to see how to configure Flink to make use of it. > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/working_directory/ > [2] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#enabling-local-recovery-across-pod-restarts -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-26191) Incorrect license in Elasticsearch connectors
[ https://issues.apache.org/jira/browse/FLINK-26191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-26191: - Assignee: Fabian Paul > Incorrect license in Elasticsearch connectors > - > > Key: FLINK-26191 > URL: https://issues.apache.org/jira/browse/FLINK-26191 > Project: Flink > Issue Type: Technical Debt > Components: Connectors / ElasticSearch >Reporter: Chesnay Schepler >Assignee: Fabian Paul >Priority: Blocker > Fix For: 1.15.0 > > > The sql-connector-elasticsearc0h 6/7 connector NOTICE lists the elasticsearch > dependencies as ASLv2, but they are nowadays (at least in part) licensed > differently (dual-licensed under elastic license 2.0 & Server Side Public > License (SSPL)). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25977) Close sink client and sink http client for KDS/KDF Sinks
[ https://issues.apache.org/jira/browse/FLINK-25977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495939#comment-17495939 ] Till Rohrmann commented on FLINK-25977: --- PR is reviewed. Waiting for CI to pass. > Close sink client and sink http client for KDS/KDF Sinks > > > Key: FLINK-25977 > URL: https://issues.apache.org/jira/browse/FLINK-25977 > Project: Flink > Issue Type: Bug > Components: Connectors / Kinesis >Reporter: Zichen Liu >Assignee: Ahmed Hamdy >Priority: Blocker > Labels: pull-request-available > Fix For: 1.15.0 > > > We are not closing the AWS SDK and HTTP clients for the new KDS/KDF async > sink. The close method needs to be invoked when operator stops. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25866) Support additional TLS configuration.
[ https://issues.apache.org/jira/browse/FLINK-25866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495578#comment-17495578 ] Till Rohrmann commented on FLINK-25866: --- Hi [~Fil Karnicki], I've assigned you to this ticket. The plan looks good to me. Happy coding :-) > Support additional TLS configuration. > - > > Key: FLINK-25866 > URL: https://issues.apache.org/jira/browse/FLINK-25866 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Fil Karnicki >Priority: Major > > Currently the default HTTP client used to invoke remote functions does not > support customising the TLS settings as part of the endpoint spec definition. > This includes > using self-signed certificates, and providing client side certificates for > authentication (which is a slightly different requirement). > This issue is about including additional TLS settings to the default endpoint > resource definition, and supporting them in statefun-core. > User mailing list threads: > * [client cert auth in remote > function|https://lists.apache.org/thread/97nw245kxqp32qglwfynhhgyhgp2pxvg] > * [endpoint self-signed certificate > problem|https://lists.apache.org/thread/y2m2bpwg4n71rxfont6pgky2t8m19n7w] > > > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-25866) Support additional TLS configuration.
[ https://issues.apache.org/jira/browse/FLINK-25866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-25866: - Assignee: Fil Karnicki > Support additional TLS configuration. > - > > Key: FLINK-25866 > URL: https://issues.apache.org/jira/browse/FLINK-25866 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Fil Karnicki >Priority: Major > > Currently the default HTTP client used to invoke remote functions does not > support customising the TLS settings as part of the endpoint spec definition. > This includes > using self-signed certificates, and providing client side certificates for > authentication (which is a slightly different requirement). > This issue is about including additional TLS settings to the default endpoint > resource definition, and supporting them in statefun-core. > User mailing list threads: > * [client cert auth in remote > function|https://lists.apache.org/thread/97nw245kxqp32qglwfynhhgyhgp2pxvg] > * [endpoint self-signed certificate > problem|https://lists.apache.org/thread/y2m2bpwg4n71rxfont6pgky2t8m19n7w] > > > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-26274) Test local recovery works across TaskManager process restarts
[ https://issues.apache.org/jira/browse/FLINK-26274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-26274: -- Fix Version/s: 1.15.0 > Test local recovery works across TaskManager process restarts > - > > Key: FLINK-26274 > URL: https://issues.apache.org/jira/browse/FLINK-26274 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Reporter: Till Rohrmann >Priority: Blocker > Labels: release-testing > Fix For: 1.15.0 > > > This ticket is a testing task for > [FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw]. > When enabling local recovery and configuring a working directory that can be > re-read after a process failure, Flink should now be able to recover locally. > We should test whether this is the case. Please take a look at the > documentation [1, 2] to see how to configure Flink to make use of it. > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/working_directory/ > [2] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#enabling-local-recovery-across-pod-restarts -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26274) Test local recovery works across TaskManager process restarts
Till Rohrmann created FLINK-26274: - Summary: Test local recovery works across TaskManager process restarts Key: FLINK-26274 URL: https://issues.apache.org/jira/browse/FLINK-26274 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Reporter: Till Rohrmann This ticket is a testing task for [FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw]. When enabling local recovery and configuring a working directory that can be re-read after a process failure, Flink should now be able to recover locally. We should test whether this is the case. Please take a look at the documentation [1, 2] to see how to configure Flink to make use of it. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/working_directory/ [2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#enabling-local-recovery-across-pod-restarts -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-26274) Test local recovery works across TaskManager process restarts
[ https://issues.apache.org/jira/browse/FLINK-26274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-26274: -- Affects Version/s: 1.15.0 > Test local recovery works across TaskManager process restarts > - > > Key: FLINK-26274 > URL: https://issues.apache.org/jira/browse/FLINK-26274 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.15.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: release-testing > Fix For: 1.15.0 > > > This ticket is a testing task for > [FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw]. > When enabling local recovery and configuring a working directory that can be > re-read after a process failure, Flink should now be able to recover locally. > We should test whether this is the case. Please take a look at the > documentation [1, 2] to see how to configure Flink to make use of it. > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/working_directory/ > [2] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#enabling-local-recovery-across-pod-restarts -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-23870) Test the ability for ignoring in-flight data on recovery
[ https://issues.apache.org/jira/browse/FLINK-23870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-23870: -- Component/s: Runtime / Checkpointing > Test the ability for ignoring in-flight data on recovery > > > Key: FLINK-23870 > URL: https://issues.apache.org/jira/browse/FLINK-23870 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Checkpointing >Reporter: Anton Kalashnikov >Priority: Blocker > Labels: release-testing > Fix For: 1.15.0 > > > Test for https://issues.apache.org/jira/browse/FLINK-22684: > # configure the external checkpoint address (storage) > # enable unaligned checkpoint always (not the default aligned or alternative) > # wait for the checkpoint then stop the job > # restore from the last checkpoint - it should be restored on the last > state(optional step) > # set the > `execution.checkpointing.recover-without-channel-state.checkpoint-id` to the > last checkpoint > # restore from the last checkpoint - in-flight data would be ignored so it > would be restored a little less data than expected in the usual case. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26271) TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed
Till Rohrmann created FLINK-26271: - Summary: TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed Key: FLINK-26271 URL: https://issues.apache.org/jira/browse/FLINK-26271 Project: Flink Issue Type: Improvement Components: Runtime / Task Affects Versions: 1.13.6 Reporter: Till Rohrmann The test {{TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed}} failed on AZP with {code} Feb 21 03:27:40 [ERROR] testCancelTaskExceptionAfterTaskMarkedFailed(org.apache.flink.runtime.taskmanager.TaskTest) Time elapsed: 0.043 s <<< FAILURE! Feb 21 03:27:40 java.lang.AssertionError: expected: but was: Feb 21 03:27:40 at org.junit.Assert.fail(Assert.java:88) Feb 21 03:27:40 at org.junit.Assert.failNotEquals(Assert.java:834) Feb 21 03:27:40 at org.junit.Assert.assertEquals(Assert.java:118) Feb 21 03:27:40 at org.junit.Assert.assertEquals(Assert.java:144) Feb 21 03:27:40 at org.apache.flink.runtime.taskmanager.TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed(TaskTest.java:598) Feb 21 03:27:40 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Feb 21 03:27:40 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) Feb 21 03:27:40 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) Feb 21 03:27:40 at java.lang.reflect.Method.invoke(Method.java:498) Feb 21 03:27:40 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) Feb 21 03:27:40 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) Feb 21 03:27:40 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) Feb 21 03:27:40 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) Feb 21 03:27:40 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) Feb 21 03:27:40 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) Feb 21 03:27:40 at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) Feb 21 03:27:40 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) Feb 21 03:27:40 at org.junit.rules.RunRules.evaluate(RunRules.java:20) Feb 21 03:27:40 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) Feb 21 03:27:40 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) Feb 21 03:27:40 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) Feb 21 03:27:40 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) Feb 21 03:27:40 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) Feb 21 03:27:40 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) Feb 21 03:27:40 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) Feb 21 03:27:40 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) Feb 21 03:27:40 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) Feb 21 03:27:40 at org.junit.rules.RunRules.evaluate(RunRules.java:20) Feb 21 03:27:40 at org.junit.runners.ParentRunner.run(ParentRunner.java:363) Feb 21 03:27:40 at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) Feb 21 03:27:40 at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) Feb 21 03:27:40 at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) Feb 21 03:27:40 at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) Feb 21 03:27:40 at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) Feb 21 03:27:40 at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) Feb 21 03:27:40 at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) Feb 21 03:27:40 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code} https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31903&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=c2734c79-73b6-521c-e85a-67c7ecae9107&l=5762 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25851) CassandraConnectorITCase.testRetrialAndDropTables shows table already exists errors on AZP
[ https://issues.apache.org/jira/browse/FLINK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495356#comment-17495356 ] Till Rohrmann commented on FLINK-25851: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31904&view=logs&j=ba53eb01-1462-56a3-8e98-0dd97fbcaab5&t=2e426bf0-b717-56bb-ab62-d63086457354&l=12880 > CassandraConnectorITCase.testRetrialAndDropTables shows table already exists > errors on AZP > -- > > Key: FLINK-25851 > URL: https://issues.apache.org/jira/browse/FLINK-25851 > Project: Flink > Issue Type: Bug > Components: Connectors / Cassandra >Affects Versions: 1.15.0 >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Critical > Labels: pull-request-available, test-stability > > It happens even if the whole keyspace is dropped in a BeforeClass method and > the table noticed in the stacktrace is dropped in an After method and this > after method is executed even in case of retrials through the Rule. > Jan 24 20:21:33 com.datastax.driver.core.exceptions.AlreadyExistsException: > Table flink.batches already exists > Jan 24 20:21:33 at > com.datastax.driver.core.exceptions.AlreadyExistsException.copy(AlreadyExistsException.java:111) > > Jan 24 20:21:33 at > com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) > > Jan 24 20:21:33 at > com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245) > > Jan 24 20:21:33 at > com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63) > Jan 24 20:21:33 at > com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39) > Jan 24 20:21:33 at > org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase.testRetrialAndDropTables(CassandraConnectorITCase.java:554) > > Jan 24 20:21:33 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Jan 24 20:21:33 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jan 24 20:21:33 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > Jan 24 20:21:33 at java.lang.reflect.Method.invoke(Method.java:498) > Jan 24 20:21:33 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > > Jan 24 20:21:33 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > > Jan 24 20:21:33 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > > Jan 24 20:21:33 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > > Jan 24 20:21:33 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > Jan 24 20:21:33 at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > Jan 24 20:21:33 at > org.apache.flink.testutils.junit.RetryRule$RetryOnExceptionStatement.evaluate(RetryRule.java:192) > > Jan 24 20:21:33 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Jan 24 20:21:33 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Jan 24 20:21:33 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jan 24 20:21:33 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > > Jan 24 20:21:33 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Jan 24 20:21:33 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > > Jan 24 20:21:33 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > > Jan 24 20:21:33 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Jan 24 20:21:33 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Jan 24 20:21:33 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Jan 24 20:21:33 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Jan 24 20:21:33 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Jan 24 20:21:33 at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > > cf: > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30050&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&s=ae4f8708-9994-57d3-c2d7-b892156e7812&t=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d&l=11999] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-25346) FLIP-197: API stability graduation process
[ https://issues.apache.org/jira/browse/FLINK-25346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-25346: - Assignee: (was: Till Rohrmann) > FLIP-197: API stability graduation process > -- > > Key: FLINK-25346 > URL: https://issues.apache.org/jira/browse/FLINK-25346 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Affects Versions: 1.15.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.15.0 > > > This ticket is an umbrella ticket for the work on > [FLIP-197|https://cwiki.apache.org/confluence/x/J5eqCw]. It will involve the > introduction of an API graduation process that is guarded by automated tests. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26014) Document how to use the working directory for faster local recoveries
[ https://issues.apache.org/jira/browse/FLINK-26014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-26014. - Resolution: Fixed Fixed via d893292893837037c15c07e16c9fe2b39ec44319 965ca2d7ccc938ac5a6948ccc910fd2cfc0135b4 > Document how to use the working directory for faster local recoveries > - > > Key: FLINK-26014 > URL: https://issues.apache.org/jira/browse/FLINK-26014 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.15.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Blocker > Labels: pull-request-available > Fix For: 1.15.0 > > > After having implemented FLIP-198 and FLIP-201, users can now use faster > TaskManager failover when using local recovery with persisted volumes. I > suggest to add documentation for explaining how to configure Flink to make > use of it. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-23403) Decrease default values for heartbeat timeout and interval
[ https://issues.apache.org/jira/browse/FLINK-23403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-23403: - Assignee: (was: Till Rohrmann) > Decrease default values for heartbeat timeout and interval > -- > > Key: FLINK-23403 > URL: https://issues.apache.org/jira/browse/FLINK-23403 > Project: Flink > Issue Type: Improvement > Components: Runtime / Configuration, Runtime / Coordination >Affects Versions: 1.14.0 >Reporter: Till Rohrmann >Priority: Major > Labels: pull-request-available, stale-assigned > Fix For: 1.15.0 > > > In order to speed up failure detection I suggest to decrease the default > values for the heartbeat timeout and interval from 50s/10s to 15s/3s. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-25345) FLIP-196: Source API stability guarantees
[ https://issues.apache.org/jira/browse/FLINK-25345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-25345: - Assignee: (was: Till Rohrmann) > FLIP-196: Source API stability guarantees > - > > Key: FLINK-25345 > URL: https://issues.apache.org/jira/browse/FLINK-25345 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Documentation >Affects Versions: 1.15.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.15.0 > > > This ticket is an umbrella ticket for the work on > [FLIP-196|https://cwiki.apache.org/confluence/x/IJeqCw] which will properly > document and guard the agreed upon source API stability guarantees. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-26197) Allow playground egress to keep connection open and push messages
[ https://issues.apache.org/jira/browse/FLINK-26197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-26197: -- Labels: playground (was: ) > Allow playground egress to keep connection open and push messages > - > > Key: FLINK-26197 > URL: https://issues.apache.org/jira/browse/FLINK-26197 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Affects Versions: statefun-3.2.0, statefun-3.3.0 >Reporter: Till Rohrmann >Priority: Major > Labels: playground > > In order to improve the getting started experience it would be nice if the > playground egress can keep connections to clients open and push new messages > eagerly. Moreover, we could keep all messages stored in memory to allow > serving old messages. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26197) Allow playground egress to keep connection open and push messages
Till Rohrmann created FLINK-26197: - Summary: Allow playground egress to keep connection open and push messages Key: FLINK-26197 URL: https://issues.apache.org/jira/browse/FLINK-26197 Project: Flink Issue Type: Improvement Components: Stateful Functions Affects Versions: statefun-3.2.0, statefun-3.3.0 Reporter: Till Rohrmann In order to improve the getting started experience it would be nice if the playground egress can keep connections to clients open and push new messages eagerly. Moreover, we could keep all messages stored in memory to allow serving old messages. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26153) Add script to update README playground links for Statefun
[ https://issues.apache.org/jira/browse/FLINK-26153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-26153. - Fix Version/s: statefun-3.3.0 Resolution: Fixed Fixed via cd8e0237f63bb84aae4137ae74481e5291944208 > Add script to update README playground links for Statefun > - > > Key: FLINK-26153 > URL: https://issues.apache.org/jira/browse/FLINK-26153 > Project: Flink > Issue Type: Improvement > Components: Build System / Stateful Functions >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > Labels: pull-request-available > Fix For: statefun-3.3.0 > > > In order to keep the playground links in the various README's up to date, it > would be helpful to have a script that updates the links. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26158) Update statefun-playground examples to use playground ingress/egress
[ https://issues.apache.org/jira/browse/FLINK-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-26158. - Fix Version/s: statefun-3.3.0 statefun-3.2.1 Resolution: Fixed Fixed via 3.3.0: b436f8e c8bac78 d32b038 9dc5db1 c0b121e c22d280 3.2.1: fdb0e78 7f68b7a 2815d55 56e897f 2c8820d 3d2a1e4 > Update statefun-playground examples to use playground ingress/egress > > > Key: FLINK-26158 > URL: https://issues.apache.org/jira/browse/FLINK-26158 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > Labels: pull-request-available > Fix For: statefun-3.3.0, statefun-3.2.1 > > > With FLINK-26153, the statefun-playground has a new ingress/egress that > allows to interact with via curl. I propose to update the existing examples > to make use of it. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26155) Add playground ingress/egress for curl interaction
[ https://issues.apache.org/jira/browse/FLINK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-26155. - Fix Version/s: statefun-3.3.0 statefun-3.2.1 Resolution: Fixed Fixed via 3.3.0: cf7251c8c5f59b4003403a8ec39a55366ef36a1f 3.2.1: 1ba46baa2c645bc70883c5f435fee96a6b576230 > Add playground ingress/egress for curl interaction > -- > > Key: FLINK-26155 > URL: https://issues.apache.org/jira/browse/FLINK-26155 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > Labels: pull-request-available > Fix For: statefun-3.3.0, statefun-3.2.1 > > > In order to give a better playground experience we could add a special > playground ingress/egress that supports easy to use cURL interaction > (ingesting new messages and consuming produced messages). > Ingesting new message could look like: > {code} > curl -X PUT -H "Content-Type: application/vnd." -d '' > localhost:8090/// > {code} > Consuming could look like: > {code} > curl -X GET localhost:8091/ > {code} > The ingress/egress are not meant to be production ready and only support the > use cases of the playground. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26158) Update statefun-playground examples to use playground ingress/egress
Till Rohrmann created FLINK-26158: - Summary: Update statefun-playground examples to use playground ingress/egress Key: FLINK-26158 URL: https://issues.apache.org/jira/browse/FLINK-26158 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Till Rohrmann Assignee: Till Rohrmann With FLINK-26153, the statefun-playground has a new ingress/egress that allows to interact with via curl. I propose to update the existing examples to make use of it. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-26155) Add playground ingress/egress for curl interaction
[ https://issues.apache.org/jira/browse/FLINK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-26155: -- Summary: Add playground ingress/egress for curl interaction (was: Add playground ingress/egress for cURL interaction) > Add playground ingress/egress for curl interaction > -- > > Key: FLINK-26155 > URL: https://issues.apache.org/jira/browse/FLINK-26155 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > > In order to give a better playground experience we could add a special > playground ingress/egress that supports easy to use cURL interaction > (ingesting new messages and consuming produced messages). > Ingesting new message could look like: > {code} > curl -X PUT -H "Content-Type: application/vnd." -d '' > localhost:8090/// > {code} > Consuming could look like: > {code} > curl -X GET localhost:8091/ > {code} > The ingress/egress are not meant to be production ready and only support the > use cases of the playground. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26155) Add playground ingress/egress for cURL interaction
Till Rohrmann created FLINK-26155: - Summary: Add playground ingress/egress for cURL interaction Key: FLINK-26155 URL: https://issues.apache.org/jira/browse/FLINK-26155 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Till Rohrmann Assignee: Till Rohrmann In order to give a better playground experience we could add a special playground ingress/egress that supports easy to use cURL interaction (ingesting new messages and consuming produced messages). Ingesting new message could look like: {code} curl -X PUT -H "Content-Type: application/vnd." -d '' localhost:8090/// {code} Consuming could look like: {code} curl -X GET localhost:8091/ {code} The ingress/egress are not meant to be production ready and only support the use cases of the playground. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26153) Add script to update README playground links for Statefun
Till Rohrmann created FLINK-26153: - Summary: Add script to update README playground links for Statefun Key: FLINK-26153 URL: https://issues.apache.org/jira/browse/FLINK-26153 Project: Flink Issue Type: Improvement Components: Build System / Stateful Functions Reporter: Till Rohrmann Assignee: Till Rohrmann In order to keep the playground links in the various README's up to date, it would be helpful to have a script that updates the links. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-25898) Add README.md to flink-statefun/statefun-sdk-js
[ https://issues.apache.org/jira/browse/FLINK-25898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-25898. - Fix Version/s: statefun-3.3.0 statefun-3.2.1 Resolution: Fixed Fixed via 3.3.0: 3a1680a4b9d849b8b97a6195895d5ba6c9c76a3a 3.2.1: 345f7440a71efaa6d54ddd2ea7c2d4683c8ad35b > Add README.md to flink-statefun/statefun-sdk-js > --- > > Key: FLINK-25898 > URL: https://issues.apache.org/jira/browse/FLINK-25898 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Affects Versions: statefun-3.2.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > Labels: pull-request-available > Fix For: statefun-3.3.0, statefun-3.2.1 > > > We should add a {{README.md}} to {{flink-statefun/statefun-sdk-js}}. This > would then also be displayed on > [npmjs.com|https://www.npmjs.com/package/apache-flink-statefun]. > Unfortunately, in order to publish the {{README.md}} we have to release a new > version of the npm package. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26098) TableAPI does not forward idleness configuration from DataStream
Till Rohrmann created FLINK-26098: - Summary: TableAPI does not forward idleness configuration from DataStream Key: FLINK-26098 URL: https://issues.apache.org/jira/browse/FLINK-26098 Project: Flink Issue Type: Bug Components: Table SQL / API Affects Versions: 1.14.3, 1.15.0 Reporter: Till Rohrmann The TableAPI does not forward the idleness configuration from a DataStream source. That can lead to the halt of processing if all sources are idle because {{WatermarkAssignerOperator}} [1] will never set a channel to active again. The only way to mitigate the problem is to explicitly configure the idleness for table sources via {{table.exec.source.idle-timeout}}. Configuring this value is actually not easy because creating a {{StreamExecutionEnvironment}} via {{create(StreamExecutionEnvironment, TableConfig)}} is deprecated. [1] https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/wmassigners/WatermarkAssignerOperator.java#L103 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25981) ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers failed
[ https://issues.apache.org/jira/browse/FLINK-25981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490783#comment-17490783 ] Till Rohrmann commented on FLINK-25981: --- Ok, I think we can ignore the latest occurrence. There has been a gap of roughly 30 minutes: {code} 12:48:27,451 [Curator-ConnectionStateManager-0] DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Connected to ZooKeeper quorum. Leader election can start. 12:48:27,453 [Curator-ConnectionStateManager-0] DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Connected to ZooKeeper quorum. Leader election can start. 12:48:27,457 [main-EventThread] INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker [] - New config event received: {} 12:48:27,484 [main-EventThread] DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership. 12:48:27,511 [main] INFO org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Closing ZooKeeperMultipleComponentLeaderElectionDriver. 13:16:54,026 [main-EventThread] INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager [] - State change: SUSPENDED 13:16:54,179 [Curator-ConnectionStateManager-0] WARN org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Connection to ZooKeeper suspended, waiting for reconnection. 13:16:54,179 [Curator-ConnectionStateManager-0] WARN org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver [] - Connection to ZooKeeper suspended, waiting for reconnection. 13:16:55,073 [main-EventThread] INFO org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager [] - State change: RECONNECTED {code} > ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers > failed > --- > > Key: FLINK-25981 > URL: https://issues.apache.org/jira/browse/FLINK-25981 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.0 >Reporter: Matthias Pohl >Priority: Major > Labels: test-stability > > We experienced a [build > failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30783&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=15997] > in > {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}. > The test halted when waiting for the next leader in > [ZooKeeperMultipleComponentLeaderElectionDriverTest:256|https://github.com/apache/flink/blob/e8742f7f5cac34852d0e621036e1614bbdfe8ec3/flink-runtime/src/test/java/org/apache/flink/runtime/leaderelection/ZooKeeperMultipleComponentLeaderElectionDriverTest.java#L256] > {code} > Feb 04 18:02:54 "main" #1 prio=5 os_prio=0 tid=0x7fab0800b800 nid=0xe07 > waiting on condition [0x7fab12574000] > Feb 04 18:02:54java.lang.Thread.State: WAITING (parking) > Feb 04 18:02:54 at sun.misc.Unsafe.park(Native Method) > Feb 04 18:02:54 - parking to wait for <0x8065c5c8> (a > java.util.concurrent.CompletableFuture$Signaller) > Feb 04 18:02:54 at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > Feb 04 18:02:54 at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) > Feb 04 18:02:54 at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > Feb 04 18:02:54 at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) > Feb 04 18:02:54 at > java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947) > Feb 04 18:02:54 at > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256) > [...] > {code} > The extended Maven logs indicate that the timeout happened while waiting for > the second leader to be selected. > {code} > Test > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers > is running. > > 17:15:10,437 [ Thread-16] INFO > org.apache.curator.test.TestingZooKeeperMain [] - Starting > server > 17:15:10,450 [main] INFO > org.apache.flink.runtime.util.ZooKeeperUtil
[jira] [Commented] (FLINK-25981) ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers failed
[ https://issues.apache.org/jira/browse/FLINK-25981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490781#comment-17490781 ] Till Rohrmann commented on FLINK-25981: --- The new stack trace is {code} 2022-02-10T13:16:59.3317243Z Feb 10 13:16:59 [ERROR] org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers Time elapsed: 1,711.883 s <<< FAILURE! 2022-02-10T13:16:59.3318299Z Feb 10 13:16:59 java.lang.AssertionError: 2022-02-10T13:16:59.3319424Z Feb 10 13:16:59 2022-02-10T13:16:59.3319771Z Feb 10 13:16:59 Expected size: 1 but was: 0 in: 2022-02-10T13:16:59.3323259Z Feb 10 13:16:59 [] 2022-02-10T13:16:59.3324585Z Feb 10 13:16:59at org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:264) 2022-02-10T13:16:59.3325873Z Feb 10 13:16:59at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2022-02-10T13:16:59.3326685Z Feb 10 13:16:59at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 2022-02-10T13:16:59.3327320Z Feb 10 13:16:59at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 2022-02-10T13:16:59.3327890Z Feb 10 13:16:59at java.lang.reflect.Method.invoke(Method.java:498) 2022-02-10T13:16:59.3328509Z Feb 10 13:16:59at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) 2022-02-10T13:16:59.3329147Z Feb 10 13:16:59at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) 2022-02-10T13:16:59.3329864Z Feb 10 13:16:59at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) 2022-02-10T13:16:59.3336949Z Feb 10 13:16:59at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) 2022-02-10T13:16:59.3337933Z Feb 10 13:16:59at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) 2022-02-10T13:16:59.3338887Z Feb 10 13:16:59at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) 2022-02-10T13:16:59.3339947Z Feb 10 13:16:59at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) 2022-02-10T13:16:59.3341415Z Feb 10 13:16:59at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) 2022-02-10T13:16:59.3342564Z Feb 10 13:16:59at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) 2022-02-10T13:16:59.3343672Z Feb 10 13:16:59at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) 2022-02-10T13:16:59.3344663Z Feb 10 13:16:59at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) 2022-02-10T13:16:59.3345610Z Feb 10 13:16:59at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) 2022-02-10T13:16:59.3346328Z Feb 10 13:16:59at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) 2022-02-10T13:16:59.3346995Z Feb 10 13:16:59at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) 2022-02-10T13:16:59.3347721Z Feb 10 13:16:59at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) 2022-02-10T13:16:59.3348465Z Feb 10 13:16:59at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) 2022-02-10T13:16:59.3349172Z Feb 10 13:16:59at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210) 2022-02-10T13:16:59.3349892Z Feb 10 13:16:59at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135) 2022-02-10T13:16:59.3350599Z Feb 10 13:16:59at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66) 2022-02-10T13:16:59.3351388Z Feb 10 13:16:59at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) 2022-02-10T13:16:59.3352468Z Feb 10 13:16:59at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) 2022-02-10T13:16:59.3353204Z Feb 10 13:16:59at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) 2022-02-10T13:16:59.3353880Z Feb 10 13:16:59at org.junit.
[jira] [Updated] (FLINK-25981) ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers failed
[ https://issues.apache.org/jira/browse/FLINK-25981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-25981: -- Summary: ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers failed (was: ZooKeeperMultipleComponentLeaderElectionDriverTest failed) > ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers > failed > --- > > Key: FLINK-25981 > URL: https://issues.apache.org/jira/browse/FLINK-25981 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.0 >Reporter: Matthias Pohl >Priority: Major > Labels: test-stability > > We experienced a [build > failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30783&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=15997] > in > {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}. > The test halted when waiting for the next leader in > [ZooKeeperMultipleComponentLeaderElectionDriverTest:256|https://github.com/apache/flink/blob/e8742f7f5cac34852d0e621036e1614bbdfe8ec3/flink-runtime/src/test/java/org/apache/flink/runtime/leaderelection/ZooKeeperMultipleComponentLeaderElectionDriverTest.java#L256] > {code} > Feb 04 18:02:54 "main" #1 prio=5 os_prio=0 tid=0x7fab0800b800 nid=0xe07 > waiting on condition [0x7fab12574000] > Feb 04 18:02:54java.lang.Thread.State: WAITING (parking) > Feb 04 18:02:54 at sun.misc.Unsafe.park(Native Method) > Feb 04 18:02:54 - parking to wait for <0x8065c5c8> (a > java.util.concurrent.CompletableFuture$Signaller) > Feb 04 18:02:54 at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > Feb 04 18:02:54 at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) > Feb 04 18:02:54 at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > Feb 04 18:02:54 at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) > Feb 04 18:02:54 at > java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947) > Feb 04 18:02:54 at > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256) > [...] > {code} > The extended Maven logs indicate that the timeout happened while waiting for > the second leader to be selected. > {code} > Test > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers > is running. > > 17:15:10,437 [ Thread-16] INFO > org.apache.curator.test.TestingZooKeeperMain [] - Starting > server > 17:15:10,450 [main] INFO > org.apache.flink.runtime.util.ZooKeeperUtils [] - Enforcing > default ACL for ZK connections > 17:15:10,451 [main] INFO > org.apache.flink.runtime.util.ZooKeeperUtils [] - Using > '/flink/default' as Zookeeper namespace. > 17:15:10,452 [main] INFO > org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl > [] - Starting > 17:15:10,455 [main] INFO > org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl > [] - Default schema > 17:15:10,462 [main-EventThread] INFO > org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager > [] - State change: CONNECTED > 17:15:10,467 [main-EventThread] INFO > org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker > [] - New config event received: {} > 17:15:10,482 [Curator-ConnectionStateManager-0] DEBUG > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver > [] - Connected to ZooKeeper quorum. Leader election can start. > 17:15:10,483 [Curator-ConnectionStateManager-0] DEBUG > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver > [] - Connected to ZooKeeper quorum. Leader election can start. > 17:15:10,483 [Curator-ConnectionStateManager-0] DEBUG > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver > [] - Connected to ZooKeeper quorum. Leader election can start. > 17:15:10,484 [main-EventThread] INFO > org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker > [] - New config event received: {} > 17:15:10,562 [main-Event
[jira] [Closed] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-26036. - Resolution: Fixed Fixed via 314f91df7019bf1ce1ede52b8ba7f2d0a1c44aac > LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > timeout on azure > --- > > Key: FLINK-26036 > URL: https://issues.apache.org/jira/browse/FLINK-26036 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.15.0 > > > {code:java} > 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > Time elapsed: 62.252 s <<< ERROR! > 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 > java.util.concurrent.TimeoutException > 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) > 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) > 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14 at > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115) > 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) > 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14 at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210) > 2022-02-09T02:18:17.1845603Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135
[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490730#comment-17490730 ] Till Rohrmann commented on FLINK-26036: --- Another instance w/o the latest fix: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31211&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=22954 > LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > timeout on azure > --- > > Key: FLINK-26036 > URL: https://issues.apache.org/jira/browse/FLINK-26036 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.15.0 > > > {code:java} > 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > Time elapsed: 62.252 s <<< ERROR! > 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 > java.util.concurrent.TimeoutException > 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) > 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) > 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14 at > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115) > 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) > 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14 at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestM
[jira] [Commented] (FLINK-24119) KafkaITCase.testTimestamps fails due to "Topic xxx already exist"
[ https://issues.apache.org/jira/browse/FLINK-24119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490731#comment-17490731 ] Till Rohrmann commented on FLINK-24119: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31211&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=15a22db7-8faa-5b34-3920-d33c9f0ca23c&l=35896 > KafkaITCase.testTimestamps fails due to "Topic xxx already exist" > - > > Key: FLINK-24119 > URL: https://issues.apache.org/jira/browse/FLINK-24119 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.14.0 >Reporter: Xintong Song >Assignee: Fabian Paul >Priority: Critical > Labels: test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23328&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=15a22db7-8faa-5b34-3920-d33c9f0ca23c&l=7419 > {code} > Sep 01 15:53:20 [ERROR] Tests run: 23, Failures: 1, Errors: 0, Skipped: 0, > Time elapsed: 162.65 s <<< FAILURE! - in > org.apache.flink.streaming.connectors.kafka.KafkaITCase > Sep 01 15:53:20 [ERROR] testTimestamps Time elapsed: 23.237 s <<< FAILURE! > Sep 01 15:53:20 java.lang.AssertionError: Create test topic : tstopic failed, > org.apache.kafka.common.errors.TopicExistsException: Topic 'tstopic' already > exists. > Sep 01 15:53:20 at org.junit.Assert.fail(Assert.java:89) > Sep 01 15:53:20 at > org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl.createTestTopic(KafkaTestEnvironmentImpl.java:226) > Sep 01 15:53:20 at > org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironment.createTestTopic(KafkaTestEnvironment.java:112) > Sep 01 15:53:20 at > org.apache.flink.streaming.connectors.kafka.KafkaTestBase.createTestTopic(KafkaTestBase.java:212) > Sep 01 15:53:20 at > org.apache.flink.streaming.connectors.kafka.KafkaITCase.testTimestamps(KafkaITCase.java:191) > Sep 01 15:53:20 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Sep 01 15:53:20 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Sep 01 15:53:20 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Sep 01 15:53:20 at java.lang.reflect.Method.invoke(Method.java:498) > Sep 01 15:53:20 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Sep 01 15:53:20 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Sep 01 15:53:20 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Sep 01 15:53:20 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Sep 01 15:53:20 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > Sep 01 15:53:20 at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > Sep 01 15:53:20 at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > Sep 01 15:53:20 at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26081) Update document,becasue of add maven wrapper.
[ https://issues.apache.org/jira/browse/FLINK-26081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490726#comment-17490726 ] Till Rohrmann commented on FLINK-26081: --- Sure, I've assigned you. > Update document,becasue of add maven wrapper. > - > > Key: FLINK-26081 > URL: https://issues.apache.org/jira/browse/FLINK-26081 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.15.0 >Reporter: Aiden Gong >Assignee: Aiden Gong >Priority: Minor > Fix For: 1.15.0 > > > Update document,becasue of add maven wapper. > Related files: README.md、project-configuration.md、building.md、ide_setup.md. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-26081) Update document,becasue of add maven wrapper.
[ https://issues.apache.org/jira/browse/FLINK-26081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-26081: - Assignee: Aiden Gong > Update document,becasue of add maven wrapper. > - > > Key: FLINK-26081 > URL: https://issues.apache.org/jira/browse/FLINK-26081 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.15.0 >Reporter: Aiden Gong >Assignee: Aiden Gong >Priority: Minor > Fix For: 1.15.0 > > > Update document,becasue of add maven wapper. > Related files: README.md、project-configuration.md、building.md、ide_setup.md. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-25898) Add README.md to flink-statefun/statefun-sdk-js
[ https://issues.apache.org/jira/browse/FLINK-25898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-25898: - Assignee: Till Rohrmann > Add README.md to flink-statefun/statefun-sdk-js > --- > > Key: FLINK-25898 > URL: https://issues.apache.org/jira/browse/FLINK-25898 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Affects Versions: statefun-3.2.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > > We should add a {{README.md}} to {{flink-statefun/statefun-sdk-js}}. This > would then also be displayed on > [npmjs.com|https://www.npmjs.com/package/apache-flink-statefun]. > Unfortunately, in order to publish the {{README.md}} we have to release a new > version of the npm package. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26034) Add maven wrapper for flink
[ https://issues.apache.org/jira/browse/FLINK-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-26034. - Resolution: Fixed Fixed via cc5616d1f1a3566f8a4dbb7a016251cf0f34b47a > Add maven wrapper for flink > --- > > Key: FLINK-26034 > URL: https://issues.apache.org/jira/browse/FLINK-26034 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.15.0 >Reporter: Aiden Gong >Assignee: Aiden Gong >Priority: Minor > Labels: pull-request-available > Fix For: 1.15.0 > > > Idea just support this feature now. It is very helpful for contributors. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490351#comment-17490351 ] Till Rohrmann commented on FLINK-26036: --- I think the solution is to move the {{isRunning()}} check into the {{freeSlotInternal}} method. That way we should guarantee that we don't free slots while shutting down. > LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > timeout on azure > --- > > Key: FLINK-26036 > URL: https://issues.apache.org/jira/browse/FLINK-26036 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.15.0 > > > {code:java} > 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > Time elapsed: 62.252 s <<< ERROR! > 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 > java.util.concurrent.TimeoutException > 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) > 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) > 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14 at > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115) > 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) > 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14 at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:21
[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490349#comment-17490349 ] Till Rohrmann commented on FLINK-26036: --- The problem seems to be the following: When shutting down the {{TaskExecutor}}, then we fail all running tasks and close the {{TaskSlotTableImpl}}. If now a terminal state of a {{Task}} is processed, then this removes the {{Task}} from the {{TaskSlotTableImpl}}. This in turn triggers the {{TaskExecutor.freeSlotInternal}} via the {{SlotActions.freeSlot}} call. > LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > timeout on azure > --- > > Key: FLINK-26036 > URL: https://issues.apache.org/jira/browse/FLINK-26036 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.15.0 > > > {code:java} > 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > Time elapsed: 62.252 s <<< ERROR! > 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 > java.util.concurrent.TimeoutException > 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) > 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) > 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14 at > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115) > 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) > 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14 at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute
[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490131#comment-17490131 ] Till Rohrmann commented on FLINK-26036: --- Thanks for the pointer [~gaoyunhaii]. I will take another look at the problem. I assume that there is still somehow a call that triggers {{TaskExectuor.freeSlotInternal}}. > LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > timeout on azure > --- > > Key: FLINK-26036 > URL: https://issues.apache.org/jira/browse/FLINK-26036 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.15.0 > > > {code:java} > 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > Time elapsed: 62.252 s <<< ERROR! > 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 > java.util.concurrent.TimeoutException > 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) > 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) > 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14 at > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115) > 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) > 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14 at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:2
[jira] [Closed] (FLINK-25975) add doc for how to use AvroParquetRecordFormat
[ https://issues.apache.org/jira/browse/FLINK-25975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-25975. - Resolution: Fixed Fixed via 222b72f2de43bb60b2f4741fde023493967d20f7 > add doc for how to use AvroParquetRecordFormat > -- > > Key: FLINK-25975 > URL: https://issues.apache.org/jira/browse/FLINK-25975 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.15.0 >Reporter: Jing Ge >Assignee: Jing Ge >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/parquet/ -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-26036. - Resolution: Fixed Fixed via f839f12e2e425d8c32279fb505a8c624bbc2a266 > LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > timeout on azure > --- > > Key: FLINK-26036 > URL: https://issues.apache.org/jira/browse/FLINK-26036 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.15.0 > > > {code:java} > 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > Time elapsed: 62.252 s <<< ERROR! > 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 > java.util.concurrent.TimeoutException > 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) > 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) > 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14 at > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115) > 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) > 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14 at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210) > 2022-02-09T02:18:17.1845603Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135
[jira] [Assigned] (FLINK-26034) Add maven wrapper for flink
[ https://issues.apache.org/jira/browse/FLINK-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-26034: - Assignee: Aiden Gong > Add maven wrapper for flink > --- > > Key: FLINK-26034 > URL: https://issues.apache.org/jira/browse/FLINK-26034 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.15.0 >Reporter: Aiden Gong >Assignee: Aiden Gong >Priority: Minor > Fix For: 1.15.0 > > > Idea just support this feature now. It is very helpful for contributors. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-26032) log job info in the ContextEnvironment
[ https://issues.apache.org/jira/browse/FLINK-26032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-26032. - Fix Version/s: 1.15.0 Resolution: Fixed Fixed via 8ce9632849b6704a57e3836043cdd1e9dab39c15 > log job info in the ContextEnvironment > -- > > Key: FLINK-26032 > URL: https://issues.apache.org/jira/browse/FLINK-26032 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: Jing Ge >Assignee: Jing Ge >Priority: Minor > Labels: pull-request-available > Fix For: 1.15.0 > > > In Flink codebase, the {{org.apache.flink.client.program.ContextEnvironment}} > has a static > [Logger|https://github.com/apache/flink/blob/7f9587c723057e2b6cbaf748181c8c80a7f6703d/flink-clients/src/main/java/org/apache/flink/client/program/ContextEnvironment.java#L47] > defined in the class, however, it doesn’t use it to print any logs. Instead, > it prints logs with {{System.out}} and passes the Logger to > {{ShutdownHookUtil.addShutdownHook}} and > {{jobExecutionResultFuture.whenComplete}} for logging any hook errors. If > customer integrated the CLI (‘FlinkYarnSessionCli’ in their case) into a > multi-threaded program to submit jobs in parallel, does it lead to any logs > missing/override/disorder problems? > It is always helpful to log the status information during the job submit > process. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489424#comment-17489424 ] Till Rohrmann commented on FLINK-26036: --- The problem seems to be the shutdown hook introduced with FLINK-25709. In some cases, the shutdown hook manages to delete the {{slotAllocationSnapshots}} so that the newly restarted process cannot recover. > LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > timeout on azure > --- > > Key: FLINK-26036 > URL: https://issues.apache.org/jira/browse/FLINK-26036 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.15.0 > > > {code:java} > 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > Time elapsed: 62.252 s <<< ERROR! > 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 > java.util.concurrent.TimeoutException > 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) > 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) > 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14 at > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115) > 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) > 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14 at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescrip
[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489415#comment-17489415 ] Till Rohrmann commented on FLINK-26036: --- The problem seems to be that a restarted {{TaskManager}} process finds an empty {{slotAllocationSnapshots}} directory where actually the previously allocated slot snapshots should be contained. > LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > timeout on azure > --- > > Key: FLINK-26036 > URL: https://issues.apache.org/jira/browse/FLINK-26036 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.15.0 > > > {code:java} > 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > Time elapsed: 62.252 s <<< ERROR! > 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 > java.util.concurrent.TimeoutException > 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) > 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) > 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14 at > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115) > 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) > 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14 at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210
[jira] [Updated] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-26036: -- Fix Version/s: 1.15.0 > LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > timeout on azure > --- > > Key: FLINK-26036 > URL: https://issues.apache.org/jira/browse/FLINK-26036 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.15.0 > > > {code:java} > 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > Time elapsed: 62.252 s <<< ERROR! > 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 > java.util.concurrent.TimeoutException > 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) > 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) > 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14 at > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115) > 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) > 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14 at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210) > 2022-02-09T02:18:17.1845603Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135) > 2022-02-09T02:18:17.1846375Z Feb 09 02:18:14 at > org.junit.jupi
[jira] [Assigned] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure
[ https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-26036: - Assignee: Till Rohrmann > LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > timeout on azure > --- > > Key: FLINK-26036 > URL: https://issues.apache.org/jira/browse/FLINK-26036 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Critical > Labels: test-stability > > {code:java} > 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory > Time elapsed: 62.252 s <<< ERROR! > 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 > java.util.concurrent.TimeoutException > 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) > 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14 at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) > 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14 at > org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115) > 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14 at > java.lang.reflect.Method.invoke(Method.java:498) > 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84) > 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14 at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214) > 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14 at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210) > 2022-02-09T02:18:17.1845603Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135) > 2022-02-09T02:18:17.1846375Z Feb 09 02:18:14 at > org.junit.jupiter.engine.descriptor.T
[jira] [Commented] (FLINK-26034) Add maven wrapper for flink
[ https://issues.apache.org/jira/browse/FLINK-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489364#comment-17489364 ] Till Rohrmann commented on FLINK-26034: --- This could indeed be helpful. Especially since we rely on maven 3.2.5. I'd be fine with adding something like this. > Add maven wrapper for flink > --- > > Key: FLINK-26034 > URL: https://issues.apache.org/jira/browse/FLINK-26034 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.15.0 >Reporter: Aiden Gong >Priority: Minor > Fix For: 1.15.0 > > > Idea just support this feature now. It is very helpful for contributors. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-26034) Add maven wrapper for flink
[ https://issues.apache.org/jira/browse/FLINK-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-26034: -- Summary: Add maven wrapper for flink (was: Add maven wapper for flink) > Add maven wrapper for flink > --- > > Key: FLINK-26034 > URL: https://issues.apache.org/jira/browse/FLINK-26034 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.15.0 >Reporter: Aiden Gong >Priority: Minor > Fix For: 1.15.0 > > > Idea just support this feature now. It is very helpful for contributors. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-25933) Allow configuring different transports in RequestReplyFunctionBuilder
[ https://issues.apache.org/jira/browse/FLINK-25933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-25933: - Assignee: Galen Warren > Allow configuring different transports in RequestReplyFunctionBuilder > - > > Key: FLINK-25933 > URL: https://issues.apache.org/jira/browse/FLINK-25933 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Galen Warren >Priority: Major > > Currently it is not possible to configure the type of the transport used > while using the data stream integration. > It would be useful to do so. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-26003) Use Jackson serialization for persisting TaskExecutor state to working directory
[ https://issues.apache.org/jira/browse/FLINK-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-26003: - Assignee: (was: Till Rohrmann) > Use Jackson serialization for persisting TaskExecutor state to working > directory > > > Key: FLINK-26003 > URL: https://issues.apache.org/jira/browse/FLINK-26003 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.15.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.15.0 > > > In order to avoid Java serialization, we should use a different serialization > format for persisting {{TaskExecutor}} state in the working directory. One > idea could be to use Jackson for the serialization. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure
[ https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489003#comment-17489003 ] Till Rohrmann commented on FLINK-23240: --- It might be a problem for standalone setups where the JobManager process won't be automatically restarted. Moreover, it might lead to unnecessary restarts that might be confusing to people. > ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper > fails on azure > - > > Key: FLINK-23240 > URL: https://issues.apache.org/jira/browse/FLINK-23240 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.14.0, 1.15.0 >Reporter: Xintong Song >Priority: Blocker > Labels: test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186 > {code} > Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 91.407 s <<< FAILURE! - in > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase > Jul 04 22:17:29 [ERROR] > testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase) > Time elapsed: 31.356 s <<< ERROR! > Jul 04 22:17:29 java.util.concurrent.ExecutionException: > java.util.concurrent.TimeoutException: Invocation of public abstract > java.util.concurrent.CompletableFuture > org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) > timed out. > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Jul 04 22:17:29 at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Jul 04 22:17:29 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Jul 04 22:17:29 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Jul 04 22:17:29 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Jul 04 22:17:29 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Jul 04 22:17:29 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > Jul 04 22:17:29 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > Jul 04 2
[jira] [Commented] (FLINK-26005) TableEnvironment.createTemporarySystemFunction cause NPE when using leftOuterLateralJoin
[ https://issues.apache.org/jira/browse/FLINK-26005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489002#comment-17489002 ] Till Rohrmann commented on FLINK-26005: --- I think [~twalthr] made it a subtask. > TableEnvironment.createTemporarySystemFunction cause NPE when using > leftOuterLateralJoin > > > Key: FLINK-26005 > URL: https://issues.apache.org/jira/browse/FLINK-26005 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Runtime >Affects Versions: 1.15.0, 1.14.3 >Reporter: Till Rohrmann >Priority: Major > > When trying out the {{Table.leftOuterLateralJoin}} with a table function that > was registered via {{TableEnvironment.createTemporarySystemFunction}} the > system failed with > {code} > Exception in thread "main" java.lang.NullPointerException > at > org.apache.calcite.tools.RelBuilder$Frame.(RelBuilder.java:3332) > at > org.apache.calcite.tools.RelBuilder$Frame.(RelBuilder.java:3317) > at org.apache.calcite.tools.RelBuilder.push(RelBuilder.java:282) > at > org.apache.calcite.tools.RelBuilder.functionScan(RelBuilder.java:1197) > at > org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:309) > at > org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:154) > at > org.apache.flink.table.operations.CalculatedQueryOperation.accept(CalculatedQueryOperation.java:94) > at > org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:151) > at > org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133) > at > org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:87) > at > org.apache.flink.table.operations.CalculatedQueryOperation.accept(CalculatedQueryOperation.java:94) > at > org.apache.flink.table.planner.plan.QueryOperationConverter.lambda$defaultMethod$0(QueryOperationConverter.java:150) > at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390) > at > org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:150) > at > org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133) > at > org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:62) > at > org.apache.flink.table.operations.JoinQueryOperation.accept(JoinQueryOperation.java:115) > at > org.apache.flink.table.planner.plan.QueryOperationConverter.lambda$defaultMethod$0(QueryOperationConverter.java:150) > at > java.base/java.util.Collections$SingletonList.forEach(Collections.java:4854) > at > org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:150) > at > org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133) > at > org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:47) > at > org.apache.flink.table.operations.ProjectQueryOperation.accept(ProjectQueryOperation.java:76) > at > org.apache.flink.table.planner.calcite.FlinkRelBuilder.queryOperation(FlinkRelBuilder.scala:184) > at > org.apache.flink.table.planner.delegation.PlannerBase.translateToRel(PlannerBase.scala:214) > at > org.apache.flink.table.planner.delegation.PlannerBase.$anonfun$translate$1(PlannerBase.scala:182) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) > at scala.collection.Iterator.foreach(Iterator.scala:937) > at scala.collection.Iterator.foreach$(Iterator.scala:937) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) > at scala.collection.IterableLike.foreach(IterableLike.scala:70) > at scala.collection.IterableLike.foreach$(IterableLike.scala:69) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at scala.collection.TraversableLike.map(TraversableLike.scala:233) > at scala.collection.TraversableLike.map$(TraversableLike.scala:226) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:182) > at > org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:1665) > at > org.apache.flink.table.api.internal.TableEnvironmentImpl.executeQueryOperation(TableEnvironmentIm
[jira] [Created] (FLINK-26014) Document how to use the working directory for faster local recoveries
Till Rohrmann created FLINK-26014: - Summary: Document how to use the working directory for faster local recoveries Key: FLINK-26014 URL: https://issues.apache.org/jira/browse/FLINK-26014 Project: Flink Issue Type: Improvement Components: Runtime / Coordination Affects Versions: 1.15.0 Reporter: Till Rohrmann Assignee: Till Rohrmann Fix For: 1.15.0 After having implemented FLIP-198 and FLIP-201, users can now use faster TaskManager failover when using local recovery with persisted volumes. I suggest to add documentation for explaining how to configure Flink to make use of it. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-25817) FLIP-201: Persist local state in working directory
[ https://issues.apache.org/jira/browse/FLINK-25817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-25817. - Fix Version/s: 1.15.0 Resolution: Fixed Fixed via 4b39e4ad438 f95a0c7e867 1a1415119f9 45c661e02de d63c48dded3 577ef90710c aa78915f58c 990589c30a7 2ff1a18744e 00f2c627492 > FLIP-201: Persist local state in working directory > -- > > Key: FLINK-25817 > URL: https://issues.apache.org/jira/browse/FLINK-25817 > Project: Flink > Issue Type: New Feature > Components: Runtime / Coordination >Affects Versions: 1.15.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > > This issue is the umbrella ticket for > [FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw] which aims at adding > support for persisting local state in Flink's working directory. This would > enable Flink in certain scenarios to recover locally even in case of process > failures. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (FLINK-26005) TableEnvironment.createTemporarySystemFunction cause NPE when using leftOuterLateralJoin
Till Rohrmann created FLINK-26005: - Summary: TableEnvironment.createTemporarySystemFunction cause NPE when using leftOuterLateralJoin Key: FLINK-26005 URL: https://issues.apache.org/jira/browse/FLINK-26005 Project: Flink Issue Type: Bug Components: Table SQL / Runtime Affects Versions: 1.14.3, 1.15.0 Reporter: Till Rohrmann When trying out the {{Table.leftOuterLateralJoin}} with a table function that was registered via {{TableEnvironment.createTemporarySystemFunction}} the system failed with {code} Exception in thread "main" java.lang.NullPointerException at org.apache.calcite.tools.RelBuilder$Frame.(RelBuilder.java:3332) at org.apache.calcite.tools.RelBuilder$Frame.(RelBuilder.java:3317) at org.apache.calcite.tools.RelBuilder.push(RelBuilder.java:282) at org.apache.calcite.tools.RelBuilder.functionScan(RelBuilder.java:1197) at org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:309) at org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:154) at org.apache.flink.table.operations.CalculatedQueryOperation.accept(CalculatedQueryOperation.java:94) at org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:151) at org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133) at org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:87) at org.apache.flink.table.operations.CalculatedQueryOperation.accept(CalculatedQueryOperation.java:94) at org.apache.flink.table.planner.plan.QueryOperationConverter.lambda$defaultMethod$0(QueryOperationConverter.java:150) at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390) at org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:150) at org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133) at org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:62) at org.apache.flink.table.operations.JoinQueryOperation.accept(JoinQueryOperation.java:115) at org.apache.flink.table.planner.plan.QueryOperationConverter.lambda$defaultMethod$0(QueryOperationConverter.java:150) at java.base/java.util.Collections$SingletonList.forEach(Collections.java:4854) at org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:150) at org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133) at org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:47) at org.apache.flink.table.operations.ProjectQueryOperation.accept(ProjectQueryOperation.java:76) at org.apache.flink.table.planner.calcite.FlinkRelBuilder.queryOperation(FlinkRelBuilder.scala:184) at org.apache.flink.table.planner.delegation.PlannerBase.translateToRel(PlannerBase.scala:214) at org.apache.flink.table.planner.delegation.PlannerBase.$anonfun$translate$1(PlannerBase.scala:182) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233) at scala.collection.Iterator.foreach(Iterator.scala:937) at scala.collection.Iterator.foreach$(Iterator.scala:937) at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) at scala.collection.IterableLike.foreach(IterableLike.scala:70) at scala.collection.IterableLike.foreach$(IterableLike.scala:69) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike.map(TraversableLike.scala:233) at scala.collection.TraversableLike.map$(TraversableLike.scala:226) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:182) at org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:1665) at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeQueryOperation(TableEnvironmentImpl.java:805) at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1274) at org.apache.flink.table.api.internal.TableImpl.execute(TableImpl.java:601) {code} Interestingly, when using the deprecated {{TableEnvironment.registerFunction}} it worked. Timo mentioned that this could be caused by a missing integration into the new type syst
[jira] [Created] (FLINK-26003) Use Jackson serialization for persisting TaskExecutor state to working directory
Till Rohrmann created FLINK-26003: - Summary: Use Jackson serialization for persisting TaskExecutor state to working directory Key: FLINK-26003 URL: https://issues.apache.org/jira/browse/FLINK-26003 Project: Flink Issue Type: Improvement Components: Runtime / Coordination Affects Versions: 1.15.0 Reporter: Till Rohrmann Assignee: Till Rohrmann Fix For: 1.15.0 In order to avoid Java serialization, we should use a different serialization format for persisting {{TaskExecutor}} state in the working directory. One idea could be to use Jackson for the serialization. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure
[ https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488663#comment-17488663 ] Till Rohrmann edited comment on FLINK-23240 at 2/8/22, 8:40 AM: The problem seems to be that the {{ResourceManagerServiceImpl}} closes itself if it loses leadership. Therefore, the {{MiniCluster}} shuts down and the test hangs. It looks to me as if the {{ResourceManagerServiceImpl}} can only survive a single leadership session unless the property {{flink.tests.enable-rm-multi-leader-session}} is set ([code|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImpl.java#L220-L222]). If this is correct, then we might have a cluster instability because every RM leadership loss would trigger an unexpected termination of the {{ResourceManager}} which will terminate the process. [~xtsong] have I understood the code correctly? was (Author: till.rohrmann): The problem seems to be that the {{ResourceManagerServiceImpl}} closes itself if it loses leadership. Therefore, the {{MiniCluster}} shuts down and the test hangs. It looks to me as if the {{ResourceManagerServiceImpl}} can only survive a single leadership session unless the property {{flink.tests.enable-rm-multi-leader-session}} is set. If this is correct, then we might have a cluster instability because every RM leadership loss would trigger an unexpected termination of the {{ResourceManager}} which will terminate the process. [~xtsong] have I understood the code correctly? > ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper > fails on azure > - > > Key: FLINK-23240 > URL: https://issues.apache.org/jira/browse/FLINK-23240 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.14.0, 1.15.0 >Reporter: Xintong Song >Priority: Blocker > Labels: test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186 > {code} > Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 91.407 s <<< FAILURE! - in > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase > Jul 04 22:17:29 [ERROR] > testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase) > Time elapsed: 31.356 s <<< ERROR! > Jul 04 22:17:29 java.util.concurrent.ExecutionException: > java.util.concurrent.TimeoutException: Invocation of public abstract > java.util.concurrent.CompletableFuture > org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) > timed out. > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Jul 04 22:17:29 at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Jul 04 22:17:29 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Jul 04 22:17:29 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Jul 04 22:17:29 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Jul 04 22:17
[jira] [Commented] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure
[ https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488665#comment-17488665 ] Till Rohrmann commented on FLINK-23240: --- cc [~dmvk] > ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper > fails on azure > - > > Key: FLINK-23240 > URL: https://issues.apache.org/jira/browse/FLINK-23240 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.14.0, 1.15.0 >Reporter: Xintong Song >Priority: Blocker > Labels: test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186 > {code} > Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 91.407 s <<< FAILURE! - in > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase > Jul 04 22:17:29 [ERROR] > testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase) > Time elapsed: 31.356 s <<< ERROR! > Jul 04 22:17:29 java.util.concurrent.ExecutionException: > java.util.concurrent.TimeoutException: Invocation of public abstract > java.util.concurrent.CompletableFuture > org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) > timed out. > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Jul 04 22:17:29 at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Jul 04 22:17:29 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Jul 04 22:17:29 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Jul 04 22:17:29 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Jul 04 22:17:29 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Jul 04 22:17:29 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > Jul 04 22:17:29 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.run(Parent
[jira] [Updated] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure
[ https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-23240: -- Affects Version/s: 1.15.0 > ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper > fails on azure > - > > Key: FLINK-23240 > URL: https://issues.apache.org/jira/browse/FLINK-23240 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.14.0, 1.15.0 >Reporter: Xintong Song >Priority: Blocker > Labels: test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186 > {code} > Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 91.407 s <<< FAILURE! - in > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase > Jul 04 22:17:29 [ERROR] > testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase) > Time elapsed: 31.356 s <<< ERROR! > Jul 04 22:17:29 java.util.concurrent.ExecutionException: > java.util.concurrent.TimeoutException: Invocation of public abstract > java.util.concurrent.CompletableFuture > org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) > timed out. > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Jul 04 22:17:29 at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Jul 04 22:17:29 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Jul 04 22:17:29 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Jul 04 22:17:29 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Jul 04 22:17:29 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Jul 04 22:17:29 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > Jul 04 22:17:29 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) > Jul 04 22:17:29
[jira] [Commented] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure
[ https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488664#comment-17488664 ] Till Rohrmann commented on FLINK-23240: --- I am promoting this issue to be a blocker until we have invalidated my suspicion. > ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper > fails on azure > - > > Key: FLINK-23240 > URL: https://issues.apache.org/jira/browse/FLINK-23240 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.14.0 >Reporter: Xintong Song >Priority: Blocker > Labels: test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186 > {code} > Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 91.407 s <<< FAILURE! - in > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase > Jul 04 22:17:29 [ERROR] > testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase) > Time elapsed: 31.356 s <<< ERROR! > Jul 04 22:17:29 java.util.concurrent.ExecutionException: > java.util.concurrent.TimeoutException: Invocation of public abstract > java.util.concurrent.CompletableFuture > org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) > timed out. > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Jul 04 22:17:29 at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Jul 04 22:17:29 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Jul 04 22:17:29 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Jul 04 22:17:29 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Jul 04 22:17:29 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Jul 04 22:17:29 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > Jul 04 22:17:29 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 0
[jira] [Commented] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure
[ https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488663#comment-17488663 ] Till Rohrmann commented on FLINK-23240: --- The problem seems to be that the {{ResourceManagerServiceImpl}} closes itself if it loses leadership. Therefore, the {{MiniCluster}} shuts down and the test hangs. It looks to me as if the {{ResourceManagerServiceImpl}} can only survive a single leadership session unless the property {{flink.tests.enable-rm-multi-leader-session}} is set. If this is correct, then we might have a cluster instability because every RM leadership loss would trigger an unexpected termination of the {{ResourceManager}} which will terminate the process. [~xtsong] have I understood the code correctly? > ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper > fails on azure > - > > Key: FLINK-23240 > URL: https://issues.apache.org/jira/browse/FLINK-23240 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.14.0 >Reporter: Xintong Song >Priority: Critical > Labels: test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186 > {code} > Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 91.407 s <<< FAILURE! - in > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase > Jul 04 22:17:29 [ERROR] > testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase) > Time elapsed: 31.356 s <<< ERROR! > Jul 04 22:17:29 java.util.concurrent.ExecutionException: > java.util.concurrent.TimeoutException: Invocation of public abstract > java.util.concurrent.CompletableFuture > org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) > timed out. > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Jul 04 22:17:29 at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Jul 04 22:17:29 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Jul 04 22:17:29 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Jul 04 22:17:29 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Jul 04 22:17:29 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Jul 04 22:17:29 at > org.junit.runners.ParentRun
[jira] [Updated] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure
[ https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-23240: -- Priority: Blocker (was: Critical) > ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper > fails on azure > - > > Key: FLINK-23240 > URL: https://issues.apache.org/jira/browse/FLINK-23240 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.14.0 >Reporter: Xintong Song >Priority: Blocker > Labels: test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186 > {code} > Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 91.407 s <<< FAILURE! - in > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase > Jul 04 22:17:29 [ERROR] > testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase) > Time elapsed: 31.356 s <<< ERROR! > Jul 04 22:17:29 java.util.concurrent.ExecutionException: > java.util.concurrent.TimeoutException: Invocation of public abstract > java.util.concurrent.CompletableFuture > org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) > timed out. > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Jul 04 22:17:29 at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Jul 04 22:17:29 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Jul 04 22:17:29 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Jul 04 22:17:29 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Jul 04 22:17:29 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Jul 04 22:17:29 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > Jul 04 22:17:29 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) > Jul 04 22:17:29
[jira] [Assigned] (FLINK-25937) SQL Client end-to-end test e2e fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-25937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-25937: - Assignee: David Morávek > SQL Client end-to-end test e2e fails on AZP > --- > > Key: FLINK-25937 > URL: https://issues.apache.org/jira/browse/FLINK-25937 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Table SQL / API >Affects Versions: 1.15.0 >Reporter: Till Rohrmann >Assignee: David Morávek >Priority: Critical > Labels: test-stability > > The {{SQL Client end-to-end test}} e2e tests fails on AZP when using the > {{AdaptiveScheduler}} because the scheduler expects that the parallelism is > set for all vertices: > {code} > Feb 03 03:45:13 org.apache.flink.runtime.client.JobInitializationException: > Could not start the JobMaster. > Feb 03 03:45:13 at > org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:97) > Feb 03 03:45:13 at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > Feb 03 03:45:13 at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > Feb 03 03:45:13 at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > Feb 03 03:45:13 at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609) > Feb 03 03:45:13 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > Feb 03 03:45:13 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > Feb 03 03:45:13 at java.lang.Thread.run(Thread.java:748) > Feb 03 03:45:13 Caused by: java.util.concurrent.CompletionException: > java.lang.IllegalStateException: The adaptive scheduler expects the > parallelism being set for each JobVertex (violated JobVertex: > f74b775b58627a33e46b8c155b320255). > Feb 03 03:45:13 at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) > Feb 03 03:45:13 at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) > Feb 03 03:45:13 at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606) > Feb 03 03:45:13 ... 3 more > Feb 03 03:45:13 Caused by: java.lang.IllegalStateException: The adaptive > scheduler expects the parallelism being set for each JobVertex (violated > JobVertex: f74b775b58627a33e46b8c155b320255). > Feb 03 03:45:13 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Feb 03 03:45:13 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.assertPreconditions(AdaptiveScheduler.java:296) > Feb 03 03:45:13 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.(AdaptiveScheduler.java:230) > Feb 03 03:45:13 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerFactory.createInstance(AdaptiveSchedulerFactory.java:122) > Feb 03 03:45:13 at > org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:115) > Feb 03 03:45:13 at > org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:345) > Feb 03 03:45:13 at > org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:322) > Feb 03 03:45:13 at > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:106) > Feb 03 03:45:13 at > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:94) > Feb 03 03:45:13 at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) > Feb 03 03:45:13 at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) > Feb 03 03:45:13 ... 3 more > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30662&view=logs&j=fb37c667-81b7-5c22-dd91-846535e99a97&t=39a035c3-c65e-573c-fb66-104c66c28912&l=5782 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-25825) MySqlCatalogITCase fails on azure
[ https://issues.apache.org/jira/browse/FLINK-25825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-25825: - Assignee: RocMarshal > MySqlCatalogITCase fails on azure > - > > Key: FLINK-25825 > URL: https://issues.apache.org/jira/browse/FLINK-25825 > Project: Flink > Issue Type: Bug > Components: Connectors / JDBC, Table SQL / API >Affects Versions: 1.15.0 >Reporter: Roman Khachatryan >Assignee: RocMarshal >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30189&view=logs&j=e9af9cde-9a65-5281-a58e-2c8511d36983&t=c520d2c3-4d17-51f1-813b-4b0b74a0c307&l=13677 > > {code} > 2022-01-26T06:04:42.8019913Z Jan 26 06:04:42 [ERROR] > org.apache.flink.connector.jdbc.catalog.MySqlCatalogITCase.testFullPath Time > elapsed: 2.166 *s <<< FAILURE! > 2022-01-26T06:04:42.8025522Z Jan 26 06:04:42 java.lang.AssertionError: > expected: java.util.ArrayList<[+I[1, -1, 1, null, true, null, hello, 2021-0 > 8-04, 2021-08-04T01:54:16, -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, -1, 1, > \{"k1": "v1"}, null, col_longtext, null, -1, 1, col_mediumtext, -99, 9 9, > -1.0, 1.0, set_ele1, -1, 1, col_text, 10:32:34, 2021-08-04T01:54:16, > col_tinytext, -1, 1, null, col_varchar, 2021-08-04T01:54:16.463, 09:33:43, > 2021-08-04T01:54:16.463, null], +I[2, -1, 1, null, true, null, hello, > 2021-08-04, 2021-08-04T01:53:19, -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, > -1, 1, \{"k1": "v1"}, null, col_longtext, null, -1, 1, col_mediumtext, -99, > 99, -1.0, 1.0, set_ele1,set_ele12, -1, 1, col_text, 10:32:34, 2021-08- > 04T01:53:19, col_tinytext, -1, 1, null, col_varchar, 2021-08-04T01:53:19.098, > 09:33:43, 2021-08-04T01:53:19.098, null]]> but was: java.util.ArrayL > ist<[+I[1, -1, 1, null, true, null, hello, 2021-08-04, 2021-08-04T01:54:16, > -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, -1, 1, \{"k1": "v1"}, null, > col_longtext, null, -1, 1, col_mediumtext, -99, 99, -1.0, 1.0, set_ele1, -1, > 1, col_text, 10:32:34, 2021-08-04T01:54:16, col_tinytext, -1, 1, null , > col_varchar, 2021-08-04T01:54:16.463, 09:33:43, 2021-08-04T01:54:16.463, > null], +I[2, -1, 1, null, true, null, hello, 2021-08-04, 2021-08-04T01: > 53:19, -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, -1, 1, \{"k1": "v1"}, null, > col_longtext, null, -1, 1, col_mediumtext, -99, 99, -1.0, 1.0, set_el > e1,set_ele12, -1, 1, col_text, 10:32:34, 2021-08-04T01:53:19, col_tinytext, > -1, 1, null, col_varchar, 2021-08-04T01:53:19.098, 09:33:43, 2021-08-0 > 4T01:53:19.098, null]]> > 2022-01-26T06:04:42.8029336Z Jan 26 06:04:42 at > org.junit.Assert.fail(Assert.java:89) > 2022-01-26T06:04:42.8029824Z Jan 26 06:04:42 at > org.junit.Assert.failNotEquals(Assert.java:835) > 2022-01-26T06:04:42.8030319Z Jan 26 06:04:42 at > org.junit.Assert.assertEquals(Assert.java:120) > 2022-01-26T06:04:42.8030815Z Jan 26 06:04:42 at > org.junit.Assert.assertEquals(Assert.java:146) > 2022-01-26T06:04:42.8031419Z Jan 26 06:04:42 at > org.apache.flink.connector.jdbc.catalog.MySqlCatalogITCase.testFullPath(MySqlCatalogITCase.java*:306) > {code} > > {code} > 2022-01-26T06:04:43.2899378Z Jan 26 06:04:43 [ERROR] Failures: > 2022-01-26T06:04:43.2907942Z Jan 26 06:04:43 [ERROR] > MySqlCatalogITCase.testFullPath:306 expected: java.util.ArrayList<[+I[1, -1, > 1, null, true, > 2022-01-26T06:04:43.2914065Z Jan 26 06:04:43 [ERROR] > MySqlCatalogITCase.testGetTable:253 expected:<( > 2022-01-26T06:04:43.2983567Z Jan 26 06:04:43 [ERROR] > MySqlCatalogITCase.testSelectToInsert:323 expected: > java.util.ArrayList<[+I[1, -1, 1, null, > 2022-01-26T06:04:43.2997373Z Jan 26 06:04:43 [ERROR] > MySqlCatalogITCase.testWithoutCatalog:291 expected: > java.util.ArrayList<[+I[1, -1, 1, null, > 2022-01-26T06:04:43.3010450Z Jan 26 06:04:43 [ERROR] > MySqlCatalogITCase.testWithoutCatalogDB:278 expected: > java.util.ArrayList<[+I[1, -1, 1, nul > {code} > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (FLINK-25968) Possible class leaks in flink-table / sql modules
[ https://issues.apache.org/jira/browse/FLINK-25968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-25968: - Assignee: Timo Walther > Possible class leaks in flink-table / sql modules > - > > Key: FLINK-25968 > URL: https://issues.apache.org/jira/browse/FLINK-25968 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner, Table SQL / Runtime >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Timo Walther >Priority: Blocker > Fix For: 1.15.0 > > > This is the umbrella issues for possible class leaks in flink-table / sql > planner and runtimes. > Currently for a flink cluster, the flink classes are loaded by the system > ClassLoader while each job would have separate user ClassLoaders. In this > case, if some class loaded by the system ClassLoader has static variables > that reference the classes loaded by the user ClassLoaders, the user > ClassLoaders would not be able to be released, which might cause class leak > in some way. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25968) Possible class leaks in flink-table / sql modules
[ https://issues.apache.org/jira/browse/FLINK-25968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-25968: -- Priority: Blocker (was: Critical) > Possible class leaks in flink-table / sql modules > - > > Key: FLINK-25968 > URL: https://issues.apache.org/jira/browse/FLINK-25968 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner, Table SQL / Runtime >Affects Versions: 1.15.0 >Reporter: Yun Gao >Priority: Blocker > Fix For: 1.15.0 > > > This is the umbrella issues for possible class leaks in flink-table / sql > planner and runtimes. > Currently for a flink cluster, the flink classes are loaded by the system > ClassLoader while each job would have separate user ClassLoaders. In this > case, if some class loaded by the system ClassLoader has static variables > that reference the classes loaded by the user ClassLoaders, the user > ClassLoaders would not be able to be released, which might cause class leak > in some way. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25968) Possible class leaks in flink-table / sql modules
[ https://issues.apache.org/jira/browse/FLINK-25968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-25968: -- Fix Version/s: 1.15.0 > Possible class leaks in flink-table / sql modules > - > > Key: FLINK-25968 > URL: https://issues.apache.org/jira/browse/FLINK-25968 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner, Table SQL / Runtime >Affects Versions: 1.15.0 >Reporter: Yun Gao >Priority: Critical > Fix For: 1.15.0 > > > This is the umbrella issues for possible class leaks in flink-table / sql > planner and runtimes. > Currently for a flink cluster, the flink classes are loaded by the system > ClassLoader while each job would have separate user ClassLoaders. In this > case, if some class loaded by the system ClassLoader has static variables > that reference the classes loaded by the user ClassLoaders, the user > ClassLoaders would not be able to be released, which might cause class leak > in some way. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25968) Possible class leaks in flink-table / sql modules
[ https://issues.apache.org/jira/browse/FLINK-25968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-25968: -- Priority: Critical (was: Major) > Possible class leaks in flink-table / sql modules > - > > Key: FLINK-25968 > URL: https://issues.apache.org/jira/browse/FLINK-25968 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner, Table SQL / Runtime >Affects Versions: 1.15.0 >Reporter: Yun Gao >Priority: Critical > > This is the umbrella issues for possible class leaks in flink-table / sql > planner and runtimes. > Currently for a flink cluster, the flink classes are loaded by the system > ClassLoader while each job would have separate user ClassLoaders. In this > case, if some class loaded by the system ClassLoader has static variables > that reference the classes loaded by the user ClassLoaders, the user > ClassLoaders would not be able to be released, which might cause class leak > in some way. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure
[ https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-23240: -- Component/s: Runtime / Coordination > ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper > fails on azure > - > > Key: FLINK-23240 > URL: https://issues.apache.org/jira/browse/FLINK-23240 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.14.0 >Reporter: Xintong Song >Priority: Critical > Labels: test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186 > {code} > Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, > Time elapsed: 91.407 s <<< FAILURE! - in > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase > Jul 04 22:17:29 [ERROR] > testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase) > Time elapsed: 31.356 s <<< ERROR! > Jul 04 22:17:29 java.util.concurrent.ExecutionException: > java.util.concurrent.TimeoutException: Invocation of public abstract > java.util.concurrent.CompletableFuture > org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) > timed out. > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > Jul 04 22:17:29 at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275) > Jul 04 22:17:29 at > org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jul 04 22:17:29 at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Jul 04 22:17:29 at > java.base/java.lang.reflect.Method.invoke(Method.java:566) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > Jul 04 22:17:29 at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > Jul 04 22:17:29 at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > Jul 04 22:17:29 at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > Jul 04 22:17:29 at > org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) > Jul 04 22:17:29 at > org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > Jul 04 22:17:29 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > Jul 04 22:17:29 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > Jul 04 22:17:29 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > Jul 04 22:17:29 at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) > Jul 04 22:17:29
[jira] [Updated] (FLINK-25992) JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership fails on azure
[ https://issues.apache.org/jira/browse/FLINK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-25992: -- Summary: JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership fails on azure (was: JobDispatcherITCase..testRecoverFromCheckpointAfterLosingAndRegainingLeadership fails on azure) > JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership > fails on azure > - > > Key: FLINK-25992 > URL: https://issues.apache.org/jira/browse/FLINK-25992 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.15.0 >Reporter: Roman Khachatryan >Priority: Major > Labels: test-stability > Fix For: 1.15.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30871&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9154 > {code} > 19:41:35,515 [flink-akka.actor.default-dispatcher-9] WARN > org.apache.flink.runtime.taskmanager.Task[] - jobVertex > (1/1)#0 (7efdea21f5f95490e02117063ce8a314) switched from RUNNING to FAILED > with failure cause: java.lang.RuntimeException: Error while notify checkpoint > ABORT. > at > org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1457) > at > org.apache.flink.runtime.taskmanager.Task.notifyCheckpointAborted(Task.java:1407) > at > org.apache.flink.runtime.taskexecutor.TaskExecutor.abortCheckpoint(TaskExecutor.java:1021) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316) > at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) > at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) > at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) > at akka.actor.Actor.aroundReceive(Actor.scala:537) > at akka.actor.Actor.aroundReceive$(Actor.scala:535) > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580) > at akka.actor.ActorCell.invoke(ActorCell.scala:548) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) > at akka.dispatch.Mailbox.run(Mailbox.scala:231) > at akka.dispatch.Mailbox.exec(Mailbox.scala:243) > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > Caused by: java.lang.UnsupportedOperationException: > notifyCheckpointAbortAsync not supported by > org.apache.flink.runtime.dispatcher.JobDispatcherITCase$AtLeastOneCheckpointInvokable > at > org.apache.flink.runtime.jobgraph.tasks.AbstractInvokable.notifyCheckpointAbortAsync(AbstractInvokable.java:205) > at > org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1430) > ... 31 more > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-25935) Add a harness based entry point to simply getting started.
[ https://issues.apache.org/jira/browse/FLINK-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-25935. - Resolution: Fixed Fixed via 66b8cd0453990b4e56a48392388391dd4e6eafa0 > Add a harness based entry point to simply getting started. > -- > > Key: FLINK-25935 > URL: https://issues.apache.org/jira/browse/FLINK-25935 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Till Rohrmann >Priority: Major > Labels: pull-request-available > > It would be nicer to improve the getting started experience by providing an > additional entry point in the StateFun distribution that is built on the > Harness. > This can be as simple as providing a Main function that configures RocksDB > and starts the StateFun Flink job. > The rest of the configurations needs to come from the module.yaml > > Having something like that will allow us to simplfy the playground even > further by reducing the start time and the memory requirements for a > docker-compose based example. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-25934) Modernize statefun playground examples
[ https://issues.apache.org/jira/browse/FLINK-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-25934. - Resolution: Fixed Fixed via 5f082918129f4253797fb698adfe5f16bbc94166 726911f97a8b729bfd20cb366dc636be0c623f05 d6f6da3d77ad00fa8e26086a6d13551a3f55b928 > Modernize statefun playground examples > -- > > Key: FLINK-25934 > URL: https://issues.apache.org/jira/browse/FLINK-25934 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Till Rohrmann >Priority: Major > Labels: pull-request-available > > It is about time to touch up abit the examples in playground. > Most of the docker-compose/docker files are pretty old and there are a lot of > room for improvement. > # use redpanda instead of kafka+zk - from local experiments it seems to cut > the start time and the memory requirements significantly. In addition it also > comes with a REST proxy, which can improve the interactivity with the > examples quite a lot. > # For the Java examples, there is no reason to use java8 for the remote > functions. We can use at least 11, if not higher. > # Replace the pair of a JobManager+TaskManager by a simple Minicluster -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25981) ZooKeeperMultipleComponentLeaderElectionDriverTest failed
[ https://issues.apache.org/jira/browse/FLINK-25981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488138#comment-17488138 ] Till Rohrmann commented on FLINK-25981: --- Hmm, from the logs it looks as if there is no new leader elected after the first election driver is closed. Unfortunately, the logs for ZooKeeper and Curator are disabled. Therefore, there is not a lot more to extract from the logs. I've tried reproducing the problem locally. This was unsuccessful so far. Maybe you can upload the logs for the failed run [~mapohl] with ZooKeeper and Curator logging enabled. > ZooKeeperMultipleComponentLeaderElectionDriverTest failed > - > > Key: FLINK-25981 > URL: https://issues.apache.org/jira/browse/FLINK-25981 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.0 >Reporter: Matthias Pohl >Priority: Major > Labels: test-stability > > We experienced a [build > failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30783&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=15997] > in > {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}. > The test halted when waiting for the next leader in > [ZooKeeperMultipleComponentLeaderElectionDriverTest:256|https://github.com/apache/flink/blob/e8742f7f5cac34852d0e621036e1614bbdfe8ec3/flink-runtime/src/test/java/org/apache/flink/runtime/leaderelection/ZooKeeperMultipleComponentLeaderElectionDriverTest.java#L256] > {code} > Feb 04 18:02:54 "main" #1 prio=5 os_prio=0 tid=0x7fab0800b800 nid=0xe07 > waiting on condition [0x7fab12574000] > Feb 04 18:02:54java.lang.Thread.State: WAITING (parking) > Feb 04 18:02:54 at sun.misc.Unsafe.park(Native Method) > Feb 04 18:02:54 - parking to wait for <0x8065c5c8> (a > java.util.concurrent.CompletableFuture$Signaller) > Feb 04 18:02:54 at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > Feb 04 18:02:54 at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) > Feb 04 18:02:54 at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > Feb 04 18:02:54 at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) > Feb 04 18:02:54 at > java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947) > Feb 04 18:02:54 at > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256) > [...] > {code} > The extended Maven logs indicate that the timeout happened while waiting for > the second leader to be selected. > {code} > Test > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers > is running. > > 17:15:10,437 [ Thread-16] INFO > org.apache.curator.test.TestingZooKeeperMain [] - Starting > server > 17:15:10,450 [main] INFO > org.apache.flink.runtime.util.ZooKeeperUtils [] - Enforcing > default ACL for ZK connections > 17:15:10,451 [main] INFO > org.apache.flink.runtime.util.ZooKeeperUtils [] - Using > '/flink/default' as Zookeeper namespace. > 17:15:10,452 [main] INFO > org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl > [] - Starting > 17:15:10,455 [main] INFO > org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl > [] - Default schema > 17:15:10,462 [main-EventThread] INFO > org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager > [] - State change: CONNECTED > 17:15:10,467 [main-EventThread] INFO > org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker > [] - New config event received: {} > 17:15:10,482 [Curator-ConnectionStateManager-0] DEBUG > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver > [] - Connected to ZooKeeper quorum. Leader election can start. > 17:15:10,483 [Curator-ConnectionStateManager-0] DEBUG > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver > [] - Connected to ZooKeeper quorum. Leader election can start. > 17:15:10,483 [Curator-ConnectionStateManager-0] DEBUG > org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver > [] - Connected to ZooKeeper quorum. Leader e
[jira] [Assigned] (FLINK-25934) Modernize statefun playground examples
[ https://issues.apache.org/jira/browse/FLINK-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-25934: - Assignee: Till Rohrmann > Modernize statefun playground examples > -- > > Key: FLINK-25934 > URL: https://issues.apache.org/jira/browse/FLINK-25934 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Till Rohrmann >Priority: Major > > It is about time to touch up abit the examples in playground. > Most of the docker-compose/docker files are pretty old and there are a lot of > room for improvement. > # use redpanda instead of kafka+zk - from local experiments it seems to cut > the start time and the memory requirements significantly. In addition it also > comes with a REST proxy, which can improve the interactivity with the > examples quite a lot. > # For the Java examples, there is no reason to use java8 for the remote > functions. We can use at least 11, if not higher. > # Replace the pair of a JobManager+TaskManager by a simple Minicluster -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-25893) ResourceManagerServiceImpl's lifecycle can lead to exceptions
[ https://issues.apache.org/jira/browse/FLINK-25893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488033#comment-17488033 ] Till Rohrmann commented on FLINK-25893: --- For option 1) I think a leading {{Dispatcher}} would decide that it is now time to shut down and to deregister the application. Then the {{ClusterEntrypoint}} would get the signal and initiate the deregistration. For option 2): Assuming that eventually a {{Dispatcher}} becomes leader that is running in the same process as the leading {{RM}} which then triggers the shut down, I think this can work. Moreover with FLINK-24038 the problem of a leading RM and {{Dispatcher}} running in different processes should no longer happen. > ResourceManagerServiceImpl's lifecycle can lead to exceptions > - > > Key: FLINK-25893 > URL: https://issues.apache.org/jira/browse/FLINK-25893 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.0, 1.14.3 >Reporter: Till Rohrmann >Assignee: Xintong Song >Priority: Critical > Labels: pull-request-available > > The {{ResourceManagerServiceImpl}} lifecycle can lead to exceptions when > calling {{ResourceManagerServiceImpl.deregisterApplication}}. The problem > arises when the {{DispatcherResourceManagerComponent}} is shutdown before the > {{ResourceManagerServiceImpl}} gains leadership or while it is starting the > {{ResourceManager}}. > One problem is that {{deregisterApplication}} returns an exceptionally > completed future if there is no leading {{ResourceManager}}. > Another problem is that if there is a leading {{ResourceManager}}, then it > can still be the case that it has not been started yet. If this is the case, > then > [ResourceManagerGateway.deregisterApplication|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImpl.java#L143] > will be discarded. The reason for this behaviour is that we create a > {{ResourceManager}} in one {{Runnable}} and only start it in another. Due to > this there can be the {{deregisterApplication}} call that gets the {{lock}} > in between. > I'd suggest to correct the lifecycle and contract of the > {{ResourceManagerServiceImpl.deregisterApplication}}. > Please note that due to this problem, the error reporting of this method has > been suppressed. See FLINK-25885 for more details. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25924) KDF Integration tests intermittently fails
[ https://issues.apache.org/jira/browse/FLINK-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-25924: -- Affects Version/s: 1.15.0 > KDF Integration tests intermittently fails > -- > > Key: FLINK-25924 > URL: https://issues.apache.org/jira/browse/FLINK-25924 > Project: Flink > Issue Type: Bug > Components: Connectors / Common >Affects Versions: 1.15.0 >Reporter: Zichen Liu >Assignee: Ahmed Hamdy >Priority: Critical > Labels: pull-request-available, test-stability > > Intermittent failures introduced as part of merge (PR#18314: > [FLINK-24228[connectors/firehose] - Unified Async Sink for Kinesis > Firehose|https://github.com/apache/flink/pull/18314]): > # Failures are intermittent and affecting c. 1 in 7 of builds- on > {{flink-ci.flink}} and {{flink-ci.flink-master-mirror}} . > # The issue looks identical to the KinesaliteContainer startup issue > (Appendix 1). > # I have managed to reproduce the issue locally - if I start some parallel > containers and keep them running - and then run {{KinesisFirehoseSinkITCase}} > then c. 1 in 6 gives the error. > # The errors have a slightly different appearance on > {{flink-ci.flink-master-mirror}} vs {{flink-ci.flink}} which has the same > appearance as local. I only hope it is a difference in logging/killing > environment variables. (and that there aren’t 2 distinct issues) > Appendix 1: > {code:java} > org.testcontainers.containers.ContainerLaunchException: Container startup > failed > at > org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336) > at > org.testcontainers.containers.GenericContainer.start(GenericContainer.java:317) > at > org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1066) > at > ... 11 more > Caused by: org.testcontainers.containers.ContainerLaunchException: Could not > create/start container > at > org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525) > at > org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:331) > at > org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81) > ... 12 more > Caused by: org.rnorth.ducttape.TimeoutException: Timeout waiting for result > with exception > at > org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:54) > at > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-25924) KDF Integration tests intermittently fails
[ https://issues.apache.org/jira/browse/FLINK-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-25924: -- Fix Version/s: (was: 1.15.0) > KDF Integration tests intermittently fails > -- > > Key: FLINK-25924 > URL: https://issues.apache.org/jira/browse/FLINK-25924 > Project: Flink > Issue Type: Bug > Components: Connectors / Common >Reporter: Zichen Liu >Assignee: Ahmed Hamdy >Priority: Critical > Labels: pull-request-available, test-stability > > Intermittent failures introduced as part of merge (PR#18314: > [FLINK-24228[connectors/firehose] - Unified Async Sink for Kinesis > Firehose|https://github.com/apache/flink/pull/18314]): > # Failures are intermittent and affecting c. 1 in 7 of builds- on > {{flink-ci.flink}} and {{flink-ci.flink-master-mirror}} . > # The issue looks identical to the KinesaliteContainer startup issue > (Appendix 1). > # I have managed to reproduce the issue locally - if I start some parallel > containers and keep them running - and then run {{KinesisFirehoseSinkITCase}} > then c. 1 in 6 gives the error. > # The errors have a slightly different appearance on > {{flink-ci.flink-master-mirror}} vs {{flink-ci.flink}} which has the same > appearance as local. I only hope it is a difference in logging/killing > environment variables. (and that there aren’t 2 distinct issues) > Appendix 1: > {code:java} > org.testcontainers.containers.ContainerLaunchException: Container startup > failed > at > org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336) > at > org.testcontainers.containers.GenericContainer.start(GenericContainer.java:317) > at > org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1066) > at > ... 11 more > Caused by: org.testcontainers.containers.ContainerLaunchException: Could not > create/start container > at > org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525) > at > org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:331) > at > org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81) > ... 12 more > Caused by: org.rnorth.ducttape.TimeoutException: Timeout waiting for result > with exception > at > org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:54) > at > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (FLINK-25885) ClusterEntrypointTest.testWorkingDirectoryIsDeletedIfApplicationCompletes failed on azure
[ https://issues.apache.org/jira/browse/FLINK-25885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-25885. - Fix Version/s: 1.15.0 Resolution: Fixed Fixed via 59d2d84d83696d4775b6ff3ec97f8274ff6e371f > ClusterEntrypointTest.testWorkingDirectoryIsDeletedIfApplicationCompletes > failed on azure > -- > > Key: FLINK-25885 > URL: https://issues.apache.org/jira/browse/FLINK-25885 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.0 >Reporter: Yun Gao >Assignee: Till Rohrmann >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.15.0 > > > {code:java} > 2022-01-31T05:00:07.3113870Z Jan 31 05:00:07 > java.util.concurrent.CompletionException: > org.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException: Discard > message, because the rpc endpoint > akka.tcp://flink@127.0.0.1:6123/user/rpc/resourcemanager_2 has not been > started yet. > 2022-01-31T05:00:07.3115008Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) > 2022-01-31T05:00:07.3115778Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) > 2022-01-31T05:00:07.3116527Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) > 2022-01-31T05:00:07.3117267Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) > 2022-01-31T05:00:07.3118011Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > 2022-01-31T05:00:07.3118770Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > 2022-01-31T05:00:07.3119608Z Jan 31 05:00:07 at > org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:251) > 2022-01-31T05:00:07.3120425Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > 2022-01-31T05:00:07.3121199Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > 2022-01-31T05:00:07.3121957Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > 2022-01-31T05:00:07.3122716Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > 2022-01-31T05:00:07.3123457Z Jan 31 05:00:07 at > org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1387) > 2022-01-31T05:00:07.3124241Z Jan 31 05:00:07 at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93) > 2022-01-31T05:00:07.3125106Z Jan 31 05:00:07 at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) > 2022-01-31T05:00:07.3126063Z Jan 31 05:00:07 at > org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92) > 2022-01-31T05:00:07.3127207Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > 2022-01-31T05:00:07.3127982Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > 2022-01-31T05:00:07.3128741Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > 2022-01-31T05:00:07.3129497Z Jan 31 05:00:07 at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > 2022-01-31T05:00:07.3130385Z Jan 31 05:00:07 at > org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:45) > 2022-01-31T05:00:07.3131092Z Jan 31 05:00:07 at > akka.dispatch.OnComplete.internal(Future.scala:299) > 2022-01-31T05:00:07.3131695Z Jan 31 05:00:07 at > akka.dispatch.OnComplete.internal(Future.scala:297) > 2022-01-31T05:00:07.3132310Z Jan 31 05:00:07 at > akka.dispatch.japi$CallbackBridge.apply(Future.scala:224) > 2022-01-31T05:00:07.3132943Z Jan 31 05:00:07 at > akka.dispatch.japi$CallbackBridge.apply(Future.scala:221) > 2022-01-31T05:00:07.3133577Z Jan 31 05:00:07 at > scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) > 2022-01-31T05:00:07.3134340Z Jan 31 05:00:07 at > org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65) > 2022-01-31T05:00:07.3135149Z Jan 31 05:00:07 at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) > 2022-01-31T05:00:07.313589
[jira] [Comment Edited] (FLINK-21752) NullPointerException on restore in PojoSerializer
[ https://issues.apache.org/jira/browse/FLINK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486454#comment-17486454 ] Till Rohrmann edited comment on FLINK-21752 at 2/3/22, 1:03 PM: Yes, I agree. This is quite bad given that we provide [schema evolution for POJO types|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/fault-tolerance/serialization/schema_evolution/#supported-data-types-for-schema-evolution]. Let's fix it in all currently supported releases. was (Author: till.rohrmann): Yes, I agree. This is quite bad given that we provide [schema evolution for POJO types|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/fault-tolerance/serialization/schema_evolution/#supported-data-types-for-schema-evolution]. > NullPointerException on restore in PojoSerializer > - > > Key: FLINK-21752 > URL: https://issues.apache.org/jira/browse/FLINK-21752 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Affects Versions: 1.15.0, 1.13.5, 1.14.3 >Reporter: Roman Khachatryan >Assignee: Dawid Wysakowicz >Priority: Blocker > Labels: pull-request-available > > As originally reported in > [thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Schema-Evolution-Cannot-restore-from-savepoint-after-deleting-field-from-POJO-td42162.html], > after removing a field from a class restore from savepoint fails with the > following exception: > {code} > 2021-03-10T20:51:30.406Z INFO org.apache.flink.runtime.taskmanager.Task:960 > … (6/8) (d630d5ff0d7ae4fbc428b151abebab52) switched from RUNNING to FAILED. > java.lang.Exception: Exception while creating StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:901) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:415) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedCoProcessOperator_c535ac415eeb524d67c88f4a481077d2_(6/8) from any of the > 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 6 common frames omitted > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed > when trying to restore heap backend > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:116) > at > org.apache.flink.runtime.state.memory.MemoryStateBackend.createKeyedStateBackend(MemoryStateBackend.java:347) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 common frames omitted > Caused by: java.lang.NullPointerException: null > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.(PojoSerializer.java:123) > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.duplicate(PojoSerializer.java:186) > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.duplicate(PojoSerializer.java:56) > at > org.apache.flink.api.common.typeutils.CompositeSerializer$PrecomputedParameters.precompute(CompositeSerializer.java:228) > at > org.apache.flink.api.common.typeutils.CompositeSerializer.(CompositeSerializer.java:51) > at > org.apache.flink.runtime.state.ttl.TtlStateFactory$TtlSerializer.(TtlStateFactory.java:250) > at > org.apache.flink.runtime.state.ttl.TtlStateFactory$TtlSerializerSnaps