[jira] [Assigned] (FLINK-29814) Upgrade Stateful Functions to use Flink 1.15

2022-10-31 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-29814:
-

Assignee: Galen Warren

> Upgrade Stateful Functions to use Flink 1.15
> 
>
> Key: FLINK-29814
> URL: https://issues.apache.org/jira/browse/FLINK-29814
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Fil Karnicki
>Assignee: Galen Warren
>Priority: Major
>
> Upgrade Statefun to use the latest Flink 1.15.x
>  
> There already exists a pull request without a jira. Perhaps it can be 
> reviewed/amended as part of this task?
> [https://github.com/apache/flink-statefun/pull/314]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28747) "target_id can not be missing" in HTTP statefun request

2022-08-03 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574752#comment-17574752
 ] 

Till Rohrmann commented on FLINK-28747:
---

I think this is Protobuf's behaviour. If Protobuf sees that a field has the 
default value assigned, then it won't serialize this field. On the receiving 
end where the message is deserialized, Protobuf will return the default value 
for a missing field (for a string type it is the empty string).

> "target_id can not be missing" in HTTP statefun request
> ---
>
> Key: FLINK-28747
> URL: https://issues.apache.org/jira/browse/FLINK-28747
> Project: Flink
>  Issue Type: Bug
>  Components: Stateful Functions
>Affects Versions: statefun-3.0.0, statefun-3.2.0, statefun-3.1.1
>Reporter: Stephan Weinwurm
>Priority: Major
>
> Hi all,
> We've suddenly started to see the following exception in our HTTP statefun 
> functions endpoints:
> {code}Traceback (most recent call last):
>   File 
> "/src/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", 
> line 403, in run_asgi
> result = await app(self.scope, self.receive, self.send)
>   File 
> "/src/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", 
> line 78, in __call__
> return await self.app(scope, receive, send)
>   File "/src/worker/baseplate_asgi/asgi/baseplate_asgi_middleware.py", line 
> 37, in __call__
> await span_processor.execute()
>   File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 
> 61, in execute
> raise e
>   File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 
> 57, in execute
> await self.app(self.scope, self.receive, self.send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/applications.py", 
> line 124, in __call__
> await self.middleware_stack(scope, receive, send)
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 
> 184, in __call__
> raise exc
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 
> 162, in __call__
> await self.app(scope, receive, _send)
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", 
> line 75, in __call__
> raise exc
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", 
> line 64, in __call__
> await self.app(scope, receive, sender)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 
> 680, in __call__
> await route.handle(scope, receive, send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 
> 275, in handle
> await self.app(scope, receive, send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 
> 65, in app
> response = await func(request)
>   File "/src/worker/baseplate_statefun/server/asgi/make_statefun_handler.py", 
> line 25, in statefun_handler
> result = await handler.handle_async(request_body)
>   File "/src/.venv/lib/python3.9/site-packages/statefun/request_reply_v3.py", 
> line 262, in handle_async
> msg = Message(target_typename=sdk_address.typename, 
> target_id=sdk_address.id,
>   File "/src/.venv/lib/python3.9/site-packages/statefun/messages.py", line 
> 42, in __init__
> raise ValueError("target_id can not be missing"){code}
> Interestingly, this has started to happen in three separate Flink deployments 
> at the very same time. The only thing in common between the three deployments 
> is that they consume the same Kafka topics.
> No deployments have happened when the issue started happening which was on 
> July 28th 3:05PM. We have since been continuously seeing the error.
> We were also able to extract the request that Flink sends to the HTTP 
> statefun endpoint:
> {code}{'invocation': {'target': {'namespace': 'com.x.dummy', 'type': 
> 'dummy'}, 'invocations': [{'argument': {'typename': 
> 'type.googleapis.com/v2_event.Event', 'has_value': True, 'value': 
> '-redicated-'}}]}}
> {code}
> As you can see, no `id` field is present in the `invocation.target` object or 
> the `target_id` was an empty string.
>  
> This is our module.yaml from one of the Flink deployments:
>  
> {code}
> version: "3.0"
> module:
> meta:
> type: remote
> spec:
> endpoints:
>  - endpoint:
> meta:
> kind: io.statefun.endpoints.v1/http
> spec:
> functions: com.x.dummy/dummy
> urlPathTemplate: [http://x-worker-dummy.x-functions:9090/statefun]
> timeouts:
> call: 2 min
> read: 2 min
> write: 2 min
> maxNumBatchRequests: 100
> ingresses:
>  - ingress:
> meta:
> type: io.statefun.kafka/ingress
> id: com.x/ingress
> spec:
> address: x-kafka-0.x.ue1.x.net:9092
> consumerGroupId: x-worker-dummy
> topics:
>  - topic: v2_post_events
> valueType: type.googleapis.

[jira] [Comment Edited] (FLINK-28747) "target_id can not be missing" in HTTP statefun request

2022-08-02 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574267#comment-17574267
 ] 

Till Rohrmann edited comment on FLINK-28747 at 8/2/22 2:57 PM:
---

Thanks for reporting this issue [~stepweiwu]. Without knowing the exact details 
of your StateFun job, I suspect that it could be caused by Kafka messages that 
only have an empty string as a key specified. This could also explain why you 
are seeing the problem appear across different services consuming from the same 
topic.

For some context, StateFun uses the key of a Kafka message as the target id for 
the StatefulFunction invocation. The target id identifies an instance of the 
StatefulFunction so that the runtime can associate state with it.

Empty strings should be valid keys, so I believe that this is a bug in the 
Python SDK where we check {{if not target_id: raise ValueError}} that is 
executed if the key is empty 
(https://github.com/apache/flink-statefun/blob/master/statefun-sdk-python/statefun/messages.py#L41).

As a workaround (if you can control it), you could require your Kafka message 
to require a non-empty key. The proper fix would be to fix the Python SDK.


was (Author: till.rohrmann):
Thanks for reporting this issue [~stepweiwu]. Without knowing the exact details 
of your StateFun job, I suspect that it could be caused by Kafka messages that 
only have an empty string as a key specified. This could also explain why you 
are seeing the problem appear across different services consuming from the same 
topic.

Empty strings should be valid keys, so I believe that this is a bug in the 
Python SDK where we check {{if not target_id: raise ValueError}} that is 
executed if the key is empty 
(https://github.com/apache/flink-statefun/blob/master/statefun-sdk-python/statefun/messages.py#L41).

As a workaround (if you can control it), you could require your Kafka message 
to require a non-empty key. The proper fix would be to fix the Python SDK.

> "target_id can not be missing" in HTTP statefun request
> ---
>
> Key: FLINK-28747
> URL: https://issues.apache.org/jira/browse/FLINK-28747
> Project: Flink
>  Issue Type: Bug
>  Components: Stateful Functions
>Affects Versions: statefun-3.0.0, statefun-3.2.0, statefun-3.1.1
>Reporter: Stephan Weinwurm
>Priority: Major
>
> Hi all,
> We've suddenly started to see the following exception in our HTTP statefun 
> functions endpoints:
> {code}Traceback (most recent call last):
>   File 
> "/src/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", 
> line 403, in run_asgi
> result = await app(self.scope, self.receive, self.send)
>   File 
> "/src/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", 
> line 78, in __call__
> return await self.app(scope, receive, send)
>   File "/src/worker/baseplate_asgi/asgi/baseplate_asgi_middleware.py", line 
> 37, in __call__
> await span_processor.execute()
>   File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 
> 61, in execute
> raise e
>   File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 
> 57, in execute
> await self.app(self.scope, self.receive, self.send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/applications.py", 
> line 124, in __call__
> await self.middleware_stack(scope, receive, send)
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 
> 184, in __call__
> raise exc
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 
> 162, in __call__
> await self.app(scope, receive, _send)
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", 
> line 75, in __call__
> raise exc
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", 
> line 64, in __call__
> await self.app(scope, receive, sender)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 
> 680, in __call__
> await route.handle(scope, receive, send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 
> 275, in handle
> await self.app(scope, receive, send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 
> 65, in app
> response = await func(request)
>   File "/src/worker/baseplate_statefun/server/asgi/make_statefun_handler.py", 
> line 25, in statefun_handler
> result = await handler.handle_async(request_body)
>   File "/src/.venv/lib/python3.9/site-packages/statefun/request_reply_v3.py", 
> line 262, in handle_async
> msg = Message(target_typename=sdk_address.typename, 
> target_id=sdk_address.id,
>   File "/src/.venv/lib/python3.9/site-packages/sta

[jira] [Commented] (FLINK-28747) "target_id can not be missing" in HTTP statefun request

2022-08-02 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574267#comment-17574267
 ] 

Till Rohrmann commented on FLINK-28747:
---

Thanks for reporting this issue [~stepweiwu]. Without knowing the exact details 
of your StateFun job, I suspect that it could be caused by Kafka messages that 
only have an empty string as a key specified. This could also explain why you 
are seeing the problem appear across different services consuming from the same 
topic.

Empty strings should be valid keys, so I believe that this is a bug in the 
Python SDK where we check {{if not target_id: raise ValueError}} that is 
executed if the key is empty 
(https://github.com/apache/flink-statefun/blob/master/statefun-sdk-python/statefun/messages.py#L41).

As a workaround (if you can control it), you could require your Kafka message 
to require a non-empty key. The proper fix would be to fix the Python SDK.

> "target_id can not be missing" in HTTP statefun request
> ---
>
> Key: FLINK-28747
> URL: https://issues.apache.org/jira/browse/FLINK-28747
> Project: Flink
>  Issue Type: Bug
>  Components: Stateful Functions
>Affects Versions: statefun-3.0.0, statefun-3.2.0, statefun-3.1.1
>Reporter: Stephan Weinwurm
>Priority: Major
>
> Hi all,
> We've suddenly started to see the following exception in our HTTP statefun 
> functions endpoints:
> {code}Traceback (most recent call last):
>   File 
> "/src/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", 
> line 403, in run_asgi
> result = await app(self.scope, self.receive, self.send)
>   File 
> "/src/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", 
> line 78, in __call__
> return await self.app(scope, receive, send)
>   File "/src/worker/baseplate_asgi/asgi/baseplate_asgi_middleware.py", line 
> 37, in __call__
> await span_processor.execute()
>   File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 
> 61, in execute
> raise e
>   File "/src/worker/baseplate_asgi/asgi/asgi_http_span_processor.py", line 
> 57, in execute
> await self.app(self.scope, self.receive, self.send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/applications.py", 
> line 124, in __call__
> await self.middleware_stack(scope, receive, send)
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 
> 184, in __call__
> raise exc
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 
> 162, in __call__
> await self.app(scope, receive, _send)
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", 
> line 75, in __call__
> raise exc
>   File 
> "/src/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", 
> line 64, in __call__
> await self.app(scope, receive, sender)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 
> 680, in __call__
> await route.handle(scope, receive, send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 
> 275, in handle
> await self.app(scope, receive, send)
>   File "/src/.venv/lib/python3.9/site-packages/starlette/routing.py", line 
> 65, in app
> response = await func(request)
>   File "/src/worker/baseplate_statefun/server/asgi/make_statefun_handler.py", 
> line 25, in statefun_handler
> result = await handler.handle_async(request_body)
>   File "/src/.venv/lib/python3.9/site-packages/statefun/request_reply_v3.py", 
> line 262, in handle_async
> msg = Message(target_typename=sdk_address.typename, 
> target_id=sdk_address.id,
>   File "/src/.venv/lib/python3.9/site-packages/statefun/messages.py", line 
> 42, in __init__
> raise ValueError("target_id can not be missing"){code}
> Interestingly, this has started to happen in three separate Flink deployments 
> at the very same time. The only thing in common between the three deployments 
> is that they consume the same Kafka topics.
> No deployments have happened when the issue started happening which was on 
> July 28th 3:05PM. We have since been continuously seeing the error.
> We were also able to extract the request that Flink sends to the HTTP 
> statefun endpoint:
> {code}{'invocation': {'target': {'namespace': 'com.x.dummy', 'type': 
> 'dummy'}, 'invocations': [{'argument': {'typename': 
> 'type.googleapis.com/v2_event.Event', 'has_value': True, 'value': 
> '-redicated-'}}]}}
> {code}
> As you can see, no `id` field is present in the `invocation.target` object or 
> the `target_id` was an empty string.
>  
> This is our module.yaml from one of the Flink deployments:
>  
> {code}
> version: "3.0"
> module:
> meta:
> type: remote
> spec:
> endpoints:
>  - endpoint:
> me

[jira] [Commented] (FLINK-26845) Document testing guidelines

2022-03-30 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514598#comment-17514598
 ] 

Till Rohrmann commented on FLINK-26845:
---

Hi [~Alcántara],

I think a good start would be to document and give examples for what is already 
there. We could create for each SDK a sub task to do this. Based on this we 
will pretty quickly realize which SDK is ahead in terms os testability and what 
needs to be added to the other SDKs.

Then, as you've proposed, we should tackle the e2e testing utilities. I would 
probably make this a separate ticket as well. I think that the e2e testing 
might be less SDK specific and could be covered in a generic fashion. The main 
part is probably the deployment of a SF cluster which should overlap a lot with 
the general deployment documentation of SF.

Would you be interested in helping with the first part of documenting what is 
already there?

> Document testing guidelines
> ---
>
> Key: FLINK-26845
> URL: https://issues.apache.org/jira/browse/FLINK-26845
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Salva
>Priority: Minor
>
> As of now, there seems to be not much guidance on how to approach testing. 
> Although it's true [~sjwiesman] that 
> {quote}Unlike other flink apis, the sdk doesn’t pull in the runtime so 
> testing should really look like any other code. There isn’t much statefun 
> specific.
> {quote}
> I think that at the very least testing should be mentioned as part of the 
> docs, even if it is only to stress the above fact. Indeed, as reported by 
> [~trohrmann], there seems that there are already some test utilities in place 
> but they are not well-documented, plus some potential ideas on how to improve 
> on that front
> {quote}Testing tools is definitely one area where we want to improve 
> significantly. Also the documentation for how to do things needs to be 
> updated. There are some testing utils that you can already use today: 
> [https://github.com/apache/flink-statefun/blob/master/statefun-sdk-java/src/main/java/org/apache/flink/statefun/sdk/java/testing/TestContext.java…|https://t.co/WGjyMA510b].
>  However, it is not well documented.
> {quote}
> Once the overall guidelines are in place for different testing strategies, 
> tests could be added to the examples in the 
> {_}[playground|https://github.com/apache/flink-statefun-playground]{_}.
> {_}Note{_}: Issue originally reported in Twitter: 
> [https://twitter.com/salvalcantara/status/1505834101026267136?s=20&t=Go2IHP6iP4ZmIyVLmIeD3g]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26617) Pass Kafka headers to remote functions and egresses

2022-03-30 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514596#comment-17514596
 ] 

Till Rohrmann commented on FLINK-26617:
---

If this is a requirement, then I guess we need to send the span context to the 
remote functions. I think this should be doable as this is not a lot of 
information.

At the moment I am not 100% sure what things need to change in order to fully 
support tracing in SF. To explore this and the requirements, I'd be fine if you 
explored it using the example of Kafka records with spans attached. Based on 
that we could think about how to generalize the mechanism so that it also works 
with other ingresses/egresses.

> Pass Kafka headers to remote functions and egresses
> ---
>
> Key: FLINK-26617
> URL: https://issues.apache.org/jira/browse/FLINK-26617
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Fil Karnicki
>Priority: Minor
>
> Typically OpenTelemetry (FLINK-22390) tracing spans get passed in kafka 
> headers. We could be passing not only the Kafka ConsumerRecord value, but 
> also the headers to remote functions, if the user configures their kafka 
> ingress to do so
> Similarly, kafka egresses could be configurable so that headers get passed on 
> via the KafkaProducerRecord proto to kafka



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26570) Remote module configuration interpolation

2022-03-28 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513271#comment-17513271
 ] 

Till Rohrmann commented on FLINK-26570:
---

Thanks for opening this PR [~Fil Karnicki]. I will take a look at it.

> Remote module configuration interpolation
> -
>
> Key: FLINK-26570
> URL: https://issues.apache.org/jira/browse/FLINK-26570
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Fil Karnicki
>Assignee: Fil Karnicki
>Priority: Major
>  Labels: pull-request-available
>
> Add the ability for users to provide placeholders in module.yaml, e.g.
> {code:java}
> kind: com.foo.bar/test
> spec:
>   something: ${REPLACE_ME}
>   transport:
> password: ${REPLACE_ME_WITH_A_SECRET}
> array:
>   - ${REPLACE_ME}
>   - sthElse {code}
> These placeholders would be resolved in 
> org.apache.flink.statefun.flink.core.jsonmodule.RemoteModule#bindComponent
> using 
> {code:java}
> ParameterTool.fromSystemProperties().mergeWith(ParameterTool.fromMap(globalConfiguration)
>  {code}
> by traversing the ComponentJsonObject.specJsonNode() and replacing values 
> that contain placeholders with values from the combined system+globalConfig 
> map



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26628) Use REST program arguments in StatefulFunctionsJob

2022-03-28 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513229#comment-17513229
 ] 

Till Rohrmann commented on FLINK-26628:
---

I think [~igal] is right here [~Fil Karnicki]. Flink's web submission has some 
limitations and this is one of them if I am not mistaken. In general, the 
recommendation in the Flink community is to not use the web submission but 
rather to use the cli. Hence, I believe that this issue is a Flink and not a SF 
issue.

> Use REST program arguments in StatefulFunctionsJob
> --
>
> Key: FLINK-26628
> URL: https://issues.apache.org/jira/browse/FLINK-26628
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Fil Karnicki
>Priority: Major
>
> Currently the Program Arguments passed into the REST api don't get used by 
> StreamingExecutionEnvironment in StatefulFunctionsJob (in this case the 
> checkpointing will not be set)
> {code:java}
> --execution.checkpointing.interval 1000 --state.backend rocksdb 
> --state.checkpoint-storage filesystem --state.checkpoints.dir file:///tmp/ 
> --statefun.embedded true {code}
> Conversely, Flink CLI params *do* get used by the 
> StreamingExecutionEnvironment in statefun jobs
> {code:java}
> flink run -Dexecution.checkpointing.interval=1000 -Dstate.backend=rocksdb 
> -Dstate.checkpoint-storage=filesystem -Dstate.checkpoints.dir=file:///tmp/ 
> -Dstatefun.embedded=true myjar.jar{code}
>  
> To reproduce,
>  # clone and run mvn package on 
> [https://github.com/FilKarnicki/statefun-flinkjob/tree/argsNotUsedViaRest]
>  # run docker-compose up in flinkjob/docker-compose
>  # observe checkpointing happening for this job
>  # go to [http://localhost:8081/#/submit] and submit 
> flinkjob-1.0-SNAPSHOT.jar again manually from target with program arguments
> {code:java}
>  --execution.checkpointing.interval 1000 --state.backend rocksdb 
> --state.checkpoint-storage filesystem --state.checkpoints.dir file:///tmp/ 
> --statefun.embedded true   {code}
>     5. observe no checkpointing happening for the second job



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26617) Pass Kafka headers to remote functions and egresses

2022-03-28 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513227#comment-17513227
 ] 

Till Rohrmann commented on FLINK-26617:
---

Thanks for creating this ticket [~Fil Karnicki]. I think the topic of tracing 
is very interesting as it opens SF up for better understanding what is going on.

Ideally, we can solve this problem generically as tracing is also interesting 
for other ingresses and SF calls in general. I think what we would need to add 
to the runtime is support for creating/deriving spans and propagating their 
contexts. 

I am not sure whether this information really needs to be forwarded to a remote 
function unless the remote function can spawn external calls that need the 
context as well. That way we don't need SDK specific handling logic. However, 
we would need to adjust the ingresses and egresses to understand the 
span/context field of a message.

> Pass Kafka headers to remote functions and egresses
> ---
>
> Key: FLINK-26617
> URL: https://issues.apache.org/jira/browse/FLINK-26617
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Fil Karnicki
>Priority: Minor
>
> Typically OpenTelemetry (FLINK-22390) tracing spans get passed in kafka 
> headers. We could be passing not only the Kafka ConsumerRecord value, but 
> also the headers to remote functions, if the user configures their kafka 
> ingress to do so
> Similarly, kafka egresses could be configurable so that headers get passed on 
> via the KafkaProducerRecord proto to kafka



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26424) RequestTimeoutException

2022-03-28 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513212#comment-17513212
 ] 

Till Rohrmann commented on FLINK-26424:
---

Yes, such a feature will definitely be an opt-in feature. Thanks for your 
feedback [~Fil Karnicki].

> RequestTimeoutException 
> 
>
> Key: FLINK-26424
> URL: https://issues.apache.org/jira/browse/FLINK-26424
> Project: Flink
>  Issue Type: Bug
>  Components: Stateful Functions
>Affects Versions: statefun-3.2.0
>Reporter: xinchenyuan
>Priority: Major
>
> there is no max retries, all I got is the call timeout
> as doc said, [Transport Spec  
> |https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/modules/http-endpoint/#transport-1]call
>  will be failed after timeout.
> but when expcetion raised, runtime restart,  I'm confused why a function 
> internal error will cause such a big problem, will MAX RETRIES be a 
> configurable param?
>  
> 2022-02-28 17:58:32
> org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException:
>  An error occurred when attempting to invoke function FunctionType(tendoc, 
> AlertNotificationIngressCkafka).
> at 
> org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50)
> at 
> org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:73)
> at 
> org.apache.flink.statefun.flink.core.functions.FunctionActivation.applyNextPendingEnvelope(FunctionActivation.java:50)
> at 
> org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:61)
> at 
> org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164)
> at 
> org.apache.flink.statefun.flink.core.functions.AsyncSink.drainOnOperatorThread(AsyncSink.java:119)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
> at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:86)
> at 
> org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:88)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:108)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:89)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
> at 
> org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:186)
> at 
> org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86)
> at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
> at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.IllegalStateException: Failure forwarding a message to a 
> remote function Address(tendoc, AlertNotificationIngressCkafka, cls-message)
> at 
> org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.onAsyncResult(RequestReplyFunction.java:170)
> at 
> org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.invoke(RequestReplyFunction.java:1

[jira] [Commented] (FLINK-26570) Remote module configuration interpolation

2022-03-10 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504762#comment-17504762
 ] 

Till Rohrmann commented on FLINK-26570:
---

Thanks for creating this ticket [~Fil Karnicki]. I like this idea. Could 
environment variables be another mechanism to supply placeholder values?

> Remote module configuration interpolation
> -
>
> Key: FLINK-26570
> URL: https://issues.apache.org/jira/browse/FLINK-26570
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Fil Karnicki
>Priority: Major
>
> Add the ability for users to provide placeholders in module.yaml, e.g.
> {code:java}
> kind: com.foo.bar/test
> spec:
>   something: ${REPLACE_ME}
>   transport:
> password: ${REPLACE_ME_WITH_A_SECRET}
> array:
>   - ${REPLACE_ME}
>   - sthElse {code}
> These placeholders would be resolved in 
> org.apache.flink.statefun.flink.core.jsonmodule.RemoteModule#bindComponent
> using 
> {code:java}
> ParameterTool.fromSystemProperties().mergeWith(ParameterTool.fromMap(globalConfiguration)
>  {code}
> by traversing the ComponentJsonObject.specJsonNode() and replacing values 
> that contain placeholders with values from the combined system+globalConfig 
> map



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-23190) Make task-slot allocation much more evenly

2022-03-08 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502807#comment-17502807
 ] 

Till Rohrmann commented on FLINK-23190:
---

Sorry for not getting back to your PR [~loyi]. Unfortunately, I have left my 
previous company and am no longer working very actively on the Flink project. 
The best thing would be to reach out to the Flink community and look for a new 
shepherd. Sorry for the inconveniences I've caused here.

> Make task-slot allocation much more evenly
> --
>
> Key: FLINK-23190
> URL: https://issues.apache.org/jira/browse/FLINK-23190
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.12.3, 1.13.1
>Reporter: loyi
>Assignee: loyi
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Attachments: image-2021-07-16-10-34-30-700.png
>
>
> FLINK-12122 only guarantees spreading out tasks across the set of TMs which 
> are registered at the time of scheduling, but our jobs are all runing on 
> active yarn mode, the job with smaller source parallelism offen cause 
> load-balance issues. 
>  
> For this job:
> {code:java}
> //  -ys 4 means 10 taskmanagers
> env.addSource(...).name("A").setParallelism(10).
>  map(...).name("B").setParallelism(30)
>  .map(...).name("C").setParallelism(40)
>  .addSink(...).name("D").setParallelism(20);
> {code}
>  
>  Flink-1.12.3 task allocation: 
> ||operator||tm1 ||tm2||tm3||tm4||tm5||5m6||tm7||tm8||tm9||tm10||
> |A| 
> 1|{color:#de350b}2{color}|{color:#de350b}2{color}|1|1|{color:#de350b}3{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|
> |B|3|3|3|3|3|3|3|3|{color:#de350b}2{color}|{color:#de350b}4{color}|
> |C|4|4|4|4|4|4|4|4|4|4|
> |D|2|2|2|2|2|{color:#de350b}1{color}|{color:#de350b}1{color}|2|2|{color:#de350b}4{color}|
>  
> Suggestions:
> When TaskManger start register slots to slotManager , current processing 
> logic will choose  the first pendingSlot which meet its resource 
> requirements.  The "random" strategy usually causes uneven task allocation 
> when source-operator's parallelism is significantly below process-operator's. 
>   A simple feasible idea  is  {color:#de350b}partition{color} the current  
> "{color:#de350b}pendingSlots{color}" by their "JobVertexIds" (such as  let 
> AllocationID bring the detail)  , then allocate the slots proportionally to 
> each JobVertexGroup.
>  
> For above case, the 40 pendingSlots could be divided into 4 groups:
> [ABCD]: 10        // A、B、C、D reprents  {color:#de350b}jobVertexId{color}
> [BCD]: 10
> [CD]: 10
> [D]: 10
>  
> Every taskmanager will provide 4 slots one time, and each group will get 1 
> slot according their proportion (1/4), the final allocation result is below:
> [ABCD] : deploye on 10 different taskmangers
> [BCD]: deploye on 10 different taskmangers
> [CD]: deploye on 10  different taskmangers
> [D]: deploye on 10 different taskmangers
>  
> I have implement a [concept 
> code|https://github.com/saddays/flink/commit/dc82e60a7c7599fbcb58c14f8e3445bc8d07ace1]
>   based on Flink-1.12.3 ,  the patch version has {color:#de350b}fully 
> evenly{color} task allocation , and works well on my workload .  Are there 
> other point that have not been considered or  does it conflict with future 
> plans?      Sorry for my poor english.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26424) RequestTimeoutException

2022-03-08 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502804#comment-17502804
 ] 

Till Rohrmann commented on FLINK-26424:
---

I think we will soon solve this problem. There is no PR for it yet, though.

> RequestTimeoutException 
> 
>
> Key: FLINK-26424
> URL: https://issues.apache.org/jira/browse/FLINK-26424
> Project: Flink
>  Issue Type: Bug
>  Components: Stateful Functions
>Affects Versions: statefun-3.2.0
>Reporter: xinchenyuan
>Priority: Major
>
> there is no max retries, all I got is the call timeout
> as doc said, [Transport Spec  
> |https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/modules/http-endpoint/#transport-1]call
>  will be failed after timeout.
> but when expcetion raised, runtime restart,  I'm confused why a function 
> internal error will cause such a big problem, will MAX RETRIES be a 
> configurable param?
>  
> 2022-02-28 17:58:32
> org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException:
>  An error occurred when attempting to invoke function FunctionType(tendoc, 
> AlertNotificationIngressCkafka).
> at 
> org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50)
> at 
> org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:73)
> at 
> org.apache.flink.statefun.flink.core.functions.FunctionActivation.applyNextPendingEnvelope(FunctionActivation.java:50)
> at 
> org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:61)
> at 
> org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164)
> at 
> org.apache.flink.statefun.flink.core.functions.AsyncSink.drainOnOperatorThread(AsyncSink.java:119)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
> at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:86)
> at 
> org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:88)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:108)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:89)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
> at 
> org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:186)
> at 
> org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86)
> at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
> at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.IllegalStateException: Failure forwarding a message to a 
> remote function Address(tendoc, AlertNotificationIngressCkafka, cls-message)
> at 
> org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.onAsyncResult(RequestReplyFunction.java:170)
> at 
> org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.invoke(RequestReplyFunction.java:124)
> at 
> org.apach

[jira] [Commented] (FLINK-26424) RequestTimeoutException

2022-03-06 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502111#comment-17502111
 ] 

Till Rohrmann commented on FLINK-26424:
---

I think the idea would be that you can define an egress to which the dead 
letter box outputs the dead letters. That way the user can monitor this egress 
for observing the failed messages.

> RequestTimeoutException 
> 
>
> Key: FLINK-26424
> URL: https://issues.apache.org/jira/browse/FLINK-26424
> Project: Flink
>  Issue Type: Bug
>  Components: Stateful Functions
>Affects Versions: statefun-3.2.0
>Reporter: xinchenyuan
>Priority: Major
>
> there is no max retries, all I got is the call timeout
> as doc said, [Transport Spec  
> |https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/modules/http-endpoint/#transport-1]call
>  will be failed after timeout.
> but when expcetion raised, runtime restart,  I'm confused why a function 
> internal error will cause such a big problem, will MAX RETRIES be a 
> configurable param?
>  
> 2022-02-28 17:58:32
> org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException:
>  An error occurred when attempting to invoke function FunctionType(tendoc, 
> AlertNotificationIngressCkafka).
> at 
> org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50)
> at 
> org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:73)
> at 
> org.apache.flink.statefun.flink.core.functions.FunctionActivation.applyNextPendingEnvelope(FunctionActivation.java:50)
> at 
> org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:61)
> at 
> org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164)
> at 
> org.apache.flink.statefun.flink.core.functions.AsyncSink.drainOnOperatorThread(AsyncSink.java:119)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
> at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:86)
> at 
> org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:88)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:108)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:89)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
> at 
> org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:186)
> at 
> org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86)
> at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
> at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.IllegalStateException: Failure forwarding a message to a 
> remote function Address(tendoc, AlertNotificationIngressCkafka, cls-message)
> at 
> org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.onAsyncResult(RequestReplyFunction.java:170)
> at 
> org.apache.f

[jira] [Commented] (FLINK-26424) RequestTimeoutException

2022-03-03 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500597#comment-17500597
 ] 

Till Rohrmann commented on FLINK-26424:
---

I agree that a single failed remote function call should not affect the other 
function calls by restarting the whole topology.

Could something like a dead letter box help in your situation? The system could 
send messages that can't be invoked or that failed for some reason to a special 
endpoint that either outputs the messages to somewhere or drops them.

> RequestTimeoutException 
> 
>
> Key: FLINK-26424
> URL: https://issues.apache.org/jira/browse/FLINK-26424
> Project: Flink
>  Issue Type: Bug
>  Components: Stateful Functions
>Affects Versions: statefun-3.2.0
>Reporter: xinchenyuan
>Priority: Major
>
> there is no max retries, all I got is the call timeout
> as doc said, [Transport Spec  
> |https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/modules/http-endpoint/#transport-1]call
>  will be failed after timeout.
> but when expcetion raised, runtime restart,  I'm confused why a function 
> internal error will cause such a big problem, will MAX RETRIES be a 
> configurable param?
>  
> 2022-02-28 17:58:32
> org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException:
>  An error occurred when attempting to invoke function FunctionType(tendoc, 
> AlertNotificationIngressCkafka).
> at 
> org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50)
> at 
> org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:73)
> at 
> org.apache.flink.statefun.flink.core.functions.FunctionActivation.applyNextPendingEnvelope(FunctionActivation.java:50)
> at 
> org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:61)
> at 
> org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164)
> at 
> org.apache.flink.statefun.flink.core.functions.AsyncSink.drainOnOperatorThread(AsyncSink.java:119)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
> at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:86)
> at 
> org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:88)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:108)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:89)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
> at 
> org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:186)
> at 
> org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86)
> at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
> at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.IllegalStateException: Failure forwarding a message to a 
> remote function Address(tendoc, AlertNotificationIngre

[jira] [Commented] (FLINK-26424) RequestTimeoutException

2022-03-02 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499964#comment-17499964
 ] 

Till Rohrmann commented on FLINK-26424:
---

Hi [~ychensha], I assume you are using StateFun's asynchronous 
{{RequestReplyClient}}, right?

The request will fail with the {{RequestTimeoutException}} if it could not be 
completed within the call timeout. During this time, the client will retry the 
request with an exponential backoff strategy. Hence, I would recommend trying 
to increase {{spec.transports.timeout.call}} to something higher.

> RequestTimeoutException 
> 
>
> Key: FLINK-26424
> URL: https://issues.apache.org/jira/browse/FLINK-26424
> Project: Flink
>  Issue Type: Bug
>  Components: Stateful Functions
>Affects Versions: statefun-3.2.0
>Reporter: xinchenyuan
>Priority: Major
>
> there is no max retries, all I got is the call timeout
> as doc said, [Transport Spec  
> |https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/modules/http-endpoint/#transport-1]call
>  will be failed after timeout.
> but when expcetion raised, runtime restart,  I'm confused why a function 
> internal error will cause such a big problem, will MAX RETRIES be a 
> configurable param?
>  
> 2022-02-28 17:58:32
> org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException:
>  An error occurred when attempting to invoke function FunctionType(tendoc, 
> AlertNotificationIngressCkafka).
> at 
> org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50)
> at 
> org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:73)
> at 
> org.apache.flink.statefun.flink.core.functions.FunctionActivation.applyNextPendingEnvelope(FunctionActivation.java:50)
> at 
> org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:61)
> at 
> org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164)
> at 
> org.apache.flink.statefun.flink.core.functions.AsyncSink.drainOnOperatorThread(AsyncSink.java:119)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
> at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:86)
> at 
> org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:88)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:108)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:89)
> at 
> org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
> at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
> at 
> org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:186)
> at 
> org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86)
> at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
> at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.IllegalStateException: Failure forwarding a message to a 
> remote function

[jira] [Commented] (FLINK-26086) fixed some causing ambiguities including the 'shit' comment

2022-02-24 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497256#comment-17497256
 ] 

Till Rohrmann commented on FLINK-26086:
---

Thanks for fixing this issue [~nyingping].

> fixed some causing ambiguities including the 'shit' comment
> ---
>
> Key: FLINK-26086
> URL: https://issues.apache.org/jira/browse/FLINK-26086
> Project: Flink
>  Issue Type: Improvement
>Reporter: nyingping
>Assignee: nyingping
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2022-02-11-16-38-12-674.png
>
>
> such as 
> [`@param shiftTimeZone the shit timezone of the 
> window`|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/window/buffers/WindowBuffer.java#L96]
>  !image-2022-02-11-16-38-12-674.png! 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-26086) fixed some causing ambiguities including the 'shit' comment

2022-02-24 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-26086:
-

Assignee: nyingping

> fixed some causing ambiguities including the 'shit' comment
> ---
>
> Key: FLINK-26086
> URL: https://issues.apache.org/jira/browse/FLINK-26086
> Project: Flink
>  Issue Type: Improvement
>Reporter: nyingping
>Assignee: nyingping
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2022-02-11-16-38-12-674.png
>
>
> such as 
> [`@param shiftTimeZone the shit timezone of the 
> window`|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/aggregate/window/buffers/WindowBuffer.java#L96]
>  !image-2022-02-11-16-38-12-674.png! 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-26340) Add ability in Golang SDK to create new statefun.Context from existing one, but with a new underlying context.Context

2022-02-24 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-26340:
-

Assignee: Galen Warren

> Add ability in Golang SDK to create new statefun.Context from existing one, 
> but with a new underlying context.Context
> -
>
> Key: FLINK-26340
> URL: https://issues.apache.org/jira/browse/FLINK-26340
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Affects Versions: statefun-3.3.0
>Reporter: Galen Warren
>Assignee: Galen Warren
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In the Golang SDK, statefun.Context embeds the context.Context interface and 
> is implemented by the statefunContext struct, which embeds a context.Context. 
> To support common patterns in Golang related to adding values to context, it 
> would be useful to be able to create a derived statefun.Context that is 
> equivalent to the original in terms of statefun functionality but which wraps 
> a different context.Context.
> The proposal is to add a:
> WithContext(ctx context.Context) statefun.Context
> ... method to the statefun.Context interface and implement it on 
> statefunContext. This method would return the derived statefun context.
> This is a breaking change to statefun.Context, but, given its purpose, we do 
> not expect there to be implementations of this interface outside the Golang 
> SDK. 
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-18356) Exit code 137 returned from process

2022-02-22 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495941#comment-17495941
 ] 

Till Rohrmann commented on FLINK-18356:
---

What is the state of this ticket [~chesnay]?

> Exit code 137 returned from process
> ---
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0
>Reporter: Piotr Nowojski
>Assignee: Chesnay Schepler
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
> Attachments: 1234.jpg, app-profiling_4.gif
>
>
> {noformat}
> = test session starts 
> ==
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..[  
> 1%]
> pyflink/common/tests/test_execution_config.py ...[  
> 5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-26274) Test local recovery works across TaskManager process restarts

2022-02-22 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-26274:
-

Assignee: Konstantin Knauf

> Test local recovery works across TaskManager process restarts
> -
>
> Key: FLINK-26274
> URL: https://issues.apache.org/jira/browse/FLINK-26274
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Assignee: Konstantin Knauf
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.15.0
>
>
> This ticket is a testing task for 
> [FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw].
> When enabling local recovery and configuring a working directory that can be 
> re-read after a process failure, Flink should now be able to recover locally. 
> We should test whether this is the case. Please take a look at the 
> documentation [1, 2] to see how to configure Flink to make use of it.
> [1] 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/working_directory/
> [2] 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#enabling-local-recovery-across-pod-restarts



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-26191) Incorrect license in Elasticsearch connectors

2022-02-22 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-26191:
-

Assignee: Fabian Paul

> Incorrect license in Elasticsearch connectors
> -
>
> Key: FLINK-26191
> URL: https://issues.apache.org/jira/browse/FLINK-26191
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / ElasticSearch
>Reporter: Chesnay Schepler
>Assignee: Fabian Paul
>Priority: Blocker
> Fix For: 1.15.0
>
>
> The sql-connector-elasticsearc0h 6/7 connector NOTICE lists the elasticsearch 
> dependencies as ASLv2, but they are nowadays (at least in part) licensed 
> differently (dual-licensed under elastic license 2.0 & Server Side Public 
> License (SSPL)).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25977) Close sink client and sink http client for KDS/KDF Sinks

2022-02-22 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495939#comment-17495939
 ] 

Till Rohrmann commented on FLINK-25977:
---

PR is reviewed. Waiting for CI to pass.

> Close sink client and sink http client for KDS/KDF Sinks
> 
>
> Key: FLINK-25977
> URL: https://issues.apache.org/jira/browse/FLINK-25977
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Reporter: Zichen Liu
>Assignee: Ahmed Hamdy
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> We are not closing the AWS SDK and HTTP clients for the new KDS/KDF async 
> sink. The close method needs to be invoked when operator stops.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25866) Support additional TLS configuration.

2022-02-21 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495578#comment-17495578
 ] 

Till Rohrmann commented on FLINK-25866:
---

Hi [~Fil Karnicki], I've assigned you to this ticket. The plan looks good to 
me. Happy coding :-)

> Support additional TLS configuration.
> -
>
> Key: FLINK-25866
> URL: https://issues.apache.org/jira/browse/FLINK-25866
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Fil Karnicki
>Priority: Major
>
> Currently the default HTTP client used to invoke remote functions does not 
> support customising the TLS settings as part of the endpoint spec definition. 
> This includes
> using self-signed certificates, and providing client side certificates for 
> authentication (which is a slightly different requirement).
> This issue is about including additional TLS settings to the default endpoint 
> resource definition, and supporting them in statefun-core.
> User mailing list threads:
>  * [client cert auth in remote 
> function|https://lists.apache.org/thread/97nw245kxqp32qglwfynhhgyhgp2pxvg]
>  * [endpoint self-signed certificate 
> problem|https://lists.apache.org/thread/y2m2bpwg4n71rxfont6pgky2t8m19n7w]
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25866) Support additional TLS configuration.

2022-02-21 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-25866:
-

Assignee: Fil Karnicki

> Support additional TLS configuration.
> -
>
> Key: FLINK-25866
> URL: https://issues.apache.org/jira/browse/FLINK-25866
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Fil Karnicki
>Priority: Major
>
> Currently the default HTTP client used to invoke remote functions does not 
> support customising the TLS settings as part of the endpoint spec definition. 
> This includes
> using self-signed certificates, and providing client side certificates for 
> authentication (which is a slightly different requirement).
> This issue is about including additional TLS settings to the default endpoint 
> resource definition, and supporting them in statefun-core.
> User mailing list threads:
>  * [client cert auth in remote 
> function|https://lists.apache.org/thread/97nw245kxqp32qglwfynhhgyhgp2pxvg]
>  * [endpoint self-signed certificate 
> problem|https://lists.apache.org/thread/y2m2bpwg4n71rxfont6pgky2t8m19n7w]
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26274) Test local recovery works across TaskManager process restarts

2022-02-21 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-26274:
--
Fix Version/s: 1.15.0

> Test local recovery works across TaskManager process restarts
> -
>
> Key: FLINK-26274
> URL: https://issues.apache.org/jira/browse/FLINK-26274
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.15.0
>
>
> This ticket is a testing task for 
> [FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw].
> When enabling local recovery and configuring a working directory that can be 
> re-read after a process failure, Flink should now be able to recover locally. 
> We should test whether this is the case. Please take a look at the 
> documentation [1, 2] to see how to configure Flink to make use of it.
> [1] 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/working_directory/
> [2] 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#enabling-local-recovery-across-pod-restarts



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26274) Test local recovery works across TaskManager process restarts

2022-02-21 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26274:
-

 Summary: Test local recovery works across TaskManager process 
restarts
 Key: FLINK-26274
 URL: https://issues.apache.org/jira/browse/FLINK-26274
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Reporter: Till Rohrmann


This ticket is a testing task for 
[FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw].

When enabling local recovery and configuring a working directory that can be 
re-read after a process failure, Flink should now be able to recover locally. 
We should test whether this is the case. Please take a look at the 
documentation [1, 2] to see how to configure Flink to make use of it.

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/working_directory/
[2] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#enabling-local-recovery-across-pod-restarts



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26274) Test local recovery works across TaskManager process restarts

2022-02-21 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-26274:
--
Affects Version/s: 1.15.0

> Test local recovery works across TaskManager process restarts
> -
>
> Key: FLINK-26274
> URL: https://issues.apache.org/jira/browse/FLINK-26274
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.15.0
>
>
> This ticket is a testing task for 
> [FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw].
> When enabling local recovery and configuring a working directory that can be 
> re-read after a process failure, Flink should now be able to recover locally. 
> We should test whether this is the case. Please take a look at the 
> documentation [1, 2] to see how to configure Flink to make use of it.
> [1] 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/working_directory/
> [2] 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#enabling-local-recovery-across-pod-restarts



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-23870) Test the ability for ignoring in-flight data on recovery

2022-02-21 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-23870:
--
Component/s: Runtime / Checkpointing

> Test the ability for ignoring in-flight data on recovery
> 
>
> Key: FLINK-23870
> URL: https://issues.apache.org/jira/browse/FLINK-23870
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Checkpointing
>Reporter: Anton Kalashnikov
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.15.0
>
>
> Test for  https://issues.apache.org/jira/browse/FLINK-22684:
>  # configure the external checkpoint address (storage)
>  # enable unaligned checkpoint always (not the default aligned or alternative)
>  # wait for the checkpoint then stop the job
>  # restore from the last checkpoint - it should be restored on the last 
> state(optional step)
>  # set the 
> `execution.checkpointing.recover-without-channel-state.checkpoint-id` to the 
> last checkpoint
>  # restore from the last checkpoint - in-flight data would be ignored so it 
> would be restored a little less data than expected in the usual case.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26271) TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed

2022-02-20 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26271:
-

 Summary: TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed
 Key: FLINK-26271
 URL: https://issues.apache.org/jira/browse/FLINK-26271
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Task
Affects Versions: 1.13.6
Reporter: Till Rohrmann


The test {{TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed}} failed on 
AZP with

{code}
Feb 21 03:27:40 [ERROR] 
testCancelTaskExceptionAfterTaskMarkedFailed(org.apache.flink.runtime.taskmanager.TaskTest)
  Time elapsed: 0.043 s  <<< FAILURE!
Feb 21 03:27:40 java.lang.AssertionError: expected: but was:
Feb 21 03:27:40 at org.junit.Assert.fail(Assert.java:88)
Feb 21 03:27:40 at org.junit.Assert.failNotEquals(Assert.java:834)
Feb 21 03:27:40 at org.junit.Assert.assertEquals(Assert.java:118)
Feb 21 03:27:40 at org.junit.Assert.assertEquals(Assert.java:144)
Feb 21 03:27:40 at 
org.apache.flink.runtime.taskmanager.TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed(TaskTest.java:598)
Feb 21 03:27:40 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
Feb 21 03:27:40 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Feb 21 03:27:40 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Feb 21 03:27:40 at java.lang.reflect.Method.invoke(Method.java:498)
Feb 21 03:27:40 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
Feb 21 03:27:40 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Feb 21 03:27:40 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
Feb 21 03:27:40 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Feb 21 03:27:40 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
Feb 21 03:27:40 at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
Feb 21 03:27:40 at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Feb 21 03:27:40 at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
Feb 21 03:27:40 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
Feb 21 03:27:40 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
Feb 21 03:27:40 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
Feb 21 03:27:40 at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
Feb 21 03:27:40 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Feb 21 03:27:40 at 
org.junit.runners.ParentRunner.run(ParentRunner.java:363)
Feb 21 03:27:40 at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
Feb 21 03:27:40 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
Feb 21 03:27:40 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
Feb 21 03:27:40 at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
Feb 21 03:27:40 at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
Feb 21 03:27:40 at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
Feb 21 03:27:40 at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
Feb 21 03:27:40 at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31903&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=c2734c79-73b6-521c-e85a-67c7ecae9107&l=5762



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25851) CassandraConnectorITCase.testRetrialAndDropTables shows table already exists errors on AZP

2022-02-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495356#comment-17495356
 ] 

Till Rohrmann commented on FLINK-25851:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31904&view=logs&j=ba53eb01-1462-56a3-8e98-0dd97fbcaab5&t=2e426bf0-b717-56bb-ab62-d63086457354&l=12880

> CassandraConnectorITCase.testRetrialAndDropTables shows table already exists 
> errors on AZP
> --
>
> Key: FLINK-25851
> URL: https://issues.apache.org/jira/browse/FLINK-25851
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Cassandra
>Affects Versions: 1.15.0
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> It happens even if the whole keyspace is dropped in a BeforeClass method and 
> the table noticed in the stacktrace is dropped in an After method and this 
> after method is executed even in case of retrials through the Rule.
> Jan 24 20:21:33 com.datastax.driver.core.exceptions.AlreadyExistsException: 
> Table flink.batches already exists 
> Jan 24 20:21:33 at 
> com.datastax.driver.core.exceptions.AlreadyExistsException.copy(AlreadyExistsException.java:111)
>  
> Jan 24 20:21:33 at 
> com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
>  
> Jan 24 20:21:33 at 
> com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
>  
> Jan 24 20:21:33 at 
> com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63) 
> Jan 24 20:21:33 at 
> com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39) 
> Jan 24 20:21:33 at 
> org.apache.flink.streaming.connectors.cassandra.CassandraConnectorITCase.testRetrialAndDropTables(CassandraConnectorITCase.java:554)
>  
> Jan 24 20:21:33 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) 
> Jan 24 20:21:33 at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> Jan 24 20:21:33 at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> Jan 24 20:21:33 at java.lang.reflect.Method.invoke(Method.java:498) 
> Jan 24 20:21:33 at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  
> Jan 24 20:21:33 at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  
> Jan 24 20:21:33 at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  
> Jan 24 20:21:33 at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  
> Jan 24 20:21:33 at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> Jan 24 20:21:33 at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> Jan 24 20:21:33 at 
> org.apache.flink.testutils.junit.RetryRule$RetryOnExceptionStatement.evaluate(RetryRule.java:192)
>  
> Jan 24 20:21:33 at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) 
> Jan 24 20:21:33 at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) 
> Jan 24 20:21:33 at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 
> Jan 24 20:21:33 at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  
> Jan 24 20:21:33 at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) 
> Jan 24 20:21:33 at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  
> Jan 24 20:21:33 at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  
> Jan 24 20:21:33 at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) 
> Jan 24 20:21:33 at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) 
> Jan 24 20:21:33 at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) 
> Jan 24 20:21:33 at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) 
> Jan 24 20:21:33 at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) 
> Jan 24 20:21:33 at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
>  
> cf: 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30050&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&s=ae4f8708-9994-57d3-c2d7-b892156e7812&t=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d&l=11999]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25346) FLIP-197: API stability graduation process

2022-02-17 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-25346:
-

Assignee: (was: Till Rohrmann)

> FLIP-197: API stability graduation process
> --
>
> Key: FLINK-25346
> URL: https://issues.apache.org/jira/browse/FLINK-25346
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.15.0
>
>
> This ticket is an umbrella ticket for the work on 
> [FLIP-197|https://cwiki.apache.org/confluence/x/J5eqCw]. It will involve the 
> introduction of an API graduation process that is guarded by automated tests.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26014) Document how to use the working directory for faster local recoveries

2022-02-17 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-26014.
-
Resolution: Fixed

Fixed via

d893292893837037c15c07e16c9fe2b39ec44319
965ca2d7ccc938ac5a6948ccc910fd2cfc0135b4

> Document how to use the working directory for faster local recoveries
> -
>
> Key: FLINK-26014
> URL: https://issues.apache.org/jira/browse/FLINK-26014
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> After having implemented FLIP-198 and FLIP-201, users can now use faster 
> TaskManager failover when using local recovery with persisted volumes. I 
> suggest to add documentation for explaining how to configure Flink to make 
> use of it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-23403) Decrease default values for heartbeat timeout and interval

2022-02-17 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-23403:
-

Assignee: (was: Till Rohrmann)

> Decrease default values for heartbeat timeout and interval
> --
>
> Key: FLINK-23403
> URL: https://issues.apache.org/jira/browse/FLINK-23403
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Configuration, Runtime / Coordination
>Affects Versions: 1.14.0
>Reporter: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.15.0
>
>
> In order to speed up failure detection I suggest to decrease the default 
> values for the heartbeat timeout and interval from 50s/10s to 15s/3s.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25345) FLIP-196: Source API stability guarantees

2022-02-17 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-25345:
-

Assignee: (was: Till Rohrmann)

> FLIP-196: Source API stability guarantees
> -
>
> Key: FLINK-25345
> URL: https://issues.apache.org/jira/browse/FLINK-25345
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream, Documentation
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.15.0
>
>
> This ticket is an umbrella ticket for the work on 
> [FLIP-196|https://cwiki.apache.org/confluence/x/IJeqCw] which will properly 
> document and guard the agreed upon source API stability guarantees.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26197) Allow playground egress to keep connection open and push messages

2022-02-16 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-26197:
--
Labels: playground  (was: )

> Allow playground egress to keep connection open and push messages
> -
>
> Key: FLINK-26197
> URL: https://issues.apache.org/jira/browse/FLINK-26197
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Affects Versions: statefun-3.2.0, statefun-3.3.0
>Reporter: Till Rohrmann
>Priority: Major
>  Labels: playground
>
> In order to improve the getting started experience it would be nice if the 
> playground egress can keep connections to clients open and push new messages 
> eagerly. Moreover, we could keep all messages stored in memory to allow 
> serving old messages.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26197) Allow playground egress to keep connection open and push messages

2022-02-16 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26197:
-

 Summary: Allow playground egress to keep connection open and push 
messages
 Key: FLINK-26197
 URL: https://issues.apache.org/jira/browse/FLINK-26197
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Affects Versions: statefun-3.2.0, statefun-3.3.0
Reporter: Till Rohrmann


In order to improve the getting started experience it would be nice if the 
playground egress can keep connections to clients open and push new messages 
eagerly. Moreover, we could keep all messages stored in memory to allow serving 
old messages.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26153) Add script to update README playground links for Statefun

2022-02-16 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-26153.
-
Fix Version/s: statefun-3.3.0
   Resolution: Fixed

Fixed via cd8e0237f63bb84aae4137ae74481e5291944208

> Add script to update README playground links for Statefun
> -
>
> Key: FLINK-26153
> URL: https://issues.apache.org/jira/browse/FLINK-26153
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System / Stateful Functions
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
> Fix For: statefun-3.3.0
>
>
> In order to keep the playground links in the various README's up to date, it 
> would be helpful to have a script that updates the links.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26158) Update statefun-playground examples to use playground ingress/egress

2022-02-16 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-26158.
-
Fix Version/s: statefun-3.3.0
   statefun-3.2.1
   Resolution: Fixed

Fixed via

3.3.0:
b436f8e
c8bac78
d32b038
9dc5db1
c0b121e
c22d280

3.2.1:
fdb0e78
7f68b7a
2815d55
56e897f
2c8820d
3d2a1e4

> Update statefun-playground examples to use playground ingress/egress
> 
>
> Key: FLINK-26158
> URL: https://issues.apache.org/jira/browse/FLINK-26158
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
> Fix For: statefun-3.3.0, statefun-3.2.1
>
>
> With FLINK-26153, the statefun-playground has a new ingress/egress that 
> allows to interact with via curl. I propose to update the existing examples 
> to make use of it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26155) Add playground ingress/egress for curl interaction

2022-02-16 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-26155.
-
Fix Version/s: statefun-3.3.0
   statefun-3.2.1
   Resolution: Fixed

Fixed via

3.3.0: cf7251c8c5f59b4003403a8ec39a55366ef36a1f
3.2.1: 1ba46baa2c645bc70883c5f435fee96a6b576230

> Add playground ingress/egress for curl interaction
> --
>
> Key: FLINK-26155
> URL: https://issues.apache.org/jira/browse/FLINK-26155
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
> Fix For: statefun-3.3.0, statefun-3.2.1
>
>
> In order to give a better playground experience we could add a special 
> playground ingress/egress that supports easy to use cURL interaction 
> (ingesting new messages and consuming produced messages).
> Ingesting new message could look like:
> {code}
> curl -X PUT -H "Content-Type: application/vnd." -d '' 
> localhost:8090///
> {code}
> Consuming could look like:
> {code}
> curl -X GET localhost:8091/
> {code}
> The ingress/egress are not meant to be production ready and only support the 
> use cases of the playground.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26158) Update statefun-playground examples to use playground ingress/egress

2022-02-15 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26158:
-

 Summary: Update statefun-playground examples to use playground 
ingress/egress
 Key: FLINK-26158
 URL: https://issues.apache.org/jira/browse/FLINK-26158
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Reporter: Till Rohrmann
Assignee: Till Rohrmann


With FLINK-26153, the statefun-playground has a new ingress/egress that allows 
to interact with via curl. I propose to update the existing examples to make 
use of it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26155) Add playground ingress/egress for curl interaction

2022-02-15 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-26155:
--
Summary: Add playground ingress/egress for curl interaction  (was: Add 
playground ingress/egress for cURL interaction)

> Add playground ingress/egress for curl interaction
> --
>
> Key: FLINK-26155
> URL: https://issues.apache.org/jira/browse/FLINK-26155
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>
> In order to give a better playground experience we could add a special 
> playground ingress/egress that supports easy to use cURL interaction 
> (ingesting new messages and consuming produced messages).
> Ingesting new message could look like:
> {code}
> curl -X PUT -H "Content-Type: application/vnd." -d '' 
> localhost:8090///
> {code}
> Consuming could look like:
> {code}
> curl -X GET localhost:8091/
> {code}
> The ingress/egress are not meant to be production ready and only support the 
> use cases of the playground.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26155) Add playground ingress/egress for cURL interaction

2022-02-15 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26155:
-

 Summary: Add playground ingress/egress for cURL interaction
 Key: FLINK-26155
 URL: https://issues.apache.org/jira/browse/FLINK-26155
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Reporter: Till Rohrmann
Assignee: Till Rohrmann


In order to give a better playground experience we could add a special 
playground ingress/egress that supports easy to use cURL interaction (ingesting 
new messages and consuming produced messages).

Ingesting new message could look like:

{code}
curl -X PUT -H "Content-Type: application/vnd." -d '' 
localhost:8090///
{code}

Consuming could look like:

{code}
curl -X GET localhost:8091/
{code}

The ingress/egress are not meant to be production ready and only support the 
use cases of the playground.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26153) Add script to update README playground links for Statefun

2022-02-15 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26153:
-

 Summary: Add script to update README playground links for Statefun
 Key: FLINK-26153
 URL: https://issues.apache.org/jira/browse/FLINK-26153
 Project: Flink
  Issue Type: Improvement
  Components: Build System / Stateful Functions
Reporter: Till Rohrmann
Assignee: Till Rohrmann


In order to keep the playground links in the various README's up to date, it 
would be helpful to have a script that updates the links.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-25898) Add README.md to flink-statefun/statefun-sdk-js

2022-02-15 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-25898.
-
Fix Version/s: statefun-3.3.0
   statefun-3.2.1
   Resolution: Fixed

Fixed via 

3.3.0: 3a1680a4b9d849b8b97a6195895d5ba6c9c76a3a
3.2.1: 345f7440a71efaa6d54ddd2ea7c2d4683c8ad35b

> Add README.md to flink-statefun/statefun-sdk-js
> ---
>
> Key: FLINK-25898
> URL: https://issues.apache.org/jira/browse/FLINK-25898
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Affects Versions: statefun-3.2.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
> Fix For: statefun-3.3.0, statefun-3.2.1
>
>
> We should add a {{README.md}} to {{flink-statefun/statefun-sdk-js}}. This 
> would then also be displayed on 
> [npmjs.com|https://www.npmjs.com/package/apache-flink-statefun].
> Unfortunately, in order to publish the {{README.md}} we have to release a new 
> version of the npm package.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26098) TableAPI does not forward idleness configuration from DataStream

2022-02-12 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26098:
-

 Summary: TableAPI does not forward idleness configuration from 
DataStream
 Key: FLINK-26098
 URL: https://issues.apache.org/jira/browse/FLINK-26098
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.14.3, 1.15.0
Reporter: Till Rohrmann


The TableAPI does not forward the idleness configuration from a DataStream 
source. That can lead to the halt of processing if all sources are idle because 
{{WatermarkAssignerOperator}} [1] will never set a channel to active again. The 
only way to mitigate the problem is to explicitly configure the idleness for 
table sources via {{table.exec.source.idle-timeout}}. Configuring this value is 
actually not easy because creating a {{StreamExecutionEnvironment}} via 
{{create(StreamExecutionEnvironment, TableConfig)}} is deprecated.

[1] 
https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/wmassigners/WatermarkAssignerOperator.java#L103



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25981) ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers failed

2022-02-11 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490783#comment-17490783
 ] 

Till Rohrmann commented on FLINK-25981:
---

Ok, I think we can ignore the latest occurrence. There has been a gap of 
roughly 30 minutes:

{code}
12:48:27,451 [Curator-ConnectionStateManager-0] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Connected to ZooKeeper quorum. Leader election can start.
12:48:27,453 [Curator-ConnectionStateManager-0] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Connected to ZooKeeper quorum. Leader election can start.
12:48:27,457 [main-EventThread] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker
 [] - New config event received: {}
12:48:27,484 [main-EventThread] DEBUG 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - ZooKeeperMultipleComponentLeaderElectionDriver obtained the leadership.
12:48:27,511 [main] INFO  
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Closing ZooKeeperMultipleComponentLeaderElectionDriver.
13:16:54,026 [main-EventThread] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager
 [] - State change: SUSPENDED
13:16:54,179 [Curator-ConnectionStateManager-0] WARN  
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Connection to ZooKeeper suspended, waiting for reconnection.
13:16:54,179 [Curator-ConnectionStateManager-0] WARN  
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
 [] - Connection to ZooKeeper suspended, waiting for reconnection.
13:16:55,073 [main-EventThread] INFO  
org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager
 [] - State change: RECONNECTED
{code}

> ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
>  failed
> ---
>
> Key: FLINK-25981
> URL: https://issues.apache.org/jira/browse/FLINK-25981
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> We experienced a [build 
> failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30783&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=15997]
>  in 
> {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}.
>  The test halted when waiting for the next leader in 
> [ZooKeeperMultipleComponentLeaderElectionDriverTest:256|https://github.com/apache/flink/blob/e8742f7f5cac34852d0e621036e1614bbdfe8ec3/flink-runtime/src/test/java/org/apache/flink/runtime/leaderelection/ZooKeeperMultipleComponentLeaderElectionDriverTest.java#L256]
> {code}
> Feb 04 18:02:54 "main" #1 prio=5 os_prio=0 tid=0x7fab0800b800 nid=0xe07 
> waiting on condition [0x7fab12574000]
> Feb 04 18:02:54java.lang.Thread.State: WAITING (parking)
> Feb 04 18:02:54   at sun.misc.Unsafe.park(Native Method)
> Feb 04 18:02:54   - parking to wait for  <0x8065c5c8> (a 
> java.util.concurrent.CompletableFuture$Signaller)
> Feb 04 18:02:54   at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> Feb 04 18:02:54   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> Feb 04 18:02:54   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> Feb 04 18:02:54   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> Feb 04 18:02:54   at 
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> Feb 04 18:02:54   at 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256)
> [...]
> {code}
> The extended Maven logs indicate that the timeout happened while waiting for 
> the second leader to be selected.
> {code}
> Test 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
>  is running.
> 
> 17:15:10,437 [   Thread-16] INFO  
> org.apache.curator.test.TestingZooKeeperMain [] - Starting 
> server
> 17:15:10,450 [main] INFO  
> org.apache.flink.runtime.util.ZooKeeperUtil

[jira] [Commented] (FLINK-25981) ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers failed

2022-02-11 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490781#comment-17490781
 ] 

Till Rohrmann commented on FLINK-25981:
---

The new stack trace is

{code}
2022-02-10T13:16:59.3317243Z Feb 10 13:16:59 [ERROR] 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
  Time elapsed: 1,711.883 s  <<< FAILURE!
2022-02-10T13:16:59.3318299Z Feb 10 13:16:59 java.lang.AssertionError: 
2022-02-10T13:16:59.3319424Z Feb 10 13:16:59 
2022-02-10T13:16:59.3319771Z Feb 10 13:16:59 Expected size: 1 but was: 0 in:
2022-02-10T13:16:59.3323259Z Feb 10 13:16:59 []
2022-02-10T13:16:59.3324585Z Feb 10 13:16:59at 
org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:264)
2022-02-10T13:16:59.3325873Z Feb 10 13:16:59at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-02-10T13:16:59.3326685Z Feb 10 13:16:59at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2022-02-10T13:16:59.3327320Z Feb 10 13:16:59at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-02-10T13:16:59.3327890Z Feb 10 13:16:59at 
java.lang.reflect.Method.invoke(Method.java:498)
2022-02-10T13:16:59.3328509Z Feb 10 13:16:59at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
2022-02-10T13:16:59.3329147Z Feb 10 13:16:59at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
2022-02-10T13:16:59.3329864Z Feb 10 13:16:59at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
2022-02-10T13:16:59.3336949Z Feb 10 13:16:59at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
2022-02-10T13:16:59.3337933Z Feb 10 13:16:59at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
2022-02-10T13:16:59.3338887Z Feb 10 13:16:59at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
2022-02-10T13:16:59.3339947Z Feb 10 13:16:59at 
org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
2022-02-10T13:16:59.3341415Z Feb 10 13:16:59at 
org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
2022-02-10T13:16:59.3342564Z Feb 10 13:16:59at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
2022-02-10T13:16:59.3343672Z Feb 10 13:16:59at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
2022-02-10T13:16:59.3344663Z Feb 10 13:16:59at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
2022-02-10T13:16:59.3345610Z Feb 10 13:16:59at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
2022-02-10T13:16:59.3346328Z Feb 10 13:16:59at 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
2022-02-10T13:16:59.3346995Z Feb 10 13:16:59at 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
2022-02-10T13:16:59.3347721Z Feb 10 13:16:59at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
2022-02-10T13:16:59.3348465Z Feb 10 13:16:59at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2022-02-10T13:16:59.3349172Z Feb 10 13:16:59at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
2022-02-10T13:16:59.3349892Z Feb 10 13:16:59at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
2022-02-10T13:16:59.3350599Z Feb 10 13:16:59at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
2022-02-10T13:16:59.3351388Z Feb 10 13:16:59at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
2022-02-10T13:16:59.3352468Z Feb 10 13:16:59at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2022-02-10T13:16:59.3353204Z Feb 10 13:16:59at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
2022-02-10T13:16:59.3353880Z Feb 10 13:16:59at 
org.junit.

[jira] [Updated] (FLINK-25981) ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers failed

2022-02-11 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-25981:
--
Summary: 
ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
 failed  (was: ZooKeeperMultipleComponentLeaderElectionDriverTest failed)

> ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
>  failed
> ---
>
> Key: FLINK-25981
> URL: https://issues.apache.org/jira/browse/FLINK-25981
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> We experienced a [build 
> failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30783&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=15997]
>  in 
> {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}.
>  The test halted when waiting for the next leader in 
> [ZooKeeperMultipleComponentLeaderElectionDriverTest:256|https://github.com/apache/flink/blob/e8742f7f5cac34852d0e621036e1614bbdfe8ec3/flink-runtime/src/test/java/org/apache/flink/runtime/leaderelection/ZooKeeperMultipleComponentLeaderElectionDriverTest.java#L256]
> {code}
> Feb 04 18:02:54 "main" #1 prio=5 os_prio=0 tid=0x7fab0800b800 nid=0xe07 
> waiting on condition [0x7fab12574000]
> Feb 04 18:02:54java.lang.Thread.State: WAITING (parking)
> Feb 04 18:02:54   at sun.misc.Unsafe.park(Native Method)
> Feb 04 18:02:54   - parking to wait for  <0x8065c5c8> (a 
> java.util.concurrent.CompletableFuture$Signaller)
> Feb 04 18:02:54   at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> Feb 04 18:02:54   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> Feb 04 18:02:54   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> Feb 04 18:02:54   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> Feb 04 18:02:54   at 
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> Feb 04 18:02:54   at 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256)
> [...]
> {code}
> The extended Maven logs indicate that the timeout happened while waiting for 
> the second leader to be selected.
> {code}
> Test 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
>  is running.
> 
> 17:15:10,437 [   Thread-16] INFO  
> org.apache.curator.test.TestingZooKeeperMain [] - Starting 
> server
> 17:15:10,450 [main] INFO  
> org.apache.flink.runtime.util.ZooKeeperUtils [] - Enforcing 
> default ACL for ZK connections
> 17:15:10,451 [main] INFO  
> org.apache.flink.runtime.util.ZooKeeperUtils [] - Using 
> '/flink/default' as Zookeeper namespace.
> 17:15:10,452 [main] INFO  
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl
>  [] - Starting
> 17:15:10,455 [main] INFO  
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl
>  [] - Default schema
> 17:15:10,462 [main-EventThread] INFO  
> org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager
>  [] - State change: CONNECTED
> 17:15:10,467 [main-EventThread] INFO  
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker
>  [] - New config event received: {}
> 17:15:10,482 [Curator-ConnectionStateManager-0] DEBUG 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
>  [] - Connected to ZooKeeper quorum. Leader election can start.
> 17:15:10,483 [Curator-ConnectionStateManager-0] DEBUG 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
>  [] - Connected to ZooKeeper quorum. Leader election can start.
> 17:15:10,483 [Curator-ConnectionStateManager-0] DEBUG 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
>  [] - Connected to ZooKeeper quorum. Leader election can start.
> 17:15:10,484 [main-EventThread] INFO  
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker
>  [] - New config event received: {}
> 17:15:10,562 [main-Event

[jira] [Closed] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure

2022-02-11 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-26036.
-
Resolution: Fixed

Fixed via 314f91df7019bf1ce1ede52b8ba7f2d0a1c44aac

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory 
> timeout on azure
> ---
>
> Key: FLINK-26036
> URL: https://issues.apache.org/jira/browse/FLINK-26036
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
>   Time elapsed: 62.252 s  <<< ERROR!
> 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 
> java.util.concurrent.TimeoutException
> 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14  at 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115)
> 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
> 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
> 2022-02-09T02:18:17.1845603Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135

[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure

2022-02-11 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490730#comment-17490730
 ] 

Till Rohrmann commented on FLINK-26036:
---

Another instance w/o the latest fix: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31211&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=22954

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory 
> timeout on azure
> ---
>
> Key: FLINK-26036
> URL: https://issues.apache.org/jira/browse/FLINK-26036
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
>   Time elapsed: 62.252 s  <<< ERROR!
> 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 
> java.util.concurrent.TimeoutException
> 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14  at 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115)
> 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
> 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestM

[jira] [Commented] (FLINK-24119) KafkaITCase.testTimestamps fails due to "Topic xxx already exist"

2022-02-11 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490731#comment-17490731
 ] 

Till Rohrmann commented on FLINK-24119:
---

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31211&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=15a22db7-8faa-5b34-3920-d33c9f0ca23c&l=35896

> KafkaITCase.testTimestamps fails due to "Topic xxx already exist"
> -
>
> Key: FLINK-24119
> URL: https://issues.apache.org/jira/browse/FLINK-24119
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.14.0
>Reporter: Xintong Song
>Assignee: Fabian Paul
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23328&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=15a22db7-8faa-5b34-3920-d33c9f0ca23c&l=7419
> {code}
> Sep 01 15:53:20 [ERROR] Tests run: 23, Failures: 1, Errors: 0, Skipped: 0, 
> Time elapsed: 162.65 s <<< FAILURE! - in 
> org.apache.flink.streaming.connectors.kafka.KafkaITCase
> Sep 01 15:53:20 [ERROR] testTimestamps  Time elapsed: 23.237 s  <<< FAILURE!
> Sep 01 15:53:20 java.lang.AssertionError: Create test topic : tstopic failed, 
> org.apache.kafka.common.errors.TopicExistsException: Topic 'tstopic' already 
> exists.
> Sep 01 15:53:20   at org.junit.Assert.fail(Assert.java:89)
> Sep 01 15:53:20   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl.createTestTopic(KafkaTestEnvironmentImpl.java:226)
> Sep 01 15:53:20   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironment.createTestTopic(KafkaTestEnvironment.java:112)
> Sep 01 15:53:20   at 
> org.apache.flink.streaming.connectors.kafka.KafkaTestBase.createTestTopic(KafkaTestBase.java:212)
> Sep 01 15:53:20   at 
> org.apache.flink.streaming.connectors.kafka.KafkaITCase.testTimestamps(KafkaITCase.java:191)
> Sep 01 15:53:20   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Sep 01 15:53:20   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Sep 01 15:53:20   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Sep 01 15:53:20   at java.lang.reflect.Method.invoke(Method.java:498)
> Sep 01 15:53:20   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Sep 01 15:53:20   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Sep 01 15:53:20   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Sep 01 15:53:20   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Sep 01 15:53:20   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> Sep 01 15:53:20   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> Sep 01 15:53:20   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Sep 01 15:53:20   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26081) Update document,becasue of add maven wrapper.

2022-02-11 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490726#comment-17490726
 ] 

Till Rohrmann commented on FLINK-26081:
---

Sure, I've assigned you.

> Update document,becasue of add maven wrapper.
> -
>
> Key: FLINK-26081
> URL: https://issues.apache.org/jira/browse/FLINK-26081
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Aiden Gong
>Assignee: Aiden Gong
>Priority: Minor
> Fix For: 1.15.0
>
>
> Update document,becasue of add maven wapper.
> Related files: README.md、project-configuration.md、building.md、ide_setup.md.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-26081) Update document,becasue of add maven wrapper.

2022-02-11 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-26081:
-

Assignee: Aiden Gong

> Update document,becasue of add maven wrapper.
> -
>
> Key: FLINK-26081
> URL: https://issues.apache.org/jira/browse/FLINK-26081
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Aiden Gong
>Assignee: Aiden Gong
>Priority: Minor
> Fix For: 1.15.0
>
>
> Update document,becasue of add maven wapper.
> Related files: README.md、project-configuration.md、building.md、ide_setup.md.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25898) Add README.md to flink-statefun/statefun-sdk-js

2022-02-10 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-25898:
-

Assignee: Till Rohrmann

> Add README.md to flink-statefun/statefun-sdk-js
> ---
>
> Key: FLINK-25898
> URL: https://issues.apache.org/jira/browse/FLINK-25898
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Affects Versions: statefun-3.2.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>
> We should add a {{README.md}} to {{flink-statefun/statefun-sdk-js}}. This 
> would then also be displayed on 
> [npmjs.com|https://www.npmjs.com/package/apache-flink-statefun].
> Unfortunately, in order to publish the {{README.md}} we have to release a new 
> version of the npm package.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26034) Add maven wrapper for flink

2022-02-10 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-26034.
-
Resolution: Fixed

Fixed via cc5616d1f1a3566f8a4dbb7a016251cf0f34b47a

> Add maven wrapper for flink
> ---
>
> Key: FLINK-26034
> URL: https://issues.apache.org/jira/browse/FLINK-26034
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.15.0
>Reporter: Aiden Gong
>Assignee: Aiden Gong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> Idea just support this feature now. It is very helpful for contributors.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure

2022-02-10 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490351#comment-17490351
 ] 

Till Rohrmann commented on FLINK-26036:
---

I think the solution is to move the {{isRunning()}} check into the 
{{freeSlotInternal}} method. That way we should guarantee that we don't free 
slots while shutting down.

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory 
> timeout on azure
> ---
>
> Key: FLINK-26036
> URL: https://issues.apache.org/jira/browse/FLINK-26036
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
>   Time elapsed: 62.252 s  <<< ERROR!
> 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 
> java.util.concurrent.TimeoutException
> 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14  at 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115)
> 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
> 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:21

[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure

2022-02-10 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490349#comment-17490349
 ] 

Till Rohrmann commented on FLINK-26036:
---

The problem seems to be the following: When shutting down the {{TaskExecutor}}, 
then we fail all running tasks and close the {{TaskSlotTableImpl}}. If now a 
terminal state of a {{Task}} is processed, then this removes the {{Task}} from 
the {{TaskSlotTableImpl}}. This in turn triggers the 
{{TaskExecutor.freeSlotInternal}} via the {{SlotActions.freeSlot}} call.

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory 
> timeout on azure
> ---
>
> Key: FLINK-26036
> URL: https://issues.apache.org/jira/browse/FLINK-26036
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
>   Time elapsed: 62.252 s  <<< ERROR!
> 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 
> java.util.concurrent.TimeoutException
> 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14  at 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115)
> 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
> 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute

[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure

2022-02-10 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490131#comment-17490131
 ] 

Till Rohrmann commented on FLINK-26036:
---

Thanks for the pointer [~gaoyunhaii]. I will take another look at the problem. 
I assume that there is still somehow a call that triggers 
{{TaskExectuor.freeSlotInternal}}.

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory 
> timeout on azure
> ---
>
> Key: FLINK-26036
> URL: https://issues.apache.org/jira/browse/FLINK-26036
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
>   Time elapsed: 62.252 s  <<< ERROR!
> 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 
> java.util.concurrent.TimeoutException
> 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14  at 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115)
> 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
> 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:2

[jira] [Closed] (FLINK-25975) add doc for how to use AvroParquetRecordFormat

2022-02-10 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-25975.
-
Resolution: Fixed

Fixed via 222b72f2de43bb60b2f4741fde023493967d20f7

> add doc for how to use AvroParquetRecordFormat
> --
>
> Key: FLINK-25975
> URL: https://issues.apache.org/jira/browse/FLINK-25975
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Jing Ge
>Assignee: Jing Ge
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/parquet/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure

2022-02-09 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-26036.
-
Resolution: Fixed

Fixed via f839f12e2e425d8c32279fb505a8c624bbc2a266

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory 
> timeout on azure
> ---
>
> Key: FLINK-26036
> URL: https://issues.apache.org/jira/browse/FLINK-26036
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
>   Time elapsed: 62.252 s  <<< ERROR!
> 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 
> java.util.concurrent.TimeoutException
> 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14  at 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115)
> 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
> 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
> 2022-02-09T02:18:17.1845603Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135

[jira] [Assigned] (FLINK-26034) Add maven wrapper for flink

2022-02-09 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-26034:
-

Assignee: Aiden Gong

> Add maven wrapper for flink
> ---
>
> Key: FLINK-26034
> URL: https://issues.apache.org/jira/browse/FLINK-26034
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.15.0
>Reporter: Aiden Gong
>Assignee: Aiden Gong
>Priority: Minor
> Fix For: 1.15.0
>
>
> Idea just support this feature now. It is very helpful for contributors.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26032) log job info in the ContextEnvironment

2022-02-09 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-26032.
-
Fix Version/s: 1.15.0
   Resolution: Fixed

Fixed via 8ce9632849b6704a57e3836043cdd1e9dab39c15

> log job info in the ContextEnvironment
> --
>
> Key: FLINK-26032
> URL: https://issues.apache.org/jira/browse/FLINK-26032
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>Reporter: Jing Ge
>Assignee: Jing Ge
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> In Flink codebase, the {{org.apache.flink.client.program.ContextEnvironment}} 
> has a static 
> [Logger|https://github.com/apache/flink/blob/7f9587c723057e2b6cbaf748181c8c80a7f6703d/flink-clients/src/main/java/org/apache/flink/client/program/ContextEnvironment.java#L47]
>  defined in the class, however, it doesn’t use it to print any logs. Instead, 
> it prints logs with {{System.out}} and passes the Logger to 
> {{ShutdownHookUtil.addShutdownHook}} and 
> {{jobExecutionResultFuture.whenComplete}} for logging any hook errors.  If 
> customer integrated the CLI (‘FlinkYarnSessionCli’ in their case) into a 
> multi-threaded program to submit jobs in parallel, does it lead to any logs 
> missing/override/disorder problems? 
> It is always helpful to log the status information during the job submit 
> process.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure

2022-02-09 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489424#comment-17489424
 ] 

Till Rohrmann commented on FLINK-26036:
---

The problem seems to be the shutdown hook introduced with FLINK-25709. In some 
cases, the shutdown hook manages to delete the {{slotAllocationSnapshots}} so 
that the newly restarted process cannot recover.

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory 
> timeout on azure
> ---
>
> Key: FLINK-26036
> URL: https://issues.apache.org/jira/browse/FLINK-26036
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
>   Time elapsed: 62.252 s  <<< ERROR!
> 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 
> java.util.concurrent.TimeoutException
> 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14  at 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115)
> 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
> 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescrip

[jira] [Commented] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure

2022-02-09 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489415#comment-17489415
 ] 

Till Rohrmann commented on FLINK-26036:
---

The problem seems to be that a restarted {{TaskManager}} process finds an empty 
{{slotAllocationSnapshots}} directory where actually the previously allocated 
slot snapshots should be contained.

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory 
> timeout on azure
> ---
>
> Key: FLINK-26036
> URL: https://issues.apache.org/jira/browse/FLINK-26036
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
>   Time elapsed: 62.252 s  <<< ERROR!
> 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 
> java.util.concurrent.TimeoutException
> 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14  at 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115)
> 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
> 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210

[jira] [Updated] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure

2022-02-09 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-26036:
--
Fix Version/s: 1.15.0

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory 
> timeout on azure
> ---
>
> Key: FLINK-26036
> URL: https://issues.apache.org/jira/browse/FLINK-26036
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
>   Time elapsed: 62.252 s  <<< ERROR!
> 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 
> java.util.concurrent.TimeoutException
> 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14  at 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115)
> 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
> 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
> 2022-02-09T02:18:17.1845603Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
> 2022-02-09T02:18:17.1846375Z Feb 09 02:18:14  at 
> org.junit.jupi

[jira] [Assigned] (FLINK-26036) LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory timeout on azure

2022-02-09 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-26036:
-

Assignee: Till Rohrmann

> LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory 
> timeout on azure
> ---
>
> Key: FLINK-26036
> URL: https://issues.apache.org/jira/browse/FLINK-26036
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> 022-02-09T02:18:17.1827314Z Feb 09 02:18:14 [ERROR] 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory
>   Time elapsed: 62.252 s  <<< ERROR!
> 2022-02-09T02:18:17.1827940Z Feb 09 02:18:14 
> java.util.concurrent.TimeoutException
> 2022-02-09T02:18:17.1828450Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> 2022-02-09T02:18:17.1829040Z Feb 09 02:18:14  at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> 2022-02-09T02:18:17.1829752Z Feb 09 02:18:14  at 
> org.apache.flink.test.recovery.LocalRecoveryITCase.testRecoverLocallyFromProcessCrashWithWorkingDirectory(LocalRecoveryITCase.java:115)
> 2022-02-09T02:18:17.1830407Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-02-09T02:18:17.1830954Z Feb 09 02:18:14  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-02-09T02:18:17.1831582Z Feb 09 02:18:14  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-02-09T02:18:17.1832135Z Feb 09 02:18:14  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-02-09T02:18:17.1832697Z Feb 09 02:18:14  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2022-02-09T02:18:17.1833566Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-02-09T02:18:17.1834394Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-02-09T02:18:17.1835125Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-02-09T02:18:17.1835875Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-02-09T02:18:17.1836565Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
> 2022-02-09T02:18:17.1837294Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-02-09T02:18:17.1838007Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-02-09T02:18:17.1838743Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-02-09T02:18:17.1839499Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-02-09T02:18:17.1840224Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-02-09T02:18:17.1840952Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-02-09T02:18:17.1841616Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-02-09T02:18:17.1842257Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-02-09T02:18:17.1842951Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
> 2022-02-09T02:18:17.1843681Z Feb 09 02:18:14  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-02-09T02:18:17.1844782Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
> 2022-02-09T02:18:17.1845603Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
> 2022-02-09T02:18:17.1846375Z Feb 09 02:18:14  at 
> org.junit.jupiter.engine.descriptor.T

[jira] [Commented] (FLINK-26034) Add maven wrapper for flink

2022-02-09 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489364#comment-17489364
 ] 

Till Rohrmann commented on FLINK-26034:
---

This could indeed be helpful. Especially since we rely on maven 3.2.5. I'd be 
fine with adding something like this.

> Add maven wrapper for flink
> ---
>
> Key: FLINK-26034
> URL: https://issues.apache.org/jira/browse/FLINK-26034
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.15.0
>Reporter: Aiden Gong
>Priority: Minor
> Fix For: 1.15.0
>
>
> Idea just support this feature now. It is very helpful for contributors.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26034) Add maven wrapper for flink

2022-02-09 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-26034:
--
Summary: Add maven wrapper for flink  (was: Add maven wapper for flink)

> Add maven wrapper for flink
> ---
>
> Key: FLINK-26034
> URL: https://issues.apache.org/jira/browse/FLINK-26034
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.15.0
>Reporter: Aiden Gong
>Priority: Minor
> Fix For: 1.15.0
>
>
> Idea just support this feature now. It is very helpful for contributors.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25933) Allow configuring different transports in RequestReplyFunctionBuilder

2022-02-09 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-25933:
-

Assignee: Galen Warren

> Allow configuring different transports in RequestReplyFunctionBuilder
> -
>
> Key: FLINK-25933
> URL: https://issues.apache.org/jira/browse/FLINK-25933
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Galen Warren
>Priority: Major
>
> Currently it is not possible to configure the type of the transport used 
> while using the data stream integration.
> It would be useful to do so.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-26003) Use Jackson serialization for persisting TaskExecutor state to working directory

2022-02-08 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-26003:
-

Assignee: (was: Till Rohrmann)

> Use Jackson serialization for persisting TaskExecutor state to working 
> directory
> 
>
> Key: FLINK-26003
> URL: https://issues.apache.org/jira/browse/FLINK-26003
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.15.0
>
>
> In order to avoid Java serialization, we should use a different serialization 
> format for persisting {{TaskExecutor}} state in the working directory. One 
> idea could be to use Jackson for the serialization.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure

2022-02-08 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489003#comment-17489003
 ] 

Till Rohrmann commented on FLINK-23240:
---

It might be a problem for standalone setups where the JobManager process won't 
be automatically restarted. Moreover, it might lead to unnecessary restarts 
that might be confusing to people.

> ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper
>  fails on azure
> -
>
> Key: FLINK-23240
> URL: https://issues.apache.org/jira/browse/FLINK-23240
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Xintong Song
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186
> {code}
> Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 91.407 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Jul 04 22:17:29 [ERROR] 
> testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase)
>   Time elapsed: 31.356 s  <<< ERROR!
> Jul 04 22:17:29 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public abstract 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
>  timed out.
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 04 22:17:29   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 04 22:17:29   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 04 22:17:29   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Jul 04 22:17:29   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jul 04 22:17:29   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Jul 04 2

[jira] [Commented] (FLINK-26005) TableEnvironment.createTemporarySystemFunction cause NPE when using leftOuterLateralJoin

2022-02-08 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489002#comment-17489002
 ] 

Till Rohrmann commented on FLINK-26005:
---

I think [~twalthr] made it a subtask.

> TableEnvironment.createTemporarySystemFunction cause NPE when using 
> leftOuterLateralJoin
> 
>
> Key: FLINK-26005
> URL: https://issues.apache.org/jira/browse/FLINK-26005
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Affects Versions: 1.15.0, 1.14.3
>Reporter: Till Rohrmann
>Priority: Major
>
> When trying out the {{Table.leftOuterLateralJoin}} with a table function that 
> was registered via {{TableEnvironment.createTemporarySystemFunction}} the 
> system failed with 
> {code}
> Exception in thread "main" java.lang.NullPointerException
>   at 
> org.apache.calcite.tools.RelBuilder$Frame.(RelBuilder.java:3332)
>   at 
> org.apache.calcite.tools.RelBuilder$Frame.(RelBuilder.java:3317)
>   at org.apache.calcite.tools.RelBuilder.push(RelBuilder.java:282)
>   at 
> org.apache.calcite.tools.RelBuilder.functionScan(RelBuilder.java:1197)
>   at 
> org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:309)
>   at 
> org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:154)
>   at 
> org.apache.flink.table.operations.CalculatedQueryOperation.accept(CalculatedQueryOperation.java:94)
>   at 
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:151)
>   at 
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133)
>   at 
> org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:87)
>   at 
> org.apache.flink.table.operations.CalculatedQueryOperation.accept(CalculatedQueryOperation.java:94)
>   at 
> org.apache.flink.table.planner.plan.QueryOperationConverter.lambda$defaultMethod$0(QueryOperationConverter.java:150)
>   at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390)
>   at 
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:150)
>   at 
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133)
>   at 
> org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:62)
>   at 
> org.apache.flink.table.operations.JoinQueryOperation.accept(JoinQueryOperation.java:115)
>   at 
> org.apache.flink.table.planner.plan.QueryOperationConverter.lambda$defaultMethod$0(QueryOperationConverter.java:150)
>   at 
> java.base/java.util.Collections$SingletonList.forEach(Collections.java:4854)
>   at 
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:150)
>   at 
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133)
>   at 
> org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:47)
>   at 
> org.apache.flink.table.operations.ProjectQueryOperation.accept(ProjectQueryOperation.java:76)
>   at 
> org.apache.flink.table.planner.calcite.FlinkRelBuilder.queryOperation(FlinkRelBuilder.scala:184)
>   at 
> org.apache.flink.table.planner.delegation.PlannerBase.translateToRel(PlannerBase.scala:214)
>   at 
> org.apache.flink.table.planner.delegation.PlannerBase.$anonfun$translate$1(PlannerBase.scala:182)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
>   at scala.collection.Iterator.foreach(Iterator.scala:937)
>   at scala.collection.Iterator.foreach$(Iterator.scala:937)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:70)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:233)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:182)
>   at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:1665)
>   at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeQueryOperation(TableEnvironmentIm

[jira] [Created] (FLINK-26014) Document how to use the working directory for faster local recoveries

2022-02-08 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26014:
-

 Summary: Document how to use the working directory for faster 
local recoveries
 Key: FLINK-26014
 URL: https://issues.apache.org/jira/browse/FLINK-26014
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.15.0


After having implemented FLIP-198 and FLIP-201, users can now use faster 
TaskManager failover when using local recovery with persisted volumes. I 
suggest to add documentation for explaining how to configure Flink to make use 
of it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-25817) FLIP-201: Persist local state in working directory

2022-02-08 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-25817.
-
Fix Version/s: 1.15.0
   Resolution: Fixed

Fixed via

4b39e4ad438
f95a0c7e867
1a1415119f9
45c661e02de
d63c48dded3
577ef90710c
aa78915f58c
990589c30a7
2ff1a18744e
00f2c627492

> FLIP-201: Persist local state in working directory
> --
>
> Key: FLINK-25817
> URL: https://issues.apache.org/jira/browse/FLINK-25817
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> This issue is the umbrella ticket for 
> [FLIP-201|https://cwiki.apache.org/confluence/x/wJuqCw] which aims at adding 
> support for persisting local state in Flink's working directory. This would 
> enable Flink in certain scenarios to recover locally even in case of process 
> failures.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26005) TableEnvironment.createTemporarySystemFunction cause NPE when using leftOuterLateralJoin

2022-02-08 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26005:
-

 Summary: TableEnvironment.createTemporarySystemFunction cause NPE 
when using leftOuterLateralJoin
 Key: FLINK-26005
 URL: https://issues.apache.org/jira/browse/FLINK-26005
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.14.3, 1.15.0
Reporter: Till Rohrmann


When trying out the {{Table.leftOuterLateralJoin}} with a table function that 
was registered via {{TableEnvironment.createTemporarySystemFunction}} the 
system failed with 

{code}
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.calcite.tools.RelBuilder$Frame.(RelBuilder.java:3332)
at 
org.apache.calcite.tools.RelBuilder$Frame.(RelBuilder.java:3317)
at org.apache.calcite.tools.RelBuilder.push(RelBuilder.java:282)
at 
org.apache.calcite.tools.RelBuilder.functionScan(RelBuilder.java:1197)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:309)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:154)
at 
org.apache.flink.table.operations.CalculatedQueryOperation.accept(CalculatedQueryOperation.java:94)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:151)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133)
at 
org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:87)
at 
org.apache.flink.table.operations.CalculatedQueryOperation.accept(CalculatedQueryOperation.java:94)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter.lambda$defaultMethod$0(QueryOperationConverter.java:150)
at java.base/java.util.Arrays$ArrayList.forEach(Arrays.java:4390)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:150)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133)
at 
org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:62)
at 
org.apache.flink.table.operations.JoinQueryOperation.accept(JoinQueryOperation.java:115)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter.lambda$defaultMethod$0(QueryOperationConverter.java:150)
at 
java.base/java.util.Collections$SingletonList.forEach(Collections.java:4854)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:150)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:133)
at 
org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:47)
at 
org.apache.flink.table.operations.ProjectQueryOperation.accept(ProjectQueryOperation.java:76)
at 
org.apache.flink.table.planner.calcite.FlinkRelBuilder.queryOperation(FlinkRelBuilder.scala:184)
at 
org.apache.flink.table.planner.delegation.PlannerBase.translateToRel(PlannerBase.scala:214)
at 
org.apache.flink.table.planner.delegation.PlannerBase.$anonfun$translate$1(PlannerBase.scala:182)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
at scala.collection.Iterator.foreach(Iterator.scala:937)
at scala.collection.Iterator.foreach$(Iterator.scala:937)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
at scala.collection.IterableLike.foreach(IterableLike.scala:70)
at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.map(TraversableLike.scala:233)
at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:182)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:1665)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeQueryOperation(TableEnvironmentImpl.java:805)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1274)
at 
org.apache.flink.table.api.internal.TableImpl.execute(TableImpl.java:601)
{code}

Interestingly, when using the deprecated {{TableEnvironment.registerFunction}} 
it worked. Timo mentioned that this could be caused by a missing integration 
into the new type syst

[jira] [Created] (FLINK-26003) Use Jackson serialization for persisting TaskExecutor state to working directory

2022-02-08 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-26003:
-

 Summary: Use Jackson serialization for persisting TaskExecutor 
state to working directory
 Key: FLINK-26003
 URL: https://issues.apache.org/jira/browse/FLINK-26003
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.15.0


In order to avoid Java serialization, we should use a different serialization 
format for persisting {{TaskExecutor}} state in the working directory. One idea 
could be to use Jackson for the serialization.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure

2022-02-08 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488663#comment-17488663
 ] 

Till Rohrmann edited comment on FLINK-23240 at 2/8/22, 8:40 AM:


The problem seems to be that the {{ResourceManagerServiceImpl}} closes itself 
if it loses leadership. Therefore, the {{MiniCluster}} shuts down and the test 
hangs.

It looks to me as if the {{ResourceManagerServiceImpl}} can only survive a 
single leadership session unless the property 
{{flink.tests.enable-rm-multi-leader-session}} is set 
([code|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImpl.java#L220-L222]).
 If this is correct, then we might have a cluster instability because every RM 
leadership loss would trigger an unexpected termination of the 
{{ResourceManager}} which will terminate the process. [~xtsong] have I 
understood the code correctly?


was (Author: till.rohrmann):
The problem seems to be that the {{ResourceManagerServiceImpl}} closes itself 
if it loses leadership. Therefore, the {{MiniCluster}} shuts down and the test 
hangs.

It looks to me as if the {{ResourceManagerServiceImpl}} can only survive a 
single leadership session unless the property 
{{flink.tests.enable-rm-multi-leader-session}} is set. If this is correct, then 
we might have a cluster instability because every RM leadership loss would 
trigger an unexpected termination of the {{ResourceManager}} which will 
terminate the process. [~xtsong] have I understood the code correctly?

> ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper
>  fails on azure
> -
>
> Key: FLINK-23240
> URL: https://issues.apache.org/jira/browse/FLINK-23240
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Coordination
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Xintong Song
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186
> {code}
> Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 91.407 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Jul 04 22:17:29 [ERROR] 
> testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase)
>   Time elapsed: 31.356 s  <<< ERROR!
> Jul 04 22:17:29 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public abstract 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
>  timed out.
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 04 22:17:29   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 04 22:17:29   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 04 22:17

[jira] [Commented] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure

2022-02-08 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488665#comment-17488665
 ] 

Till Rohrmann commented on FLINK-23240:
---

cc [~dmvk]

> ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper
>  fails on azure
> -
>
> Key: FLINK-23240
> URL: https://issues.apache.org/jira/browse/FLINK-23240
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Coordination
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Xintong Song
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186
> {code}
> Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 91.407 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Jul 04 22:17:29 [ERROR] 
> testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase)
>   Time elapsed: 31.356 s  <<< ERROR!
> Jul 04 22:17:29 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public abstract 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
>  timed out.
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 04 22:17:29   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 04 22:17:29   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 04 22:17:29   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Jul 04 22:17:29   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jul 04 22:17:29   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.run(Parent

[jira] [Updated] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure

2022-02-08 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-23240:
--
Affects Version/s: 1.15.0

> ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper
>  fails on azure
> -
>
> Key: FLINK-23240
> URL: https://issues.apache.org/jira/browse/FLINK-23240
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Coordination
>Affects Versions: 1.14.0, 1.15.0
>Reporter: Xintong Song
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186
> {code}
> Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 91.407 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Jul 04 22:17:29 [ERROR] 
> testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase)
>   Time elapsed: 31.356 s  <<< ERROR!
> Jul 04 22:17:29 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public abstract 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
>  timed out.
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 04 22:17:29   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 04 22:17:29   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 04 22:17:29   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Jul 04 22:17:29   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jul 04 22:17:29   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Jul 04 22:17:29 

[jira] [Commented] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure

2022-02-08 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488664#comment-17488664
 ] 

Till Rohrmann commented on FLINK-23240:
---

I am promoting this issue to be a blocker until we have invalidated my 
suspicion.

> ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper
>  fails on azure
> -
>
> Key: FLINK-23240
> URL: https://issues.apache.org/jira/browse/FLINK-23240
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Coordination
>Affects Versions: 1.14.0
>Reporter: Xintong Song
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186
> {code}
> Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 91.407 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Jul 04 22:17:29 [ERROR] 
> testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase)
>   Time elapsed: 31.356 s  <<< ERROR!
> Jul 04 22:17:29 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public abstract 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
>  timed out.
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 04 22:17:29   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 04 22:17:29   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 04 22:17:29   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Jul 04 22:17:29   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jul 04 22:17:29   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 0

[jira] [Commented] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure

2022-02-08 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488663#comment-17488663
 ] 

Till Rohrmann commented on FLINK-23240:
---

The problem seems to be that the {{ResourceManagerServiceImpl}} closes itself 
if it loses leadership. Therefore, the {{MiniCluster}} shuts down and the test 
hangs.

It looks to me as if the {{ResourceManagerServiceImpl}} can only survive a 
single leadership session unless the property 
{{flink.tests.enable-rm-multi-leader-session}} is set. If this is correct, then 
we might have a cluster instability because every RM leadership loss would 
trigger an unexpected termination of the {{ResourceManager}} which will 
terminate the process. [~xtsong] have I understood the code correctly?

> ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper
>  fails on azure
> -
>
> Key: FLINK-23240
> URL: https://issues.apache.org/jira/browse/FLINK-23240
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Coordination
>Affects Versions: 1.14.0
>Reporter: Xintong Song
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186
> {code}
> Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 91.407 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Jul 04 22:17:29 [ERROR] 
> testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase)
>   Time elapsed: 31.356 s  <<< ERROR!
> Jul 04 22:17:29 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public abstract 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
>  timed out.
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 04 22:17:29   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 04 22:17:29   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 04 22:17:29   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRun

[jira] [Updated] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure

2022-02-08 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-23240:
--
Priority: Blocker  (was: Critical)

> ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper
>  fails on azure
> -
>
> Key: FLINK-23240
> URL: https://issues.apache.org/jira/browse/FLINK-23240
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Coordination
>Affects Versions: 1.14.0
>Reporter: Xintong Song
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186
> {code}
> Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 91.407 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Jul 04 22:17:29 [ERROR] 
> testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase)
>   Time elapsed: 31.356 s  <<< ERROR!
> Jul 04 22:17:29 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public abstract 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
>  timed out.
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 04 22:17:29   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 04 22:17:29   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 04 22:17:29   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Jul 04 22:17:29   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jul 04 22:17:29   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Jul 04 22:17:29

[jira] [Assigned] (FLINK-25937) SQL Client end-to-end test e2e fails on AZP

2022-02-08 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-25937:
-

Assignee: David Morávek

> SQL Client end-to-end test e2e fails on AZP
> ---
>
> Key: FLINK-25937
> URL: https://issues.apache.org/jira/browse/FLINK-25937
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Table SQL / API
>Affects Versions: 1.15.0
>Reporter: Till Rohrmann
>Assignee: David Morávek
>Priority: Critical
>  Labels: test-stability
>
> The {{SQL Client end-to-end test}} e2e tests fails on AZP when using the 
> {{AdaptiveScheduler}} because the scheduler expects that the parallelism is 
> set for all vertices:
> {code}
> Feb 03 03:45:13 org.apache.flink.runtime.client.JobInitializationException: 
> Could not start the JobMaster.
> Feb 03 03:45:13   at 
> org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:97)
> Feb 03 03:45:13   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> Feb 03 03:45:13   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> Feb 03 03:45:13   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> Feb 03 03:45:13   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609)
> Feb 03 03:45:13   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Feb 03 03:45:13   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Feb 03 03:45:13   at java.lang.Thread.run(Thread.java:748)
> Feb 03 03:45:13 Caused by: java.util.concurrent.CompletionException: 
> java.lang.IllegalStateException: The adaptive scheduler expects the 
> parallelism being set for each JobVertex (violated JobVertex: 
> f74b775b58627a33e46b8c155b320255).
> Feb 03 03:45:13   at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
> Feb 03 03:45:13   at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
> Feb 03 03:45:13   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
> Feb 03 03:45:13   ... 3 more
> Feb 03 03:45:13 Caused by: java.lang.IllegalStateException: The adaptive 
> scheduler expects the parallelism being set for each JobVertex (violated 
> JobVertex: f74b775b58627a33e46b8c155b320255).
> Feb 03 03:45:13   at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:215)
> Feb 03 03:45:13   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.assertPreconditions(AdaptiveScheduler.java:296)
> Feb 03 03:45:13   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.(AdaptiveScheduler.java:230)
> Feb 03 03:45:13   at 
> org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerFactory.createInstance(AdaptiveSchedulerFactory.java:122)
> Feb 03 03:45:13   at 
> org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:115)
> Feb 03 03:45:13   at 
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:345)
> Feb 03 03:45:13   at 
> org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:322)
> Feb 03 03:45:13   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:106)
> Feb 03 03:45:13   at 
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:94)
> Feb 03 03:45:13   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
> Feb 03 03:45:13   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
> Feb 03 03:45:13   ... 3 more
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30662&view=logs&j=fb37c667-81b7-5c22-dd91-846535e99a97&t=39a035c3-c65e-573c-fb66-104c66c28912&l=5782



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25825) MySqlCatalogITCase fails on azure

2022-02-08 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-25825:
-

Assignee: RocMarshal

> MySqlCatalogITCase fails on azure
> -
>
> Key: FLINK-25825
> URL: https://issues.apache.org/jira/browse/FLINK-25825
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / JDBC, Table SQL / API
>Affects Versions: 1.15.0
>Reporter: Roman Khachatryan
>Assignee: RocMarshal
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30189&view=logs&j=e9af9cde-9a65-5281-a58e-2c8511d36983&t=c520d2c3-4d17-51f1-813b-4b0b74a0c307&l=13677
>  
> {code}
> 2022-01-26T06:04:42.8019913Z Jan 26 06:04:42 [ERROR] 
> org.apache.flink.connector.jdbc.catalog.MySqlCatalogITCase.testFullPath  Time 
> elapsed: 2.166 *s  <<< FAILURE!
> 2022-01-26T06:04:42.8025522Z Jan 26 06:04:42 java.lang.AssertionError: 
> expected: java.util.ArrayList<[+I[1, -1, 1, null, true, null, hello, 2021-0 
> 8-04, 2021-08-04T01:54:16, -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, -1, 1, 
> \{"k1": "v1"}, null, col_longtext, null, -1, 1, col_mediumtext, -99, 9 9, 
> -1.0, 1.0, set_ele1, -1, 1, col_text, 10:32:34, 2021-08-04T01:54:16, 
> col_tinytext, -1, 1, null, col_varchar, 2021-08-04T01:54:16.463, 09:33:43,  
> 2021-08-04T01:54:16.463, null], +I[2, -1, 1, null, true, null, hello, 
> 2021-08-04, 2021-08-04T01:53:19, -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1,  
> -1, 1, \{"k1": "v1"}, null, col_longtext, null, -1, 1, col_mediumtext, -99, 
> 99, -1.0, 1.0, set_ele1,set_ele12, -1, 1, col_text, 10:32:34, 2021-08- 
> 04T01:53:19, col_tinytext, -1, 1, null, col_varchar, 2021-08-04T01:53:19.098, 
> 09:33:43, 2021-08-04T01:53:19.098, null]]> but was: java.util.ArrayL 
> ist<[+I[1, -1, 1, null, true, null, hello, 2021-08-04, 2021-08-04T01:54:16, 
> -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, -1, 1, \{"k1": "v1"}, null,  
> col_longtext, null, -1, 1, col_mediumtext, -99, 99, -1.0, 1.0, set_ele1, -1, 
> 1, col_text, 10:32:34, 2021-08-04T01:54:16, col_tinytext, -1, 1, null , 
> col_varchar, 2021-08-04T01:54:16.463, 09:33:43, 2021-08-04T01:54:16.463, 
> null], +I[2, -1, 1, null, true, null, hello, 2021-08-04, 2021-08-04T01: 
> 53:19, -1, 1, -1.0, 1.0, enum2, -9.1, 9.1, -1, 1, -1, 1, \{"k1": "v1"}, null, 
> col_longtext, null, -1, 1, col_mediumtext, -99, 99, -1.0, 1.0, set_el 
> e1,set_ele12, -1, 1, col_text, 10:32:34, 2021-08-04T01:53:19, col_tinytext, 
> -1, 1, null, col_varchar, 2021-08-04T01:53:19.098, 09:33:43, 2021-08-0 
> 4T01:53:19.098, null]]>
> 2022-01-26T06:04:42.8029336Z Jan 26 06:04:42    at 
> org.junit.Assert.fail(Assert.java:89)
> 2022-01-26T06:04:42.8029824Z Jan 26 06:04:42    at 
> org.junit.Assert.failNotEquals(Assert.java:835)
> 2022-01-26T06:04:42.8030319Z Jan 26 06:04:42    at 
> org.junit.Assert.assertEquals(Assert.java:120)
> 2022-01-26T06:04:42.8030815Z Jan 26 06:04:42    at 
> org.junit.Assert.assertEquals(Assert.java:146)
> 2022-01-26T06:04:42.8031419Z Jan 26 06:04:42    at 
> org.apache.flink.connector.jdbc.catalog.MySqlCatalogITCase.testFullPath(MySqlCatalogITCase.java*:306)
> {code}
>  
> {code}
> 2022-01-26T06:04:43.2899378Z Jan 26 06:04:43 [ERROR] Failures:
> 2022-01-26T06:04:43.2907942Z Jan 26 06:04:43 [ERROR]   
> MySqlCatalogITCase.testFullPath:306 expected: java.util.ArrayList<[+I[1, -1, 
> 1, null, true,
> 2022-01-26T06:04:43.2914065Z Jan 26 06:04:43 [ERROR]   
> MySqlCatalogITCase.testGetTable:253 expected:<(
> 2022-01-26T06:04:43.2983567Z Jan 26 06:04:43 [ERROR]   
> MySqlCatalogITCase.testSelectToInsert:323 expected: 
> java.util.ArrayList<[+I[1, -1, 1, null,
> 2022-01-26T06:04:43.2997373Z Jan 26 06:04:43 [ERROR]   
> MySqlCatalogITCase.testWithoutCatalog:291 expected: 
> java.util.ArrayList<[+I[1, -1, 1, null,
> 2022-01-26T06:04:43.3010450Z Jan 26 06:04:43 [ERROR]   
> MySqlCatalogITCase.testWithoutCatalogDB:278 expected: 
> java.util.ArrayList<[+I[1, -1, 1, nul
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (FLINK-25968) Possible class leaks in flink-table / sql modules

2022-02-08 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-25968:
-

Assignee: Timo Walther

> Possible class leaks in flink-table / sql modules
> -
>
> Key: FLINK-25968
> URL: https://issues.apache.org/jira/browse/FLINK-25968
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Table SQL / Runtime
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.15.0
>
>
> This is the umbrella issues for possible class leaks in flink-table / sql 
> planner and runtimes.
> Currently for a flink cluster, the flink classes are loaded by the system 
> ClassLoader while each job would have separate user ClassLoaders. In this 
> case, if some class loaded by the system ClassLoader has static variables 
> that reference the classes loaded by the user  ClassLoaders, the user 
> ClassLoaders would not be able to be released, which might cause class leak 
> in some way. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25968) Possible class leaks in flink-table / sql modules

2022-02-08 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-25968:
--
Priority: Blocker  (was: Critical)

> Possible class leaks in flink-table / sql modules
> -
>
> Key: FLINK-25968
> URL: https://issues.apache.org/jira/browse/FLINK-25968
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Table SQL / Runtime
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Priority: Blocker
> Fix For: 1.15.0
>
>
> This is the umbrella issues for possible class leaks in flink-table / sql 
> planner and runtimes.
> Currently for a flink cluster, the flink classes are loaded by the system 
> ClassLoader while each job would have separate user ClassLoaders. In this 
> case, if some class loaded by the system ClassLoader has static variables 
> that reference the classes loaded by the user  ClassLoaders, the user 
> ClassLoaders would not be able to be released, which might cause class leak 
> in some way. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25968) Possible class leaks in flink-table / sql modules

2022-02-08 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-25968:
--
Fix Version/s: 1.15.0

> Possible class leaks in flink-table / sql modules
> -
>
> Key: FLINK-25968
> URL: https://issues.apache.org/jira/browse/FLINK-25968
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Table SQL / Runtime
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Priority: Critical
> Fix For: 1.15.0
>
>
> This is the umbrella issues for possible class leaks in flink-table / sql 
> planner and runtimes.
> Currently for a flink cluster, the flink classes are loaded by the system 
> ClassLoader while each job would have separate user ClassLoaders. In this 
> case, if some class loaded by the system ClassLoader has static variables 
> that reference the classes loaded by the user  ClassLoaders, the user 
> ClassLoaders would not be able to be released, which might cause class leak 
> in some way. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25968) Possible class leaks in flink-table / sql modules

2022-02-08 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-25968:
--
Priority: Critical  (was: Major)

> Possible class leaks in flink-table / sql modules
> -
>
> Key: FLINK-25968
> URL: https://issues.apache.org/jira/browse/FLINK-25968
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner, Table SQL / Runtime
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Priority: Critical
>
> This is the umbrella issues for possible class leaks in flink-table / sql 
> planner and runtimes.
> Currently for a flink cluster, the flink classes are loaded by the system 
> ClassLoader while each job would have separate user ClassLoaders. In this 
> case, if some class loaded by the system ClassLoader has static variables 
> that reference the classes loaded by the user  ClassLoaders, the user 
> ClassLoaders would not be able to be released, which might cause class leak 
> in some way. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-23240) ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper fails on azure

2022-02-07 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-23240:
--
Component/s: Runtime / Coordination

> ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper
>  fails on azure
> -
>
> Key: FLINK-23240
> URL: https://issues.apache.org/jira/browse/FLINK-23240
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / Coordination
>Affects Versions: 1.14.0
>Reporter: Xintong Song
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19872&view=logs&j=b0a398c0-685b-599c-eb57-c8c2a771138e&t=d13f554f-d4b9-50f8-30ee-d49c6fb0b3cc&l=10186
> {code}
> Jul 04 22:17:29 [ERROR] Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 91.407 s <<< FAILURE! - in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Jul 04 22:17:29 [ERROR] 
> testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase)
>   Time elapsed: 31.356 s  <<< ERROR!
> Jul 04 22:17:29 java.util.concurrent.ExecutionException: 
> java.util.concurrent.TimeoutException: Invocation of public abstract 
> java.util.concurrent.CompletableFuture 
> org.apache.flink.runtime.webmonitor.RestfulGateway.cancelJob(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time)
>  timed out.
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> Jul 04 22:17:29   at 
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:303)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:275)
> Jul 04 22:17:29   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsWithLocalRecoveryZookeeper(ResumeCheckpointManuallyITCase.java:215)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 04 22:17:29   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 04 22:17:29   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 04 22:17:29   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 04 22:17:29   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 04 22:17:29   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Jul 04 22:17:29   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Jul 04 22:17:29   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Jul 04 22:17:29   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Jul 04 22:17:29   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Jul 04 22:17:29   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Jul 04 22:17:29  

[jira] [Updated] (FLINK-25992) JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership fails on azure

2022-02-07 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-25992:
--
Summary: 
JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership 
fails on azure  (was: 
JobDispatcherITCase..testRecoverFromCheckpointAfterLosingAndRegainingLeadership 
fails on azure)

> JobDispatcherITCase.testRecoverFromCheckpointAfterLosingAndRegainingLeadership
>  fails on azure
> -
>
> Key: FLINK-25992
> URL: https://issues.apache.org/jira/browse/FLINK-25992
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.15.0
>Reporter: Roman Khachatryan
>Priority: Major
>  Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30871&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9154
> {code}
> 19:41:35,515 [flink-akka.actor.default-dispatcher-9] WARN  
> org.apache.flink.runtime.taskmanager.Task[] - jobVertex 
> (1/1)#0 (7efdea21f5f95490e02117063ce8a314) switched from RUNNING to FAILED 
> with failure cause: java.lang.RuntimeException: Error while notify checkpoint 
> ABORT.
>   at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1457)
>   at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpointAborted(Task.java:1407)
>   at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.abortCheckpoint(TaskExecutor.java:1021)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
>   at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
>   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
>   at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
>   at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
>   at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
>   at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>   at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
>   at akka.actor.Actor.aroundReceive(Actor.scala:537)
>   at akka.actor.Actor.aroundReceive$(Actor.scala:535)
>   at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:548)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:231)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>   at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> Caused by: java.lang.UnsupportedOperationException: 
> notifyCheckpointAbortAsync not supported by 
> org.apache.flink.runtime.dispatcher.JobDispatcherITCase$AtLeastOneCheckpointInvokable
>   at 
> org.apache.flink.runtime.jobgraph.tasks.AbstractInvokable.notifyCheckpointAbortAsync(AbstractInvokable.java:205)
>   at 
> org.apache.flink.runtime.taskmanager.Task.notifyCheckpoint(Task.java:1430)
>   ... 31 more
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-25935) Add a harness based entry point to simply getting started.

2022-02-07 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-25935.
-
Resolution: Fixed

Fixed via 66b8cd0453990b4e56a48392388391dd4e6eafa0

> Add a harness based entry point to simply getting started.
> --
>
> Key: FLINK-25935
> URL: https://issues.apache.org/jira/browse/FLINK-25935
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
>
> It would be nicer to improve the getting started experience by providing an 
> additional entry point in the StateFun distribution that is built on the 
> Harness.
> This can be as simple as providing a Main function that configures RocksDB 
> and starts the StateFun Flink job.
> The rest of the configurations needs to come from the module.yaml
>  
> Having something like that will allow us to simplfy the playground even 
> further by reducing the start time and the memory requirements for a 
> docker-compose based example.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-25934) Modernize statefun playground examples

2022-02-07 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-25934.
-
Resolution: Fixed

Fixed via

5f082918129f4253797fb698adfe5f16bbc94166
726911f97a8b729bfd20cb366dc636be0c623f05
d6f6da3d77ad00fa8e26086a6d13551a3f55b928

> Modernize statefun playground examples
> --
>
> Key: FLINK-25934
> URL: https://issues.apache.org/jira/browse/FLINK-25934
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
>
> It is about time to touch up abit the examples in playground.
> Most of the docker-compose/docker files are pretty old and there are a lot of 
> room for improvement.
>  # use redpanda instead of kafka+zk - from local experiments it seems to cut 
> the start time and the memory requirements significantly. In addition it also 
> comes with a REST proxy, which can improve the interactivity with the 
> examples quite a lot.
>  # For the Java examples, there is no reason to use java8 for the remote 
> functions. We can use at least 11, if not higher.
>  # Replace the pair of a JobManager+TaskManager by a simple Minicluster 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25981) ZooKeeperMultipleComponentLeaderElectionDriverTest failed

2022-02-07 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488138#comment-17488138
 ] 

Till Rohrmann commented on FLINK-25981:
---

Hmm, from the logs it looks as if there is no new leader elected after the 
first election driver is closed. Unfortunately, the logs for ZooKeeper and 
Curator are disabled. Therefore, there is not a lot more to extract from the 
logs. I've tried reproducing the problem locally. This was unsuccessful so far. 
Maybe you can upload the logs for the failed run [~mapohl] with ZooKeeper and 
Curator logging enabled.

> ZooKeeperMultipleComponentLeaderElectionDriverTest failed
> -
>
> Key: FLINK-25981
> URL: https://issues.apache.org/jira/browse/FLINK-25981
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> We experienced a [build 
> failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30783&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=15997]
>  in 
> {{ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers}}.
>  The test halted when waiting for the next leader in 
> [ZooKeeperMultipleComponentLeaderElectionDriverTest:256|https://github.com/apache/flink/blob/e8742f7f5cac34852d0e621036e1614bbdfe8ec3/flink-runtime/src/test/java/org/apache/flink/runtime/leaderelection/ZooKeeperMultipleComponentLeaderElectionDriverTest.java#L256]
> {code}
> Feb 04 18:02:54 "main" #1 prio=5 os_prio=0 tid=0x7fab0800b800 nid=0xe07 
> waiting on condition [0x7fab12574000]
> Feb 04 18:02:54java.lang.Thread.State: WAITING (parking)
> Feb 04 18:02:54   at sun.misc.Unsafe.park(Native Method)
> Feb 04 18:02:54   - parking to wait for  <0x8065c5c8> (a 
> java.util.concurrent.CompletableFuture$Signaller)
> Feb 04 18:02:54   at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> Feb 04 18:02:54   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> Feb 04 18:02:54   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> Feb 04 18:02:54   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> Feb 04 18:02:54   at 
> java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
> Feb 04 18:02:54   at 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers(ZooKeeperMultipleComponentLeaderElectionDriverTest.java:256)
> [...]
> {code}
> The extended Maven logs indicate that the timeout happened while waiting for 
> the second leader to be selected.
> {code}
> Test 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriverTest.testLeaderElectionWithMultipleDrivers
>  is running.
> 
> 17:15:10,437 [   Thread-16] INFO  
> org.apache.curator.test.TestingZooKeeperMain [] - Starting 
> server
> 17:15:10,450 [main] INFO  
> org.apache.flink.runtime.util.ZooKeeperUtils [] - Enforcing 
> default ACL for ZK connections
> 17:15:10,451 [main] INFO  
> org.apache.flink.runtime.util.ZooKeeperUtils [] - Using 
> '/flink/default' as Zookeeper namespace.
> 17:15:10,452 [main] INFO  
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl
>  [] - Starting
> 17:15:10,455 [main] INFO  
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl
>  [] - Default schema
> 17:15:10,462 [main-EventThread] INFO  
> org.apache.flink.shaded.curator5.org.apache.curator.framework.state.ConnectionStateManager
>  [] - State change: CONNECTED
> 17:15:10,467 [main-EventThread] INFO  
> org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker
>  [] - New config event received: {}
> 17:15:10,482 [Curator-ConnectionStateManager-0] DEBUG 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
>  [] - Connected to ZooKeeper quorum. Leader election can start.
> 17:15:10,483 [Curator-ConnectionStateManager-0] DEBUG 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
>  [] - Connected to ZooKeeper quorum. Leader election can start.
> 17:15:10,483 [Curator-ConnectionStateManager-0] DEBUG 
> org.apache.flink.runtime.leaderelection.ZooKeeperMultipleComponentLeaderElectionDriver
>  [] - Connected to ZooKeeper quorum. Leader e

[jira] [Assigned] (FLINK-25934) Modernize statefun playground examples

2022-02-07 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-25934:
-

Assignee: Till Rohrmann

> Modernize statefun playground examples
> --
>
> Key: FLINK-25934
> URL: https://issues.apache.org/jira/browse/FLINK-25934
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Till Rohrmann
>Priority: Major
>
> It is about time to touch up abit the examples in playground.
> Most of the docker-compose/docker files are pretty old and there are a lot of 
> room for improvement.
>  # use redpanda instead of kafka+zk - from local experiments it seems to cut 
> the start time and the memory requirements significantly. In addition it also 
> comes with a REST proxy, which can improve the interactivity with the 
> examples quite a lot.
>  # For the Java examples, there is no reason to use java8 for the remote 
> functions. We can use at least 11, if not higher.
>  # Replace the pair of a JobManager+TaskManager by a simple Minicluster 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-25893) ResourceManagerServiceImpl's lifecycle can lead to exceptions

2022-02-07 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488033#comment-17488033
 ] 

Till Rohrmann commented on FLINK-25893:
---

For option 1) I think a leading {{Dispatcher}} would decide that it is now time 
to shut down and to deregister the application. Then the {{ClusterEntrypoint}} 
would get the signal and initiate the deregistration.

For option 2): Assuming that eventually a {{Dispatcher}} becomes leader that is 
running in the same process as the leading {{RM}} which then triggers the shut 
down, I think this can work. Moreover with FLINK-24038 the problem of a leading 
RM and {{Dispatcher}} running in different processes should no longer happen.

> ResourceManagerServiceImpl's lifecycle can lead to exceptions
> -
>
> Key: FLINK-25893
> URL: https://issues.apache.org/jira/browse/FLINK-25893
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0, 1.14.3
>Reporter: Till Rohrmann
>Assignee: Xintong Song
>Priority: Critical
>  Labels: pull-request-available
>
> The {{ResourceManagerServiceImpl}} lifecycle can lead to exceptions when 
> calling {{ResourceManagerServiceImpl.deregisterApplication}}. The problem 
> arises when the {{DispatcherResourceManagerComponent}} is shutdown before the 
> {{ResourceManagerServiceImpl}} gains leadership or while it is starting the 
> {{ResourceManager}}.
> One problem is that {{deregisterApplication}} returns an exceptionally 
> completed future if there is no leading {{ResourceManager}}.
> Another problem is that if there is a leading {{ResourceManager}}, then it 
> can still be the case that it has not been started yet. If this is the case, 
> then 
> [ResourceManagerGateway.deregisterApplication|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImpl.java#L143]
>  will be discarded. The reason for this behaviour is that we create a 
> {{ResourceManager}} in one {{Runnable}} and only start it in another. Due to 
> this there can be the {{deregisterApplication}} call that gets the {{lock}} 
> in between.
> I'd suggest to correct the lifecycle and contract of the 
> {{ResourceManagerServiceImpl.deregisterApplication}}.
> Please note that due to this problem, the error reporting of this method has 
> been suppressed. See FLINK-25885 for more details.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25924) KDF Integration tests intermittently fails

2022-02-07 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-25924:
--
Affects Version/s: 1.15.0

> KDF Integration tests intermittently fails
> --
>
> Key: FLINK-25924
> URL: https://issues.apache.org/jira/browse/FLINK-25924
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Affects Versions: 1.15.0
>Reporter: Zichen Liu
>Assignee: Ahmed Hamdy
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> Intermittent failures introduced as part of merge (PR#18314: 
> [FLINK-24228[connectors/firehose] - Unified Async Sink for Kinesis 
> Firehose|https://github.com/apache/flink/pull/18314]):
>  # Failures are intermittent and affecting c. 1 in 7 of builds- on 
> {{flink-ci.flink}} and {{flink-ci.flink-master-mirror}} .
>  # The issue looks identical to the KinesaliteContainer startup issue 
> (Appendix 1).
>  # I have managed to reproduce the issue locally - if I start some parallel 
> containers and keep them running - and then run {{KinesisFirehoseSinkITCase}} 
>  then c. 1 in 6 gives the error.
>  # The errors have a slightly different appearance on 
> {{flink-ci.flink-master-mirror}} vs {{flink-ci.flink}} which has the same 
> appearance as local. I only hope it is a difference in logging/killing 
> environment variables. (and that there aren’t 2 distinct issues)
> Appendix 1:
> {code:java}
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed
> at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336)
> at 
> org.testcontainers.containers.GenericContainer.start(GenericContainer.java:317)
> at 
> org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1066)
> at 
> ... 11 more
> Caused by: org.testcontainers.containers.ContainerLaunchException: Could not 
> create/start container
> at 
> org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525)
> at 
> org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:331)
> at 
> org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
> ... 12 more
> Caused by: org.rnorth.ducttape.TimeoutException: Timeout waiting for result 
> with exception
> at 
> org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:54)
> at
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-25924) KDF Integration tests intermittently fails

2022-02-07 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-25924:
--
Fix Version/s: (was: 1.15.0)

> KDF Integration tests intermittently fails
> --
>
> Key: FLINK-25924
> URL: https://issues.apache.org/jira/browse/FLINK-25924
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Common
>Reporter: Zichen Liu
>Assignee: Ahmed Hamdy
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> Intermittent failures introduced as part of merge (PR#18314: 
> [FLINK-24228[connectors/firehose] - Unified Async Sink for Kinesis 
> Firehose|https://github.com/apache/flink/pull/18314]):
>  # Failures are intermittent and affecting c. 1 in 7 of builds- on 
> {{flink-ci.flink}} and {{flink-ci.flink-master-mirror}} .
>  # The issue looks identical to the KinesaliteContainer startup issue 
> (Appendix 1).
>  # I have managed to reproduce the issue locally - if I start some parallel 
> containers and keep them running - and then run {{KinesisFirehoseSinkITCase}} 
>  then c. 1 in 6 gives the error.
>  # The errors have a slightly different appearance on 
> {{flink-ci.flink-master-mirror}} vs {{flink-ci.flink}} which has the same 
> appearance as local. I only hope it is a difference in logging/killing 
> environment variables. (and that there aren’t 2 distinct issues)
> Appendix 1:
> {code:java}
> org.testcontainers.containers.ContainerLaunchException: Container startup 
> failed
> at 
> org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336)
> at 
> org.testcontainers.containers.GenericContainer.start(GenericContainer.java:317)
> at 
> org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1066)
> at 
> ... 11 more
> Caused by: org.testcontainers.containers.ContainerLaunchException: Could not 
> create/start container
> at 
> org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:525)
> at 
> org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:331)
> at 
> org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
> ... 12 more
> Caused by: org.rnorth.ducttape.TimeoutException: Timeout waiting for result 
> with exception
> at 
> org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:54)
> at
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-25885) ClusterEntrypointTest.testWorkingDirectoryIsDeletedIfApplicationCompletes failed on azure

2022-02-03 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-25885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-25885.
-
Fix Version/s: 1.15.0
   Resolution: Fixed

Fixed via 59d2d84d83696d4775b6ff3ec97f8274ff6e371f

> ClusterEntrypointTest.testWorkingDirectoryIsDeletedIfApplicationCompletes  
> failed on azure
> --
>
> Key: FLINK-25885
> URL: https://issues.apache.org/jira/browse/FLINK-25885
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> 2022-01-31T05:00:07.3113870Z Jan 31 05:00:07 
> java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException: Discard 
> message, because the rpc endpoint 
> akka.tcp://flink@127.0.0.1:6123/user/rpc/resourcemanager_2 has not been 
> started yet.
> 2022-01-31T05:00:07.3115008Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> 2022-01-31T05:00:07.3115778Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> 2022-01-31T05:00:07.3116527Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
> 2022-01-31T05:00:07.3117267Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> 2022-01-31T05:00:07.3118011Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2022-01-31T05:00:07.3118770Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> 2022-01-31T05:00:07.3119608Z Jan 31 05:00:07  at 
> org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:251)
> 2022-01-31T05:00:07.3120425Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2022-01-31T05:00:07.3121199Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2022-01-31T05:00:07.3121957Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2022-01-31T05:00:07.3122716Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> 2022-01-31T05:00:07.3123457Z Jan 31 05:00:07  at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1387)
> 2022-01-31T05:00:07.3124241Z Jan 31 05:00:07  at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
> 2022-01-31T05:00:07.3125106Z Jan 31 05:00:07  at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> 2022-01-31T05:00:07.3126063Z Jan 31 05:00:07  at 
> org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> 2022-01-31T05:00:07.3127207Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 2022-01-31T05:00:07.3127982Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 2022-01-31T05:00:07.3128741Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 2022-01-31T05:00:07.3129497Z Jan 31 05:00:07  at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> 2022-01-31T05:00:07.3130385Z Jan 31 05:00:07  at 
> org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:45)
> 2022-01-31T05:00:07.3131092Z Jan 31 05:00:07  at 
> akka.dispatch.OnComplete.internal(Future.scala:299)
> 2022-01-31T05:00:07.3131695Z Jan 31 05:00:07  at 
> akka.dispatch.OnComplete.internal(Future.scala:297)
> 2022-01-31T05:00:07.3132310Z Jan 31 05:00:07  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
> 2022-01-31T05:00:07.3132943Z Jan 31 05:00:07  at 
> akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
> 2022-01-31T05:00:07.3133577Z Jan 31 05:00:07  at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
> 2022-01-31T05:00:07.3134340Z Jan 31 05:00:07  at 
> org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
> 2022-01-31T05:00:07.3135149Z Jan 31 05:00:07  at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
> 2022-01-31T05:00:07.313589

[jira] [Comment Edited] (FLINK-21752) NullPointerException on restore in PojoSerializer

2022-02-03 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486454#comment-17486454
 ] 

Till Rohrmann edited comment on FLINK-21752 at 2/3/22, 1:03 PM:


Yes, I agree. This is quite bad given that we provide [schema evolution for 
POJO 
types|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/fault-tolerance/serialization/schema_evolution/#supported-data-types-for-schema-evolution].
 Let's fix it in all currently supported releases.


was (Author: till.rohrmann):
Yes, I agree. This is quite bad given that we provide [schema evolution for 
POJO 
types|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/fault-tolerance/serialization/schema_evolution/#supported-data-types-for-schema-evolution].

> NullPointerException on restore in PojoSerializer
> -
>
> Key: FLINK-21752
> URL: https://issues.apache.org/jira/browse/FLINK-21752
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: 1.15.0, 1.13.5, 1.14.3
>Reporter: Roman Khachatryan
>Assignee: Dawid Wysakowicz
>Priority: Blocker
>  Labels: pull-request-available
>
> As originally reported in 
> [thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Schema-Evolution-Cannot-restore-from-savepoint-after-deleting-field-from-POJO-td42162.html],
>  after removing a field from a class restore from savepoint fails with the 
> following exception:
> {code}
> 2021-03-10T20:51:30.406Z INFO  org.apache.flink.runtime.taskmanager.Task:960 
> … (6/8) (d630d5ff0d7ae4fbc428b151abebab52) switched from RUNNING to FAILED. 
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:901)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:415)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for 
> KeyedCoProcessOperator_c535ac415eeb524d67c88f4a481077d2_(6/8) from any of the 
> 1 provided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
> ... 6 common frames omitted
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:116)
> at 
> org.apache.flink.runtime.state.memory.MemoryStateBackend.createKeyedStateBackend(MemoryStateBackend.java:347)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
> ... 8 common frames omitted
> Caused by: java.lang.NullPointerException: null
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.(PojoSerializer.java:123)
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.duplicate(PojoSerializer.java:186)
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.duplicate(PojoSerializer.java:56)
> at 
> org.apache.flink.api.common.typeutils.CompositeSerializer$PrecomputedParameters.precompute(CompositeSerializer.java:228)
> at 
> org.apache.flink.api.common.typeutils.CompositeSerializer.(CompositeSerializer.java:51)
> at 
> org.apache.flink.runtime.state.ttl.TtlStateFactory$TtlSerializer.(TtlStateFactory.java:250)
> at 
> org.apache.flink.runtime.state.ttl.TtlStateFactory$TtlSerializerSnaps

  1   2   3   4   5   6   7   8   9   10   >