[jira] [Commented] (FLINK-23556) SQLClientSchemaRegistryITCase fails with " Subject ... not found"

2021-08-21 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402730#comment-17402730
 ] 

Biao Geng commented on FLINK-23556:
---

Hi [~xtsong], I think my pull request is ready. I ran the e2e tests 3 times 
and this case works fine. I would appreciate it a lot if you could help me 
find a reviewer for it. Thanks! 

The PR link: [https://github.com/apache/flink/pull/16864]

> SQLClientSchemaRegistryITCase fails with " Subject ... not found"
> -
>
> Key: FLINK-23556
> URL: https://issues.apache.org/jira/browse/FLINK-23556
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Ecosystem
>Affects Versions: 1.14.0
>Reporter: Dawid Wysakowicz
>Assignee: Biao Geng
>Priority: Blocker
>  Labels: pull-request-available, stale-blocker, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21129&view=logs&j=91bf6583-3fb2-592f-e4d4-d79d79c3230a&t=cc5499f8-bdde-5157-0d76-b6528ecd808e&l=25337
> {code}
> Jul 28 23:37:48 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 209.44 s <<< FAILURE! - in 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
> Jul 28 23:37:48 [ERROR] 
> testWriting(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
> Time elapsed: 81.146 s  <<< ERROR!
> Jul 28 23:37:48 
> io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: 
> Subject 'test-user-behavior-d18d4af2-3830-4620-9993-340c13f50cc2-value' not 
> found.; error code: 40401
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:769)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:760)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getAllVersions(CachedSchemaRegistryClient.java:364)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.getAllVersions(SQLClientSchemaRegistryITCase.java:230)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testWriting(SQLClientSchemaRegistryITCase.java:195)
> Jul 28 23:37:48   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jul 28 23:37:48   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 28 23:37:48   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 28 23:37:48   at java.lang.reflect.Method.invoke(Method.java:498)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> Jul 28 23:37:48   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Jul 28 23:37:48   at java.lang.Thread.run(Thread.java:748)
> Jul 28 23:37:48 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-23556) SQLClientSchemaRegistryITCase fails with " Subject ... not found"

2021-08-18 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401174#comment-17401174
 ] 

Biao Geng edited comment on FLINK-23556 at 8/18/21, 4:09 PM:
-

Hi [~xtsong], I may need one more day to verify the PR after today's 
discussion with [~renqs].
 Once I think it is ready, I will contact you for a review.

Thanks!


was (Author: bgeng777):
Hi [~xtsong], I may need one day to verify the PR after today's discussion 
with [~renqs].
Once I think it is ready, I will contact you for a review.

Thanks!

> SQLClientSchemaRegistryITCase fails with " Subject ... not found"
> -
>
> Key: FLINK-23556
> URL: https://issues.apache.org/jira/browse/FLINK-23556
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Ecosystem
>Affects Versions: 1.14.0
>Reporter: Dawid Wysakowicz
>Assignee: Biao Geng
>Priority: Blocker
>  Labels: pull-request-available, stale-blocker, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21129&view=logs&j=91bf6583-3fb2-592f-e4d4-d79d79c3230a&t=cc5499f8-bdde-5157-0d76-b6528ecd808e&l=25337
> {code}
> Jul 28 23:37:48 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 209.44 s <<< FAILURE! - in 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
> Jul 28 23:37:48 [ERROR] 
> testWriting(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
> Time elapsed: 81.146 s  <<< ERROR!
> Jul 28 23:37:48 
> io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: 
> Subject 'test-user-behavior-d18d4af2-3830-4620-9993-340c13f50cc2-value' not 
> found.; error code: 40401
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:769)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:760)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getAllVersions(CachedSchemaRegistryClient.java:364)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.getAllVersions(SQLClientSchemaRegistryITCase.java:230)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testWriting(SQLClientSchemaRegistryITCase.java:195)
> Jul 28 23:37:48   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jul 28 23:37:48   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 28 23:37:48   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 28 23:37:48   at java.lang.reflect.Method.invoke(Method.java:498)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> Jul 28 23:37:48   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Jul 28 23:37:48   at java.lang.Thread.run(Thread.java:748)
> Jul 28 23:37:48 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-23556) SQLClientSchemaRegistryITCase fails with " Subject ... not found"

2021-08-18 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401174#comment-17401174
 ] 

Biao Geng commented on FLINK-23556:
---

Hi [~xtsong], I may need one day to verify the PR after today's discussion 
with [~renqs].
Once I think it is ready, I will contact you for a review.

Thanks!

> SQLClientSchemaRegistryITCase fails with " Subject ... not found"
> -
>
> Key: FLINK-23556
> URL: https://issues.apache.org/jira/browse/FLINK-23556
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Ecosystem
>Affects Versions: 1.14.0
>Reporter: Dawid Wysakowicz
>Assignee: Biao Geng
>Priority: Blocker
>  Labels: pull-request-available, stale-blocker, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21129&view=logs&j=91bf6583-3fb2-592f-e4d4-d79d79c3230a&t=cc5499f8-bdde-5157-0d76-b6528ecd808e&l=25337
> {code}
> Jul 28 23:37:48 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 209.44 s <<< FAILURE! - in 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
> Jul 28 23:37:48 [ERROR] 
> testWriting(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
> Time elapsed: 81.146 s  <<< ERROR!
> Jul 28 23:37:48 
> io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: 
> Subject 'test-user-behavior-d18d4af2-3830-4620-9993-340c13f50cc2-value' not 
> found.; error code: 40401
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:769)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:760)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getAllVersions(CachedSchemaRegistryClient.java:364)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.getAllVersions(SQLClientSchemaRegistryITCase.java:230)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testWriting(SQLClientSchemaRegistryITCase.java:195)
> Jul 28 23:37:48   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jul 28 23:37:48   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 28 23:37:48   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 28 23:37:48   at java.lang.reflect.Method.invoke(Method.java:498)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> Jul 28 23:37:48   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Jul 28 23:37:48   at java.lang.Thread.run(Thread.java:748)
> Jul 28 23:37:48 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-23556) SQLClientSchemaRegistryITCase fails with " Subject ... not found"

2021-08-17 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400465#comment-17400465
 ] 

Biao Geng commented on FLINK-23556:
---

Hi [~xtsong], thanks a lot for your support! I have been digging into this IT 
case for some days and created a [PR|https://github.com/apache/flink/pull/16864] 
to make it more stable. However, my current fix does not actually guarantee the 
success of this case; it should just make it more stable.

My summary:
 * AFAIK, the logic and implementation of this case are correct.
 * Given the complexity of this case, its running time should not be limited to 
120 seconds. Based on my own tests, even the running time of the internal Flink 
job alone can easily exceed that limit.
 * I believe the failure of this case is not caused by Flink itself. Instead, I 
suspect that networking issues among the test containers (i.e. a Kafka 
container, a Flink container and a Schema Registry container) are more likely 
to be the root cause, but I need more time to verify this guess. One way to 
make the subject lookup more tolerant of timing is sketched below.
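
For reference, a minimal, hypothetical sketch (not the code from my PR; the 
timeout handling and helper name are my own assumptions) of retrying the 
subject lookup instead of failing on the first 40401:
{code}
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException;

import java.util.List;

public final class SubjectLookupRetry {
    // Poll getAllVersions() until the subject shows up or the deadline
    // expires, instead of failing on the first "Subject not found" (40401).
    public static List<Integer> getAllVersionsWithRetry(
            SchemaRegistryClient client, String subject, long timeoutMillis)
            throws Exception {
        final long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                return client.getAllVersions(subject);
            } catch (RestClientException e) {
                // 40401 means the subject is not registered yet; keep polling,
                // since the writing job may not have produced a record yet.
                if (e.getErrorCode() != 40401
                        || System.currentTimeMillis() > deadline) {
                    throw e;
                }
                Thread.sleep(1_000L);
            }
        }
    }
}
{code}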

 

> SQLClientSchemaRegistryITCase fails with " Subject ... not found"
> -
>
> Key: FLINK-23556
> URL: https://issues.apache.org/jira/browse/FLINK-23556
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Ecosystem
>Affects Versions: 1.14.0
>Reporter: Dawid Wysakowicz
>Assignee: Biao Geng
>Priority: Blocker
>  Labels: pull-request-available, stale-blocker, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21129&view=logs&j=91bf6583-3fb2-592f-e4d4-d79d79c3230a&t=cc5499f8-bdde-5157-0d76-b6528ecd808e&l=25337
> {code}
> Jul 28 23:37:48 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 209.44 s <<< FAILURE! - in 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
> Jul 28 23:37:48 [ERROR] 
> testWriting(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
> Time elapsed: 81.146 s  <<< ERROR!
> Jul 28 23:37:48 
> io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: 
> Subject 'test-user-behavior-d18d4af2-3830-4620-9993-340c13f50cc2-value' not 
> found.; error code: 40401
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:769)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:760)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getAllVersions(CachedSchemaRegistryClient.java:364)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.getAllVersions(SQLClientSchemaRegistryITCase.java:230)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testWriting(SQLClientSchemaRegistryITCase.java:195)
> Jul 28 23:37:48   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jul 28 23:37:48   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 28 23:37:48   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 28 23:37:48   at java.lang.reflect.Method.invoke(Method.java:498)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> Jul 28 23:37:48   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Jul 28 23:37:48   at java.lang.Thread.run(Thread.java:748)
> Jul 28 23:37:48 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-23556) SQLClientSchemaRegistryITCase fails with " Subject ... not found"

2021-08-23 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402730#comment-17402730
 ] 

Biao Geng edited comment on FLINK-23556 at 8/24/21, 4:29 AM:
-

Hi [~xtsong], I think my pull request is ready. I ran the e2e tests 3 times 
and this case works fine. I would appreciate it a lot if you could help me 
find a reviewer for it. Thanks! 

The PR link: [https://github.com/apache/flink/pull/16952]


was (Author: bgeng777):
Hi [~xtsong], I think my pull request is ready. I ran the e2e tests 3 times 
and this case works fine. I would appreciate it a lot if you could help me 
find a reviewer for it. Thanks! 

The PR link: [https://github.com/apache/flink/pull/16864]

> SQLClientSchemaRegistryITCase fails with " Subject ... not found"
> -
>
> Key: FLINK-23556
> URL: https://issues.apache.org/jira/browse/FLINK-23556
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Ecosystem
>Affects Versions: 1.14.0
>Reporter: Dawid Wysakowicz
>Assignee: Biao Geng
>Priority: Blocker
>  Labels: pull-request-available, stale-blocker, test-stability
> Fix For: 1.14.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21129&view=logs&j=91bf6583-3fb2-592f-e4d4-d79d79c3230a&t=cc5499f8-bdde-5157-0d76-b6528ecd808e&l=25337
> {code}
> Jul 28 23:37:48 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 209.44 s <<< FAILURE! - in 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
> Jul 28 23:37:48 [ERROR] 
> testWriting(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
> Time elapsed: 81.146 s  <<< ERROR!
> Jul 28 23:37:48 
> io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: 
> Subject 'test-user-behavior-d18d4af2-3830-4620-9993-340c13f50cc2-value' not 
> found.; error code: 40401
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:769)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:760)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getAllVersions(CachedSchemaRegistryClient.java:364)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.getAllVersions(SQLClientSchemaRegistryITCase.java:230)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testWriting(SQLClientSchemaRegistryITCase.java:195)
> Jul 28 23:37:48   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jul 28 23:37:48   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 28 23:37:48   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 28 23:37:48   at java.lang.reflect.Method.invoke(Method.java:498)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> Jul 28 23:37:48   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Jul 28 23:37:48   at java.lang.Thread.run(Thread.java:748)
> Jul 28 23:37:48 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-23556) SQLClientSchemaRegistryITCase fails with " Subject ... not found"

2021-08-10 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396751#comment-17396751
 ] 

Biao Geng commented on FLINK-23556:
---

I have read the test code and I suspect that the port in `avro-confluent.url` 
may be wrong due to the container's port mapping. I will discuss this with 
[~renqs] to verify the guess. A sketch of the suspected mix-up is below.
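
To illustrate the suspicion, here is a hedged sketch; the image, network alias, 
and ports are illustrative assumptions, not the test's actual values:
{code}
import org.testcontainers.containers.GenericContainer;

// A Schema Registry container listens on a fixed port inside the Docker
// network, but the host only sees a randomly mapped port. Pointing
// 'avro-confluent.url' at the wrong one of the two yields lookups against
// a port where no registry is listening.
GenericContainer<?> schemaRegistry =
        new GenericContainer<>("confluentinc/cp-schema-registry:6.2.2")
                .withNetworkAliases("schema-registry") // resolvable only on a shared Docker network
                .withExposedPorts(8081);
schemaRegistry.start();

// URL for test code running on the host:
String hostUrl =
        "http://" + schemaRegistry.getHost() + ":" + schemaRegistry.getMappedPort(8081);

// URL for the Flink job running inside the same Docker network:
String inNetworkUrl = "http://schema-registry:8081";
{code}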

> SQLClientSchemaRegistryITCase fails with " Subject ... not found"
> -
>
> Key: FLINK-23556
> URL: https://issues.apache.org/jira/browse/FLINK-23556
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Ecosystem
>Affects Versions: 1.14.0
>Reporter: Dawid Wysakowicz
>Assignee: Biao Geng
>Priority: Blocker
>  Labels: stale-blocker, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21129&view=logs&j=91bf6583-3fb2-592f-e4d4-d79d79c3230a&t=cc5499f8-bdde-5157-0d76-b6528ecd808e&l=25337
> {code}
> Jul 28 23:37:48 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 209.44 s <<< FAILURE! - in 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
> Jul 28 23:37:48 [ERROR] 
> testWriting(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
> Time elapsed: 81.146 s  <<< ERROR!
> Jul 28 23:37:48 
> io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: 
> Subject 'test-user-behavior-d18d4af2-3830-4620-9993-340c13f50cc2-value' not 
> found.; error code: 40401
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:769)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:760)
> Jul 28 23:37:48   at 
> io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getAllVersions(CachedSchemaRegistryClient.java:364)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.getAllVersions(SQLClientSchemaRegistryITCase.java:230)
> Jul 28 23:37:48   at 
> org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testWriting(SQLClientSchemaRegistryITCase.java:195)
> Jul 28 23:37:48   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jul 28 23:37:48   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 28 23:37:48   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Jul 28 23:37:48   at java.lang.reflect.Method.invoke(Method.java:498)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Jul 28 23:37:48   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> Jul 28 23:37:48   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> Jul 28 23:37:48   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Jul 28 23:37:48   at java.lang.Thread.run(Thread.java:748)
> Jul 28 23:37:48 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-18 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446258#comment-17446258
 ] 

Biao Geng commented on FLINK-24897:
---

Hi [~trohrmann], would you mind commenting on whether we promise that jars 
under {{usrlib}} will always be loaded by the user classloader?
Besides, after the above discussion with Yang, the current solution is as 
follows (a sketch of point 2 follows after this list):
1. {{usrlib}} will be shipped automatically if it exists.
2. If {{usrlib}} is added to the ship files again, we will throw an exception.
3. {{usrlib}} will work for both per-job and application mode.
4. Jars in {{usrlib}} will be loaded by the user classloader only when 
UserJarInclusion is DISABLED; in other cases, the AppClassLoader will be used.

Thanks.
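
A rough sketch of point 2, assuming a simplified signature and error message 
(illustrative only, not the actual patch):
{code}
import java.io.File;
import java.util.List;

import static org.apache.flink.util.Preconditions.checkArgument;

// Hypothetical guard in YarnClusterDescriptor#addShipFiles: reject an
// explicit 'usrlib' entry, since the directory is shipped automatically.
public void addShipFiles(List<File> shipFiles) {
    checkArgument(
            shipFiles.stream().map(File::getName).noneMatch("usrlib"::equals),
            "The 'usrlib' directory is shipped automatically if it exists "
                    + "and must not be added to the ship files again.");
    this.shipFiles.addAll(shipFiles);
}
{code}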

> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to YARN cluster 
> but I find that currently there is no easy way to ship my user-defined 
> jars(e.g. some custom connectors or udf jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
> solution that users can use `usrlib` directory to store their user-defined 
> jars and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But on YARN mode, `usrlib` does not work as that:
> In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my 
> local machine) to remote cluster, I must not set  UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, classes in those user jars will be loaded by AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to utilize these 
> jars in my flink run command. Currently, all I can do seems to use `-C` 
> option, which means I have to upload my jars to some shared store first and 
> then use these remote paths. It is not so perfect as we have already make it 
> possible to ship jars or files directly and we also introduce `usrlib` in 
> application mode on YARN. It would be more user-friendly if we can allow 
> shipping `usrlib` from local to remote cluster while using 
> FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-29 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450424#comment-17450424
 ] 

Biao Geng commented on FLINK-24897:
---

Thank you very much. I will follow the above discussion and create a pull 
request this week (hopefully) or next week. 


> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to YARN cluster 
> but I find that currently there is no easy way to ship my user-defined 
> jars(e.g. some custom connectors or udf jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
> solution that users can use `usrlib` directory to store their user-defined 
> jars and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But on YARN mode, `usrlib` does not work as that:
> In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my 
> local machine) to remote cluster, I must not set  UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, classes in those user jars will be loaded by AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to utilize these 
> jars in my flink run command. Currently, all I can do seems to use `-C` 
> option, which means I have to upload my jars to some shared store first and 
> then use these remote paths. It is not so perfect as we have already make it 
> possible to ship jars or files directly and we also introduce `usrlib` in 
> application mode on YARN. It would be more user-friendly if we can allow 
> shipping `usrlib` from local to remote cluster while using 
> FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-15 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-24897:
--
Description: 
Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by some jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
solution that users can use `usrlib` directory to store their user-defined jars 
and these jars would be loaded by FlinkUserCodeClassLoader when the job is 
executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do is to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly, if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.

 

  was:
Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by some jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  
[FLINK-21289|https://issues.apache.org/jira/browse/FLINK-21289]. In k8s mode, 
there is a solution that users can use `usrlib` directory to store there 
user-defined jars and these jars would be loaded by FlinkUserCodeClassLoader 
when the job is executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do is to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly, if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.

 


> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to YARN cluster 
> but I find that currently there is no easy way to ship my user-defined 
> jars(e.g. some custom connectors or udf jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
> solution that users can use `usrlib` directory to store their user-defined 
> jars and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But on YARN mode, `usrlib` does not work as that:
> In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my 
> local machine) to remote cluster, I must not set  UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, 

[jira] [Commented] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-15 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443680#comment-17443680
 ] 

Biao Geng commented on FLINK-24897:
---

Hi [~trohrmann] [~wangyang0918] [~xtsong], very sorry to bother you, but you 
folks are experts in this area; would you mind giving some feedback on this 
issue? Thanks a lot!
I am also willing to do some work on this issue if we decide to add such an 
ability.

> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to YARN cluster 
> but I find that currently there is no easy way to ship my user-defined 
> jars(e.g. some custom connectors or udf jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
> solution that users can use `usrlib` directory to store their user-defined 
> jars and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But on YARN mode, `usrlib` does not work as that:
> In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my 
> local machine) to remote cluster, I must not set  UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, classes in those user jars will be loaded by AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to utilize these 
> jars in my flink run command. Currently, all I can do seems to use `-C` 
> option, which means I have to upload my jars to some shared store first and 
> then use these remote paths. It is not so perfect as we have already make it 
> possible to ship jars or files directly and we also introduce `usrlib` in 
> application mode on YARN. It would be more user-friendly if we can allow 
> shipping `usrlib` from local to remote cluster while using 
> FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-15 Thread Biao Geng (Jira)
Biao Geng created FLINK-24897:
-

 Summary: Enable application mode on YARN to use usrlib
 Key: FLINK-24897
 URL: https://issues.apache.org/jira/browse/FLINK-24897
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Biao Geng


Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by all jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  [#FLINK-21289]. In k8s mode, there is a 
solution that users can use `usrlib` directory to store there user-defined jars 
and these jars would be loaded by FlinkUserCodeClassLoader when the job is 
executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do is to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly, if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.
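
For context, the checkArgument() mentioned above can be paraphrased roughly 
like this; a simplified sketch of the constraint as described, not the actual 
Flink source, and `containsUsrLib` is a hypothetical helper:
{code}
// Schematic paraphrase of the current behavior: shipping 'usrlib' is only
// accepted while UserJarInclusion is NOT DISABLED, and a non-DISABLED
// UserJarInclusion adds the shipped user jars to the system classpath,
// so their classes end up in the AppClassLoader.
public void addShipFiles(List<File> shipFiles) {
    checkArgument(
            userJarInclusion != YarnConfigOptions.UserJarInclusion.DISABLED
                    || !containsUsrLib(shipFiles), // hypothetical helper
            "Shipping a 'usrlib' directory requires user jar inclusion "
                    + "to stay enabled.");
    this.shipFiles.addAll(shipFiles);
}
{code}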

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-15 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-24897:
--
Description: 
Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by some jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
solution that users can use `usrlib` directory to store their user-defined jars 
and these jars would be loaded by FlinkUserCodeClassLoader when the job is 
executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do seems to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly, if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.

 

  was:
Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by some jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
solution that users can use `usrlib` directory to store their user-defined jars 
and these jars would be loaded by FlinkUserCodeClassLoader when the job is 
executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do is to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly, if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.

 


> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to YARN cluster 
> but I find that currently there is no easy way to ship my user-defined 
> jars(e.g. some custom connectors or udf jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
> solution that users can use `usrlib` directory to store their user-defined 
> jars and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But on YARN mode, `usrlib` does not work as that:
> In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my 
> local machine) to remote cluster, I must not set  UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, classes in those user jars will be loaded by 

[jira] [Updated] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-15 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-24897:
--
Description: 
Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by some jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
solution that users can use `usrlib` directory to store their user-defined jars 
and these jars would be loaded by FlinkUserCodeClassLoader when the job is 
executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do seems to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.

 

  was:
Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by some jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
solution that users can use `usrlib` directory to store their user-defined jars 
and these jars would be loaded by FlinkUserCodeClassLoader when the job is 
executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do seems to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly, if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.

 


> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to YARN cluster 
> but I find that currently there is no easy way to ship my user-defined 
> jars(e.g. some custom connectors or udf jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like  FLINK-21289. In k8s mode, there is a 
> solution that users can use `usrlib` directory to store their user-defined 
> jars and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But on YARN mode, `usrlib` does not work as that:
> In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my 
> local machine) to remote cluster, I must not set  UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, classes in those user jars will be loaded by 

[jira] [Updated] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-15 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-24897:
--
Description: 
Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by some jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  
[FLINK-21289|https://issues.apache.org/jira/browse/FLINK-21289]. In k8s mode, 
there is a solution that users can use `usrlib` directory to store there 
user-defined jars and these jars would be loaded by FlinkUserCodeClassLoader 
when the job is executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do is to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly, if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.

 

  was:
Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by some jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  [#FLINK-21289]. In k8s mode, there is a 
solution that users can use `usrlib` directory to store there user-defined jars 
and these jars would be loaded by FlinkUserCodeClassLoader when the job is 
executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do is to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly, if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.

 


> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to YARN cluster 
> but I find that currently there is no easy way to ship my user-defined 
> jars(e.g. some custom connectors or udf jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like  
> [FLINK-21289|https://issues.apache.org/jira/browse/FLINK-21289]. In k8s mode, 
> there is a solution that users can use `usrlib` directory to store there 
> user-defined jars and these jars would be loaded by FlinkUserCodeClassLoader 
> when the job is executed on JM/TM.
> But on YARN mode, `usrlib` does not work as that:
> In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my 
> local machine) to remote cluster, I must not set  UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped 

[jira] [Updated] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-15 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-24897:
--
Description: 
Hi there, 
I am working to utilize application mode to submit flink jobs to YARN cluster 
but I find that currently there is no easy way to ship my user-defined 
jars(e.g. some custom connectors or udf jars that would be shared by some jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like  [#FLINK-21289]. In k8s mode, there is a 
solution that users can use `usrlib` directory to store there user-defined jars 
and these jars would be loaded by FlinkUserCodeClassLoader when the job is 
executed on JM/TM.

But on YARN mode, `usrlib` does not work as that:

In this method(org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client(in my local 
machine) to remote cluster, I must not set  UserJarInclusion to DISABLED due to 
the checkArgument(). However, if I do not set that option to DISABLED, the user 
jars to be shipped will be added into systemClassPaths. As a result, classes in 
those user jars will be loaded by AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do is to use `-C` option, 
which means I have to upload my jars to some shared store first and then use 
these remote paths. It is not so perfect as we have already make it possible to 
ship jars or files directly and we also introduce `usrlib` in application mode 
on YARN. It would be more user-friendly, if we can allow shipping `usrlib` from 
local to remote cluster while using FlinkUserCodeClassLoader to load classes in 
the jars in `usrlib`.

 

  was:
Hi there, 
I am working to utilize application mode to submit Flink jobs to a YARN cluster, 
but I find that currently there is no easy way to ship my user-defined 
jars (e.g. some custom connectors or UDF jars that would be shared by all jobs) 
and ask the FlinkUserCodeClassLoader to load classes in these jars. 

I checked some relevant jiras, like [#FLINK-21289]. In k8s mode, there is a 
solution: users can use the `usrlib` directory to store their user-defined jars, 
and these jars would be loaded by FlinkUserCodeClassLoader when the job is 
executed on JM/TM.

But in YARN mode, `usrlib` does not work like that:

In this method (org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if I 
want to use `yarn.ship-files` to ship `usrlib` from my flink client (on my local 
machine) to the remote cluster, I must not set UserJarInclusion to DISABLED due 
to the checkArgument(). However, if I do not set that option to DISABLED, the 
user jars to be shipped will be added into systemClassPaths. As a result, classes 
in those user jars will be loaded by the AppClassLoader. 

But if I do not ship these jars, there is no convenient way to utilize these 
jars in my flink run command. Currently, all I can do is use the `-C` option, 
which means I have to upload my jars to some shared store first and then use 
those remote paths. This is not ideal, as we have already made it possible to 
ship jars or files directly, and we have also introduced `usrlib` in application 
mode on YARN. It would be more user-friendly if we could allow shipping `usrlib` 
from the local machine to the remote cluster while using FlinkUserCodeClassLoader 
to load classes in the jars in `usrlib`.

 


> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit Flink jobs to a YARN cluster, 
> but I find that currently there is no easy way to ship my user-defined 
> jars (e.g. some custom connectors or UDF jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like [#FLINK-21289]. In k8s mode, there is a 
> solution: users can use the `usrlib` directory to store their user-defined 
> jars, and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But in YARN mode, `usrlib` does not work like that:
> In this method (org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, classes in those user jars will be loaded by the AppClassLoader.

[jira] [Commented] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-18 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17445748#comment-17445748
 ] 

Biao Geng commented on FLINK-24897:
---

[~wangyang0918] Thanks for the reply! I also thought about throwing an exception 
when considering the 3 options. I can see its good points. My only concern is 
that it is somewhat different from what we have done when users specify {{lib}} 
in ship files as well. I do not find constraints on specifying {{lib}} when 
shipping files. If we do not need to worry about that, reporting an error makes 
sense to me as well.

For your third point, I believe it will bring the fewest changes to our current 
codebase. But I have one question about the history of {{usrlib}}: do we 
have any contract with users that jars in {{usrlib}} will be loaded by the user 
classloader by default? If the answer is no, I think your solution is the 
cleanest one.

 

BTW, the name of {{yarn.per-job-cluster.include-user-jar}} may be changed to 
something like {{yarn.include-user-jar}} if we reach consensus on point 3.

 

> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit Flink jobs to a YARN cluster, 
> but I find that currently there is no easy way to ship my user-defined 
> jars (e.g. some custom connectors or UDF jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like [#FLINK-21289]. In k8s mode, there is a 
> solution: users can use the `usrlib` directory to store their user-defined 
> jars, and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But in YARN mode, `usrlib` does not work like that:
> In this method (org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, classes in those user jars will be loaded by the AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to utilize these 
> jars in my flink run command. Currently, all I can do seems to be to use the 
> `-C` option, which means I have to upload my jars to some shared store first 
> and then use those remote paths. This is not ideal, as we have already made it 
> possible to ship jars or files directly, and we have also introduced `usrlib` 
> in application mode on YARN. It would be more user-friendly if we could allow 
> shipping `usrlib` from the local machine to the remote cluster while using 
> FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-17 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17445631#comment-17445631
 ] 

Biao Geng commented on FLINK-24897:
---

Hi [~trohrmann] and [~wangyang0918], thank you very much for your replies.
I agree with Till's suggestion about reusing the existing logic to include 
{{usrlib}} in the user classloader.
Yang's questions are also helpful and critical: 
*A summary of my answers about {{usrlib}}:*
0. We should ship {{usrlib}} by default, like what we have done for the {{lib}} dir.
1. We should avoid uploading it again, and we should not add classes in it to 
the system path, if users specify {{usrlib}} again in the {{yarn.ship-files}} 
option.
2. It should work for per-job mode.
3. Only when UserJarInclusion is DISABLED will {{usrlib}} take effect in 
per-job mode. But we should consider the default value of the 
{{UserJarInclusion}} option.

*Detail:*

Q1:
Currently, I think we should ship {{usrlib}} by default if it exists because, 
AFAIK, {{usrlib}} is the default userClassPath defined by Flink. If we 
ask the user to explicitly specify it, it somewhat breaks Flink's contract 
with users. 
When users specify a shipped directory named "usrlib", I think there are 3 
options:
Option 1: skip it
Option 2: report an error
Option 3: do nothing special; just upload it and add the files in {{usrlib}} to 
the system classpaths

Option 1 seems to be the easiest, just like what we have done for 
{{flink_dist.jar}} when users specify {{lib}} in ship files.
Option 3 is worth mentioning: if users specify {{usrlib}} in ship files, the 
files in {{usrlib}} will be added to the system classpaths, but if users use the 
child-first resolve order, the files in {{usrlib}} will also be loaded by the 
UserClassLoader, as they are on the userClassPath as well. Bad things happen if 
users choose the parent-first resolve order: the files in {{usrlib}} will be 
loaded by the AppClassLoader, which breaks the design. 
So, in summary, I think skipping it is the better option (a minimal sketch 
follows at the end of this comment).

Q2:
After checking the code of {{FileJobGraphRetriever}} and 
{{YarnJobClusterEntrypoint}}, I think we are already prepared for using 
{{usrlib}} if we upload it to the cluster.

Q3:
I agree that only when UserJarInclusion is DISABLED will {{usrlib}} take effect 
in per-job mode. But currently the default value of UserJarInclusion is 
{{ORDERED}}, and it applies to all 3 modes (per-job, session, application). If 
we agree that {{usrlib}} should be shipped automatically, we may need to 
reconsider the default value of this option if we want to use the 
UserClassLoader to load jars in {{usrlib}}.
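
To make Option 1 concrete, a hypothetical sketch (the class name and placement 
are assumptions, not the actual YARN descriptor code) of skipping a 
user-specified {{usrlib}} when collecting ship files, mirroring how 
{{flink_dist.jar}} is skipped when users ship {{lib}}:
{code:java}
import java.io.File;
import java.util.List;
import java.util.stream.Collectors;

class ShipFileFilter {
    // "usrlib" matches ConfigConstants.DEFAULT_FLINK_USR_LIB_DIR.
    static final String USR_LIB_DIR = "usrlib";

    // Drop a user-shipped "usrlib" directory so it is neither uploaded
    // twice nor added to the system classpaths.
    static List<File> dropUsrLib(List<File> shipFiles) {
        return shipFiles.stream()
                .filter(f -> !(f.isDirectory() && USR_LIB_DIR.equals(f.getName())))
                .collect(Collectors.toList());
    }
}
{code}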

> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit Flink jobs to a YARN cluster, 
> but I find that currently there is no easy way to ship my user-defined 
> jars (e.g. some custom connectors or UDF jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like FLINK-21289. In k8s mode, there is a 
> solution: users can use the `usrlib` directory to store their user-defined 
> jars, and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But in YARN mode, `usrlib` does not work like that:
> In this method (org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, classes in those user jars will be loaded by the AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to utilize these 
> jars in my flink run command. Currently, all I can do seems to be to use the 
> `-C` option, which means I have to upload my jars to some shared store first 
> and then use those remote paths. This is not ideal, as we have already made it 
> possible to ship jars or files directly, and we have also introduced `usrlib` 
> in application mode on YARN. It would be more user-friendly if we could allow 
> shipping `usrlib` from the local machine to the remote cluster while using 
> FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-24897) Enable application mode on YARN to use usrlib

2021-11-17 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17445605#comment-17445605
 ] 

Biao Geng commented on FLINK-24897:
---

[~long jiang], thanks a lot for your advice. I have focused on the local 
`usrlib`, but your solution about supporting HDFS is also a good point. I am 
also wondering: when you implemented `usrlib` yourselves, did you ship it by 
default or ask the user to specify it, especially when supporting directories 
in HDFS?

> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Hi there, 
> I am working to utilize application mode to submit Flink jobs to a YARN cluster, 
> but I find that currently there is no easy way to ship my user-defined 
> jars (e.g. some custom connectors or UDF jars that would be shared by some 
> jobs) and ask the FlinkUserCodeClassLoader to load classes in these jars. 
> I checked some relevant jiras, like FLINK-21289. In k8s mode, there is a 
> solution: users can use the `usrlib` directory to store their user-defined 
> jars, and these jars would be loaded by FlinkUserCodeClassLoader when the job 
> is executed on JM/TM.
> But in YARN mode, `usrlib` does not work like that:
> In this method (org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added into systemClassPaths. As 
> a result, classes in those user jars will be loaded by the AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to utilize these 
> jars in my flink run command. Currently, all I can do seems to be to use the 
> `-C` option, which means I have to upload my jars to some shared store first 
> and then use those remote paths. This is not ideal, as we have already made it 
> possible to ship jars or files directly, and we have also introduced `usrlib` 
> in application mode on YARN. It would be more user-friendly if we could allow 
> shipping `usrlib` from the local machine to the remote cluster while using 
> FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-24494) Avro Confluent Registry SQL kafka connector fails to write to the topic with schema

2021-11-02 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437208#comment-17437208
 ] 

Biao Geng commented on FLINK-24494:
---

Hi [~mahen], I have met this problem as well. 
I have implemented a simple catalog that connects to the Schema Registry (SR) 
service to get the schema string of a Kafka topic whose format is 
Confluent Avro. I also utilized the 
'AvroSchemaConverter#convertToDataType()' method to parse the Confluent Avro 
schema string from SR, and it works well in a Flink SQL job like 'SELECT * FROM 
xx_topic'. However, when I tried to insert some new records into the topic by 
executing SQL statements like 'INSERT INTO xx_topic SELECT * FROM xx_topic 
LIMIT 2', it was strange to see in the SR API that there is a new version of 
the schema whose name is 'record' and whose namespace is erased. That's the 
same as [~MartijnVisser]'s description.

I wonder if there is any reason for hardcoding the record name in the schema as 
'record', and whether we have any plan to fix this.

Thank you very much.
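
For what it's worth, a hedged repro sketch of the behavior I am describing; the 
schema string and class name below are placeholders, not my actual setup:
{code:java}
import org.apache.avro.Schema;
import org.apache.flink.formats.avro.typeutils.AvroSchemaConverter;
import org.apache.flink.table.types.DataType;

public class RecordNameRepro {
    public static void main(String[] args) {
        String subjectSchema =
                "{\"type\":\"record\",\"name\":\"SimpleCustomer\","
                        + "\"namespace\":\"com.example.model\",\"fields\":["
                        + "{\"name\":\"customerId\",\"type\":\"string\"}]}";
        // Read side: SR schema -> Flink DataType works fine.
        DataType dataType = AvroSchemaConverter.convertToDataType(subjectSchema);
        // Write side: Flink re-derives an Avro schema from the row type only,
        // so the record name/namespace from the subject are not preserved.
        Schema derived = AvroSchemaConverter.convertToSchema(dataType.getLogicalType());
        // Prints a generic record name, not com.example.model.SimpleCustomer.
        System.out.println(derived.getFullName());
    }
}
{code}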

> Avro Confluent Registry SQL kafka connector  fails to write to the topic with 
> schema 
> -
>
> Key: FLINK-24494
> URL: https://issues.apache.org/jira/browse/FLINK-24494
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Table SQL / API
>Affects Versions: 1.14.0, 1.13.2
>Reporter: Mahendran Ponnusamy
>Priority: Critical
> Attachments: Screen Shot 2021-10-12 at 10.38.55 AM.png, 
> image-2021-10-12-12-09-30-374.png, image-2021-10-12-12-11-04-664.png, 
> image-2021-10-12-12-18-53-016.png, image-2021-10-12-12-19-37-227.png, 
> image-2021-10-12-12-21-02-008.png, image-2021-10-12-12-21-46-177.png
>
>
>  *Summary:*
> Given a schema registered to a topic with a name and namespace,
> when Flink SQL with the upsert-kafka connector writes to the topic,
> it fails because the row it tries to produce is not compatible with the 
> registered schema.
>  
> *Root cause:*
> The upsert-kafka connector auto-generates a schema with the +*name `record` 
> and no namespace*+. The schema below is generated by the connector. 
> I'm expecting the connector to pull the schema from the subject and use 
> ConfluentAvroRowSerialization (which is not there today, I believe) to 
> serialize using the schema from the subject.
> Schema generated by the upsert-kafka connector, which internally uses 
> AvroRowSerializer:
> !image-2021-10-12-12-21-46-177.png|width=813,height=23!
> Schema registered to the subject:
> {
>   "type" : "record",
>   "name" : "SimpleCustomer",
>   "namespace" : "com...example.model",
>   "fields" : [ {
>     "name" : "customerId",
>     "type" : "string"
>   }, {
>     "name" : "age",
>     "type" : "int",
>     "default" : 0
>   } ]
> }
>  
> Table SQL with the upsert-kafka connector:
> !image-2021-10-12-12-18-53-016.png|width=351,height=176!
>  
> The name "record" hardcoded:
> !image-2021-10-12-12-21-02-008.png|width=464,height=138!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-24494) Avro Confluent Registry SQL kafka connector fails to write to the topic with schema

2021-11-02 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437208#comment-17437208
 ] 

Biao Geng edited comment on FLINK-24494 at 11/2/21, 9:56 AM:
-

Hi [~mahen], I have met this problem as well. 
I have implemented a simple catalog that connects to the Schema Registry (SR) 
service to get the schema string of a Kafka topic whose format is 
Confluent Avro. I also utilized the 
'AvroSchemaConverter#convertToDataType()' method to parse the Confluent Avro 
schema string from SR, so I can directly get the CatalogTable from my 
catalog. It works well in a Flink SQL job like 'SELECT * FROM xx_topic'. 
However, when I tried to insert some new records into the topic by executing 
SQL statements like 'INSERT INTO xx_topic SELECT * FROM xx_topic LIMIT 2', 
though the SQL job works well, it was strange to see in the SR API that there 
is a new version of the schema whose name is 'record' and whose namespace is 
erased. Same as [~MartijnVisser]'s description.

I wonder if there is any reason for hardcoding the record name in the schema as 
'record'.
Thank you very much.

 


was (Author: bgeng777):
Hi [~mahen], I have met this problem as well. 
I have implemented a simple catalog that connects to the Schema Registry (SR) 
service to get the schema string of a Kafka topic whose format is 
Confluent Avro. I also utilized the 
'AvroSchemaConverter#convertToDataType()' method to parse the Confluent Avro 
schema string from SR, so I can directly get the CatalogTable from my 
catalog. It works well in a Flink SQL job like 'SELECT * FROM xx_topic'. 
However, when I tried to insert some new records into the topic by executing 
SQL statements like 'INSERT INTO xx_topic SELECT * FROM xx_topic LIMIT 2', it 
was strange to see in the SR API that there is a new version of the schema 
whose name is 'record' and whose namespace is erased. I believe that's the 
same as [~MartijnVisser]'s description.

I wonder if there is any reason for hardcoding the record name in the schema as 
'record', and whether we have any plan to fix this.

> Avro Confluent Registry SQL kafka connector  fails to write to the topic with 
> schema 
> -
>
> Key: FLINK-24494
> URL: https://issues.apache.org/jira/browse/FLINK-24494
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Table SQL / API
>Affects Versions: 1.14.0, 1.13.2
>Reporter: Mahendran Ponnusamy
>Priority: Critical
> Attachments: Screen Shot 2021-10-12 at 10.38.55 AM.png, 
> image-2021-10-12-12-09-30-374.png, image-2021-10-12-12-11-04-664.png, 
> image-2021-10-12-12-18-53-016.png, image-2021-10-12-12-19-37-227.png, 
> image-2021-10-12-12-21-02-008.png, image-2021-10-12-12-21-46-177.png
>
>
>  *Summary:*
> Given a schema registered to a topic with a name and namespace,
> when Flink SQL with the upsert-kafka connector writes to the topic,
> it fails because the row it tries to produce is not compatible with the 
> registered schema.
>  
> *Root cause:*
> The upsert-kafka connector auto-generates a schema with the +*name `record` 
> and no namespace*+. The schema below is generated by the connector. 
> I'm expecting the connector to pull the schema from the subject and use 
> ConfluentAvroRowSerialization (which is not there today, I believe) to 
> serialize using the schema from the subject.
> Schema generated by the upsert-kafka connector, which internally uses 
> AvroRowSerializer:
> !image-2021-10-12-12-21-46-177.png|width=813,height=23!
> Schema registered to the subject:
> {
>   "type" : "record",
>   "name" : "SimpleCustomer",
>   "namespace" : "com...example.model",
>   "fields" : [ {
>     "name" : "customerId",
>     "type" : "string"
>   }, {
>     "name" : "age",
>     "type" : "int",
>     "default" : 0
>   } ]
> }
>  
> Table SQL with the upsert-kafka connector:
> !image-2021-10-12-12-18-53-016.png|width=351,height=176!
>  
> The name "record" hardcoded:
> !image-2021-10-12-12-21-02-008.png|width=464,height=138!

[jira] [Comment Edited] (FLINK-24494) Avro Confluent Registry SQL kafka connector fails to write to the topic with schema

2021-11-02 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437208#comment-17437208
 ] 

Biao Geng edited comment on FLINK-24494 at 11/2/21, 9:10 AM:
-

Hi [~mahen], I have met this problem as well. 
I have implemented a simple catalog that connects to the Schema Registry (SR) 
service to get the schema string of a Kafka topic whose format is 
Confluent Avro. I also utilized the 
'AvroSchemaConverter#convertToDataType()' method to parse the Confluent Avro 
schema string from SR, so I can directly get the CatalogTable from my 
catalog. It works well in a Flink SQL job like 'SELECT * FROM xx_topic'. 
However, when I tried to insert some new records into the topic by executing 
SQL statements like 'INSERT INTO xx_topic SELECT * FROM xx_topic LIMIT 2', it 
was strange to see in the SR API that there is a new version of the schema 
whose name is 'record' and whose namespace is erased. I believe that's the 
same as [~MartijnVisser]'s description.

I wonder if there is any reason for hardcoding the record name in the schema as 
'record', and whether we have any plan to fix this.

Thank you very much.


was (Author: bgeng777):
Hi [~mahen], I have met this problem as well. 
I have implemented a simple catalog that connects to the Schema Registry (SR) 
service to get the schema string of a Kafka topic whose format is 
Confluent Avro. I also utilized the 
'AvroSchemaConverter#convertToDataType()' method to parse the Confluent Avro 
schema string from SR, and it works well in a Flink SQL job like 'SELECT * FROM 
xx_topic'. However, when I tried to insert some new records into the topic by 
executing SQL statements like 'INSERT INTO xx_topic SELECT * FROM xx_topic 
LIMIT 2', it was strange to see in the SR API that there is a new version of 
the schema whose name is 'record' and whose namespace is erased. That's the 
same as [~MartijnVisser]'s description.

I wonder if there is any reason for hardcoding the record name in the schema as 
'record', and whether we have any plan to fix this.

 

Thank you very much.

> Avro Confluent Registry SQL kafka connector  fails to write to the topic with 
> schema 
> -
>
> Key: FLINK-24494
> URL: https://issues.apache.org/jira/browse/FLINK-24494
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka, Table SQL / API
>Affects Versions: 1.14.0, 1.13.2
>Reporter: Mahendran Ponnusamy
>Priority: Critical
> Attachments: Screen Shot 2021-10-12 at 10.38.55 AM.png, 
> image-2021-10-12-12-09-30-374.png, image-2021-10-12-12-11-04-664.png, 
> image-2021-10-12-12-18-53-016.png, image-2021-10-12-12-19-37-227.png, 
> image-2021-10-12-12-21-02-008.png, image-2021-10-12-12-21-46-177.png
>
>
>  *Summary:*
> Given a schema registered to a topic with a name and namespace,
> when Flink SQL with the upsert-kafka connector writes to the topic,
> it fails because the row it tries to produce is not compatible with the 
> registered schema.
>  
> *Root cause:*
> The upsert-kafka connector auto-generates a schema with the +*name `record` 
> and no namespace*+. The schema below is generated by the connector. 
> I'm expecting the connector to pull the schema from the subject and use 
> ConfluentAvroRowSerialization (which is not there today, I believe) to 
> serialize using the schema from the subject.
> Schema generated by the upsert-kafka connector, which internally uses 
> AvroRowSerializer:
> !image-2021-10-12-12-21-46-177.png|width=813,height=23!
> Schema registered to the subject:
> {
>   "type" : "record",
>   "name" : "SimpleCustomer",
>   "namespace" : "com...example.model",
>   "fields" : [ {
>     "name" : "customerId",
>     "type" : "string"
>   }, {
>     "name" : "age",
>     "type" : "int",
>     "default" : 0
>   } ]
> }
>  
> Table SQL with the upsert-kafka connector:
> !image-2021-10-12-12-18-53-016.png|width=351,height=176!
>  
> The name "record" hardcoded:
> !image-2021-10-12-12-21-02-008.png|width=464,height=138!

[jira] [Created] (FLINK-24682) Unify the -C option behavior in both yarn application and per-job mode

2021-10-28 Thread Biao Geng (Jira)
Biao Geng created FLINK-24682:
-

 Summary: Unify the -C option behavior in both yarn application and 
per-job mode
 Key: FLINK-24682
 URL: https://issues.apache.org/jira/browse/FLINK-24682
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Affects Versions: 1.12.3
 Environment: flink 1.12.3

yarn 2.8.5
Reporter: Biao Geng


Recently, when switching the job submission mode from per-job mode to 
application mode on YARN, we found the behavior of '-C' ('--classpath') is 
somewhat misleading:
In per-job mode, the `main()` method of the program is executed on the local 
machine, and the '-C' option works well when we use it to specify some local 
user jars like -C file://xx.jar.
But in application mode, this option works differently: as the `main()` method 
will be executed on the job manager in the cluster, it is unclear where a URL 
like `file://xx.jar` points. Judging from the code, it seems that 
`file://xx.jar` is resolved on the job manager machine in the cluster. If that 
is true, it may mislead users, as in per-job mode it refers to the jars on the 
client machine. 
In summary, if we can unify the -C option behavior in both YARN application and 
per-job mode, it would help users switch to application mode more smoothly and, 
more importantly, make it much easier to specify local jars on the client 
machine that should be loaded by the UserClassLoader.
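
As an illustration of where the ambiguity comes from (the jar path below is 
hypothetical): the -C entries end up in `pipeline.classpaths`, which is turned 
into classloader URLs on whichever host runs `main()`: the client in per-job 
mode, but the job manager in application mode.
{code:java}
import java.util.Collections;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.PipelineOptions;

public class ClasspathExample {
    public static void main(String[] args) {
        // Equivalent of passing "-C file:///opt/udfs/my-udf.jar" on the CLI.
        Configuration conf = new Configuration();
        conf.set(PipelineOptions.CLASSPATHS,
                Collections.singletonList("file:///opt/udfs/my-udf.jar"));
        // The URL is only meaningful on the host that later resolves it.
        System.out.println(conf.get(PipelineOptions.CLASSPATHS));
    }
}
{code}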



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-24682) Unify the -C option behavior in both yarn application and per-job mode

2021-11-08 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17440335#comment-17440335
 ] 

Biao Geng commented on FLINK-24682:
---

Hi [~trohrmann], [~xtsong], would you mind leaving some comments on this issue?

Thanks

> Unify the -C option behavior in both yarn application and per-job mode
> --
>
> Key: FLINK-24682
> URL: https://issues.apache.org/jira/browse/FLINK-24682
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.12.3
> Environment: flink 1.12.3
> yarn 2.8.5
>Reporter: Biao Geng
>Priority: Major
>
> Recently, when switching the job submission mode from per-job mode to 
> application mode on YARN, we found the behavior of '-C' ('--classpath') is 
> somewhat misleading:
> In per-job mode, the `main()` method of the program is executed on the local 
> machine, and the '-C' option works well when we use it to specify some local 
> user jars like -C file://xx.jar.
> But in application mode, this option works differently: as the `main()` 
> method will be executed on the job manager in the cluster, it is unclear 
> where a URL like `file://xx.jar` points. Judging from the code, it seems that 
> `file://xx.jar` is resolved on the job manager machine in the cluster. If 
> that is true, it may mislead users, as in per-job mode it refers to the jars 
> on the client machine. 
> In summary, if we can unify the -C option behavior in both YARN application 
> and per-job mode, it would help users switch to application mode more 
> smoothly and, more importantly, make it much easier to specify local jars on 
> the client machine that should be loaded by the UserClassLoader.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26605) JobManagerDeploymentStatus.READY does not correctly reflect the real Flink cluster status

2022-03-13 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505984#comment-17505984
 ] 

Biao Geng commented on FLINK-26605:
---

Hi [~wangyang0918], the simple REST call for checking the state of the cluster 
makes sense, and I am willing to take this issue about robustness. Would you 
mind assigning it to me?

> JobManagerDeploymentStatus.READY does not correctly reflect the real Flink 
> cluster status
> -
>
> Key: FLINK-26605
> URL: https://issues.apache.org/jira/browse/FLINK-26605
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Yang Wang
>Priority: Major
>
> Follow the discussion in the PR[1].
> {{JobManagerDeploymentStatus.READY}} does not mean the Flink cluster is ready 
> for accepting REST API calls. This will cause problems when we are running 
> session cluster.
> For example, if something is wrong or very slow with the leader election, 
> {{JobManagerDeploymentStatus}} is {{READY}} but the session cluster is not 
> ready for accepting job submission. What I mean is to update the 
> {{JobManagerDeploymentStatus}} to {{READY}} only when the flink cluster is 
> actually working.
>  
> [1]. 
> https://github.com/apache/flink-kubernetes-operator/pull/51#discussion_r824359419



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26702) Sporadic failures in JobObserverTest

2022-03-17 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17508100#comment-17508100
 ] 

Biao Geng commented on FLINK-26702:
---

Hi [~mbalassi], you are not alone. I have also seen this failure this morning 
in the CI of my open PR on my own GitHub. I am not so sure why my CI was 
triggered. But I happened to have set the build config in my own CI to show 
more info for other issues, and I got the full error message of this failure:
{quote}[INFO] Running 
org.apache.flink.kubernetes.operator.observer.JobObserverTest
2022-03-17 03:23:31,091 o.a.f.k.o.o.JobObserver [INFO ] [.] Getting job 
statuses for test-cluster
2022-03-17 03:23:31,091 o.a.f.k.o.o.JobObserver [INFO ] [.] Job statuses 
updated for test-cluster
2022-03-17 03:23:31,091 o.a.f.k.o.o.JobObserver [INFO ] [.] Getting job 
statuses for test-cluster
2022-03-17 03:23:31,091 o.a.f.k.o.o.JobObserver [INFO ] [.] Job statuses 
updated for test-cluster
2022-03-17 03:23:31,091 o.a.f.k.o.o.JobObserver [INFO ] [.] Getting job 
statuses for test-cluster
2022-03-17 03:23:31,091 o.a.f.k.o.o.JobObserver [INFO ] [.] Job statuses 
updated for test-cluster
2022-03-17 03:23:31,093 o.a.f.k.o.o.JobObserver [INFO ] [.] JobManager 
deployment test-cluster in namespace flink-operator-test exists but not ready, 
status DeploymentStatus(availableReplicas=1, collisionCount=null, 
conditions=[], observedGeneration=null, readyReplicas=null, replicas=1, 
unavailableReplicas=null, updatedReplicas=null, additionalProperties={})
2022-03-17 03:23:31,093 o.a.f.k.o.o.JobObserver [INFO ] [.] JobManager 
deployment test-cluster in namespace flink-operator-test exists but not ready, 
status DeploymentStatus(availableReplicas=1, collisionCount=null, 
conditions=[], observedGeneration=null, readyReplicas=null, replicas=1, 
unavailableReplicas=null, updatedReplicas=null, additionalProperties={})
2022-03-17 03:23:31,093 o.a.f.k.o.o.JobObserver [INFO ] [.] JobManager 
deployment test-cluster in namespace flink-operator-test port ready, waiting 
for the REST API...
2022-03-17 03:23:31,093 o.a.f.k.o.o.JobObserver [INFO ] [.] Getting job 
statuses for test-cluster
2022-03-17 03:23:31,093 o.a.f.k.o.o.JobObserver [INFO ] [.] Job statuses 
updated for test-cluster
2022-03-17 03:23:31,093 o.a.f.k.o.o.JobObserver [INFO ] [.] Getting job 
statuses for test-cluster
2022-03-17 03:23:31,093 o.a.f.k.o.o.JobObserver [INFO ] [.] Job statuses 
updated for test-cluster
Error: Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.039 s 
<<< FAILURE! - in org.apache.flink.kubernetes.operator.observer.JobObserverTest
Error: 
org.apache.flink.kubernetes.operator.observer.JobObserverTest.observeApplicationCluster
 Time elapsed: 0.013 s <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
at 
org.apache.flink.kubernetes.operator.observer.JobObserverTest.observeApplicationCluster(JobObserverTest.java:99)
{quote}
 

> Sporadic failures in JobObserverTest
> 
>
> Key: FLINK-26702
> URL: https://issues.apache.org/jira/browse/FLINK-26702
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Reporter: Márton Balassi
>Priority: Major
>
> I have occasionally observed the following failure during the regular build:
>  
> {code:java}
> mvn clean install
> ...
> [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.244 
> s - in org.apache.flink.kubernetes.operator.service.FlinkServiceTest
> [INFO] Running 
> org.apache.flink.kubernetes.operator.validation.DeploymentValidatorTest
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 
> s - in org.apache.flink.kubernetes.operator.validation.DeploymentValidatorTest
> [INFO] 
> [INFO] Results:
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   JobObserverTest.observeApplicationCluster:99 expected: <0> but was: 
> <1>
> [INFO] 
> [ERROR] Tests run: 34, Failures: 1, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Flink Kubernetes: .. SUCCESS [  4.223 
> s]
> [INFO] Flink Kubernetes Shaded  SUCCESS [  5.097 
> s]
> [INFO] Flink Kubernetes Operator .. FAILURE [ 34.596 
> s]
> [INFO] Flink Kubernetes Webhook ... SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 44.065 s
> [INFO] Finished at: 2022-03-17T09:43:22+01:00
> [INFO] Final Memory: 160M/554M
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> 

[jira] [Created] (FLINK-26671) Update 'Developer Guide' in README

2022-03-16 Thread Biao Geng (Jira)
Biao Geng created FLINK-26671:
-

 Summary: Update 'Developer Guide' in README
 Key: FLINK-26671
 URL: https://issues.apache.org/jira/browse/FLINK-26671
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Biao Geng
 Attachments: image-2022-03-16-14-05-56-936.png

The shell command in the README is now based on the root dir of the repo. In 
the Developer Guide, we should update the {{helm install ...}} command to 
{{helm install flink-operator helm/flink-operator --set 
image.repository=/flink-java-operator --set image.tag=latest}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26655) Merge isJobManagerPortReady and isJobManagerServing to check if JM pod can work correctly

2022-03-15 Thread Biao Geng (Jira)
Biao Geng created FLINK-26655:
-

 Summary: Merge isJobManagerPortReady and isJobManagerServing to 
check if JM pod can work correctly
 Key: FLINK-26655
 URL: https://issues.apache.org/jira/browse/FLINK-26655
 Project: Flink
  Issue Type: Sub-task
Reporter: Biao Geng


We now consider that the JM pod can have 2 possible states after launching:
 # JM is launched but the port is not ready.
 # JM is launched and the port is ready, but the REST service is not ready.

It looks like these can be merged, as what we really care about is whether the 
JM can serve REST calls correctly, not whether the JM port is ready.
With the above observation, we can merge {{isJobManagerPortReady}} and 
{{isJobManagerServing}} into a single check of whether the JM pod can serve 
correctly.
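
A hypothetical sketch of the merged check (ClusterClient#listJobs stands in for 
whatever REST call the operator actually issues): a successful lightweight REST 
call implies both "port ready" and "REST service ready", so a separate port 
probe becomes unnecessary.
{code:java}
import org.apache.flink.client.program.ClusterClient;

class JobManagerProbe {
    // Returns true only if the JM answered a real REST request, which is a
    // strictly stronger signal than a bare TCP port check.
    static boolean isJobManagerReady(ClusterClient<?> client) {
        try {
            client.listJobs().get(); // any cheap REST call would do
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
{code}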




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26605) Check if JM can serve rest api calls every time before reconcile

2022-03-18 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26605:
--
Summary: Check if JM can serve rest api calls every time before reconcile  
(was: Improve the observe logic in SessionObserver)

> Check if JM can serve rest api calls every time before reconcile
> 
>
> Key: FLINK-26605
> URL: https://issues.apache.org/jira/browse/FLINK-26605
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Yang Wang
>Assignee: Biao Geng
>Priority: Major
>  Labels: pull-request-available
>
> Follow the discussion in the PR[1].
> {{JobManagerDeploymentStatus.READY}} does not mean the Flink cluster is ready 
> for accepting REST API calls. This will cause problems when we are running 
> session cluster.
> For example, if something is wrong or very slow with the leader election, 
> {{JobManagerDeploymentStatus}} is {{READY}} but the session cluster is not 
> ready for accepting job submission. What I mean is to update the 
> {{JobManagerDeploymentStatus}} to {{READY}} only when the flink cluster is 
> actually working.
> Another case: even when the JobManager is in crash backoff, the 
> {{JobManagerDeploymentStatus}} is still {{READY}}.
>  
> [1]. 
> https://github.com/apache/flink-kubernetes-operator/pull/51#discussion_r824359419



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26655) Improve the observe logic in SessionObserver

2022-03-18 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26655:
--
Summary: Improve the observe logic in SessionObserver  (was: Merge 
isJobManagerPortReady and isJobManagerServing to check if JM pod can work 
correctly)

> Improve the observe logic in SessionObserver
> 
>
> Key: FLINK-26655
> URL: https://issues.apache.org/jira/browse/FLINK-26655
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Biao Geng
>Priority: Major
>
> -We now consider that JM pod can have 2 possible states after launching:-
>  # -JM is launched but port is not ready.-
>  # -JM is launched, port is ready but rest service is not ready.-
> -It looks that they can be merged as what we really care is if the JM can 
> serve REST calls correctly, not if the JM port is ready.-
> -With above observation, we can merge {{isJobManagerPortReady}} and 
> {{isJobManagerServing}} to check if JM pod can serve correctly.-



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26655) Merge isJobManagerPortReady and isJobManagerServing to check if JM pod can work correctly

2022-03-18 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26655:
--
Description: 
-We now consider that JM pod can have 2 possible states after launching:-
 # -JM is launched but port is not ready.-
 # -JM is launched, port is ready but rest service is not ready.-

-It looks that they can be merged as what we really care is if the JM can serve 
REST calls correctly, not if the JM port is ready.-
-With above observation, we can merge {{isJobManagerPortReady}} and 
{{isJobManagerServing}} to check if JM pod can serve correctly.-

  was:
We now consider that JM pod can have 2 possible states after launching:
 # JM is launched but port is not ready.
 # JM is launched, port is ready but rest service is not ready.

It looks that they can be merged as what we really care is if the JM can serve 
REST calls correctly, not if the JM port is ready.
With above observation, we can merge {{isJobManagerPortReady}} and 
{{isJobManagerServing}} to check if JM pod can serve correctly.



> Merge isJobManagerPortReady and isJobManagerServing to check if JM pod can 
> work correctly
> -
>
> Key: FLINK-26655
> URL: https://issues.apache.org/jira/browse/FLINK-26655
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Biao Geng
>Priority: Major
>
> -We now consider that JM pod can have 2 possible states after launching:-
>  # -JM is launched but port is not ready.-
>  # -JM is launched, port is ready but rest service is not ready.-
> -It looks that they can be merged as what we really care is if the JM can 
> serve REST calls correctly, not if the JM port is ready.-
> -With above observation, we can merge {{isJobManagerPortReady}} and 
> {{isJobManagerServing}} to check if JM pod can serve correctly.-



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26605) Improve the observe logic in SessionObserver

2022-03-18 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26605:
--
Summary: Improve the observe logic in SessionObserver  (was: 
JobManagerDeploymentStatus.READY does not correctly reflect the real Flink 
cluster status)

> Improve the observe logic in SessionObserver
> 
>
> Key: FLINK-26605
> URL: https://issues.apache.org/jira/browse/FLINK-26605
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Yang Wang
>Assignee: Biao Geng
>Priority: Major
>  Labels: pull-request-available
>
> Follow the discussion in the PR[1].
> {{JobManagerDeploymentStatus.READY}} does not mean the Flink cluster is ready 
> for accepting REST API calls. This will cause problems when we are running 
> session cluster.
> For example, if something is wrong or very slow with the leader election, 
> {{JobManagerDeploymentStatus}} is {{READY}} but the session cluster is not 
> ready for accepting job submission. What I mean is to update the 
> {{JobManagerDeploymentStatus}} to {{READY}} only when the flink cluster is 
> actually working.
> Another case: even when the JobManager is in crash backoff, the 
> {{JobManagerDeploymentStatus}} is still {{READY}}.
>  
> [1]. 
> https://github.com/apache/flink-kubernetes-operator/pull/51#discussion_r824359419



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26655) Improve the observe logic in SessionObserver

2022-03-18 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26655:
--
Description: 
-We now consider that JM pod can have 2 possible states after launching:-
 # -JM is launched but port is not ready.-
 # -JM is launched, port is ready but rest service is not ready.-

-It looks that they can be merged as what we really care is if the JM can serve 
REST calls correctly, not if the JM port is ready.-
-With above observation, we can merge {{isJobManagerPortReady}} and 
{{isJobManagerServing}} to check if JM pod can serve correctly.-

 

Following the discussion in PR 
[62|https://github.com/apache/flink-kubernetes-operator/pull/62], we currently 
discard {{isJobManagerServing}} and use {{isJobManagerPortReady}} together with 
a call to `flinkService.listJobs()` to make sure the job manager can serve REST 
calls correctly. The original question of this jira is solved.
Now I adjust this jira to track the improvement of the observe logic in 
SessionObserver due to the review comments.

  was:
-We now consider that JM pod can have 2 possible states after launching:-
 # -JM is launched but port is not ready.-
 # -JM is launched, port is ready but rest service is not ready.-

-It looks that they can be merged as what we really care is if the JM can serve 
REST calls correctly, not if the JM port is ready.-
-With above observation, we can merge {{isJobManagerPortReady}} and 
{{isJobManagerServing}} to check if JM pod can serve correctly.-

 

Following the discussion in PR 
[62|https://github.com/apache/flink-kubernetes-operator/pull/62], we currently 
discard {{isJobManagerServing}} and use {{isJobManagerPortReady}} together with 
a call to `flinkService.listJobs()` to make sure the job manager can serve REST 
calls correctly. The original question of this jira is solved.
Now I adjust this jira to track the improvement of the 


> Improve the observe logic in SessionObserver
> 
>
> Key: FLINK-26655
> URL: https://issues.apache.org/jira/browse/FLINK-26655
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Biao Geng
>Priority: Major
>
> -We now consider that JM pod can have 2 possible states after launching:-
>  # -JM is launched but port is not ready.-
>  # -JM is launched, port is ready but rest service is not ready.-
> -It looks that they can be merged as what we really care is if the JM can 
> serve REST calls correctly, not if the JM port is ready.-
> -With above observation, we can merge {{isJobManagerPortReady}} and 
> {{isJobManagerServing}} to check if JM pod can serve correctly.-
>  
> Following the discussion in PR 
> [62|https://github.com/apache/flink-kubernetes-operator/pull/62], we 
> currently discard {{isJobManagerServing}} and use {{isJobManagerPortReady}} 
> together with a call to `flinkService.listJobs()` to make sure the job 
> manager can serve REST calls correctly. The original question of this jira is 
> solved.
> Now I adjust this jira to track the improvement of the observe logic in 
> SessionObserver due to the review comments.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26655) Improve the observe logic in SessionObserver

2022-03-18 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26655:
--
Description: 
-We now consider that JM pod can have 2 possible states after launching:-
 # -JM is launched but port is not ready.-
 # -JM is launched, port is ready but rest service is not ready.-

-It looks that they can be merged as what we really care is if the JM can serve 
REST calls correctly, not if the JM port is ready.-
-With above observation, we can merge {{isJobManagerPortReady}} and 
{{isJobManagerServing}} to check if JM pod can serve correctly.-

 

Following the discussion in PR 
[62|https://github.com/apache/flink-kubernetes-operator/pull/62], we currently 
discard {{isJobManagerServing}} and use {{isJobManagerPortReady}} together with 
a call to `flinkService.listJobs()` to make sure the job manager can serve REST 
calls correctly. The original question of this jira is solved.
Now I adjust this jira to track the improvement of the 

  was:
-We now consider that JM pod can have 2 possible states after launching:-
 # -JM is launched but port is not ready.-
 # -JM is launched, port is ready but rest service is not ready.-

-It looks that they can be merged as what we really care is if the JM can serve 
REST calls correctly, not if the JM port is ready.-
-With above observation, we can merge {{isJobManagerPortReady}} and 
{{isJobManagerServing}} to check if JM pod can serve correctly.-


> Improve the observe logic in SessionObserver
> 
>
> Key: FLINK-26655
> URL: https://issues.apache.org/jira/browse/FLINK-26655
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Biao Geng
>Priority: Major
>
> -We now consider that JM pod can have 2 possible states after launching:-
>  # -JM is launched but port is not ready.-
>  # -JM is launched, port is ready but rest service is not ready.-
> -It looks that they can be merged as what we really care is if the JM can 
> serve REST calls correctly, not if the JM port is ready.-
> -With above observation, we can merge {{isJobManagerPortReady}} and 
> {{isJobManagerServing}} to check if JM pod can serve correctly.-
>  
> Following the discussion in PR 
> [62|https://github.com/apache/flink-kubernetes-operator/pull/62], we 
> currently discard {{isJobManagerServing}} and use {{isJobManagerPortReady}} 
> together with a call to `flinkService.listJobs()` to make sure the job 
> manager can serve REST calls correctly. The original question of this jira is 
> solved.
> Now I adjust this jira to track the improvement of the 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26655) Improve the observe logic in SessionObserver

2022-03-18 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509168#comment-17509168
 ] 

Biao Geng commented on FLINK-26655:
---

I rewrote this jira after our previous discussion about the REST service check. 
It is now used to track the improvement of the observe logic in SessionObserver 
according to your review. cc [~wangyang0918]

> Improve the observe logic in SessionObserver
> 
>
> Key: FLINK-26655
> URL: https://issues.apache.org/jira/browse/FLINK-26655
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Biao Geng
>Priority: Major
>
> -We now consider that JM pod can have 2 possible states after launching:-
>  # -JM is launched but port is not ready.-
>  # -JM is launched, port is ready but rest service is not ready.-
> -It looks that they can be merged as what we really care is if the JM can 
> serve REST calls correctly, not if the JM port is ready.-
> -With above observation, we can merge {{isJobManagerPortReady}} and 
> {{isJobManagerServing}} to check if JM pod can serve correctly.-
>  
> Following the discussion in PR 
> [62|https://github.com/apache/flink-kubernetes-operator/pull/62], we 
> currently discard {{isJobManagerServing}} and use {{isJobManagerPortReady}} 
> together with a call to `flinkService.listJobs()` to make sure the job 
> manager can serve REST calls correctly. The original question of this jira is 
> solved.
> Now I adjust this jira to track the improvement of the observe logic in 
> SessionObserver due to the review comments.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-26163) Refactor FlinkUtils#getEffectiveConfig into smaller pieces

2022-02-16 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493096#comment-17493096
 ] 

Biao Geng edited comment on FLINK-26163 at 2/16/22, 9:18 AM:
-

Hi [~gyfora], I am happy to take this ticket to refactor the configuration 
generation. Would you mind assigning it to me? I just happened to do something 
similar in [FLINK-26030|https://issues.apache.org/jira/browse/FLINK-26030].


was (Author: bgeng777):
Hi [~gyfora], I am happy to take this ticket to refactor the configuration 
generation. Would you mind assigning it to me? I just happened to do something 
similar in [FLINK-26030](https://issues.apache.org/jira/browse/FLINK-26030).

> Refactor FlinkUtils#getEffectiveConfig into smaller pieces
> --
>
> Key: FLINK-26163
> URL: https://issues.apache.org/jira/browse/FLINK-26163
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Reporter: Gyula Fora
>Priority: Major
>
> There is currently one very large method dealing with Configuration 
> management. This logic probably deserves it's own utility class and a few 
> more modular methods.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26163) Refactor FlinkUtils#getEffectiveConfig into smaller pieces

2022-02-16 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493096#comment-17493096
 ] 

Biao Geng commented on FLINK-26163:
---

Hi [~gyfora], I am happy to take this ticket to refactor the configuration 
generation. Would you mind assigning it to me? I just happened to do something 
similar in [FLINK-26030](https://issues.apache.org/jira/browse/FLINK-26030).

> Refactor FlinkUtils#getEffectiveConfig into smaller pieces
> --
>
> Key: FLINK-26163
> URL: https://issues.apache.org/jira/browse/FLINK-26163
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Reporter: Gyula Fora
>Priority: Major
>
> There is currently one very large method dealing with Configuration 
> management. This logic probably deserves it's own utility class and a few 
> more modular methods.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26178) Use the same enum for expected and observed jobstate (JobState / JobStatus.state)

2022-02-23 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497233#comment-17497233
 ] 

Biao Geng commented on FLINK-26178:
---

Hi [~gyfora], it seems this ticket has been open for some days. I am happy to 
take it; would you mind assigning it to me?

> Use the same enum for expected and observed jobstate (JobState / 
> JobStatus.state)
> -
>
> Key: FLINK-26178
> URL: https://issues.apache.org/jira/browse/FLINK-26178
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
>
> We should consolidate these two and maybe even use Flink's own job state enum 
> here if makes sense



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-26178) Use the same enum for expected and observed jobstate (JobState / JobStatus.state)

2022-02-24 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497521#comment-17497521
 ] 

Biao Geng edited comment on FLINK-26178 at 2/24/22, 4:52 PM:
-

Some thoughts on doing this ticket:
As the description of this ticket says, Flink has its own 
{{org.apache.flink.api.common.JobStatus}} (including INITIALIZING, RUNNING, 
SUSPENDED, ...), which may be included in 
{{org.apache.flink.runtime.client.JobStatusMessage}}. But here I prefer to use 
{{org.apache.flink.kubernetes.operator.crd.spec.JobState}} for 2 reasons:
1. {{crd.spec.JobState}} is mainly for reconciling in this k8s operator, while 
{{api.common.JobStatus}} is used by the JM in Flink to orchestrate the job. 
Though the reconciling process may be driven by {{api.common.JobStatus}}, these 
2 fields are used for different purposes and should be separated.
2. {{crd.spec.JobState}} currently only includes RUNNING and SUSPENDED. We 
should consider FLINK-26139 first before we draw the conclusion whether 
{{org.apache.flink.api.common.JobStatus}} is all we need.
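
For illustration, a hypothetical mapping if we were to derive the spec-level 
state from the runtime status (the class names follow the ones cited above; the 
mapping itself is my assumption, not a committed proposal):
{code:java}
import org.apache.flink.api.common.JobStatus;
import org.apache.flink.kubernetes.operator.crd.spec.JobState;

class JobStateMapper {
    // Collapse Flink's fine-grained runtime statuses onto the operator's
    // coarse desired-state enum (RUNNING / SUSPENDED).
    static JobState fromRuntimeStatus(JobStatus status) {
        switch (status) {
            case FINISHED:
            case CANCELED:
            case SUSPENDED:
                return JobState.SUSPENDED;
            default:
                return JobState.RUNNING;
        }
    }
}
{code}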


was (Author: bgeng777):
Some thoughts on doing this ticket:
As the description of this ticket says, Flink has its own 
{{org.apache.flink.api.common.JobStatus}} (including INITIALIZING, RUNNING, 
SUSPENDED, ...), which may be included in 
{{org.apache.flink.runtime.client.JobStatusMessage}}. But here I prefer to use 
{{org.apache.flink.kubernetes.operator.crd.spec.JobState}} for 2 reasons:
1. {{crd.spec.JobState}} is mainly for reconciling in this k8s operator, while 
{{api.common.JobStatus}} is used by the JM in Flink to orchestrate the job. 
Though the reconciling process may be driven by {{api.common.JobStatus}}, these 
2 fields are used for different purposes and should be separated.
2. {{crd.spec.JobState}} currently only includes RUNNING and SUSPENDED. More 
states will increase the complexity of reconciling quickly, which should be 
considered carefully and only when we find some strong cases.

> Use the same enum for expected and observed jobstate (JobState / 
> JobStatus.state)
> -
>
> Key: FLINK-26178
> URL: https://issues.apache.org/jira/browse/FLINK-26178
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Biao Geng
>Priority: Major
>
> We should consolidate these two and maybe even use Flink's own job state enum 
> here if it makes sense



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-26178) Use the same enum for expected and observed jobstate (JobState / JobStatus.state)

2022-02-24 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497521#comment-17497521
 ] 

Biao Geng edited comment on FLINK-26178 at 2/24/22, 4:34 PM:
-

Some thoughts on this ticket:
As the description of this ticket says, Flink has its own 
{{org.apache.flink.api.common.JobStatus}} (including INITIALIZING, RUNNING, 
SUSPENDED, ...), which may be included in 
{{org.apache.flink.runtime.client.JobStatusMessage}}. But here I prefer to 
use {{org.apache.flink.kubernetes.operator.crd.spec.JobState}} for 2 reasons:
1. {{crd.spec.JobState}} is mainly for reconciling in this k8s operator, while 
{{api.common.JobStatus}} is used by the JM in Flink to orchestrate the job. 
Though the reconciling process may be driven by {{api.common.JobStatus}}, 
these 2 fields are used for different purposes and should be kept separate.
2. {{crd.spec.JobState}} currently only includes RUNNING and SUSPENDED. More 
states will quickly increase the complexity of the reconciling, so they should 
be added carefully and only when we find strong cases.


was (Author: bgeng777):
Some thoughts on this ticket:
As the description of this ticket says, Flink has its own 
{{org.apache.flink.api.common.JobStatus}} (including INITIALIZING, RUNNING, 
SUSPENDED, ...), which may be included in 
{{org.apache.flink.runtime.client.JobStatusMessage}}. But here I prefer to use 
{{org.apache.flink.kubernetes.operator.crd.spec.JobState}} for 2 reasons:
1. {{crd.spec.JobState}} is mainly for reconciling in this k8s operator, while 
{{api.common.JobStatus}} is used by the JM in Flink to orchestrate the job. 
Though the reconciling process may be driven by 
{{org.apache.flink.api.common.JobStatus}}, they are used for different purposes.
2. {{crd.spec.JobState}} currently only includes RUNNING and SUSPENDED. More 
states will quickly increase the complexity of the reconciling, so they should 
be added carefully and only when we find strong cases.

> Use the same enum for expected and observed jobstate (JobState / 
> JobStatus.state)
> -
>
> Key: FLINK-26178
> URL: https://issues.apache.org/jira/browse/FLINK-26178
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Biao Geng
>Priority: Major
>
> We should consolidate these two and maybe even use Flink's own job state enum 
> here if it makes sense



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26178) Use the same enum for expected and observed jobstate (JobState / JobStatus.state)

2022-02-24 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497521#comment-17497521
 ] 

Biao Geng commented on FLINK-26178:
---

Some thoughts on this ticket:
As the description of this ticket says, Flink has its own 
{{org.apache.flink.api.common.JobStatus}} (including INITIALIZING, RUNNING, 
SUSPENDED, ...), which may be included in 
{{org.apache.flink.runtime.client.JobStatusMessage}}. But here I prefer to use 
{{org.apache.flink.kubernetes.operator.crd.spec.JobState}} for 2 reasons:
1. {{crd.spec.JobState}} is mainly for reconciling in this k8s operator, while 
{{api.common.JobStatus}} is used by the JM in Flink to orchestrate the job. 
Though the reconciling process may be driven by 
{{org.apache.flink.api.common.JobStatus}}, they are used for different purposes.
2. {{crd.spec.JobState}} currently only includes RUNNING and SUSPENDED. More 
states will quickly increase the complexity of the reconciling, so they should 
be added carefully and only when we find strong cases.

> Use the same enum for expected and observed jobstate (JobState / 
> JobStatus.state)
> -
>
> Key: FLINK-26178
> URL: https://issues.apache.org/jira/browse/FLINK-26178
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Biao Geng
>Priority: Major
>
> We should consolidate these two and maybe even use Flink's own job state enum 
> here if it makes sense



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26216) Make 'replicas' work in JobManager Spec

2022-02-19 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494981#comment-17494981
 ] 

Biao Geng commented on FLINK-26216:
---

hi [~ConradJam], thanks a lot for your words, but I do not have the permission 
to assign this ticket. Before we ask for help from others, I just want to 
mention that in the [Jira|https://issues.apache.org/jira/browse/FLINK-26163?filter=-1] 
for refactoring FlinkUtils, I have added code like:
{code:java}
effectiveConfig.setInteger(
        KubernetesConfigOptions.KUBERNETES_JOBMANAGER_REPLICAS,
        spec.getJobManager().getReplicas());
{code}
And I verified it on my own k8s cluster: 2 JMs can be started correctly. 
Please let me know if that is your expected fix. If it is, maybe we do not 
need to do it twice (sorry for the late update in this jira). If it is not 
your case, I believe we can ask others to assign this ticket to you.

> Make 'replicas' work in JobManager Spec
> ---
>
> Key: FLINK-26216
> URL: https://issues.apache.org/jira/browse/FLINK-26216
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Minor
>
> In our flink kubernetes operator's CR, we allow users to set the replicas of 
> the JobManager. 
> But our {{FlinkUtils#getEffectiveConfig}} method currently does not set 
> this value from the yaml file; as a result, {{replicas}} will not work 
> and the default value (i.e. 1) will be applied. 
> Though we believe one JM with KubernetesHaService should be enough for most 
> HA cases, the {{replicas}} field of the JM also makes sense since more than 
> one JM can reduce downtime and speed up recovery from JM failure. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26405) Add validation check of num of JM replica

2022-02-28 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26405:
--
External issue URL:   (was: 
https://issues.apache.org/jira/browse/FLINK-26216)

> Add validation check of num of JM replica
> -
>
> Key: FLINK-26405
> URL: https://issues.apache.org/jira/browse/FLINK-26405
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Minor
>
> When HA is enabled, the replicas of JM can be 1 or 2. 
> When HA is not set, the replicas of JM should always be 1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26405) Add validation check of num of JM replica

2022-02-28 Thread Biao Geng (Jira)
Biao Geng created FLINK-26405:
-

 Summary: Add validation check of num of JM replica
 Key: FLINK-26405
 URL: https://issues.apache.org/jira/browse/FLINK-26405
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng


When HA is enabled, the replicas of JM can be 1 or 2.

When HA is not set, the replicas of JM should always be 1.
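
A minimal sketch of what such a check could look like (names are illustrative, 
not the operator's actual validator):

{code:java}
import java.util.Optional;

// Hypothetical validation sketch: reject inconsistent JM replica settings
// before any reconciliation starts.
final class JmReplicaValidator {
    static Optional<String> validate(boolean haEnabled, int replicas) {
        if (replicas < 1) {
            return Optional.of("JobManager replicas must be at least 1");
        }
        if (!haEnabled && replicas > 1) {
            return Optional.of("JobManager replicas must be 1 when HA is not enabled");
        }
        return Optional.empty(); // valid
    }
}
{code}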



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26405) Add validation check of num of JM replica

2022-02-28 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26405:
--
Parent: FLINK-25963
Issue Type: Sub-task  (was: Improvement)

> Add validation check of num of JM replica
> -
>
> Key: FLINK-26405
> URL: https://issues.apache.org/jira/browse/FLINK-26405
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Minor
>
> When HA is enabled, the replicas of JM can be 1 or 2. 
> When HA is not set, the replicas of JM should always be 1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26405) Add validation check of num of JM replica

2022-02-28 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26405:
--
External issue URL: https://issues.apache.org/jira/browse/FLINK-26216

> Add validation check of num of JM replica
> -
>
> Key: FLINK-26405
> URL: https://issues.apache.org/jira/browse/FLINK-26405
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Minor
>
> When HA is enabled, the replicas of JM can be 1 or 2. 
> When HA is not set, the replicas of JM should always be 1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26405) Add validation check of num of JM replica

2022-02-28 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499295#comment-17499295
 ] 

Biao Geng commented on FLINK-26405:
---

Hi [~wangyang0918], I want to take this Jira to make the support of JM replicas 
more robust. Could you please assign this one to me?

> Add validation check of num of JM replica
> -
>
> Key: FLINK-26405
> URL: https://issues.apache.org/jira/browse/FLINK-26405
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Minor
>
> When HA is enabled, the replicas of JM can be 1 or 2. 
> When HA is not set, the replicas of JM should always be 1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26405) Add validation check of num of JM replica

2022-02-28 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26405:
--
Description: 
When HA is enabled, the replicas of JM can be 1 or more.

When HA is not set, the replicas of JM should always be 1.

  was:
When HA is enabled, the replicas of JM can be 1 or 2.

When HA is not set, the replicas of JM should always be 1.


> Add validation check of num of JM replica
> -
>
> Key: FLINK-26405
> URL: https://issues.apache.org/jira/browse/FLINK-26405
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Minor
>
> When HA is enabled, the replicas of JM can be 1 or more. 
> When HA is not set, the replicas of JM should always be 1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26216) Make 'replicas' work in JobManager Spec

2022-02-17 Thread Biao Geng (Jira)
Biao Geng created FLINK-26216:
-

 Summary: Make 'replicas' work in JobManager Spec
 Key: FLINK-26216
 URL: https://issues.apache.org/jira/browse/FLINK-26216
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Biao Geng


In our flink kubernetes operator's CR, we allow users to set the replicas of 
the JobManager. 
But our {{FlinkUtils#getEffectiveConfig}} method currently does not set this 
value from the yaml file; as a result, {{replicas}} will not work and 
the default value (i.e. 1) will be applied. 
Though we believe one JM with KubernetesHaService should be enough for most HA 
cases, the {{replicas}} field of the JM also makes sense since more than one JM 
can reduce downtime and speed up recovery from JM failure. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26216) Make 'replicas' work in JobManager Spec

2022-02-17 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26216:
--
Parent: FLINK-25963
Issue Type: Sub-task  (was: Bug)

> Make 'replicas' work in JobManager Spec
> ---
>
> Key: FLINK-26216
> URL: https://issues.apache.org/jira/browse/FLINK-26216
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Minor
>
> In our flink kubernetes operator's CR, we allow users to set the replicas of 
> the JobManager. 
> But our {{FlinkUtils#getEffectiveConfig}} method currently does not set 
> this value from the yaml file; as a result, {{replicas}} will not work 
> and the default value (i.e. 1) will be applied. 
> Though we believe one JM with KubernetesHaService should be enough for most 
> HA cases, the {{replicas}} field of the JM also makes sense since more than 
> one JM can reduce downtime and speed up recovery from JM failure. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26216) Make 'replicas' work in JobManager Spec

2022-02-21 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495930#comment-17495930
 ] 

Biao Geng commented on FLINK-26216:
---

[~gyfora] thanks; I will draft the PR ASAP.

> Make 'replicas' work in JobManager Spec
> ---
>
> Key: FLINK-26216
> URL: https://issues.apache.org/jira/browse/FLINK-26216
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Minor
>
> In our flink kubernetes operator's CR, we allow users to set the replicas of 
> the JobManager. 
> But our {{FlinkUtils#getEffectiveConfig}} method currently does not set 
> this value from the yaml file; as a result, {{replicas}} will not work 
> and the default value (i.e. 1) will be applied. 
> Though we believe one JM with KubernetesHaService should be enough for most 
> HA cases, the {{replicas}} field of the JM also makes sense since more than 
> one JM can reduce downtime and speed up recovery from JM failure. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26139) Improve JobStatus tracking and handle different job states

2022-02-25 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498139#comment-17498139
 ] 

Biao Geng commented on FLINK-26139:
---

After checking the job state transitions in Flink, I think that in this 
operator, for our flink application, we may only need to process 3 kinds of 
states: {{RUNNING}}, {{SUSPENDED}} and {{TERMINATED}}.

{{RUNNING}}: The application cluster is running. The actual job status could 
be any non-terminal state (i.e. INITIALIZING, CREATED, RUNNING, FAILING, 
CANCELLING, RESTARTING or RECONCILING). The JM is responsible for managing the 
internal state transitions and the operator just considers it as running.
{{TERMINATED}}: The application is finished due to the completion, failure 
or cancellation of the job. In this state, the operator should clean up the 
deployment and all other resources. No potential upgrades any more.
{{SUSPENDED}}: From the API doc: "The job has been suspended which means 
that it has been stopped but not been removed from a potential HA job store." 
Potential upgrades can happen in this state, like modifying the parallelism, 
and then the state can be transferred back to {{RUNNING}}.

The state machine can be as simple as the following picture:

!image-2022-02-25-21-22-08-636.png|width=200,height=119!

I am a little uncertain whether we should split the {{TERMINATED}} state into 
more specific states like FAILED, FINISHED and CANCELLED so that we can track 
the status better on the k8s side.

Does the above design make sense? cc [~wangyang0918] [~gyfora] 
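
For illustration, a minimal sketch of this three-state machine and its allowed 
transitions (all names are hypothetical, not the operator's actual code):

{code:java}
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the proposed state machine.
enum DeploymentState { RUNNING, SUSPENDED, TERMINATED }

final class DeploymentStateMachine {
    static Set<DeploymentState> allowedTargets(DeploymentState from) {
        switch (from) {
            case RUNNING:
                return EnumSet.of(DeploymentState.SUSPENDED, DeploymentState.TERMINATED);
            case SUSPENDED:
                // Upgrades (e.g. changing parallelism) happen here, then back to RUNNING.
                return EnumSet.of(DeploymentState.RUNNING, DeploymentState.TERMINATED);
            default:
                // TERMINATED is final: resources are cleaned up, no upgrades any more.
                return EnumSet.noneOf(DeploymentState.class);
        }
    }
}
{code}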

> Improve JobStatus tracking and handle different job states
> --
>
> Key: FLINK-26139
> URL: https://issues.apache.org/jira/browse/FLINK-26139
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
> Attachments: image-2022-02-25-21-22-08-636.png
>
>
> Currently we do not handle any job status changes such as cancellations, 
> errors or job completions.
> We should introduce some mechanism to react and deal with these changes and 
> expose them in the status as they can potentially affect upgrades.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26139) Improve JobStatus tracking and handle different job states

2022-02-25 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26139:
--
Attachment: image-2022-02-25-21-22-08-636.png

> Improve JobStatus tracking and handle different job states
> --
>
> Key: FLINK-26139
> URL: https://issues.apache.org/jira/browse/FLINK-26139
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
> Attachments: image-2022-02-25-21-22-08-636.png
>
>
> Currently we do not handle any job status changes such as cancellations, 
> errors or job completions.
> We should introduce some mechanism to react and deal with these changes and 
> expose them in the status as they can potentially affect upgrades.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-26216) Make 'replicas' work in JobManager Spec

2022-02-20 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495290#comment-17495290
 ] 

Biao Geng edited comment on FLINK-26216 at 2/21/22, 2:55 AM:
-

[~ConradJam] Thanks for your reply! You make a very valuable point: in my 
opinion, such a straightforward config error should be raised up front instead 
of in the reconcile phase, just like the target of 
https://issues.apache.org/jira/browse/FLINK-26136 . I will try to finish my 
refactoring PR with the simple fix of JM HA ASAP; for the validation part, I 
think you can either try to take FLINK-26136 or submit a follow-up PR for this 
ticket only. I believe both choices will be encouraged by our community.


was (Author: bgeng777):
[~ConradJam] You make a very valuable point: in my opinion, such a 
straightforward config error should be raised up front instead of in the 
reconcile phase, just like the target of 
https://issues.apache.org/jira/browse/FLINK-26136 . I will try to finish my 
refactoring PR with the simple fix of JM HA ASAP; for the validation part, I 
think you can either try to take FLINK-26136 or submit a follow-up PR for this 
ticket only. I believe both choices will be encouraged by our community.

> Make 'replicas' work in JobManager Spec
> ---
>
> Key: FLINK-26216
> URL: https://issues.apache.org/jira/browse/FLINK-26216
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Minor
>
> In our flink kubernetes operator's CR, we allow users to set the replicas of 
> the JobManager. 
> But our {{FlinkUtils#getEffectiveConfig}} method currently does not set 
> this value from the yaml file; as a result, {{replicas}} will not work 
> and the default value (i.e. 1) will be applied. 
> Though we believe one JM with KubernetesHaService should be enough for most 
> HA cases, the {{replicas}} field of the JM also makes sense since more than 
> one JM can reduce downtime and speed up recovery from JM failure. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26216) Make 'replicas' work in JobManager Spec

2022-02-20 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495290#comment-17495290
 ] 

Biao Geng commented on FLINK-26216:
---

[~ConradJam] You make a very valuable point: in my opinion, such a 
straightforward config error should be raised up front instead of in the 
reconcile phase, just like the target of 
https://issues.apache.org/jira/browse/FLINK-26136 . I will try to finish my 
refactoring PR with the simple fix of JM HA ASAP; for the validation part, I 
think you can either try to take FLINK-26136 or submit a follow-up PR for this 
ticket only. I believe both choices will be encouraged by our community.

> Make 'replicas' work in JobManager Spec
> ---
>
> Key: FLINK-26216
> URL: https://issues.apache.org/jira/browse/FLINK-26216
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Minor
>
> In our flink kubernetes operator's CR, we allow users to set the replicas of 
> the JobManager. 
> But our {{FlinkUtils#getEffectiveConfig}} method currently does not set 
> this value from the yaml file; as a result, {{replicas}} will not work 
> and the default value (i.e. 1) will be applied. 
> Though we believe one JM with KubernetesHaService should be enough for most 
> HA cases, the {{replicas}} field of the JM also makes sense since more than 
> one JM can reduce downtime and speed up recovery from JM failure. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26825) Document State Machine Graph for JM Deployment

2022-03-23 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511100#comment-17511100
 ] 

Biao Geng commented on FLINK-26825:
---

Hi [~gyfora], would you mind assigning this ticket to me?

> Document State Machine Graph for JM Deployment
> --
>
> Key: FLINK-26825
> URL: https://issues.apache.org/jira/browse/FLINK-26825
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Biao Geng
>Priority: Major
>
>  We may need a state machine graph for JM deployment or the whole cluster 
> like 
> [lyft’s|https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md]
>  in our document to help our developers refine our code later and our users 
> learn about how the operator reconciles.
> The state machine is highly likely to change when we introduce more 
> features (e.g. rollback strategy) and we should update the doc when such 
> change happens.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26825) Document State Machine Graph for JM Deployment

2022-03-23 Thread Biao Geng (Jira)
Biao Geng created FLINK-26825:
-

 Summary: Document State Machine Graph for JM Deployment
 Key: FLINK-26825
 URL: https://issues.apache.org/jira/browse/FLINK-26825
 Project: Flink
  Issue Type: Sub-task
Reporter: Biao Geng


 We may need a state machine graph for JM deployment or the whole cluster like 
[lyft’s|https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md]
 in our document to help our developers refine our code later and our users 
learn about how the operator reconciles.

The state machine is highly likely to change when we introduce more 
features (e.g. rollback strategy) and we should update the doc when such change 
happens.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26832) Output more status info for JobObserver

2022-03-23 Thread Biao Geng (Jira)
Biao Geng created FLINK-26832:
-

 Summary: Output more status info for JobObserver
 Key: FLINK-26832
 URL: https://issues.apache.org/jira/browse/FLINK-26832
 Project: Flink
  Issue Type: Sub-task
Reporter: Biao Geng


For {{JobObserver#observeFlinkJobStatus()}}, we currently only call 
{{logger.info("Job status successfully updated");}}.
This could be more informative if we output the actual job status here, to help 
users check the status of the job from the flink operator's log rather than 
depending only on the flink web ui.

The proposed change looks like:
{{logger.info("Job status successfully updated from {} to {}", currentState, 
targetState);}}.
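
For illustration, a minimal sketch of the proposed change in context (class 
and variable names are made up for the example):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch, not the actual JobObserver code.
final class JobObserverLoggingSketch {
    private static final Logger LOG = LoggerFactory.getLogger(JobObserverLoggingSketch.class);

    void onJobStatusUpdated(String currentState, String targetState) {
        // Surface the actual transition so users can follow the job state
        // from the operator log alone, without opening the Flink web UI.
        LOG.info("Job status successfully updated from {} to {}", currentState, targetState);
    }
}
{code}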



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26611) Document operator config options

2022-04-02 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516254#comment-17516254
 ] 

Biao Geng commented on FLINK-26611:
---

[~gyfora] I see this Jira is open and planned for the 0.1.0 version. If no one 
has started on it, I can take it.


> Document operator config options
> 
>
> Key: FLINK-26611
> URL: https://issues.apache.org/jira/browse/FLINK-26611
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
> Fix For: kubernetes-operator-0.1.0
>
>
> Similar to how other flink configs are documented, we should also document the 
> configs found in OperatorConfigOptions.
> Generating the documentation might be possible, but maybe it's easier to do it 
> manually.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27009) Support SQL job submission in flink kubernetes operator

2022-04-02 Thread Biao Geng (Jira)
Biao Geng created FLINK-27009:
-

 Summary: Support SQL job submission in flink kubernetes operator
 Key: FLINK-27009
 URL: https://issues.apache.org/jira/browse/FLINK-27009
 Project: Flink
  Issue Type: New Feature
  Components: Kubernetes Operator
Reporter: Biao Geng


Currently, the flink kubernetes operator targets jar jobs using an application 
or session cluster. For SQL jobs, there is no out-of-the-box solution in the 
operator. 
One simple, short-term solution is to wrap the SQL script into a jar job using 
the Table API, with limitations.
The long-term solution may work with 
[FLINK-26541|https://issues.apache.org/jira/browse/FLINK-26541] to achieve 
full support.
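
For the short-term idea, a minimal sketch of a generic wrapper job, assuming 
the SQL statements are passed in as program arguments (the runner class itself 
is hypothetical):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Hypothetical sketch: a jar job that executes SQL statements via the Table API.
public final class SqlRunnerSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        for (String statement : args) {
            // Argument handling is simplified; a real runner would read a
            // script file and split statements properly.
            tEnv.executeSql(statement);
        }
    }
}
{code}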



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26892) Observe current status before validating CR changes

2022-03-29 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513858#comment-17513858
 ] 

Biao Geng commented on FLINK-26892:
---

Hi [~gyfora], I see your point.

The reason I asked the above question first is that I did some experiments and 
found the wrong config would not affect the deployment. Later I found it was 
because I had enabled the webhook check, which performs a check first.

I removed the webhook and applied some wrong config (e.g. JM replicas = -1), 
and our {{reconcile()}} got stuck on the error. Your proposed solution is 
necessary for those who do not enable the webhook check.



> Observe current status before validating CR changes
> ---
>
> Key: FLINK-26892
> URL: https://issues.apache.org/jira/browse/FLINK-26892
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
> Fix For: kubernetes-operator-1.0.0
>
>
> Currently validation is the first step in the controller loop which means 
> that when there is a validation error we fail to observe the status of 
> currently running deployments.
> We should change the order of operations and observe before validation. 
> Furthermore observe should use the previous configuration not the one after 
> the CR change.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26892) Observe current status before validating CR changes

2022-03-28 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513791#comment-17513791
 ] 

Biao Geng commented on FLINK-26892:
---

In our current implementation, a validation error should only happen when users 
apply a changed yaml file with wrong configs, right? It seems that nothing bad 
will happen: the triggered {{FlinkDeploymentController#reconcile()}} directly 
returns with no further reschedule. The operator keeps using the previous 
configs and {{reconcile()}} logic (so {{reconcile()}} will be triggered after 
15 secs and the state of the JM/job can be updated).

Is there anything I missed, or a strong reason to trigger a new observe() once 
the yaml file is applied?

 

> Observe current status before validating CR changes
> ---
>
> Key: FLINK-26892
> URL: https://issues.apache.org/jira/browse/FLINK-26892
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
> Fix For: kubernetes-operator-1.0.0
>
>
> Currently validation is the first step in the controller loop which means 
> that when there is a validation error we fail to observe the status of 
> currently running deployments.
> We should change the order of operations and observe before validation. 
> Furthermore observe should use the previous configuration not the one after 
> the CR change.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-26892) Observe current status before validating CR changes

2022-03-28 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513791#comment-17513791
 ] 

Biao Geng edited comment on FLINK-26892 at 3/29/22, 4:09 AM:
-

In our current code, IIUC, a validation error should only happen when users 
apply a changed yaml file with wrong configs, right? 
It seems that nothing bad will happen: the triggered 
{{FlinkDeploymentController#reconcile()}} directly returns with no further 
reschedule. The operator keeps using the previous configs and {{reconcile()}} 
logic (so {{reconcile()}} will be triggered after 15 secs and the state of 
the JM/job can be updated).
Is there anything I missed, or a strong reason to trigger a new observe() once 
the yaml file is applied?

 


was (Author: bgeng777):
In our current implementation, a validation error should only happen when users 
apply a changed yaml file with wrong configs, right? It seems that nothing bad 
will happen: the triggered {{FlinkDeploymentController#reconcile()}} directly 
returns with no further reschedule. The operator keeps using the previous 
configs and {{reconcile()}} logic (so {{reconcile()}} will be triggered after 
15 secs and the state of the JM/job can be updated).

Is there anything I missed, or a strong reason to trigger a new observe() once 
the yaml file is applied?

 

> Observe current status before validating CR changes
> ---
>
> Key: FLINK-26892
> URL: https://issues.apache.org/jira/browse/FLINK-26892
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
> Fix For: kubernetes-operator-1.0.0
>
>
> Currently validation is the first step in the controller loop which means 
> that when there is a validation error we fail to observe the status of 
> currently running deployments.
> We should change the order of operations and observe before validation. 
> Furthermore observe should use the previous configuration not the one after 
> the CR change.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-26892) Observe current status before validating CR changes

2022-03-28 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513791#comment-17513791
 ] 

Biao Geng edited comment on FLINK-26892 at 3/29/22, 4:10 AM:
-

In our current code, IIUC, a validation error should only happen when users 
apply a changed yaml file with wrong configs, right? 
It seems that nothing bad will happen: the triggered 
{{FlinkDeploymentController#reconcile()}} directly returns with no further 
reschedule. The operator keeps using the previous configs and {{reconcile()}} 
logic (so {{reconcile()}} will be triggered after 15 secs and the state of 
the JM/job can be updated).
Is there anything I missed, or a strong reason to trigger a new observe() once 
the yaml file is applied?

 


was (Author: bgeng777):
In our current code, IIUC, a validation error should only happen when users 
apply a changed yaml file with wrong configs, right? 
It seems that nothing bad will happen: the triggered 
{{FlinkDeploymentController#reconcile()}} directly returns with no further 
reschedule. The operator keeps using the previous configs and {{reconcile()}} 
logic (so {{reconcile()}} will be triggered after 15 secs and the state of 
the JM/job can be updated).
Is there anything I missed, or a strong reason to trigger a new observe() once 
the yaml file is applied?

 

> Observe current status before validating CR changes
> ---
>
> Key: FLINK-26892
> URL: https://issues.apache.org/jira/browse/FLINK-26892
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
> Fix For: kubernetes-operator-1.0.0
>
>
> Currently validation is the first step in the controller loop which means 
> that when there is a validation error we fail to observe the status of 
> currently running deployments.
> We should change the order of operations and observe before validation. 
> Furthermore observe should use the previous configuration not the one after 
> the CR change.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26892) Observe current status before validating CR changes

2022-03-29 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513971#comment-17513971
 ] 

Biao Geng commented on FLINK-26892:
---

Let me try to summarize what we should do for this case:
1. Save the latest validated config ({{lastReconciledSpec}} should already be 
sufficient.)
2. Adjust the current logic to: observe with the latest validated config -> 
validate the new config -> reconcile with the new config

Is there any drawback to the above idea? If it is fine, I can take this ticket 
and work on it.
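
For illustration, a minimal sketch of the reordered loop (all names are made 
up, not the operator's actual API):

{code:java}
import java.util.Optional;

// Hypothetical sketch of the proposed ordering, not the actual controller code.
final class ControlLoopSketch {
    private Object lastValidatedSpec; // e.g. recovered from lastReconciledSpec

    void controlLoop(Object newSpec) {
        observe(lastValidatedSpec);                 // 1. observe with the previous, known-good spec
        Optional<String> error = validate(newSpec); // 2. validate the CR change
        if (error.isPresent()) {
            recordValidationError(error.get());     // status is still fresh from observe()
            return;                                 // keep reconciling with lastValidatedSpec
        }
        reconcile(newSpec);                         // 3. reconcile with the new spec
        lastValidatedSpec = newSpec;
    }

    private void observe(Object spec) { /* query cluster/job status under this spec */ }
    private Optional<String> validate(Object spec) { return Optional.empty(); }
    private void recordValidationError(String message) { /* expose in CR status */ }
    private void reconcile(Object spec) { /* apply the desired state */ }
}
{code}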

> Observe current status before validating CR changes
> ---
>
> Key: FLINK-26892
> URL: https://issues.apache.org/jira/browse/FLINK-26892
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
> Fix For: kubernetes-operator-1.0.0
>
>
> Currently validation is the first step in the controller loop which means 
> that when there is a validation error we fail to observe the status of 
> currently running deployments.
> We should change the order of operations and observe before validation. 
> Furthermore observe should use the previous configuration not the one after 
> the CR change.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-24897) Enable application mode on YARN to use usrlib

2022-01-28 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17483689#comment-17483689
 ] 

Biao Geng commented on FLINK-24897:
---

hi [~wangyang0918], I have completed the 
[PR|https://github.com/apache/flink/pull/18531]. Would you mind reviewing it in 
the coming days? Thanks!

Besides, I find that the method 
`ClusterEntrypointUtils#tryFindUserLibDirectory` may be buggy: it is a method 
in ClusterEntrypointUtils which is utilized by the Entrypoint classes to check 
whether there is a `usrlib` directory in the standalone/k8s/yarn cluster. 
However, the code shows that it will use `FLINK_LIB_DIR` to look for `usrlib` 
on the remote side. But on YARN, the uploaded `usrlib` is located in the YARN 
cache dir (e.g. appcache/application_xx/container_yy). As a result, if we set 
`FLINK_LIB_DIR` by default in the remote cluster, `tryFindUserLibDirectory` 
will try to find `usrlib` in the wrong location.
The reason such a bug does not affect correctness is that in most cases 
`FLINK_LIB_DIR` will not be set in the remote cluster's ENV, and the current 
code then uses the working dir, which is System.getProperty("user.dir") by 
default. The working dir is the correct choice in most cases. 
I am not sure if this bug should be fixed in this PR or whether we should 
create a new one, since it will influence the standalone/k8s/yarn 
ClusterEntrypoint.
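
For reference, a simplified sketch of the lookup behavior described above (not 
Flink's actual implementation):

{code:java}
import java.io.File;
import java.util.Optional;

// Simplified sketch: prefer the parent of FLINK_LIB_DIR, otherwise fall back
// to the working directory, and look for a "usrlib" folder there.
final class UserLibLookupSketch {
    static Optional<File> tryFindUserLib() {
        String libDir = System.getenv("FLINK_LIB_DIR");
        File base = (libDir != null)
                ? new File(libDir).getParentFile()          // client side: FLINK_HOME
                : new File(System.getProperty("user.dir")); // container: YARN working dir
        File usrlib = new File(base, "usrlib");
        return usrlib.isDirectory() ? Optional.of(usrlib) : Optional.empty();
    }
}
{code}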


> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>  Labels: pull-request-available
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to a YARN 
> cluster, but I find that currently there is no easy way to ship my 
> user-defined jars (e.g. some custom connectors or udf jars that would be 
> shared by some jobs) and ask the FlinkUserCodeClassLoader to load classes in 
> these jars. 
> I checked some relevant jiras, like FLINK-21289. In k8s mode, there is a 
> solution: users can use the `usrlib` directory to store their user-defined 
> jars, and these jars will be loaded by FlinkUserCodeClassLoader when the job 
> is executed on the JM/TM.
> But in YARN mode, `usrlib` does not work like that:
> In this method (org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added to systemClassPaths. As 
> a result, classes in those user jars will be loaded by the AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to utilize these 
> jars in my flink run command. Currently, all I can do seems to be to use the 
> `-C` option, which means I have to upload my jars to some shared store first 
> and then use these remote paths. This is not ideal, as we have already made 
> it possible to ship jars or files directly and we have also introduced 
> `usrlib` in application mode on YARN. It would be more user-friendly if we 
> could allow shipping `usrlib` from the local machine to the remote cluster 
> while using FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-24897) Make usrlib work for YARN application and per-job

2022-02-08 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-24897:
--
Summary: Make usrlib work for YARN application and per-job  (was: Make 
usrlib could work for YARN application and per-job)

> Make usrlib work for YARN application and per-job
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>  Labels: pull-request-available
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to a YARN 
> cluster, but I find that currently there is no easy way to ship my 
> user-defined jars (e.g. some custom connectors or udf jars that would be 
> shared by some jobs) and ask the FlinkUserCodeClassLoader to load classes in 
> these jars. 
> I checked some relevant jiras, like FLINK-21289. In k8s mode, there is a 
> solution: users can use the `usrlib` directory to store their user-defined 
> jars, and these jars will be loaded by FlinkUserCodeClassLoader when the job 
> is executed on the JM/TM.
> But in YARN mode, `usrlib` does not work like that:
> In this method (org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added to systemClassPaths. As 
> a result, classes in those user jars will be loaded by the AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to utilize these 
> jars in my flink run command. Currently, all I can do seems to be to use the 
> `-C` option, which means I have to upload my jars to some shared store first 
> and then use these remote paths. This is not ideal, as we have already made 
> it possible to ship jars or files directly and we have also introduced 
> `usrlib` in application mode on YARN. It would be more user-friendly if we 
> could allow shipping `usrlib` from the local machine to the remote cluster 
> while using FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-24897) Make usrlib could work for YARN application and per-job

2022-02-08 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446258#comment-17446258
 ] 

Biao Geng edited comment on FLINK-24897 at 2/8/22, 11:01 AM:
-

Hi [~trohrmann], would you mind giving any comments on whether we have the 
guarantee that jars under {{usrlib}} will always be loaded by the user 
classloader? 
Besides, after the above discussion with Yang, the current solution is:
1. {{usrlib}} will be shipped automatically if it exists.
2. If we add {{usrlib}} to the ship files again, we will throw an exception.
3. {{usrlib}} will work for both per-job and application mode.
4. Jars in {{usrlib}} will be loaded by the user classloader only when 
UserJarInclusion is DISABLED. In other cases, the AppClassLoader will be used.
5. The {{usrlib}} directory should not exist in {{yarn.provided.lib.dirs}} or 
{{yarn.ship-files}}. If users want to ship {{usrlib}}, they should just create 
{{usrlib}} under the FLINK_HOME dir.

Thanks.


was (Author: bgeng777):
Hi [~trohrmann], would you mind giving any comments on whether we have the 
guarantee that jars under {{usrlib}} will always be loaded by the user 
classloader? 
Besides, after the above discussion with Yang, the current solution is:
1. {{usrlib}} will be shipped automatically if it exists.
2. If we add {{usrlib}} to the ship files again, we will throw an exception.
3. {{usrlib}} will work for both per-job and application mode.
4. Jars in {{usrlib}} will be loaded by the user classloader only when 
UserJarInclusion is DISABLED. In other cases, the AppClassLoader will be used.

Thanks.

> Make usrlib could work for YARN application and per-job
> ---
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>  Labels: pull-request-available
>
> Hi there, 
> I am working to utilize application mode to submit flink jobs to a YARN 
> cluster, but I find that currently there is no easy way to ship my 
> user-defined jars (e.g. some custom connectors or udf jars that would be 
> shared by some jobs) and ask the FlinkUserCodeClassLoader to load classes in 
> these jars. 
> I checked some relevant jiras, like FLINK-21289. In k8s mode, there is a 
> solution: users can use the `usrlib` directory to store their user-defined 
> jars, and these jars will be loaded by FlinkUserCodeClassLoader when the job 
> is executed on the JM/TM.
> But in YARN mode, `usrlib` does not work like that:
> In this method (org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles), if 
> I want to use `yarn.ship-files` to ship `usrlib` from my flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added to systemClassPaths. As 
> a result, classes in those user jars will be loaded by the AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to utilize these 
> jars in my flink run command. Currently, all I can do seems to be to use the 
> `-C` option, which means I have to upload my jars to some shared store first 
> and then use these remote paths. This is not ideal, as we have already made 
> it possible to ship jars or files directly and we have also introduced 
> `usrlib` in application mode on YARN. It would be more user-friendly if we 
> could allow shipping `usrlib` from the local machine to the remote cluster 
> while using FlinkUserCodeClassLoader to load classes in the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26030) Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers

2022-02-08 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26030:
--
Component/s: Deployment / YARN

> Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers
> ---
>
> Key: FLINK-26030
> URL: https://issues.apache.org/jira/browse/FLINK-26030
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26030) Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers

2022-02-08 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26030:
--
Summary: Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers  
(was: Set FLINK_LIB_DIR to lib under working dir in YARN containers)

> Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers
> ---
>
> Key: FLINK-26030
> URL: https://issues.apache.org/jira/browse/FLINK-26030
> Project: Flink
>  Issue Type: Improvement
>Reporter: Biao Geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26030) Set FLINK_LIB_DIR to lib under working dir in YARN containers

2022-02-08 Thread Biao Geng (Jira)
Biao Geng created FLINK-26030:
-

 Summary: Set FLINK_LIB_DIR to lib under working dir in YARN 
containers
 Key: FLINK-26030
 URL: https://issues.apache.org/jira/browse/FLINK-26030
 Project: Flink
  Issue Type: Improvement
Reporter: Biao Geng






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26030) Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers

2022-02-08 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26030:
--
Description: 
Currently, we utilize 
{{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
 to locate usrlib in both flink client and cluster side. 
This method relies on the value of environment variable {{FLINK_LIB_DIR}} to 
find the {{usrlib}}.
It makes sense in client side since in {{bin/config.sh}}, {{FLINK_LIB_DIR}} 
will be set by default(i.e. {{FLINK_HOME/lib}} if not exists. But in YARN 
cluster's containers, when we want to reuse this method to find {{usrlib}}, as 
the YARN usually starts the process using commands like 

bq. /bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824 -Xms1073741824 
-XX:MaxMetaspaceSize=268435456org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint
 -D jobmanager.memory.off-heap.size=134217728b -D 
jobmanager.memory.jvm-overhead.min=201326592b -D 
jobmanager.memory.jvm-metaspace.size=268435456b -D 
jobmanager.memory.heap.size=1073741824b -D 
jobmanager.memory.jvm-overhead.max=201326592b ...

{{FLINK_LIB_DIR}} is not guaranteed to be set in such case. Current codes will 
use current working dir to locate the {{usrlib}} which is correct in most 
cases. But bad things can happen if the machine which the YARN container 
resides in has already set {{FLINK_LIB_DIR}} to a different folder. In that 
case, codes will try to find {{usrlib}} in a undesired place. 



> Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers
> ---
>
> Key: FLINK-26030
> URL: https://issues.apache.org/jira/browse/FLINK-26030
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Currently, we utilize 
> {{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
>  to locate usrlib on both the flink client and the cluster side. 
> This method relies on the value of the environment variable {{FLINK_LIB_DIR}} 
> to find {{usrlib}}.
> It makes sense on the client side since in {{bin/config.sh}}, 
> {{FLINK_LIB_DIR}} will be set by default (i.e. to {{FLINK_HOME/lib}}) if it 
> is not already set. But in a YARN cluster's containers, YARN usually starts 
> the process using commands like 
> bq. /bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824 
> -Xms1073741824 
> -XX:MaxMetaspaceSize=268435456 org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint
>  -D jobmanager.memory.off-heap.size=134217728b -D 
> jobmanager.memory.jvm-overhead.min=201326592b -D 
> jobmanager.memory.jvm-metaspace.size=268435456b -D 
> jobmanager.memory.heap.size=1073741824b -D 
> jobmanager.memory.jvm-overhead.max=201326592b ...
> so when we want to reuse this method to find {{usrlib}} there, 
> {{FLINK_LIB_DIR}} is not guaranteed to be set. The current code will then use 
> the current working dir to locate {{usrlib}}, which is correct in most cases. 
> But bad things can happen if the machine on which the YARN container resides 
> has already set {{FLINK_LIB_DIR}} to a different folder. In that case, the 
> code will try to find {{usrlib}} in an undesired place. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26030) Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers

2022-02-08 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26030:
--
Description: 
Currently, we utilize 
{{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
 to locate usrlib in both flink client and cluster side. 
This method relies on the value of environment variable {{FLINK_LIB_DIR}} to 
find the {{usrlib}}.
It makes sense in client side since in {{bin/config.sh}}, {{FLINK_LIB_DIR}} 
will be set by default(i.e. {{FLINK_HOME/lib}} if not exists. But in YARN 
cluster's containers, when we want to reuse this method to find {{usrlib}}, as 
the YARN usually starts the process using commands like 

bq. /bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824 -Xms1073741824 
-XX:MaxMetaspaceSize=268435456org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint
 -D jobmanager.memory.off-heap.size=134217728b -D 
jobmanager.memory.jvm-overhead.min=201326592b -D 
jobmanager.memory.jvm-metaspace.size=268435456b -D 
jobmanager.memory.heap.size=1073741824b -D 
jobmanager.memory.jvm-overhead.max=201326592b ...

{{FLINK_LIB_DIR}} is not guaranteed to be set in such case. Current codes will 
use current working dir to locate the {{usrlib}} which is correct in most 
cases. But bad things can happen if the machine which the YARN container 
resides in has already set {{FLINK_LIB_DIR}} to a different folder. In that 
case, codes will try to find {{usrlib}} in a undesired place. 

One possible solution would be overriding the {{FLINK_LIB_DIR}} in YARN 
container env to the {{lib}} dir under YARN's workding dir.

  was:
Currently, we utilize 
{{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
 to locate usrlib on both the flink client and the cluster side. 
This method relies on the value of the environment variable {{FLINK_LIB_DIR}} 
to find {{usrlib}}.
It makes sense on the client side since in {{bin/config.sh}}, {{FLINK_LIB_DIR}} 
will be set by default (i.e. to {{FLINK_HOME/lib}}) if it is not already set. 
But in a YARN cluster's containers, YARN usually starts the process using 
commands like 

bq. /bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824 -Xms1073741824 
-XX:MaxMetaspaceSize=268435456 org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint
 -D jobmanager.memory.off-heap.size=134217728b -D 
jobmanager.memory.jvm-overhead.min=201326592b -D 
jobmanager.memory.jvm-metaspace.size=268435456b -D 
jobmanager.memory.heap.size=1073741824b -D 
jobmanager.memory.jvm-overhead.max=201326592b ...

so when we want to reuse this method to find {{usrlib}} there, {{FLINK_LIB_DIR}} 
is not guaranteed to be set. The current code will then use the current working 
dir to locate {{usrlib}}, which is correct in most cases. But bad things can 
happen if the machine on which the YARN container resides has already set 
{{FLINK_LIB_DIR}} to a different folder. In that case, the code will try to 
find {{usrlib}} in an undesired place. 




> Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers
> ---
>
> Key: FLINK-26030
> URL: https://issues.apache.org/jira/browse/FLINK-26030
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> Currently, we utilize 
> {{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
>  to locate usrlib on both the flink client and the cluster side. 
> This method relies on the value of the environment variable {{FLINK_LIB_DIR}} 
> to find {{usrlib}}.
> It makes sense on the client side since in {{bin/config.sh}}, 
> {{FLINK_LIB_DIR}} will be set by default (i.e. to {{FLINK_HOME/lib}}) if it 
> is not already set. But in a YARN cluster's containers, YARN usually starts 
> the process using commands like 
> bq. /bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824 
> -Xms1073741824 
> -XX:MaxMetaspaceSize=268435456 org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint
>  -D jobmanager.memory.off-heap.size=134217728b -D 
> jobmanager.memory.jvm-overhead.min=201326592b -D 
> jobmanager.memory.jvm-metaspace.size=268435456b -D 
> jobmanager.memory.heap.size=1073741824b -D 
> jobmanager.memory.jvm-overhead.max=201326592b ...
> so when we want to reuse this method to find {{usrlib}} there, 
> {{FLINK_LIB_DIR}} is not guaranteed to be set. The current code will then use 
> the current working dir to locate {{usrlib}}, which is correct in most cases. 
> But bad things can happen if the machine on which the YARN container resides 
> has already set {{FLINK_LIB_DIR}} to a different folder. In that case, the 
> code will try to find {{usrlib}} in an undesired place. 
> One possible solution would be to override {{FLINK_LIB_DIR}} in the YARN 
> container env to point to the {{lib}} dir under YARN's working dir.
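
For the proposed solution, a minimal sketch (the env map is assumed to be 
merged into the container launch context elsewhere; names are illustrative):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: pin FLINK_LIB_DIR to the "lib" dir under the container's
// working directory when building the container's environment variables.
final class ContainerEnvSketch {
    static Map<String, String> buildContainerEnv(String containerWorkingDir) {
        Map<String, String> env = new HashMap<>();
        env.put("FLINK_LIB_DIR", containerWorkingDir + "/lib");
        return env;
    }
}
{code}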



--
This message was 

[jira] [Created] (FLINK-26047) Support usrlib in HDFS for YARN application mode

2022-02-09 Thread Biao Geng (Jira)
Biao Geng created FLINK-26047:
-

 Summary: Support usrlib in HDFS for YARN application mode
 Key: FLINK-26047
 URL: https://issues.apache.org/jira/browse/FLINK-26047
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Biao Geng


In YARN Application mode, we currently support using the user jar and lib jars 
from HDFS. For example, we can run commands like:
{code}
./bin/flink run-application -t yarn-application \
-Dyarn.provided.lib.dirs="hdfs://myhdfs/my-remote-flink-dist-dir" \
hdfs://myhdfs/jars/my-application.jar
{code}
For {{usrlib}}, we currently only support a local directory. I propose adding 
HDFS support for {{usrlib}} so it works better with CLASSPATH_INCLUDE_USER_JAR. 
It can also benefit cases like using a notebook to submit flink jobs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-24897) Enable application mode on YARN to use usrlib

2022-02-07 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17483689#comment-17483689
 ] 

Biao Geng edited comment on FLINK-24897 at 2/7/22, 9:05 AM:


hi [~wangyang0918], I have completed the 
[PR|https://github.com/apache/flink/pull/18531]. Would you mind reviewing it in 
some days?  Thanks!

Besides, I find that the method 
`ClusterEntrypointUtils#tryFindUserLibDirectory` may be buggy: it is a method 
in ClusterEntrypointUtils that is used by the Entrypoint classes to check 
whether a `usrlib` directory exists in a standalone/k8s/yarn cluster. However, 
the code shows that it uses `FLINK_LIB_DIR` to look for `usrlib` on the remote 
side. But in YARN, the uploaded `usrlib` is located in the YARN cache dir 
(e.g. appcache/application_xx/container_yy). As a result, if we set 
`FLINK_LIB_DIR` by default in the remote cluster, `tryFindUserLibDirectory` 
will try to find `usrlib` in the wrong location.
The reason this bug does not affect correctness is that in most cases 
`FLINK_LIB_DIR` will not be set in the remote cluster's env, and the current 
code then falls back to the working dir, which is 
System.getProperty("user.dir") by default. The working dir is the correct 
choice in most cases.

Addendum: discussed with Yang offline; the above behavior is by design.



was (Author: bgeng777):
hi [~wangyang0918], I have completed the 
[PR|https://github.com/apache/flink/pull/18531]. Would you mind reviewing it in 
some days?  Thanks!

Besides, I find that the method 
`ClusterEntrypointUtils#tryFindUserLibDirectory` may be buggy: it is a method 
in ClusterEntrypointUtils that is used by the Entrypoint classes to check 
whether a `usrlib` directory exists in a standalone/k8s/yarn cluster. However, 
the code shows that it uses `FLINK_LIB_DIR` to look for `usrlib` on the remote 
side. But in YARN, the uploaded `usrlib` is located in the YARN cache dir 
(e.g. appcache/application_xx/container_yy). As a result, if we set 
`FLINK_LIB_DIR` by default in the remote cluster, `tryFindUserLibDirectory` 
will try to find `usrlib` in the wrong location.
The reason this bug does not affect correctness is that in most cases 
`FLINK_LIB_DIR` will not be set in the remote cluster's env, and the current 
code then falls back to the working dir, which is 
System.getProperty("user.dir") by default. The working dir is the correct 
choice in most cases. 
I am not sure whether this bug should be fixed in this PR or in a new one, 
since it will influence the standalone/k8s/yarn ClusterEntrypoint.


> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>  Labels: pull-request-available
>
> Hi there, 
> I am working on utilizing application mode to submit Flink jobs to a YARN 
> cluster, but I find that currently there is no easy way to ship my 
> user-defined jars (e.g. some custom connectors or UDF jars that would be 
> shared by several jobs) and have the FlinkUserCodeClassLoader load classes 
> from these jars. 
> I checked some relevant Jiras, like FLINK-21289. In k8s mode, there is a 
> solution: users can put their user-defined jars into the `usrlib` directory, 
> and these jars will be loaded by the FlinkUserCodeClassLoader when the job 
> is executed on the JM/TM.
> But in YARN mode, `usrlib` does not work like that:
> In the method org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles, if 
> I want to use `yarn.ship-files` to ship `usrlib` from my Flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added to systemClassPaths. As 
> a result, classes in those user jars will be loaded by the AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to use them in 
> my flink run command. Currently, all I can do is use the `-C` option, which 
> means I have to upload my jars to some shared store first and then use those 
> remote paths. This is not ideal, as we have already made it possible to ship 
> jars or files directly and we have also introduced `usrlib` in application 
> mode on YARN. It would be more user-friendly if we could allow shipping 
> `usrlib` from the local machine to the remote cluster while using the 
> FlinkUserCodeClassLoader to load classes from the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-24897) Enable application mode on YARN to use usrlib

2022-02-07 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17483689#comment-17483689
 ] 

Biao Geng edited comment on FLINK-24897 at 2/7/22, 9:07 AM:


hi [~wangyang0918], I have completed the 
[PR|https://github.com/apache/flink/pull/18531]. Would you mind reviewing it in 
some days?  Thanks!

Besides, I find that the method 
`ClusterEntrypointUtils#tryFindUserLibDirectory` may be buggy: it is a method 
in ClusterEntrypointUtils that is used by the Entrypoint classes to check 
whether a `usrlib` directory exists in a standalone/k8s/yarn cluster. However, 
the code shows that it uses `FLINK_LIB_DIR` to look for `usrlib` on the remote 
side. But in YARN, the uploaded `usrlib` is located in the YARN cache dir 
(e.g. appcache/application_xx/container_yy). As a result, if we set 
`FLINK_LIB_DIR` by default in the remote cluster, `tryFindUserLibDirectory` 
will try to find `usrlib` in the wrong location.
The reason this bug does not affect correctness is that in most cases 
`FLINK_LIB_DIR` will not be set in the remote cluster's env, and the current 
code then falls back to the working dir, which is 
System.getProperty("user.dir") by default. The working dir is the correct 
choice in most cases.



was (Author: bgeng777):
hi [~wangyang0918], I have completed the 
[PR|https://github.com/apache/flink/pull/18531]. Would you mind reviewing it in 
some days?  Thanks!

Besides, I find that the method 
`ClusterEntrypointUtils#tryFindUserLibDirectory` may be buggy: it is a method 
in ClusterEntrypointUtils that is used by the Entrypoint classes to check 
whether a `usrlib` directory exists in a standalone/k8s/yarn cluster. However, 
the code shows that it uses `FLINK_LIB_DIR` to look for `usrlib` on the remote 
side. But in YARN, the uploaded `usrlib` is located in the YARN cache dir 
(e.g. appcache/application_xx/container_yy). As a result, if we set 
`FLINK_LIB_DIR` by default in the remote cluster, `tryFindUserLibDirectory` 
will try to find `usrlib` in the wrong location.
The reason this bug does not affect correctness is that in most cases 
`FLINK_LIB_DIR` will not be set in the remote cluster's env, and the current 
code then falls back to the working dir, which is 
System.getProperty("user.dir") by default. The working dir is the correct 
choice in most cases.

Addendum: discussed with Yang offline; the above behavior is by design.


> Enable application mode on YARN to use usrlib
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>  Labels: pull-request-available
>
> Hi there, 
> I am working on utilizing application mode to submit Flink jobs to a YARN 
> cluster, but I find that currently there is no easy way to ship my 
> user-defined jars (e.g. some custom connectors or UDF jars that would be 
> shared by several jobs) and have the FlinkUserCodeClassLoader load classes 
> from these jars. 
> I checked some relevant Jiras, like FLINK-21289. In k8s mode, there is a 
> solution: users can put their user-defined jars into the `usrlib` directory, 
> and these jars will be loaded by the FlinkUserCodeClassLoader when the job 
> is executed on the JM/TM.
> But in YARN mode, `usrlib` does not work like that:
> In the method org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles, if 
> I want to use `yarn.ship-files` to ship `usrlib` from my Flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added to systemClassPaths. As 
> a result, classes in those user jars will be loaded by the AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to use them in 
> my flink run command. Currently, all I can do is use the `-C` option, which 
> means I have to upload my jars to some shared store first and then use those 
> remote paths. This is not ideal, as we have already made it possible to ship 
> jars or files directly and we have also introduced `usrlib` in application 
> mode on YARN. It would be more user-friendly if we could allow shipping 
> `usrlib` from the local machine to the remote cluster while using the 
> FlinkUserCodeClassLoader to load classes from the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26030) Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers

2022-02-09 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489957#comment-17489957
 ] 

Biao Geng commented on FLINK-26030:
---

Yes, it should be a bug.

To be consistent with the behavior on the Flink client side, I believe that on 
the YARN cluster side we should set {{FLINK_LIB_DIR}} in the YARN container env 
to the {{lib}} dir under YARN's working dir. If that is the case, I can start 
investigating how to implement it properly. Let me know if the above assumption 
makes sense. Thanks!
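As a rough illustration of that idea, the env could be overridden when 
assembling the container launch context (a hypothetical sketch against the 
Hadoop YARN API, not actual Flink code):

{code:java}
// Hypothetical sketch: point FLINK_LIB_DIR at the 'lib' dir under the
// container's working directory when building the launch context.
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.util.Records;

public final class ContainerEnvExample {

    public static ContainerLaunchContext buildContext() {
        final Map<String, String> env = new HashMap<>();
        // Environment.PWD expands to the container's working directory at
        // launch time, so the value always points at <working-dir>/lib.
        env.put("FLINK_LIB_DIR",
                ApplicationConstants.Environment.PWD.$$() + "/lib");

        final ContainerLaunchContext ctx =
                Records.newRecord(ContainerLaunchContext.class);
        ctx.setEnvironment(env);
        return ctx;
    }
}
{code}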

> Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers
> ---
>
> Key: FLINK-26030
> URL: https://issues.apache.org/jira/browse/FLINK-26030
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Minor
>
> Currently, we utilize 
> {{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
>  to locate usrlib in both flink client and cluster side. 
> This method relies on the value of environment variable {{FLINK_LIB_DIR}} to 
> find the {{usrlib}}.
> It makes sense on the client side since in {{bin/config.sh}}, {{FLINK_LIB_DIR}} 
> will be set by default (i.e., to {{FLINK_HOME/lib}}) if it does not exist. But 
> in YARN cluster containers, when we want to reuse this method to find 
> {{usrlib}}, YARN usually starts the process using commands like 
> bq. /bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824 
> -Xms1073741824 -XX:MaxMetaspaceSize=268435456 
> org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint 
> -D jobmanager.memory.off-heap.size=134217728b -D 
> jobmanager.memory.jvm-overhead.min=201326592b -D 
> jobmanager.memory.jvm-metaspace.size=268435456b -D 
> jobmanager.memory.heap.size=1073741824b -D 
> jobmanager.memory.jvm-overhead.max=201326592b ...
> {{FLINK_LIB_DIR}} is not guaranteed to be set in such a case. The current code 
> will use the current working dir to locate {{usrlib}}, which is correct in 
> most cases. But problems can arise if the machine hosting the YARN container 
> has already set {{FLINK_LIB_DIR}} to a different folder. In that case, the 
> code will try to find {{usrlib}} in an undesired place. 
> One possible solution would be overriding {{FLINK_LIB_DIR}} in the YARN 
> container env to point to the {{lib}} dir under YARN's working dir.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-26030) Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers

2022-02-09 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-26030:
--
Description: 
Currently, we utilize 
{{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
 to locate usrlib in both flink client and cluster side. 
This method relies on the value of the environment variable {{FLINK_LIB_DIR}} 
to find {{usrlib}}.
It makes sense on the client side since in {{bin/config.sh}}, {{FLINK_LIB_DIR}} 
will be set by default (i.e., to {{FLINK_HOME/lib}}) if it does not exist. But 
in YARN cluster containers, when we want to reuse this method to find 
{{usrlib}}, YARN usually starts the process using commands like
{quote}/bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824 
-Xms1073741824 -XX:MaxMetaspaceSize=268435456 
org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint 
-D jobmanager.memory.off-heap.size=134217728b -D 
jobmanager.memory.jvm-overhead.min=201326592b -D 
jobmanager.memory.jvm-metaspace.size=268435456b -D 
jobmanager.memory.heap.size=1073741824b -D 
jobmanager.memory.jvm-overhead.max=201326592b ...
{quote}
{{FLINK_LIB_DIR}} is not guaranteed to be set in such a case. The current code 
will use the current working dir to locate {{usrlib}}, which is correct in most 
cases. But problems can arise if the machine hosting the YARN container has 
already set {{FLINK_LIB_DIR}} to a different folder. In that case, the code 
will try to find {{usrlib}} in an undesired place.

One possible solution would be overriding {{FLINK_LIB_DIR}} in the YARN 
container env to point to the {{lib}} dir under YARN's working dir.

  was:
Currently, we utilize 
{{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
 to locate usrlib in both flink client and cluster side. 
This method relies on the value of environment variable {{FLINK_LIB_DIR}} to 
find the {{usrlib}}.
It makes sense on the client side since in {{bin/config.sh}}, {{FLINK_LIB_DIR}} 
will be set by default (i.e., to {{FLINK_HOME/lib}}) if it does not exist. But 
in YARN cluster containers, when we want to reuse this method to find 
{{usrlib}}, YARN usually starts the process using commands like 

bq. /bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824 -Xms1073741824 
-XX:MaxMetaspaceSize=268435456 
org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint 
-D jobmanager.memory.off-heap.size=134217728b -D 
jobmanager.memory.jvm-overhead.min=201326592b -D 
jobmanager.memory.jvm-metaspace.size=268435456b -D 
jobmanager.memory.heap.size=1073741824b -D 
jobmanager.memory.jvm-overhead.max=201326592b ...

{{FLINK_LIB_DIR}} is not guaranteed to be set in such a case. The current code 
will use the current working dir to locate {{usrlib}}, which is correct in most 
cases. But problems can arise if the machine hosting the YARN container has 
already set {{FLINK_LIB_DIR}} to a different folder. In that case, the code 
will try to find {{usrlib}} in an undesired place. 

One possible solution would be overriding {{FLINK_LIB_DIR}} in the YARN 
container env to point to the {{lib}} dir under YARN's working dir.


> Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers
> ---
>
> Key: FLINK-26030
> URL: https://issues.apache.org/jira/browse/FLINK-26030
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Minor
>
> Currently, we utilize 
> {{org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils#tryFindUserLibDirectory}}
>  to locate usrlib in both flink client and cluster side. 
> This method relies on the value of the environment variable {{FLINK_LIB_DIR}} 
> to find {{usrlib}}.
> It makes sense on the client side since in {{bin/config.sh}}, 
> {{FLINK_LIB_DIR}} will be set by default (i.e., to {{FLINK_HOME/lib}}) if it 
> does not exist. But in YARN cluster containers, when we want to reuse this 
> method to find {{usrlib}}, YARN usually starts the process using commands like
> {quote}/bin/bash -c /usr/lib/jvm/java-1.8.0/bin/java -Xmx1073741824 
> -Xms1073741824 -XX:MaxMetaspaceSize=268435456 
> org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint 
> -D jobmanager.memory.off-heap.size=134217728b -D 
> jobmanager.memory.jvm-overhead.min=201326592b -D 
> jobmanager.memory.jvm-metaspace.size=268435456b -D 
> jobmanager.memory.heap.size=1073741824b -D 
> jobmanager.memory.jvm-overhead.max=201326592b ...
> {quote}
> {{FLINK_LIB_DIR}} is not guaranteed to be set in such a case. The current code 
> will use the current working dir to locate {{usrlib}}, which is correct in 
> most cases. But problems can arise if the machine hosting the YARN container 
> has already set {{FLINK_LIB_DIR}} to a different folder. In that case, the 
> code will try to find {{usrlib}} in an undesired place.

[jira] [Comment Edited] (FLINK-24897) Make usrlib work for YARN application and per-job

2022-02-09 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446258#comment-17446258
 ] 

Biao Geng edited comment on FLINK-24897 at 2/10/22, 2:35 AM:
-

Hi [~trohrmann], would you mind commenting on whether we have the guarantee 
that jars under {{usrlib}} will always be loaded by the user classloader? 
Besides, after the above discussion with Yang, the current solution is as 
follows (item 2 is illustrated by the sketch after this list):
1. {{usrlib}} will be shipped automatically if it exists.
2. If {{usrlib}} is added to the ship files again, we will throw an exception.
3. {{usrlib}} will work for both per-job and application mode.
4. Jars in {{usrlib}} will be loaded by the user classloader only when 
UserJarInclusion is DISABLED. In other cases, the AppClassLoader will be used.
5. The {{usrlib}} directory should not appear in {{yarn.provided.lib.dirs}} or 
{{yarn.ship-files}}. If users want to ship {{usrlib}}, they should just create 
{{usrlib}} under the FLINK_HOME dir.
6. The key of CLASSPATH_INCLUDE_USER_JAR will be renamed from 
{{yarn.per-job-cluster.include-user-jar}} to 
{{yarn.classpath.include-user-jar}} for better naming.

Thanks.
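A minimal sketch of the check in item 2 (hypothetical code that assumes a 
plain {{shipFiles}} list; this is not the actual {{YarnClusterDescriptor}} 
implementation):

{code:java}
// Hypothetical sketch: reject an explicit 'usrlib' entry among the ship
// files, since the directory is shipped automatically when it exists.
import java.io.File;
import java.util.List;

public final class ShipFileValidation {

    private static final String USR_LIB_DIR = "usrlib";

    public static void checkShipFiles(List<File> shipFiles) {
        for (File file : shipFiles) {
            if (USR_LIB_DIR.equals(file.getName())) {
                throw new IllegalArgumentException(
                        "The '" + USR_LIB_DIR + "' directory is shipped "
                                + "automatically and must not be added to "
                                + "the ship files explicitly.");
            }
        }
    }
}
{code}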


was (Author: bgeng777):
Hi [~trohrmann], would you mind commenting on whether we have the guarantee 
that jars under {{usrlib}} will always be loaded by the user classloader? 
Besides, after the above discussion with Yang, the current solution is:
1. {{usrlib}} will be shipped automatically if it exists.
2. If {{usrlib}} is added to the ship files again, we will throw an exception.
3. {{usrlib}} will work for both per-job and application mode.
4. Jars in {{usrlib}} will be loaded by the user classloader only when 
UserJarInclusion is DISABLED. In other cases, the AppClassLoader will be used.
5. The {{usrlib}} directory should not appear in {{yarn.provided.lib.dirs}} or 
{{yarn.ship-files}}. If users want to ship {{usrlib}}, they should just create 
{{usrlib}} under the FLINK_HOME dir.

Thanks.

> Make usrlib work for YARN application and per-job
> -
>
> Key: FLINK-24897
> URL: https://issues.apache.org/jira/browse/FLINK-24897
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>  Labels: pull-request-available
>
> Hi there, 
> I am working on utilizing application mode to submit Flink jobs to a YARN 
> cluster, but I find that currently there is no easy way to ship my 
> user-defined jars (e.g. some custom connectors or UDF jars that would be 
> shared by several jobs) and have the FlinkUserCodeClassLoader load classes 
> from these jars. 
> I checked some relevant Jiras, like FLINK-21289. In k8s mode, there is a 
> solution: users can put their user-defined jars into the `usrlib` directory, 
> and these jars will be loaded by the FlinkUserCodeClassLoader when the job 
> is executed on the JM/TM.
> But in YARN mode, `usrlib` does not work like that:
> In the method org.apache.flink.yarn.YarnClusterDescriptor#addShipFiles, if 
> I want to use `yarn.ship-files` to ship `usrlib` from my Flink client (on my 
> local machine) to the remote cluster, I must not set UserJarInclusion to 
> DISABLED due to the checkArgument(). However, if I do not set that option to 
> DISABLED, the user jars to be shipped will be added to systemClassPaths. As 
> a result, classes in those user jars will be loaded by the AppClassLoader. 
> But if I do not ship these jars, there is no convenient way to use them in 
> my flink run command. Currently, all I can do is use the `-C` option, which 
> means I have to upload my jars to some shared store first and then use those 
> remote paths. This is not ideal, as we have already made it possible to ship 
> jars or files directly and we have also introduced `usrlib` in application 
> mode on YARN. It would be more user-friendly if we could allow shipping 
> `usrlib` from the local machine to the remote cluster while using the 
> FlinkUserCodeClassLoader to load classes from the jars in `usrlib`.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26047) Support usrlib in HDFS for YARN application mode

2022-02-09 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489955#comment-17489955
 ] 

Biao Geng commented on FLINK-26047:
---

Hi [~wangyang0918], sure. I will take the ticket and start working ASAP.

> Support usrlib in HDFS for YARN application mode
> 
>
> Key: FLINK-26047
> URL: https://issues.apache.org/jira/browse/FLINK-26047
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Reporter: Biao Geng
>Priority: Major
>
> In YARN Application mode, we currently support using the user jar and lib 
> jars from HDFS. For example, we can run commands like:
> {quote}./bin/flink run-application -t yarn-application \
>   -Dyarn.provided.lib.dirs="hdfs://myhdfs/my-remote-flink-dist-dir" \
>   hdfs://myhdfs/jars/my-application.jar{quote}
> For {{usrlib}}, we currently only support local directories. I propose 
> adding HDFS support for {{usrlib}} so that it works better with 
> CLASSPATH_INCLUDE_USER_JAR. It can also benefit cases such as submitting 
> Flink jobs from a notebook.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (FLINK-24682) Unify the -C option behavior in both yarn application and per-job mode

2022-02-09 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng resolved FLINK-24682.
---
Resolution: Won't Do

> Unify the -C option behavior in both yarn application and per-job mode
> --
>
> Key: FLINK-24682
> URL: https://issues.apache.org/jira/browse/FLINK-24682
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.12.3
> Environment: flink 1.12.3
> yarn 2.8.5
>Reporter: Biao Geng
>Priority: Major
>
> Recently, when switching the job submission mode from per-job mode to 
> application mode on YARN, we found the behavior of '-C' ('--classpath') 
> somewhat misleading:
> In per-job mode, the `main()` method of the program is executed on the local 
> machine, and the '-C' option works well when we use it to specify some local 
> user jars like -C file://xx.jar.
> But in application mode, this option works differently: as the `main()` 
> method will be executed on the job manager in the cluster, it is unclear 
> where a url like `file://xx.jar` points. Judging from the code, it seems 
> that `file://xx.jar` is resolved on the job manager machine in the 
> cluster. If that is true, it may mislead users, as in per-job mode it refers 
> to the jars on the client machine. 
> In summary, if we can unify the -C option behavior in both YARN application 
> and per-job mode, it would help users switch to application mode more 
> smoothly and, more importantly, make it much easier to specify local jars on 
> the client machine that should be loaded by the UserClassLoader.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-24682) Unify the -C option behavior in both yarn application and per-job mode

2022-02-09 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489959#comment-17489959
 ] 

Biao Geng commented on FLINK-24682:
---

Per-job mode will be deprecated due to 
https://issues.apache.org/jira/browse/FLINK-25999.

Some relevant use cases of loading user-specified jars with the UserClassLoader 
can be achieved with https://issues.apache.org/jira/browse/FLINK-24897.

Thus this Jira will be closed.

> Unify the -C option behavior in both yarn application and per-job mode
> --
>
> Key: FLINK-24682
> URL: https://issues.apache.org/jira/browse/FLINK-24682
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.12.3
> Environment: flink 1.12.3
> yarn 2.8.5
>Reporter: Biao Geng
>Priority: Major
>
> Recently, when switching the job submission mode from per-job mode to 
> application mode on YARN, we found the behavior of '-C' ('--classpath') 
> somewhat misleading:
> In per-job mode, the `main()` method of the program is executed on the local 
> machine, and the '-C' option works well when we use it to specify some local 
> user jars like -C file://xx.jar.
> But in application mode, this option works differently: as the `main()` 
> method will be executed on the job manager in the cluster, it is unclear 
> where a url like `file://xx.jar` points. Judging from the code, it seems 
> that `file://xx.jar` is resolved on the job manager machine in the 
> cluster. If that is true, it may mislead users, as in per-job mode it refers 
> to the jars on the client machine. 
> In summary, if we can unify the -C option behavior in both YARN application 
> and per-job mode, it would help users switch to application mode more 
> smoothly and, more importantly, make it much easier to specify local jars on 
> the client machine that should be loaded by the UserClassLoader.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-27109) Improve the creation of ClusterRole in Flink K8s operator

2022-04-07 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-27109:
--
Priority: Not a Priority  (was: Major)

> Improve the creation of ClusterRole in Flink K8s operator
> -
>
> Key: FLINK-27109
> URL: https://issues.apache.org/jira/browse/FLINK-27109
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Not a Priority
>
> As the k8s 
> [doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
>  says, a ClusterRole is a non-namespaced resource. 
> In our helm chart, we currently define the ClusterRole with the name 
> 'flink-operator', and the namespace field in its metadata is omitted. As a 
> result, if a user wants to install multiple flink-kubernetes-operators in 
> different namespaces, the ClusterRole 'flink-operator' will be created 
> multiple times. 
> Errors like
> {quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
> already exists. Unable to continue with install: ClusterRole "flink-operator" 
> in namespace "" exists and cannot be imported into the current release: 
> invalid ownership metadata; annotation validation error: key 
> "meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current 
> value is "default"
> {quote}
> will be thrown.
> Solution 1 could be appending the namespace to the name of the 
> ClusterRole. 
> Solution 2 is to add an if/else check like 
> [this|https://stackoverflow.com/questions/65110332/clusterrole-exists-and-cannot-be-imported-into-the-current-release]
>  to avoid creating an already existing resource. One important drawback of 
> solution 2 is that when uninstalling one helm release, the created 
> ClusterRole will be removed as well.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-27109) Improve the creation of ClusterRole in Flink K8s operator

2022-04-07 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518699#comment-17518699
 ] 

Biao Geng commented on FLINK-27109:
---

[~wangyang0918] I see your point. My bad for overlooking the 
`watchNamespaces` field. I will utilize it.

> Improve the creation of ClusterRole in Flink K8s operator
> -
>
> Key: FLINK-27109
> URL: https://issues.apache.org/jira/browse/FLINK-27109
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Major
>
> As the k8s 
> [doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
>  says, a ClusterRole is a non-namespaced resource. 
> In our helm chart, we currently define the ClusterRole with the name 
> 'flink-operator', and the namespace field in its metadata is omitted. As a 
> result, if a user wants to install multiple flink-kubernetes-operators in 
> different namespaces, the ClusterRole 'flink-operator' will be created 
> multiple times. 
> Errors like
> {quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
> already exists. Unable to continue with install: ClusterRole "flink-operator" 
> in namespace "" exists and cannot be imported into the current release: 
> invalid ownership metadata; annotation validation error: key 
> "meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current 
> value is "default"
> {quote}
> will be thrown.
> Solution 1 could be appending the namespace to the name of the 
> ClusterRole. 
> Solution 2 is to add an if/else check like 
> [this|https://stackoverflow.com/questions/65110332/clusterrole-exists-and-cannot-be-imported-into-the-current-release]
>  to avoid creating an already existing resource. One important drawback of 
> solution 2 is that when uninstalling one helm release, the created 
> ClusterRole will be removed as well.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-27109) Improve the creation of ClusterRole in Flink K8s operator

2022-04-07 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-27109:
--
Description: 
As the k8s 
[doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
 says, a ClusterRole is a non-namespaced resource. 
In our helm chart, we currently define the ClusterRole with the name 
'flink-operator', and the namespace field in its metadata is omitted. As a 
result, if a user wants to install multiple flink-kubernetes-operators in 
different namespaces, the ClusterRole 'flink-operator' will be created multiple 
times. 
Errors like
{quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
already exists. Unable to continue with install: ClusterRole "flink-operator" 
in namespace "" exists and cannot be imported into the current release: invalid 
ownership metadata; annotation validation error: key 
"meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current value 
is "default"
{quote}
will be thrown.

One solution could be appending the namespace to the name of the 
ClusterRole.
Another possible solution is to add an if/else check like 
[this|https://stackoverflow.com/questions/65110332/clusterrole-exists-and-cannot-be-imported-into-the-current-release]
 to avoid creating an already existing resource.

  was:
As the k8s 
[doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
 says, a ClusterRole is a non-namespaced resource. 
In our helm chart, we currently define the ClusterRole with the name 
'flink-operator', and the namespace field in its metadata is omitted. As a 
result, if a user wants to install multiple flink-kubernetes-operators in 
different namespaces, the ClusterRole 'flink-operator' will be created multiple 
times. 
Errors like
{quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
already exists. Unable to continue with install: ClusterRole "flink-operator" 
in namespace "" exists and cannot be imported into the current release: invalid 
ownership metadata; annotation validation error: key 
"meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current value 
is "default"
{quote}
will be thrown.

One solution could be appending the namespace to the name of the 
ClusterRole.
Another possible solution is to add an if/else check to avoid creating an 
already existing resource.


> Improve the creation of ClusterRole in Flink K8s operator
> -
>
> Key: FLINK-27109
> URL: https://issues.apache.org/jira/browse/FLINK-27109
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Major
>
> As the k8s 
> [doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
>  says, a ClusterRole is a non-namespaced resource. 
> In our helm chart, we currently define the ClusterRole with the name 
> 'flink-operator', and the namespace field in its metadata is omitted. As a 
> result, if a user wants to install multiple flink-kubernetes-operators in 
> different namespaces, the ClusterRole 'flink-operator' will be created 
> multiple times. 
> Errors like
> {quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
> already exists. Unable to continue with install: ClusterRole "flink-operator" 
> in namespace "" exists and cannot be imported into the current release: 
> invalid ownership metadata; annotation validation error: key 
> "meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current 
> value is "default"
> {quote}
> will be thrown.
> One solution could be appending the namespace to the name of the 
> ClusterRole.
> Another possible solution is to add an if/else check like 
> [this|https://stackoverflow.com/questions/65110332/clusterrole-exists-and-cannot-be-imported-into-the-current-release]
>  to avoid creating an already existing resource.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27109) The naming pattern of ClusterRole in Flink K8s operator should consider namespace

2022-04-07 Thread Biao Geng (Jira)
Biao Geng created FLINK-27109:
-

 Summary: The naming pattern of ClusterRole in Flink K8s operator 
should consider namespace
 Key: FLINK-27109
 URL: https://issues.apache.org/jira/browse/FLINK-27109
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng


As the k8s 
[doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
 says, a ClusterRole is a non-namespaced resource. 
In our helm chart, we currently define the ClusterRole with the name 
'flink-operator', and the namespace field in its metadata is omitted. As a 
result, if a user wants to install multiple flink-kubernetes-operators in 
different namespaces, the ClusterRole 'flink-operator' will be created multiple 
times. 
Errors like
{quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
already exists. Unable to continue with install: ClusterRole "flink-operator" 
in namespace "" exists and cannot be imported into the current release: invalid 
ownership metadata; annotation validation error: key 
"meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current value 
is "default"
{quote}
will be thrown.

One solution could be appending the namespace to the name of the 
ClusterRole.
Another possible solution is to add an if/else check to avoid creating an 
already existing resource.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-27109) Improve the creation of ClusterRole in Flink K8s operator

2022-04-07 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-27109:
--
Description: 
As the k8s 
[doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
 says, a ClusterRole is a non-namespaced resource. 
In our helm chart, we currently define the ClusterRole with the name 
'flink-operator', and the namespace field in its metadata is omitted. As a 
result, if a user wants to install multiple flink-kubernetes-operators in 
different namespaces, the ClusterRole 'flink-operator' will be created multiple 
times. 
Errors like
{quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
already exists. Unable to continue with install: ClusterRole "flink-operator" 
in namespace "" exists and cannot be imported into the current release: invalid 
ownership metadata; annotation validation error: key 
"meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current value 
is "default"
{quote}
will be thrown.

Solution 1 could be appending the namespace to the name of the 
ClusterRole. 
Solution 2 is to add an if/else check like 
[this|https://stackoverflow.com/questions/65110332/clusterrole-exists-and-cannot-be-imported-into-the-current-release]
 to avoid creating an already existing resource. One important drawback of 
solution 2 is that when uninstalling one helm release, the created ClusterRole 
will be removed as well.

  was:
As the k8s 
[doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
 says, a ClusterRole is a non-namespaced resource. 
In our helm chart, we currently define the ClusterRole with the name 
'flink-operator', and the namespace field in its metadata is omitted. As a 
result, if a user wants to install multiple flink-kubernetes-operators in 
different namespaces, the ClusterRole 'flink-operator' will be created multiple 
times. 
Errors like
{quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
already exists. Unable to continue with install: ClusterRole "flink-operator" 
in namespace "" exists and cannot be imported into the current release: invalid 
ownership metadata; annotation validation error: key 
"meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current value 
is "default"
{quote}
will be thrown.

One solution could be appending the namespace to the name of the 
ClusterRole.
Another possible solution is to add an if/else check like 
[this|https://stackoverflow.com/questions/65110332/clusterrole-exists-and-cannot-be-imported-into-the-current-release]
 to avoid creating an already existing resource.


> Improve the creation of ClusterRole in Flink K8s operator
> -
>
> Key: FLINK-27109
> URL: https://issues.apache.org/jira/browse/FLINK-27109
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Major
>
> As the k8s 
> [doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
>  says, a ClusterRole is a non-namespaced resource. 
> In our helm chart, we currently define the ClusterRole with the name 
> 'flink-operator', and the namespace field in its metadata is omitted. As a 
> result, if a user wants to install multiple flink-kubernetes-operators in 
> different namespaces, the ClusterRole 'flink-operator' will be created 
> multiple times. 
> Errors like
> {quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
> already exists. Unable to continue with install: ClusterRole "flink-operator" 
> in namespace "" exists and cannot be imported into the current release: 
> invalid ownership metadata; annotation validation error: key 
> "meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current 
> value is "default"
> {quote}
> will be thrown.
> Solution 1 could be appending the namespace to the name of the 
> ClusterRole. 
> Solution 2 is to add an if/else check like 
> [this|https://stackoverflow.com/questions/65110332/clusterrole-exists-and-cannot-be-imported-into-the-current-release]
>  to avoid creating an already existing resource. One important drawback of 
> solution 2 is that when uninstalling one helm release, the created 
> ClusterRole will be removed as well.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-27109) Improve the creation of ClusterRole in Flink K8s operator

2022-04-07 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng closed FLINK-27109.
-
Resolution: Won't Fix

> Improve the creation of ClusterRole in Flink K8s operator
> -
>
> Key: FLINK-27109
> URL: https://issues.apache.org/jira/browse/FLINK-27109
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Not a Priority
>
> As the k8s 
> [doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
>  says, a ClusterRole is a non-namespaced resource. 
> In our helm chart, we currently define the ClusterRole with the name 
> 'flink-operator', and the namespace field in its metadata is omitted. As a 
> result, if a user wants to install multiple flink-kubernetes-operators in 
> different namespaces, the ClusterRole 'flink-operator' will be created 
> multiple times. 
> Errors like
> {quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
> already exists. Unable to continue with install: ClusterRole "flink-operator" 
> in namespace "" exists and cannot be imported into the current release: 
> invalid ownership metadata; annotation validation error: key 
> "meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current 
> value is "default"
> {quote}
> will be thrown.
> Solution 1 could be appending the namespace to the name of the 
> ClusterRole. 
> Solution 2 is to add an if/else check like 
> [this|https://stackoverflow.com/questions/65110332/clusterrole-exists-and-cannot-be-imported-into-the-current-release]
>  to avoid creating an already existing resource. One important drawback of 
> solution 2 is that when uninstalling one helm release, the created 
> ClusterRole will be removed as well.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-27109) Improve the creation of ClusterRole in Flink K8s operator

2022-04-07 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-27109:
--
Summary: Improve the creation of ClusterRole in Flink K8s operator  (was: 
The naming pattern of ClusterRole in Flink K8s operator should consider 
namespace)

> Improve the creation of ClusterRole in Flink K8s operator
> -
>
> Key: FLINK-27109
> URL: https://issues.apache.org/jira/browse/FLINK-27109
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Major
>
> As the k8s 
> [doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
>  says, a ClusterRole is a non-namespaced resource. 
> In our helm chart, we currently define the ClusterRole with the name 
> 'flink-operator', and the namespace field in its metadata is omitted. As a 
> result, if a user wants to install multiple flink-kubernetes-operators in 
> different namespaces, the ClusterRole 'flink-operator' will be created 
> multiple times. 
> Errors like
> {quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
> already exists. Unable to continue with install: ClusterRole "flink-operator" 
> in namespace "" exists and cannot be imported into the current release: 
> invalid ownership metadata; annotation validation error: key 
> "meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current 
> value is "default"
> {quote}
> will be thrown.
> One solution could be appending the namespace to the name of the 
> ClusterRole.
> Another possible solution is to add an if/else check to avoid creating an 
> already existing resource.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-27109) Improve the creation of ClusterRole in Flink K8s operator

2022-04-07 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518673#comment-17518673
 ] 

Biao Geng commented on FLINK-27109:
---

cc [~morhidi] [~mbalassi] Do you have any suggestions?

> Improve the creation of ClusterRole in Flink K8s operator
> -
>
> Key: FLINK-27109
> URL: https://issues.apache.org/jira/browse/FLINK-27109
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Major
>
> As the k8s 
> [doc|https://kubernetes.io/docs/reference/access-authn-authz/rbac/#clusterrole-example]
>  says, a ClusterRole is a non-namespaced resource. 
> In our helm chart, we currently define the ClusterRole with the name 
> 'flink-operator', and the namespace field in its metadata is omitted. As a 
> result, if a user wants to install multiple flink-kubernetes-operators in 
> different namespaces, the ClusterRole 'flink-operator' will be created 
> multiple times. 
> Errors like
> {quote}Error: INSTALLATION FAILED: rendered manifests contain a resource that 
> already exists. Unable to continue with install: ClusterRole "flink-operator" 
> in namespace "" exists and cannot be imported into the current release: 
> invalid ownership metadata; annotation validation error: key 
> "meta.helm.sh/release-namespace" must equal "c-8725bcef1dc84d6f": current 
> value is "default"
> {quote}
> will be thrown.
> Solution 1 could be appending the namespace to the name of the 
> ClusterRole. 
> Solution 2 is to add an if/else check like 
> [this|https://stackoverflow.com/questions/65110332/clusterrole-exists-and-cannot-be-imported-into-the-current-release]
>  to avoid creating an already existing resource. One important drawback of 
> solution 2 is that when uninstalling one helm release, the created 
> ClusterRole will be removed as well.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27329) Add default value of replica of JM pod and remove declaring it in example yamls

2022-04-20 Thread Biao Geng (Jira)
Biao Geng created FLINK-27329:
-

 Summary: Add default value of replica of JM pod and remove 
declaring it in example yamls
 Key: FLINK-27329
 URL: https://issues.apache.org/jira/browse/FLINK-27329
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng


Currently, we do not explicitly set the default value of `replica` in 
`JobManagerSpec`. As a result, Java sets the default value to zero. 
Besides, in our examples, we explicitly declare `replica` in `JobManagerSpec` 
to be 1. 
After a deeper look while debugging the exception thrown in FLINK-27310, we 
find it would be better to set the default value of the `replica` field to 1 
and remove the declaration from the examples, for the following reasons:
1. A normal Session or Application cluster should have at least one JM. The 
current default value, zero, does not match the common case.
2. One JM works for k8s HA mode as well, and if users really want to launch a 
standby JM for faster recovery, they can declare the `replica` field in the 
yaml file. In the examples, just using the new default value (i.e., 1) should 
be fine.
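A minimal sketch of the proposed change (hypothetical code; the field name 
follows this ticket's wording rather than the actual CRD class):

{code:java}
// Hypothetical sketch: give the JobManager replica count an explicit default
// so that omitting the field in the yaml yields one JM instead of Java's 0.
public class JobManagerSpec {

    /** Number of JobManager replicas; assumed to default to 1. */
    private int replica = 1;

    public int getReplica() {
        return replica;
    }

    public void setReplica(int replica) {
        this.replica = replica;
    }
}
{code}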



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-27329) Add default value of replica of JM pod and not declare it in example yamls

2022-04-20 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-27329:
--
Description: 
Currently, we do not explicitly set the default value of `replica` in 
`JobManagerSpec`. As a result, Java sets the default value to zero. 
Besides, in our examples, we explicitly declare `replica` in `JobManagerSpec` 
to be 1. 
After a deeper look while debugging the exception thrown in FLINK-27310, we 
find it would be better to set the default value of the `replica` field to 1 
and remove the declaration from the examples, for the following reasons:
1. A normal Session or Application cluster should have at least one JM. The 
current default value, zero, does not match the common case.
2. One JM works for k8s HA mode as well, and if users really want to launch a 
standby JM for faster recovery, they can declare the `replica` field in the 
yaml file. In the examples, just using the new default value (i.e., 1) should 
be fine.

  was:
Currently, we do not explicitly set the default value of `replica` in 
`JobManagerSpec`. As a result, Java sets the default value to zero. 
Besides, in our examples, we explicitly declare `replica` in `JobManagerSpec` 
to be 1. 
After a deeper look while debugging the exception thrown in FLINK-27310, we 
find it would be better to set the default value of the `replica` field to 1 
and remove the declaration from the examples, for the following reasons:
1. A normal Session or Application cluster should have at least one JM. The 
current default value, zero, does not match the common case.
2. One JM works for k8s HA mode as well, and if users really want to launch a 
standby JM for faster recovery, they can declare the `replica` field in the 
yaml file. In the examples, just using the new default value (i.e., 1) should 
be fine.


> Add default value of replica of JM pod and not declare it in example yamls
> --
>
> Key: FLINK-27329
> URL: https://issues.apache.org/jira/browse/FLINK-27329
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>
> Currently, we do not explicitly set the default value of `replica` in 
> `JobManagerSpec`. As a result, Java sets the default value to zero. 
> Besides, in our examples, we explicitly declare `replica` in `JobManagerSpec` 
> to be 1. 
> After a deeper look while debugging the exception thrown in FLINK-27310, we 
> find it would be better to set the default value of the `replica` field to 1 
> and remove the declaration from the examples, for the following reasons:
> 1. A normal Session or Application cluster should have at least one JM. The 
> current default value, zero, does not match the common case.
> 2. One JM works for k8s HA mode as well, and if users really want to launch 
> a standby JM for faster recovery, they can declare the `replica` field in 
> the yaml file. In the examples, just using the new default value (i.e., 1) 
> should be fine.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FLINK-27329) Add default value of replica of JM pod and not declare it in example yamls

2022-04-20 Thread Biao Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Geng updated FLINK-27329:
--
Summary: Add default value of replica of JM pod and not declare it in 
example yamls  (was: Add default value of replica of JM pod and remove 
declaring it in example yamls)

> Add default value of replica of JM pod and not declare it in example yamls
> --
>
> Key: FLINK-27329
> URL: https://issues.apache.org/jira/browse/FLINK-27329
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>
> Currently, we do not explicitly set the default value of `replica` in 
> `JobManagerSpec`. As a result, Java sets the default value to zero. 
> Besides, in our examples, we explicitly declare `replica` in `JobManagerSpec` 
> to be 1. 
> After a deeper look while debugging the exception thrown in FLINK-27310, we 
> find it would be better to set the default value of the `replica` field to 1 
> and remove the declaration from the examples, for the following reasons:
> 1. A normal Session or Application cluster should have at least one JM. The 
> current default value, zero, does not match the common case.
> 2. One JM works for k8s HA mode as well, and if users really want to launch 
> a standby JM for faster recovery, they can declare the `replica` field in 
> the yaml file. In the examples, just using the new default value (i.e., 1) 
> should be fine.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-27309) Allow to load default flink configs in the k8s operator dynamically

2022-04-20 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525012#comment-17525012
 ] 

Biao Geng commented on FLINK-27309:
---

Thanks [~gyfora]! Besides [~aitozi]'s valuable point, I am also thinking about 
whether changed default configs should take effect for a running job 
deployment. 
From the k8s side, it may make sense to treat such a change as `specChanged`, 
so that all running jobs are suspended and restarted.
From the Flink users' side, I am not sure it is proper to restart running jobs 
(i.e., to let changed default configs affect running jobs).



> Allow to load default flink configs in the k8s operator dynamically
> ---
>
> Key: FLINK-27309
> URL: https://issues.apache.org/jira/browse/FLINK-27309
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Assignee: Gyula Fora
>Priority: Major
>
> The current default configs used by the k8s operator are saved in the 
> /opt/flink/conf dir in the k8s operator pod and are loaded only once when 
> the operator is created.
> Since the flink k8s operator can be a long-running service and users may 
> want to modify the default configs (e.g. the metric reporter sampling 
> interval) for newly created deployments, it may be better to load the 
> default configs dynamically (i.e. by parsing the latest 
> /opt/flink/conf/flink-conf.yaml) in the {{ReconcilerFactory}} and 
> {{ObserverFactory}}, instead of redeploying the operator.
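For illustration, the dynamic loading could be as simple as re-reading the 
conf dir on each reconcile (a hypothetical sketch built on Flink's 
{{GlobalConfiguration}}; wiring it into the {{ReconcilerFactory}} and 
{{ObserverFactory}} is assumed, not shown):

{code:java}
// Hypothetical sketch: re-parse flink-conf.yaml on every access instead of
// caching the configuration once at operator startup.
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.GlobalConfiguration;

public final class DynamicDefaultConfig {

    private static final String CONF_DIR = "/opt/flink/conf";

    /** Returns the latest default configuration parsed from disk. */
    public static Configuration load() {
        return GlobalConfiguration.loadConfiguration(CONF_DIR);
    }
}
{code}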



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-27329) Add default value of replica of JM pod and remove declaring it in example yamls

2022-04-20 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525020#comment-17525020
 ] 

Biao Geng commented on FLINK-27329:
---

cc [~wangyang0918] 

I can take this ticket~

> Add default value of replica of JM pod and remove declaring it in example 
> yamls
> ---
>
> Key: FLINK-27329
> URL: https://issues.apache.org/jira/browse/FLINK-27329
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Priority: Major
>
> Currently, we do not explicitly set the default value of `replica` in 
> `JobManagerSpec`. As a result, Java sets the default value to zero. 
> Besides, in our examples, we explicitly declare `replica` in `JobManagerSpec` 
> to be 1. 
> After a deeper look while debugging the exception thrown in FLINK-27310, we 
> find it would be better to set the default value of the `replica` field to 1 
> and remove the declaration from the examples, for the following reasons:
> 1. A normal Session or Application cluster should have at least one JM. The 
> current default value, zero, does not match the common case.
> 2. One JM works for k8s HA mode as well, and if users really want to launch 
> a standby JM for faster recovery, they can declare the `replica` field in 
> the yaml file. In the examples, just using the new default value (i.e., 1) 
> should be fine.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-24960) YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots ha

2022-04-20 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17524860#comment-17524860
 ] 

Biao Geng commented on FLINK-24960:
---

Hi [~mapohl], I tried to do some research, but in the latest 
[failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=34743=logs=298e20ef-7951-5965-0e79-ea664ddc435e=d4c90338-c843-57b0-3232-10ae74f00347=36026]
 posted by Yun, the {{Extracted hostname:port:}} line was not shown. In the 
description of this ticket, though, the old CI test shows {{Extracted 
hostname:port: 5718b812c7ab:38622}}, which seems to be correct. 
I plan to first verify that {{yarnSessionClusterRunner.sendStop();}} works fine 
(i.e., that the session cluster is stopped normally), but I have not found a 
way to run only the cron test's "test_cron_jdk11 misc" test on my own [azure 
pipeline|https://dev.azure.com/samuelgeng7/Flink/_build/results?buildId=109=results],
 which makes the verification pretty slow and hard. Are there any guidelines 
for debugging the azure pipeline with specific tests?

> YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots
>  hangs on azure
> ---
>
> Key: FLINK-24960
> URL: https://issues.apache.org/jira/browse/FLINK-24960
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.15.0, 1.14.3
>Reporter: Yun Gao
>Assignee: Niklas Semmler
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.16.0
>
>
> {code:java}
> Nov 18 22:37:08 ==============================================================================
> Nov 18 22:37:08 Test 
> testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots(org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase)
>  is running.
> Nov 18 22:37:08 ==============================================================================
> Nov 18 22:37:25 22:37:25,470 [main] INFO  
> org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase [] - Extracted 
> hostname:port: 5718b812c7ab:38622
> Nov 18 22:52:36 ==============================================================================
> Nov 18 22:52:36 Process produced no output for 900 seconds.
> Nov 18 22:52:36 ==============================================================================
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=26722=logs=f450c1a5-64b1-5955-e215-49cb1ad5ec88=cc452273-9efa-565d-9db8-ef62a38a0c10=36395



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (FLINK-27329) Add default value of replica of JM pod and not declare it in example yamls

2022-04-21 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525432#comment-17525432
 ] 

Biao Geng commented on FLINK-27329:
---

[~wangyang0918] Reviewing all default values makes sense to me. I will try to 
check the specs under the {{org.apache.flink.kubernetes.operator.crd.spec}} 
package to find whether there are other fields, like the JM replica, that do 
not have a proper default value.
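
A quick, hypothetical way to do that audit, assuming the spec classes have 
no-arg constructors; this reflection sketch only flags int/Integer fields 
left at Java's implicit defaults:

{code:java}
import java.lang.reflect.Field;

public class DefaultValueAudit {

    /** Print fields of a spec class that would default to 0 or null. */
    public static void audit(Class<?> specClass) throws Exception {
        Object instance = specClass.getDeclaredConstructor().newInstance();
        for (Field f : specClass.getDeclaredFields()) {
            f.setAccessible(true);
            Object value = f.get(instance);
            // int fields come back boxed, so 0 here means "never set".
            if (value == null || Integer.valueOf(0).equals(value)) {
                System.out.println(specClass.getSimpleName() + "." + f.getName()
                        + " has no explicit default (value: " + value + ")");
            }
        }
    }
}
{code}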

> Add default value of replica of JM pod and not declare it in example yamls
> --
>
> Key: FLINK-27329
> URL: https://issues.apache.org/jira/browse/FLINK-27329
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Biao Geng
>Assignee: Biao Geng
>Priority: Major
>
> Currently, we do not explicitly set the default value of `replica` in 
> `JobManagerSpec`. As a result, Java sets the default value to zero. 
> Besides, in our examples, we explicitly declare `replica` in `JobManagerSpec` 
> to be 1. 
> After a deeper look when debugging the exception thrown in FLINK-27310, we 
> find it would be better to set the default value to 1 for the `replica` field 
> and remove the declaration in examples for the following reasons:
> 1. A normal Session or Application cluster should have at least one JM. The 
> current default value, zero, does not match the common case.
> 2. One JM works for k8s HA mode as well, and if users really want to launch 
> a standby JM for faster recovery, they can declare the value of the `replica` 
> field in the yaml file. In the examples, simply using the new default value 
> (i.e. 1) should be fine.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (FLINK-27310) FlinkOperatorITCase failure due to JobManager replicas less than one

2022-04-19 Thread Biao Geng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17524291#comment-17524291
 ] 

Biao Geng edited comment on FLINK-27310 at 4/19/22 1:00 PM:


It seems that this ITCase is not really executed due to its naming (it does 
not follow the *Test* pattern). 
For the error in the posted log, I believe it is because we have not added 
{{jm.setReplicas(1);}} to set the replica count of the JM.


was (Author: bgeng777):
It seems that this ITCase is not really executed due to its naming (not 
following the *Test* pattern). 
For the error in the posted log, I believe it is because we have not added 
{{jm.setReplicas(1);}} to set the replica count of the JM.
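
A minimal sketch of the fix suggested above, assuming the test builds its 
deployment spec from the operator's CRD classes; everything except 
{{setReplicas(1)}} is illustrative:

{code:java}
// JobManagerSpec lives in the operator's CRD spec package mentioned above;
// the helper class and method names here are hypothetical.
import org.apache.flink.kubernetes.operator.crd.spec.JobManagerSpec;

public class TestSpecs {

    /**
     * Explicitly set the JobManager replica count so the admission webhook
     * no longer rejects a spec whose replicas field is left at Java's
     * implicit int default of 0.
     */
    public static JobManagerSpec jobManagerWithOneReplica() {
        JobManagerSpec jm = new JobManagerSpec();
        jm.setReplicas(1);
        return jm;
    }
}
{code}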


> FlinkOperatorITCase failure due to JobManager replicas less than one
> 
>
> Key: FLINK-27310
> URL: https://issues.apache.org/jira/browse/FLINK-27310
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Reporter: Usamah Jassat
>Priority: Minor
>
> The FlinkOperatorITCase test is currently failing, even in the CI pipeline.
>  
> {code:java}
> INFO] Running org.apache.flink.kubernetes.operator.FlinkOperatorITCase
> Error:  Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 3.178 s <<< FAILURE! - in 
> org.apache.flink.kubernetes.operator.FlinkOperatorITCase
> Error:  org.apache.flink.kubernetes.operator.FlinkOperatorITCase.test  
> Time elapsed: 2.664 s  <<< ERROR!
> io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: POST at: 
> https://192.168.49.2:8443/apis/flink.apache.org/v1beta1/namespaces/flink-operator-test/flinkdeployments.
>  Message: Forbidden! User minikube doesn't have permission. admission webhook 
> "vflinkdeployments.flink.apache.org" denied the request: JobManager replicas 
> should not be configured less than one..
>   at 
> flink.kubernetes.operator@1.0-SNAPSHOT/org.apache.flink.kubernetes.operator.FlinkOperatorITCase.test(FlinkOperatorITCase.java:86){code}
>  
> While the test is failing, the CI run still passes; that should also be 
> fixed so that CI fails on the test failure.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

