[jira] [Commented] (FLINK-14317) make hadoop history server logs accessible through flink ui
[ https://issues.apache.org/jira/browse/FLINK-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416277#comment-17416277 ] Ethan Li commented on FLINK-14317: -- Hello! May I ask what's the current plan for this? We have encountered similar issues. This feature is really important for making flink-on-yarn per-job mode easy to use. Thanks! > make hadoop history server logs accessible through flink ui > --- > > Key: FLINK-14317 > URL: https://issues.apache.org/jira/browse/FLINK-14317 > Project: Flink > Issue Type: Improvement > Components: Runtime / Task, Runtime / Web Frontend >Reporter: Yu Yang >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently if Flink jobs run on Yarn, the task manager logs are not accessible > through flink history server ui. And there is no straightforward way for > users to find yarn logs of specific tasks from completed logs. Making task > manager logs accessible through flink UI will help to improve developer > productivity. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-17641) How to secure flink applications on yarn on multi-tenant environment
[ https://issues.apache.org/jira/browse/FLINK-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236289#comment-17236289 ] Ethan Li edited comment on FLINK-17641 at 11/20/20, 5:06 PM: - Thanks very much for your reply [~rmetzger]. The suggestions are helpful. Sorry I haven't been able to come back to this Jira and reply. We currently have a solution for this issue (or at least part of it) internally and I'd like to share it once we have it working in production. Thanks! was (Author: ethanli): Thanks very much for your reply [~rmetzger]. The suggestions are helpful. Sorry I haven't been able to come back to this Jira and reply. We currently have a solution for this issue internally and I'd like to share it once we have it working in production. Thanks! > How to secure flink applications on yarn on multi-tenant environment > > > Key: FLINK-17641 > URL: https://issues.apache.org/jira/browse/FLINK-17641 > Project: Flink > Issue Type: Wish > Components: Deployment / YARN >Reporter: Ethan Li >Priority: Major > > This is a question I wish to get some insights on. > We are trying to support and secure flink on shared yarn cluster. Besides the > security provided by yarn side (queueACL, kerberos), what I noticed is that > flink CLI can still interact with the flink job as long as it knows the > jobmanager rpc port/hostname and rest.port, which can be obtained easily with > yarn command. > Also on the UI side, on yarn cluster, users can visit flink job UI via yarn > proxy using browser. As long as the user can authenticate and view yarn > resourcemanager webpage, he/she can visit the flink UI without any problem. > This basically means Flink UI is wide-open to corp internal users. 
> On the internal connection side, I am aware of the support added in 1.10 to > limit the mTLS connection by configuring > security.ssl.internal.cert.fingerprint > (https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html) > This works but it is not very flexible. Users need to update the config if > the cert changes before they submit a new job. > I asked the similar question on the mailing list before. I am really > interested in how other folks deal with this issue. Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17641) How to secure flink applications on yarn on multi-tenant environment
[ https://issues.apache.org/jira/browse/FLINK-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236289#comment-17236289 ] Ethan Li commented on FLINK-17641: -- Thanks very much for your reply [~rmetzger]. The suggestions are helpful. Sorry I haven't been able to come back to this Jira and reply. We currently have a solution for this issue internally and I'd like to share it once we have it working in production. Thanks! > How to secure flink applications on yarn on multi-tenant environment > > > Key: FLINK-17641 > URL: https://issues.apache.org/jira/browse/FLINK-17641 > Project: Flink > Issue Type: Wish > Components: Deployment / YARN >Reporter: Ethan Li >Priority: Major > > This is a question I wish to get some insights on. > We are trying to support and secure flink on shared yarn cluster. Besides the > security provided by yarn side (queueACL, kerberos), what I noticed is that > flink CLI can still interact with the flink job as long as it knows the > jobmanager rpc port/hostname and rest.port, which can be obtained easily with > yarn command. > Also on the UI side, on yarn cluster, users can visit flink job UI via yarn > proxy using browser. As long as the user can authenticate and view yarn > resourcemanager webpage, he/she can visit the flink UI without any problem. > This basically means Flink UI is wide-open to corp internal users. > On the internal connection side, I am aware of the support added in 1.10 to > limit the mTLS connection by configuring > security.ssl.internal.cert.fingerprint > (https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html) > This works but it is not very flexible. Users need to update the config if > the cert changes before they submit a new job. > I asked the similar question on the mailing list before. I am really > interested in how other folks deal with this issue. Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17641) How to secure flink applications on yarn on multi-tenant environment
Ethan Li created FLINK-17641: Summary: How to secure flink applications on yarn on multi-tenant environment Key: FLINK-17641 URL: https://issues.apache.org/jira/browse/FLINK-17641 Project: Flink Issue Type: Wish Reporter: Ethan Li This is a question I wish to get some insights on. We are trying to support and secure flink on shared yarn cluster. Besides the security provided by yarn side (queueACL, kerberos), what I noticed is that flink CLI can still interact with the flink job as long as it knows the jobmanager rpc port/hostname and rest.port, which can be obtained easily with yarn command. Also on the UI side, on yarn cluster, users can visit flink job UI via yarn proxy using browser. As long as the user can authenticate and view yarn resourcemanager webpage, he/she can visit the flink UI without any problem. This basically means Flink UI is wide-open to corp internal users. On the internal connection side, I am aware of the support added in 1.10 to limit the mTLS connection by configuring security.ssl.internal.cert.fingerprint (https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html) This works but it is not very flexible. Users need to update the config if the cert changes before they submit a new job. I asked the similar question on the mailing list before. I am really interested in how other folks deal with this issue. Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
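For reference, the fingerprint pinning discussed in the issue is configured in flink-conf.yaml. A sketch based on the option names in the linked SSL setup docs (keystore/truststore paths and the fingerprint value are placeholders):

```
# flink-conf.yaml -- restrict internal mTLS to one specific certificate
security.ssl.internal.enabled: true
security.ssl.internal.keystore: /path/to/internal.keystore
security.ssl.internal.truststore: /path/to/internal.truststore
# Pins the peer certificate. Must be updated whenever the cert rotates,
# which is exactly the inflexibility the issue complains about.
security.ssl.internal.cert.fingerprint: 26:F6:...:CE
```

The fingerprint can be read off the certificate with something like `openssl x509 -in cert.pem -noout -fingerprint -sha256`.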
[jira] [Comment Edited] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061327#comment-17061327 ] Ethan Li edited comment on FLINK-16517 at 3/18/20, 2:26 AM: Thanks [~aljoscha] I put up a pull request to add the new unbounded source. I did not deal with FileProcessingMode in this pr because it seems to require some changes in [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085] and that might impact other components/code. I am willing to file a separate issue/PR if you think it makes sense to do so. was (Author: ethanli): Thanks [~aljoscha] I put up a pull request to add the new unbounded source. I did not deal with FileProcessingMode in this pr because it seems to require some changes in [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085] and that might impact other components/code. I am willing to file a separate issue/PR if you think it makes sense to do so. > Add a long running WordCount example > > > Key: FLINK-16517 > URL: https://issues.apache.org/jira/browse/FLINK-16517 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As far as I know, flink doesn't have a long running WordCount example for > users to start with or doing some simple tests. > The closest one is SocketWindowWordCount. But it requires setting up a server > (nc -l ), which is not hard, but still tedious for simple use cases. And it > requires human input for the job to actually run. > I propose to add or modify current WordCount example to have a SourceFunction > that randomly generates input data based on a set of sentences, so the > WordCount job can run forever. The generation ratio will be configurable. > This will be the easiest way to start a long running flink job and can be > useful for new users to start using flink quickly, or for developers to test > flink easily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061327#comment-17061327 ] Ethan Li edited comment on FLINK-16517 at 3/18/20, 2:26 AM: Thanks [~aljoscha] I put up a pull request to add the new unbounded source. I did not deal with FileProcessingMode in this pr because it seems to require some changes in [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085] and that might impact other components/code. I am willing to file a separate issue/PR if you think it makes sense to do so. was (Author: ethanli): Thanks [~aljoscha] I put up a pull request to add the new unbounded source. I did not deal with FileProcessingMode in this pr because it seems to require some changes in [readTextFile|#L1085] and that might impact other components/code. I am willing to file a separate issue/PR if you think it makes sense to do so. > Add a long running WordCount example > > > Key: FLINK-16517 > URL: https://issues.apache.org/jira/browse/FLINK-16517 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As far as I know, flink doesn't have a long running WordCount example for > users to start with or doing some simple tests. > The closest one is SocketWindowWordCount. But it requires setting up a server > (nc -l ), which is not hard, but still tedious for simple use cases. And it > requires human input for the job to actually run. > I propose to add or modify current WordCount example to have a SourceFunction > that randomly generates input data based on a set of sentences, so the > WordCount job can run forever. The generation ratio will be configurable. 
> This will be the easiest way to start a long running flink job and can be > useful for new users to start using flink quickly, or for developers to test > flink easily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061327#comment-17061327 ] Ethan Li edited comment on FLINK-16517 at 3/18/20, 2:25 AM: Thanks [~aljoscha] I put up a pull request to add the new unbounded source. I did not deal with FileProcessingMode in this pr because it seems to require some changes in [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085] and that might impact other components/code. I am willing to file a separate issue/PR if you think it makes sense to do so. was (Author: ethanli): Thanks [~aljoscha] I put up a pull request to add the new unbounded source. I did not deal with FileProcessingMode in this pr because it seems to require some changes in [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085] and that might impact other components/code. I am willing to file a separate issue/PR if you think it makes sense to do so. > Add a long running WordCount example > > > Key: FLINK-16517 > URL: https://issues.apache.org/jira/browse/FLINK-16517 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As far as I know, flink doesn't have a long running WordCount example for > users to start with or doing some simple tests. > The closest one is SocketWindowWordCount. But it requires setting up a server > (nc -l ), which is not hard, but still tedious for simple use cases. And it > requires human input for the job to actually run. > I propose to add or modify current WordCount example to have a SourceFunction > that randomly generates input data based on a set of sentences, so the > WordCount job can run forever. The generation ratio will be configurable. 
> This will be the easiest way to start a long running flink job and can be > useful for new users to start using flink quickly, or for developers to test > flink easily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061327#comment-17061327 ] Ethan Li commented on FLINK-16517: -- Thanks [~aljoscha] I put up a pull request to add the new unbounded source. I did not deal with FileProcessingMode in this pr because it seems to require some changes in [readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085] and that might impact other components/code. I am willing to file a separate issue/PR if you think it makes sense to do so. > Add a long running WordCount example > > > Key: FLINK-16517 > URL: https://issues.apache.org/jira/browse/FLINK-16517 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As far as I know, flink doesn't have a long running WordCount example for > users to start with or doing some simple tests. > The closest one is SocketWindowWordCount. But it requires setting up a server > (nc -l ), which is not hard, but still tedious for simple use cases. And it > requires human input for the job to actually run. > I propose to add or modify current WordCount example to have a SourceFunction > that randomly generates input data based on a set of sentences, so the > WordCount job can run forever. The generation ratio will be configurable. > This will be the easiest way to start a long running flink job and can be > useful for new users to start using flink quickly, or for developers to test > flink easily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056571#comment-17056571 ] Ethan Li edited comment on FLINK-16517 at 3/11/20, 1:31 AM: [~aljoscha] Thanks for the link. I feel like it's still not simple enough for starters. I am looking for a very simple example so starters can focus on making their first flink job running. WordCount example is like a hello-world program in streaming. We can modify current WordCount example to take a new option so a new Source can generate data randomly. It will not change the current code too much. was (Author: ethanli): [~aljoscha] Thanks for the link. I feel like it's still not simple enough for starters. I am looking for a very simple example so starters can focus on making their first flink job running. WordCount example is like a hello-world program in streaming. We can modify current WordCount example to take a new option so a new Source can generate data randomly. It will change the current code too much. > Add a long running WordCount example > > > Key: FLINK-16517 > URL: https://issues.apache.org/jira/browse/FLINK-16517 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Ethan Li >Priority: Minor > > As far as I know, flink doesn't have a long running WordCount example for > users to start with or doing some simple tests. > The closest one is SocketWindowWordCount. But it requires setting up a server > (nc -l ), which is not hard, but still tedious for simple use cases. And it > requires human input for the job to actually run. > I propose to add or modify current WordCount example to have a SourceFunction > that randomly generates input data based on a set of sentences, so the > WordCount job can run forever. The generation ratio will be configurable. > This will be the easiest way to start a long running flink job and can be > useful for new users to start using flink quickly, or for developers to test > flink easily. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056571#comment-17056571 ] Ethan Li edited comment on FLINK-16517 at 3/11/20, 1:30 AM: [~aljoscha] Thanks for the link. I feel like it's still not simple enough for starters. I am looking for a very simple example so starters can focus on making their first flink job running. WordCount example is like a hello-world program in streaming. We can modify current WordCount example to take a new option so a new Source can generate data randomly. It will change the current code too much. was (Author: ethanli): [~aljoscha] Thanks for the link. I feel like it's still not simple enough for starters. I am looking for a very simple example so starters can focus on making their first flink job running. WordCount example is like a hello-world program in streaming. > Add a long running WordCount example > > > Key: FLINK-16517 > URL: https://issues.apache.org/jira/browse/FLINK-16517 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Ethan Li >Priority: Minor > > As far as I know, flink doesn't have a long running WordCount example for > users to start with or doing some simple tests. > The closest one is SocketWindowWordCount. But it requires setting up a server > (nc -l ), which is not hard, but still tedious for simple use cases. And it > requires human input for the job to actually run. > I propose to add or modify current WordCount example to have a SourceFunction > that randomly generates input data based on a set of sentences, so the > WordCount job can run forever. The generation ratio will be configurable. > This will be the easiest way to start a long running flink job and can be > useful for new users to start using flink quickly, or for developers to test > flink easily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056571#comment-17056571 ] Ethan Li commented on FLINK-16517: -- [~aljoscha] Thanks for the link. I feel like it's still not simple enough for starters. I am looking for a very simple example so starters can focus on making their first flink job running. WordCount example is like a hello-world program in streaming. > Add a long running WordCount example > > > Key: FLINK-16517 > URL: https://issues.apache.org/jira/browse/FLINK-16517 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Ethan Li >Priority: Minor > > As far as I know, flink doesn't have a long running WordCount example for > users to start with or doing some simple tests. > The closest one is SocketWindowWordCount. But it requires setting up a server > (nc -l ), which is not hard, but still tedious for simple use cases. And it > requires human input for the job to actually run. > I propose to add or modify current WordCount example to have a SourceFunction > that randomly generates input data based on a set of sentences, so the > WordCount job can run forever. The generation ratio will be configurable. > This will be the easiest way to start a long running flink job and can be > useful for new users to start using flink quickly, or for developers to test > flink easily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055515#comment-17055515 ] Ethan Li commented on FLINK-16517: -- I will put up a pull request if you think this makes sense. > Add a long running WordCount example > > > Key: FLINK-16517 > URL: https://issues.apache.org/jira/browse/FLINK-16517 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Ethan Li >Priority: Minor > > As far as I know, flink doesn't have a long running WordCount example for > users to start with or doing some simple tests. > The closest one is SocketWindowWordCount. But it requires setting up a server > (nc -l ), which is not hard, but still tedious for simple use cases. And it > requires human input for the job to actually run. > I propose to add or modify current WordCount example to have a SourceFunction > that randomly generates input data based on a set of sentences, so the > WordCount job can run forever. The generation ratio will be configurable. > This will be the easiest way to start a long running flink job and can be > useful for new users to start using flink quickly, or for developers to test > flink easily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-16517) Add a long running WordCount example
[ https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated FLINK-16517: - Description: As far as I know, flink doesn't have a long running WordCount example for users to start with or doing some simple tests. The closest one is SocketWindowWordCount. But it requires setting up a server (nc -l ), which is not hard, but still tedious for simple use cases. And it requires human input for the job to actually run. I propose to add or modify current WordCount example to have a SourceFunction that randomly generates input data based on a set of sentences, so the WordCount job can run forever. The generation ratio will be configurable. This will be the easiest way to start a long running flink job and can be useful for new users to start using flink quickly, or for developers to test flink easily. was: As far as I know, flink doesn't have a long running WordCount example for users to start with or doing some simple tests. The closest one is SocketWindowWordCount. But it requires setting up a server (nc -l ), which is not hard, but still tedious for simple use cases. I propose to add or modify current WordCount example to have a SourceFunction that randomly generates input data based on a set of sentences, so the WordCount job can run forever. The generation ratio will be configurable. This will be the easiest way to start a long running flink job and can be useful for new users to start using flink quickly, or for developers to test flink easily. > Add a long running WordCount example > > > Key: FLINK-16517 > URL: https://issues.apache.org/jira/browse/FLINK-16517 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Ethan Li >Priority: Minor > > As far as I know, flink doesn't have a long running WordCount example for > users to start with or doing some simple tests. > The closest one is SocketWindowWordCount. 
But it requires setting up a server > (nc -l ), which is not hard, but still tedious for simple use cases. And it > requires human input for the job to actually run. > I propose to add or modify current WordCount example to have a SourceFunction > that randomly generates input data based on a set of sentences, so the > WordCount job can run forever. The generation ratio will be configurable. > This will be the easiest way to start a long running flink job and can be > useful for new users to start using flink quickly, or for developers to test > flink easily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16517) Add a long running WordCount example
Ethan Li created FLINK-16517: Summary: Add a long running WordCount example Key: FLINK-16517 URL: https://issues.apache.org/jira/browse/FLINK-16517 Project: Flink Issue Type: Improvement Components: Examples Reporter: Ethan Li As far as I know, flink doesn't have a long running WordCount example for users to start with or doing some simple tests. The closest one is SocketWindowWordCount. But it requires setting up a server (nc -l ), which is not hard, but still tedious for simple use cases. I propose to add or modify current WordCount example to have a SourceFunction that randomly generates input data based on a set of sentences, so the WordCount job can run forever. The generation ratio will be configurable. This will be the easiest way to start a long running flink job and can be useful for new users to start using flink quickly, or for developers to test flink easily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
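The proposed source is not hard to picture. Below is a plain-Java sketch of the generation logic only (class name and sentence set are made up for illustration, and Flink itself is deliberately not imported); in the actual example this loop would sit inside a SourceFunction's run() method, emit each line via ctx.collect(), and sleep between emissions to honor the configurable generation ratio.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch of the proposed data generation: pick sentences at random from a
// fixed set, the way an unbounded source would do forever.
class RandomSentenceSource {
    static final String[] SENTENCES = {
        "to be or not to be",
        "that is the question",
        "whether tis nobler in the mind"
    };

    static String nextSentence(Random random) {
        return SENTENCES[random.nextInt(SENTENCES.length)];
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        Map<String, Integer> counts = new HashMap<>();
        // A real source would loop until cancelled; a bounded run suffices here.
        for (int i = 0; i < 1000; i++) {
            for (String word : nextSentence(random).split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        counts.forEach((word, count) -> System.out.println(word + " : " + count));
    }
}
```

With a seeded Random the job is also handy for deterministic local testing, which is part of the motivation stated above.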
[jira] [Commented] (FLINK-12122) Spread out tasks evenly across all available registered TaskManagers
[ https://issues.apache.org/jira/browse/FLINK-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026777#comment-17026777 ] Ethan Li commented on FLINK-12122: -- bq. there are no plan to backport this feature to 1.8. Thanks [~trohrmann] > Spread out tasks evenly across all available registered TaskManagers > > > Key: FLINK-12122 > URL: https://issues.apache.org/jira/browse/FLINK-12122 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.6.4, 1.7.2, 1.8.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > Labels: pull-request-available > Fix For: 1.9.2, 1.10.0 > > Attachments: image-2019-05-21-12-28-29-538.png, > image-2019-05-21-13-02-50-251.png > > Time Spent: 20m > Remaining Estimate: 0h > > With Flip-6, we changed the default behaviour how slots are assigned to > {{TaskManagers}}. Instead of evenly spreading it out over all registered > {{TaskManagers}}, we randomly pick slots from {{TaskManagers}} with a > tendency to first fill up a TM before using another one. This is a regression > wrt the pre Flip-6 code. > I suggest to change the behaviour so that we try to evenly distribute slots > across all available {{TaskManagers}} by considering how many of their slots > are already allocated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
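For anyone landing here with the same backport question: in the 1.9.2/1.10 releases listed under Fix For, the spread-out behavior is, as far as I remember, opt-in via a config flag rather than the new default (flag name from memory, please verify against the configuration docs for your version):

```
# flink-conf.yaml
cluster.evenly-spread-out-slots: true
```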
[jira] [Commented] (FLINK-15386) SingleJobSubmittedJobGraphStore.putJobGraph has a logic error
[ https://issues.apache.org/jira/browse/FLINK-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17002910#comment-17002910 ] Ethan Li commented on FLINK-15386: -- I'd like to work on this if anyone can assign it to me. Thanks > SingleJobSubmittedJobGraphStore.putJobGraph has a logic error > - > > Key: FLINK-15386 > URL: https://issues.apache.org/jira/browse/FLINK-15386 > Project: Flink > Issue Type: Bug >Reporter: Ethan Li >Priority: Minor > > https://github.com/apache/flink/blob/release-1.9/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/SingleJobSubmittedJobGraphStore.java#L61-L66 > {code:java} > @Override > public void putJobGraph(SubmittedJobGraph jobGraph) throws Exception { > if (!jobGraph.getJobId().equals(jobGraph.getJobId())) { //this > always returns false. > throw new FlinkException("Cannot put additional jobs > into this submitted job graph store."); > } > } > {code} > The code is there since 1.5 but fixed in the master branch (1.10). It's also > better to add unit test for this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15386) SingleJobSubmittedJobGraphStore.putJobGraph has a logic error
Ethan Li created FLINK-15386: Summary: SingleJobSubmittedJobGraphStore.putJobGraph has a logic error Key: FLINK-15386 URL: https://issues.apache.org/jira/browse/FLINK-15386 Project: Flink Issue Type: Bug Reporter: Ethan Li https://github.com/apache/flink/blob/release-1.9/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/SingleJobSubmittedJobGraphStore.java#L61-L66 {code:java} @Override public void putJobGraph(SubmittedJobGraph jobGraph) throws Exception { if (!jobGraph.getJobId().equals(jobGraph.getJobId())) { //this always returns false. throw new FlinkException("Cannot put additional jobs into this submitted job graph store."); } } {code} The code is there since 1.5 but fixed in the master branch (1.10). It's also better to add unit test for this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
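The defect is easy to reproduce outside Flink: for any non-null x, !x.equals(x) is false, so the guard above can never throw. A standalone mimic of the pattern (class and field names here are made up; the actual fix on master presumably compares the incoming graph's id against the job id the store was created for):

```java
import java.util.Objects;

// Minimal mimic of the buggy guard and its fix (hypothetical names).
class SingleJobStore {
    private final String expectedJobId;

    SingleJobStore(String expectedJobId) {
        this.expectedJobId = expectedJobId;
    }

    // Buggy: compares the incoming id with itself, so the negation is
    // always false and no job is ever rejected.
    boolean rejectsBuggy(String incomingJobId) {
        return !incomingJobId.equals(incomingJobId);
    }

    // Fixed: compare against the id this store was created for.
    boolean rejectsFixed(String incomingJobId) {
        return !Objects.equals(expectedJobId, incomingJobId);
    }

    public static void main(String[] args) {
        SingleJobStore store = new SingleJobStore("job-1");
        System.out.println("buggy rejects other job: " + store.rejectsBuggy("job-2"));
        System.out.println("fixed rejects other job: " + store.rejectsFixed("job-2"));
    }
}
```

A unit test submitting a graph with a different job id, as the issue suggests, would have caught this immediately: the buggy version accepts everything.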
[jira] [Commented] (FLINK-12122) Spread out tasks evenly across all available registered TaskManagers
[ https://issues.apache.org/jira/browse/FLINK-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17002369#comment-17002369 ] Ethan Li commented on FLINK-12122: -- Is there any plan to backport this to 1.8? > Spread out tasks evenly across all available registered TaskManagers > > > Key: FLINK-12122 > URL: https://issues.apache.org/jira/browse/FLINK-12122 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.6.4, 1.7.2, 1.8.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > Labels: pull-request-available > Fix For: 1.9.2, 1.10.0 > > Attachments: image-2019-05-21-12-28-29-538.png, > image-2019-05-21-13-02-50-251.png > > Time Spent: 20m > Remaining Estimate: 0h > > With Flip-6, we changed the default behaviour how slots are assigned to > {{TaskManagers}}. Instead of evenly spreading it out over all registered > {{TaskManagers}}, we randomly pick slots from {{TaskManagers}} with a > tendency to first fill up a TM before using another one. This is a regression > wrt the pre Flip-6 code. > I suggest to change the behaviour so that we try to evenly distribute slots > across all available {{TaskManagers}} by considering how many of their slots > are already allocated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-13807) Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8
[ https://issues.apache.org/jira/browse/FLINK-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912431#comment-16912431 ] Ethan Li commented on FLINK-13807: -- Thanks for fixing it. Sorry I am not able to test it in a short time. If you change your environment to ANSI_X3.4-1968 and if the unit tests pass, it should be good enough. > Flink-avro unit tests fails if the character encoding in the environment is > not default to UTF-8 > > > Key: FLINK-13807 > URL: https://issues.apache.org/jira/browse/FLINK-13807 > Project: Flink > Issue Type: Bug >Affects Versions: 1.8.0 >Reporter: Ethan Li >Priority: Minor > Attachments: patch.diff > > > On Flink release-1.8 branch: > {code:java} > [ERROR] Tests run: 12, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 4.81 > s <<< FAILURE! - in > org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest > [ERROR] testSimpleAvroRead[Execution mode = > CLUSTER](org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest) > Time elapsed: 0.438 s <<< FAILURE! 
> java.lang.AssertionError: > Different elements in arrays: expected 2 elements and received 2 > files: [/tmp/junit5386344396421857812/junit6023978980792200274.tmp/4, > /tmp/junit5386344396421857812/junit6023978980792200274.tmp/2, > /tmp/junit5386344396421857812/junit6023978980792200274.tmp/1, > /tmp/junit5386344396421857812/junit6023978980792200274.tmp/3] > expected: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": > null, "type_long_test": null, "type_double_test": 123.45, "type_null_test": > null, "type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT > 2"], "type_array_boolean": [true, false], "type_nullable_array": null, > "type_enum": "GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, > "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": > "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, > "type_bytes": {"bytes": > "\u\u\u\u\u\u\u\u\u\u"}, "type_date": > 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, > "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": > 123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, > -48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", > "type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, > "type_bool_test": false, "type_array_string": [], "type_array_boolean": [], > "type_nullable_array": null, "type_enum": "RED", "type_map": {}, > "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": > "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, > "type_bytes": {"bytes": > "\u\u\u\u\u\u\u\u\u\u"}, "type_date": > 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, > "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": > 123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, > -48]}] > received: [{"name": "Alyssa", 
"favorite_number": 256, "favorite_color": > null, "type_long_test": null, "type_double_test": 123.45, "type_null_test": > null, "type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT > 2"], "type_array_boolean": [true, false], "type_nullable_array": null, > "type_enum": "GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, > "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": > "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, > "type_bytes": {"bytes": > "\u\u\u\u\u\u\u\u\u\u"}, "type_date": > 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, > "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": > 123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": > [7, -48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": > "blue", "type_long_test": 1337, "type_double_test": 1.337, "type_null_test": > null, "type_bool_test": false, "type_array_string": [], "type_array_boolean": > [], "type_nullable_array": null, "type_enum": "RED", "type_map": {}, > "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": > "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, > "type_bytes": {"bytes": > "\u\u\u\u\u\u\u\u\u\u"}, "type_date": > 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, > "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": > 123456, "type_decimal_bytes": {"bytes":
[jira] [Updated] (FLINK-13807) Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8
[ https://issues.apache.org/jira/browse/FLINK-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated FLINK-13807: - Description: On Flink release-1.8 branch: {code:java} [ERROR] Tests run: 12, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 4.81 s <<< FAILURE! - in org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest [ERROR] testSimpleAvroRead[Execution mode = CLUSTER](org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest) Time elapsed: 0.438 s <<< FAILURE! java.lang.AssertionError: Different elements in arrays: expected 2 elements and received 2 files: [/tmp/junit5386344396421857812/junit6023978980792200274.tmp/4, /tmp/junit5386344396421857812/junit6023978980792200274.tmp/2, /tmp/junit5386344396421857812/junit6023978980792200274.tmp/1, /tmp/junit5386344396421857812/junit6023978980792200274.tmp/3] expected: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": null, "type_long_test": null, "type_double_test": 123.45, "type_null_test": null, "type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 2"], "type_array_boolean": [true, false], "type_nullable_array": null, "type_enum": "GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, -48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", "type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, "type_bool_test": false, "type_array_string": [], "type_array_boolean": [], "type_nullable_array": null, "type_enum": "RED", "type_map": {}, "type_fixed": null, "type_union": 
null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, -48]}] received: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": null, "type_long_test": null, "type_double_test": 123.45, "type_null_test": null, "type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 2"], "type_array_boolean": [true, false], "type_nullable_array": null, "type_enum": "GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": [7, -48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", "type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, "type_bool_test": false, "type_array_string": [], "type_array_boolean": [], "type_nullable_array": null, "type_enum": "RED", "type_map": {}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": [7, -48]}] at 
org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest.after(AvroTypeExtractionTest.java:76) {code} Comparing “expected” with “received”, there is really some question mark difference. For example, in “expected’, it’s {code:java} "type_decimal_bytes": {"bytes": "\u0007?”} {code} While in “received”, it’s {code:java} "type_decimal_bytes": {"bytes": "\u0007??"} {code} The environment I ran the unit tests on uses ANSI_X3.4-1968 I changed to "en_US.UTF-8" and the unit tests passed. was: {code:java} [ERROR] Tests run: 12, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 4.81 s <<< FAILURE! - in org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest [ERROR] testSimpleAvroRead[Execution mode =
[jira] [Created] (FLINK-13807) Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8
Ethan Li created FLINK-13807: Summary: Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8 Key: FLINK-13807 URL: https://issues.apache.org/jira/browse/FLINK-13807 Project: Flink Issue Type: Bug Affects Versions: 1.8.0 Reporter: Ethan Li {code:java} [ERROR] Tests run: 12, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 4.81 s <<< FAILURE! - in org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest [ERROR] testSimpleAvroRead[Execution mode = CLUSTER](org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest) Time elapsed: 0.438 s <<< FAILURE! java.lang.AssertionError: Different elements in arrays: expected 2 elements and received 2 files: [/tmp/junit5386344396421857812/junit6023978980792200274.tmp/4, /tmp/junit5386344396421857812/junit6023978980792200274.tmp/2, /tmp/junit5386344396421857812/junit6023978980792200274.tmp/1, /tmp/junit5386344396421857812/junit6023978980792200274.tmp/3] expected: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": null, "type_long_test": null, "type_double_test": 123.45, "type_null_test": null, "type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 2"], "type_array_boolean": [true, false], "type_nullable_array": null, "type_enum": "GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, -48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", "type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, "type_bool_test": false, "type_array_string": [], "type_array_boolean": [], 
"type_nullable_array": null, "type_enum": "RED", "type_map": {}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, -48]}] received: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": null, "type_long_test": null, "type_double_test": 123.45, "type_null_test": null, "type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 2"], "type_array_boolean": [true, false], "type_nullable_array": null, "type_enum": "GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": [7, -48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", "type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, "type_bool_test": false, "type_array_string": [], "type_array_boolean": [], "type_nullable_array": null, "type_enum": "RED", "type_map": {}, "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 123456, 
"type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": [7, -48]}] at org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest.after(AvroTypeExtractionTest.java:76) {code} Comparing “expected” with “received”, there is really some question mark difference. For example, in “expected’, it’s {code:java} "type_decimal_bytes": {"bytes": "\u0007?”} {code} While in “received”, it’s {code:java} "type_decimal_bytes": {"bytes": "\u0007??"} {code} The environment I ran the unit tests on uses ANSI_X3.4-1968 I changed to "en_US.UTF-8" and the unit tests passed. -- This message was sent by Atlassian Jira (v8.3.2#803003)
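For context on the question-mark difference described above: under an ANSI_X3.4-1968 (US-ASCII) default charset, encoding a non-ASCII character replaces it with `?`, and decoding its multi-byte UTF-8 form produces one replacement character per byte, which matches the `?` vs `??` diff in the failure. A minimal sketch (the payload string is illustrative, standing in for the record's decimal bytes, not taken from the test data):

```java
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // Illustrative payload: a control byte plus one non-ASCII character.
        String original = "\u0007\u07d0";

        // Encoding under US-ASCII replaces the unmappable character with '?',
        // so the round trip loses information.
        String asciiRoundTrip = new String(
                original.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
        System.out.println(asciiRoundTrip.equals(original)); // false

        // Decoding the character's two UTF-8 bytes as ASCII yields two
        // replacement characters: one char became two, as in "?" vs "??".
        String misread = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.US_ASCII);
        System.out.println(misread.length()); // 3 instead of 2

        // An en_US.UTF-8 environment round-trips the string unchanged.
        String utf8RoundTrip = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(utf8RoundTrip.equals(original)); // true
    }
}
```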
[jira] [Commented] (FLINK-9684) HistoryServerArchiveFetcher not working properly with secure hdfs cluster
[ https://issues.apache.org/jira/browse/FLINK-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526449#comment-16526449 ] Ethan Li commented on FLINK-9684: - PR: https://github.com/apache/flink/pull/6225 > HistoryServerArchiveFetcher not working properly with secure hdfs cluster > - > > Key: FLINK-9684 > URL: https://issues.apache.org/jira/browse/FLINK-9684 > Project: Flink > Issue Type: Bug >Affects Versions: 1.4.2 >Reporter: Ethan Li >Priority: Major > > With my current setup, jobmanager and taskmanager are able to talk to hdfs > cluster (with kerberos setup). However, running history server gets: > > > {code:java} > 2018-06-27 19:03:32,080 WARN org.apache.hadoop.ipc.Client - Exception > encountered while connecting to the server : > java.lang.IllegalArgumentException: Failed to specify server's Kerberos > principal name > 2018-06-27 19:03:32,085 ERROR > org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher - > Failed to access job archive location for path > hdfs://openqe11blue-n2.blue.ygrid.yahoo.com/tmp/flink/openstorm10-blue/jmarchive. 
> java.io.IOException: Failed on local exception: java.io.IOException: > java.lang.IllegalArgumentException: Failed to specify server's Kerberos > principal name; Host Details : local host is: > "openstorm10blue-n2.blue.ygrid.yahoo.com/10.215.79.35"; destination host is: > "openqe11blue-n2.blue.ygri > d.yahoo.com":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1414) > at org.apache.hadoop.ipc.Client.call(Client.java:1363) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getListing(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) > at com.sun.proxy.$Proxy9.getListing(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:515) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1743) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1726) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:650) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708) > at > org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.listStatus(HadoopFileSystem.java:146) > at > org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:139) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: java.lang.IllegalArgumentException: Failed to > specify server's Kerberos principal name > at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) > at > org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640) > at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724) > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462) > at org.apache.hadoop.ipc.Client.call(Client.java:1381) > ...
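For reference, Hadoop's RPC client raises `IllegalArgumentException: Failed to specify server's Kerberos principal name` when the client-side configuration does not define the NameNode's principal, which typically means the process has not loaded the cluster's `hdfs-site.xml`. An illustrative entry (the realm and values here are examples, not from this cluster) looks like:

```xml
<!-- hdfs-site.xml on the history server's classpath (illustrative values) -->
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
```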
[jira] [Created] (FLINK-9685) Flink should support hostname-substitution for security.kerberos.login.principal
Ethan Li created FLINK-9685: --- Summary: Flink should support hostname-substitution for security.kerberos.login.principal Key: FLINK-9685 URL: https://issues.apache.org/jira/browse/FLINK-9685 Project: Flink Issue Type: Improvement Reporter: Ethan Li

[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/SecurityConfiguration.java#L83]

We can have something like this:
{code:java}
String rawPrincipal = flinkConf.getString(SecurityOptions.KERBEROS_LOGIN_PRINCIPAL);
if (rawPrincipal != null) {
    try {
        rawPrincipal = rawPrincipal.replace("HOSTNAME", InetAddress.getLocalHost().getCanonicalHostName());
    } catch (UnknownHostException e) {
        LOG.error("Failed to replace HOSTNAME with localhost because {}", e);
    }
}
this.principal = rawPrincipal;
{code}
This would make it easier to deploy Flink to a cluster: instead of setting a different principal on every node, we could use the same principal, headless_user/HOSTNAME@DOMAIN.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
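A self-contained sketch of the substitution proposed above. The `HOSTNAME` placeholder convention and the sample principal are illustrative assumptions; the fallback on `UnknownHostException` (return the principal unresolved) is also a choice made here for the sketch:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class PrincipalSubstitutionDemo {
    // Replace a literal HOSTNAME placeholder with this node's canonical host
    // name, so one configured principal works on every node of the cluster.
    static String substitute(String rawPrincipal) {
        if (rawPrincipal == null) {
            return null;
        }
        try {
            return rawPrincipal.replace("HOSTNAME",
                    InetAddress.getLocalHost().getCanonicalHostName());
        } catch (UnknownHostException e) {
            return rawPrincipal; // leave unresolved rather than fail hard
        }
    }

    public static void main(String[] args) {
        // Prints e.g. headless_user/<this-host>@EXAMPLE.COM
        System.out.println(substitute("headless_user/HOSTNAME@EXAMPLE.COM"));
        // A principal without the placeholder passes through unchanged.
        System.out.println(substitute("headless_user@EXAMPLE.COM"));
    }
}
```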
[jira] [Created] (FLINK-9684) HistoryServerArchiveFetcher not working properly with secure hdfs cluster
Ethan Li created FLINK-9684: --- Summary: HistoryServerArchiveFetcher not working properly with secure hdfs cluster Key: FLINK-9684 URL: https://issues.apache.org/jira/browse/FLINK-9684 Project: Flink Issue Type: Bug Affects Versions: 1.4.2 Reporter: Ethan Li With my current setup, jobmanager and taskmanager are able to talk to hdfs cluster (with kerberos setup). However, running history server gets: {code:java} 2018-06-27 19:03:32,080 WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name 2018-06-27 19:03:32,085 ERROR org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher - Failed to access job archive location for path hdfs://openqe11blue-n2.blue.ygrid.yahoo.com/tmp/flink/openstorm10-blue/jmarchive. java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name; Host Details : local host is: "openstorm10blue-n2.blue.ygrid.yahoo.com/10.215.79.35"; destination host is: "openqe11blue-n2.blue.ygri d.yahoo.com":8020; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1414) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getListing(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy9.getListing(Unknown 
Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:515) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1743) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1726) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:650) at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102) at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712) at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708) at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.listStatus(HadoopFileSystem.java:146) at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:139) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724) at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462) at org.apache.hadoop.ipc.Client.call(Client.java:1381) ... 28 more {code} Changed LOG Level to DEBUG and seeing {code:java} 2018-06-27 19:03:30,931 INFO org.apache.flink.runtime.webmonitor.history.HistoryServer - Enabling SSL for the history server. 2018-06-27 19:03:30,931 DEBUG org.apache.flink.runtime.net.SSLUtils - Creating server SSL context from configuration 2018-06-27 19:03:31,091 DEBUG org.apache.flink.core.fs.FileSystem - Loading extension file systems via services 2018-06-27 19:03:31,094 DEBUG
[jira] [Created] (FLINK-9683) inconsistent behaviors when setting historyserver.archive.fs.dir
Ethan Li created FLINK-9683: --- Summary: inconsistent behaviors when setting historyserver.archive.fs.dir Key: FLINK-9683 URL: https://issues.apache.org/jira/browse/FLINK-9683 Project: Flink Issue Type: Bug Affects Versions: 1.4.2 Reporter: Ethan Li I am using release-1.4.2, With fs.default-scheme and fs.hdfs.hadoopconf set correctly, when setting {code:java} historyserver.archive.fs.dir: /tmp/flink/cluster-name/jmarchive {code} I am seeing {code:java} 2018-06-27 18:51:12,692 WARN org.apache.flink.runtime.webmonitor.history.HistoryServer - Failed to create Path or FileSystem for directory '/tmp/flink/cluster-name/jmarchive'. Directory will not be monitored. java.lang.IllegalArgumentException: The scheme (hdfs://, file://, etc) is null. Please specify the file system scheme explicitly in the URI. at org.apache.flink.runtime.webmonitor.WebMonitorUtils.validateAndNormalizeUri(WebMonitorUtils.java:300) at org.apache.flink.runtime.webmonitor.history.HistoryServer.(HistoryServer.java:168) at org.apache.flink.runtime.webmonitor.history.HistoryServer.(HistoryServer.java:132) at org.apache.flink.runtime.webmonitor.history.HistoryServer$1.call(HistoryServer.java:113) at org.apache.flink.runtime.webmonitor.history.HistoryServer$1.call(HistoryServer.java:110) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) at org.apache.flink.runtime.webmonitor.history.HistoryServer.main(HistoryServer.java:110) {code} And then if I set {code:java} historyserver.archive.fs.dir: hdfs:///tmp/flink/cluster-name/jmarchive{code} I am seeing: {code:java} java.io.IOException: The given file system URI (hdfs:///tmp/flink/cluster-name/jmarchive) did not describe the authority (like for example HDFS NameNode address/port or S3 host). 
The attempt to use a configured default authority failed: Hadoop configuration for default file system ('fs.default.name' or 'fs.defaultFS') contains no valid authority component (like hdfs namenode, S3 host, etc) at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:149) at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401) at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320) at org.apache.flink.core.fs.Path.getFileSystem(Path.java:293) at org.apache.flink.runtime.webmonitor.history.HistoryServer.<init>(HistoryServer.java:169) at org.apache.flink.runtime.webmonitor.history.HistoryServer.<init>(HistoryServer.java:132) {code} The only way it works is to provide the full HDFS URI including the authority, like: {code:java} historyserver.archive.fs.dir: hdfs://<namenode-host>:<port>/tmp/flink/cluster-name/jmarchive {code} The situations above arise because two parts of the code treat the "scheme" differently: https://github.com/apache/flink/blob/release-1.4.2/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorUtils.java#L299-L302 https://github.com/apache/flink/blob/release-1.4.2/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java#L335-L338 I believe the first case should be supported if users have set fs.default-scheme. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
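The two code paths differ in what they require of the configured URI, which a quick `java.net.URI` check makes visible (the authority `namenode:8020` is a made-up example host):

```java
import java.net.URI;

public class ArchiveDirUriDemo {
    public static void main(String[] args) {
        // No scheme at all: rejected by the WebMonitorUtils validation path,
        // even where FileSystem.get could have applied fs.default-scheme.
        URI noScheme = URI.create("/tmp/flink/cluster-name/jmarchive");
        // Scheme but no authority: the HadoopFsFactory path then needs a valid
        // default authority from fs.defaultFS, or it fails as quoted above.
        URI noAuthority = URI.create("hdfs:///tmp/flink/cluster-name/jmarchive");
        // Fully qualified: carries both scheme and authority.
        URI full = URI.create("hdfs://namenode:8020/tmp/flink/cluster-name/jmarchive");

        System.out.println(noScheme.getScheme());       // null
        System.out.println(noAuthority.getAuthority()); // null
        System.out.println(full.getAuthority());        // namenode:8020
    }
}
```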