[jira] [Commented] (FLINK-14317) make hadoop history server logs accessible through flink ui

2021-09-16 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416277#comment-17416277
 ] 

Ethan Li commented on FLINK-14317:
--

Hello!
May I ask what's the current plan for this? We have encountered similar issues. 
This feature is really important for making flink-on-yarn per-job mode easy to 
use. 
Thanks!

> make hadoop history server logs accessible through flink ui
> ---
>
> Key: FLINK-14317
> URL: https://issues.apache.org/jira/browse/FLINK-14317
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task, Runtime / Web Frontend
>Reporter: Yu Yang
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, if Flink jobs run on Yarn, the task manager logs are not accessible 
> through the flink history server ui, and there is no straightforward way for 
> users to find the yarn logs of specific tasks for completed jobs. Making task 
> manager logs accessible through the flink UI will help improve developer 
> productivity. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-17641) How to secure flink applications on yarn on multi-tenant environment

2020-11-20 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236289#comment-17236289
 ] 

Ethan Li edited comment on FLINK-17641 at 11/20/20, 5:06 PM:
-

Thanks very much for your reply [~rmetzger]. The suggestions are helpful.

Sorry I haven't been able to come back to this Jira and reply.

We currently have a solution for this issue (or at least part of it) internally 
and I'd like to share it once we have it working in production. Thanks!


was (Author: ethanli):
Thanks very much for your reply [~rmetzger]. The suggestions are helpful.

Sorry I haven't been able to come back to this Jira and reply.

We currently have a solution for this issue internally and I'd like to share it 
once we have it working in production. Thanks!

> How to secure flink applications on yarn on multi-tenant environment
> 
>
> Key: FLINK-17641
> URL: https://issues.apache.org/jira/browse/FLINK-17641
> Project: Flink
>  Issue Type: Wish
>  Components: Deployment / YARN
>Reporter: Ethan Li
>Priority: Major
>
> This is a question I wish to get some insights on. 
> We are trying to support and secure flink on a shared yarn cluster. Besides 
> the security provided on the yarn side (queueACL, kerberos), what I noticed is 
> that the flink CLI can still interact with a flink job as long as it knows the 
> jobmanager rpc port/hostname and rest.port, which can be obtained easily with 
> the yarn command. 
> Also on the UI side, on a yarn cluster, users can visit the flink job UI via 
> the yarn proxy using a browser. As long as a user can authenticate and view 
> the yarn resourcemanager webpage, he/she can visit the flink UI without any 
> problem. This basically means the Flink UI is wide open to corp internal users.
> On the internal connection side, I am aware of the support added in 1.10 to 
> limit mTLS connections by configuring 
> security.ssl.internal.cert.fingerprint 
> (https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html).
> This works but it is not very flexible. Users need to update the config if 
> the cert changes before they submit a new job.
> I asked a similar question on the mailing list before. I am really 
> interested in how other folks deal with this issue. Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17641) How to secure flink applications on yarn on multi-tenant environment

2020-11-20 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236289#comment-17236289
 ] 

Ethan Li commented on FLINK-17641:
--

Thanks very much for your reply [~rmetzger]. The suggestions are helpful.

Sorry I haven't been able to come back to this Jira and reply.

We currently have a solution for this issue internally and I'd like to share it 
once we have it working in production. Thanks!

> How to secure flink applications on yarn on multi-tenant environment
> 
>
> Key: FLINK-17641
> URL: https://issues.apache.org/jira/browse/FLINK-17641
> Project: Flink
>  Issue Type: Wish
>  Components: Deployment / YARN
>Reporter: Ethan Li
>Priority: Major
>
> This is a question I wish to get some insights on. 
> We are trying to support and secure flink on a shared yarn cluster. Besides 
> the security provided on the yarn side (queueACL, kerberos), what I noticed is 
> that the flink CLI can still interact with a flink job as long as it knows the 
> jobmanager rpc port/hostname and rest.port, which can be obtained easily with 
> the yarn command. 
> Also on the UI side, on a yarn cluster, users can visit the flink job UI via 
> the yarn proxy using a browser. As long as a user can authenticate and view 
> the yarn resourcemanager webpage, he/she can visit the flink UI without any 
> problem. This basically means the Flink UI is wide open to corp internal users.
> On the internal connection side, I am aware of the support added in 1.10 to 
> limit mTLS connections by configuring 
> security.ssl.internal.cert.fingerprint 
> (https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html).
> This works but it is not very flexible. Users need to update the config if 
> the cert changes before they submit a new job.
> I asked a similar question on the mailing list before. I am really 
> interested in how other folks deal with this issue. Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17641) How to secure flink applications on yarn on multi-tenant environment

2020-05-12 Thread Ethan Li (Jira)
Ethan Li created FLINK-17641:


 Summary: How to secure flink applications on yarn on multi-tenant 
environment
 Key: FLINK-17641
 URL: https://issues.apache.org/jira/browse/FLINK-17641
 Project: Flink
  Issue Type: Wish
Reporter: Ethan Li


This is a question I wish to get some insights on. 

We are trying to support and secure flink on a shared yarn cluster. Besides the 
security provided on the yarn side (queueACL, kerberos), what I noticed is that 
the flink CLI can still interact with a flink job as long as it knows the 
jobmanager rpc port/hostname and rest.port, which can be obtained easily with 
the yarn command. 

Also on the UI side, on a yarn cluster, users can visit the flink job UI via the 
yarn proxy using a browser. As long as a user can authenticate and view the yarn 
resourcemanager webpage, he/she can visit the flink UI without any problem. This 
basically means the Flink UI is wide open to corp internal users.

On the internal connection side, I am aware of the support added in 1.10 to 
limit mTLS connections by configuring security.ssl.internal.cert.fingerprint 
(https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html).

This works but it is not very flexible. Users need to update the config if the 
cert changes before they submit a new job.
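
For readers unfamiliar with that option, a minimal flink-conf.yaml sketch (the 
fingerprint value is a placeholder, not a real certificate fingerprint):
{code}
security.ssl.internal.enabled: true
# Pin internal connections to one specific certificate:
security.ssl.internal.cert.fingerprint: 5C:1F:AD:...:99
{code}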

I asked a similar question on the mailing list before. I am really interested 
in how other folks deal with this issue. Thanks.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16517) Add a long running WordCount example

2020-03-17 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061327#comment-17061327
 ] 

Ethan Li edited comment on FLINK-16517 at 3/18/20, 2:26 AM:


Thanks [~aljoscha] I put up a pull request to add the new unbounded source. 

I did not deal with FileProcessingMode in this pr because it seems to require 
some changes in 
[readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085]
 and that might impact other components/code. I am willing to file a separate 
issue/PR if you think it makes sense to do so. 


was (Author: ethanli):
Thanks [~aljoscha] I put up a pull request to add the new unbounded source. 

I did not deal with FileProcessingMode in this pr because it seems to require 
some changes in [readTextFile|#L1085] and that might impact other 
components/code. I am willing to file a separate issue/PR if you think it makes 
sense to do so. 

> Add a long running WordCount example
> 
>
> Key: FLINK-16517
> URL: https://issues.apache.org/jira/browse/FLINK-16517
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As far as I know, flink doesn't have a long running WordCount example for 
> users to start with or to do some simple tests with.
> The closest one is SocketWindowWordCount. But it requires setting up a server 
> (nc -l ), which is not hard, but still tedious for simple use cases. And it 
> requires human input for the job to actually run.
> I propose to add or modify the current WordCount example to have a 
> SourceFunction that randomly generates input data based on a set of 
> sentences, so the WordCount job can run forever. The generation rate will be 
> configurable.
> This will be the easiest way to start a long running flink job and can be 
> useful for new users to start using flink quickly, or for developers to test 
> flink easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16517) Add a long running WordCount example

2020-03-17 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061327#comment-17061327
 ] 

Ethan Li edited comment on FLINK-16517 at 3/18/20, 2:26 AM:


Thanks [~aljoscha] I put up a pull request to add the new unbounded source. 

I did not deal with FileProcessingMode in this pr because it seems to require 
some changes in 
[readTextFile|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085]
 and that might impact other components/code. I am willing to file a separate 
issue/PR if you think it makes sense to do so. 


was (Author: ethanli):
Thanks [~aljoscha] I put up a pull request to add the new unbounded source. 

I did not deal with FileProcessingMode in this pr because it seems to require 
some changes in [readTextFile|#L1085] and that might impact other 
components/code. I am willing to file a separate issue/PR if you think it makes 
sense to do so. 

> Add a long running WordCount example
> 
>
> Key: FLINK-16517
> URL: https://issues.apache.org/jira/browse/FLINK-16517
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As far as I know, flink doesn't have a long running WordCount example for 
> users to start with or to do some simple tests with.
> The closest one is SocketWindowWordCount. But it requires setting up a server 
> (nc -l ), which is not hard, but still tedious for simple use cases. And it 
> requires human input for the job to actually run.
> I propose to add or modify the current WordCount example to have a 
> SourceFunction that randomly generates input data based on a set of 
> sentences, so the WordCount job can run forever. The generation rate will be 
> configurable.
> This will be the easiest way to start a long running flink job and can be 
> useful for new users to start using flink quickly, or for developers to test 
> flink easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16517) Add a long running WordCount example

2020-03-17 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061327#comment-17061327
 ] 

Ethan Li edited comment on FLINK-16517 at 3/18/20, 2:25 AM:


Thanks [~aljoscha] I put up a pull request to add the new unbounded source. 

I did not deal with FileProcessingMode in this pr because it seems to require 
some changes in [readTextFile|#L1085] and that might impact other 
components/code. I am willing to file a separate issue/PR if you think it makes 
sense to do so. 


was (Author: ethanli):
Thanks [~aljoscha] I put up a pull request to add the new unbounded source. 

I did not deal with FileProcessingMode in this pr because it seems to require 
some changes in 
"[readTextFile|[https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085]]
 and that might impact other components/code. I am willing to file a separate 
issue/PR if you think it makes sense to do so.


> Add a long running WordCount example
> 
>
> Key: FLINK-16517
> URL: https://issues.apache.org/jira/browse/FLINK-16517
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As far as I know, flink doesn't have a long running WordCount example for 
> users to start with or to do some simple tests with.
> The closest one is SocketWindowWordCount. But it requires setting up a server 
> (nc -l ), which is not hard, but still tedious for simple use cases. And it 
> requires human input for the job to actually run.
> I propose to add or modify the current WordCount example to have a 
> SourceFunction that randomly generates input data based on a set of 
> sentences, so the WordCount job can run forever. The generation rate will be 
> configurable.
> This will be the easiest way to start a long running flink job and can be 
> useful for new users to start using flink quickly, or for developers to test 
> flink easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16517) Add a long running WordCount example

2020-03-17 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061327#comment-17061327
 ] 

Ethan Li commented on FLINK-16517:
--

Thanks [~aljoscha] I put up a pull request to add the new unbounded source. 

I did not deal with FileProcessingMode in this pr because it seems to require 
some changes in 
"[readTextFile|[https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java#L1085]]
 and that might impact other components/code. I am willing to file a separate 
issue/PR if you think it makes sense to do so.
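
For context, a minimal sketch of the continuous file reading that the existing 
readFile variant already offers (and that readTextFile does not expose). The 
class name and paths are placeholders, assuming the 1.10-era DataStream API:
{code:java}
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousReadSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String path = "/tmp/input"; // placeholder directory

        // Re-scan the path every second and emit new contents forever,
        // unlike readTextFile, which processes the input once and finishes.
        DataStream<String> lines = env.readFile(
                new TextInputFormat(new Path(path)), path,
                FileProcessingMode.PROCESS_CONTINUOUSLY, 1000L);

        lines.print();
        env.execute("continuous-read-sketch");
    }
}
{code}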


> Add a long running WordCount example
> 
>
> Key: FLINK-16517
> URL: https://issues.apache.org/jira/browse/FLINK-16517
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As far as I know, flink doesn't have a long running WordCount example for 
> users to start with or to do some simple tests with.
> The closest one is SocketWindowWordCount. But it requires setting up a server 
> (nc -l ), which is not hard, but still tedious for simple use cases. And it 
> requires human input for the job to actually run.
> I propose to add or modify the current WordCount example to have a 
> SourceFunction that randomly generates input data based on a set of 
> sentences, so the WordCount job can run forever. The generation rate will be 
> configurable.
> This will be the easiest way to start a long running flink job and can be 
> useful for new users to start using flink quickly, or for developers to test 
> flink easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16517) Add a long running WordCount example

2020-03-10 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056571#comment-17056571
 ] 

Ethan Li edited comment on FLINK-16517 at 3/11/20, 1:31 AM:


[~aljoscha] Thanks for the link. I feel like it's still not simple enough for 
starters. I am looking for a very simple example so starters can focus on 
getting their first flink job running. The WordCount example is like a 
hello-world program in streaming. 

We can modify the current WordCount example to take a new option so that a new 
Source can generate data randomly. It will not change the current code too much.


was (Author: ethanli):
[~aljoscha] Thanks for the link. I feel like it's still not simple enough for 
starters. I am looking for a very simple example so starters can focus on 
getting their first flink job running. The WordCount example is like a 
hello-world program in streaming. 

We can modify the current WordCount example to take a new option so that a new 
Source can generate data randomly. It will change the current code too much.

> Add a long running WordCount example
> 
>
> Key: FLINK-16517
> URL: https://issues.apache.org/jira/browse/FLINK-16517
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Ethan Li
>Priority: Minor
>
> As far as I know, flink doesn't have a long running WordCount example for 
> users to start with or to do some simple tests with.
> The closest one is SocketWindowWordCount. But it requires setting up a server 
> (nc -l ), which is not hard, but still tedious for simple use cases. And it 
> requires human input for the job to actually run.
> I propose to add or modify the current WordCount example to have a 
> SourceFunction that randomly generates input data based on a set of 
> sentences, so the WordCount job can run forever. The generation rate will be 
> configurable.
> This will be the easiest way to start a long running flink job and can be 
> useful for new users to start using flink quickly, or for developers to test 
> flink easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-16517) Add a long running WordCount example

2020-03-10 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056571#comment-17056571
 ] 

Ethan Li edited comment on FLINK-16517 at 3/11/20, 1:30 AM:


[~aljoscha] Thanks for the link. I feel like it's still not simple enough for 
starters. I am looking for a very simple example so starters can focus on 
getting their first flink job running. The WordCount example is like a 
hello-world program in streaming. 

We can modify the current WordCount example to take a new option so that a new 
Source can generate data randomly. It will change the current code too much.


was (Author: ethanli):
[~aljoscha] Thanks for the link. I feel like it's still not simple enough for 
starters. I am looking for a very simple example so starters can focus on 
getting their first flink job running. The WordCount example is like a 
hello-world program in streaming. 

> Add a long running WordCount example
> 
>
> Key: FLINK-16517
> URL: https://issues.apache.org/jira/browse/FLINK-16517
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Ethan Li
>Priority: Minor
>
> As far as I know, flink doesn't have a long running WordCount example for 
> users to start with or to do some simple tests with.
> The closest one is SocketWindowWordCount. But it requires setting up a server 
> (nc -l ), which is not hard, but still tedious for simple use cases. And it 
> requires human input for the job to actually run.
> I propose to add or modify the current WordCount example to have a 
> SourceFunction that randomly generates input data based on a set of 
> sentences, so the WordCount job can run forever. The generation rate will be 
> configurable.
> This will be the easiest way to start a long running flink job and can be 
> useful for new users to start using flink quickly, or for developers to test 
> flink easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16517) Add a long running WordCount example

2020-03-10 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056571#comment-17056571
 ] 

Ethan Li commented on FLINK-16517:
--

[~aljoscha] Thanks for the link. I feel like it's still not simple enough for 
starters. I am looking for a very simple example so starters can focus on 
getting their first flink job running. The WordCount example is like a 
hello-world program in streaming. 

> Add a long running WordCount example
> 
>
> Key: FLINK-16517
> URL: https://issues.apache.org/jira/browse/FLINK-16517
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Ethan Li
>Priority: Minor
>
> As far as I know, flink doesn't have a long running WordCount example for 
> users to start with or to do some simple tests with.
> The closest one is SocketWindowWordCount. But it requires setting up a server 
> (nc -l ), which is not hard, but still tedious for simple use cases. And it 
> requires human input for the job to actually run.
> I propose to add or modify the current WordCount example to have a 
> SourceFunction that randomly generates input data based on a set of 
> sentences, so the WordCount job can run forever. The generation rate will be 
> configurable.
> This will be the easiest way to start a long running flink job and can be 
> useful for new users to start using flink quickly, or for developers to test 
> flink easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16517) Add a long running WordCount example

2020-03-09 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055515#comment-17055515
 ] 

Ethan Li commented on FLINK-16517:
--

I will put up a pull request if you think this makes sense.

> Add a long running WordCount example
> 
>
> Key: FLINK-16517
> URL: https://issues.apache.org/jira/browse/FLINK-16517
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Ethan Li
>Priority: Minor
>
> As far as I know, flink doesn't have a long running WordCount example for 
> users to start with or to do some simple tests with.
> The closest one is SocketWindowWordCount. But it requires setting up a server 
> (nc -l ), which is not hard, but still tedious for simple use cases. And it 
> requires human input for the job to actually run.
> I propose to add or modify the current WordCount example to have a 
> SourceFunction that randomly generates input data based on a set of 
> sentences, so the WordCount job can run forever. The generation rate will be 
> configurable.
> This will be the easiest way to start a long running flink job and can be 
> useful for new users to start using flink quickly, or for developers to test 
> flink easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-16517) Add a long running WordCount example

2020-03-09 Thread Ethan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Li updated FLINK-16517:
-
Description: 
As far as I know, flink doesn't have a long running WordCount example for users 
to start with or to do some simple tests with.

The closest one is SocketWindowWordCount. But it requires setting up a server 
(nc -l ), which is not hard, but still tedious for simple use cases. And it 
requires human input for the job to actually run.

I propose to add or modify the current WordCount example to have a 
SourceFunction that randomly generates input data based on a set of sentences, 
so the WordCount job can run forever. The generation rate will be configurable.

This will be the easiest way to start a long running flink job and can be 
useful for new users to start using flink quickly, or for developers to test 
flink easily.

  was:
As far as I know, flink doesn't have a long running WordCount example for users 
to start with or to do some simple tests with.

The closest one is SocketWindowWordCount. But it requires setting up a server 
(nc -l ), which is not hard, but still tedious for simple use cases.

I propose to add or modify the current WordCount example to have a 
SourceFunction that randomly generates input data based on a set of sentences, 
so the WordCount job can run forever. The generation rate will be configurable.

This will be the easiest way to start a long running flink job and can be 
useful for new users to start using flink quickly, or for developers to test 
flink easily.


> Add a long running WordCount example
> 
>
> Key: FLINK-16517
> URL: https://issues.apache.org/jira/browse/FLINK-16517
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Ethan Li
>Priority: Minor
>
> As far as I know, flink doesn't have a long running WordCount example for 
> users to start with or to do some simple tests with.
> The closest one is SocketWindowWordCount. But it requires setting up a server 
> (nc -l ), which is not hard, but still tedious for simple use cases. And it 
> requires human input for the job to actually run.
> I propose to add or modify the current WordCount example to have a 
> SourceFunction that randomly generates input data based on a set of 
> sentences, so the WordCount job can run forever. The generation rate will be 
> configurable.
> This will be the easiest way to start a long running flink job and can be 
> useful for new users to start using flink quickly, or for developers to test 
> flink easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16517) Add a long running WordCount example

2020-03-09 Thread Ethan Li (Jira)
Ethan Li created FLINK-16517:


 Summary: Add a long running WordCount example
 Key: FLINK-16517
 URL: https://issues.apache.org/jira/browse/FLINK-16517
 Project: Flink
  Issue Type: Improvement
  Components: Examples
Reporter: Ethan Li


As far as I know, flink doesn't have a long running WordCount example for users 
to start with or to do some simple tests with.

The closest one is SocketWindowWordCount. But it requires setting up a server 
(nc -l ), which is not hard, but still tedious for simple use cases.

I propose to add or modify the current WordCount example to have a 
SourceFunction that randomly generates input data based on a set of sentences, 
so the WordCount job can run forever. The generation rate will be configurable.

This will be the easiest way to start a long running flink job and can be 
useful for new users to start using flink quickly, or for developers to test 
flink easily.
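
For illustration, a minimal sketch of what such a source could look like (the 
class name and the rate parameter are hypothetical, not the actual PR):
{code:java}
import java.util.Random;

import org.apache.flink.streaming.api.functions.source.SourceFunction;

// An unbounded source that keeps emitting random sentences at a configurable rate.
public class RandomSentenceSource implements SourceFunction<String> {

    private static final String[] SENTENCES = {
        "to be or not to be that is the question",
        "hello world hello flink"
    };

    private final long pauseMillis; // controls the generation rate
    private volatile boolean running = true;

    public RandomSentenceSource(long pauseMillis) {
        this.pauseMillis = pauseMillis;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Random random = new Random();
        while (running) {
            ctx.collect(SENTENCES[random.nextInt(SENTENCES.length)]);
            Thread.sleep(pauseMillis);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
{code}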



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-12122) Spread out tasks evenly across all available registered TaskManagers

2020-01-30 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026777#comment-17026777
 ] 

Ethan Li commented on FLINK-12122:
--

bq. there are no plan to backport this feature to 1.8.
 

Thanks [~trohrmann]
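
For anyone looking for the switch this ticket added in 1.9.2/1.10: if I read 
the linked change correctly, the spread-out behaviour is opt-in via a config 
flag, e.g.
{code}
cluster.evenly-spread-out-slots: true
{code}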

> Spread out tasks evenly across all available registered TaskManagers
> 
>
> Key: FLINK-12122
> URL: https://issues.apache.org/jira/browse/FLINK-12122
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.2, 1.10.0
>
> Attachments: image-2019-05-21-12-28-29-538.png, 
> image-2019-05-21-13-02-50-251.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With Flip-6, we changed the default behaviour of how slots are assigned to 
> {{TaskManagers}}. Instead of evenly spreading them out over all registered 
> {{TaskManagers}}, we randomly pick slots from {{TaskManagers}} with a 
> tendency to first fill up a TM before using another one. This is a regression 
> wrt the pre-Flip-6 code.
> I suggest changing the behaviour so that we try to evenly distribute slots 
> across all available {{TaskManagers}} by considering how many of their slots 
> are already allocated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-15386) SingleJobSubmittedJobGraphStore.putJobGraph has a logic error

2019-12-24 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002910#comment-17002910
 ] 

Ethan Li commented on FLINK-15386:
--

I'd like to work on this if anyone can assign it to me. Thanks

> SingleJobSubmittedJobGraphStore.putJobGraph has a logic error
> -
>
> Key: FLINK-15386
> URL: https://issues.apache.org/jira/browse/FLINK-15386
> Project: Flink
>  Issue Type: Bug
>Reporter: Ethan Li
>Priority: Minor
>
> https://github.com/apache/flink/blob/release-1.9/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/SingleJobSubmittedJobGraphStore.java#L61-L66
> {code:java}
> @Override
> public void putJobGraph(SubmittedJobGraph jobGraph) throws Exception {
>     if (!jobGraph.getJobId().equals(jobGraph.getJobId())) { // this always returns false
>         throw new FlinkException("Cannot put additional jobs into this submitted job graph store.");
>     }
> }
> {code}
> The code has been there since 1.5 but is fixed in the master branch (1.10). 
> It would also be good to add a unit test for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15386) SingleJobSubmittedJobGraphStore.putJobGraph has a logic error

2019-12-24 Thread Ethan Li (Jira)
Ethan Li created FLINK-15386:


 Summary: SingleJobSubmittedJobGraphStore.putJobGraph has a logic 
error
 Key: FLINK-15386
 URL: https://issues.apache.org/jira/browse/FLINK-15386
 Project: Flink
  Issue Type: Bug
Reporter: Ethan Li


https://github.com/apache/flink/blob/release-1.9/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/SingleJobSubmittedJobGraphStore.java#L61-L66

{code:java}
@Override
public void putJobGraph(SubmittedJobGraph jobGraph) throws Exception {
    if (!jobGraph.getJobId().equals(jobGraph.getJobId())) { // this always returns false
        throw new FlinkException("Cannot put additional jobs into this submitted job graph store.");
    }
}
{code}



The code has been there since 1.5 but is fixed in the master branch (1.10). It 
would also be good to add a unit test for this.
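
For reference, a sketch of what the check presumably intends (assuming the 
store keeps the single allowed job graph in a field, which the master-branch 
fix appears to do; the actual code may differ):
{code:java}
@Override
public void putJobGraph(SubmittedJobGraph jobGraph) throws Exception {
    // Compare the incoming job id against the job graph this store was
    // created with, instead of comparing the incoming id with itself.
    if (!this.jobGraph.getJobId().equals(jobGraph.getJobId())) {
        throw new FlinkException("Cannot put additional jobs into this submitted job graph store.");
    }
}
{code}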



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-12122) Spread out tasks evenly across all available registered TaskManagers

2019-12-23 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002369#comment-17002369
 ] 

Ethan Li commented on FLINK-12122:
--

Is there any plan to backport this to 1.8?

> Spread out tasks evenly across all available registered TaskManagers
> 
>
> Key: FLINK-12122
> URL: https://issues.apache.org/jira/browse/FLINK-12122
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.2, 1.10.0
>
> Attachments: image-2019-05-21-12-28-29-538.png, 
> image-2019-05-21-13-02-50-251.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With Flip-6, we changed the default behaviour of how slots are assigned to 
> {{TaskManagers}}. Instead of evenly spreading them out over all registered 
> {{TaskManagers}}, we randomly pick slots from {{TaskManagers}} with a 
> tendency to first fill up a TM before using another one. This is a regression 
> wrt the pre-Flip-6 code.
> I suggest changing the behaviour so that we try to evenly distribute slots 
> across all available {{TaskManagers}} by considering how many of their slots 
> are already allocated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13807) Flink-avro unit tests fail if the character encoding in the environment does not default to UTF-8

2019-08-21 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912431#comment-16912431
 ] 

Ethan Li commented on FLINK-13807:
--

Thanks for fixing it. Sorry, I am not able to test it any time soon. If you 
change your environment to ANSI_X3.4-1968 and the unit tests pass, it should 
be good enough. 

> Flink-avro unit tests fail if the character encoding in the environment does 
> not default to UTF-8
> 
>
> Key: FLINK-13807
> URL: https://issues.apache.org/jira/browse/FLINK-13807
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Ethan Li
>Priority: Minor
> Attachments: patch.diff
>
>
> On Flink release-1.8 branch:
> {code:java}
> [ERROR] Tests run: 12, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 4.81 
> s <<< FAILURE! - in 
> org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest
> [ERROR] testSimpleAvroRead[Execution mode = 
> CLUSTER](org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest)  
> Time elapsed: 0.438 s  <<< FAILURE!
> java.lang.AssertionError: 
> Different elements in arrays: expected 2 elements and received 2
> files: [/tmp/junit5386344396421857812/junit6023978980792200274.tmp/4, 
> /tmp/junit5386344396421857812/junit6023978980792200274.tmp/2, 
> /tmp/junit5386344396421857812/junit6023978980792200274.tmp/1, 
> /tmp/junit5386344396421857812/junit6023978980792200274.tmp/3]
>  expected: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": 
> null, "type_long_test": null, "type_double_test": 123.45, "type_null_test": 
> null, "type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 
> 2"], "type_array_boolean": [true, false], "type_nullable_array": null, 
> "type_enum": "GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, 
> "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": 
> "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, 
> "type_bytes": {"bytes": 
> "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
> 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
> "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
> 123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, 
> -48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", 
> "type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, 
> "type_bool_test": false, "type_array_string": [], "type_array_boolean": [], 
> "type_nullable_array": null, "type_enum": "RED", "type_map": {}, 
> "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": 
> "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, 
> "type_bytes": {"bytes": 
> "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
> 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
> "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
> 123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, 
> -48]}]
>  received: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": 
> null, "type_long_test": null, "type_double_test": 123.45, "type_null_test": 
> null, "type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 
> 2"], "type_array_boolean": [true, false], "type_nullable_array": null, 
> "type_enum": "GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, 
> "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": 
> "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, 
> "type_bytes": {"bytes": 
> "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
> 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
> "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
> 123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": 
> [7, -48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": 
> "blue", "type_long_test": 1337, "type_double_test": 1.337, "type_null_test": 
> null, "type_bool_test": false, "type_array_string": [], "type_array_boolean": 
> [], "type_nullable_array": null, "type_enum": "RED", "type_map": {}, 
> "type_fixed": null, "type_union": null, "type_nested": {"num": 239, "street": 
> "Baker Street", "city": "London", "state": "London", "zip": "NW1 6XE"}, 
> "type_bytes": {"bytes": 
> "\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
> 2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
> "type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
> 123456, "type_decimal_bytes": {"bytes": 

[jira] [Updated] (FLINK-13807) Flink-avro unit tests fail if the character encoding in the environment does not default to UTF-8

2019-08-20 Thread Ethan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Li updated FLINK-13807:
-
Description: 
On Flink release-1.8 branch:
{code:java}
[ERROR] Tests run: 12, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 4.81 s 
<<< FAILURE! - in org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest
[ERROR] testSimpleAvroRead[Execution mode = 
CLUSTER](org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest)  Time 
elapsed: 0.438 s  <<< FAILURE!
java.lang.AssertionError: 
Different elements in arrays: expected 2 elements and received 2
files: [/tmp/junit5386344396421857812/junit6023978980792200274.tmp/4, 
/tmp/junit5386344396421857812/junit6023978980792200274.tmp/2, 
/tmp/junit5386344396421857812/junit6023978980792200274.tmp/1, 
/tmp/junit5386344396421857812/junit6023978980792200274.tmp/3]
 expected: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": null, 
"type_long_test": null, "type_double_test": 123.45, "type_null_test": null, 
"type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 2"], 
"type_array_boolean": [true, false], "type_nullable_array": null, "type_enum": 
"GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, "type_fixed": null, 
"type_union": null, "type_nested": {"num": 239, "street": "Baker Street", 
"city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": 
"\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
"type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, 
-48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", 
"type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, 
"type_bool_test": false, "type_array_string": [], "type_array_boolean": [], 
"type_nullable_array": null, "type_enum": "RED", "type_map": {}, "type_fixed": 
null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", 
"city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": 
"\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
"type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, 
-48]}]
 received: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": null, 
"type_long_test": null, "type_double_test": 123.45, "type_null_test": null, 
"type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 2"], 
"type_array_boolean": [true, false], "type_nullable_array": null, "type_enum": 
"GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, "type_fixed": null, 
"type_union": null, "type_nested": {"num": 239, "street": "Baker Street", 
"city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": 
"\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
"type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": [7, 
-48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", 
"type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, 
"type_bool_test": false, "type_array_string": [], "type_array_boolean": [], 
"type_nullable_array": null, "type_enum": "RED", "type_map": {}, "type_fixed": 
null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", 
"city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": 
"\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
"type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": [7, 
-48]}]
at 
org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest.after(AvroTypeExtractionTest.java:76)
{code}

Comparing "expected" with "received", there is a question-mark difference.

For example, in "expected", it's
{code:java}
"type_decimal_bytes": {"bytes": "\u0007?"}
{code}

while in "received", it's
{code:java}
"type_decimal_bytes": {"bytes": "\u0007??"}
{code}

The environment I ran the unit tests on uses ANSI_X3.4-1968. 

I changed it to "en_US.UTF-8" and the unit tests passed. 

  was:

{code:java}
[ERROR] Tests run: 12, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 4.81 s 
<<< FAILURE! - in org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest
[ERROR] testSimpleAvroRead[Execution mode = 

[jira] [Created] (FLINK-13807) Flink-avro unit tests fail if the character encoding in the environment does not default to UTF-8

2019-08-20 Thread Ethan Li (Jira)
Ethan Li created FLINK-13807:


 Summary: Flink-avro unit tests fail if the character encoding in 
the environment does not default to UTF-8
 Key: FLINK-13807
 URL: https://issues.apache.org/jira/browse/FLINK-13807
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Ethan Li



{code:java}
[ERROR] Tests run: 12, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 4.81 s 
<<< FAILURE! - in org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest
[ERROR] testSimpleAvroRead[Execution mode = 
CLUSTER](org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest)  Time 
elapsed: 0.438 s  <<< FAILURE!
java.lang.AssertionError: 
Different elements in arrays: expected 2 elements and received 2
files: [/tmp/junit5386344396421857812/junit6023978980792200274.tmp/4, 
/tmp/junit5386344396421857812/junit6023978980792200274.tmp/2, 
/tmp/junit5386344396421857812/junit6023978980792200274.tmp/1, 
/tmp/junit5386344396421857812/junit6023978980792200274.tmp/3]
 expected: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": null, 
"type_long_test": null, "type_double_test": 123.45, "type_null_test": null, 
"type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 2"], 
"type_array_boolean": [true, false], "type_nullable_array": null, "type_enum": 
"GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, "type_fixed": null, 
"type_union": null, "type_nested": {"num": 239, "street": "Baker Street", 
"city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": 
"\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
"type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, 
-48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", 
"type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, 
"type_bool_test": false, "type_array_string": [], "type_array_boolean": [], 
"type_nullable_array": null, "type_enum": "RED", "type_map": {}, "type_fixed": 
null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", 
"city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": 
"\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
"type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
123456, "type_decimal_bytes": {"bytes": "\u0007?"}, "type_decimal_fixed": [7, 
-48]}]
 received: [{"name": "Alyssa", "favorite_number": 256, "favorite_color": null, 
"type_long_test": null, "type_double_test": 123.45, "type_null_test": null, 
"type_bool_test": true, "type_array_string": ["ELEMENT 1", "ELEMENT 2"], 
"type_array_boolean": [true, false], "type_nullable_array": null, "type_enum": 
"GREEN", "type_map": {"KEY 2": 17554, "KEY 1": 8546456}, "type_fixed": null, 
"type_union": null, "type_nested": {"num": 239, "street": "Baker Street", 
"city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": 
"\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
"type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": [7, 
-48]}, {"name": "Charlie", "favorite_number": null, "favorite_color": "blue", 
"type_long_test": 1337, "type_double_test": 1.337, "type_null_test": null, 
"type_bool_test": false, "type_array_string": [], "type_array_boolean": [], 
"type_nullable_array": null, "type_enum": "RED", "type_map": {}, "type_fixed": 
null, "type_union": null, "type_nested": {"num": 239, "street": "Baker Street", 
"city": "London", "state": "London", "zip": "NW1 6XE"}, "type_bytes": {"bytes": 
"\u\u\u\u\u\u\u\u\u\u"}, "type_date": 
2014-03-01, "type_time_millis": 12:12:12.000, "type_time_micros": 123456, 
"type_timestamp_millis": 2014-03-01T12:12:12.321Z, "type_timestamp_micros": 
123456, "type_decimal_bytes": {"bytes": "\u0007??"}, "type_decimal_fixed": [7, 
-48]}]
at 
org.apache.flink.formats.avro.typeutils.AvroTypeExtractionTest.after(AvroTypeExtractionTest.java:76)
{code}

Comparing "expected" with "received", there is a question-mark difference.

For example, in "expected", it's
{code:java}
"type_decimal_bytes": {"bytes": "\u0007?"}
{code}

while in "received", it's
{code:java}
"type_decimal_bytes": {"bytes": "\u0007??"}
{code}

The environment I ran the unit tests on uses ANSI_X3.4-1968. 

I changed it to "en_US.UTF-8" and the unit tests passed. 
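
A quick way to confirm what the JVM actually picked up (a standalone sketch, 
not part of the test suite):
{code:java}
import java.nio.charset.Charset;

public class CharsetCheck {
    public static void main(String[] args) {
        // Under ANSI_X3.4-1968 (the POSIX locale) this typically prints US-ASCII;
        // under en_US.UTF-8 it prints UTF-8.
        System.out.println(Charset.defaultCharset());
        System.out.println(System.getProperty("file.encoding"));
    }
}
{code}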



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-9684) HistoryServerArchiveFetcher not working properly with secure hdfs cluster

2018-06-28 Thread Ethan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526449#comment-16526449
 ] 

Ethan Li commented on FLINK-9684:
-

PR: https://github.com/apache/flink/pull/6225

> HistoryServerArchiveFetcher not working properly with secure hdfs cluster
> -
>
> Key: FLINK-9684
> URL: https://issues.apache.org/jira/browse/FLINK-9684
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.4.2
>Reporter: Ethan Li
>Priority: Major
>
> With my current setup, the jobmanager and taskmanager are able to talk to the 
> hdfs cluster (with kerberos set up). However, running the history server gets:
>  
>  
> {code:java}
> 2018-06-27 19:03:32,080 WARN org.apache.hadoop.ipc.Client - Exception 
> encountered while connecting to the server : 
> java.lang.IllegalArgumentException: Failed to specify server's Kerberos 
> principal name
> 2018-06-27 19:03:32,085 ERROR 
> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher - 
> Failed to access job archive location for path 
> hdfs://openqe11blue-n2.blue.ygrid.yahoo.com/tmp/flink/openstorm10-blue/jmarchive.
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.lang.IllegalArgumentException: Failed to specify server's Kerberos 
> principal name; Host Details : local host is: 
> "openstorm10blue-n2.blue.ygrid.yahoo.com/10.215.79.35"; destination host is: 
> "openqe11blue-n2.blue.ygri
> d.yahoo.com":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1414)
> at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy9.getListing(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at com.sun.proxy.$Proxy9.getListing(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:515)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1743)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1726)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:650)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
> at 
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.listStatus(HadoopFileSystem.java:146)
> at 
> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:139)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.lang.IllegalArgumentException: Failed to 
> specify server's Kerberos principal name
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
> at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724)
> at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
> at org.apache.hadoop.ipc.Client.call(Client.java:1381)
> ... 

[jira] [Created] (FLINK-9685) Flink should support hostname-substitution for security.kerberos.login.principal

2018-06-27 Thread Ethan Li (JIRA)
Ethan Li created FLINK-9685:
---

 Summary: Flink should support hostname-substitution for 
security.kerberos.login.principal
 Key: FLINK-9685
 URL: https://issues.apache.org/jira/browse/FLINK-9685
 Project: Flink
  Issue Type: Improvement
Reporter: Ethan Li


[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/SecurityConfiguration.java#L83]

 

We can have something like this:
{code:java}
String rawPrincipal = 
flinkConf.getString(SecurityOptions.KERBEROS_LOGIN_PRINCIPAL);
if (rawPrincipal != null) {
    try {
        rawPrincipal = rawPrincipal.replace("HOSTNAME",
                InetAddress.getLocalHost().getCanonicalHostName());
    } catch (UnknownHostException e) {
        // Pass the exception as the last argument so SLF4J logs the stack
        // trace; a "{}" placeholder is not filled by a Throwable.
        LOG.error("Failed to replace HOSTNAME with the local hostname", e);
    }
}
this.principal = rawPrincipal;
{code}

This would make it easier to deploy flink to a cluster. Instead of setting a 
different principal on every node, we can use the same principal 
headless_user/HOSTNAME@DOMAIN everywhere.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9684) HistoryServerArchiveFetcher not working properly with secure hdfs cluster

2018-06-27 Thread Ethan Li (JIRA)
Ethan Li created FLINK-9684:
---

 Summary: HistoryServerArchiveFetcher not working properly with 
secure hdfs cluster
 Key: FLINK-9684
 URL: https://issues.apache.org/jira/browse/FLINK-9684
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.4.2
Reporter: Ethan Li


With my current setup, the jobmanager and taskmanager are able to talk to the 
hdfs cluster (with kerberos set up). However, running the history server gets:

 

 
{code:java}
2018-06-27 19:03:32,080 WARN org.apache.hadoop.ipc.Client - Exception 
encountered while connecting to the server : 
java.lang.IllegalArgumentException: Failed to specify server's Kerberos 
principal name
2018-06-27 19:03:32,085 ERROR 
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher - 
Failed to access job archive location for path 
hdfs://openqe11blue-n2.blue.ygrid.yahoo.com/tmp/flink/openstorm10-blue/jmarchive.
java.io.IOException: Failed on local exception: java.io.IOException: 
java.lang.IllegalArgumentException: Failed to specify server's Kerberos 
principal name; Host Details : local host is: 
"openstorm10blue-n2.blue.ygrid.yahoo.com/10.215.79.35"; destination host is: 
"openqe11blue-n2.blue.ygri
d.yahoo.com":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1414)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.getListing(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy9.getListing(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:515)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1743)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1726)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:650)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
at 
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.listStatus(HadoopFileSystem.java:146)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.IllegalArgumentException: Failed to 
specify server's Kerberos principal name
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at 
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 28 more
{code}
 

 

I changed the LOG level to DEBUG and am seeing:

 
{code:java}
2018-06-27 19:03:30,931 INFO 
org.apache.flink.runtime.webmonitor.history.HistoryServer - Enabling SSL for 
the history server.
2018-06-27 19:03:30,931 DEBUG org.apache.flink.runtime.net.SSLUtils - Creating 
server SSL context from configuration
2018-06-27 19:03:31,091 DEBUG org.apache.flink.core.fs.FileSystem - Loading 
extension file systems via services
2018-06-27 19:03:31,094 DEBUG 

[jira] [Created] (FLINK-9683) inconsistent behaviors when setting historyserver.archive.fs.dir

2018-06-27 Thread Ethan Li (JIRA)
Ethan Li created FLINK-9683:
---

 Summary: inconsistent behaviors when setting 
historyserver.archive.fs.dir
 Key: FLINK-9683
 URL: https://issues.apache.org/jira/browse/FLINK-9683
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.4.2
Reporter: Ethan Li


I am using release-1.4.2, with fs.default-scheme and fs.hdfs.hadoopconf set 
correctly.

When setting
{code:java}
historyserver.archive.fs.dir: /tmp/flink/cluster-name/jmarchive
{code}
I am seeing
{code:java}
2018-06-27 18:51:12,692 WARN 
org.apache.flink.runtime.webmonitor.history.HistoryServer - Failed to create 
Path or FileSystem for directory '/tmp/flink/cluster-name/jmarchive'. Directory 
will not be monitored.
java.lang.IllegalArgumentException: The scheme (hdfs://, file://, etc) is null. 
Please specify the file system scheme explicitly in the URI.
at 
org.apache.flink.runtime.webmonitor.WebMonitorUtils.validateAndNormalizeUri(WebMonitorUtils.java:300)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServer.(HistoryServer.java:168)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServer.(HistoryServer.java:132)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServer$1.call(HistoryServer.java:113)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServer$1.call(HistoryServer.java:110)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServer.main(HistoryServer.java:110)
{code}
 

And then if I set 
{code:java}
historyserver.archive.fs.dir: hdfs:///tmp/flink/cluster-name/jmarchive
{code}
I am seeing:

 
{code:java}
java.io.IOException: The given file system URI 
(hdfs:///tmp/flink/cluster-name/jmarchive) did not describe the authority (like 
for example HDFS NameNode address/port or S3 host). The attempt to use a 
configured default authority failed: Hadoop configuration for default file 
system ('fs.default.name' or 'fs.defaultFS') contains no valid authority 
component (like hdfs namenode, S3 host, etc)
at 
org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:149)
at 
org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:293)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServer.(HistoryServer.java:169)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServer.(HistoryServer.java:132)
{code}
The only way it works is to provide the full hdfs path (including the 
authority), like:
{code:java}
historyserver.archive.fs.dir: hdfs:///tmp/flink/cluster-name/jmarchive
{code}
 

The situations above arise because two parts of the code treat the "scheme" 
differently: 

https://github.com/apache/flink/blob/release-1.4.2/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/WebMonitorUtils.java#L299-L302

https://github.com/apache/flink/blob/release-1.4.2/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java#L335-L338

I believe the first case should be supported if users have set 
fs.default-scheme. 
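
A tiny standalone illustration of the two failure modes (plain JDK, no Flink 
dependencies; the printed values are what the two linked code paths react to):
{code:java}
import java.net.URI;

public class SchemeSketch {
    public static void main(String[] args) {
        // Case 1: a plain path parses as a URI with a null scheme, which
        // WebMonitorUtils.validateAndNormalizeUri rejects outright, while
        // FileSystem would first fall back to the configured fs.default-scheme.
        URI plain = URI.create("/tmp/flink/cluster-name/jmarchive");
        System.out.println("scheme = " + plain.getScheme()); // scheme = null

        // Case 2: "hdfs:///" has a scheme but an empty authority, which is why
        // the factory complains about the missing NameNode address/port.
        URI noAuthority = URI.create("hdfs:///tmp/flink/cluster-name/jmarchive");
        System.out.println("authority = " + noAuthority.getAuthority()); // authority = null
    }
}
{code}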



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)