[jira] [Commented] (FLINK-33694) GCS filesystem does not respect gs.storage.root.url config option

2023-12-04 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792901#comment-17792901
 ] 

Patrick Lucas commented on FLINK-33694:
---

I've updated the PR to look explicitly for the specific Hadoop connector 
config key for this option, instead of defining it as a Flink {{ConfigOption}}.

> GCS filesystem does not respect gs.storage.root.url config option
> -



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33694) GCS filesystem does not respect gs.storage.root.url config option

2023-12-01 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792050#comment-17792050
 ] 

Patrick Lucas commented on FLINK-33694:
---

[~martijnvisser] my comment is perhaps not precise, in that "support has been 
added to configure credentials correctly" should be qualified with the comment 
from the docs about which credentials mechanisms are supported. But it is still 
true that other potentially-interesting options are not proxied through, such 
as setting the root URL.

This change as written doesn't affect any credentials handling; it only adds 
support for this one additional option. However, I could see an argument for 
implementing the behavior in {{org.apache.flink.fs.gs.utils.ConfigUtils}}, as 
is done for the credentials behavior, rather than in {{GSFileSystemOptions}} 
as I did initially.

> GCS filesystem does not respect gs.storage.root.url config option
> -





[jira] [Commented] (FLINK-33694) GCS filesystem does not respect gs.storage.root.url config option

2023-11-30 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791670#comment-17791670
 ] 

Patrick Lucas commented on FLINK-33694:
---

[~martijnvisser] aye, have a PR open but Azure needs a re-run because I missed 
a license header. I'm not sure if flinkbot listens to me for commands.

> GCS filesystem does not respect gs.storage.root.url config option
> -





[jira] [Created] (FLINK-33694) GCS filesystem does not respect gs.storage.root.url config option

2023-11-29 Thread Patrick Lucas (Jira)
Patrick Lucas created FLINK-33694:
-

 Summary: GCS filesystem does not respect gs.storage.root.url 
config option
 Key: FLINK-33694
 URL: https://issues.apache.org/jira/browse/FLINK-33694
 Project: Flink
  Issue Type: Bug
  Components: FileSystems
Affects Versions: 1.17.2, 1.18.0
Reporter: Patrick Lucas


The GCS FileSystem's RecoverableWriter implementation uses the GCS SDK directly 
rather than going through Hadoop. While support has been added to configure 
credentials correctly based on the standard Hadoop implementation 
configuration, no other options are passed through to the underlying client.

Because this only affects the RecoverableWriter-related codepaths, it can 
result in very surprising behavior that differs depending on whether the 
FileSystem is used as a source or a sink: a {{gs://}}-URI FileSource may work 
fine while a {{gs://}}-URI FileSink may not work at all.

We use [fake-gcs-server|https://github.com/fsouza/fake-gcs-server] in testing, 
and so we override the Hadoop GCS FileSystem config option 
{{gs.storage.root.url}}. However, because this option is not considered 
when creating the GCS client for the RecoverableWriter codepath, in a FileSink 
the GCS FileSystem attempts to write to the real GCS service rather than 
fake-gcs-server. At the same time, a FileSource works as expected, reading from 
fake-gcs-server.

The fix should be fairly straightforward: read the {{gs.storage.root.url}} 
config option from the Hadoop FileSystem config in 
[{{GSFileSystemOptions}}|https://github.com/apache/flink/blob/release-1.18.0/flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/GSFileSystemOptions.java#L30]
 and, if set, pass it to {{storageOptionsBuilder}} in 
[{{GSFileSystemFactory}}|https://github.com/apache/flink/blob/release-1.18.0/flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/GSFileSystemFactory.java].
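As a rough sketch of that fix, with hypothetical names throughout (a plain Map stands in for Hadoop's {{Configuration}}, and the real change would forward the resolved value to the GCS SDK's storage options builder):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: resolve the storage root URL the way the fix proposes, i.e.
// prefer the Hadoop connector's "gs.storage.root.url" key and fall back to
// the public GCS endpoint when it is unset.
public class GsRootUrlSketch {
    static final String ROOT_URL_KEY = "gs.storage.root.url";
    static final String DEFAULT_HOST = "https://storage.googleapis.com";

    // Stand-in for reading org.apache.hadoop.conf.Configuration
    public static String resolveHost(Map<String, String> hadoopConfig) {
        return hadoopConfig.getOrDefault(ROOT_URL_KEY, DEFAULT_HOST);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ROOT_URL_KEY, "http://localhost:4443"); // e.g. fake-gcs-server
        System.out.println(resolveHost(conf));            // override wins
        System.out.println(resolveHost(new HashMap<>())); // default endpoint
    }
}
```

In the actual change, the value would only be applied to the builder when present, leaving the SDK's default untouched otherwise.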

The only workaround for this is to build a custom flink-gs-fs-hadoop JAR with a 
patch and use it as a plugin.





[jira] [Commented] (FLINK-32722) 239 exit code in flink-runtime

2023-08-01 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749863#comment-17749863
 ] 

Patrick Lucas commented on FLINK-32722:
---

Possibly related: FLINK-18290

> 239 exit code in flink-runtime
> --
>
> Key: FLINK-32722
> URL: https://issues.apache.org/jira/browse/FLINK-32722
> Project: Flink
>  Issue Type: Bug
>  Components: Test Infrastructure
>Affects Versions: 1.16.2
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51852&view=logs&j=d89de3df-4600-5585-dadc-9bbc9a5e661c&t=be5a4b15-4b23-56b1-7582-795f58a645a2&l=8418
> {code:java}
> Aug 01 01:03:49 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) 
> on project flink-runtime: There are test failures.
> Aug 01 01:03:49 [ERROR] 
> Aug 01 01:03:49 [ERROR] Please refer to 
> /__w/2/s/flink-runtime/target/surefire-reports for the individual test 
> results.
> Aug 01 01:03:49 [ERROR] Please refer to dump files (if any exist) 
> [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
> Aug 01 01:03:49 [ERROR] ExecutionException The forked VM terminated without 
> properly saying goodbye. VM crash or System.exit called?
> Aug 01 01:03:49 [ERROR] Command was /bin/sh -c cd /__w/2/s/flink-runtime && 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -Xmx768m 
> -jar 
> /__w/2/s/flink-runtime/target/surefire/surefirebooter1803120121827605294.jar 
> /__w/2/s/flink-runtime/target/surefire 2023-08-01T00-58-17_520-jvmRun1 
> surefire9107652818401825168tmp surefire_261701267003130520249tmp
> Aug 01 01:03:49 [ERROR] Error occurred in starting fork, check output in log
> Aug 01 01:03:49 [ERROR] Process Exit Code: 239
> Aug 01 01:03:49 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> Aug 01 01:03:49 [ERROR] Command was /bin/sh -c cd /__w/2/s/flink-runtime && 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -XX:+UseG1GC -Xms256m -Xmx768m 
> -jar 
> /__w/2/s/flink-runtime/target/surefire/surefirebooter1803120121827605294.jar 
> /__w/2/s/flink-runtime/target/surefire 2023-08-01T00-58-17_520-jvmRun1 
> surefire9107652818401825168tmp surefire_261701267003130520249tmp
> Aug 01 01:03:49 [ERROR] Error occurred in starting fork, check output in log
> Aug 01 01:03:49 [ERROR] Process Exit Code: 239
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:532)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:405)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:321)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:266)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1314)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1159)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:932)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
> Aug 01 01:03:49 [ERROR] at 
> org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
> Aug 01 

[jira] [Comment Edited] (FLINK-32583) RestClient can deadlock if request made after Netty event executor terminated

2023-08-01 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749829#comment-17749829
 ] 

Patrick Lucas edited comment on FLINK-32583 at 8/1/23 2:33 PM:
---

[~mapohl] regarding the release note, I just wanted to clarify something.
{quote}Fixes a race condition in the RestClient that could happen when 
submitting a request while closing the client. There was a small chance that 
the request submission wouldn't complete.
{quote}
That is one effect of this change, but the primary problem it solves is not a 
race condition: if a request is made _any time after_ the client is closed, 
the returned future is never resolved and no exception is thrown. A caller 
that calls {{.join()}} on the returned {{CompletableFuture}} (which takes no 
timeout) will block forever.

I think this is a more serious bug than the very rare edge case that required 
most of the complexity in the change.

Suggested release note:
{quote}Fixes a bug in the RestClient where the response future of a request 
made after the client was closed is never completed.
{quote}
or
{quote}Fixes a bug in the RestClient where making a request after the client 
was closed returns a future that never completes.
{quote}



> RestClient can deadlock if request made after Netty event executor terminated
> -

[jira] [Commented] (FLINK-32583) RestClient can deadlock if request made after Netty event executor terminated

2023-08-01 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749829#comment-17749829
 ] 

Patrick Lucas commented on FLINK-32583:
---

[~mapohl] regarding the release note, I just wanted to clarify something.

{quote}Fixes a race condition in the RestClient that could happen when 
submitting a request while closing the client. There was a small chance that 
the request submission wouldn't complete.{quote}

That is one effect of this change, but the primary problem it solves is not a 
race condition: if a request is made _any time after_ the client is closed, 
the returned future is never resolved and no exception is thrown. A caller 
that calls {{.join()}} on the returned {{CompletableFuture}} (which takes no 
timeout) will block forever.

I think this is a more serious bug than the very rare edge case that required 
most of the complexity in the change.

Suggested release note:

{quote}Fixes a bug in the RestClient where the response future of a request 
made after the client was closed is never completed.{quote}

or

{quote}Fixes a bug in the RestClient where making a request after the client 
was closed returns a future that never completes.{quote}

> RestClient can deadlock if request made after Netty event executor terminated
> -





[jira] [Commented] (FLINK-32583) RestClient can deadlock if request made after Netty event executor terminated

2023-07-12 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742435#comment-17742435
 ] 

Patrick Lucas commented on FLINK-32583:
---

Further investigation suggests that this can only happen when a request is made 
after the RestClient itself is closed, which asynchronously shuts down the 
event loop used by Netty.

Many uses of RestClient will use try-with-resources where this wouldn't be an 
issue, but it should still have the correct behavior when used in other 
contexts where a request may be attempted after shutdown has already started.

> RestClient can deadlock if request made after Netty event executor terminated
> -





[jira] [Created] (FLINK-32583) RestClient can deadlock if request made after Netty event executor terminated

2023-07-12 Thread Patrick Lucas (Jira)
Patrick Lucas created FLINK-32583:
-

 Summary: RestClient can deadlock if request made after Netty event 
executor terminated
 Key: FLINK-32583
 URL: https://issues.apache.org/jira/browse/FLINK-32583
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Affects Versions: 1.18.0, 1.16.3, 1.17.2
Reporter: Patrick Lucas


The RestClient can deadlock if a request is made after the Netty event executor 
has terminated.

This happens because the listener that would resolve the CompletableFuture 
(attached to the ChannelFuture returned by Netty's connect call) can never 
run: the executor that would run it rejects the execution.

[RestClient.java|https://github.com/apache/flink/blob/release-1.17.1/flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestClient.java#L471-L482]:
{code:java}
final ChannelFuture connectFuture = bootstrap.connect(targetAddress, 
targetPort);

final CompletableFuture<Channel> channelFuture = new CompletableFuture<>();

connectFuture.addListener(
(ChannelFuture future) -> {
if (future.isSuccess()) {
channelFuture.complete(future.channel());
} else {
channelFuture.completeExceptionally(future.cause());
}
});
{code}
In this code, the call to {{addListener()}} can fail silently (only logging to 
the console), meaning any code waiting on the CompletableFuture returned by 
this method will deadlock.

There was some work in Netty around this back in 2015, but it's unclear to me 
how this situation is expected to be handled given the discussion and changes 
from these issues:
 * [https://github.com/netty/netty/issues/3449] (open)
 * [https://github.com/netty/netty/pull/3483] (closed)
 * [https://github.com/netty/netty/pull/3566] (closed)
 * [https://github.com/netty/netty/pull/5087] (merged)

I think a reasonable fix for Flink would be to check the state of 
{{connectFuture}} and {{channelFuture}} immediately after the call to 
{{addListener()}}, resolving {{channelFuture}} with {{completeExceptionally()}} 
if {{connectFuture}} is done and failed and {{channelFuture}} has not been 
completed. In the possible race condition where the listener was attached 
successfully and the connection fails instantly, the result is the same, as 
calls to {{CompletableFuture#completeExceptionally()}} are idempotent.
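The check-after-register idea can be illustrated with plain {{CompletableFuture}} and a hypothetical stand-in for a failed Netty {{ChannelFuture}} whose terminated executor drops listeners (this is a sketch, not the actual RestClient code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class CheckAfterListenerSketch {
    // Hypothetical stand-in for an already-failed ChannelFuture whose event
    // executor has terminated, so registered listeners are silently dropped.
    public static class FailedConnectFuture {
        public boolean isDone() { return true; }
        public boolean isSuccess() { return false; }
        public Throwable cause() { return new IllegalStateException("connect failed"); }
        public void addListener(Consumer<FailedConnectFuture> listener) {
            // Executor terminated: the listener is never invoked (only logged).
        }
    }

    public static CompletableFuture<Object> connect(FailedConnectFuture connectFuture) {
        CompletableFuture<Object> channelFuture = new CompletableFuture<>();
        connectFuture.addListener(
                future -> {
                    if (future.isSuccess()) {
                        channelFuture.complete(new Object());
                    } else {
                        channelFuture.completeExceptionally(future.cause());
                    }
                });
        // The proposed fix: re-check state after addListener(). Because
        // completeExceptionally() is idempotent, a double completion in the
        // race where the listener did run is harmless.
        if (connectFuture.isDone() && !connectFuture.isSuccess()) {
            channelFuture.completeExceptionally(connectFuture.cause());
        }
        return channelFuture;
    }

    public static void main(String[] args) {
        CompletableFuture<Object> f = connect(new FailedConnectFuture());
        // Completes exceptionally instead of hanging forever.
        System.out.println(f.isCompletedExceptionally());
    }
}
```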

A workaround for users of RestClient is to call {{CompletableFuture#get(long 
timeout, TimeUnit unit)}} rather than {{#get()}} or {{#join()}} on the 
CompletableFutures it returns. However, if the call throws TimeoutException, 
the cause of the failure cannot easily be determined.
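That workaround amounts to bounding every wait on the returned future, for example:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutWorkaround {
    public static void main(String[] args) throws Exception {
        // Simulates a response future that is never completed, as happens
        // when a request is made after the client was closed.
        CompletableFuture<String> response = new CompletableFuture<>();
        try {
            response.get(100, TimeUnit.MILLISECONDS); // bounded wait
        } catch (TimeoutException e) {
            // Surfaces the hang, though not its underlying cause.
            System.out.println("request timed out");
        }
    }
}
```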





[jira] [Commented] (FLINK-13703) AvroTypeInfo requires objects to be strict POJOs (mutable, with setters)

2021-06-15 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363519#comment-17363519
 ] 

Patrick Lucas commented on FLINK-13703:
---

Hit a very similar issue (though I didn't think it was the same at first) when 
using the option {{gettersReturnOptional}}.

With this option, all getters return Optional (mainly useful along with 
{{optionalGettersForNullableFieldsOnly}} to limit it to nullable fields only), 
which breaks Flink's POJO analyzer in a similar way and results in the same 
exception.

I never hit this before because I've tended to not have nullable fields.

> AvroTypeInfo requires objects to be strict POJOs (mutable, with setters)
> 
>
> Key: FLINK-13703
> URL: https://issues.apache.org/jira/browse/FLINK-13703
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Alexander Fedulov
>Priority: Minor
>
> There exists an option to generate Avro sources which would represent 
> immutable objects (`createSetters` option set to false) 
> [\[1\]|https://github.com/commercehub-oss/gradle-avro-plugin] , 
> [\[2\]|https://avro.apache.org/docs/current/api/java/org/apache/avro/mojo/AbstractAvroMojo.html].
>  Those objects still have full-argument constructors and are handled correctly 
> by Avro. 
>  `AvroTypeInfo` in Flink performs a check to verify that a class complies with 
> the strict POJO requirements (including setters) and throws an 
> IllegalStateException("Expecting type to be a PojoTypeInfo") otherwise. Can 
> this check be relaxed to provide better immutability support?
> +Steps to reproduce:+
> 1) Generate Avro sources from schema using `createSetters` option.
> 2) Use generated class in 
> `ConfluentRegistryAvroDeserializationSchema.forSpecific(GeneratedClass.class, 
> schemaRegistryUrl)`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13703) AvroTypeInfo requires objects to be strict POJOs (mutable, with setters)

2021-02-25 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290867#comment-17290867
 ] 

Patrick Lucas commented on FLINK-13703:
---

I ran into this as well and was thrown off by the confusing error message—glad 
I found this issue to help explain it, thanks [~afedulov].

I have more experience with protobuf which tends to be immutable-by-default, so 
I set Avro's {{createSetters}} flag when starting out. I'll undo that for now, 
but supporting immutable specific records would still be nice.

> AvroTypeInfo requires objects to be strict POJOs (mutable, with setters)
> 
>
> Key: FLINK-13703
> URL: https://issues.apache.org/jira/browse/FLINK-13703
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Alexander Fedulov
>Priority: Minor
>
> There exists an option to generate Avro sources which would represent 
> immutable objects (`createSetters` option set to false) 
> [\[1\]|https://github.com/commercehub-oss/gradle-avro-plugin] , 
> [\[2\]|https://avro.apache.org/docs/current/api/java/org/apache/avro/mojo/AbstractAvroMojo.html].
>  Those objects still have full-argument constructors and are handled correctly 
> by Avro. 
>  `AvroTypeInfo` in Flink performs a check to verify that a class complies with 
> the strict POJO requirements (including setters) and throws an 
> IllegalStateException("Expecting type to be a PojoTypeInfo") otherwise. Can 
> this check be relaxed to provide better immutability support?
> +Steps to reproduce:+
> 1) Generate Avro sources from schema using `createSetters` option.
> 2) Use generated class in 
> `ConfluentRegistryAvroDeserializationSchema.forSpecific(GeneratedClass.class, 
> schemaRegistryUrl)`





[jira] [Commented] (FLINK-19159) Using Scalafmt to format scala source code

2020-09-08 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192174#comment-17192174
 ] 

Patrick Lucas commented on FLINK-19159:
---

[~aljoscha] some plugins, like 
[Spotless|https://github.com/diffplug/spotless/tree/main/plugin-maven] (which 
supports Maven & scalafmt) have an option to only enforce style on a file when 
that file has changed from a given Git ref 
(["ratchet"|https://github.com/diffplug/spotless/tree/main/plugin-maven#ratchet]).
 There would still be some noise as files are formatted, chiefly impacting 
large files, but maybe that's a compromise that works for Flink?

> Using Scalafmt to format scala source code
> --
>
> Key: FLINK-19159
> URL: https://issues.apache.org/jira/browse/FLINK-19159
> Project: Flink
>  Issue Type: Improvement
>Reporter: darion yaphet
>Priority: Minor
>
> Scalafmt is a code formatter for Scala. It can help developers avoid code 
> style conflicts.





[jira] [Commented] (FLINK-17324) Move the image we use to generate the docker official-images changes into flink-docker

2020-04-22 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089631#comment-17089631
 ] 

Patrick Lucas commented on FLINK-17324:
---

That image is just a convenience wrapper for 
[bashbrew|https://github.com/docker-library/official-images/tree/master/bashbrew].
 Here's the whole Dockerfile, assuming the binary {{bashbrew-amd64}} is in the 
directory:

{code}
FROM alpine
RUN apk --no-cache add bash git
COPY bashbrew-amd64 /usr/local/bin/bashbrew
{code}

There are definitely plenty of other options for making this available to the 
{{generate-stackbrew-library.sh}} script; I'm really not opinionated about how 
it gets done.

The tool is used to enumerate the architectures supported by the base image 
referenced in the FROM line of the Flink images, and there might also be 
another way to accomplish the same thing.

> Move the image we use to generate the docker official-images changes into 
> flink-docker
> --
>
> Key: FLINK-17324
> URL: https://issues.apache.org/jira/browse/FLINK-17324
> Project: Flink
>  Issue Type: Improvement
>  Components: Release System / Docker
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> Before the docker official image was repatriated into Apache Flink we used a 
> docker image that contained the scripts to generate the release here: 
> https://github.com/apache/flink-docker/blob/e0107b0f85d9b3630db39568729e36408cdc7b78/generate-stackbrew-library-docker.sh#L5.
> {quote}docker run --rm \
>     --volume ~/projects/docker-flink:/build \
>     plucas/docker-flink-build \
>     /build/generate-stackbrew-library.sh > ~/projects/official-images/library/flink
> {quote}
> Notice that this docker image tool 'plucas/docker-flink-build' is not part of 
> upstream Flink, so we need to move it into some sort of tools section in the 
> flink-docker repo or document an alternative to it.
>  





[jira] [Closed] (FLINK-15828) Integrate docker-flink/docker-flink into Flink release process

2020-02-10 Thread Patrick Lucas (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas closed FLINK-15828.
-

We can continue to improve this integration in new issues.

> Integrate docker-flink/docker-flink into Flink release process
> --
>
> Key: FLINK-15828
> URL: https://issues.apache.org/jira/browse/FLINK-15828
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Docker, Release System
>Reporter: Ufuk Celebi
>Priority: Major
>
> This ticket tracks the first phase of Flink Docker image build consolidation.
> The goal of this story is to integrate Docker image publication with the 
> Flink release process and provide convenience packages of released Flink 
> artifacts on DockerHub.
> For more details, check the 
> [DISCUSS|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Integrate-Flink-Docker-image-publication-into-Flink-release-process-td36139.html]
>  and 
> [VOTE|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Integrate-Flink-Docker-image-publication-into-Flink-release-process-td36982.html]
>  threads on the mailing list.





[jira] [Resolved] (FLINK-15828) Integrate docker-flink/docker-flink into Flink release process

2020-02-10 Thread Patrick Lucas (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas resolved FLINK-15828.
---
Resolution: Fixed

> Integrate docker-flink/docker-flink into Flink release process
> --
>
> Key: FLINK-15828
> URL: https://issues.apache.org/jira/browse/FLINK-15828
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Docker, Release System
>Reporter: Ufuk Celebi
>Priority: Major
>
> This ticket tracks the first phase of Flink Docker image build consolidation.
> The goal of this story is to integrate Docker image publication with the 
> Flink release process and provide convenience packages of released Flink 
> artifacts on DockerHub.
> For more details, check the 
> [DISCUSS|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Integrate-Flink-Docker-image-publication-into-Flink-release-process-td36139.html]
>  and 
> [VOTE|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Integrate-Flink-Docker-image-publication-into-Flink-release-process-td36982.html]
>  threads on the mailing list.





[jira] [Closed] (FLINK-15829) Request apache/flink-docker repository

2020-02-10 Thread Patrick Lucas (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas closed FLINK-15829.
-

> Request apache/flink-docker repository
> --
>
> Key: FLINK-15829
> URL: https://issues.apache.org/jira/browse/FLINK-15829
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>Priority: Major
>






[jira] [Closed] (FLINK-15831) Add Docker image publication to release documentation

2020-02-10 Thread Patrick Lucas (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas closed FLINK-15831.
-

> Add Docker image publication to release documentation
> -
>
> Key: FLINK-15831
> URL: https://issues.apache.org/jira/browse/FLINK-15831
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Ufuk Celebi
>Assignee: Patrick Lucas
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The [release documentation in the project 
> Wiki|https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release]
>  describes the release process.
> We need to add a note to follow up with the Docker image publication process 
> as part of the release checklist. The actual documentation should probably be 
> self-contained in the apache/flink-docker repository, but we should 
> definitely link to it.





[jira] [Resolved] (FLINK-15831) Add Docker image publication to release documentation

2020-02-10 Thread Patrick Lucas (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas resolved FLINK-15831.
---
Resolution: Fixed

> Add Docker image publication to release documentation
> -
>
> Key: FLINK-15831
> URL: https://issues.apache.org/jira/browse/FLINK-15831
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Ufuk Celebi
>Assignee: Patrick Lucas
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The [release documentation in the project 
> Wiki|https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release]
>  describes the release process.
> We need to add a note to follow up with the Docker image publication process 
> as part of the release checklist. The actual documentation should probably be 
> self-contained in the apache/flink-docker repository, but we should 
> definitely link to it.





[jira] [Commented] (FLINK-15831) Add Docker image publication to release documentation

2020-02-10 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033556#comment-17033556
 ] 

Patrick Lucas commented on FLINK-15831:
---

After discussion on the mailing list, I updated the [Flink release 
guide|https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release],
 and once [#5|https://github.com/apache/flink-docker/pull/5] has been merged I 
think we can close this issue.

There's still room for improvement here, for example adding more automation to 
the workflow and somehow making release candidate images available during the 
release process, but we can tackle that in a follow-up issue.

> Add Docker image publication to release documentation
> -
>
> Key: FLINK-15831
> URL: https://issues.apache.org/jira/browse/FLINK-15831
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Ufuk Celebi
>Assignee: Patrick Lucas
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The [release documentation in the project 
> Wiki|https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release]
>  describes the release process.
> We need to add a note to follow up with the Docker image publication process 
> as part of the release checklist. The actual documentation should probably be 
> self-contained in the apache/flink-docker repository, but we should 
> definitely link to it.





[jira] [Created] (FLINK-14973) OSS filesystem does not relocate many dependencies

2019-11-27 Thread Patrick Lucas (Jira)
Patrick Lucas created FLINK-14973:
-

 Summary: OSS filesystem does not relocate many dependencies
 Key: FLINK-14973
 URL: https://issues.apache.org/jira/browse/FLINK-14973
 Project: Flink
  Issue Type: Bug
  Components: FileSystems
Affects Versions: 1.9.1
Reporter: Patrick Lucas


Whereas the Azure and S3 Hadoop filesystem jars relocate all of their 
dependencies:

{noformat}
$ jar tf opt/flink-azure-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' 
| grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
org/
org/apache

$ jar tf opt/flink-s3-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | 
grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
org/
org/apache
{noformat}

The OSS Hadoop filesystem leaves many things un-relocated:

{noformat}
$ jar tf opt/flink-oss-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | 
grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
assets/
assets/org
avro/
avro/shaded
com/
com/ctc
com/fasterxml
com/google
com/jcraft
com/nimbusds
com/sun
com/thoughtworks
javax/
javax/activation
javax/el
javax/servlet
javax/ws
javax/xml
jersey/
jersey/repackaged
licenses/
licenses/LICENSE.asm
licenses/LICENSE.cddlv1.0
licenses/LICENSE.cddlv1.1
licenses/LICENSE.jdom
licenses/LICENSE.jzlib
licenses/LICENSE.paranamer
licenses/LICENSE.protobuf
licenses/LICENSE.re2j
licenses/LICENSE.stax2api
net/
net/jcip
net/minidev
org/
org/apache
org/codehaus
org/eclipse
org/jdom
org/objectweb
org/tukaani
org/xerial
{noformat}

The first symptom of this I ran into was that Flink is unable to restore from a 
savepoint if both the OSS and Azure Hadoop filesystems are on the classpath, 
but I assume this has the potential to cause further problems, at least until 
more progress is made on the module/classloading front.

h3. Steps to reproduce

# Copy both the Azure and OSS Hadoop filesystem JARs from opt/ into lib/
# Run a job that restores from a savepoint (the savepoint might need to be 
stored on OSS)
# See a crash and traceback like:

{noformat}
2019-11-26 15:59:25,318 ERROR 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - Fatal error 
occurred in the cluster entrypoint.
org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take 
leadership with session id ----.
at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$30(Dispatcher.java:915)
 ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 ~[?:1.8.0_232]
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 ~[?:1.8.0_232]
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_232]
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
 ~[?:1.8.0_232]
at 
org.apache.flink.runtime.concurrent.FutureUtils$WaitingConjunctFuture.handleCompletedFuture(FutureUtils.java:691)
 ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 ~[?:1.8.0_232]
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 ~[?:1.8.0_232]
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_232]
at 
java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575) 
~[?:1.8.0_232]
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:753)
 ~[?:1.8.0_232]
at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
 ~[?:1.8.0_232]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
 ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
 ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
 ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
 ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) 
[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) 
[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]

[jira] [Commented] (FLINK-14812) Add metric libs to Flink classpath with an environment variable.

2019-11-20 Thread Patrick Lucas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978444#comment-16978444
 ] 

Patrick Lucas commented on FLINK-14812:
---

Supporting some env var to augment the classpath used when running Flink sounds 
totally reasonable to me, though for users who want variations on the upstream 
image I still don't think it's a bad idea for them to create their own 
images—especially if they are already set up to build Docker images.

In this case, a Dockerfile could be:


{code}
FROM flink:1.9

RUN ln -s ${FLINK_HOME}/opt/flink-metrics-prometheus-*.jar ${FLINK_HOME}/lib/
{code}


> Add metric libs to Flink classpath with an environment variable.
> 
>
> Key: FLINK-14812
> URL: https://issues.apache.org/jira/browse/FLINK-14812
> Project: Flink
>  Issue Type: New Feature
>  Components: Deployment / Kubernetes, Deployment / Scripts
>Reporter: Eui Heo
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To use the Flink metric lib you need to add it to the Flink classpath. The 
> documentation explains that the jar file should be put in the lib path.
> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter
> However, to deploy metric-enabled Flink on a Kubernetes cluster, we have the 
> burden of creating and managing another container image. It would be more 
> efficient to add the classpath using environment variables inside the 
> constructFlinkClassPath function in the config.sh file.





[jira] [Closed] (FLINK-12191) Flink SVGs on "Material" page broken, render incorrectly on Firefox

2019-04-17 Thread Patrick Lucas (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas closed FLINK-12191.
-
Resolution: Fixed

> Flink SVGs on "Material" page broken, render incorrectly on Firefox
> ---
>
> Key: FLINK-12191
> URL: https://issues.apache.org/jira/browse/FLINK-12191
> Project: Flink
>  Issue Type: Bug
>  Components: Project Website
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2019-04-15 at 09.48.15.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Like FLINK-11043, the Flink SVGs on the [Material page of the Flink 
> website|https://flink.apache.org/material.html] are invalid and do not render 
> correctly on Firefox.
> I'm not sure if there is an additional source-of-truth for these images, or 
> if the ones hosted on the website are canonical, but I can fix them nonetheless.
> I also noticed that one of the squirrels in both {{color_black.svg}} and 
> {{color_white.svg}} is missing the eye gradient, which can also be easily 
> fixed.
>  !Screen Shot 2019-04-15 at 09.48.15.png|thumbnail!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12191) Flink SVGs on "Material" page broken, render incorrectly on Firefox

2019-04-15 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-12191:
-

 Summary: Flink SVGs on "Material" page broken, render incorrectly 
on Firefox
 Key: FLINK-12191
 URL: https://issues.apache.org/jira/browse/FLINK-12191
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Patrick Lucas
Assignee: Patrick Lucas
 Attachments: Screen Shot 2019-04-15 at 09.48.15.png

Like FLINK-11043, the Flink SVGs on the [Material page of the Flink 
website|https://flink.apache.org/material.html] are invalid and do not render 
correctly on Firefox.

I'm not sure if there is an additional source-of-truth for these images, or if 
the ones hosted on the website are canonical, but I can fix them nonetheless.

I also noticed that one of the squirrels in both {{color_black.svg}} and 
{{color_white.svg}} is missing the eye gradient, which can also be easily fixed.

 !Screen Shot 2019-04-15 at 09.48.15.png|thumbnail!





[jira] [Created] (FLINK-11462) "Powered by Flink" page does not display correctly on Firefox

2019-01-30 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-11462:
-

 Summary: "Powered by Flink" page does not display correctly on 
Firefox
 Key: FLINK-11462
 URL: https://issues.apache.org/jira/browse/FLINK-11462
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Patrick Lucas
Assignee: Patrick Lucas
 Attachments: Screen Shot 2019-01-30 at 12.13.03.png

The JavaScript that sets the height of each "tile" does not work as expected in 
Firefox, causing them to overlap (see attached screenshot). [This Stack 
Overflow 
post|https://stackoverflow.com/questions/12184133/jquery-wrong-values-when-trying-to-get-div-height]
 suggests using jQuery's {{outerHeight}} method instead of {{height}}, and 
making this change seems to make the page display correctly in both Firefox and 
Chrome.





[jira] [Commented] (FLINK-11127) Make metrics query service establish connection to JobManager

2018-12-13 Thread Patrick Lucas (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720154#comment-16720154
 ] 

Patrick Lucas commented on FLINK-11127:
---

StatefulSets now support the podManagementPolicy option "Parallel", which 
allows all Pods to launch or terminate at the same time. This partially solves 
the problem, though experimentation is needed to really know whether that is 
preferable to using a Deployment. (See: 
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#pod-management-policies)

Would it be feasible for the JMs to connect to the TMs by IP address (as they 
do naturally for other types of communication)?

> Make metrics query service establish connection to JobManager
> -
>
> Key: FLINK-11127
> URL: https://issues.apache.org/jira/browse/FLINK-11127
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination, Kubernetes, Metrics
>Affects Versions: 1.7.0
>Reporter: Ufuk Celebi
>Priority: Major
>
> As part of FLINK-10247, the internal metrics query service has been separated 
> into its own actor system. Before this change, the JobManager (JM) queried 
> TaskManager (TM) metrics via the TM actor. Now, the JM needs to establish a 
> separate connection to the TM metrics query service actor.
> In the context of Kubernetes, this is problematic as the JM will typically 
> *not* be able to resolve the TMs by name, resulting in warnings as follows:
> {code}
> 2018-12-11 08:32:33,962 WARN  akka.remote.ReliableDeliverySupervisor  
>   - Association with remote system 
> [akka.tcp://flink-metrics@flink-task-manager-64b868487c-x9l4b:39183] has 
> failed, address is now gated for [50] ms. Reason: [Association failed with 
> [akka.tcp://flink-metrics@flink-task-manager-64b868487c-x9l4b:39183]] Caused 
> by: [flink-task-manager-64b868487c-x9l4b: Name does not resolve]
> {code}
> In order to expose the TMs by name in Kubernetes, users require a service 
> *for each* TM instance, which is not practical.
> This currently results in the web UI not being able to display some basic 
> metrics about the number of sent records. You can reproduce this by following 
> the READMEs in {{flink-container/kubernetes}}.
> This worked before, because the JM is typically exposed via a service with a 
> known name and the TMs establish the connection to it which the metrics query 
> service piggybacked on.
> A potential solution to this might be to let the query service connect to the 
> JM similar to how the TMs register.
> I tagged this ticket as an improvement, but in the context of Kubernetes I 
> would consider this to be a bug.





[jira] [Closed] (FLINK-11043) Website Flink logo broken on Firefox

2018-12-06 Thread Patrick Lucas (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas closed FLINK-11043.
-
Resolution: Fixed

> Website Flink logo broken on Firefox
> 
>
> Key: FLINK-11043
> URL: https://issues.apache.org/jira/browse/FLINK-11043
> Project: Flink
>  Issue Type: Bug
>  Components: Project Website
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Screen Shot 2018-11-30 at 14.48.43.png
>
>
> The gradients in the Flink SVG logo do not render on Firefox due to this 
> (12-year-old) bug: [https://bugzilla.mozilla.org/show_bug.cgi?id=353575]
> Example:
> !Screen Shot 2018-11-30 at 14.48.43.png!
> [This StackOverflow 
> post|https://stackoverflow.com/questions/12867704/svg-linear-gradient-does-not-work-in-firefox]
>  links to the above issue and explains that Firefox does not render gradient 
> elements that are inside a "symbol" element in SVGs, and the fix is to 
> extract all the gradient definitions into a top-level "defs" section.





[jira] [Updated] (FLINK-11043) Website Flink logo broken on Firefox

2018-11-30 Thread Patrick Lucas (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas updated FLINK-11043:
--
Description: 
The gradients in the Flink SVG logo do not render on Firefox due to this 
(12-year-old) bug: [https://bugzilla.mozilla.org/show_bug.cgi?id=353575]

Example:

!Screen Shot 2018-11-30 at 14.48.43.png!

[This StackOverflow 
post|https://stackoverflow.com/questions/12867704/svg-linear-gradient-does-not-work-in-firefox]
 links to the above issue and explains that Firefox does not render gradient 
elements that are inside a "symbol" element in SVGs, and the fix is to extract 
all the gradient definitions into a top-level "defs" section.

  was:
The gradients in the Flink SVG logo do not render on Firefox due to this 
(12-year-old) bug: [https://bugzilla.mozilla.org/show_bug.cgi?id=353575]

Example:

!Screen Shot 2018-11-30 at 14.48.43.png!

This StackOverflow post links to the above issue and explains that Firefox does 
not render gradient elements that are inside a "symbol" element in SVGs, and 
the fix is to extract all the gradient definitions into a top-level "defs" 
section.


> Website Flink logo broken on Firefox
> 
>
> Key: FLINK-11043
> URL: https://issues.apache.org/jira/browse/FLINK-11043
> Project: Flink
>  Issue Type: Bug
>  Components: Project Website
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Screen Shot 2018-11-30 at 14.48.43.png
>
>
> The gradients in the Flink SVG logo do not render on Firefox due to this 
> (12-year-old) bug: [https://bugzilla.mozilla.org/show_bug.cgi?id=353575]
> Example:
> !Screen Shot 2018-11-30 at 14.48.43.png!
> [This StackOverflow 
> post|https://stackoverflow.com/questions/12867704/svg-linear-gradient-does-not-work-in-firefox]
>  links to the above issue and explains that Firefox does not render gradient 
> elements that are inside a "symbol" element in SVGs, and the fix is to 
> extract all the gradient definitions into a top-level "defs" section.





[jira] [Created] (FLINK-11043) Website Flink logo broken on Firefox

2018-11-30 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-11043:
-

 Summary: Website Flink logo broken on Firefox
 Key: FLINK-11043
 URL: https://issues.apache.org/jira/browse/FLINK-11043
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Patrick Lucas
Assignee: Patrick Lucas
 Attachments: Screen Shot 2018-11-30 at 14.48.43.png

The gradients in the Flink SVG logo do not render on Firefox due to this 
(12-year-old) bug: [https://bugzilla.mozilla.org/show_bug.cgi?id=353575]

Example:

!Screen Shot 2018-11-30 at 14.48.43.png!

This StackOverflow post links to the above issue and explains that Firefox does 
not render gradient elements that are inside a "symbol" element in SVGs, and 
the fix is to extract all the gradient definitions into a top-level "defs" 
section.





[jira] [Updated] (FLINK-11043) Website Flink logo broken on Firefox

2018-11-30 Thread Patrick Lucas (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas updated FLINK-11043:
--
Priority: Minor  (was: Major)

> Website Flink logo broken on Firefox
> 
>
> Key: FLINK-11043
> URL: https://issues.apache.org/jira/browse/FLINK-11043
> Project: Flink
>  Issue Type: Bug
>  Components: Project Website
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
>Priority: Minor
>  Labels: pull-request-available
> Attachments: Screen Shot 2018-11-30 at 14.48.43.png
>
>
> The gradients in the Flink SVG logo do not render on Firefox due to this 
> (12-year-old) bug: [https://bugzilla.mozilla.org/show_bug.cgi?id=353575]
> Example:
> !Screen Shot 2018-11-30 at 14.48.43.png!
> This StackOverflow post links to the above issue and explains that Firefox 
> does not render gradient elements that are inside a "symbol" element in SVGs, 
> and the fix is to extract all the gradient definitions into a top-level 
> "defs" section.





[jira] [Commented] (FLINK-10007) Security vulnerability in website build infrastructure

2018-08-02 Thread Patrick Lucas (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566645#comment-16566645
 ] 

Patrick Lucas commented on FLINK-10007:
---

FWIW, this vulnerability has to do with a specific JSON string being able to 
crash ruby, which could result in DoS to a service running it. It's basically 
inconsequential to us, but might as well check it off the list I guess.

https://www.cvedetails.com/cve/CVE-2017-16516/

> Security vulnerability in website build infrastructure
> --
>
> Key: FLINK-10007
> URL: https://issues.apache.org/jira/browse/FLINK-10007
> Project: Flink
>  Issue Type: Bug
>  Components: Project Website
>Reporter: Fabian Hueske
>Priority: Critical
>
> We've got a notification from Apache INFRA about a potential security 
> vulnerability:
> {quote}
> We found a potential security vulnerability in a repository for which you 
> have been granted security alert access.
> @apache   apache/flink-web
> Known high severity security vulnerability detected in yajl-ruby < 1.3.1 
> defined in Gemfile.
> Gemfile update suggested: yajl-ruby ~> 1.3.1. 
> {quote}
> This is a problem with the build environment of the website, i.e., this 
> dependency is not distributed or executed with Flink but only run when the 
> website is updated.
> Nonetheless, we should of course update the dependency.





[jira] [Commented] (FLINK-9806) Add a canonical link element to documentation HTML

2018-07-31 Thread Patrick Lucas (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563249#comment-16563249
 ] 

Patrick Lucas commented on FLINK-9806:
--

I'm pleased with the impact of this so far: three test Google queries, "flink 
metrics", "flink queryable state", and "flink flatmap", all have a link from 
the stable docs as the first result.

However, DuckDuckGo, Bing, and Baidu still show terribly out of date results 
for these queries, some even with the Flink 1.1.2 docs as the top result and no 
"stable" URLs in sight.

Improving the experience on Google is still a good result.

> Add a canonical link element to documentation HTML
> --
>
> Key: FLINK-9806
> URL: https://issues.apache.org/jira/browse/FLINK-9806
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.3.4, 1.4.3, 1.5.3, 1.6.0
>
>
> Flink has suffered for a while with non-optimal SEO for its documentation, 
> meaning a web search for a topic covered in the documentation often produces 
> results for many versions of Flink, even preferring older versions since 
> those pages have been around for longer.
> Using a canonical link element (see references) may alleviate this by 
> informing search engines about where to find the latest documentation (i.e. 
> pages hosted under [https://ci.apache.org/projects/flink/flink-docs-master/]).
> I think this is at least worth experimenting with, and if it doesn't cause 
> problems, even backporting it to the older release branches to eventually 
> clean up the Flink docs' SEO and converge on advertising only the latest docs 
> (unless a specific version is specified).
> References:
>  * [https://moz.com/learn/seo/canonicalization]
>  * [https://yoast.com/rel-canonical/]
>  * [https://support.google.com/webmasters/answer/139066?hl=en]
>  * [https://en.wikipedia.org/wiki/Canonical_link_element]





[jira] [Commented] (FLINK-9914) Flink docker information in official website is out of date and should be updated

2018-07-23 Thread Patrick Lucas (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553007#comment-16553007
 ] 

Patrick Lucas commented on FLINK-9914:
--

The version numbers in those docs are just for example purposes, and don't need 
to be updated for each release.

However, there have been some slight changes to the images, so updating these 
docs anyway is in order.

> Flink docker information in official website is out of date and should be 
> updated
> 
>
> Key: FLINK-9914
> URL: https://issues.apache.org/jira/browse/FLINK-9914
> Project: Flink
>  Issue Type: Bug
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Minor
>
> The documentation still uses Flink 1.2.1, but the latest Flink version is 1.5.x





[jira] [Assigned] (FLINK-9806) Add a canonical link element to documentation HTML

2018-07-23 Thread Patrick Lucas (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas reassigned FLINK-9806:


Assignee: Patrick Lucas

> Add a canonical link element to documentation HTML
> --
>
> Key: FLINK-9806
> URL: https://issues.apache.org/jira/browse/FLINK-9806
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
>Priority: Major
>  Labels: pull-request-available
>
> Flink has suffered for a while with non-optimal SEO for its documentation, 
> meaning a web search for a topic covered in the documentation often produces 
> results for many versions of Flink, even preferring older versions since 
> those pages have been around for longer.
> Using a canonical link element (see references) may alleviate this by 
> informing search engines about where to find the latest documentation (i.e. 
> pages hosted under [https://ci.apache.org/projects/flink/flink-docs-master/]).
> I think this is at least worth experimenting with, and if it doesn't cause 
> problems, even backporting it to the older release branches to eventually 
> clean up the Flink docs' SEO and converge on advertising only the latest docs 
> (unless a specific version is specified).
> References:
>  * [https://moz.com/learn/seo/canonicalization]
>  * [https://yoast.com/rel-canonical/]
>  * [https://support.google.com/webmasters/answer/139066?hl=en]
>  * [https://en.wikipedia.org/wiki/Canonical_link_element]





[jira] [Created] (FLINK-9806) Add a canonical link element to documentation HTML

2018-07-11 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-9806:


 Summary: Add a canonical link element to documentation HTML
 Key: FLINK-9806
 URL: https://issues.apache.org/jira/browse/FLINK-9806
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.5.0
Reporter: Patrick Lucas


Flink has suffered for a while with non-optimal SEO for its documentation, 
meaning a web search for a topic covered in the documentation often produces 
results for many versions of Flink, even preferring older versions since those 
pages have been around for longer.

Using a canonical link element (see references) may alleviate this by informing 
search engines about where to find the latest documentation (i.e. pages hosted 
under [https://ci.apache.org/projects/flink/flink-docs-master/]).

I think this is at least worth experimenting with, and if it doesn't cause 
problems, even backporting it to the older release branches to eventually clean 
up the Flink docs' SEO and converge on advertising only the latest docs (unless 
a specific version is specified).

References:
 * [https://moz.com/learn/seo/canonicalization]
 * [https://yoast.com/rel-canonical/]
 * [https://support.google.com/webmasters/answer/139066?hl=en]
 * [https://en.wikipedia.org/wiki/Canonical_link_element]
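As a concrete illustration, a canonical link element is just a tag like {{<link rel="canonical" href="...">}} emitted into each page's head, pointing at the preferred URL. The Python sketch below uses hypothetical helper names and a hypothetical versioned-URL pattern (the real docs build uses Jekyll templating); it only shows the idea of mapping a versioned page to its master equivalent:

```python
import re

# Hypothetical helper: map a versioned docs URL to its
# flink-docs-master equivalent, so every published version of a page
# can declare the same canonical URL.
def canonical_url(page_url):
    return re.sub(r"flink-docs-release-[\d.]+", "flink-docs-master", page_url)

def canonical_link_tag(page_url):
    # The element a search engine reads to pick the preferred version.
    return '<link rel="canonical" href="%s">' % canonical_url(page_url)
```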





[jira] [Commented] (FLINK-8154) JobSubmissionClientActor submitted job, but there is no connection to a JobManager

2017-11-27 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266782#comment-16266782
 ] 

Patrick Lucas commented on FLINK-8154:
--

Is `act-monitor-flink-jobmanager` a Kubernetes Service? If so, it's looking 
like the Service's IP is `10.3.0.81`, but the Pod's IP is `10.2.43.51`. 
Depending on your network configuration, these may not be routable to each 
other.

Can you confirm connectivity by, for example, running `nc -v 10.2.43.51 6123` 
from within the Pod? (You may have to install `netcat`)
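If `netcat` isn't available in the image, the same check can be scripted. A minimal Python sketch (the host and port here are just the example values above):

```python
import socket

def can_connect(host, port, timeout=3.0):
    # Rough equivalent of `nc -v host port`: True if a TCP connection
    # to host:port succeeds within the timeout, False otherwise.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

e.g. `can_connect("10.2.43.51", 6123)` from within the Pod.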

> JobSubmissionClientActor submitted job, but there is no connection to a 
> JobManager
> -
>
> Key: FLINK-8154
> URL: https://issues.apache.org/jira/browse/FLINK-8154
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.3.2
> Environment: Kubernetes 1.8.3, Platform "Linux/amd64"
>Reporter: Gregory Melekh
>Priority: Blocker
>
> There is JobManager log file bellow.
> 2017-11-26 08:17:13,435 INFO  org.apache.flink.client.CliFrontend 
>   - 
> 
> 2017-11-26 08:17:13,437 INFO  org.apache.flink.client.CliFrontend 
>   -  Starting Command Line Client (Version: 1.3.2, Rev:0399bee, 
> Date:03.08.2017 @ 10:23:11 UTC)
> 2017-11-26 08:17:13,437 INFO  org.apache.flink.client.CliFrontend 
>   -  Current user: root
> 2017-11-26 08:17:13,437 INFO  org.apache.flink.client.CliFrontend 
>   -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 
> 1.8/25.131-b11
> 2017-11-26 08:17:13,437 INFO  org.apache.flink.client.CliFrontend 
>   -  Maximum heap size: 6252 MiBytes
> 2017-11-26 08:17:13,437 INFO  org.apache.flink.client.CliFrontend 
>   -  JAVA_HOME: /usr/lib/jvm/java-1.8-openjdk/jre
> 2017-11-26 08:17:13,439 INFO  org.apache.flink.client.CliFrontend 
>   -  Hadoop version: 2.7.2
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   -  JVM Options:
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   - 
> -Dlog.file=/opt/flink/log/flink--client-act-monitor-flink-jobmanager-66cd4bdb5c-8kxbh.log
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   - -Dlog4j.configuration=file:/etc/flink/log4j-cli.properties
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   - -Dlogback.configurationFile=file:/etc/flink/logback.xml
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   -  Program Arguments:
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   - run
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   - -c
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   - monitoring.flow.AccumulateAll
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   - /tmp/monitoring-0.0.1-SNAPSHOT.jar
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   -  Classpath: 
> /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar:::
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   - 
> 
> 2017-11-26 08:17:13,440 INFO  org.apache.flink.client.CliFrontend 
>   - Using configuration directory /etc/flink
> 2017-11-26 08:17:13,441 INFO  org.apache.flink.client.CliFrontend 
>   - Trying to load configuration file
> 2017-11-26 08:17:13,443 INFO  
> org.apache.flink.configuration.GlobalConfiguration- Loading 
> configuration property: blob.server.port, 6124
> 2017-11-26 08:17:13,443 INFO  
> org.apache.flink.configuration.GlobalConfiguration- Loading 
> configuration property: jobmanager.rpc.address, act-monitor-flink-jobmanager
> 2017-11-26 08:17:13,443 INFO  
> org.apache.flink.configuration.GlobalConfiguration- Loading 
> configuration property: jobmanager.rpc.port, 6123
> 2017-11-26 08:17:13,443 INFO  
> org.apache.flink.configuration.GlobalConfiguration- Loading 
> configuration property: jobmanager.heap.mb, 1024
> 2017-11-26 08:17:13,443 INFO  
> 

[jira] [Commented] (FLINK-6369) Better support for overlay networks

2017-09-06 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155461#comment-16155461
 ] 

Patrick Lucas commented on FLINK-6369:
--

[~till.rohrmann]: yeah that's what I was getting at. It doesn't really solve 
our problem for that reason---though if you just Googled around a bit you might 
come across that documentation and think it does. I definitely agree that an 
HTTP control mechanism is the way to go.

> Better support for overlay networks
> ---
>
> Key: FLINK-6369
> URL: https://issues.apache.org/jira/browse/FLINK-6369
> Project: Flink
>  Issue Type: Improvement
>  Components: Docker, Network
>Affects Versions: 1.2.0
>Reporter: Patrick Lucas
> Fix For: 1.4.0
>
>
> Running Flink in an environment that utilizes an overlay network 
> (containerized environments like Kubernetes or Docker Compose, or cloud 
> platforms like AWS VPC) poses various challenges related to networking.
> The core problem is that in these environments, applications are frequently 
> addressed by a name different from that with which the application sees 
> itself.
> For instance, it is plausible that the Flink UI (served by the Jobmanager) is 
> accessed via an ELB, which poses a problem in HA mode when the non-leader UI 
> returns an HTTP redirect to the leader—but the user may not be able to 
> connect directly to the leader.
> Or, if a user is using [Docker 
> Compose|https://github.com/apache/flink/blob/aa21f853ab0380ec1f68ae1d0b7c8d9268da4533/flink-contrib/docker-flink/docker-compose.yml],
>  they cannot submit a job via the CLI since there is a mismatch between the 
> name used to address the Jobmanager and what the Jobmanager perceives as its 
> hostname. (see \[1] below for more detail)
> 
> h3. Problems and proposed solutions
> There are four instances of this issue that I've run into so far:
> h4. Jobmanagers must be addressed by the same name they are configured with 
> due to limitations of Akka
> Akka enforces that messages it receives are addressed with the hostname it is 
> configured with. Newer versions of Akka (>= 2.4) than what Flink uses 
> (2.3-custom) have support for accepting messages with the "wrong" hostname, 
> but it is limited to a single "external" hostname.
> In many environments, it is likely that not all parties that want to connect 
> to the Jobmanager have the same way of addressing it (e.g. the ELB example 
> above). Other similarly-used protocols like HTTP generally don't have this 
> restriction: if you connect on a socket and send a well-formed message, the 
> system assumes that it is the desired recipient.
> One solution is to not use Akka at all when communicating with the cluster 
> from the outside, perhaps using an HTTP API instead. This would be somewhat 
> involved, and probably best left as a longer-term goal.
> A more immediate solution would be to override this behavior within Flakka, 
> the custom fork of Akka currently in use by Flink. I'm not sure how much 
> effort this would take.
> h4. The Jobmanager needs to be able to address the Taskmanagers for e.g. 
> metrics collection
> Having the Taskmanagers register themselves by IP is probably the best 
> solution here. It's a reasonable assumption that IPs can always be used for 
> communication between the nodes of a single cluster. Asking that each 
> Taskmanager container have a resolvable hostname is unreasonable.
> h4. Jobmanagers in HA mode send HTTP redirects to URLs that aren't externally 
> resolvable/routable
> If multiple Jobmanagers are used in HA mode, HTTP requests to non-leaders 
> (such as if you put a Kubernetes Service in front of all Jobmanagers in a 
> cluster) get redirected to the (supposed) hostname of the leader, but this is 
> potentially unresolvable/unroutable externally.
> Enabling non-leader Jobmanagers to proxy API calls to the leader would solve 
> this. The non-leaders could even serve static asset requests (e.g. for css or 
> js files) directly.
> h4. Queryable state requests involve direct communication with Taskmanagers
> Currently, queryable state requests involve communication between the client 
> and the Jobmanager (for key partitioning lookups) and between the client and 
> all Taskmanagers.
> If the client is inside the network (as would be common in production 
> use-cases where high-volume lookups are required) this is a non-issue, but 
> problems crop up if the client is outside the network.
> For the communication with the Jobmanager, a similar solution as above can be 
> used: if all Jobmanagers can service all key partitioning lookup requests 
> (e.g. by proxying) then a simple Service can be used.
> The story is a bit different for the Taskmanagers. The partitioning lookup to 
> the Jobmanager would return the name of the particular Taskmanager that owned 
> 

[jira] [Created] (FLINK-7155) Add Influxdb metrics reporter

2017-07-11 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-7155:


 Summary: Add Influxdb metrics reporter
 Key: FLINK-7155
 URL: https://issues.apache.org/jira/browse/FLINK-7155
 Project: Flink
  Issue Type: Improvement
Reporter: Patrick Lucas
Assignee: Patrick Lucas
 Fix For: 1.4.0


[~jgrier] has a [simple Influxdb metrics reporter for 
Flink|https://github.com/jgrier/flink-stuff/tree/master/flink-influx-reporter] 
that is a thin wrapper around [a lightweight, public-domain Influxdb 
reporter|https://github.com/davidB/metrics-influxdb] for Codahale metrics.

We can implement this very easily in Java in the same way as 
flink-metrics-graphite.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-6369) Better support for overlay networks

2017-05-22 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas updated FLINK-6369:
-
Fix Version/s: (was: 1.3.1)
   1.4.0

> Better support for overlay networks
> ---
>
> Key: FLINK-6369
> URL: https://issues.apache.org/jira/browse/FLINK-6369
> Project: Flink
>  Issue Type: Improvement
>  Components: Docker, Network
>Affects Versions: 1.2.0
>Reporter: Patrick Lucas
> Fix For: 1.4.0
>
>
> Running Flink in an environment that utilizes an overlay network 
> (containerized environments like Kubernetes or Docker Compose, or cloud 
> platforms like AWS VPC) poses various challenges related to networking.
> The core problem is that in these environments, applications are frequently 
> addressed by a name different from that with which the application sees 
> itself.
> For instance, it is plausible that the Flink UI (served by the Jobmanager) is 
> accessed via an ELB, which poses a problem in HA mode when the non-leader UI 
> returns an HTTP redirect to the leader—but the user may not be able to 
> connect directly to the leader.
> Or, if a user is using [Docker 
> Compose|https://github.com/apache/flink/blob/aa21f853ab0380ec1f68ae1d0b7c8d9268da4533/flink-contrib/docker-flink/docker-compose.yml],
>  they cannot submit a job via the CLI since there is a mismatch between the 
> name used to address the Jobmanager and what the Jobmanager perceives as its 
> hostname. (see \[1] below for more detail)
> 
> h3. Problems and proposed solutions
> There are four instances of this issue that I've run into so far:
> h4. Jobmanagers must be addressed by the same name they are configured with 
> due to limitations of Akka
> Akka enforces that messages it receives are addressed with the hostname it is 
> configured with. Newer versions of Akka (>= 2.4) than what Flink uses 
> (2.3-custom) have support for accepting messages with the "wrong" hostname, 
> but it is limited to a single "external" hostname.
> In many environments, it is likely that not all parties that want to connect 
> to the Jobmanager have the same way of addressing it (e.g. the ELB example 
> above). Other similarly-used protocols like HTTP generally don't have this 
> restriction: if you connect on a socket and send a well-formed message, the 
> system assumes that it is the desired recipient.
> One solution is to not use Akka at all when communicating with the cluster 
> from the outside, perhaps using an HTTP API instead. This would be somewhat 
> involved, and probably best left as a longer-term goal.
> A more immediate solution would be to override this behavior within Flakka, 
> the custom fork of Akka currently in use by Flink. I'm not sure how much 
> effort this would take.
> h4. The Jobmanager needs to be able to address the Taskmanagers for e.g. 
> metrics collection
> Having the Taskmanagers register themselves by IP is probably the best 
> solution here. It's a reasonable assumption that IPs can always be used for 
> communication between the nodes of a single cluster. Asking that each 
> Taskmanager container have a resolvable hostname is unreasonable.
> h4. Jobmanagers in HA mode send HTTP redirects to URLs that aren't externally 
> resolvable/routable
> If multiple Jobmanagers are used in HA mode, HTTP requests to non-leaders 
> (such as if you put a Kubernetes Service in front of all Jobmanagers in a 
> cluster) get redirected to the (supposed) hostname of the leader, but this is 
> potentially unresolvable/unroutable externally.
> Enabling non-leader Jobmanagers to proxy API calls to the leader would solve 
> this. The non-leaders could even serve static asset requests (e.g. for css or 
> js files) directly.
> h4. Queryable state requests involve direct communication with Taskmanagers
> Currently, queryable state requests involve communication between the client 
> and the Jobmanager (for key partitioning lookups) and between the client and 
> all Taskmanagers.
> If the client is inside the network (as would be common in production 
> use-cases where high-volume lookups are required) this is a non-issue, but 
> problems crop up if the client is outside the network.
> For the communication with the Jobmanager, a similar solution as above can be 
> used: if all Jobmanagers can service all key partitioning lookup requests 
> (e.g. by proxying) then a simple Service can be used.
> The story is a bit different for the Taskmanagers. The partitioning lookup to 
> the Jobmanager would return the name of the particular Taskmanager that owned 
> the desired data, but that name (likely an IP, as proposed in the second 
> section above) is not necessarily resolvable/routable from the client.
> In the context of Kubernetes, where individual containers are generally not 
> addressable, a very ugly 

[jira] [Closed] (FLINK-3026) Publish the flink docker container to the docker registry

2017-05-10 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas closed FLINK-3026.

Resolution: Fixed

> Publish the flink docker container to the docker registry
> -
>
> Key: FLINK-3026
> URL: https://issues.apache.org/jira/browse/FLINK-3026
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Docker
>Reporter: Omer Katz
>Assignee: Patrick Lucas
>  Labels: Deployment, Docker
>
> There's a dockerfile that can be used to build a docker container already in 
> the repository. It'd be awesome to just be able to pull it instead of 
> building it ourselves.
> The dockerfile can be found at 
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
> It also doesn't point to the latest version of Flink which I fixed in 
> https://github.com/apache/flink/pull/1366



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry

2017-05-10 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16004389#comment-16004389
 ] 

Patrick Lucas commented on FLINK-3026:
--

We now have [official Docker images|https://hub.docker.com/_/flink/]!

Any remaining discussion from this issue should be moved to a new one, as this 
one is now resolved.

> Publish the flink docker container to the docker registry
> -
>
> Key: FLINK-3026
> URL: https://issues.apache.org/jira/browse/FLINK-3026
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Docker
>Reporter: Omer Katz
>Assignee: Patrick Lucas
>  Labels: Deployment, Docker
>
> There's a dockerfile that can be used to build a docker container already in 
> the repository. It'd be awesome to just be able to pull it instead of 
> building it ourselves.
> The dockerfile can be found at 
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
> It also doesn't point to the latest version of Flink which I fixed in 
> https://github.com/apache/flink/pull/1366





[jira] [Commented] (FLINK-5997) Docker build tools should inline GPG key to verify HTTP Flink release download

2017-05-10 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16004386#comment-16004386
 ] 

Patrick Lucas commented on FLINK-5997:
--

This is done in the [official Docker 
images|https://github.com/docker-flink/docker-flink/blob/39a2b675f0ed0f8bf53054a66d67efebc6758009/Dockerfile-debian.template#L66].

> Docker build tools should inline GPG key to verify HTTP Flink release download
> --
>
> Key: FLINK-5997
> URL: https://issues.apache.org/jira/browse/FLINK-5997
> Project: Flink
>  Issue Type: Improvement
>  Components: Docker
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
>
> See examples 
> [here|https://github.com/docker-solr/docker-solr/blob/8afee8bf17a7a3d1fdc453643abfdb5cf2e822d0/6.4/Dockerfile]
>  and 
> [here|https://github.com/docker-library/official-images/blob/8b10b36d19a8cfba56bf4352c8ae98d3b6697eb3/README.md#init].





[jira] [Closed] (FLINK-5997) Docker build tools should inline GPG key to verify HTTP Flink release download

2017-05-10 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas closed FLINK-5997.

Resolution: Fixed

> Docker build tools should inline GPG key to verify HTTP Flink release download
> --
>
> Key: FLINK-5997
> URL: https://issues.apache.org/jira/browse/FLINK-5997
> Project: Flink
>  Issue Type: Improvement
>  Components: Docker
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
>
> See examples 
> [here|https://github.com/docker-solr/docker-solr/blob/8afee8bf17a7a3d1fdc453643abfdb5cf2e822d0/6.4/Dockerfile]
>  and 
> [here|https://github.com/docker-library/official-images/blob/8b10b36d19a8cfba56bf4352c8ae98d3b6697eb3/README.md#init].





[jira] [Commented] (FLINK-6487) Remove JobManager local mode

2017-05-08 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001269#comment-16001269
 ] 

Patrick Lucas commented on FLINK-6487:
--

This appears to work:

{code}
#!/bin/bash

JM_PID=
TM_PID=

# Forward termination signals to both children
function die() {
  kill $JM_PID $TM_PID
}

trap die SIGINT SIGTERM

jobmanager.sh start-foreground cluster &
JM_PID=$!

taskmanager.sh start-foreground &
TM_PID=$!

# The trap interrupts the first wait, so wait a second time to block
# until both processes have actually exited.
wait $JM_PID $TM_PID
wait $JM_PID $TM_PID
{code}

The double {{wait}} is intentional: when a trap fires, any pending {{wait}}s 
return immediately, so you have to {{wait}} again on the same PIDs to actually 
block until the processes exit. There might be a nicer way to do that.

> Remove JobManager local mode
> 
>
> Key: FLINK-6487
> URL: https://issues.apache.org/jira/browse/FLINK-6487
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Stephan Ewen
>
> We should remove the "local" mode from the JobManager.
> Currently, the JobManager has the strange "local" mode where it also starts 
> an embedded Task Manager.
> I think that mode has caused confusion / problems:
>   - No TaskManagers can join the cluster
>   - TaskManagers do not support queryable state
>   - It is redundant code to maintain
> At the same time, the mode does not help at all:
>   - The MiniCluster does not use that mode
>   - Starting from scripts, the {{start-cluster.sh}} works out of the box, 
> creating a proper local cluster, but in two processes, rather than one.





[jira] [Commented] (FLINK-6487) Remove JobManager local mode

2017-05-08 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001228#comment-16001228
 ] 

Patrick Lucas commented on FLINK-6487:
--

I think we have to be cleverer (more clever?) than that—to propagate those 
signals in bash properly you need to exec (which of course you could only do 
once) or use trap... which might actually work okay. I'll give it a shot.

Killing PID 1 with this approach as-is will orphan the cat and a bash that has 
two java child processes.

> Remove JobManager local mode
> 
>
> Key: FLINK-6487
> URL: https://issues.apache.org/jira/browse/FLINK-6487
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Stephan Ewen
>
> We should remove the "local" mode from the JobManager.
> Currently, the JobManager has the strange "local" mode where it also starts 
> an embedded Task Manager.
> I think that mode has caused confusion / problems:
>   - No TaskManagers can join the cluster
>   - TaskManagers do not support queryable state
>   - It is redundant code to maintain
> At the same time, the mode does not help at all:
>   - The MiniCluster does not use that mode
>   - Starting from scripts, the {{start-cluster.sh}} works out of the box, 
> creating a proper local cluster, but in two processes, rather than one.





[jira] [Commented] (FLINK-6487) Remove JobManager local mode

2017-05-08 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000980#comment-16000980
 ] 

Patrick Lucas commented on FLINK-6487:
--

Something like that should be fine. Even just a supervisord.conf that runs both 
with the right options could work, though unfortunately installing supervisord 
adds ~22MB to the image.

We really don't need anything fancy, just something that spawns two processes, 
propagates signals to them properly, and multiplexes their stdouts into its own 
stdout. I wonder what the simplest approach for that would be.
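One minimal version of that, sketched here in Python purely for illustration (an assumption, not a concrete proposal; the script names in the usage note below are the ones from the Docker entrypoint discussion):

```python
import signal
import subprocess

def run_both(cmd_a, cmd_b):
    # Spawn two foreground processes, forward SIGINT/SIGTERM to both,
    # and return the highest exit code once both have exited. Children
    # inherit our stdout/stderr, so their output is multiplexed into
    # this process's streams for free.
    procs = [subprocess.Popen(cmd) for cmd in (cmd_a, cmd_b)]

    def forward(signum, _frame):
        for p in procs:
            if p.poll() is None:  # still running
                p.send_signal(signum)

    signal.signal(signal.SIGINT, forward)
    signal.signal(signal.SIGTERM, forward)
    return max(p.wait() for p in procs)
```

e.g. {{run_both(["jobmanager.sh", "start-foreground", "cluster"], ["taskmanager.sh", "start-foreground"])}}.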

> Remove JobManager local mode
> 
>
> Key: FLINK-6487
> URL: https://issues.apache.org/jira/browse/FLINK-6487
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Stephan Ewen
>
> We should remove the "local" mode from the JobManager.
> Currently, the JobManager has the strange "local" mode where it also starts 
> an embedded Task Manager.
> I think that mode has caused confusion / problems:
>   - No TaskManagers can join the cluster
>   - TaskManagers do not support queryable state
>   - It is redundant code to maintain
> At the same time, the mode does not help at all:
>   - The MiniCluster does not use that mode
>   - Starting from scripts, the {{start-cluster.sh}} works out of the box, 
> creating a proper local cluster, but in two processes, rather than one.





[jira] [Comment Edited] (FLINK-6487) Remove JobManager local mode

2017-05-08 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000909#comment-16000909
 ] 

Patrick Lucas edited comment on FLINK-6487 at 5/8/17 3:28 PM:
--

One nice reason to have local mode is for docker, so users can, for example, 
run {{docker run flink local}} to get a local single-taskmanager cluster. (In 
[{{docker-entrypoint.sh}}|https://github.com/docker-flink/docker-flink/blob/e92920ff33107e6f01abaddf7b0195570f4d06e1/docker-entrypoint.sh#L56]
 this is currently turned into a call to {{jobmanager.sh start-foreground 
local}}).

Right now, {{start-cluster.sh}} has the flaws that it doesn't have an option to 
run in the foreground and there is no real process management for the two 
processes it spawns.

I think to properly replicate the behavior of {{jobmanager.sh start-foreground 
local}} we either need to inline something like 
[supervisord|http://supervisord.org/], making sure to multiplex [the processes' 
stdouts|https://github.com/Supervisor/supervisor/issues/511] or have the 
jobmanager fork (to keep the two-process paradigm).


was (Author: plucas):
One nice reason to have local mode is for docker, so users can, for example, 
run {{docker run flink local}} to get a local single-taskmanager cluster. (In 
[{{docker-entrypoint.sh}}|https://github.com/docker-flink/docker-flink/blob/e92920ff33107e6f01abaddf7b0195570f4d06e1/docker-entrypoint.sh#L56]
 this is turned into a call to {{jobmanager.sh start-foreground local}}.

Right now, {{start-cluster.sh}} has the flaws that it doesn't have an option to 
run in the foreground and there is no real process management for the two 
processes it spawns.

I think to properly replicate the behavior of {{jobmanager.sh start-foreground 
local}} we either need to inline something like 
[supervisord|http://supervisord.org/], making sure to multiplex [the processes' 
stdouts|https://github.com/Supervisor/supervisor/issues/511] or have the 
jobmanager fork (to keep the two-process paradigm).

> Remove JobManager local mode
> 
>
> Key: FLINK-6487
> URL: https://issues.apache.org/jira/browse/FLINK-6487
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Stephan Ewen
>
> We should remove the "local" mode from the JobManager.
> Currently, the JobManager has the strange "local" mode where it also starts 
> an embedded Task Manager.
> I think that mode has caused confusion / problems:
>   - No TaskManagers can join the cluster
>   - TaskManagers do not support queryable state
>   - It is redundant code to maintain
> At the same time, the mode does not help at all:
>   - The MiniCluster does not use that mode
>   - Starting from scripts, the {{start-cluster.sh}} works out of the box, 
> creating a proper local cluster, but in two processes, rather than one.





[jira] [Commented] (FLINK-6487) Remove JobManager local mode

2017-05-08 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000909#comment-16000909
 ] 

Patrick Lucas commented on FLINK-6487:
--

One nice reason to have local mode is for docker, so users can, for example, 
run {{docker run flink local}} to get a local single-taskmanager cluster. (In 
[{{docker-entrypoint.sh}}|https://github.com/docker-flink/docker-flink/blob/e92920ff33107e6f01abaddf7b0195570f4d06e1/docker-entrypoint.sh#L56]
 this is turned into a call to {{jobmanager.sh start-foreground local}}.

Right now, {{start-cluster.sh}} has the flaws that it doesn't have an option to 
run in the foreground and there is no real process management for the two 
processes it spawns.

I think to properly replicate the behavior of {{jobmanager.sh start-foreground 
local}} we either need to inline something like 
[supervisord|http://supervisord.org/], making sure to multiplex [the processes' 
stdouts|https://github.com/Supervisor/supervisor/issues/511] or have the 
jobmanager fork (to keep the two-process paradigm).

> Remove JobManager local mode
> 
>
> Key: FLINK-6487
> URL: https://issues.apache.org/jira/browse/FLINK-6487
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Stephan Ewen
>
> We should remove the "local" mode from the JobManager.
> Currently, the JobManager has the strange "local" mode where it also starts 
> an embedded Task Manager.
> I think that mode has caused confusion / problems:
>   - No TaskManagers can join the cluster
>   - TaskManagers do not support queryable state
>   - It is redundant code to maintain
> At the same time, the mode does not help at all:
>   - The MiniCluster does not use that mode
>   - Starting from scripts, the {{start-cluster.sh}} works out of the box, 
> creating a proper local cluster, but in two processes, rather than one.





[jira] [Closed] (FLINK-5460) Add documentation how to use Flink with Docker

2017-05-04 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas closed FLINK-5460.

   Resolution: Duplicate
Fix Version/s: (was: 1.3.0)
   (was: 1.2.0)

> Add documentation how to use Flink with Docker
> --
>
> Key: FLINK-5460
> URL: https://issues.apache.org/jira/browse/FLINK-5460
> Project: Flink
>  Issue Type: Sub-task
>  Components: Docker, Documentation
>Reporter: Stephan Ewen
>
> {{docs/setup/docker.md}}





[jira] [Commented] (FLINK-6399) Update default Hadoop download version

2017-05-04 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996641#comment-15996641
 ] 

Patrick Lucas commented on FLINK-6399:
--

This duplicates FLINK-6021. (feel free to close either in favor of the other)

> Update default Hadoop download version
> --
>
> Key: FLINK-6399
> URL: https://issues.apache.org/jira/browse/FLINK-6399
> Project: Flink
>  Issue Type: Bug
>  Components: Project Website
>Reporter: Greg Hogan
>Assignee: mingleizhang
>
> [Update|http://flink.apache.org/downloads.html] "If you don’t want to do 
> this, pick the Hadoop 1 version." since Hadoop 1 versions are no longer 
> provided.





[jira] [Updated] (FLINK-5460) Add documentation how to use Flink with Docker

2017-05-04 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas updated FLINK-5460:
-
Component/s: Docker

> Add documentation how to use Flink with Docker
> --
>
> Key: FLINK-5460
> URL: https://issues.apache.org/jira/browse/FLINK-5460
> Project: Flink
>  Issue Type: Sub-task
>  Components: Docker, Documentation
>Reporter: Stephan Ewen
> Fix For: 1.2.0, 1.3.0
>
>
> {{docs/setup/docker.md}}





[jira] [Commented] (FLINK-6369) Better support for overlay networks

2017-04-24 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981409#comment-15981409
 ] 

Patrick Lucas commented on FLINK-6369:
--

I think there are a number of solutions that fix these problems without 
compromising any preexisting SSL support within Flink. For instance, having 
Taskmanagers register their IP with the Jobmanager by default (as was, afaik, 
the behavior in a fairly recent version of Flink) but allowing that to be overridden.

Also, relying on SSL hostname verification in an environment like Kubernetes 
seems pretty dubious. Even if you have a wildcard cert to use, you'd have to 
mount or ship the cert and key to _every Flink container_. At that point, why 
not just also include a trust store that contained only a single cert the whole 
cluster uses, and ignore hostname verification?

For external communication with the cluster: absolutely. It totally makes sense 
to have a "real" cert in front of the web UI, for instance, but then it's up to 
you to have the CN/alt names set up appropriately. But for intra-cluster 
communication I think it should be possible and supported to use 
hostname-valid certs, while the default behavior keeps things simple for 
the far more common case of running without SSL.

> Better support for overlay networks
> ---
>
> Key: FLINK-6369
> URL: https://issues.apache.org/jira/browse/FLINK-6369
> Project: Flink
>  Issue Type: Improvement
>  Components: Docker, Network
>Affects Versions: 1.2.0
>Reporter: Patrick Lucas
> Fix For: 1.3.1
>
>
> Running Flink in an environment that utilizes an overlay network 
> (containerized environments like Kubernetes or Docker Compose, or cloud 
> platforms like AWS VPC) poses various challenges related to networking.
> The core problem is that in these environments, applications are frequently 
> addressed by a name different from that with which the application sees 
> itself.
> For instance, it is plausible that the Flink UI (served by the Jobmanager) is 
> accessed via an ELB, which poses a problem in HA mode when the non-leader UI 
> returns an HTTP redirect to the leader—but the user may not be able to 
> connect directly to the leader.
> Or, if a user is using [Docker 
> Compose|https://github.com/apache/flink/blob/aa21f853ab0380ec1f68ae1d0b7c8d9268da4533/flink-contrib/docker-flink/docker-compose.yml],
>  they cannot submit a job via the CLI since there is a mismatch between the 
> name used to address the Jobmanager and what the Jobmanager perceives as its 
> hostname. (see \[1] below for more detail)
> 
> h3. Problems and proposed solutions
> There are four instances of this issue that I've run into so far:
> h4. Jobmanagers must be addressed by the same name they are configured with 
> due to limitations of Akka
> Akka enforces that messages it receives are addressed with the hostname it is 
> configured with. Newer versions of Akka (>= 2.4) than what Flink uses 
> (2.3-custom) have support for accepting messages with the "wrong" hostname, 
> but it is limited to a single "external" hostname.
> In many environments, it is likely that not all parties that want to connect 
> to the Jobmanager have the same way of addressing it (e.g. the ELB example 
> above). Other similarly-used protocols like HTTP generally don't have this 
> restriction: if you connect on a socket and send a well-formed message, the 
> system assumes that it is the desired recipient.
> One solution is to not use Akka at all when communicating with the cluster 
> from the outside, perhaps using an HTTP API instead. This would be somewhat 
> involved, and probably best left as a longer-term goal.
> A more immediate solution would be to override this behavior within Flakka, 
> the custom fork of Akka currently in use by Flink. I'm not sure how much 
> effort this would take.
> h4. The Jobmanager needs to be able to address the Taskmanagers for e.g. 
> metrics collection
> Having the Taskmanagers register themselves by IP is probably the best 
> solution here. It's a reasonable assumption that IPs can always be used for 
> communication between the nodes of a single cluster. Asking that each 
> Taskmanager container have a resolvable hostname is unreasonable.
> h4. Jobmanagers in HA mode send HTTP redirects to URLs that aren't externally 
> resolvable/routable
> If multiple Jobmanagers are used in HA mode, HTTP requests to non-leaders 
> (such as if you put a Kubernetes Service in front of all Jobmanagers in a 
> cluster) get redirected to the (supposed) hostname of the leader, but this is 
> potentially unresolvable/unroutable externally.
> Enabling non-leader Jobmanagers to proxy API calls to the leader would solve 
> this. The non-leaders could even serve static asset requests (e.g. for css or 
> js files) directly.
> h4. Queryable state 

[jira] [Created] (FLINK-6369) Better support for overlay networks

2017-04-24 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-6369:


 Summary: Better support for overlay networks
 Key: FLINK-6369
 URL: https://issues.apache.org/jira/browse/FLINK-6369
 Project: Flink
  Issue Type: Improvement
  Components: Docker, Network
Affects Versions: 1.2.0
Reporter: Patrick Lucas


Running Flink in an environment that utilizes an overlay network (containerized 
environments like Kubernetes or Docker Compose, or cloud platforms like AWS 
VPC) poses various challenges related to networking.

The core problem is that in these environments, applications are frequently 
addressed by a name different from that with which the application sees itself.

For instance, it is plausible that the Flink UI (served by the Jobmanager) is 
accessed via an ELB, which poses a problem in HA mode when the non-leader UI 
returns an HTTP redirect to the leader—but the user may not be able to connect 
directly to the leader.

Or, if a user is using [Docker 
Compose|https://github.com/apache/flink/blob/aa21f853ab0380ec1f68ae1d0b7c8d9268da4533/flink-contrib/docker-flink/docker-compose.yml],
 they cannot submit a job via the CLI since there is a mismatch between the 
name used to address the Jobmanager and what the Jobmanager perceives as its 
hostname. (see \[1] below for more detail)



h3. Problems and proposed solutions

There are four instances of this issue that I've run into so far:

h4. Jobmanagers must be addressed by the same name they are configured with due 
to limitations of Akka

Akka enforces that messages it receives are addressed with the hostname it is 
configured with. Newer versions of Akka (>= 2.4) than what Flink uses 
(2.3-custom) have support for accepting messages with the "wrong" hostname, but 
it is limited to a single "external" hostname.

In many environments, it is likely that not all parties that want to connect to 
the Jobmanager have the same way of addressing it (e.g. the ELB example above). 
Other similarly-used protocols like HTTP generally don't have this restriction: 
if you connect on a socket and send a well-formed message, the system assumes 
that it is the desired recipient.

One solution is to not use Akka at all when communicating with the cluster from 
the outside, perhaps using an HTTP API instead. This would be somewhat 
involved, and probably best left as a longer-term goal.

A more immediate solution would be to override this behavior within Flakka, the 
custom fork of Akka currently in use by Flink. I'm not sure how much effort 
this would take.

h4. The Jobmanager needs to be able to address the Taskmanagers for e.g. 
metrics collection

Having the Taskmanagers register themselves by IP is probably the best solution 
here. It's a reasonable assumption that IPs can always be used for 
communication between the nodes of a single cluster. Asking that each 
Taskmanager container have a resolvable hostname is unreasonable.

h4. Jobmanagers in HA mode send HTTP redirects to URLs that aren't externally 
resolvable/routable

If multiple Jobmanagers are used in HA mode, HTTP requests to non-leaders (such 
as if you put a Kubernetes Service in front of all Jobmanagers in a cluster) 
get redirected to the (supposed) hostname of the leader, but this is 
potentially unresolvable/unroutable externally.

Enabling non-leader Jobmanagers to proxy API calls to the leader would solve 
this. The non-leaders could even serve static asset requests (e.g. for css or 
js files) directly.

h4. Queryable state requests involve direct communication with Taskmanagers

Currently, queryable state requests involve communication between the client 
and the Jobmanager (for key partitioning lookups) and between the client and 
all Taskmanagers.

If the client is inside the network (as would be common in production use-cases 
where high-volume lookups are required) this is a non-issue, but problems crop 
up if the client is outside the network.

For the communication with the Jobmanager, a similar solution as above can be 
used: if all Jobmanagers can service all key partitioning lookup requests (e.g. 
by proxying) then a simple Service can be used.

The story is a bit different for the Taskmanagers. The partitioning lookup to 
the Jobmanager would return the name of the particular Taskmanager that owned 
the desired data, but that name (likely an IP, as proposed in the second 
section above) is not necessarily resolvable/routable from the client.

In the context of Kubernetes, where individual containers are generally not 
addressable, a very ugly solution would involve creating a Service for each 
Taskmanager, then cleverly configuring things such that the same name could be 
used to address a specific Taskmanager both inside and outside the network. \[2]

A much nicer solution would be, like in the previous section, to enable 
Taskmanagers to proxy any queryable state lookup to the appropriate 

[jira] [Updated] (FLINK-6369) Better support for overlay networks

2017-04-24 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas updated FLINK-6369:
-
Fix Version/s: 1.3.1

> Better support for overlay networks
> ---
>
> Key: FLINK-6369
> URL: https://issues.apache.org/jira/browse/FLINK-6369
> Project: Flink
>  Issue Type: Improvement
>  Components: Docker, Network
>Affects Versions: 1.2.0
>Reporter: Patrick Lucas
> Fix For: 1.3.1
>
>
> Running Flink in an environment that utilizes an overlay network 
> (containerized environments like Kubernetes or Docker Compose, or cloud 
> platforms like AWS VPC) poses various challenges related to networking.
> The core problem is that in these environments, applications are frequently 
> addressed by a name different from that with which the application sees 
> itself.
> For instance, it is plausible that the Flink UI (served by the Jobmanager) is 
> accessed via an ELB, which poses a problem in HA mode when the non-leader UI 
> returns an HTTP redirect to the leader—but the user may not be able to 
> connect directly to the leader.
> Or, if a user is using [Docker 
> Compose|https://github.com/apache/flink/blob/aa21f853ab0380ec1f68ae1d0b7c8d9268da4533/flink-contrib/docker-flink/docker-compose.yml],
>  they cannot submit a job via the CLI since there is a mismatch between the 
> name used to address the Jobmanager and what the Jobmanager perceives as its 
> hostname. (see \[1] below for more detail)
> 
> h3. Problems and proposed solutions
> There are four instances of this issue that I've run into so far:
> h4. Jobmanagers must be addressed by the same name they are configured with 
> due to limitations of Akka
> Akka enforces that messages it receives are addressed with the hostname it is 
> configured with. Newer versions of Akka (>= 2.4) than what Flink uses 
> (2.3-custom) have support for accepting messages with the "wrong" hostname, 
> but it is limited to a single "external" hostname.
> In many environments, it is likely that not all parties that want to connect 
> to the Jobmanager have the same way of addressing it (e.g. the ELB example 
> above). Other similarly-used protocols like HTTP generally don't have this 
> restriction: if you connect on a socket and send a well-formed message, the 
> system assumes that it is the desired recipient.
> One solution is to not use Akka at all when communicating with the cluster 
> from the outside, perhaps using an HTTP API instead. This would be somewhat 
> involved, and probably best left as a longer-term goal.
> A more immediate solution would be to override this behavior within Flakka, 
> the custom fork of Akka currently in use by Flink. I'm not sure how much 
> effort this would take.
> h4. The Jobmanager needs to be able to address the Taskmanagers for e.g. 
> metrics collection
> Having the Taskmanagers register themselves by IP is probably the best 
> solution here. It's a reasonable assumption that IPs can always be used for 
> communication between the nodes of a single cluster. Asking that each 
> Taskmanager container have a resolvable hostname is unreasonable.
> h4. Jobmanagers in HA mode send HTTP redirects to URLs that aren't externally 
> resolvable/routable
> If multiple Jobmanagers are used in HA mode, HTTP requests to non-leaders 
> (such as if you put a Kubernetes Service in front of all Jobmanagers in a 
> cluster) get redirected to the (supposed) hostname of the leader, but this is 
> potentially unresolvable/unroutable externally.
> Enabling non-leader Jobmanagers to proxy API calls to the leader would solve 
> this. The non-leaders could even serve static asset requests (e.g. for css or 
> js files) directly.
> h4. Queryable state requests involve direct communication with Taskmanagers
> Currently, queryable state requests involve communication between the client 
> and the Jobmanager (for key partitioning lookups) and between the client and 
> all Taskmanagers.
> If the client is inside the network (as would be common in production 
> use-cases where high-volume lookups are required) this is a non-issue, but 
> problems crop up if the client is outside the network.
> For the communication with the Jobmanager, a similar solution as above can be 
> used: if all Jobmanagers can service all key partitioning lookup requests 
> (e.g. by proxying) then a simple Service can be used.
> The story is a bit different for the Taskmanagers. The partitioning lookup to 
> the Jobmanager would return the name of the particular Taskmanager that owned 
> the desired data, but that name (likely an IP, as proposed in the second 
> section above) is not necessarily resolvable/routable from the client.
> In the context of Kubernetes, where individual containers are generally not 
> addressable, a very ugly solution would involve creating a Service 

[jira] [Created] (FLINK-6330) Improve Docker documentation

2017-04-19 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-6330:


 Summary: Improve Docker documentation
 Key: FLINK-6330
 URL: https://issues.apache.org/jira/browse/FLINK-6330
 Project: Flink
  Issue Type: Bug
  Components: Docker
Affects Versions: 1.2.0
Reporter: Patrick Lucas
Assignee: Patrick Lucas
 Fix For: 1.2.2


The "Docker" page in the docs exists but is blank.

Add something useful here, including references to the official images that 
should exist once 1.2.1 is released, and add a brief "Kubernetes" page as well, 
referencing the [helm chart|https://github.com/docker-flink/examples]. 





[jira] [Commented] (FLINK-6322) Mesos task labels

2017-04-18 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973018#comment-15973018
 ] 

Patrick Lucas commented on FLINK-6322:
--

This could likely be modeled similarly to {{yarn.tags}}.

> Mesos task labels
> -
>
> Key: FLINK-6322
> URL: https://issues.apache.org/jira/browse/FLINK-6322
> Project: Flink
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Eron Wright 
>Priority: Minor
>
> Task labels serve many purposes in Mesos, such a tagging tasks for 
> log-aggregation purposes.   
> I propose a new configuration setting for a list of 'key=value' labels to be 
> applied to TM instances.





[jira] [Updated] (FLINK-6300) PID1 of docker images does not behave correctly

2017-04-18 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas updated FLINK-6300:
-
Affects Version/s: (was: 2.0.0)
   1.2.0

> PID1 of docker images does not behave correctly
> ---
>
> Key: FLINK-6300
> URL: https://issues.apache.org/jira/browse/FLINK-6300
> Project: Flink
>  Issue Type: Bug
>  Components: Docker
>Affects Versions: 1.2.0, 1.1.4
> Environment: all
>Reporter: kathleen sharp
>Assignee: Patrick Lucas
>Priority: Minor
>
> When running the task manager and job manager docker images the process with 
> PID1 is a bash script.
> There is a problem in using bash as the PID1 process in a docker
> container as docker sends SIGTERM, but bash doesn't send this to its
> child processes.
> This means for example that if a container was ever killed and a child
> process had a file open then the file may get corrupted.
> It's covered in more detail in a blog post here:
> https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
> From the mailing list (Nico):
> "Some background:
> Although docker-entrypoint.sh uses "exec" to run succeeding bash scripts for
> jobmanager.sh and taskmanager.sh, respectively, and thus replaces itself with
> these scripts, they do not seem to use exec themselves for foreground
> processes and thus may run into the problem you described.
> I may be wrong, but I did not find any other fallback to handle this in the
> current code base."
> Potentially useful information:
> dockerd version 1.13 added an init flag:
> "You can use the --init flag to indicate that an init process should be used 
> as the PID 1 in the container. Specifying an init process ensures the usual 
> responsibilities of an init system, such as reaping zombie processes, are 
> performed inside the created container."
> from:
> https://docs.docker.com/engine/reference/run/#restart-policies---restart
> perhaps the fix could be just to update readme for these images to specify to 
> use this flag.
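The role {{exec}} plays here can be shown with a plain shell sketch (independent of the Flink images): when a shell {{exec}}s its command, the command takes over the shell's PID, so a TERM sent to that PID reaches the real process directly instead of stopping at an intermediate bash that doesn't forward it.

```shell
# Sketch: 'exec' makes the child *be* the process that gets the signal.
# Without exec, the outer sh would receive the TERM and sleep(1) could be
# left behind; with exec, sleep has replaced sh and handles TERM itself.
sh -c 'exec sleep 10' &
pid=$!
sleep 1
kill -TERM "$pid"    # delivered straight to sleep, which sh exec'd into
wait "$pid"
status=$?            # 128 + 15 (SIGTERM) = 143
echo "$status"
```

Running a container under {{docker run --init \u2026}} gets the same effect without script changes, since the injected init process reaps children and forwards signals.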





[jira] [Assigned] (FLINK-6300) PID1 of docker images does not behave correctly

2017-04-18 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas reassigned FLINK-6300:


Assignee: Patrick Lucas

> PID1 of docker images does not behave correctly
> ---
>
> Key: FLINK-6300
> URL: https://issues.apache.org/jira/browse/FLINK-6300
> Project: Flink
>  Issue Type: Bug
>  Components: Docker
>Affects Versions: 2.0.0, 1.1.4
> Environment: all
>Reporter: kathleen sharp
>Assignee: Patrick Lucas
>Priority: Minor
>
> When running the task manager and job manager docker images the process with 
> PID1 is a bash script.
> There is a problem in using bash as the PID1 process in a docker
> container as docker sends SIGTERM, but bash doesn't send this to its
> child processes.
> This means for example that if a container was ever killed and a child
> process had a file open then the file may get corrupted.
> It's covered in more detail in a blog post here:
> https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
> From the mailing list (Nico):
> "Some background:
> Although docker-entrypoint.sh uses "exec" to run succeeding bash scripts for
> jobmanager.sh and taskmanager.sh, respectively, and thus replaces itself with
> these scripts, they do not seem to use exec themselves for foreground
> processes and thus may run into the problem you described.
> I may be wrong, but I did not find any other fallback to handle this in the
> current code base."
> Potentially useful information:
> dockerd version 1.1.3 added an init flag:
> "You can use the --init flag to indicate that an init process should be used 
> as the PID 1 in the container. Specifying an init process ensures the usual 
> responsibilities of an init system, such as reaping zombie processes, are 
> performed inside the created container."
> from:
> https://docs.docker.com/engine/reference/run/#restart-policies---restart
> perhaps the fix could be just to update readme for these images to specify to 
> use this flag.





[jira] [Closed] (FLINK-6308) Task managers are not attaching to job manager on macos

2017-04-18 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas closed FLINK-6308.

   Resolution: Fixed
Fix Version/s: 1.2.1

`docker-compose.yml` is already fixed in master, and [~iemejia] has a branch to 
remove the Bluemix docker-compose file and related references.

> Task managers are not attaching to job manager on macos
> ---
>
> Key: FLINK-6308
> URL: https://issues.apache.org/jira/browse/FLINK-6308
> Project: Flink
>  Issue Type: Bug
>  Components: Docker
>Reporter: Mateusz Zakarczemny
> Fix For: 1.2.1
>
>
> I'm using flink 1.2.0. On macOS task managers were not joining cluster. I was 
> able to fix that by removing quotes from docker-compose:
> {code}
> diff --git a/flink-contrib/docker-flink/docker-compose-bluemix.yml 
> b/flink-contrib/docker-flink/docker-compose-bluemix.yml
> index b667a0d89a..55c161766a 100644
> --- a/flink-contrib/docker-flink/docker-compose-bluemix.yml
> +++ b/flink-contrib/docker-flink/docker-compose-bluemix.yml
> @@ -23,11 +23,11 @@ services:
>jobmanager:
>  #image example:  registry.eu-gb.bluemix.net/markussaidhiho/flink
>  image: ${IMAGENAME}
> -container_name: "jobmanager"
> +container_name: jobmanager
>  expose:
> -  - "6123"
> +  - 6123
>  ports:
> -  - "48081:8081"
> +  - 48081:8081
>  command: jobmanager
>taskmanager:
> @@ -41,4 +41,4 @@ services:
>  links:
>- "jobmanager:jobmanager"
>  environment:
> -  - JOB_MANAGER_RPC_ADDRESS="jobmanager"
> +  - JOB_MANAGER_RPC_ADDRESS=jobmanager
> diff --git a/flink-contrib/docker-flink/docker-compose.yml 
> b/flink-contrib/docker-flink/docker-compose.yml
> index 6a1335360d..0a644681f3 100644
> --- a/flink-contrib/docker-flink/docker-compose.yml
> +++ b/flink-contrib/docker-flink/docker-compose.yml
> @@ -27,7 +27,7 @@ services:
>- "48081:8081"
>  command: jobmanager
>  environment:
> -  - JOB_MANAGER_RPC_ADDRESS="jobmanager"
> +  - JOB_MANAGER_RPC_ADDRESS=jobmanager
>taskmanager:
>  image: flink
> @@ -40,4 +40,4 @@ services:
>  links:
>- "jobmanager:jobmanager"
>  environment:
> -  - JOB_MANAGER_RPC_ADDRESS="jobmanager"
> +  - JOB_MANAGER_RPC_ADDRESS=jobmanager
> {code}
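The underlying behavior is easy to reproduce outside compose (a sketch using env(1); the variable name is the one from the compose file): entries in the list form of {{environment:}} are not shell-parsed, so quotes become part of the value, and {{"jobmanager"}} with literal quotes is not a resolvable hostname.

```shell
# Sketch: like docker-compose's list-form environment entries, env(1) does
# not strip quotes -- they end up inside the variable's value.
quoted=$(env 'JOB_MANAGER_RPC_ADDRESS="jobmanager"' sh -c 'printf %s "$JOB_MANAGER_RPC_ADDRESS"')
plain=$(env 'JOB_MANAGER_RPC_ADDRESS=jobmanager' sh -c 'printf %s "$JOB_MANAGER_RPC_ADDRESS"')
echo "quoted=<$quoted> plain=<$plain>"
# prints: quoted=<"jobmanager"> plain=<jobmanager>
```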





[jira] [Created] (FLINK-6194) More broken links in docs

2017-03-27 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-6194:


 Summary: More broken links in docs
 Key: FLINK-6194
 URL: https://issues.apache.org/jira/browse/FLINK-6194
 Project: Flink
  Issue Type: Bug
Reporter: Patrick Lucas
Assignee: Patrick Lucas


My script noticed a few broken links that made it into the docs. I'll fix them 
up.





[jira] [Closed] (FLINK-5801) Queryable State from Scala job/client fails with key of type Long

2017-03-15 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas closed FLINK-5801.

Resolution: Not A Bug

> Queryable State from Scala job/client fails with key of type Long
> -
>
> Key: FLINK-5801
> URL: https://issues.apache.org/jira/browse/FLINK-5801
> Project: Flink
>  Issue Type: Bug
>  Components: Queryable State
>Affects Versions: 1.2.0
> Environment: Flink 1.2.0
> Scala 2.10
>Reporter: Patrick Lucas
>Assignee: Ufuk Celebi
> Attachments: OrderFulfillment.scala, OrderFulfillmentStateQuery.scala
>
>
> While working on a demo Flink job, to try out Queryable State, I exposed some 
> state of type Long -> custom class via the Query server. However, the query 
> server returned an exception when I tried to send a query:
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Failed to query state 
> backend for query 0. Caused by: java.io.IOException: Unable to deserialize 
> key and namespace. This indicates a mismatch in the key/namespace serializers 
> used by the KvState instance and this access.
>   at 
> org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer.deserializeKeyAndNamespace(KvStateRequestSerializer.java:392)
>   at 
> org.apache.flink.runtime.state.heap.AbstractHeapState.getSerializedValue(AbstractHeapState.java:130)
>   at 
> org.apache.flink.runtime.query.netty.KvStateServerHandler$AsyncKvStateQueryTask.run(KvStateServerHandler.java:220)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
>   at 
> org.apache.flink.runtime.util.DataInputDeserializer.readLong(DataInputDeserializer.java:217)
>   at 
> org.apache.flink.api.common.typeutils.base.LongSerializer.deserialize(LongSerializer.java:69)
>   at 
> org.apache.flink.api.common.typeutils.base.LongSerializer.deserialize(LongSerializer.java:27)
>   at 
> org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer.deserializeKeyAndNamespace(KvStateRequestSerializer.java:379)
>   ... 7 more
>   at 
> org.apache.flink.runtime.query.netty.KvStateServerHandler$AsyncKvStateQueryTask.run(KvStateServerHandler.java:257)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I banged my head against this for a while, then per [~jgrier]'s suggestion I 
> tried simply changing the key from Long to String (modifying the two 
> {{keyBy}} calls and the {{keySerializer}} {{TypeHint}} in the attached code) 
> and it started working perfectly.
> cc [~uce]





[jira] [Resolved] (FLINK-5634) Flink should not always redirect stdout to a file.

2017-03-15 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas resolved FLINK-5634.
--
Resolution: Duplicate

Closing as essentially a dupe of FLINK-4326.

> Flink should not always redirect stdout to a file.
> --
>
> Key: FLINK-5634
> URL: https://issues.apache.org/jira/browse/FLINK-5634
> Project: Flink
>  Issue Type: Bug
>  Components: Docker
>Affects Versions: 1.2.0
>Reporter: Jamie Grier
>Assignee: Jamie Grier
>
> Flink always redirects stdout to a file.  While often convenient this isn't 
> always what people want.  The most obvious case of this is a Docker 
> deployment.
> It should be possible to have Flink log to stdout.
> Here is a PR for this:  https://github.com/apache/flink/pull/3204





[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry

2017-03-14 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923956#comment-15923956
 ] 

Patrick Lucas commented on FLINK-3026:
--

Personally I like the idea of having as little code as possible in the 
docker-entrypoint.sh script. I think it should be designed to basically work in 
all Flink versions going forward. I suppose that would mean that we should have 
some scripts on the PATH named 'jobmanager' and 'taskmanager' so that 
docker-entrypoint.sh can effectively just be:

{code}
#!/bin/bash -e

# Drop root privs code here

exec "$@"
{code}
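A hypothetical sketch of one such wrapper script (the name `jobmanager`, the `start-foreground` mode, and `FLINK_HOME` are assumptions here, not settled interfaces; the real command is echoed rather than exec'd so the sketch runs standalone):

```shell
#!/bin/bash
# Hypothetical 'jobmanager' wrapper placed on the PATH so that the minimal
# entrypoint above can stay a bare 'exec "$@"'.
FLINK_HOME="${FLINK_HOME:-/opt/flink}"

jobmanager_cmd() {
    # In a real image this would be:
    #   exec "$FLINK_HOME/bin/jobmanager.sh" start-foreground "$@"
    # Echoed here so the sketch runs without a Flink installation.
    echo "exec $FLINK_HOME/bin/jobmanager.sh start-foreground $*"
}

jobmanager_cmd "$@"
```

With a wrapper like this on the PATH, `docker run flink jobmanager` works without any dispatch logic in the entrypoint itself.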

And absolutely these builds will be automated—see my note in README.md about 
generating the .travis.yml config. I just built some images as a proof of 
concept and to think about how tagging will work.

We should include a script like the generate-stackbrew-library.sh one that many 
official Dockerfile repos include to generate the file that is referenced by 
the 
[docker-library/official-images|https://github.com/docker-library/official-images/]
 repo to actually build and tag all the right permutations.

And regarding this repo living in my personal account, I consider that just a 
temporary arrangement until we can get these included in docker-library. You're 
welcome to clone the repo if you want to experiment with kicking off builds.

Finally, could you expound a little more about your opinion about the 
relationship between a Dockerfile included in apache/flink and these official 
images? I'm not sure if I see the benefit of trying to keep it around if we 
have official, publicly available, well-documented reference Dockerfiles and 
images available. The one argument I see is building a Docker image from your 
work tree if you're a developer, but in that case I think it's a lot less 
important to maintain it very actively. It could even just be a Dockerfile that 
bases itself on the official image and then replaces the Flink release within 
with your work tree!

> Publish the flink docker container to the docker registry
> -
>
> Key: FLINK-3026
> URL: https://issues.apache.org/jira/browse/FLINK-3026
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Docker
>Reporter: Omer Katz
>Assignee: Patrick Lucas
>  Labels: Deployment, Docker
>
> There's a dockerfile that can be used to build a docker container already in 
> the repository. It'd be awesome to just be able to pull it instead of 
> building it ourselves.
> The dockerfile can be found at 
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
> It also doesn't point to the latest version of Flink which I fixed in 
> https://github.com/apache/flink/pull/1366





[jira] [Comment Edited] (FLINK-3026) Publish the flink docker container to the docker registry

2017-03-12 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906494#comment-15906494
 ] 

Patrick Lucas edited comment on FLINK-3026 at 3/12/17 11:31 AM:


I've pushed some of the images built from the Dockerfiles in that repo 
@[aafb1e4b|https://github.com/patricklucas/docker-flink/tree/aafb1e4bc4a1d6d1fe706cbe457376385e3f8d02]
 (after which were some changes by [~iemejia] that I want to discuss) to the 
Docker Hub repo [plucas/flink|https://hub.docker.com/r/plucas/flink/tags/], 
along with the relevant tags.

Specifically, I pushed:

* 1.1.4-hadoop27-scala_2.11-alpine
* 1.2.0-hadoop26-scala_2.10-alpine
* 1.2.0-hadoop26-scala_2.11-alpine
* 1.2.0-hadoop27-scala_2.10-alpine
* 1.2.0-hadoop27-scala_2.11-alpine

to demonstrate various tag permutations, e.g.:

* latest-alpine -> 1.2.0-hadoop27-scala_2.11-alpine
* 1.1-alpine -> 1.1.4-hadoop27-scala_2.11-alpine
* 1.2-hadoop26-alpine -> 1.2.0-hadoop26-scala_2.11-alpine
* 1.2-scala_2.10-alpine -> 1.2.0-hadoop27-scala_2.10-alpine


was (Author: plucas):
I've pushed some of the images built from the Dockerfiles in that repo 
@[aafb1e4b|https://github.com/patricklucas/docker-flink/commit/aafb1e4bc4a1d6d1fe706cbe457376385e3f8d02]
 (after which were some changes by [~iemejia] that I want to discuss) to the 
Docker Hub repo [plucas/flink|https://hub.docker.com/r/plucas/flink/tags/], 
along with the relevant tags.

Specifically, I pushed:

* 1.1.4-hadoop27-scala_2.11-alpine
* 1.2.0-hadoop26-scala_2.10-alpine
* 1.2.0-hadoop26-scala_2.11-alpine
* 1.2.0-hadoop27-scala_2.10-alpine
* 1.2.0-hadoop27-scala_2.11-alpine

to demonstrate various tag permutations, e.g.:

* latest-alpine -> 1.2.0-hadoop27-scala_2.11-alpine
* 1.1-alpine -> 1.1.4-hadoop27-scala_2.11-alpine
* 1.2-hadoop26-alpine -> 1.2.0-hadoop26-scala_2.11-alpine
* 1.2-scala_2.10-alpine -> 1.2.0-hadoop27-scala_2.10-alpine

> Publish the flink docker container to the docker registry
> -
>
> Key: FLINK-3026
> URL: https://issues.apache.org/jira/browse/FLINK-3026
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Docker
>Reporter: Omer Katz
>Assignee: Patrick Lucas
>  Labels: Deployment, Docker
>
> There's a dockerfile that can be used to build a docker container already in 
> the repository. It'd be awesome to just be able to pull it instead of 
> building it ourselves.
> The dockerfile can be found at 
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
> It also doesn't point to the latest version of Flink which I fixed in 
> https://github.com/apache/flink/pull/1366





[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry

2017-03-12 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906494#comment-15906494
 ] 

Patrick Lucas commented on FLINK-3026:
--

I've pushed some of the images built from the Dockerfiles in that repo 
@[aafb1e4b|https://github.com/patricklucas/docker-flink/commit/aafb1e4bc4a1d6d1fe706cbe457376385e3f8d02]
 (after which were some changes by [~iemejia] that I want to discuss) to the 
Docker Hub repo [plucas/flink|https://hub.docker.com/r/plucas/flink/tags/], 
along with the relevant tags.

Specifically, I pushed:

* 1.1.4-hadoop27-scala_2.11-alpine
* 1.2.0-hadoop26-scala_2.10-alpine
* 1.2.0-hadoop26-scala_2.11-alpine
* 1.2.0-hadoop27-scala_2.10-alpine
* 1.2.0-hadoop27-scala_2.11-alpine

to demonstrate various tag permutations, e.g.:

* latest-alpine -> 1.2.0-hadoop27-scala_2.11-alpine
* 1.1-alpine -> 1.1.4-hadoop27-scala_2.11-alpine
* 1.2-hadoop26-alpine -> 1.2.0-hadoop26-scala_2.11-alpine
* 1.2-scala_2.10-alpine -> 1.2.0-hadoop27-scala_2.10-alpine

> Publish the flink docker container to the docker registry
> -
>
> Key: FLINK-3026
> URL: https://issues.apache.org/jira/browse/FLINK-3026
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Docker
>Reporter: Omer Katz
>Assignee: Patrick Lucas
>  Labels: Deployment, Docker
>
> There's a dockerfile that can be used to build a docker container already in 
> the repository. It'd be awesome to just be able to pull it instead of 
> building it ourselves.
> The dockerfile can be found at 
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
> It also doesn't point to the latest version of Flink which I fixed in 
> https://github.com/apache/flink/pull/1366





[jira] [Created] (FLINK-6021) Downloads page references "Hadoop 1 version" which isn't an option

2017-03-10 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-6021:


 Summary: Downloads page references "Hadoop 1 version" which isn't 
an option
 Key: FLINK-6021
 URL: https://issues.apache.org/jira/browse/FLINK-6021
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Patrick Lucas


The downloads pages says

{quote}
Apache Flink® 1.2.0 is our latest stable release.

You don’t have to install Hadoop to use Flink, but if you plan to use Flink 
with data stored in Hadoop, pick the version matching your installed Hadoop 
version. If you don’t want to do this, pick the Hadoop 1 version.
{quote}

But Hadoop 1 appears to no longer be an available alternative.





[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry

2017-03-09 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903317#comment-15903317
 ] 

Patrick Lucas commented on FLINK-3026:
--

Yes, I gave you contributor permissions. This repo is really just a stop-gap 
until we find a permanent home for these scripts, such as in docker-library. 
I'm fairly convinced that these should be versioned independently of Flink (in 
a separate repo), as they are updated as a consequence of Flink releases and 
are not themselves part of a given release. Moreover, the scripts herein may 
relate to many different "stable" branches of Flink.

And I don't really see how it would be feasible to make the contents of this 
repo somehow "downstream" from the Docker support included in apache/flink as I 
think they serve different purposes. In apache/flink, having a Dockerfile to 
package up the current tree would be useful for development, and as a baseline 
if a user wanted to create their own images from scratch. The docker-flink repo 
meanwhile encodes the generation of all Dockerfile variants for all "stable" 
versions of Flink.

Finally, I think a tenet of this work (Docker support) going forward should be 
that the Dockerfile templates really should not change much over time. Instead, 
we should try to make any changes in functionality in Flink itself, and keep 
the Dockerfiles (and docker-entrypoint.sh scripts) as simple as possible. If 
we're doing things right, this code should very rarely change, except when we 
need to generate Dockerfiles for a new Flink release.

> Publish the flink docker container to the docker registry
> -
>
> Key: FLINK-3026
> URL: https://issues.apache.org/jira/browse/FLINK-3026
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Docker
>Reporter: Omer Katz
>Assignee: Patrick Lucas
>  Labels: Deployment, Docker
>
> There's a dockerfile that can be used to build a docker container already in 
> the repository. It'd be awesome to just be able to pull it instead of 
> building it ourselves.
> The dockerfile can be found at 
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
> It also doesn't point to the latest version of Flink which I fixed in 
> https://github.com/apache/flink/pull/1366





[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry

2017-03-09 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903162#comment-15903162
 ] 

Patrick Lucas commented on FLINK-3026:
--

The new [docker-flink|https://github.com/patricklucas/docker-flink/] repo is up.

> Publish the flink docker container to the docker registry
> -
>
> Key: FLINK-3026
> URL: https://issues.apache.org/jira/browse/FLINK-3026
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Docker
>Reporter: Omer Katz
>Assignee: Patrick Lucas
>  Labels: Deployment, Docker
>
> There's a dockerfile that can be used to build a docker container already in 
> the repository. It'd be awesome to just be able to pull it instead of 
> building it ourselves.
> The dockerfile can be found at 
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
> It also doesn't point to the latest version of Flink which I fixed in 
> https://github.com/apache/flink/pull/1366





[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry

2017-03-09 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902792#comment-15902792
 ] 

Patrick Lucas commented on FLINK-3026:
--

Following from discussion on [#3494|https://github.com/apache/flink/pull/3494], 
it's apparent that more work is required to provide official Flink Docker 
images than simply improving the Dockerfile in the Flink repo.

I explored the official Docker Hub repos and looked at how the first five 
Apache projects I saw, [httpd|https://hub.docker.com/_/httpd/], 
[tomcat|https://hub.docker.com/_/tomcat/], 
[cassandra|https://hub.docker.com/_/cassandra/], 
[maven|https://hub.docker.com/_/maven/], and 
[solr|https://hub.docker.com/_/solr/], have their Dockerfiles hosted.

Common between all of them is that they all have a dedicated git repo to host 
the Dockerfiles, and none are hosted by the Apache project. Three 
([httpd|https://github.com/docker-library/httpd], 
[tomcat|https://github.com/docker-library/tomcat], and 
[cassandra|https://github.com/docker-library/cassandra]) are hosted within the 
[docker-library|https://github.com/docker-library] GitHub org, one 
([solr|https://github.com/docker-solr/docker-solr]) has a dedicated GitHub org 
([docker-solr|https://github.com/docker-solr]), and one 
([maven|https://github.com/carlossg/docker-maven]) is hosted within a Maven 
contributor and ASF member's GitHub account 
([carlossg|https://github.com/carlossg]).

Adding Flink to the docker-library repo is thus the route most in line with 
these projects, though there may be some implications I'm not aware of.

In the mean time, I will create a repo under [my own GitHub 
account|https://github.com/patricklucas] using the Apache license called 
docker-flink and grant [~iemejia] commit access so we can work on the 
Dockerfiles and supporting scripts.

Does this sound like a good starting point [~iemejia] & [~jgrier]?

> Publish the flink docker container to the docker registry
> -
>
> Key: FLINK-3026
> URL: https://issues.apache.org/jira/browse/FLINK-3026
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Docker
>Reporter: Omer Katz
>Assignee: Patrick Lucas
>  Labels: Deployment, Docker
>
> There's a dockerfile that can be used to build a docker container already in 
> the repository. It'd be awesome to just be able to pull it instead of 
> building it ourselves.
> The dockerfile can be found at 
> https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
> It also doesn't point to the latest version of Flink which I fixed in 
> https://github.com/apache/flink/pull/1366



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-5997) Docker build tools should inline GPG key to verify HTTP Flink release download

2017-03-08 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas updated FLINK-5997:
-
Description: See examples 
[here|https://github.com/docker-solr/docker-solr/blob/8afee8bf17a7a3d1fdc453643abfdb5cf2e822d0/6.4/Dockerfile]
 and 
[here|https://github.com/docker-library/official-images/blob/8b10b36d19a8cfba56bf4352c8ae98d3b6697eb3/README.md#init].
  (was: See title)
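The pattern those examples follow can be sketched as a Dockerfile fragment; the version, URLs, and key ID below are placeholders for illustration, not the real release values:

```dockerfile
# Sketch only: inline the release manager's GPG key fingerprint so the image
# build verifies the tarball even if it is downloaded over plain HTTP.
# FLINK_VERSION, the URLs, and GPG_KEY are hypothetical placeholders.
ENV FLINK_VERSION=1.2.0
ENV GPG_KEY=0000000000000000000000000000000000000000

RUN set -ex; \
    wget -O flink.tgz "http://apache.example.org/flink/flink-$FLINK_VERSION-bin.tgz"; \
    wget -O flink.tgz.asc "https://www.apache.org/dist/flink/flink-$FLINK_VERSION-bin.tgz.asc"; \
    export GNUPGHOME="$(mktemp -d)"; \
    gpg --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY"; \
    gpg --batch --verify flink.tgz.asc flink.tgz; \
    rm -rf "$GNUPGHOME" flink.tgz.asc; \
    tar -xzf flink.tgz
```

Fetching the .asc signature over HTTPS and pinning the key fingerprint in the Dockerfile means the tarball itself can safely come from a plain-HTTP mirror.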

> Docker build tools should inline GPG key to verify HTTP Flink release download
> --
>
> Key: FLINK-5997
> URL: https://issues.apache.org/jira/browse/FLINK-5997
> Project: Flink
>  Issue Type: Improvement
>  Components: Docker
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
>
> See examples 
> [here|https://github.com/docker-solr/docker-solr/blob/8afee8bf17a7a3d1fdc453643abfdb5cf2e822d0/6.4/Dockerfile]
>  and 
> [here|https://github.com/docker-library/official-images/blob/8b10b36d19a8cfba56bf4352c8ae98d3b6697eb3/README.md#init].





[jira] [Created] (FLINK-5997) Docker build tools should inline GPG key to verify HTTP Flink release download

2017-03-08 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-5997:


 Summary: Docker build tools should inline GPG key to verify HTTP 
Flink release download
 Key: FLINK-5997
 URL: https://issues.apache.org/jira/browse/FLINK-5997
 Project: Flink
  Issue Type: Improvement
  Components: Docker
Reporter: Patrick Lucas
Assignee: Patrick Lucas


See title





[jira] [Assigned] (FLINK-5635) Improve Docker tooling to make it easier to build images and launch Flink via Docker tools

2017-03-08 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas reassigned FLINK-5635:


Assignee: Patrick Lucas  (was: Jamie Grier)

> Improve Docker tooling to make it easier to build images and launch Flink via 
> Docker tools
> --
>
> Key: FLINK-5635
> URL: https://issues.apache.org/jira/browse/FLINK-5635
> Project: Flink
>  Issue Type: Improvement
>  Components: Docker
>Affects Versions: 1.2.0
>Reporter: Jamie Grier
>Assignee: Patrick Lucas
>
> This is a bit of a catch-all ticket for general improvements to the Flink on 
> Docker experience.
> Things to improve:
>   - Make it possible to build a Docker image from your own flink-dist 
> directory as well as official releases.
>   - Make it possible to override the image name so a user can more easily 
> publish these images to their Docker repository
>   - Provide scripts that show how to properly run on Docker Swarm or similar 
> environments with overlay networking (Kubernetes) without using host 
> networking.
>   - Log to stdout rather than to files.
>   - Work properly with docker-compose for local deployment as well as 
> production deployments (Swarm/k8s)





[jira] [Assigned] (FLINK-4326) Flink start-up scripts should optionally start services on the foreground

2017-03-08 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas reassigned FLINK-4326:


Assignee: Greg Hogan

> Flink start-up scripts should optionally start services on the foreground
> -
>
> Key: FLINK-4326
> URL: https://issues.apache.org/jira/browse/FLINK-4326
> Project: Flink
>  Issue Type: Improvement
>  Components: Startup Shell Scripts
>Affects Versions: 1.0.3
>Reporter: Elias Levy
>Assignee: Greg Hogan
>
> This has previously been mentioned in the mailing list, but has not been 
> addressed.  Flink start-up scripts start the job and task managers in the 
> background.  This makes it difficult to integrate Flink with most process 
> supervisory tools and init systems, including Docker.  One can get around 
> this via hacking the scripts or manually starting the right classes via Java, 
> but it is a brittle solution.
> In addition to starting the daemons in the foreground, the start-up scripts 
> should use exec instead of running the commands, so as to avoid forks.  Many 
> supervisory tools assume the PID of the process to be monitored is that of 
> the process it first executes, and fork chains make it difficult for the 
> supervisor to figure out what process to monitor.  Specifically, 
> jobmanager.sh and taskmanager.sh should exec flink-daemon.sh, and 
> flink-daemon.sh should exec java.





[jira] [Resolved] (FLINK-4326) Flink start-up scripts should optionally start services on the foreground

2017-03-08 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas resolved FLINK-4326.
--
Resolution: Fixed

> Flink start-up scripts should optionally start services on the foreground
> -
>
> Key: FLINK-4326
> URL: https://issues.apache.org/jira/browse/FLINK-4326
> Project: Flink
>  Issue Type: Improvement
>  Components: Startup Shell Scripts
>Affects Versions: 1.0.3
>Reporter: Elias Levy
>Assignee: Greg Hogan
>
> This has previously been mentioned in the mailing list, but has not been 
> addressed.  Flink start-up scripts start the job and task managers in the 
> background.  This makes it difficult to integrate Flink with most process 
> supervisory tools and init systems, including Docker.  One can get around 
> this via hacking the scripts or manually starting the right classes via Java, 
> but it is a brittle solution.
> In addition to starting the daemons in the foreground, the start-up scripts 
> should use exec instead of running the commands, so as to avoid forks.  Many 
> supervisory tools assume the PID of the process to be monitored is that of 
> the process it first executes, and fork chains make it difficult for the 
> supervisor to figure out what process to monitor.  Specifically, 
> jobmanager.sh and taskmanager.sh should exec flink-daemon.sh, and 
> flink-daemon.sh should exec java.





[jira] [Assigned] (FLINK-5843) Website/docs missing Cache-Control HTTP header, can serve stale data

2017-02-20 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas reassigned FLINK-5843:


Assignee: Patrick Lucas
Priority: Minor  (was: Major)

> Website/docs missing Cache-Control HTTP header, can serve stale data
> 
>
> Key: FLINK-5843
> URL: https://issues.apache.org/jira/browse/FLINK-5843
> Project: Flink
>  Issue Type: Bug
>  Components: Project Website
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
>Priority: Minor
>
> When Flink 1.2.0 was released, I found that the [Flink downloads 
> page|https://flink.apache.org/downloads.html] was out-of-date until I forced 
> my browser to refresh the page. Upon investigation, I found that the 
> principle pages of the website are served with only the following headers 
> that relate to caching: Date, Last-Modified, and ETag.
> Since there is no Cache-Control header (or the older Expires or Pragma 
> headers), browsers are left to their own heuristics as to how long to cache 
> this content, which varies browser to browser. In some browsers, this 
> heuristic is 10% of the difference between Date and Last-Modified headers. I 
> take this to mean that, if the content were last modified 90 days ago, and I 
> last accessed it 5 days ago, my browser will serve a cached response for a 
> further 3.5 days (10% * (90 days - 5 days) = 8.5 days, 5 days have elapsed 
> leaving 3.5 days).
> I'm not sure who at the ASF we should talk to about this, but I recommend we 
> add the following header to any responses served from the Flink project 
> website or official documentation website\[1]:
> {code}Cache-Control: max-age=0, must-revalidate{code}
> (Note this will only make browsers revalidate their caches; if the ETag of the 
> cached content matches what the server still has, the server will return 304 
> Not Modified and omit the actual content)
> \[1] Both the website hosted at flink.apache.org and the documentation hosted 
> at ci.apache.org are affected.





[jira] [Created] (FLINK-5843) Website/docs missing Cache-Control HTTP header, can serve stale data

2017-02-19 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-5843:


 Summary: Website/docs missing Cache-Control HTTP header, can serve 
stale data
 Key: FLINK-5843
 URL: https://issues.apache.org/jira/browse/FLINK-5843
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Patrick Lucas


When Flink 1.2.0 was released, I found that the [Flink downloads 
page|https://flink.apache.org/downloads.html] was out-of-date until I forced my 
browser to refresh the page. Upon investigation, I found that the principal 
pages of the website are served with only the following headers that relate to 
caching: Date, Last-Modified, and ETag.

Since there is no Cache-Control header (or the older Expires or Pragma 
headers), browsers are left to their own heuristics as to how long to cache 
this content, which varies browser to browser. In some browsers, this heuristic 
is 10% of the difference between Date and Last-Modified headers. I take this to 
mean that, if the content were last modified 90 days ago, and I last accessed 
it 5 days ago, my browser will serve a cached response for a further 3.5 days 
(10% * (90 days - 5 days) = 8.5 days, 5 days have elapsed leaving 3.5 days).

I'm not sure who at the ASF we should talk to about this, but I recommend we 
add the following header to any responses served from the Flink project website 
or official documentation website\[1]:

{code}Cache-Control: max-age=0, must-revalidate{code}

(Note this will only make browsers revalidate their caches; if the ETag of the 
cached content matches what the server still has, the server will return 304 
Not Modified and omit the actual content)

\[1] Both the website hosted at flink.apache.org and the documentation hosted 
at ci.apache.org are affected.
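That heuristic arithmetic can be double-checked with a quick shell sketch (integer tenths of a day, since 10% of N days is N tenths; the 90-day/5-day figures are the example from the description above):

```shell
#!/bin/bash
# Browser freshness heuristic when no Cache-Control/Expires header is present:
# freshness lifetime = 10% of (Date - Last-Modified) at fetch time.
age_at_fetch_days=$(( 90 - 5 ))   # last modified 90 days before the fetch, fetched 5 days ago
elapsed_days=5                    # time since the response was cached

freshness_tenths=$age_at_fetch_days                            # 85 tenths = 8.5 days
remaining_tenths=$(( freshness_tenths - elapsed_days * 10 ))   # 35 tenths = 3.5 days

echo "freshness lifetime: $(( freshness_tenths / 10 )).$(( freshness_tenths % 10 )) days"
echo "remaining fresh:    $(( remaining_tenths / 10 )).$(( remaining_tenths % 10 )) days"
```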





[jira] [Assigned] (FLINK-5842) Wrong 'since' version for ElasticSearch 5.x connector

2017-02-19 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas reassigned FLINK-5842:


Assignee: Patrick Lucas

> Wrong 'since' version for ElasticSearch 5.x connector
> -
>
> Key: FLINK-5842
> URL: https://issues.apache.org/jira/browse/FLINK-5842
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Streaming Connectors
>Affects Versions: 1.3.0
>Reporter: Dawid Wysakowicz
>Assignee: Patrick Lucas
>
> The documentation claims that ElasticSearch 5.x is supported since Flink 
> 1.2.0 which is not true, as the support was merged after 1.2.0.





[jira] [Commented] (FLINK-5751) 404 in documentation

2017-02-15 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868981#comment-15868981
 ] 

Patrick Lucas commented on FLINK-5751:
--

Opened a PR fixing as many broken links as I could find using this method.

The script I used:

{code}
#!/bin/bash -x
# Don't abort on any non-zero exit code
#set +e

# Crawl the docs, ignoring robots.txt, storing nothing locally
wget --spider -r -nd -nv -e robots=off -p -o spider.log http://localhost/flink-docs

# Abort for anything other than 0 and 4 ("Network failure")
status=$?
if [ $status -ne 0 ] && [ $status -ne 4 ]; then
    exit $status
fi

# Fail the build if any broken links are found
broken_links_str=$(grep -e 'Found [[:digit:]]\+ broken link' spider.log)
echo -e "\e[1;31m$broken_links_str\e[0m"
if [ -n "$broken_links_str" ]; then
    exit 1
fi
{code}

> 404 in documentation
> 
>
> Key: FLINK-5751
> URL: https://issues.apache.org/jira/browse/FLINK-5751
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Colin Breame
>Priority: Trivial
>
> This page:
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/setup_quickstart.html
> Contains a link with title "Flink on Windows" with URL:
> - 
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/flink_on_windows
> This gives a 404.  It should be:
> - 
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/flink_on_windows.html





[jira] [Commented] (FLINK-5751) 404 in documentation

2017-02-15 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868901#comment-15868901
 ] 

Patrick Lucas commented on FLINK-5751:
--

Down to 15 now:

{noformat}
http://localhost/flink-docs/dev/stream/checkpointing
http://localhost/flink-docs/dev/api_concepts
http://localhost/flink-docs/dev/execution_configuration
http://localhost/flink-docs/dev/parallel
http://localhost/flink-docs/dev/restart_strategies
http://localhost/flink-docs/dev/stream/state
http://localhost/flink-docs/quickstart/run_example_quickstart
http://localhost/dev/api_concepts
http://localhost/dev/batch/index.html
http://localhost/flink-docs/quickstart/scala_api_quickstart
http://localhost/flink-docs/dev/datastream_api
http://localhost/flink-docs/quickstart/java_api_quickstart
http://localhost/flink-docs/dev/custom_serializers
http://localhost/flink-docs/setup/config
http://localhost/flink-docs/api/java/org/apache/flink/table/api/Table.html
{noformat}



[jira] [Created] (FLINK-5801) Queryable State from Scala job/client fails with key of type Long

2017-02-14 Thread Patrick Lucas (JIRA)
Patrick Lucas created FLINK-5801:


 Summary: Queryable State from Scala job/client fails with key of 
type Long
 Key: FLINK-5801
 URL: https://issues.apache.org/jira/browse/FLINK-5801
 Project: Flink
  Issue Type: Bug
  Components: Queryable State
Affects Versions: 1.2.0
 Environment: Flink 1.2.0
Scala 2.10
Reporter: Patrick Lucas
 Attachments: OrderFulfillment.scala, OrderFulfillmentStateQuery.scala

While working on a demo Flink job to try out Queryable State, I exposed some 
state of type Long -> custom class via the query server. However, the query 
server returned an exception when I tried to send a query:

{noformat}
Exception in thread "main" java.lang.RuntimeException: Failed to query state 
backend for query 0. Caused by: java.io.IOException: Unable to deserialize key 
and namespace. This indicates a mismatch in the key/namespace serializers used 
by the KvState instance and this access.
at 
org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer.deserializeKeyAndNamespace(KvStateRequestSerializer.java:392)
at 
org.apache.flink.runtime.state.heap.AbstractHeapState.getSerializedValue(AbstractHeapState.java:130)
at 
org.apache.flink.runtime.query.netty.KvStateServerHandler$AsyncKvStateQueryTask.run(KvStateServerHandler.java:220)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at 
org.apache.flink.runtime.util.DataInputDeserializer.readLong(DataInputDeserializer.java:217)
at 
org.apache.flink.api.common.typeutils.base.LongSerializer.deserialize(LongSerializer.java:69)
at 
org.apache.flink.api.common.typeutils.base.LongSerializer.deserialize(LongSerializer.java:27)
at 
org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer.deserializeKeyAndNamespace(KvStateRequestSerializer.java:379)
... 7 more

at 
org.apache.flink.runtime.query.netty.KvStateServerHandler$AsyncKvStateQueryTask.run(KvStateServerHandler.java:257)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

I banged my head against this for a while, then per [~jgrier]'s suggestion I 
tried simply changing the key from Long to String (modifying the two {{keyBy}} 
calls and the {{keySerializer}} {{TypeHint}} in the attached code) and it 
started working perfectly.

cc [~uce]
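The failure mode in the stack trace above can be reproduced in miniature: if the client serializes the key with fewer bytes than the server-side key serializer expects, the server's {{readLong}} runs off the end of the buffer and throws {{EOFException}}. This is an illustrative sketch of the byte-level mismatch, not Flink's actual serialization code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class KeySerializerMismatchDemo {
    public static void main(String[] args) throws IOException {
        // The client writes the key as a 4-byte int...
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(42);
        out.flush();

        // ...but the server's key serializer expects an 8-byte long,
        // so deserialization runs past the end of the buffer.
        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        try {
            in.readLong();
            System.out.println("deserialized");
        } catch (EOFException e) {
            System.out.println("EOFException: serializer mismatch");
        }
    }
}
```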





[jira] [Updated] (FLINK-5801) Queryable State from Scala job/client fails with key of type Long

2017-02-14 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas updated FLINK-5801:
-
Attachment: OrderFulfillmentStateQuery.scala
OrderFulfillment.scala



[jira] [Commented] (FLINK-5751) 404 in documentation

2017-02-09 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860369#comment-15860369
 ] 

Patrick Lucas commented on FLINK-5751:
--

I've identified these links using wget. It doesn't report the page each link 
was found on (web crawling, pool of URLs, and all that), but it's easy enough to 
grep for them:

{noformat}
http://localhost/flink-docs/dev/api_concepts
http://localhost/flink-docs/dev/execution_configuration
http://localhost/flink-docs/internals/state_backends.html
http://localhost/flink-docs/dev/parallel
http://localhost/flink-docs/dev/restart_strategies
http://localhost/flink-docs/dev/state.html
http://localhost/flink-docs/dev/stream/state
http://localhost/flink-docs/quickstart/run_example_quickstart
http://localhost/dev/api_concepts
http://localhost/dev/batch/index.html
http://localhost/flink-docs/quickstart/scala_api_quickstart
http://localhost/flink-docs/dev/datastream_api
http://localhost/flink-docs/quickstart/java_api_quickstart
http://localhost/flink-docs/setup/flink_on_windows
http://localhost/flink-docs/dev/custom_serializers
http://localhost/flink-docs/setup/config
http://localhost/flink-docs/api/java/org/apache/flink/table/api/Table.html
http://localhost/flink-docs/dev/state
{noformat}



[jira] [Commented] (FLINK-5751) 404 in documentation

2017-02-09 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860222#comment-15860222
 ] 

Patrick Lucas commented on FLINK-5751:
--

I've actually run into quite a few broken links in the docs in the past few 
days. For example, there are a number of links to {{/dev/api_concepts}} which 
are missing {{.html}}. (In the repo, run \[1])

Many of the pages contain tables, encoded in the markdown files as raw HTML, 
which have many broken links as well (e.g. 
[here|https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html#dataset-transformations]),
 usually because they are hard-coded to start with `/` instead of 
prefixing `{{ site.baseurl }}` like the non-HTML links.

I think I might put together a script to try to find broken links in the docs, 
as there seem to be many of them.

\[1]
{code}git grep -e api_concepts[^\\.]{code}

P.S. I'll give a dollar to anyone who can tell me how to put that {{git grep}} 
snippet on the same line as other text in JIRA...



[jira] [Commented] (FLINK-1588) Load flink configuration also from classloader

2017-02-07 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856540#comment-15856540
 ] 

Patrick Lucas commented on FLINK-1588:
--

My gut feeling is to avoid adding a new way to configure Flink if no one is 
actively asking for it. I've already gotten what I wanted out of this ticket (a 
better understanding of this part of the codebase), so unless you think this 
actually is a feature we want now, I would just as soon close it.

> Load flink configuration also from classloader
> --
>
> Key: FLINK-1588
> URL: https://issues.apache.org/jira/browse/FLINK-1588
> Project: Flink
>  Issue Type: New Feature
>  Components: Local Runtime
>Reporter: Robert Metzger
>Assignee: Patrick Lucas
>
> The GlobalConfiguration object should also check if it finds the 
> flink-config.yaml in the classpath and load if from there.
> This allows users to inject configuration files in local "standalone" or 
> embedded environments.





[jira] [Commented] (FLINK-1588) Load flink configuration also from classloader

2017-02-06 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854230#comment-15854230
 ] 

Patrick Lucas commented on FLINK-1588:
--

I have a [first pass 
branch|https://github.com/patricklucas/flink/commits/FLINK-1588_load_config_from_classpath]
 for this, but I have just a few questions about what we actually want here:

# Is the goal to search the directories on the classpath for a file named 
"flink-config.yaml" (as my branch does), or to load a resource using 
getClassLoader().getResource("/flink-config.yaml") (requiring us to know the 
exact resource path)?
# Do we want to load config from the classpath as a fallback if the config dir 
or config file don't exist, or do we want to somehow merge the configs?
# There's a [version of loadConfiguration with no 
params|https://github.com/patricklucas/flink/commit/deb10d6a065817569fb5659a7a4091e5984d5e11#diff-0a9e260480028baad50a56024fc3e84aR75]
 that currently returns an empty config object. Do we want to also search the 
classpath here? I also notice that it does not inject "dynamic properties" like 
the other overloaded method does, but since this one appears to be chiefly for 
testing, perhaps that isn't a problem.
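For question 1, the second approach (a classloader resource lookup) can be sketched roughly as below. The resource name and the null-return fallback are assumptions for illustration, not what GlobalConfiguration actually does; the {{URLClassLoader}} in {{main}} just simulates a config file sitting on the classpath:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClasspathConfigDemo {
    // Ask the classloader for a resource with a well-known name;
    // return null so the caller can fall back to the config-dir lookup.
    static String loadConfigFromClasspath(ClassLoader cl) throws IOException {
        URL res = cl.getResource("flink-config.yaml");
        if (res == null) {
            return null;
        }
        try (BufferedReader r =
                new BufferedReader(new InputStreamReader(res.openStream()))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a config file placed on the classpath.
        Path dir = Files.createTempDirectory("flink-conf");
        Files.write(dir.resolve("flink-config.yaml"),
                "jobmanager.rpc.port: 6123\n".getBytes("UTF-8"));
        try (URLClassLoader cl =
                new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
            System.out.print(loadConfigFromClasspath(cl));
        }
    }
}
```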



[jira] [Assigned] (FLINK-5153) Allow setting custom application tags for Flink on YARN

2017-02-05 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas reassigned FLINK-5153:


Assignee: Patrick Lucas

> Allow setting custom application tags for Flink on YARN
> ---
>
> Key: FLINK-5153
> URL: https://issues.apache.org/jira/browse/FLINK-5153
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Robert Metzger
>Assignee: Patrick Lucas
>
> https://issues.apache.org/jira/browse/YARN-1399 added support in YARN to tag 
> applications.
> We should introduce a configuration variable in Flink allowing users to 
> specify a comma-separated list of tags they want to assign to their Flink on 
> YARN applications.
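The comma-separated list the description proposes would presumably need trimming and de-duplication before being handed to YARN. A minimal sketch, with a made-up option value (the actual Flink config key is not specified here):

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class YarnTagsParser {
    // Parse a comma-separated tag list, e.g. "flink,team-data, prod",
    // trimming whitespace and dropping empty entries and duplicates.
    static Set<String> parseTags(String raw) {
        Set<String> tags = new LinkedHashSet<>();
        for (String t : raw.split(",")) {
            String trimmed = t.trim();
            if (!trimmed.isEmpty()) {
                tags.add(trimmed);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(parseTags("flink,team-data, prod,flink"));
    }
}
```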





[jira] [Assigned] (FLINK-1588) Load flink configuration also from classloader

2017-02-05 Thread Patrick Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Lucas reassigned FLINK-1588:


Assignee: Patrick Lucas



[jira] [Commented] (FLINK-1588) Load flink configuration also from classloader

2017-02-03 Thread Patrick Lucas (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852224#comment-15852224
 ] 

Patrick Lucas commented on FLINK-1588:
--

Should it search and apply the configuration found on the classpath only when 
configDir is unset or flink-config.yaml can't be found in configDir? Or should 
it load configuration from both, preferring the values found in 
configDir/flink-config.yaml?
