Re: Does flink configuration support configed by environment variables?

2019-04-01 Thread Stephen Connolly
I don't think it does. I ended up writing a small CLI tool to enable templating of the file from environment variables. There are loads of such tools, but mine is https://github.com/stephenc/envsub I have the Dockerfile like so: ARG FLINK_VERSION=1.7.2-alpine FROM flink:${FLINK_VERSION} ARG
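
To illustrate the general idea of templating a config file from environment variables, here is a minimal Java sketch. It is not the envsub tool and makes no claim about envsub's syntax or options; the class name EnvTemplate, the method render, and the ${NAME} placeholder form are all assumptions chosen for the example. Unknown variables are left untouched so a missing value is easy to spot in the rendered file.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink or envsub.
final class EnvTemplate {
    // Matches ${NAME} where NAME looks like a shell variable name.
    private static final Pattern VAR =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    private EnvTemplate() {}

    /** Replaces each ${NAME} in template with env.get(NAME); unknown names are kept as-is. */
    static String render(String template, Map<String, String> env) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement stops '$' and '\' in values being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A container entrypoint could call something like EnvTemplate.render(fileContents, System.getenv()) and write the result to the Flink configuration file before starting the process.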

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
ase-1.7/dev/event_time.html#watermarks-in-parallel-streams > On 19/02/2019 14:31, Stephen Connolly wrote: > > Hmmm my suspicions are now quite high. I created a file source that just > replays the events straight then I get more results > > On Tue, 19 Feb 2019 at 11:50, Stephen Co

Re: Reduce one event under multiple keys

2019-02-21 Thread Stephen Connolly
for open and close) for counting and aggregating should be > a good design. > > Best, Fabian > > Am Mo., 11. Feb. 2019 um 11:29 Uhr schrieb Stephen Connolly < > stephen.alan.conno...@gmail.com>: > >> >> >> On Mon, 11 Feb 2019 at 09:42, Fabian Hueske wrote

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
event count for timebox" output or the "update event count for timebox from late events" output as long as it is always one and only one of those paths. > > > Best, > > Dawid > On 21/02/2019 14:18, Stephen Connolly wrote: > > Yes, it was the "watermarks fo

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
se of reprocessing" I started to think that maybe the Watermarks are the Barrier but after your clarification I'm back to thinking they are separate similar mechanisms operating in the stream > Best, > > Dawid > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.7/in

Can I make an Elasticsearch Sink effectively exactly once?

2019-02-21 Thread Stephen Connolly
From how I understand it: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/elasticsearch.html#elasticsearch-sinks-and-fault-tolerance the Flink Elasticsearch Sink guarantees at-least-once delivery of action > requests to Elasticsearch clusters. It does so by waiting

How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
Hi, I’m having a strange situation and I would like to know where I should start trying to debug. I have set up a configurable swap in source, with three implementations: 1. A mock implementation 2. A Kafka consumer implementation 3. A Kinesis consumer implementation From injecting a log and

Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
:14, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > Hi, I’m having a strange situation and I would like to know where I should > start trying to debug. > > I have set up a configurable swap in source, with three implementations: > > 1. A mock implementati

Re: Why don't Tuple types implement Comparable?

2019-02-22 Thread Stephen Connolly
On Fri, 22 Feb 2019 at 10:16, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > On Thu, 21 Feb 2019 at 18:29, Frank Grimes > wrote: > >> Hi, >> >> I've recently started to evaluate Flink and have found it odd that its >> Tuple typ

Re: Why don't Tuple types implement Comparable?

2019-02-22 Thread Stephen Connolly
On Thu, 21 Feb 2019 at 18:29, Frank Grimes wrote: > Hi, > > I've recently started to evaluate Flink and have found it odd that its > Tuple types, while Serializable, don't implement java.lang.Comparable. > This means that I either need to provide a KeySelector for many > operations or subtype

Re: Why don't Tuple types implement Comparable?

2019-02-22 Thread Stephen Connolly
On Fri, 22 Feb 2019 at 10:38, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > > On Fri, 22 Feb 2019 at 10:16, Stephen Connolly < > stephen.alan.conno...@gmail.com> wrote: > >> On Thu, 21 Feb 2019 at 18:29, Frank Grimes >> wrote: >> >

Re: Checkpoints and catch-up burst (heavy back pressure)

2019-03-05 Thread Stephen Connolly
On Fri, 1 Mar 2019 at 13:05, LINZ, Arnaud wrote: > Hi, > > > > I think I should go into more details to explain my use case. > > I have one non parallel source (parallelism = 1) that lists binary files in > a HDFS directory. DataSet emitted by the source is a data set of file > names, not file

Re: Checkpoints and catch-up burst (heavy back pressure)

2019-03-05 Thread Stephen Connolly
On Tue, 5 Mar 2019 at 12:48, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > > On Fri, 1 Mar 2019 at 13:05, LINZ, Arnaud > wrote: > >> Hi, >> >> >> >> I think I should go into more details to explain my use case. >> >

Re: REST API question GET /jars/:jarid/plan

2019-03-07 Thread Stephen Connolly
logging, > try again and look for logging messages from > "org.apache.flink.runtime.rest.handler.router.RouterHandler" > > On 07.03.2019 11:25, Stephen Connolly wrote: > > In the documentation for the /jars/:jarid/plan endpoint > > > https://ci.apache.org/proje

Re: REST API question GET /jars/:jarid/plan

2019-03-07 Thread Stephen Connolly
Yep that was it. I have created https://issues.apache.org/jira/browse/FLINK-11853 so that it is easier for others to work around if they have restrictions on the HTTP client library choice On Thu, 7 Mar 2019 at 11:47, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > >

REST API question GET /jars/:jarid/plan

2019-03-07 Thread Stephen Connolly
In the documentation for the /jars/:jarid/plan endpoint https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jars-jarid-plan It says: > Program arguments can be passed both via the JSON request (recommended) or query parameters. Has anyone got sample code that sends

Re: DataStream EventTime last data cannot be output?

2019-03-07 Thread Stephen Connolly
I had this issue myself. Your timestamp assigner will only advance the window as it receives data, thus when you reach the end of the data there will be data which is newer than the last window. One solution is to have the source flag that there will be no more data. If you can do this then that

Reduce one event under multiple keys

2019-02-08 Thread Stephen Connolly
Ok, I'll try and map my problem into something that should be familiar to most people. Consider collection of PCs, each of which has a unique ID, e.g. ca:fe:ba:be, de:ad:be:ef, etc. Each PC has a tree of local files. Some of the file paths are coincidentally the same names, but there is no file

Re: Can an Aggregate the key from a WindowedStream.aggregate()

2019-02-10 Thread Stephen Connolly
On Sun, 10 Feb 2019 at 09:48, Chesnay Schepler wrote: > There are also versions of WindowedStream#aggregate that accept an > additional WindowFunction/ProcessWindowFunction, which do have access to > the key via apply()/process() respectively. These functions are called > post aggregation. >

Re: Reduce one event under multiple keys

2019-02-10 Thread Stephen Connolly
","path":"/foo/bar/Admin guide.txt"} So there will be aggregates stored for ("ca:fe:ba:be","/"), ("ca:fe:ba:be","/foo"), ("ca:fe:ba:be","/foo/bar"), ("ca:fe:ba:be","/foo/bar/README.txt"), etc In wi
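
The list of keys above (the root, each parent directory, then the file itself) can be derived from a single path. The following is a minimal Java sketch of that expansion, assuming absolute paths with no trailing slash; the class name PathPrefixes and the method expand are hypothetical and do not come from the thread. In a Flink job, one plausible place for such a helper would be a flat-map step that emits one (id, prefix) pair per entry before keying, but that placement is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class PathPrefixes {
    private PathPrefixes() {}

    /** Returns "/", then every ancestor directory of path, then path itself. */
    static List<String> expand(String path) {
        List<String> prefixes = new ArrayList<>();
        prefixes.add("/");
        // Start searching at index 1 to skip the leading slash.
        int slash = path.indexOf('/', 1);
        while (slash != -1) {
            prefixes.add(path.substring(0, slash));
            slash = path.indexOf('/', slash + 1);
        }
        // The root has already been added, so only add longer paths.
        if (path.length() > 1) {
            prefixes.add(path);
        }
        return prefixes;
    }
}
```

For "/foo/bar/README.txt" this yields "/", "/foo", "/foo/bar" and "/foo/bar/README.txt", which pairs with the PC id to give the aggregate keys listed in the message.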

Is there a windowing strategy that allows a different offset per key?

2019-02-10 Thread Stephen Connolly
I would like to process a stream of data from different customers, producing output say once every 15 minutes. The results will then be loaded into another system for storage and querying. I have been using TumblingEventTimeWindows in my prototype, but I am concerned that all the windows will

Re: Is there a windowing strategy that allows a different offset per key?

2019-02-10 Thread Stephen Connolly
you forget to call " + "'DataStream.assignTimestampsAndWatermarks(...)'?"); } } So I think I can just write my own where the offset is derived from hashing the element using my hash function. Good plan or bad plan? On Sun, 10 Feb 2019 at 19:55, Stephen Connolly < stephen.alan.conno.
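
The arithmetic behind a hash-derived offset can be sketched in plain Java, independent of any Flink class. This is only an illustration of the idea in the message, not a window assigner: the class name KeyedWindowOffset and both method names are hypothetical, and String.hashCode() stands in for "my hash function". Each key gets a fixed offset in [0, windowSize), so windows for different keys end at different instants instead of all at once.

```java
// Hypothetical helper for illustration only; not a Flink WindowAssigner.
final class KeyedWindowOffset {
    private KeyedWindowOffset() {}

    /** Maps a key to a stable offset in [0, windowSizeMs). */
    static long offsetFor(String key, long windowSizeMs) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod((long) key.hashCode(), windowSizeMs);
    }

    /** Start of the tumbling window containing timestampMs, shifted by offsetMs. */
    static long windowStart(long timestampMs, long offsetMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs - offsetMs, windowSizeMs);
    }
}
```

With a 15-minute window (900000 ms) and an offset of 60000 ms, an event at timestamp 1000000 falls in the window starting at 960000. Because the offset depends only on the key, every event for that key lands on the same window grid.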

Re: [ANNOUNCE] New Flink PMC member Thomas Weise

2019-02-12 Thread Stephen Connolly
Congratulations to Thomas. I see that this is not his first time in the PMC rodeo... also somebody needs to update LDAP as he's not on https://people.apache.org/phonebook.html?pmc=flink yet! -stephenc On Tue, 12 Feb 2019 at 09:59, Fabian Hueske wrote: > Hi everyone, > > On behalf of the Flink

Re: EXT :Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
> function isn’t getting data you have to watch out for this. > > > > *From:* Stephen Connolly [mailto:stephen.alan.conno...@gmail.com] > *Sent:* Tuesday, February 19, 2019 6:32 AM > *To:* user > *Subject:* EXT :Re: How to debug difference between Kinesis and Kafka > >

Re: Anyone tried to do blue-green topology deployments?

2019-02-11 Thread Stephen Connolly
On Mon, 11 Feb 2019 at 14:10, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > Another possibility would be injecting pseudo events into the source and > having a stateful filter. > > The event would be something like “key X is now owned by green”. > > I can

Re: Is there a windowing strategy that allows a different offset per key?

2019-02-11 Thread Stephen Connolly
countered since > > Cheers, > Fabian > > [1] > https://lists.apache.org/thread.html/0d1e41302b89378f88693bf4fdb52c23d4b240160b5a10c163d9c46c@%3Cdev.flink.apache.org%3E > > > Am So., 10. Feb. 2019 um 21:03 Uhr schrieb Stephen Connolly < > stephen.alan.conno...@gmail.com>: &g

Re: Anyone tried to do blue-green topology deployments?

2019-02-11 Thread Stephen Connolly
would be adding such state to the filter On Mon 11 Feb 2019 at 13:33, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > > On Mon, 11 Feb 2019 at 13:26, Stephen Connolly < > stephen.alan.conno...@gmail.com> wrote: > >> I have my main application upda

Re: Anyone tried to do blue-green topology deployments?

2019-02-11 Thread Stephen Connolly
On Mon, 11 Feb 2019 at 13:26, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > I have my main application updating with a blue-green deployment strategy > whereby a new version (always called green) starts receiving an initial > fraction of the web traffic a

Anyone tried to do blue-green topology deployments?

2019-02-11 Thread Stephen Connolly
I have my main application updating with a blue-green deployment strategy whereby a new version (always called green) starts receiving an initial fraction of the web traffic and then - based on the error rates - we progress the % of traffic until 100% of traffic is being handled by the green

Re: Reduce one event under multiple keys

2019-02-11 Thread Stephen Connolly
the running total... at least that's what the PoC I am experimenting with Flink should show > > Hope this helps, > Fabian > > Am So., 10. Feb. 2019 um 20:36 Uhr schrieb Stephen Connolly < > stephen.alan.conno...@gmail.com>: > >> >> >> On Sun

Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
Hmmm my suspicions are now quite high. I created a file source that just replays the events straight then I get more results On Tue, 19 Feb 2019 at 11:50, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > Hmmm after expanding the dataset such that there was additi

Logback on AWS EMR

2019-09-04 Thread Stephen Connolly
Has anyone configured AWS EMR’s flavour of Flink to use Logback (more specifically with the logstash encoder, which would require additional jars on the classpath)? Or is there an alternative way people are using to send the logs to a service like Datadog? Thanks in advance Stephen -- Sent from

Re: Logback on AWS EMR

2019-09-05 Thread Stephen Connolly
ace of logback). Then when you launch your flink jobs they will be clones with the correct files and happy as larry! Haven't figured out how to handle for ephemeral EMR clusters... but we aren't using them so :shrug: On Wed, 4 Sep 2019 at 22:17, Stephen Connolly < stephen.alan.conno...@gma

How to handle JDBC connections in a topology

2019-07-24 Thread Stephen Connolly
Hi, So we have a number of nodes in our topology that need to do things like checking a database, e.g. * We need a filter step to drop events on the floor from systems we are no longer interested in * We need a step that outputs on a side-channel if the event is for an object where the parent is

Re: How to handle JDBC connections in a topology

2019-07-24 Thread Stephen Connolly
Oh and I'd also need some way to clean up the per-node transient state if the topology stops running on a specific node. On Wed, 24 Jul 2019 at 08:18, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > Hi, > > So we have a number of nodes in our topology that need to

Is there a lifecycle listener that gets notified when a topology starts/stops on a task manager

2019-09-23 Thread Stephen Connolly
We are using a 3rd party library that allocates some resources in one of our topologies. Is there a listener or something that gets notified when the topology starts / stops running in the Task Manager's JVM? The 3rd party library uses a singleton, so I need to initialize the singleton when the

Re: Is there a lifecycle listener that gets notified when a topology starts/stops on a task manager

2019-09-24 Thread Stephen Connolly
cause your code(operators/UDFs) is part of the task, namely that > it can only be executed when the task is not disposed. > > Thanks, > Zhu Zhu > > Stephen Connolly 于2019年9月24日周二 上午2:13写道: > >> Currently the best I can see is to make *everything* a Rich... and hook >> i

Re: Is there a lifecycle listener that gets notified when a topology starts/stops on a task manager

2019-09-23 Thread Stephen Connolly
Currently the best I can see is to make *everything* a Rich... and hook into the open and close methods... but feels very ugly. On Mon 23 Sep 2019 at 15:45, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > We are using a 3rd party library that allocates some resourc

POJO serialization vs immutability

2019-10-02 Thread Stephen Connolly
I notice https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html#rules-for-pojo-types says that all non-transient fields need a setter. That means that the fields cannot be final. That means that the hashCode() should probably just return a constant value (otherwise

Re: Rescaling a running topology

2020-02-07 Thread Stephen Connolly
) at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966) ... 20 more On Fri, 7 Feb 2020 at 11:40, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > So I am looking at the Flink Management REST API... and, as I see it, > there are two paths to rescale a running topology: > > 1. Stop the topolog

Re: Rescaling a running topology

2020-02-07 Thread Stephen Connolly
time (I gave up waiting) On Fri, 7 Feb 2020 at 11:54, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > And now the job is stuck in a suspended state and I seem to have no way to > get it out of that state again! > > On Fri, 7 Feb 2020 at 11:50, Stephen Connolly <

Re: Rescaling a running topology

2020-02-07 Thread Stephen Connolly
And now the job is stuck in a suspended state and I seem to have no way to get it out of that state again! On Fri, 7 Feb 2020 at 11:50, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > The plot thickens... I was able to rescale down... just not back up > again!!! &g

Rescaling a running topology

2020-02-07 Thread Stephen Connolly
So I am looking at the Flink Management REST API... and, as I see it, there are two paths to rescale a running topology: 1. Stop the topology with a savepoint and then start it up with the new savepoint; or 2. Use the /jobs/:jobid/rescaling

Upgrading Flink

2020-04-06 Thread Stephen Connolly
Quick questions on upgrading Flink. All our jobs are compiled against Flink 1.8.x We are planning to upgrade to 1.10.x 1. Is the recommended path to upgrade one minor at a time, i.e. 1.8.x -> 1.9.x and then 1.9.x -> 1.10.x as a second step or is the big jump supported, i.e. 1.8.x -> 1.10.x in

Two questions about Async

2020-04-21 Thread Stephen Connolly
1. On Flink 1.10 when I look at the topology overview, the AsyncFunctions show non-zero values for Bytes Received; Records Received; Bytes Sent but Records Sent is always 0... yet the next step in the topology shows approx the same Bytes Received as the async sent (modulo minor delays) and a

How to debug checkpoints failing to complete

2020-03-23 Thread Stephen Connolly
We have a topology and the checkpoints fail to complete a *lot* of the time. Typically it is just one subtask that fails. We have a parallelism of 2 on this topology at present and the other subtask will complete in 3ms though the end to end duration on the rare times when the checkpointing

Re: Running Apache Flink on the GraalVM as a Native Image

2020-06-27 Thread Stephen Connolly
On Thu 25 Jun 2020 at 12:48, ivo.kn...@t-online.de wrote: > Whats up guys, > > > > I'm trying to run an Apache Flink Application with the GraalVM Native > Image but I get the following error: (check attached file) > > > > I suppose this happens, because Flink uses a lot of low-level-code and is

Re: Running Apache Flink on the GraalVM as a Native Image

2020-06-28 Thread Stephen Connolly
On Sun 28 Jun 2020 at 01:34, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > > On Thu 25 Jun 2020 at 12:48, ivo.kn...@t-online.de > wrote: > >> Whats up guys, >> >> >> >> I'm trying to run an Apache Flink Application with the Gr

Re: Is outputting from components other than sinks or side outputs a no-no ?

2020-07-27 Thread Stephen Connolly
I am not 100% certain that David is talking about the same pattern of usage that you are, Tom. David, the pattern Tom is talking about is something like this... try { do something with record } catch (SomeException e) { push record to DLQ } My concern is that if we have a different failure,
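
The try/catch pattern described above can be written out as a small, self-contained Java sketch. It uses plain collections rather than Flink operators, so the dead-letter queue is just a list here; in a real job it might be a side output or a separate sink, which is an assumption and not something the message states. The class name DeadLetterRouting and the method processAll are hypothetical, and IllegalArgumentException stands in for SomeException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration only.
final class DeadLetterRouting {
    private DeadLetterRouting() {}

    /**
     * Applies step to each record. Records that fail with the expected
     * exception type are added to deadLetters; any other exception propagates.
     */
    static <I, O> List<O> processAll(List<I> records, Function<I, O> step, List<I> deadLetters) {
        List<O> results = new ArrayList<>();
        for (I record : records) {
            try {
                results.add(step.apply(record));
            } catch (IllegalArgumentException e) {
                // Only the anticipated failure is diverted to the dead-letter queue.
                deadLetters.add(record);
            }
        }
        return results;
    }
}
```

Catching only the one anticipated exception type means an unrelated failure still surfaces as an error instead of silently producing a dead-letter record.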