Re: [External] checkpoint metadata not in configured directory state.checkpoints.dir

2019-06-19 Thread Congxian Qiu
Hi, Vishal If you want to restart from the last competed external checkpoint of the previous stoped job, you need to track the checkpoint path and restart from it. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/state/checkpoints.html#resuming-from-a-retained-checkpoint Best,

Re: can we do Flink CEP on event stream or batch or both?

2019-06-19 Thread kant kodali
Hi Fabian, Actually, now that I had gone through my use case I can say that the equality matches are more like expressions. for example the *sum(col1, col2) of datasetA = col3 datasetB.* And these expressions can include, sum, if & else, trim, substring, absolute_value etc.. and they are

Updated Flink Documentation and tutorials

2019-06-19 Thread Pankaj Chand
Hello, Please let me know how to get the updated documentation and tutorials of Apache Flink. The stable v1.8 and v1.9-snapshot release of the documentation seems to be outdated. Thanks! Pankaj

Re: [EXTERNAL] Re: How to restart/recover on reboot?

2019-06-19 Thread John Smith
Ok I tried it works! I can setup my cluster with terraform and enable systemd services! i think I got confused when I looked and it was doing leader election because all service came up quick! On Tue, 18 Jun 2019 at 22:35, John Smith wrote: > Ah ok we need to pass --host. The command line

Re: [External] checkpoint metadata not in configured directory state.checkpoints.dir

2019-06-19 Thread Vishal Sharma
Hi Chesnay, Can you suggest, How should I go about automating job restart from last completed externalised checkpoint in case of failure ? I am not sure about the path for the latest completed checkpoint. Thanks, Vishal Sharma On Wed, Jun 19, 2019 at 11:11 PM Chesnay Schepler wrote: > The

Re: [External] checkpoint metadata not in configured directory state.checkpoints.dir

2019-06-19 Thread Chesnay Schepler
The _metadata is always stored in the same directory as the checkpoint data. As outlined here "state.checkpoints.dir" serves as a cluster-wide configuration that _can_ be overwritten with a

[External] checkpoint metadata not in configured directory state.checkpoints.dir

2019-06-19 Thread Vishal Sharma
Hi Folks, I am using flink 1.8 with externalised checkpointing enabled and saving the checkpoints to aws S3. My configuration is as follows : flink-conf.yaml : state.checkpoints.dir: s3a://test-bucket/checkpoint-metadata In application code : env.setStateBackend(new

Re: Simple stateful polling source

2019-06-19 Thread Flavio Pompermaier
Ok great! Thanks everybody for the support On Wed, Jun 19, 2019 at 3:05 PM Chesnay Schepler wrote: > A (Rich)SourceFunction that does not implement RichParallelSourceFunction > is always run with a parallelism of 1. > > On 19/06/2019 14:36, Flavio Pompermaier wrote: > > My sourcefunction is

Re: Simple stateful polling source

2019-06-19 Thread Chesnay Schepler
A (Rich)SourceFunction that does not implement RichParallelSourceFunction is always run with a parallelism of 1. On 19/06/2019 14:36, Flavio Pompermaier wrote: My sourcefunction is intrinsically single-thread. Is there a way to force this aspect? I can't find a real difference between a

Re: Simple stateful polling source

2019-06-19 Thread Flavio Pompermaier
My sourcefunction is intrinsically single-thread. Is there a way to force this aspect? I can't find a real difference between a RichParallelSourceFunction and a RichSourceFunction. Is this last (RichSourceFunction) implicitly using parallelism = 1? On Wed, Jun 19, 2019 at 2:25 PM Chesnay Schepler

partition columns with StreamingFileSink

2019-06-19 Thread Yitzchak Lieberman
Hi. I'm using the StreamingFileSink for writing partitioned data to s3. The code is below: StreamingFileSink sink = StreamingFileSink.forBulkFormat(new Path("s3a://test-bucket/test"), ParquetAvroFactory.getParquetWriter(schema, "GZIP")) .withBucketAssigner(new

Re: Simple stateful polling source

2019-06-19 Thread Chesnay Schepler
It returns a list of states so that state can be re-distributed if the parallelism changes. If you hard-code the interface to return a single value then you're implicitly locking the parallelism. When you reduce the parallelism you'd no longer be able to restore all state, since you have less

Re: Simple stateful polling source

2019-06-19 Thread Flavio Pompermaier
It's not clear to me why the source checkpoint returns a list of object...when it could be useful to use a list instead of a single value? The documentation says The returned list should contain one entry for redistributable unit of state" but this is not very clear to me.. Best, Flavio On Wed,

Re: Simple stateful polling source

2019-06-19 Thread Chesnay Schepler
This looks fine to me. What exactly were you worried about? On 19/06/2019 12:33, Flavio Pompermaier wrote: Hi to all, in my use case I have to ingest data from a rest service, where I periodically poll the data (of course a queue would be a better choice but this doesn't depend on me). So

Simple stateful polling source

2019-06-19 Thread Flavio Pompermaier
Hi to all, in my use case I have to ingest data from a rest service, where I periodically poll the data (of course a queue would be a better choice but this doesn't depend on me). So I wrote a RichSourceFunction that starts a thread that poll for new data. However, I'd like to restart from the

Re: Side output in ProcessFunction.onTimer

2019-06-19 Thread Chesnay Schepler
ProcessFunction#onTimer provides an OnTimerContext parameter which allows you to use side-outputs. On 18/06/2019 17:41, Frank Wilson wrote: Hi, Is there a way to make side outputs in an onTimer callback in ProcessFunction? I want to side output events that belong to a session that was

Re: Maybe a flink bug. Job keeps in FAILING state

2019-06-19 Thread zhijiang
As long as one task is in canceling state, then the job status might be still in canceling state. @Joshua Do you confirm all of the tasks in topology were already in terminal state such as failed or canceled? Best, Zhijiang --

Re: Maybe a flink bug. Job keeps in FAILING state

2019-06-19 Thread Chesnay Schepler
@Till have you see something like this before? Despite all source tasks reaching a terminal state on a TM (FAILED) it does not send updates to the JM for all of them, but only a single one. On 18/06/2019 12:14, Joshua Fan wrote: Hi All, There is a topology of 3 operator, such as, source,

Re: Role of Job Manager

2019-06-19 Thread Pankaj Chand
Hi Biao, Thank you for your reply! Please let me know the url of the updated Flink documentation. The url of the outdated document is: https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/runtime.html Another page which (tacitly) supports the outdated concept is: