Re: multiple users per flink deployment

2017-08-02 Thread Eron Wright
One of the key challenges is isolation, e.g. ensuring that one job cannot access the credentials of another. The easiest solution today is to use the YARN deployment mode, with a separate app per job. Meanwhile, improvements being made under the FLIP-6 banner for 1.4+ are laying groundwork for a

Re: Eventime window

2017-08-02 Thread Eron Wright
Note that the built-in `BoundedOutOfOrdernessTimestampExtractor` generates watermarks based only on the timestamp of incoming events. Without new events, `BoundedOutOfOrdernessTimestampExtractor` will not advance the event-time clock. That explains why the window doesn't trigger immediately
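
To illustrate the point about the event-time clock, here is a minimal plain-Java model of a bounded-out-of-orderness generator. It is a sketch, not Flink's implementation: the class name `BoundedOutOfOrdernessModel` and its methods are hypothetical, and the only behavior it assumes is that the watermark trails the largest timestamp seen so far by a fixed bound.

```java
// Plain-Java model (hypothetical, not Flink code) of a watermark generator
// whose watermark depends only on the timestamps of incoming events.
final class BoundedOutOfOrdernessModel {
    private final long maxOutOfOrdernessMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    BoundedOutOfOrdernessModel(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    // Called once per incoming event; this is the only way the clock advances.
    void onEvent(long eventTimestampMillis) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMillis);
    }

    // Called periodically; with no new events it keeps returning the same value.
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing observed yet
        }
        return maxTimestampSeen - maxOutOfOrdernessMillis;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessModel gen = new BoundedOutOfOrdernessModel(2_000L);
        gen.onEvent(10_000L);
        System.out.println(gen.currentWatermark()); // 8000
        System.out.println(gen.currentWatermark()); // still 8000: no new events
        gen.onEvent(9_000L);                        // an older event does not move the clock back
        gen.onEvent(15_000L);
        System.out.println(gen.currentWatermark()); // 13000
    }
}
```

In this model a window ending at 10000 would fire only once some later event pushes the watermark to 10000 or beyond, however much wall-clock time passes in between.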

Re: json mapper

2017-08-03 Thread Eron Wright
I think your snippet looks good. The Jackson ObjectMapper is designed to be reused by numerous threads, and routinely stored as a static field. It is somewhat expensive to create. Hope this helps, -Eron On Thu, Aug 3, 2017 at 7:46 AM, Nico Kruber wrote: > Hi Peter, >

Re: Automate Job submission with container

2017-08-15 Thread Eron Wright
Flink 1.3 relies on a two-step approach of deploying the cluster on Mesos and then deploying the job. The cron task seems like a good approach; maybe retry until the job is successfully deployed (presumably using the 'detached' option). One complication is that the slots take some time to come

Re: Using Hadoop 2.8.0 in Flink Project for S3A Path Style Access

2017-08-09 Thread Eron Wright
For reference: [FLINK-6466] Build Hadoop 2.8.0 convenience binaries On Wed, Aug 9, 2017 at 6:41 AM, Aljoscha Krettek wrote: > So you're saying that this works if you manually compile Flink for Hadoop > 2.8.0? If yes, I think the solution is that we have to provide binaries

Re: kerberos yarn - failure in long running streaming application

2017-08-14 Thread Eron Wright
It sounds to me that the TGT is expiring (usually after 12 hours). This shouldn't happen in the keytab scenario because of a background thread provided by Hadoop that periodically performs a re-login using the keytab. More details on the Hadoop internals here:

Re: blob store defaults to /tmp and files get deleted

2017-08-04 Thread Eron Wright
The directory referred to by `blob.storage.directory` is best described as a local cache. For recovery purposes the JARs are also stored in `high-availability.storageDir`. At least that's my reading of the code in 1.2. Maybe there's some YARN specific behavior too, sorry if this information

Re: Flink REST API async?

2017-08-07 Thread Eron Wright
When you submit a program via the REST API, the main method executes inside the JobManager process. Unfortunately, a static variable is used to establish the execution environment that the program obtains from `ExecutionEnvironment.getExecutionEnvironment()`. From the stack trace it appears

Re: Are timers in ProcessFunction fault tolerant?

2017-05-25 Thread Eron Wright
Yes, registered timers are stored in managed keyed state and should be fault-tolerant. -Eron On Thu, May 25, 2017 at 9:28 AM, Moiz S Jinia wrote: > With a checkpointed RocksDB based state backend, can I expect the > registered processing timers to be fault tolerant?

Re: How can I handle backpressure with event time.

2017-05-25 Thread Eron Wright
Try setting the assigner on the Kafka consumer, rather than on the DataStream: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kafka.html#kafka-consumers-and-timestamp-extractionwatermark-emission I believe this will produce a per-partition assigner and forward only

Re: Kafka 0.10 jaas multiple clients

2017-05-25 Thread Eron Wright
Gordon's suggestion seems like a good way to provide per-job credentials based on application-specific properties. In contrast, Flink's built-in JAAS features are aimed at making the Flink cluster's Kerberos credentials available to jobs. I want to reiterate that all jobs (for a given Flink

Re: Can't start cluster

2017-09-13 Thread Eron Wright
It appears that you're building Flink from source, and attempting to start the cluster from the 'flink-dist/src/...' directory. Please use 'flink-dist/target/...' instead since that's where the built distribution is located. For example, check:

Re: User-based authentication in Flink

2017-09-15 Thread Eron Wright
I think the short-term approach is to place an nginx proxy in front, in combination with some form of isolation of the underlying endpoint. That addresses the authentication piece but not fine-grained authorization. Be aware that the Flink JM is not multi-user due to lack of isolation among

Re: Securing Flink Monitoring REST API

2017-09-18 Thread Eron Wright
Unfortunately Flink does not yet support SSL mutual authentication nor any form of client authentication. There is an ongoing discussion about it: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Service-Authorization-redux-td18890.html A workaround that I've seen is to

Re: Rest API cancel-with-savepoint: 404s when passing path as target-directory

2017-09-19 Thread Eron Wright
Good news, it can be done if you carefully encode the target directory with percent-encoding, as per: https://tools.ietf.org/html/rfc3986#section-2.1 For example, given the directory `s3:///savepoint-bucket/my-awesome-job`, which encodes to `s3%3A%2F%2F%2Fsavepoint-bucket%2Fmy-awesome-job`, I was
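
As a sketch of the encoding step described above, the following Java helper percent-encodes a target directory using only the JDK. The class name `SavepointPathEncoder` is hypothetical. Note that `java.net.URLEncoder` performs form encoding, so the sketch maps `+` (its encoding of a space) to `%20`; that substitution is an assumption about what a path segment needs, not something stated in the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: percent-encodes a savepoint target directory so it can
// be embedded as a single path segment of a REST URL.
final class SavepointPathEncoder {
    static String encode(String targetDirectory) {
        try {
            // URLEncoder turns ' ' into '+'; a literal '+' is already emitted as %2B,
            // so any '+' left in the result stands for a space.
            return URLEncoder.encode(targetDirectory, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is a required JDK charset", e);
        }
    }

    public static void main(String[] args) {
        // prints s3%3A%2F%2F%2Fsavepoint-bucket%2Fmy-awesome-job
        System.out.println(encode("s3:///savepoint-bucket/my-awesome-job"));
    }
}
```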

Re: Problem in Flink 1.3.2 with Mesos task managers offers

2017-09-19 Thread Eron Wright
Hello, the current behavior is that Flink holds onto received offers for up to two minutes while it attempts to provision the TMs. Flink can combine small offers to form a single TM, to combat fragmentation that develops over time in a Mesos cluster. Are you saying that unused offers aren't

Re: Rest API cancel-with-savepoint: 404s when passing path as target-directory

2017-09-20 Thread Eron Wright
> It's a bit confusing because the cancel api works with either and the proxy url sometimes works as this was successful http://{ip}:20888/proxy/application_1504649135200_0001/jobs/cca2dd609c716a7b0a195700777e5b1f/cancel-with-savepoint/target-dir

Re: Flink flick cancel vs stop

2017-09-15 Thread Eron Wright
act Instant getWatermark(); > > public abstract CheckpointMark getCheckpointMark(); > } > > Here the driver would sit outside and call the source whenever data should > be provided. Stop/cancel would not be a feature of the source function but > of the code that calls it

Re: Flink flick cancel vs stop

2017-09-14 Thread Eron Wright
I too am curious about stop vs cancel. I'm trying to understand the motivations a bit more. The current behavior of stop is basically that the sources become bounded, leading to the job winding down. The interesting question is how best to support 'planned' maintenance procedures such as app

Re: Jobmanager and Taskmanager

2017-09-11 Thread Eron Wright
Yes, a given host may be used as both a job manager and as a task manager. On Mon, Sep 11, 2017 at 8:27 AM, AndreaKinn wrote: > Hi, > I'm configuring a cluster composed by just three nodes. > Looking at cluster setup guide >

Re: Installing Apache Flink on Mesos Cluster without DC/OS

2017-09-11 Thread Eron Wright
Hi, You do not need to install Flink onto the nodes. The approach used by Flink is that the task managers download a copy of Flink from the appmaster. The entire installation tree of Flink is downloaded (i.e. the bin/lib/conf directories). The only assumed dependency is Java, which may be

Re: Handle event time

2017-09-11 Thread Eron Wright
As mentioned earlier, the watermark is the basis for reasoning about the overall progression of time. Many operators use the watermark to correctly organize records, e.g. into the correct time-based window. Within that window the records may still be unordered. That said, some operators do
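
A small plain-Java model may help show what "organized into the correct window, yet unordered within it" means. This is only a sketch under simple assumptions (tumbling windows aligned to zero, no offset); the names `TumblingWindowModel`, `windowStart` and `assign` are hypothetical and are not Flink APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: assigns timestamps to tumbling windows by event time.
final class TumblingWindowModel {
    // Start of the window that contains the given timestamp.
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }

    // Groups timestamps by window start, keeping arrival order inside each window.
    static Map<Long, List<Long>> assign(long[] arrivals, long windowSizeMillis) {
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long ts : arrivals) {
            windows.computeIfAbsent(windowStart(ts, windowSizeMillis), k -> new ArrayList<>())
                   .add(ts);
        }
        return windows;
    }

    public static void main(String[] args) {
        long[] arrivals = {1200L, 300L, 2500L, 900L, 1100L}; // out of order in event time
        // prints {0=[300, 900], 1000=[1200, 1100], 2000=[2500]}
        System.out.println(assign(arrivals, 1000L));
    }
}
```

Each record lands in the right window even though arrival order was scrambled, while the window starting at 1000 still holds 1200 before 1100.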

Re: Flink Application Jar file on Docker container

2017-09-27 Thread Eron Wright
There was also a talk about containerization at Flink Forward that touches on some of your questions. https://www.youtube.com/watch?v=w721NI-mtAA&t=2s&list=PLDX4T_cnKjD0JeULl1X6iTn7VIkDeYX_X&index=33 Eron On Wed, Sep 27, 2017 at 9:12 AM, Stefan Richter wrote: > Hi, > > from

Re: How to run a flink wordcount program

2017-08-17 Thread Eron Wright
Hello, You can run the WordCount program directly within your IDE. An embedded Flink environment is automatically created. -Eron On Thu, Aug 17, 2017 at 8:03 AM, P. Ramanjaneya Reddy wrote: > Thanks. > But I want to run in debug mode..how to run without jar and using the

Re: [EXTERNAL] Re: Fink application failing with kerberos issue after running successfully without any issues for few days

2017-08-17 Thread Eron Wright
Raja, According to those configuration values, the delegation token would be automatically renewed every 24 hours, then expire entirely after 7 days. You say that the job ran without issue for 'a few days'. Can we conclude that the job hit the 7-day DT expiration? Flink supports the use of
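
The timeline arithmetic can be checked with a few lines of Java. This sketch only encodes the two figures quoted above (24-hour renewal, 7-day maximum lifetime); the class and method names are hypothetical, and it does not model Hadoop's actual token handling.

```java
import java.time.Duration;

// Hypothetical arithmetic helper for the delegation-token timeline discussed above.
final class DelegationTokenTimeline {
    static final Duration RENEW_INTERVAL = Duration.ofHours(24); // value quoted in the thread
    static final Duration MAX_LIFETIME = Duration.ofDays(7);     // value quoted in the thread

    // True once the job has been running at least as long as the token can live.
    static boolean pastMaxLifetime(Duration jobUptime) {
        return jobUptime.compareTo(MAX_LIFETIME) >= 0;
    }

    // Time left before renewal stops helping; never negative.
    static Duration remaining(Duration jobUptime) {
        Duration left = MAX_LIFETIME.minus(jobUptime);
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        for (int days : new int[] {3, 6, 7, 8}) {
            Duration uptime = Duration.ofDays(days);
            System.out.println(days + " days up: expired=" + pastMaxLifetime(uptime)
                    + ", remaining=" + remaining(uptime).toHours() + "h");
        }
    }
}
```

Under these figures, a job that fails after three or four days has not yet reached the 7-day limit, which is why the exact uptime matters for the diagnosis.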

Re: Flink Mesos Outstanding Offers - trouble launching task managers

2017-08-31 Thread Eron Wright
Please mail me more information, in particular the JM log and the information on the 'offers' tab on the Mesos UI. Also, are you using any Mesos roles? Thanks On Thu, Aug 31, 2017 at 9:02 AM, prashantnayak < prash...@intellifylearning.com> wrote: > Hi Eron > > No, unfortunately we did not

Re: Flink Mesos Outstanding Offers - trouble launching task managers

2017-08-29 Thread Eron Wright
Hello, did you resolve this issue? Thanks, Eron Wright On Wed, Jul 12, 2017 at 11:09 AM, Prashant Nayak < prash...@intellifylearning.com> wrote: > > Hi > > We’re running Flink 1.3.1 on Mesos. > > From time-to-time, the Flink app master seems to have trouble with Meso

Re: Classloader error after SSL setup

2017-10-04 Thread Eron Wright
By following Chesnay's recommendation we will hopefully uncover an SSL error that is being masked. Another thing to try is to disable hostname verification (it is enabled by default) to see whether the certificate is being rejected. On Wed, Oct 4, 2017 at 5:15 AM, Chesnay Schepler

Re: flink eventTime, lateness, maxoutoforderness

2017-12-16 Thread Eron Wright
Take a look at the section of Flink documentation titled "Event Time and Watermarks": https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/event_time.html#event-time-and-watermarks Also read the excellent series "Streaming 101" and "102", which has useful animations depicting the flow of

Re: Fenzo NoClassDefFoundError: ObjectMapper

2017-12-14 Thread Eron Wright
Jared, I think you're correct that the shaded `ObjectMapper` is missing. Based on the above details and a quick look at the Fenzo code, it appears that this bug is expressed when you've configured some hard constraints that Mesos couldn't satisfy. Do you agree? Thanks also for suggesting a fix,

Re: Flink long-running streaming job, Keytab authentication

2017-12-14 Thread Eron Wright
To my knowledge the various RPC clients take care of renewal (whether reactively or using a renewal thread). Some examples: https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L638

Re: Job-level close()?

2017-12-18 Thread Eron Wright
Since the connection pool is per-TM, and a given job may span numerous TMs, I think you're imagining a per-TM cleanup. Your idea of using reference counting seems like the best option. Maintain a static count of open instances. Use a synchronized block to manage the count and to
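
The reference-counting idea might be sketched as follows. Everything here is hypothetical: `FakePool` stands in for whatever connection pool is actually shared, and in a Flink job `acquire()` and `release()` would presumably be called from each function instance's `open()` and `close()`.

```java
// Hypothetical sketch of a per-JVM (and hence per-TM) reference-counted shared resource.
final class PoolRefCounter {
    // Stand-in for a real connection pool.
    static final class FakePool implements AutoCloseable {
        private boolean closed;
        boolean isClosed() { return closed; }
        @Override public void close() { closed = true; }
    }

    private static final Object LOCK = new Object();
    private static int openInstances; // static count of open users
    private static FakePool pool;     // created by the first user, closed by the last

    static FakePool acquire() {
        synchronized (LOCK) {
            if (openInstances == 0) {
                pool = new FakePool();
            }
            openInstances++;
            return pool;
        }
    }

    static void release() {
        synchronized (LOCK) {
            if (openInstances == 0) {
                throw new IllegalStateException("release() without matching acquire()");
            }
            openInstances--;
            if (openInstances == 0) {
                pool.close(); // last user out cleans up
                pool = null;
            }
        }
    }

    static int openInstances() {
        synchronized (LOCK) { return openInstances; }
    }

    public static void main(String[] args) {
        FakePool a = acquire();
        FakePool b = acquire();
        System.out.println(a == b);       // true: both users share one pool
        release();
        System.out.println(a.isClosed()); // false: one user remains
        release();
        System.out.println(a.isClosed()); // true: the last release closed it
    }
}
```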

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Eron Wright
A couple of other details worth mentioning: 1. Your choice of connector matters a lot. Some connectors provide limited ordering guarantees, especially across keys, which leads to a highly disordered stream of events (with respect to event time) from the perspective of the watermark generator.

Re: hadoop error with flink mesos on startup

2017-12-12 Thread Eron Wright
Thanks for investigating this, Jared. I would summarize it as: Flink-on-Mesos cannot be used in Hadoop-free mode in Flink 1.4.0. I filed an improvement bug to support this scenario: FLINK-8247 On Tue, Dec 12, 2017 at 11:46 AM, Jared Stehler < jared.steh...@intellifylearning.com> wrote: > I

Re: coordinate watermarks between jobs?

2018-05-04 Thread Eron Wright
It might be possible to apply backpressure to the channels that are significantly ahead in event time. Tao, it would not be trivial, but if you'd like to investigate more deeply, take a look at the Flink runtime's `StatusWatermarkValve` and the associated stream input processors to see how an

Re: Problems with taskmanagers in Mesos Cluster

2017-10-23 Thread Eron Wright
If I understand you correctly, the high-availability path isn't being changed but other TM-related settings are, and the recovered TMs aren't picking up the new configuration. I don't think that Flink supports on-the-fly reconfiguration of a Task Manager at this time. As a workaround, to

Re: History Server

2018-01-16 Thread Eron Wright
As a follow-up question, how well does the history server work for observing a running job? I'm trying to understand whether, in the cluster-per-job model, a user would be expected to hop from the Web UI to the History Server once the job completed. Thanks On Wed, Oct 4, 2017 at 3:49 AM,

Re: Timestamps and watermarks in CoProcessFunction function

2018-01-16 Thread Eron Wright
Consider the watermarks that are generated by your chosen watermark generator as an +assertion+ about the progression of time, based on domain knowledge, observation of elements, and connector specifics. The generator is asserting that any elements observed after a given watermark will come later

Re: how to run flink project built with maven

2018-01-19 Thread Eron Wright
You must specify the full class name, in this case `org.example.WordCount`, for the `--class` argument. On Fri, Jan 19, 2018 at 9:35 AM, Jesse Lacika wrote: > I feel like this is probably the simplest thing, but I can't seem to > figure it out, and I've searched and searched and

Re: Two issues when deploying Flink on DC/OS

2018-01-13 Thread Eron Wright
host. Eron On Thu, Jan 11, 2018 at 7:18 AM, Gary Yao <g...@data-artisans.com> wrote: > Hi Dongwon, > > I am not familiar with the deployment on DC/OS. However, Eron Wright and > Jörg > Schad (cc'd), who have worked on the Mesos integration, might be able to > help

Re: How to stop FlinkKafkaConsumer and make job finished?

2017-12-28 Thread Eron Wright
I believe you can extend the `KeyedDeserializationSchema` that you pass to the consumer to check for end-of-stream markers. https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/streaming/util/serialization/KeyedDeserializationSchema.html#isEndOfStream-T- Eron On
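
The end-of-stream check might look roughly like the sketch below. It is deliberately plain Java with no Flink dependency: the class is a stand-in that mirrors only the idea of the `isEndOfStream` method linked above, the marker value `END-OF-STREAM` is an assumption, and the `consume` loop is a simplified model of a consumer that stops at the marker.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a deserialization schema with an end-of-stream check.
final class EndMarkerSchemaSketch {
    static final String END_MARKER = "END-OF-STREAM"; // assumed marker value

    static String deserialize(byte[] message) {
        return new String(message, StandardCharsets.UTF_8);
    }

    // Mirrors the idea of isEndOfStream(T nextElement): true means "stop consuming".
    static boolean isEndOfStream(String nextElement) {
        return END_MARKER.equals(nextElement);
    }

    // Simplified consumer loop: stops at the marker and does not emit it.
    static List<String> consume(List<byte[]> messages) {
        List<String> emitted = new ArrayList<>();
        for (byte[] message : messages) {
            String element = deserialize(message);
            if (isEndOfStream(element)) {
                break;
            }
            emitted.add(element);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<byte[]> messages = new ArrayList<>();
        for (String s : new String[] {"a", "b", END_MARKER, "c"}) {
            messages.add(s.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(consume(messages)); // [a, b]
    }
}
```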

Re: Problem with querying state on Flink 1.6.

2018-08-30 Thread Eron Wright
I took a brief look at why the queryable state server would bind to the loopback address. Both the qs server and the org.apache.flink.runtime.io.network.netty.NettyServer do bind the local address based on the TM address. That address is based on the "taskmanager.hostname" configuration

Re: Far too few watermarks getting generated with Kafka source

2018-01-22 Thread Eron Wright
I think there's a misconception about `setAutoWatermarkInterval`. It establishes the rate at which your periodic watermark generator is polled for the current watermark. Like most generators, `BoundedOutOfOrdernessTimestampExtractor` produces a watermark based solely on observed elements.

Re: Issue in Flink/Zookeeper authentication via Kerberos

2018-04-15 Thread Eron Wright
> unauthorised mode if zookeeper allows. > > And then it connected successfully! > > > > Do I missing any configuration in flink/zookeeper side. > > Expecting you suggestion here. > > > > Regards > > Sarthak Sahu > > > > *From:* Eron Wright [mailto:

Re: Queryable state and TTL

2019-07-06 Thread Eron Wright
Here's a PR for queryable state TLS that I closed because I didn't have time, and because I get the impression that the queryable state feature is used very often. Feel free to take it up, if you like. https://github.com/apache/flink/pull/6626 -Eron On Wed, Jul 3, 2019 at 11:21 PM Avi Levi