Re: Breakage in Flink CLI in 1.5.0

2018-06-19 Thread Sampath Bhat
Hi Chesnay, If the REST API (i.e. the web server) is mandatory for submitting jobs, then why is there an option to set rest.port to -1? I think it should be mandatory to set some valid port for rest.port, and to make sure the Flink job manager does not come up if no valid port is set for rest.port. Or else
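For context, a minimal sketch of the flink-conf.yaml entries the thread is discussing (host and port values are examples, not taken from the thread):

```yaml
# flink-conf.yaml (sketch; values are illustrative)
jobmanager.rpc.address: jobmanager-host
# REST endpoint / web UI port; in Flink 1.5 job submission goes through this endpoint
rest.port: 8081
```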

Re: Questions regarding to Flink 1.5.0 REST API change

2018-06-19 Thread Siew Wai Yow
Hi all, It seems that passing in target-directory is now a must for the checkpoints REST API, and the status no longer responds with the savepoint directory. I can pass it in, but the information is redundant with the same value already defined in flink-conf.yaml. May I know if there is a way to retrieve the save

Re: How to get past "bad" Kafka message, restart, keep state

2018-06-19 Thread sihua zhou
Hi, Flink will reset the Kafka offset to the latest successful checkpoint on recovery, but the "bad" message will always raise an exception and cause another recovery, so it will never be covered by any successful checkpoint, and your job will never skip the "bad" record. I think you
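A common way out of this loop is to catch parse errors during deserialization and return null; Flink's Kafka consumer skips records for which the DeserializationSchema returns null instead of failing. A self-contained sketch of that parsing logic (names are illustrative; in a real job this would live inside a DeserializationSchema&lt;Integer&gt;):

```java
import java.nio.charset.StandardCharsets;

public class SafeIntParser {
    /**
     * Parse a Kafka record payload into an Integer, returning null for "bad"
     * messages instead of throwing. Inside a Flink DeserializationSchema,
     * returning null lets the Kafka consumer skip the corrupted record
     * rather than raising an exception and triggering recovery forever.
     */
    public static Integer deserialize(byte[] message) {
        try {
            return Integer.parseInt(new String(message, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return null; // skip the bad record
        }
    }

    public static void main(String[] args) {
        System.out.println(deserialize("42".getBytes(StandardCharsets.UTF_8)));
        System.out.println(deserialize("oops".getBytes(StandardCharsets.UTF_8)));
    }
}
```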

Re: Questions regarding to Flink 1.5.0 REST API change

2018-06-19 Thread Siew Wai Yow
Thank you @Chesnay Schepler and @Esteban Serrano! From: Chesnay Schepler Sent: Tuesday, June 19, 2018 11:55 PM To: user@flink.apache.org Subject: Re: Questions regarding to Flink 1.5.0 REST API change 1.

How to get past "bad" Kafka message, restart, keep state

2018-06-19 Thread chrisr123
First time I'm trying to get this to work, so bear with me. I'm trying to learn checkpointing with Kafka and handling "bad" messages, restarting without losing state. Use Case: Use checkpointing. Read a stream of integers from Kafka, keep a running sum. If a "bad" Kafka message is read, restart the app,

Savepoint with S3

2018-06-19 Thread Anil
I'm using RocksDB and S3 for savepoints. Flink version is 1.4. Currently I'm creating a savepoint for the jobs every 10 mins. When the job starts the data is saved as expected, but after a while I see these messages in my log and the savepoint is not saved anymore. 2018-06-19 12:23:18,992

Re: Ordering of stream from different kafka partitions

2018-06-19 Thread Andrey Zagrebin
Hi Amol, I think you could try (based on your stack overflow code) org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor like this: DataStream streamSource = env .addSource(kafkaConsumer) .setParallelism(4) .assignTimestampsAndWatermarks( new
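The extractor Andrey suggests tracks the maximum timestamp seen so far and emits watermarks that trail it by a fixed bound. A self-contained sketch of that core logic in plain Java (outside Flink's API, for illustration only):

```java
public class BoundedOutOfOrderness {
    private final long maxOutOfOrdernessMs;
    private long currentMaxTimestamp = Long.MIN_VALUE;

    public BoundedOutOfOrderness(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Called per element: remember the highest timestamp seen so far. */
    public long extractTimestamp(long elementTimestamp) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, elementTimestamp);
        return elementTimestamp;
    }

    /** The watermark trails the max timestamp by the out-of-orderness bound. */
    public long getCurrentWatermark() {
        return currentMaxTimestamp - maxOutOfOrdernessMs;
    }

    public static void main(String[] args) {
        BoundedOutOfOrderness w = new BoundedOutOfOrderness(3000);
        w.extractTimestamp(10_000);
        w.extractTimestamp(8_000); // a late element does not move the watermark back
        System.out.println(w.getCurrentWatermark()); // 7000
    }
}
```

This illustrates why elements more than the bound behind the max timestamp are considered late: the watermark has already passed them.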

Re: DataStreamCalcRule$1802" grows beyond 64 KB when execute long sql.

2018-06-19 Thread Timo Walther
Hi, this is a known issue that is mentioned in https://issues.apache.org/jira/browse/FLINK-8921 and should be fixed soon. Currently, we only split by field but for your case we should also split expressions. As a workaround you could implement your own scalar UDF that contains the case/when
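The workaround Timo describes moves the big CASE WHEN into a user-defined scalar function, so the code generated per SQL expression stays under the 64 KB method limit. A plain-Java sketch of such a function (the mapping is made up for illustration; in Flink it would extend org.apache.flink.table.functions.ScalarFunction and be registered in the TableEnvironment before use):

```java
public class BigCaseWhen {
    /**
     * Stands in for a long SQL CASE WHEN chain. In a real Flink job this
     * class would extend ScalarFunction and this method would be its eval().
     */
    public static String eval(int code) {
        if (code == 1) return "created";
        if (code == 2) return "running";
        if (code == 3) return "finished";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(eval(2)); // running
    }
}
```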

Re: Questions regarding to Flink 1.5.0 REST API change

2018-06-19 Thread Chesnay Schepler
1. PATCH to /jobs/:jobid; you can specify CANCEL/STOP with the "mode" query parameter. 2. POST to /jobs/:jobid/savepoints, with a JSON payload. Returns a trigger id, used for 3). The payload schema: { "target-directory" : { "type" : "string" }, "cancel-job" : { "type" : "boolean" } } 3. GET to
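The two calls Chesnay lists can be sketched with Java 11's HttpClient API. This only builds the requests (host, job id, and savepoint directory are placeholder assumptions); sending them requires a running Flink 1.5 cluster:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class FlinkRestCalls {
    // Placeholder endpoint; replace with your JobManager's REST address.
    static final String BASE = "http://localhost:8081";

    /** PATCH /jobs/:jobid?mode=cancel cancels the job without a savepoint. */
    static HttpRequest cancelJob(String jobId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/jobs/" + jobId + "?mode=cancel"))
                .method("PATCH", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** POST /jobs/:jobid/savepoints triggers a savepoint; response carries a trigger id. */
    static HttpRequest triggerSavepoint(String jobId, String targetDir, boolean cancel) {
        String json = String.format(
                "{\"target-directory\":\"%s\",\"cancel-job\":%b}", targetDir, cancel);
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .method("POST", HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        System.out.println(cancelJob("abc123").method());                          // PATCH
        System.out.println(triggerSavepoint("abc123", "s3://bucket/sp", false).method()); // POST
    }
}
```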

Re: Questions regarding to Flink 1.5.0 REST API change

2018-06-19 Thread Esteban Serrano
For #1, you need to use a PATCH request to "*/jobs/:jobid*" On Tue, Jun 19, 2018 at 11:35 AM Siew Wai Yow wrote: > Hi, > > > Regarding to Flink 1.5.0 REST API breaking change, > >- *The REST API to cancel a job was changed.* >- *The REST API to cancel a job with savepoint was changed.*

Questions regarding to Flink 1.5.0 REST API change

2018-06-19 Thread Siew Wai Yow
Hi, Regarding the Flink 1.5.0 REST API breaking change: * The REST API to cancel a job was changed. * The REST API to cancel a job with savepoint was changed. I have a few dumb questions: 1. Any replacement for cancellation ONLY, without savepoint? Only found

Re: Running with Docker image on ECS/Fargate?

2018-06-19 Thread Sandybayev, Turar (CAI - Atlanta)
Sorry, sent to the wrong group originally. My question is below: Date: Tuesday, June 19, 2018 at 11:01 AM To: "d...@flink.apache.org" Subject: Running with Docker image on ECS/Fargate? Hi, Has anyone tried to run a Flink cluster on AWS ECS? I couldn’t figure out how to replicate “docker-compose

Re: DataStreamCalcRule$1802" grows beyond 64 KB when execute long sql.

2018-06-19 Thread Fabian Hueske
I see, then this case wasn't covered by the fix that we added for Flink 1.5.0. I guess the problem is the code that is needed to evaluate a single field. Implementing a scalar user function is not very difficult [1]. However, you need to register it in the TableEnvironment before you can use it

Re: DataStreamCalcRule$1802" grows beyond 64 KB when execute long sql.

2018-06-19 Thread zhangminglei
Hi, Fabian, Absolutely, Flink 1.5.0 is what I am using for this. A big CASE WHEN statement. Is it hard to implement? I am new to the Flink Table API & SQL. Best, Minglei. > On June 19, 2018, at 10:36 PM, Fabian Hueske wrote: > > Hi, > > Which version are you using? We fixed a similar issue for Flink 1.5.0. > If

Re: # of active session windows of a streaming job

2018-06-19 Thread Dongwon Kim
Hi Fabian, Thanks a lot for your reply. > Do you need the number of active session windows as a DataStream or would > you like to have it as a metric that you can expose? > If possible, I would recommend to expose it as a metric because they are > usually easier to collect. I want to have it as a

Re: DataStreamCalcRule$1802" grows beyond 64 KB when execute long sql.

2018-06-19 Thread Fabian Hueske
Hi, Which version are you using? We fixed a similar issue for Flink 1.5.0. If you can't upgrade yet, you can also implement a user-defined function that evaluates the big CASE WHEN statement. Best, Fabian 2018-06-19 16:27 GMT+02:00 zhangminglei <18717838...@163.com>: > Hi, friends. > > When I

DataStreamCalcRule$1802" grows beyond 64 KB when execute long sql.

2018-06-19 Thread zhangminglei
Hi, friends. When I execute a long SQL statement and get the following error, how can I get a quick fix? org.apache.flink.api.common.InvalidProgramException: Table program cannot be compiled. This is a bug. Please file an issue. at

Re: Flink application does not scale as expected, please help!

2018-06-19 Thread Siew Wai Yow
Thanks @Fabian and @Till for the good explanation. The picture is pretty clear right now. As for the single-slot TM test, it seems to try to allocate the same machine's slots first as well, but with that the result is less spiky. I guess @Fabian is right that the network is our bottleneck in case

Re: Flink kafka consumer stopped committing offsets

2018-06-19 Thread Juho Autio
Hi, Thanks for your analysis. We found LeaderElectionRateAndTimeMs go to a non-zero value on Kafka around the same time this error was seen in the Flink job. Kafka itself recovers from this, and so do any other consumers that we have. It seems like a bug in the Kafka consumer library if this

Re: Breakage in Flink CLI in 1.5.0

2018-06-19 Thread Chesnay Schepler
In 1.5 we reworked the job submission to go through the REST API instead of Akka. I believe the jobmanager rpc port shouldn't be necessary anymore; the rpc address is still required due to some technical implementations; it may be that you can set this to some arbitrary value however. As

Re: Ordering of stream from different kafka partitions

2018-06-19 Thread sihua zhou
Hi Amol, I'm not sure whether this is possible, especially when you need to process the records with multiple parallelism. IMO, in theory, we can only get an ordered stream when there is a single Kafka partition and we operate on it with a single parallelism in Flink. Even in this case, if you

Breakage in Flink CLI in 1.5.0

2018-06-19 Thread Sampath Bhat
Hello I'm using Flink 1.5.0 version and Flink CLI to submit jar to flink cluster. In flink 1.4.2 only job manager rpc address and job manager rpc port were sufficient for flink client to connect to job manager and submit the job. But in flink 1.5.0 the flink client additionally requires the

Re: Exception while submitting jobs through Yarn

2018-06-19 Thread Ted Yu
Since you're using a vendor's distro, I would suggest asking on their user forum. Cheers Original message From: Garvit Sharma Date: 6/19/18 3:34 AM (GMT-08:00) To: trohrm...@apache.org Cc: Amit Jain , Chesnay Schepler , Ted Yu , user@flink.apache.org Subject: Re: Exception

Re: Exception while submitting jobs through Yarn

2018-06-19 Thread Garvit Sharma
Any help on this? On Mon, Jun 18, 2018 at 11:31 PM Garvit Sharma wrote: > Yes, it is. > > On Mon, Jun 18, 2018 at 7:54 PM Till Rohrmann > wrote: > >> Is `/usr/hdp/2.6.3.0-235/hadoop/client/xercesImpl.jar` a link to ` >> /usr/hdp/2.6.3.0-235/hadoop/client/xercesImpl-2.9.1.jar`? >> >> On Mon,

Re: Stream Join With Early firings

2018-06-19 Thread Fabian Hueske
Hi Johannes, You are right. You should approach the problem with the semantics that you need before thinking about optimizations such as state size. The Table API / SQL offers (in v1.5.0) two types of joins: 1) Windowed joins where each record joins with records in a time-range of the other

Re: # of active session windows of a streaming job

2018-06-19 Thread Fabian Hueske
Hi Dongwon, Do you need the number of active session windows as a DataStream, or would you like to have it as a metric that you can expose? If possible, I would recommend to expose it as a metric because they are usually easier to collect. SessionWindows work internally as follows: - every new
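Fabian's description (every element first opens its own window; windows whose ranges overlap are then merged) implies a simple way to count active sessions: a new session starts whenever an element arrives more than the session gap after the previous one. A self-contained sketch over ascending timestamps (illustrative plain Java, not Flink's windowing API):

```java
import java.util.List;

public class SessionCounter {
    /** Count session windows over ascending timestamps with a given gap. */
    public static int countSessions(List<Long> sortedTimestamps, long gap) {
        if (sortedTimestamps.isEmpty()) return 0;
        int sessions = 1; // the first element opens a session
        long last = sortedTimestamps.get(0);
        for (long ts : sortedTimestamps.subList(1, sortedTimestamps.size())) {
            if (ts - last > gap) sessions++; // too far apart: a new session starts
            // else: the per-element window merges into the current session
            last = ts;
        }
        return sessions;
    }

    public static void main(String[] args) {
        // gap = 10: {1, 5}, {30}, {50, 55} -> 3 sessions
        System.out.println(countSessions(List.of(1L, 5L, 30L, 50L, 55L), 10));
    }
}
```

In a real job, the metric Fabian suggests would instead increment/decrement a counter as windows are created, merged, and closed, since the stream is unbounded and unsorted.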

Re: Heap Problem with Checkpoints

2018-06-19 Thread Piotr Nowojski
Hi, Can you search the logs/std err/std output for log entries like: log.warn("Failed to locally delete blob “ …) ? I see in the code, that if file deletion fails for whatever the reason, TransientBlobCleanupTask can loop indefinitely trying to remove it over and over again. That might be ok,

Re: Flink application does not scale as expected, please help!

2018-06-19 Thread Fabian Hueske
Hi Siew, The hint about the lower source parallelism compared to the operator parallelism might be the right one. Can you check if all source tasks are scheduled to the same machine? In that case your application might be bottlenecked by the out-going network connection of that single machine.

Re: Flink application does not scale as expected, please help!

2018-06-19 Thread Till Rohrmann
Hi, as Fabian explained, if you exceed the number of slots on a single TM, then Flink needs to deploy tasks on other TMs which causes a network transfer between the sources and the mappers. This will have a negative impact if you compare it to a setup where everything runs locally. The

Re: Flink 1.5.0 no more allow multiple program argument?

2018-06-19 Thread Siew Wai Yow
Thanks Chesnay! I guess it is related to StringJoiner(",") used in Flink 1.5 :( From: Chesnay Schepler Sent: Tuesday, June 19, 2018 1:44 PM To: user@flink.apache.org Subject: Re: Flink 1.5.0 no more allow multiple program argument? For now you'll have to replace