Re: JobManager container is running beyond physical memory limits

2018-09-25 Thread Till Rohrmann
Hi, what changed between version 1.4.2 and 1.5.2 was the addition of the application level flow control mechanism which changed a bit how the network buffers are configured. This could be a potential culprit. Since you said that the container ran for some time, I'm wondering whether there is

Re: Strange behavior of FsStateBackend checkpoint when local executing

2018-09-25 Thread Till Rohrmann
Hi Henry, which version of Flink are you using? If you could provide us with a working example to reproduce the problem, then I'm sure that we can figure out why it is not working as expected. Cheers, Till On Tue, Sep 25, 2018 at 8:44 AM 徐涛 wrote: > Hi All, > I am using a

Re: "405 HTTP method POST is not supported by this URL" is returned when POST a REST url on Flink on yarn

2018-09-25 Thread Till Rohrmann
Hi Henry, I think that when running Flink on YARN, you must not go through the YARN proxy. Instead, you should send the POST request directly to the node on which the application master runs. When starting a Flink YARN session via yarn-session.sh, the web interface URL is printed to stdout,

Re: Flink 1.5.4 -- issues w/ TaskManager connecting to ResourceManager

2018-09-25 Thread Till Rohrmann
Hi Jamie, thanks for the update on how to fix the problem. This is very helpful for the rest of the community. The change of removing the execution mode parameter (FLINK-8696) from the start up scripts was actually released with Flink 1.5.0. That way, the host name became the 2nd parameter. By

Re: Flink 1.5.4 -- issues w/ TaskManager connecting to ResourceManager

2018-09-25 Thread Jamie Grier
Update on this: The issue was the command being used to start the jobmanager: `jobmanager.sh start-foreground cluster`. This was a command leftover in our automation that used to be the correct way to start the JM -- however now, in Flink 1.5.4, that second parameter, `cluster`, is being

Re: Question about Window Trigger

2018-09-25 Thread Till Rohrmann
Hi Chang Liu, maybe you could use the AssignerWithPeriodicWatermarks to build a custom watermark assigner which creates the watermarks based on the incoming events and, if it detects that no events are arriving, advances the watermark with respect to the wall-clock time. Cheers, Till On
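
The idea in this reply can be sketched in plain Java, without any Flink dependency. This is only a model of the logic, not Flink API: the class name `IdleAwareWatermarkSketch` and the parameters `maxOutOfOrdernessMs` and `idleTimeoutMs` are hypothetical, and the wall-clock time is passed in explicitly so the behaviour is easy to check. In an actual AssignerWithPeriodicWatermarks, `onEvent` would roughly correspond to `extractTimestamp` and `currentWatermark` to `getCurrentWatermark`.

```java
/**
 * Plain-Java model of an idle-aware periodic watermark (hypothetical names, not Flink API).
 * While events arrive, the watermark trails the highest event timestamp seen so far.
 * Once no event has arrived for longer than idleTimeoutMs, the watermark is pushed
 * forward by the extra wall-clock time that has elapsed.
 */
final class IdleAwareWatermarkSketch {
    private static final long NONE = Long.MIN_VALUE;

    private final long maxOutOfOrdernessMs; // assumed lateness bound for events
    private final long idleTimeoutMs;       // assumed idle period before wall clock takes over

    private long maxEventTs = NONE;           // highest event timestamp seen
    private long lastEventWallClockMs = NONE; // wall-clock time of the last event
    private long lastWatermark = NONE;        // last emitted value, kept monotonic

    IdleAwareWatermarkSketch(long maxOutOfOrdernessMs, long idleTimeoutMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    /** Record an incoming event (would be called per element). */
    void onEvent(long eventTs, long nowWallClockMs) {
        maxEventTs = Math.max(maxEventTs, eventTs);
        lastEventWallClockMs = nowWallClockMs;
    }

    /** Compute the watermark (would be called periodically). */
    long currentWatermark(long nowWallClockMs) {
        if (maxEventTs == NONE) {
            return NONE; // nothing seen yet, so there is no basis for a watermark
        }
        long candidate = maxEventTs - maxOutOfOrdernessMs;
        long idleForMs = nowWallClockMs - lastEventWallClockMs;
        if (idleForMs > idleTimeoutMs) {
            // Source looks idle: advance by the wall-clock time beyond the timeout.
            candidate += idleForMs - idleTimeoutMs;
        }
        lastWatermark = Math.max(lastWatermark, candidate); // never move backwards
        return lastWatermark;
    }

    public static void main(String[] args) {
        IdleAwareWatermarkSketch w = new IdleAwareWatermarkSketch(1_000L, 5_000L);
        w.onEvent(10_000L, 100_000L);
        System.out.println(w.currentWatermark(100_000L)); // event-driven watermark
        System.out.println(w.currentWatermark(107_000L)); // idle, wall-clock driven
    }
}
```

A trade-off of this approach is that events arriving after an idle period may be treated as late, because the watermark has already moved past their timestamps.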

Re: Could not find a suitable table factory for 'org.apache.flink.table.factories.DeserializationSchemaFactory' in the classpath.

2018-09-25 Thread Till Rohrmann
Hi, I've pulled in Timo, who can help you with your problem. Cheers, Till On Tue, Sep 25, 2018 at 12:02 PM clay wrote: > hi: > I am using flink's table api, I receive data from kafka, then register it > as > a table, then I use sql statement to process, and finally convert the > result >

Re: Question about Window Trigger

2018-09-25 Thread Chang Liu
Hi Till, You mean using AssignerWithPeriodicWatermarks but combining the logic of event time and processing time (that is, depending on the processing-time interval, while the watermark value still depends on the event time, right)? One additional question: where do I config

Re: When should the RETAIN_ON_CANCELLATION option be used?

2018-09-25 Thread vino yang
Hi Henry, Your understanding is correct. Checkpoint itself is for recovery purposes. If you cancel a job, Flink thinks it doesn't make sense to save the checkpoint again. If you want to recover after cancel, then you should use cancel with savepoint. So, by default, you don't need to manually

Strange behavior of FsStateBackend checkpoint when local executing

2018-09-25 Thread 徐涛
Hi All, I am using an FsStateBackend in local execution, and I set DELETE_ON_CANCELLATION for checkpoints. When I click the “stop” button in IntelliJ IDEA, the log shows that the job has switched to the CANCELED state, but when I check the local file system, the checkpoint directory and file still

Re: How to join stream and batch data in Flink?

2018-09-25 Thread vino yang
Hi Henry, If you have converted the MySQL table to a Flink stream table, then in the Flink Table API/SQL a stream-stream join can also do this, for example by setting the state retention time of one of the tables to be permanent. But when the job has just started running, you may not be able to match the results,

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Averell
Hi Kostas, I use PROCESS_CONTINUOUSLY mode, and checkpoint interval of 20 minutes. When I said "Within that 15 minutes, checkpointing process is not triggered though" in my previous email, I was not complaining that checkpoint is not running, but to say that the slowness is not due to ongoing

Flink - Process datastream in a bounded context (like Dataset) - Unifying stream & batch

2018-09-25 Thread bastien dine
Hello everyone, I need to join some files to perform some processing. The DataSet API is a perfect way to achieve this, and I am able to do it when I read the files in batch (CSV). However, in the prod environment, I will receive those files as Kafka messages (one message = one line of a file). So I am

Re: Error while confirming Checkpoint

2018-09-25 Thread Stefan Richter
Hi, I cannot spot anything bad or “wrong” about your job configuration. Maybe you can try to save and send the logs if it happens again? Did you observe this only once, often, or is it something that is even reproducible? Best, Stefan > On 24.09.2018 at 10:15, PedroMrChaves wrote: > >

Re: How to join stream and batch data in Flink?

2018-09-25 Thread 徐涛
Hi Vino & Hequn, I am now using the Table/SQL API. If I import the MySQL table as a stream and then convert it into a table, it seems that this can also be a workaround for batch/streaming joining. May I ask what the difference is compared with the UDTF method? Does this implementation have some

Re: When should the RETAIN_ON_CANCELLATION option be used?

2018-09-25 Thread vino yang
Hi Henry, I added comments in blue to your original email. Thanks, vino. On Tue, Sep 25, 2018 at 12:56 PM, 徐涛 wrote: > Hi Vino, > *What is the definition of, and the difference between, a job cancel and a job failure?* > Can I say that if the program is shut down manually, then it is a job > cancel, >

"405 HTTP method POST is not supported by this URL" is returned when POST a REST url on Flink on yarn

2018-09-25 Thread 徐涛
Hi All, I am trying to POST to a REST URL to generate a savepoint; the Flink version is 1.6.0. When I execute the POST locally, everything is OK, but when I POST to the URL of a Flink on YARN application, the following error is returned: “405 HTTP method POST is

Re: JobManager container is running beyond physical memory limits

2018-09-25 Thread Yun Tang
Hi, if your JM's container is killed by YARN for exceeding the physical memory limit, and your job's code has not changed but you only bumped the Flink version, I think you could use the jmap command to dump the memory of your JobManager to see the difference between 1.4.2 and 1.5.2, and you could also open

Re: How to join stream and batch data in Flink?

2018-09-25 Thread 徐涛
Hi Vino, I do not quite understand some of the sentences below; would you please explain them in a bit more detail? 1. “such as setting the state retention time of one of the tables to be permanent”: as far as I know, the state retention time is a global config, I can not set this

Scheduling sources

2018-09-25 Thread Averell
Hi everyone, I have 2 file sources, which I want to start reading in a specified order (e.g., source2 should only start 5 minutes after source1 has started). I could not find any Flink documentation mentioning this capability, and I also tried to search the mailing list, without any success.

Re: Question about Window Trigger

2018-09-25 Thread Chang Liu
Hi Rong, Thanks for your reply. :) Best regards, Chang Liu 刘畅 > On 19 Sep 2018, at 18:20, Rong Rong wrote: > > Hi Chang, > > There was some previous discussion regarding how to debug watermark and > window triggers[1]. > Basically if there's no data for some partitions there's no way

Could not find a suitable table factory for 'org.apache.flink.table.factories.DeserializationSchemaFactory' in the classpath.

2018-09-25 Thread clay4444
hi: I am using flink's table api, I receive data from kafka, then register it as a table, then I use sql statement to process, and finally convert the result back to a stream, write to a directory, the code looks like this: def main(args: Array[String]): Unit = { val sEnv =

Re: When should the RETAIN_ON_CANCELLATION option be used?

2018-09-25 Thread 徐涛
Hi Vino, So I will use the default setting of DELETE_ON_CANCELLATION. When the program is cancelled, the checkpoint will be deleted; when the program fails, the checkpoint will not be deleted, so I still have a checkpoint that can be used to resume. Please help to correct me

timewindowall and aggregate(count): count 0 when no event in the window

2018-09-25 Thread Luigi Sgaglione
Hi, I'm trying to count the number of events in a window (every 5 seconds). The code below works fine if there are events in the window; if there are no events in the window, no output is emitted. What I want to achieve is a count of 0 when there are no events in the time window of 5 seconds.
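
To make the desired output concrete, here is a minimal plain-Java sketch (not the poster's code and not a Flink operator) that counts events per 5-second tumbling window and also reports empty windows as 0. The class name `ZeroFilledWindowCounts`, the method `countPerWindow`, and the explicit `fromMs`/`toMs` range are all hypothetical and chosen for illustration only.

```java
import java.util.TreeMap;

/** Plain-Java model of per-window counts where empty windows still yield 0. */
final class ZeroFilledWindowCounts {

    /**
     * Counts events per tumbling window of windowSizeMs within [fromMs, toMs).
     * Every window overlapping that range is present in the result, with count 0 if empty.
     * Keys are window start timestamps in milliseconds.
     */
    static TreeMap<Long, Long> countPerWindow(
            long[] eventTimestampsMs, long windowSizeMs, long fromMs, long toMs) {
        if (windowSizeMs <= 0) {
            throw new IllegalArgumentException("windowSizeMs must be positive");
        }
        TreeMap<Long, Long> counts = new TreeMap<>();

        // Pre-create every window in the range so that empty ones appear with 0.
        long firstStart = fromMs - Math.floorMod(fromMs, windowSizeMs);
        for (long start = firstStart; start < toMs; start += windowSizeMs) {
            counts.put(start, 0L);
        }

        // Assign each in-range event to the window that contains its timestamp.
        for (long ts : eventTimestampsMs) {
            if (ts < fromMs || ts >= toMs) {
                continue; // outside the observed range
            }
            long start = ts - Math.floorMod(ts, windowSizeMs);
            counts.merge(start, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] events = {1_000L, 2_000L, 12_000L};
        // Windows start at 0, 5000 and 10000; the middle one contains no events.
        System.out.println(countPerWindow(events, 5_000L, 0L, 15_000L));
    }
}
```

The point of the sketch is only that the zero has to come from something that knows which windows should exist, since a window that never receives an element has nothing to trigger it.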

Re: Get last element of a DataSe

2018-09-25 Thread Fabian Hueske
Hi, Can you post the full stacktrace? Thanks, Fabian On Tue, Sep 25, 2018 at 12:55, Alejandro Alcalde < algu...@gmail.com> wrote: > Hi, > > I am trying to improve the efficiency of this code: > > discretized.map(_._2) > .name("Map V") > .reduce((_, b) ⇒ b) >

Re: How to join stream and batch data in Flink?

2018-09-25 Thread vino yang
Hi Henry, 1) I don't recommend this method very much, but you said that you expect to convert mysql table to stream and then to flink table. Under this premise, I said that you can do this by joining two stream tables. But as you know, this join depends on the time period in which the state is

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Averell
Thank you Kostas for spending time on my case. Relating to the issue I mentioned, I have another issue caused by having a lot of files to list. From the error msg, I understand that the listing was taking more than 30s, and the JM thought that it hung and killed it. Is it possible to increase

Should Queryable State Server be listening on 127.0.1.1?

2018-09-25 Thread Andrew Kowpak
I'm running into an issue where I am starting a standalone flink cluster in an lxc container. When my TaskManager starts up, the queryable state proxy starts listening on 127.0.1.1:9069. Attempting to connect to that port from outside the container fails. I'm totally willing to believe this is

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Averell
Hi Kostas, Yes, applying the filter on the 100K files takes time, and the delay of 15 minutes I observed was definitely caused by that big number of files and the cost of each individual file status check. However, the delay is much smaller when checkpointing is off. Within that 15 minutes,

Re: JobManager container is running beyond physical memory limits

2018-09-25 Thread eSKa
anyone? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Kostas Kloudas
I see, Thanks for the clarification. Cheers, Kostas > On Sep 25, 2018, at 8:51 AM, Averell wrote: > > Hi Kostas, > > I use PROCESS_CONTINUOUSLY mode, and checkpoint interval of 20 minutes. When > I said "Within that 15 minutes, checkpointing process is not triggered > though" in my previous

Re: JobManager container is running beyond physical memory limits

2018-09-25 Thread eSKa
We don't set it up anywhere, so I guess it's the default of 16. Do you think that's too much?

Re: Scheduling sources

2018-09-25 Thread Averell
Thank you Till. My use case is like this: I have two streams, one is raw data (1), the other is enrichment data (2), which in turn consists of two components: initial enrichment data (2a) which comes from an RDBMS table, and incremental data (2b) which comes from a Kafka stream. To ensure that

Re: "405 HTTP method POST is not supported by this URL" is returned when POST a REST url on Flink on yarn

2018-09-25 Thread 徐涛
Hi Till, Actually, I do send the request to the application master: "http://storage4.test.lan:8089/proxy/application_1532081306755_0124/jobs/269608367e0f548f30d98aa4efa2211e/savepoints

Re: Flink - Process datastream in a bounded context (like Dataset) - Unifying stream & batch

2018-09-25 Thread Hequn Cheng
Hi bastien, Could you give more details about your scenario? Do you want to load another file from the same kafka after current file has been processed? I am curious about why you want to join data in a bounded way when the input data is a stream. The stream-stream join outputs same results as

Re: How to join stream and batch data in Flink?

2018-09-25 Thread vino yang
Hi Hequn, The specific content of the book does not give a right or wrong conclusion, but it illustrates this phenomenon: two streams of the same input, playing and joining at the same time, due to the order of events, the connection results are uncertain. This is because the two streams are

Re: ArrayIndexOutOfBoundsException

2018-09-25 Thread Stefan Richter
You only need to update the Flink jars; the job requires no update. I think you also cannot start from this checkpoint/savepoint after the upgrade because it seems to be corrupted from the bug. You need to use an older point to restart. Best, Stefan > On 25.09.2018 at 16:53, Alexander

Re: 1.5 Checkpoint metadata location

2018-09-25 Thread Gyula Fóra
Yes, the only workaround I found in the end was to restore the previous behavior where metadata files are written separately. But for this you need a custom Flink build with the changes to the checkpointing logic. Gyula On Tue, 25 Sep 2018 at 16:45, Till Rohrmann wrote: > Hi Bryant, > > I

Get last element of a DataSe

2018-09-25 Thread Alejandro Alcalde
Hi, I am trying to improve the efficiency of this code: discretized.map(_._2) .name("Map V") .reduce((_, b) ⇒ b) .name("Get Last V") I am just interested in the last element of discretized. I've seen this SO question:

Re: Could not retrieve the redirect address - No REST endpoint has been started

2018-09-25 Thread PedroMrChaves
Hello, Thank you for the reply. The problem sometimes happens when there is a jobmanager failover. I've attached the jobmanager logs for further debugging. flink-flink-jobmanager-1-demchcep00-01.log

Re: How to join stream and batch data in Flink?

2018-09-25 Thread Fabian Hueske
Hi, I don't think that using the current join implementation in the Table API / SQL will work. The non-windowed join fully materializes *both* input tables in state. This is necessary, because the join needs to be able to process updates on either side. While this is not a problem for the fixed

Re: How to join stream and batch data in Flink?

2018-09-25 Thread vino yang
Hi Fabian, I may not have stated it here, and there is no semantic problem at the Flink implementation level. Rather, there may be “Time-dependence” here. [1] Yes, my initial answer was not to use this form of join in this scenario, but Henry said he converted the table into a stream table and

Re: How to get the location of keytab when using flink on yarn

2018-09-25 Thread 杨光
Hi Aljoscha, Sorry for my late response. In my experience, if the flink-conf.yaml has set "security.kerberos.login.keytab" and "security.kerberos.login.contexts" with a Kerberos file, then YARN will ship the keytab file to the TaskManager. Also, I can find a log entry like: " INFO

Intermittent issue with GCS storage

2018-09-25 Thread Heath Albritton
Howdy folks, I'm attempting to get Flink running in a Kubernetes cluster with the ultimate goal of using GCS for checkpoints and savepoints. I've used the helm chart to deploy and followed this guide, modified for 1.6.0:

Re: Could not find a suitable table factory for 'org.apache.flink.table.factories.DeserializationSchemaFactory' in the classpath.

2018-09-25 Thread clay4444
Hi Till, I have solved the problem. The reason is that adding flink-json to the pom did not work; the flink-json-xxx.jar must be copied to the Flink ./lib/ directory ...

RocksDB Read IOPs

2018-09-25 Thread Ning Shi
Hi, I'm benchmarking a job with large state in various window sizes (hourly, daily). I noticed that it would consistently slow down after 30 minutes into the benchmark due to high disk read IOPs. The first 30 minutes were fine, with close to 0 disk IOPs. Then after 30 minutes, read IOPs would

Re: Could not find a suitable table factory for 'org.apache.flink.table.factories.DeserializationSchemaFactory' in the classpath.

2018-09-25 Thread clay4444
Hi Till, thank you for your reply. Here are some comments: I submit my task to YARN with the following command: ./bin/flink run -c org.clay.test.TestX flinkTools-1.0.jar my pom looks like this: 1.6.0 provided scala-tools.org

Re: Flink - Process datastream in a bounded context (like Dataset) - Unifying stream & batch

2018-09-25 Thread Hequn Cheng
Hi bastien, Flink features two relational APIs, the Table API and SQL. Both APIs are unified APIs for batch and stream processing, i.e., queries are executed with the same semantics on unbounded, real-time streams or bounded[1]. There are also documents about Join[2]. Best, Hequn [1]

Re: Null Flink State

2018-09-25 Thread Dawid Wysakowicz
Hi Taher, As long as you don't put something into the state, ValueState#value() will return null. The point of having ctx.globalState(1) and ctx.windowState(2) is to allow users to store some of their own state, scoped to key(1) and key & window(2) respectively. If you want to access all elements

Re: How to join stream and batch data in Flink?

2018-09-25 Thread Hequn Cheng
Hi vino, There are no ordering problems with stream-stream joins in Flink. No matter in what order the elements arrive, a stream-stream join in Flink will output results which are consistent with standard SQL semantics. I haven't read the book you mentioned. For join, it doesn't guarantee output orders. You have

Null Flink State

2018-09-25 Thread Taher Koitawala
Hi All, I am trying to access elements stored in the state of the window. As the window itself is a stateful operator, I think I should be able to get the records in the process function after the window is triggered. Can someone tell me why, in the following code, the state of the window is null?

Re: Null Flink State

2018-09-25 Thread Taher Koitawala
Hi Dawid, Thanks for the answer, how do I get the state of the Window then? I do understand that elements are going to the state as window in itself is a stateful operator. How do I get access to those elements? Regards, Taher Koitawala GS Lab Pune +91 8407979163 On Tue, Sep 25,

Re: Get last element of a DataSe

2018-09-25 Thread Alejandro Alcalde
Yes, of course A IDA Discretization [info] When computing its discretization [info] - Should be computed correctly *** FAILED *** [info] org.apache.flink.runtime.client.JobExecutionException: java.lang.Exception: The user defined 'open()' method caused an exception: Index: 0, Size: 0

Re: Flink - Process datastream in a bounded context (like Dataset) - Unifying stream & batch

2018-09-25 Thread bastien dine
Hi Hequn, Thanks for your response. Yes, I know about the Table API, but I am looking for a way to have a bounded context with a stream, somehow creating a DataSet from a buffer stored in a window of a DataStream. Regards, Bastien On Tue, Sep 25, 2018 at 14:50, Hequn Cheng wrote: > Hi bastien, > >

Re: Flink TaskManagers do not start until job is submitted in YARN

2018-09-25 Thread Till Rohrmann
With Flink 1.5.x and 1.6.x you can put `mode: legacy` into your flink-conf.yaml and it will start the old mode. Then you have the old behaviour. What do you mean with total slots? The current number of total slots? With resource elasticity this number can of course change because if you don't

Re: 1.5 Checkpoint metadata location

2018-09-25 Thread Till Rohrmann
Hi Bryant, I think if you explicitly define the StateBackend in your code (calling StreamExecutionEnvironment#setStateBackend), then you also define the checkpointing directory when calling the StateBackend's constructor. This is also the directory in which the metadata files are stored. You

Re: Scheduling sources

2018-09-25 Thread Till Rohrmann
Hi Averell, such a feature is currently not supported by Flink. The scheduling works by starting all sources at the same time. Depending on whether it is a batch or streaming job, you either start deploying consumers once producers have produced some results or right away. Cheers, Till On Tue, Sep

Re: ArrayIndexOutOfBoundsException

2018-09-25 Thread Stefan Richter
Hi, this problem looks like https://issues.apache.org/jira/browse/FLINK-8836 which would also match your Flink version. I suggest updating to 1.4.3 or higher to avoid the issue in the future. Best, Stefan > On 25.09.2018 at 16:37,

Re: ArrayIndexOutOfBoundsException

2018-09-25 Thread Alexander Smirnov
Thanks Stefan. Is it only the Flink runtime that should be updated, or should the job be recompiled too? Is there a workaround to start the job without upgrading Flink? Alex On Tue, Sep 25, 2018 at 5:48 PM Stefan Richter wrote: > Hi, > > this problem looks like

Re: Rocksdb Metrics

2018-09-25 Thread Stefan Richter
Hi, this feature is tracked here https://issues.apache.org/jira/browse/FLINK-10423 Best, Stefan > On 25.09.2018 at 17:51, Sayat Satybaldiyev wrote: > > Flink provides a rich number of metrics. However, I didn't find any metrics > for

Re: Flink 1.5.4 -- issues w/ TaskManager connecting to ResourceManager

2018-09-25 Thread Jamie Grier
Anybody else seen this and know the solution? We're dead in the water with Flink 1.5.4. On Sun, Sep 23, 2018 at 11:46 PM alex wrote: > We started to see same errors after upgrading to flink 1.6.0 from 1.4.2. We > have one JM and 5 TM on kubernetes. JM is running on HA mode. Taskmanagers >

Re: Information required regarding SSL algorithms for Flink 1.5.x

2018-09-25 Thread Till Rohrmann
Flink 1.5.x supports the same set of algorithms as does Flink 1.6. However, it mainly depends on the used Java version which algorithms you can use. There are certain Java versions which don't support all cipher suites. See https://issues.apache.org/jira/browse/FLINK-9424 for example. Cheers,

ArrayIndexOutOfBoundsException

2018-09-25 Thread Alexander Smirnov
I'm getting an exception when starting a job from a savepoint. Why could that happen? Flink 1.4.2 java.lang.IllegalStateException: Could not initialize operator state backend. at

Re: ArrayIndexOutOfBoundsException

2018-09-25 Thread Alexander Smirnov
Appreciate your help, Stefan!  On Tue, 25 Sep 2018 at 18:19, Stefan Richter wrote: > You only need to update the flink jars, the job requires no update. I > think you also cannot start from this checkpoint/savepoint after the > upgrade because it seems to be corrupted from the bug. You need to

Rocksdb Metrics

2018-09-25 Thread Sayat Satybaldiyev
Flink provides a rich set of metrics. However, I didn't find any metrics for the RocksDB state backend, neither in the metrics docs nor in the JMX MBeans. Are there any metrics for the RocksDB backend that Flink exposes?

Re: How to join stream and batch data in Flink?

2018-09-25 Thread Hequn Cheng
Hi vino, Thanks for sharing the link. It's a great book and I will take a look. There are different kinds of joins, and different joins have different semantics. From the link, I think it means the time-versioned join. FLINK-9712 enrichments joins with Time