Re: When should the RETAIN_ON_CANCELLATION option be used?

2018-09-24 Thread 徐涛
Hi Vino, What is the definition and difference between job cancel and job fails? Can I say that if the program is shutdown artificially, then it is a job cancel, if the program is shutdown due to some error, it is a job fail? This is

Re: When should the RETAIN_ON_CANCELLATION option be used?

2018-09-24 Thread vino yang
Hi Henry, Answer your question: What is the definition and difference between job cancel and job fails? > The cancellation and failure of the job will cause the job to enter the termination state. But cancellation is artificially triggered and normally terminated, while failure is usually a

Re: How to get the location of keytab when using flink on yarn

2018-09-24 Thread Rong Rong
Hi Just a quick thought on this: You might be able to use delegation token to access HBase[1]. It might be a more secure way instead of distributing your keytab over to all the YARN nodes. Hope this helps. -- Rong [1] https://wiki.apache.org/hadoop/Hbase/HBaseTokenAuthentication On Mon, Sep

Re: When should the RETAIN_ON_CANCELLATION option be used?

2018-09-24 Thread 徐涛
Hi All, I mean if I can guarantee that a savepoint can always be made before manually cancelation. If I use DELETE_ON_CANCELLATION option on checkpoints, is there any probability that I do not have a checkpoint to recover from? Thank a a lot. Best Henry > 在

Re: 1.5 Checkpoint metadata location

2018-09-24 Thread vino yang
Hi Bryant, Maybe Stefan can answer your question, ping him for you. Thanks, vino. Bryant Baltes 于2018年9月25日周二 上午12:29写道: > Hi All, > > After upgrading from 1.3.2 to 1.5.2, one of our apps that uses > checkpointing no longer writes metadata files to the state.checkpoints.dir > location

Re: How to get the location of keytab when using flink on yarn

2018-09-24 Thread sanmutongzi
Hi Aljoscha, Sorry for my late response . According to my experience , if the flink-conf.yaml has set the "security.kerberos.login.keytab" and "security.kerberos.login.contexts" with a kerberos file then yarn will ship the keytab file to the TaskManager . Also i can find the log like: " INFO

When should the RETAIN_ON_CANCELLATION option be used?

2018-09-24 Thread 徐涛
Hi All, In flink document, it says DELETE_ON_CANCELLATION: “Delete the checkpoint when the job is cancelled. The checkpoint state will only be available if the job fails.” What is the definition and difference between job cancel and job fails? If I run the program on

Re: Trying to figure out why a slot takes a long time to checkpoint

2018-09-24 Thread Julio Biason
Hey guys, Stefan, Yeah, sorry about the stacks. Completely forgot about them. But I think we figured out why it's taking so long (and yeah, Stefan was right from the start): This specific slot is receiving 5x more records than any other slot (on a recent run, it had 10x more records than the

Re: Flink TaskManagers do not start until job is submitted in YARN

2018-09-24 Thread suraj7
Thanks for the clarification, Dawid and Till. @Till We have a few streaming jobs that need to be running all the time and we plan on using the modify tool to update parallelism of jobs as we scale the cluster in and out and knowing total slots value is crucial to this workflow. As Dawid pointed

1.5 Checkpoint metadata location

2018-09-24 Thread Bryant Baltes
Hi All, After upgrading from 1.3.2 to 1.5.2, one of our apps that uses checkpointing no longer writes metadata files to the state.checkpoints.dir location provided to the flink conf. I see this email chain addressed this here:

Re: Flink TaskManagers do not start until job is submitted in YARN

2018-09-24 Thread Till Rohrmann
Hi Suraj, at the moment Flink's new mode does not support such a behaviour. There are plans to set a min number of running TaskManagers which won't be released. But no work has been done in this direction yet, afaik. If you want, then you can help the community with this effort. Cheers, Till On

Re: Between Checkpoints in Kafka 11

2018-09-24 Thread Piotr Nowojski
Hi, I have nothing more to add. You (Dawid) and Vino explained it correctly :) Piotrek > On 24 Sep 2018, at 15:16, Dawid Wysakowicz wrote: > > Hi Harshvardhan, > > Flink won't buffer all the events between checkpoints. Flink uses Kafka's > transaction, which are committed only on

Re: Between Checkpoints in Kafka 11

2018-09-24 Thread Dawid Wysakowicz
Hi Harshvardhan, Flink won't buffer all the events between checkpoints. Flink uses Kafka's transaction, which are committed only on checkpoints, so the data will be persisted on the Kafka's side, but only available to read once committed. I've cced Piotr, who implemented the Kafka 0.11 connector

Re: Flink TaskManagers do not start until job is submitted in YARN

2018-09-24 Thread Dawid Wysakowicz
Hi Suraj, As far as I know this was changed with FLIP-6 to allow dynamic resource allocation. Till, cced might know if there is a switch to restore old behavior or are there plans to support it. Best, Dawid On 24/09/18 12:24, suraj7 wrote: > Hi, > > I am using Amazon EMR to run Flink Cluster

Re: error with session window

2018-09-24 Thread Dawid Wysakowicz
Hi Yuvraj, It looks as some race condition for me. Would it be ok for you to switch to either Event or Ingestion time[1]? I also cced @Aljosha who might give you a bit more insights Best, Dawid [1]

Re: JMX Configuration: Missing Job Related Beans

2018-09-24 Thread Sayat Satybaldiyev
yep, they're there. thank you! On Mon, Sep 24, 2018 at 12:54 PM 杨力 wrote: > They are provided in taskmanagers. > > Sayat Satybaldiyev 于 2018年9月24日周一 下午6:38写道: > >> Dear all, >> >> While configuring JMX with Flink, I don't see some bean metrics that >> belongs to the job, in particular, the

Re: error with session window

2018-09-24 Thread yuvraj singh
this is my code DataStream cityWithGeoHashesDataStream = filteredGeohashDataStream.keyBy(FilteredGeoHashes::getCity).window( ProcessingTimeSessionWindows.withGap(Time.seconds(4))) .process(new ProcessWindowFunction() { @Override

error with session window

2018-09-24 Thread yuvraj singh
Hi all , I am stuck with this error please help me . I am using sessionwindow 2018-09-23 07:15:08,097 INFO org.apache.flink.runtime.taskmanager.Task - city-geohashes-processor (24/48) (26aed9a769743191c7cb0257087e490a) switched from RUNNING to FAILED.

Re: JMX Configuration: Missing Job Related Beans

2018-09-24 Thread 杨力
They are provided in taskmanagers. Sayat Satybaldiyev 于 2018年9月24日周一 下午6:38写道: > Dear all, > > While configuring JMX with Flink, I don't see some bean metrics that > belongs to the job, in particular, the number in/out records per operator. > I've checked REST API and those numbers provided

JMX Configuration: Missing Job Related Beans

2018-09-24 Thread Sayat Satybaldiyev
Dear all, While configuring JMX with Flink, I don't see some bean metrics that belongs to the job, in particular, the number in/out records per operator. I've checked REST API and those numbers provided there. Does flink provide such bean or there's an additional configuration for it? Here's a

Information required regarding SSL algorithms for Flink 1.5.x

2018-09-24 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hello, We have a query regarding SSL algorithms available for Flink versions. From the documents of Flink 1.6.0 we could see following SSL algorithms options are supported. security.ssl.algorithms:

Flink TaskManagers do not start until job is submitted in YARN

2018-09-24 Thread suraj7
Hi, I am using Amazon EMR to run Flink Cluster on YARN. My setup consists of m4.large instances for 1 master and 2 core nodes. I have started the Flink Cluster on YARN with the command: flink-yarn-session -n 2 -d -tm 4096 -s 4. Flink Job Manager and Application Manager starts but there are no

[ANNOUNCE] Weekly community update #39

2018-09-24 Thread Till Rohrmann
Dear community, this is the weekly community update thread #39. Please post any news and updates you want to share with the community to this thread. # Flink 1.6.1 and Flink 1.5.4 released The community has released new bug fix releases: Flink 1.6.1 and Flink 1.5.4 [1, 2]. # Open source review

Re: Running Flink in Google Cloud Platform (GCP) - can Flink be truly elastic?

2018-09-24 Thread Till Rohrmann
Hi Alexander, the issue for the reactive mode, the mode which reacts to newly available resources and scales the up accordingly, is here: https://issues.apache.org/jira/browse/FLINK-10407. It does not contain a lot of details but we are actively working on publishing the corresponding design

Re: [External] Re: Setting a custom Kryo serializer in Flink-Python

2018-09-24 Thread Chesnay Schepler
I can't really help you here. Digging into the backing java internals isn't supported, and neither is registering a kryo serializer (which is why it isn't exposed in the python environment). The jython-related serialization logic doesn't care about Flink's usual type serialization mechanism,

Re: Flink not running properly.

2018-09-24 Thread Sarabjyotsingh Multani
Thanks, I'll check it out. On Mon, Sep 24, 2018 at 9:49 AM vino yang wrote: > Hi, > > According to the instructions in the script: > > # Long.MAX_VALUE in TB: This is an upper bound, much less direct memory will > be used > TM_MAX_OFFHEAP_SIZE="8388607T" > > > I think you may need to confirm

Re: LocalEnvironment and Python streaming

2018-09-24 Thread Chesnay Schepler
No, this isn't really possible. You need a java process to kick off the processing. The only thing i can come up with is to open the flink-streaming-python module in the IDE and manually call the PythonStreamBinder class with the same arguments that you pass in the CLI as a test. On

Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-24 Thread PedroMrChaves
Hello Stefan, Thank you for the help. I've actually lost those logs to due several cluster restarts that we did, which cause log rotation up (limit = 5 versions). Those log lines that i've posted were the only ones that showed signs of some problem. *The configuration of the job is as

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-24 Thread Kostas Kloudas
Hi Averell, Happy to hear that the problem is no longer there and if you have more news from your debugging, let us know. The thing that I wanted to mention is that from what you are describing, the problem does not seem to be related to checkpointing, but to the fact that applying your

Re: error closing kafka

2018-09-24 Thread yuvraj singh
i have one more question , is it possible , if i do keyby on the stream it will get portioned automatically , because i am getting all the data in the same partition in kafka. Thanks Yubraj Singh On Mon, Sep 24, 2018 at 12:34 PM yuvraj singh <19yuvrajsing...@gmail.com> wrote: > I am processing

Re: error closing kafka

2018-09-24 Thread yuvraj singh
I am processing data and then sending it to kafka by kafka sink . this is method where I am producing the data nudgeDetailsDataStream.keyBy(NudgeDetails::getCarSugarID).addSink(NudgeCarLevelProducer.getProducer(config)) .name("nudge-details-producer")

Re: error closing kafka

2018-09-24 Thread miki haiat
What are you trying to do , can you share some code ? This is the reason for the exeption Proceeding to force close the producer since pending requests could not be completed within timeout 9223372036854775807 ms. On Mon, 24 Sep 2018, 9:23 yuvraj singh, <19yuvrajsing...@gmail.com> wrote: > Hi

Re: Flink 1.5.4 -- issues w/ TaskManager connecting to ResourceManager

2018-09-24 Thread alex
We started to see same errors after upgrading to flink 1.6.0 from 1.4.2. We have one JM and 5 TM on kubernetes. JM is running on HA mode. Taskmanagers sometimes are loosing connection to JM and having following error like you have. *2018-09-19 12:36:40,687 INFO

error closing kafka

2018-09-24 Thread yuvraj singh
Hi all , I am getting this error with flink 1.6.0 , please help me . 2018-09-23 07:15:08,846 ERROR org.apache.kafka.clients.producer.KafkaProducer - Interrupted while joining ioThread java.lang.InterruptedException at java.lang.Object.wait(Native Method) at