Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

2019-02-14 Thread Yun Tang
Hi When 'RollingSink' try to initialize state, it would first check current file system supported truncate method. If file system not supported, it would use another work-around solution, which means you should not meet the problem. Otherwise 'RollingSink' thought and found the reflection

Re: Window elements for certain period for delayed processing

2019-02-14 Thread Fabian Hueske
Hi, I would not use a window for that. Implementing the logic with a ProcessFunction seems more straight-forward. The function simply collects all events between 00:00 and 01:00 in a ListState and emits them when the time passes 01:00. All other records are simply forwarded. Best, Fabian Am

订阅Apache Flink 中文邮件列表

2019-02-14 Thread fysoft2006
您好,订阅Apache Flink 中文邮件列表,谢谢!

Re: Incorrect Javadoc in CheckpointedFunction.java?

2019-02-14 Thread Congxian Qiu
Hi Chirag I think the doc is outdated, the comments in CheckpointFuncion.java on master now[1] is `get the state data structure for the per-partition state` [1] 

Re: Window elements for certain period for delayed processing

2019-02-14 Thread Congxian Qiu
Hi, simpleusr Maybe custom trigger[1] can be helpful. [1]  https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/windows.html#triggers Best, Congxian On Feb 15, 2019, 13:15 +0800, simpleusr , wrote: > Hi, > > My ultimate requirement is to stop processing of certain

Incorrect Javadoc in CheckpointedFunction.java?

2019-02-14 Thread Chirag Dewan
Hi, I was going through the Javadoc for CheckpointedFunction.java, it says that: * // get the state data structure for the per-key state * countPerKey = context.getKeyedStateStore().getReducingState( * new ReducingStateDescriptor<>("perKeyCount", new

Window elements for certain period for delayed processing

2019-02-14 Thread simpleusr
Hi, My ultimate requirement is to stop processing of certain events between 00:00:00 and 01:00:00 for each day (Time is in HH:mm:SS format). I am flink newbie and I thought only option to delay elements is to collect them in a window between 00:00:00 and 01:00:00 for each day.

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread jincheng sun
Hi Stephan, Thanks for the clarification! You are right, we have never initiated a discussion about supporting OVER Window on DataStream, we can discuss it in a separate thread. I agree with you add the item after move the discussion forward. +1 for putting the roadmap on the website. +1 for

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread zhijiang
Thanks Stephan for this proposal and I totally agree with it. It is very necessary to summarize the overall features/directions the community is going or planning to go. Although I almost checked the mailing list everyday, it still seems difficult to trace everything. In addtion I think this

Re: Linkage error when using DropwizardMeterWrapper

2019-02-14 Thread shkob1
Hey Jayant. Getting the same using gradle. my metrics reporter and my application both using the flink-metrics-dropwizard dependency for reporting Meters. how should i be solving it? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: ElasticSearchSink - retrying doesn't work in ActionRequestFailureHandler

2019-02-14 Thread Averell
Thank you Gordon and Ken. My Flink job is now running well with 1.7.2 RC1, with failed ES request retried successfully. One more question I have on this is how to limit the number of retries for different types of errors with ES bulk request. Is there any guideline on that? My temporary

[ANNOUNCEMENT] March 2019 Bay Area Apache Flink Meetup

2019-02-14 Thread Xuefu Zhang
Hi all, I'm very excited to announce that the community is planning the next meetup in Bay Area on March 25, 2019. The event is just announced on Meetup.com [1]. To make the event successful, your participation and help will be needed. Currently, we are looking for an organization that can host

Re: Impact of occasional big pauses in stream processing

2019-02-14 Thread Rong Rong
Hi Ajay, Yes, Andrey is right. I was actually missing the first basic but important point: If your process function is stuck, it will immediately block that thread. >From your description, what it sounds like is that not all the messages you consume from kafka actually triggers the processing

Re: StandAlone job on k8s fails with "Unknown method truncate" on restore

2019-02-14 Thread Vishal Santoshi
And yes cannot work with RollingFleSink for hadoop 2.6 release of 1.7.1 b'coz of this. java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer at

StandAlone job on k8s fails with "Unknown method truncate" on restore

2019-02-14 Thread Vishal Santoshi
The job uses a RolllingFileSink to push data to hdfs. Run an HA standalone cluster on k8s, * get the job running * kill the pod. The k8s deployment relaunches the pod but fails with java.io.IOException: Missing data in tmp file:

Re: Flink 1.6 Yarn Session behavior

2019-02-14 Thread Jins George
Thanks Gary. Understood the behavior. I am leaning towards running 7 TM on each machine(8 core), I have 4 nodes, that will end up 28 taskmanagers and 1 job manager. I was wondering if this can bring additional burden on jobmanager? Is it recommended? Thanks, Jins George On 2/14/19 8:49 AM,

Re: No resource available error while testing HA

2019-02-14 Thread Gary Yao
Hi Averell, The TM containers fetch the Flink binaries and config files form HDFS (or another DFS if configured) [1]. I think you should be able to change the log level by patching the logback configuration in HDFS, and kill all Flink containers on all hosts. If you are running an HA setup, your

Re: Flink 1.6 Yarn Session behavior

2019-02-14 Thread Gary Yao
Hi Jins George, This has been asked before [1]. The bottom line is that you currently cannot pre-allocate TMs and distribute your tasks evenly. You might be able to achieve a better distribution across hosts by configuring fewer slots in your TMs. Best, Gary [1]

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Rong Rong
Hi Stephan, Thanks for the clarification, yes I think these issues has already been discussed in previous mailing list threads [1,2,3]. I also agree that updating the "official" roadmap every release is a very good idea to avoid frequent update. One question I might've been a bit confusion is:

Re: long lived standalone job session cluster in kubernetes

2019-02-14 Thread Heath Albritton
My team and I are keen to help out with testing and review as soon as there is a pill request. -H > On Feb 11, 2019, at 00:26, Till Rohrmann wrote: > > Hi Heath, > > I just learned that people from Alibaba already made some good progress with > FLINK-9953. I'm currently talking to them in

Synchronize reading from two Kafka Topics of different size

2019-02-14 Thread Olle Noren
Hi, We have a Flink job were we are trying to window join two datastreams originating from two different Kafka topics, where one topic contains a lot more data per time instance than the other one. We use event time processing, and this all works fine when running our pipeline live, i.e. data

Re: Impact of occasional big pauses in stream processing

2019-02-14 Thread Aggarwal, Ajay
Thank you Rong and Andrey. The blog and your explanation was very useful. In my use case, source stream (kafka based) contains messages that capture some “work” that needs to be done for a tenant. It’s a multi-tenant source stream. I need to queue up (and execute) this work per tenant in the

Re: TaskManager gets confused after the JobManager restarts

2019-02-14 Thread Ethan Li
The related job manager log is https://gist.github.com/Ethanlm/86a10e786ad9025ddaa27c113c536da8 > On Feb 14, 2019, at 9:40 AM, Ethan Li wrote: > > Hello, > > I have a standalone flink-1.4.2 cluster with one JobManager, one TaskManager, > and zookeeper. I first started JM and TM and waited

TaskManager gets confused after the JobManager restarts

2019-02-14 Thread Ethan Li
Hello, I have a standalone flink-1.4.2 cluster with one JobManager, one TaskManager, and zookeeper. I first started JM and TM and waited for them to be stable. Then I restarted JM. It’s when the TM got confused. TM got notified that Leader node has changed and it tried to register to the new

Re: How to register TypeInfoFactory for 'external' class

2019-02-14 Thread Alexey Trenikhun
Hi Gordon, This class is used for states, in/out parameters and as key. As you wrote, there is no problem with usage in states - I just specify TypeInformation in descriptor. With return value of process function, I tried .process(new MyProcessFunction()) .returns(MyTypeInformation), it works,

Re: fllink 1.7.1 and RollingFileSink

2019-02-14 Thread Vishal Santoshi
Awesome, thanks! Will open a new thread. But yes the inprogress file was helpful. On Thu, Feb 14, 2019, 7:50 AM Kostas Kloudas Hi Vishal, > > For the StreamingFileSink vs Rolling/BucketingSink: > - you can use the StreamingFileSink instead of the Rolling/BucketingSink. > You can see the

Re: fllink 1.7.1 and RollingFileSink

2019-02-14 Thread Kostas Kloudas
Hi Vishal, For the StreamingFileSink vs Rolling/BucketingSink: - you can use the StreamingFileSink instead of the Rolling/BucketingSink. You can see the StreamingFileSink as an evolution of the previous two. In the StreamingFileSink the files in Pending state are not renamed, but they keep

Re: Data loss when restoring from savepoint

2019-02-14 Thread Konstantin Knauf
Hi Juho, * does the output of the streaming job contain any data, which is not >> contained in the batch > > > No. > > * do you know if all lost records are contained in the last savepoint you >> took before the window fired? This would mean that no records are lost >> after the last restore. >

Flink DataStream: A few dates are getting through very slowly

2019-02-14 Thread Marke Builder
Hi, I'm using a simply streaming app with processing time and without states. The app read from kafka, transform the data and write the data to the storage (redis). But I see an interesting behavior, a few dates are getting through very slowly. Do you have any idea why this could be? Best,

Re: Re: Re: How to clear state immediately after a keyed window is processed?

2019-02-14 Thread Kumar Bolar, Harshith
Thanks a lot :D From: Konstantin Knauf Date: Thursday, 14 February 2019 at 5:38 PM To: Harshith Kumar Bolar , user Subject: [External] Re: Re: How to clear state immediately after a keyed window is processed? Yes, for processing-time windows the clean up time is exactly the end time of the

Re: Re: How to clear state immediately after a keyed window is processed?

2019-02-14 Thread Konstantin Knauf
Yes, for processing-time windows the clean up time is exactly the end time of the window, because by definition there is no late data and state does not need to be kept around. On Thu, Feb 14, 2019 at 1:03 PM Kumar Bolar, Harshith wrote: > Thanks Konstanin, > > > > But I’m using processing

Re: Data loss when restoring from savepoint

2019-02-14 Thread Juho Autio
Thanks Konstantin! I'll try to see if I can prepare code & conf to be shared as fully as possible. In the meantime: * does the output of the streaming job contain any data, which is not > contained in the batch No. * do you know if all lost records are contained in the last savepoint you >

Re: How to clear state immediately after a keyed window is processed?

2019-02-14 Thread Konstantin Knauf
Hi Harshith, when you use Flink's Windowing API, the state of an event time window is cleared once the watermark passes the end time of the window (that's when the window fires) + the allowed lateness. So, as long as you don't configure additional allowed lateness (default=0), Flink will already

Flink

2019-02-14 Thread 龚文洲
Flink

订阅

2019-02-14 Thread Wu

Re: fllink 1.7.1 and RollingFileSink

2019-02-14 Thread Vishal Santoshi
Thanks Fabian, more questions 1. I had on k8s standlone job env.getCheckpointConfig().setFailOnCheckpointingErrors(true)// the default. The job failed on chkpoint and I would have imagined that under HA the job would restore from the last checkpoint but it did not ( The UI showed the job had

How to clear state immediately after a keyed window is processed?

2019-02-14 Thread Kumar Bolar, Harshith
Hi all, My application uses a keyed window that is keyed by a function of timestamp. This means once that particular window has been fired and processed, there is no use in keeping that key active because there is no way that particular key will appear again. Because this use case involves

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Stephan Ewen
I think the website is better as well. I agree with Fabian that the wiki is not so visible, and visibility is the main motivation. This type of roadmap overview would not be updated by everyone - letting committers update the roadmap means the listed threads are actually happening at the moment.

Re: Dataset statistics

2019-02-14 Thread Flavio Pompermaier
No effort in this direction, then? I had a try using SQL on Table API but I fear that the generated plan is not the optimal one..I'm looking for an efficient way to implement describe() method on a table or dataset/datasource On Fri, Feb 8, 2019 at 10:35 AM Flavio Pompermaier wrote: > Hi to

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Fabian Hueske
Hi, I like the idea of putting the roadmap on the website because it is much more visible (and IMO more credible, obligatory) there. However, I share the concerns about frequent updates. It think it would be great to update the "official" roadmap on the website once per release (-bugfix

Re: Impact of occasional big pauses in stream processing

2019-02-14 Thread Andrey Zagrebin
Hi Ajay, Technically, it will immediately block the thread of MyKeyedProcessFunction subtask scheduled to some slot and basically block processing of the key range assigned to this subtask. Practically, I agree with Rong's answer. Depending on the topology of your inputStream, it can eventually

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Jeff Zhang
Hi Stephan, Thanks for this proposal. It is a good idea to track the roadmap. One suggestion is that it might be better to put it into wiki page first. Because it is easier to update the roadmap on wiki compared to on flink web site. And I guess we may need to update the roadmap very often at the

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Stephan Ewen
Thanks Jincheng and Rong Rong! I am not deciding a roadmap and making a call on what features should be developed or not. I was only collecting broader issues that are already happening or have an active FLIP/design discussion plus committer support. Do we have that for the suggested issues as

Re: Data loss when restoring from savepoint

2019-02-14 Thread Konstantin Knauf
Hi Juho, you are right the problem has actually been narrowed down quite a bit over time. Nevertheless, sharing the code (incl. flink-conf.yaml) might be a good idea. Maybe something strikes the eye, that we have not thought about so far. If you don't feel comfortable sharing the code on the ML,

Re: Broadcast state before events stream consumption

2019-02-14 Thread Konstantin Knauf
Hi Chirag, Broadcast state is checkpointed, hence the savepoint would contain it. Best, Konstantin On Wed, Feb 13, 2019 at 4:04 PM Chirag Dewan wrote: > Hi Konstantin, > > For the second solution, would savepoint persist the Broadcast state in > State backend? Because I am aware that