Re: Streaming from a file

2019-08-01 Thread Zhu Zhu
Hi Vishwas, Not sure whether I understand your needs correctly. I think readTextFile(path) does currently return a DataStream. From the code, it emits each line as soon as it is read from the file, i.e. in a line-by-line streaming pattern. Thanks, Zhu Zhu Vishwas Siravara wrote on Thu, Aug 1, 2019 at 11:50 PM:
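The line-by-line emission Zhu Zhu describes can be illustrated outside Flink with plain Java. This is only a conceptual sketch of the pattern (each line is handed downstream the moment it is read, with no whole-file buffering), not Flink's actual file-reader operator:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LineStreamSketch {
    // Emit each line to the collector as soon as it is read, without
    // buffering the whole file -- the pattern readTextFile(path) follows.
    static void streamLines(BufferedReader reader, Consumer<String> collector) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            collector.accept(line); // downstream sees this line immediately
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> out = new ArrayList<>();
        streamLines(new BufferedReader(new StringReader("a\nb\nc")), out::add);
        System.out.println(out); // [a, b, c]
    }
}
```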

Re: Calculate a 4 hour time window for sum,count with sum and count output every 5 mins

2019-08-01 Thread Vijay Balakrishnan
Hi Rafi, I tried your approach with: > windowStream.trigger(ContinuousEventTimeTrigger.of(Time.minutes(5))); > > I can use .trigger with a ProcessWindowFunction, but it doesn't accumulate data across windows, i.e. I want to collect data for a 5h window with data sent to output every 5 mins with the
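What the thread is after (a long rolling sum/count re-emitted every 5 minutes) can be sketched in plain Java, independent of Flink's trigger API. The 4-hour window and 5-minute firing interval come from the thread; everything else here is illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RollingWindowSketch {
    static final long WINDOW_MS = 4 * 60 * 60 * 1000L; // 4-hour window
    final Deque<long[]> events = new ArrayDeque<>();   // {timestamp, value}
    long sum = 0;
    long count = 0;

    void add(long ts, long value) {
        events.addLast(new long[]{ts, value});
        sum += value;
        count++;
    }

    // Called every 5 minutes (the ContinuousEventTimeTrigger interval):
    // evict events that fell out of the 4h window, then report the aggregate.
    long[] fire(long now) {
        while (!events.isEmpty() && events.peekFirst()[0] <= now - WINDOW_MS) {
            long[] old = events.removeFirst();
            sum -= old[1];
            count--;
        }
        return new long[]{sum, count};
    }

    public static void main(String[] args) {
        RollingWindowSketch w = new RollingWindowSketch();
        w.add(0L, 10);
        w.add(WINDOW_MS, 5);            // arrives exactly 4h later
        long[] agg = w.fire(WINDOW_MS); // first event has expired by now
        System.out.println(agg[0] + "," + agg[1]);
    }
}
```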

Re: How to implement Multi-tenancy in Flink

2019-08-01 Thread Ahmad Hassan
Hi Fabian, > On 4 Jul 2018, at 11:39, Fabian Hueske wrote: > > - Pre-aggregate records in a 5 minute Tumbling window. However, > pre-aggregation does not work for FoldFunctions. > - Implement the window as a custom ProcessFunction that maintains a state of > 288 events and aggregates and
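Fabian's 288-slot idea (288 five-minute pre-aggregates = 24 hours) can be sketched as a ring buffer of per-slice sums. This is an illustrative stand-in for the ProcessFunction state he describes, not his actual code; a real implementation would also need to clear a slice when the window slides past it:

```java
public class SlidingPreAggSketch {
    static final int SLOTS = 288;            // 288 x 5 min = 24 h
    final long[] slices = new long[SLOTS];   // one pre-aggregated sum per 5-min slice

    // Add a record's value to the slice its timestamp falls into.
    void add(long tsMillis, long value) {
        int slot = (int) ((tsMillis / (5 * 60 * 1000L)) % SLOTS);
        slices[slot] += value;
    }

    // The full-window aggregate is just the sum over all 288 slices.
    long windowSum() {
        long sum = 0;
        for (long s : slices) sum += s;
        return sum;
    }

    public static void main(String[] args) {
        SlidingPreAggSketch agg = new SlidingPreAggSketch();
        agg.add(0L, 3);
        agg.add(6 * 60 * 1000L, 4); // lands in the next 5-min slice
        System.out.println(agg.windowSum());
    }
}
```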

Re: Best pattern to signal a watermark > t across all tasks?

2019-08-01 Thread Oytun Tez
Perhaps: 1. collect() an item inside onTimer() in operator#1; 2. merge the resulting streams from all keys; 3. process the combined stream in operator#2 to see if all keys were processed. You will probably want to keep state in operator#2 to see if you received items from all
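The step-3 bookkeeping above (state in operator#2 that checks whether every key has reported) boils down to set membership. A minimal stand-alone sketch, assuming the full key set is known up front:

```java
import java.util.HashSet;
import java.util.Set;

public class AllKeysDoneSketch {
    final Set<String> expected;               // all keys we wait for
    final Set<String> seen = new HashSet<>(); // keys whose timer already fired

    AllKeysDoneSketch(Set<String> expected) { this.expected = expected; }

    // Called for each item collected in onTimer() upstream; returns true
    // once every expected key has checked in for this watermark.
    boolean onKeyDone(String key) {
        seen.add(key);
        return seen.containsAll(expected);
    }

    public static void main(String[] args) {
        AllKeysDoneSketch tracker = new AllKeysDoneSketch(Set.of("a", "b"));
        System.out.println(tracker.onKeyDone("a")); // false: b still pending
        System.out.println(tracker.onKeyDone("b")); // true: all keys reported
    }
}
```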

Best pattern to signal a watermark > t across all tasks?

2019-08-01 Thread Eduardo Winpenny Tejedor
Hi all, I have a keyed operator with an hourly event time trigger. On a timer trigger, the operator simply persists some state to a table. I'd like to know when the triggers for all keys have finished so I can send a further signal to the data warehouse, to indicate it has all the necessary data

Re: Converting Metrics from a Reporter to a Custom Events mapping

2019-08-01 Thread Vijay Balakrishnan
Thanks for all your replies. Ended up using a StatsdReporter with Flink and building a statsd plugin to transform the data into my required output format and dump it into a folder that the Kinesis agent can then pick up. On Tue, Jul 16, 2019 at 2:16 AM Chesnay Schepler wrote: > You can configure
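The transformation step Vijay describes (StatsD output reshaped into a custom event format) might look roughly like this. The StatsD wire format `name:value|type` is real; the target JSON event shape here is invented purely for illustration:

```java
public class StatsdTransformSketch {
    // Parse one StatsD datagram line, e.g. "numRecordsIn:42|c",
    // and re-emit it as a (hypothetical) JSON event.
    static String toEvent(String statsdLine) {
        String[] nameRest = statsdLine.split(":", 2);
        String[] valueType = nameRest[1].split("\\|", 2);
        return String.format("{\"metric\":\"%s\",\"value\":%s,\"type\":\"%s\"}",
                nameRest[0], valueType[0], valueType[1]);
    }

    public static void main(String[] args) {
        System.out.println(toEvent("numRecordsIn:42|c"));
    }
}
```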

Re: StreamingFileSink part file count reset

2019-08-01 Thread sidhartha saurav
Thank you for the clarification, Haibo and Andrey. Is there any limit after which the global counter will reset? I mean, do we have to worry that the counter may get too long and the part file crosses the max filename length set by the OS, or is that handled by Flink? Thanks Sidhartha On Tue, Jul
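On the filename-length worry: most Linux filesystems cap a single path component at 255 bytes, so even a counter near Long.MAX_VALUE stays far under the limit. A quick sanity check, assuming the `part-<subtask>-<counter>` naming pattern of StreamingFileSink's defaults:

```java
public class PartFileNameLengthCheck {
    public static void main(String[] args) {
        // Even the largest possible counter yields a short file name.
        String name = "part-0-" + Long.MAX_VALUE; // "part-0-9223372036854775807"
        System.out.println(name.length() + " <= 255: " + (name.length() <= 255));
    }
}
```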

Streaming from a file

2019-08-01 Thread Vishwas Siravara
Hi guys, Is it possible for Flink to stream from a Unix file system line by line? When I use readTextFile(path) - "Reads text files, i.e. files that respect the TextInputFormat specification, line-by-line and returns them as Strings" - the entire contents of the file come as a datastream, over which

Re: Error while running flink job on local environment

2019-08-01 Thread Vinayak Magadum
Thank you Biao and Nico for the inputs and clarification. Good to know that setDefaultLocalParallelism() will not have any impact on cluster deployment and can be used to solve the problem locally. I will try it out. Thanks, Vinayak On Thu, Aug 1, 2019, 2:22 PM Nico Kruber wrote: > Hi

Re: Timers and Checkpoints

2019-08-01 Thread Andrea Spina
Hi everybody, I think I'm hitting the same issue described in https://issues.apache.org/jira/browse/FLINK-6291 . Flink 1.6.4. I have a savepoint with a timer service belonging to a process function. When I restore a new job w/o the former process function, it fails in the following way. What

BucketingSink ...

2019-08-01 Thread ????
Flink on YARN, 1 TaskManager with 4 slots, TaskManager 4G, JobManager 1G, BucketingSink to HDFS ... 3 ... checkpoint ... 427*300 = 128100 = 125KB ... 80*300 = 24000 = 23KB ... Flink ...

Re: Job manager failing because Flink does not find checkpoints on HDFS

2019-08-01 Thread Farouk
Hi all, I am sorry. We found out that it's a problem in our deployment. The directories in Zookeeper and HDFS are not the same. Thanks for the help Farouk On Thu, Aug 1, 2019 at 11:38 AM, Zhu Zhu wrote: > Hi Farouk, > > This issue does not relate to checkpoints. The JM launching fails due

Re: Job manager failing because Flink does not find checkpoints on HDFS

2019-08-01 Thread Zhu Zhu
Hi Farouk, This issue is not related to checkpoints. The JM launch fails because the job's user jar blob is missing on HDFS. Does this issue always happen? If it rarely occurs, the file might have been unexpectedly deleted by someone else. Thanks, Zhu Zhu Farouk wrote on Thu, Aug 1, 2019 at 5:22 PM: > Hi > > We

Job manager failing because Flink does not find checkpoints on HDFS

2019-08-01 Thread Farouk
Hi, We have Flink running on Kubernetes with HDFS. The JM crashed for some reason. Has anybody already encountered an error like the one in the attached logfile? Caused by: java.lang.Exception: Cannot set up the user code libraries: File does not exist:

Re: Custom log appender for YARN

2019-08-01 Thread Biao Liu
Hi Gyula, How about putting the appender class in the lib folder, but selecting the appender only for user code through log4j.properties? That way only the user-code logs would be sent to Kafka. It seems to be what you want... Thanks, Biao /'bɪ.aʊ/ On Thu, Aug 1, 2019 at 2:25 PM Gyula Fóra wrote: > Hi
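Biao's suggestion of scoping the appender to user code via log4j.properties could look like the fragment below. The package name `com.mycompany`, broker address, and topic are placeholders to adjust to your setup; `KafkaLog4jAppender` is the appender shipped in Kafka's `kafka-log4j-appender` module:

```properties
# Root logger keeps the normal file appender; Flink's own logs stay local.
log4j.rootLogger=INFO, file

# Only the user-code logger additionally goes to the Kafka appender.
log4j.logger.com.mycompany=INFO, kafka
log4j.appender.kafka=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.kafka.brokerList=broker:9092
log4j.appender.kafka.topic=flink-user-logs
log4j.appender.kafka.layout=org.apache.log4j.PatternLayout
```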

Re: Custom log appender for YARN

2019-08-01 Thread Gyula Fóra
Hi Biao, Thanks for the input! This is basically where we got to as well. We are trying to avoid putting things in the lib folder, so separating the loggers would be great, but we don't have a solution for it yet. Thanks Gyula On Thu, 1 Aug 2019, 07:09 Biao Liu Hi Gyula, > > I guess it

Re: env.readFile reading only new files

2019-08-01 Thread Biao Liu
I'm afraid there's nothing ready-made; write one yourself by extending SourceFunction. Thanks, Biao /'bɪ.aʊ/ On Wed, Jul 31, 2019 at 4:49 PM 王佩 wrote: > Code as follows: > > DataStreamSource source = env.readFile( > textInputFormat, > "/data/appData/streamingWatchFile/source", >
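The "only new files" logic Biao suggests building into a custom SourceFunction reduces to remembering which files were already seen between directory scans. A self-contained sketch of that bookkeeping (the SourceFunction wiring itself, and any fault-tolerant state, is omitted):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NewFilesOnlySketch {
    final Set<String> seen = new HashSet<>(); // files already emitted

    // On each directory scan, return only the paths not emitted before;
    // inside a SourceFunction this would run in the run() polling loop.
    List<String> newFiles(List<String> currentListing) {
        List<String> fresh = new ArrayList<>();
        for (String path : currentListing) {
            if (seen.add(path)) fresh.add(path); // add() is false for known files
        }
        return fresh;
    }

    public static void main(String[] args) {
        NewFilesOnlySketch src = new NewFilesOnlySketch();
        System.out.println(src.newFiles(List.of("a.txt", "b.txt"))); // [a.txt, b.txt]
        System.out.println(src.newFiles(List.of("a.txt", "c.txt"))); // [c.txt]
    }
}
```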