Re: Setting source vs sink vs window parallelism with data increase

2019-03-01 Thread Padarn Wilson
Hi all again - following up on this I think I've identified my problem as being something else, but would appreciate if anyone can offer advice. After running my stream from sometime, I see that my garbage collector for old generation starts to take a very long time: [image: Screen Shot

Re: 想知道local,flink 在window完成时,发送给sink的数据顺序,这个顺序是怎么确定的?

2019-03-01 Thread thinktothi...@gmail.com
).明白了整理笔记如下:  https://github.com/opensourceteams/fink-maven-scala-2/blob/master/md/miniCluster/flink-sink-order.md Sink 接收数据的顺序(Window发送数据顺序) 概述 - InternalTimerServiceImpl.processingTimeTimersQueue存储着同一个Window中所有Key,取第一个key,调用WindowOperator.onProcessingTime进行处理,并发送给Sink -

想知道local,flink 在window完成时,发送给sink的数据顺序,这个顺序是怎么确定的?

2019-03-01 Thread thinktothi...@gmail.com
【问题】).想知道local,flink 在window完成时,发送给sink的数据顺序,这个顺序是怎么确定的?---).输入数据:1 2 1 3 2).程序:Flink 1.7.2  local wordCount,  dataStream.timeWindow(Time.seconds(10)) ).WindowOperator.onProcessingTime    windowState.stateTable.primaryTable 数据结构          167 =

Flink 1.7.1 Inaccessible

2019-03-01 Thread Seye Jin
I am getting "service temporarily unavailable due to an ongoing leader election" when I try to access Flink UI. The jobmanager has HA configured, I have tried to restart jobmanager multiple times but no luck. I also tried submitting my job from console but I also get the same message. When I view

[1.7.1] job stuck in suspended state

2019-03-01 Thread Steven Wu
We have observe that sometimes job stuck in suspended state, and no job restart/recover were attempted once job is suspended. * it is a high-parallelism job (like close to 2,000) * there were a few job restarts before this * there were high GC pause during the period * zookeeper timeout. probably

Re: Checkpoints and catch-up burst (heavy back pressure)

2019-03-01 Thread Ken Krugler
Hi Arnaud, 1. What’s your checkpoint configuration? Wondering if you’re writing to HDFS, and thus the load you’re putting on it while catching up & checkpointing is too high. If so, then you could monitor the TotalLoad metric (FSNamesystem) in your source, and throttle back the emitting of

订阅

2019-03-01 Thread 刘文
订阅 | 姓名刘文 thinktothi...@163.com 公司名称: 地址: 电话 手机:15910540132 QQ:372065525 | 扫描该二维码,可以将电子名片迅速保存到手机 使用帮助 |

RE: Checkpoints and catch-up burst (heavy back pressure)

2019-03-01 Thread LINZ, Arnaud
Hi, I think I should go into more details to explain my use case. I have one non parallel source (parallelism = 1) that list binary files in a HDFS directory. DataSet emitted by the source is a data set of file names, not file content. These filenames are rebalanced, and sent to workers

Kafka consumer do not commit offset at checkpoint

2019-03-01 Thread Andy Hoang
Hi all, I posted a bug here but its seem is my configuration problem: https://issues.apache.org/jira/browse/FLINK-11335 so I resend this to mailing list My env: AWS EMR 5.20: hadoop, flink plugin flink: 1.62/1.70 run under yarn-cluster

Re: How do I compute the average and keep track of a state over a window in DataStream?

2019-03-01 Thread Felipe Gutierrez
thanks Congxian. I will check Process Function over windows. *--* *-- Felipe Gutierrez* *-- skype: felipe.o.gutierrez* *--* *https://felipeogutierrez.blogspot.com * On Fri, Mar 1, 2019 at 8:16 AM Congxian Qiu wrote: > Hi Felipe > > Maybe you could use

Flink Custom SourceFunction and SinkFunction

2019-03-01 Thread Siew Wai Yow
Hi guys, I have question regarding to the title that need your expertise, 1. I need to build a SFTP SourceFunction, may I know if hadoop SFTPFileSystem suitable? 2. I need to build a SFTP SinkFunction as well, may I know if per-defined HDFS rolling file sink accept SFTP connection since