Re: yarn-session vs cluster per job for streaming jobs

2019-07-18 Thread Maxim Parkachov
Hi Haibo, thanks for tip, I almost forgot about max-attempts. I understood implication of running with one AM. Maybe my question was incorrect, but what would be faster (with regards to downtime of each job): 1. In case of yarn-session: Parallel cancel all jobs with savepoints, restart

Re: Job Manager becomes irresponsive if the size of the session cluster grows

2019-07-18 Thread Biao Liu
Hi Prakhar, Have you ever checked the garbage collection of master? Which version of Flink are you using? How many TaskManagers in your cluster? Prakhar Mathur 于2019年7月18日周四 下午1:54写道: > Hello, > > We have deployed multiple Flink clusters on Kubernetess with 1 replica of > Jobmanager and

Re: Checkpoints timing out for no apparent reason

2019-07-18 Thread Congxian Qiu
Hi The image did not show. incremental checkpoint includes: 1) flush memtable to sst files; 2) checkpoint of RocksDB; 3) snapshot metadata; 4) upload needed sst files to remote, all the first three steps are in sync part, and the fourth step in async part, could you please check whether the sync

Re: Job Manager becomes irresponsive if the size of the session cluster grows

2019-07-18 Thread Biao Liu
Hi, It seems to be good based on your GC metrics. You could double check the GC log if you enable it. The GC log is more direct. I'm not sure what's happening in your JobManager. But I'm pretty sure that Flink could support far more larger scale cluster than yours. Have you ever checked the log

Fwd: FLink checkpoint,How to calculcate the number of files below the chk folder

2019-07-18 Thread 陈Darling
Darling Andrew D.Lin > 下面是被转发的邮件: > > 发件人: 陈Darling > 主题: FLink checkpoint,How to calculcate the number of files below the chk > folder > 日期: 2019年7月18日 GMT+8 下午4:02:11 > 收件人: user@flink.apache.org > > Hello > >

Re:Re: yarn-session vs cluster per job for streaming jobs

2019-07-18 Thread Haibo Sun
HI, Maxim As far as I understand, it's hard to draw a simple conclusion that who's faster. If the job is smaller (for example, the vertex number and the parallelism are very small), the session is usually faster than the per-job mode. I think the session has the advantage of sharing AM and

AsyncDataStream on key of KeyedStream

2019-07-18 Thread Flavio Pompermaier
Hi to all, I'm trying to exploit async IO in my Flink job. In my use case I use keyed tumbling windows and I'd like to execute the async action only once per key and window (while the AsyncDataStream.unorderedWait execute the async call for every element of my stream) ..is there an easy way to do

Re: Flink and CDC

2019-07-18 Thread miki haiat
I actually thinking about this option as well . Im assuming that the correct way to implement it , is to integrate debezium embedded to source function ? [1] https://github.com/debezium/debezium/tree/master/debezium-embedded On Wed, Jul 17, 2019 at 7:08 PM Flavio Pompermaier wrote: >

Re: Flink and CDC

2019-07-18 Thread Flavio Pompermaier
I think that using Kafka to get CDC events is fine. The problem, in my case, is really about how to proceed: 1) do I need to create Flink tables before reading CDC events or is there a way to automatically creating Flink tables when they gets created via a DDL event (assuming a filter on the name

failed checkpoint with metadata timeout exception

2019-07-18 Thread Yitzchak Lieberman
Hi. I have flink a application that produces to kafka with 3 brokers. When I add 2 brokers that are not up yet it fails the checkpoint (a key in s3) due to timeout error. Do you know what can cause that? Thanks, Yitzchak.

Re: failed checkpoint with metadata timeout exception

2019-07-18 Thread miki haiat
Can you share your logs On Thu, Jul 18, 2019 at 3:22 PM Yitzchak Lieberman < yitzch...@sentinelone.com> wrote: > Hi. > > I have flink a application that produces to kafka with 3 brokers. > When I add 2 brokers that are not up yet it fails the checkpoint (a key in > s3) due to timeout error. > >

Re: failed checkpoint with metadata timeout exception

2019-07-18 Thread Yitzchak Lieberman
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 6 ms. On Thu, Jul 18, 2019 at 3:49 PM miki haiat wrote: > Can you share your logs > > > On Thu, Jul 18, 2019 at 3:22 PM Yitzchak Lieberman < > yitzch...@sentinelone.com> wrote: > >> Hi. >> >> I have flink a

Re: Providing external files to flink classpath

2019-07-18 Thread Maxim Parkachov
Hi Vishwas, took me some time to find out as well. If you have your properties file under lib following will work: val kafkaPropertiesInputStream = getClass.getClassLoader.getResourceAsStream("lib/config/kafka.properties") Hope this helps, Maxim. On Wed, Jul 17, 2019 at 7:23 PM Vishwas

Re: failed checkpoint with metadata timeout exception

2019-07-18 Thread miki haiat
Can you check in the kafka logs what happens when you adding new brokers ? On Thu, Jul 18, 2019 at 4:36 PM Yitzchak Lieberman < yitzch...@sentinelone.com> wrote: > org.apache.kafka.common.errors.TimeoutException: Failed to update metadata > after 6 ms. > > > > > On Thu, Jul 18, 2019 at

Re: Checkpoints timing out for no apparent reason

2019-07-18 Thread spoganshev
The image should be visible now at http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-timing-out-for-no-apparent-reason-td28793.html#none It doesn't look like it is a disk performance or network issue. Feels more like some buffer overflowing or timeout due to slightly

1.9 Release Timeline

2019-07-18 Thread Oytun Tez
Hi team, 1.9 is bringing very exciting updates, State Processor API and MapState migrations being two of them. Thank you for all the hard work! I checked the burndown board [1], do you have an estimated timeline for the GA release of 1.9? [1]

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-18 Thread Rong Rong
Hi Dongwon, Can you provide a bit more information: which Flink version are you using? what is the "sourceTable.getSchema().toRowType()" return? what is the line *".map(a -> a)" *do and can you remove it? if I am understanding correctly, you are also using "time1" as the rowtime, is that want

Re:

2019-07-18 Thread Rong Rong
Hi Tangkailin, If I understand correctly from the snippet, you are trying to invoke this in some sort of window correct? If that's the case, your "apply" method will be invoked every time at the window fire[1]. This means there will be one new instance of the HashMap created each time "apply" is

Re: Apache Flink - Side output time semantics for DataStream

2019-07-18 Thread M Singh
Hey Folks - Just wanted to see if there are any thoughts on this question. ThanksOn Saturday, July 13, 2019, 09:33:15 PM EDT, M Singh wrote: Hi: I wanted to find out what is the timestamp associated with the elements of a stream side output with different stream time characteristics.

Re: Apache Flink - Event time and process time timers with same timestamp

2019-07-18 Thread M Singh
Hey Folks - Just checking if you have any pointers for me.  Thanks for your advice. On Sunday, July 14, 2019, 03:12:25 PM EDT, M Singh wrote: Also, are the event time timers and processing time timers handled separately - ie,  if I register event time timer and then use the same

Writing Flink logs into specific file

2019-07-18 Thread Soheil Pourbafrani
Hi, When we run the Flink application some logs will be generated about the running, in both local and distributed environment. I was wondering if is it possible to save logs into a specified file? I put the following file in the resource directory of the project but it has no effect:

Flink SinkFunction for WebSockets

2019-07-18 Thread Timothy Victor
Hi I'm looking to write a sink function for writing to websockets, in particular ones that speak the WAMP protocol ( https://wamp-proto.org/index.html). Before going down that path, I wanted to ask if a) anyone has done something like that already so I dont reinvent stuff b) any caveats or

Flink s3 wire log

2019-07-18 Thread Vishwas Siravara
Here is my wire log while trying to checkpoint to ecs S3. I see the request got a 404 , does this mean that it can't find the folder *checkpoints . *Since s3 does not have folders, what should I put there ? Thanks so much for all the help that you guys have provided so far. Really appreciate it.

S3 checkpointing exception

2019-07-18 Thread Vishwas Siravara
I am using ecs S3 instance to checkpoint, I use the following configuration. s3.access-key vdna_np_user s3.endpoint https://SU73ECSG**COM:9021 s3.secret-key **I set the checkpoint in the code like env.setStateBackend(*new *FsStateBackend("s3://vishwas.test1/checkpoints")) I have a

Re: Writing Flink logs into specific file

2019-07-18 Thread Caizhi Weng
Hi Soehil, There is a logback.xml in the conf directory. You can modify that and see if it works. For more information about logging please check https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/logging.html Soheil Pourbafrani 于2019年7月19日周五 上午2:03写道: > Hi, > > When we run the

Re: Writing Flink logs into specific file

2019-07-18 Thread Biao Liu
Hi Soheil, > I was wondering if is it possible to save logs into a specified file? Yes, of course. > I put the following file in the resource directory of the project but it has no effect I guess because the log4j has a higher priority. In the document [1], it says "Users willing to use

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-18 Thread Dongwon Kim
Hi Rong, Thank you for reply :-) which Flink version are you using? I'm using Flink-1.8.0. what is the "sourceTable.getSchema().toRowType()" return? Row(time1: TimeIndicatorTypeInfo(rowtime)) what is the line *".map(a -> a)" *do and can you remove it? *".map(a->a)"* is just to illustrate a

Re: Apache Flink - Manipulating the state of an object in an evictor or trigger

2019-07-18 Thread taher koitawala
As far as I know. It is completely safe On Fri, Jul 19, 2019, 1:35 AM M Singh wrote: > Just wanted to see if there is any advice on this question. Thanks > > On Sunday, July 14, 2019, 09:07:45 AM EDT, M Singh > wrote: > > > Hi: > > Is it safe to manipulate the state of an object in the

Re:Re: Flink 的 log 文件夹下产生了 44G 日志

2019-07-18 Thread Henry
明白啦,谢谢哈。 我以为会显示图片呢。 我贴链接下次。 在 2019-07-18 12:03:57,"zhisheng" 写道: >尴尬了,之前回复的邮件难道都是空白,Henry >你可以把报错信息放到哪个博客里面,然后再这里提供个链接,邮件看不到你的截图错误信息,你可以先根据错误信息定位问题所在,把错误的问题先解决掉,然后再来合理的配置重启策略。 > >Biao Liu 于2019年7月18日周四 上午1:15写道: > >> Hi Henry, >> >> 邮件列表貌似不能支持直接贴图,所以无法理解“里面不停的在产生 error >>

checkpoint 文件夹Chk-no 下面文件个数是能计算出来的吗?

2019-07-18 Thread 陈冬林
state_checkpoints_dir/2d93ffacbddcf363b960317816566552/chk-2903/1e95606a-8f70-4876-ad6f-95e5cc38af86 state_checkpoints_dir/2d93ffacbddcf363b960317816566552/chk-2903/2a012214-734a-4c2b-804b-d96f4f3dddf8

Re: checkpoint 文件夹Chk-no 下面文件个数是能计算出来的吗?

2019-07-18 Thread Yun Tang
Hi A1: chk-x文件下面的文件个数是跟operator个数并行度是有关系的,主要是operator state的文件。对于checkpoint场景,_metadata只是元数据,真实的operator数据都是在其他文件内。 A2: 不可以将这些文件合并在一起。因为_metadata内主要记录了文件路径,如果合并的话,找不到原始路径会有问题,无法从checkpoint进行restore 祝好 唐云 From: 陈冬林 <874269...@qq.com> Sent: Thursday, July 18,

Fwd: checkpoint 文件夹Chk-no 下面文件个数是能计算出来的吗?

2019-07-18 Thread 陈冬林
谢谢您的解答, 那些文件的数量是只和operator的并行度相关吗?是不是还有key 的个数等相关?有没有具体的公式呢?我没有在源码里找到这块的逻辑 还有一个最重要的问题,这些文件即然不能合并,state小文件合并指的是那些文件呢? 祝安 Andrew > 下面是被转发的邮件: > > 发件人: Yun Tang > 主题: 回复: checkpoint 文件夹Chk-no 下面文件个数是能计算出来的吗? > 日期: 2019年7月18日 GMT+8 下午3:24:57 > 收件人: "user-zh@flink.apache.org" >

Re: checkpoint 文件夹Chk-no 下面文件个数是能计算出来的吗?

2019-07-18 Thread Yun Tang
Hi 源码部分可以参考[1] DefaultOperatorStateBackendSnapshotStrategy 执行完成的时候,每个operator state backend 都只会产生至多一个文件。 state小文件合并,你指的应该是FLINK-11937 吧,这里的所谓合并是每个rocksDB state backend创建checkpoint的时候,在一定阈值内,若干sst文件的序列化结果都写到一个文件内。由于keyed

Fwd: checkpoint 文件夹Chk-no 下面文件个数是能计算出来的吗?

2019-07-18 Thread 陈冬林
好的,非常感谢您的解答。 > 下面是被转发的邮件: > > 发件人: Yun Tang > 主题: 回复: checkpoint 文件夹Chk-no 下面文件个数是能计算出来的吗? > 日期: 2019年7月18日 GMT+8 下午4:06:59 > 收件人: "user-zh@flink.apache.org" > 回复-收件人: user-zh@flink.apache.org > > Hi > > 源码部分可以参考[1] DefaultOperatorStateBackendSnapshotStrategy 执行完成的时候,每个operator >

Re: checkpoint 文件夹Chk-no 下面文件个数是能计算出来的吗?

2019-07-18 Thread Yun Tang
hi 首先先要确定是否是大量创造文件导致你的namenode RPC相应堆积多,RPC请求有很多种,例如每个task创建checkpoint目录也是会向namenode发送大量RPC请求的(参见 [https://issues.apache.org/jira/browse/FLINK-11696]);也有可能是你的checkpoint interval太小,导致文件不断被创建和删除(subsume old checkpoint),先找到NN压力大的root cause吧。

请问这些名词,在翻译 Glossary 时,有必要翻译成中文吗?

2019-07-18 Thread highfei2011
Hi 各位, 晚上好! 以下名词在翻译 Glossary 章节时,有必要翻译成中文吗?名词列表如下: Flink Application Cluster Flink Cluster Event ExecutionGraph Function Instance Flink Job JobGraph Flink JobManager Logical Graph Managed State Flink Master Operator Operator Chain Partition Physical Graph

Re: 请问这些名词,在翻译 Glossary 时,有必要翻译成中文吗?

2019-07-18 Thread Zili Chen
没有可援引的通译出处建议专有名词不要翻译。Glossary 的解释部分可以解释得详尽一点,上面像 record task 这些有比较普遍共识的还有商讨空间,像 transformation "operator chain" 强行翻译很可能是懂的人本来就看得懂,不懂的人看了还是不懂。现在不翻译在有通译之后可以改,先根据个人喜好翻译了以后就不好改了。 一点拙见。 Best, tison. highfei2011 于2019年7月18日周四 下午11:35写道: > Hi 各位, > 晚上好! > 以下名词在翻译 Glossary 章节时,有必要翻译成中文吗?名词列表如下:

Re:请问这些名词,在翻译 Glossary 时,有必要翻译成中文吗?

2019-07-18 Thread highfei2011
Hi,Zili Chen: 早上好,你讲的没错,谢谢。另外我发现,Glossary 英文文档中没有 Slot 和 Parallelism 的说明,建议添加。这样可以方便初学者和用户的学习和使用! 祝好 Original Message Subject: Re: 请问这些名词,在翻译 Glossary 时,有必要翻译成中文吗? From: Zili Chen To: user-zh@flink.apache.org CC: 没有可援引的通译出处建议专有名词不要翻译。Glossary 的解释部分可以解释得详尽一点,上面像

Re: 请问这些名词,在翻译 Glossary 时,有必要翻译成中文吗?

2019-07-18 Thread Zili Chen
Hi, 欢迎有 PR 后同步到这个 thread 上 :-) Best, tison. highfei2011 于2019年7月19日周五 上午8:34写道: > Hi,Zili Chen: > 早上好,你讲的没错,谢谢。另外我发现,Glossary 英文文档中没有 Slot 和 Parallelism > 的说明,建议添加。这样可以方便初学者和用户的学习和使用! > > 祝好 > > > > Original Message > Subject: Re: 请问这些名词,在翻译 Glossary 时,有必要翻译成中文吗? >

Flink 的 log 文件夹下产生了 44G 日志

2019-07-18 Thread Henry
大家好,之前那个报错图片大家没看到,重新弄一下。 报错图片链接: https://img-blog.csdnimg.cn/20190719092540880.png https://img-blog.csdnimg.cn/20190719092848500.png 我看报错的原因是,我这里Source用的是ActiveMQ,从昨天早上9点开始运行Flink任务接收消息,到今天早上8点都很正常。然后在今天早上8点4分的时候开始猛报错flink往log文件夹下写日志。第二个图是报错开始,显示ActiveMQ好像超时,然后就是消费者关闭一直猛写log。

Re: checkpoint 文件夹Chk-no 下面文件个数是能计算出来的吗?

2019-07-18 Thread Yun Tang
Hi [https://issues.apache.org/jira/browse/FLINK-11696] 里面目前的PR是我们的生产代码,你可以用。但是你现在的问题的root cause不是这个,而是创建文件和删除文件的请求太多了。可以统计一下目前你们几百个作业的checkpoint interval,一般而言3~5min的间隔就完全足够了,没必要将interval调整得太小,这是一个影响你们整个集群使用的配置,必要时需要告知用户正确的配置。

Re: Flink 的 log 文件夹下产生了 44G 日志

2019-07-18 Thread Caizhi Weng
Hi Henry, 这个 source 看起来不像是 Flink 提供的 source,应该是 source 本身实现的问题。你可能需要修改 source 的源码让它出错后关闭或者进行其它处理... Henry 于2019年7月19日周五 上午9:31写道: > 大家好,之前那个报错图片大家没看到,重新弄一下。 > 报错图片链接: > https://img-blog.csdnimg.cn/20190719092540880.png > https://img-blog.csdnimg.cn/20190719092848500.png > > >

Re: 请问这些名词,在翻译 Glossary 时,有必要翻译成中文吗?

2019-07-18 Thread Jark Wu
Hi highfei, Thanks for bringing up this discussion. I would suggest to move the discussion to the Glossary translation JIRA FLINK-13037 . Thanks, Jark On Fri, 19 Jul 2019 at 09:00, Zili Chen wrote: > Hi, > > 欢迎有 PR 后同步到这个 thread 上 :-) > >

Re: 请问这些名词,在翻译 Glossary 时,有必要翻译成中文吗?

2019-07-18 Thread Jark Wu
Hi, Just find the Glossary translation PR is created [1]. Let's move the discussion there. [1]. https://github.com/apache/flink/pull/9173 On Fri, 19 Jul 2019 at 11:22, Jark Wu wrote: > Hi highfei, > > Thanks for bringing up this discussion. I would suggest to move the > discussion to the

could rest api : /jobs/:jobid/yarn-cancel trigger the savepoint?

2019-07-18 Thread LakeShen
Hi community, I have a question is that could rest api : /jobs/:jobid/yarn-cancel trigger the savepoint? I saw the fink src code, and I find it didn't trigger the savepoint, is it right? Thank you to reply .

Re:Re: 请问这些名词,在翻译 Glossary 时,有必要翻译成中文吗?

2019-07-18 Thread 杨继飞
Hi, Jark Wu ,Thanks I am discussing in there . 在 2019-07-19 11:22:53,"Jark Wu" 写道: >Hi, > >Just find the Glossary translation PR is created [1]. Let's move the >discussion there. > >[1]. https://github.com/apache/flink/pull/9173 > >On Fri, 19 Jul 2019 at 11:22, Jark Wu wrote: > >> Hi