Re: IntervalJoin is stucked in rocksdb'seek for too long time in flink-1.6.2

2018-11-22 Thread liujiangang
Yes, you are right. I added logging to record the time of each seek and found that it is sometimes very slow. I then tested locally with the RocksDB files and the same problem appeared. It is strange that RocksDB's seek appears to iterate over the data one by one. For now, I have added a cache in front of RocksDB. The time is
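A minimal sketch of how seek timings could be recorded so that one long-running call can be told apart from many short ones. The class `SeekStats` and its methods are hypothetical names for illustration; they are not part of Flink or RocksDB, and the wrapped operation stands in for whatever seek call is being measured.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical helper: counts started/completed calls and tracks their durations. */
public class SeekStats {
    private final AtomicLong started = new AtomicLong();
    private final AtomicLong completed = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong maxNanos = new AtomicLong();

    /** Runs the given operation and records how long it took. */
    public <T> T time(Supplier<T> op) {
        started.incrementAndGet();
        long begin = System.nanoTime();
        try {
            return op.get();
        } finally {
            long elapsed = System.nanoTime() - begin;
            completed.incrementAndGet();
            totalNanos.addAndGet(elapsed);
            maxNanos.accumulateAndGet(elapsed, Math::max);
        }
    }

    public long started() { return started.get(); }
    public long completed() { return completed.get(); }
    /** Calls that have begun but not yet returned. */
    public long inFlight() { return started.get() - completed.get(); }
    public long totalNanos() { return totalNanos.get(); }
    public long maxNanos() { return maxNanos.get(); }
}
```

Sampling these counters twice, a few seconds apart, gives a hint: if `completed()` keeps growing, the thread is issuing many seeks; if `inFlight()` stays above zero while `completed()` does not move, a single call is likely blocked.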

Re: IntervalJoin is stucked in rocksdb'seek for too long time in flink-1.6.2

2018-11-22 Thread Stefan Richter
Btw, how did you make sure that it is stuck in the seek call and that the trace does not show different invocations of seek? This can indicate that seek is slow, but it is not yet proof that you are stuck.

> On 22. Nov 2018, at 13:01, liujiangang wrote:
>
> This is not my case. Thank you.

Re: IntervalJoin is stucked in rocksdb'seek for too long time in flink-1.6.2

2018-11-22 Thread liujiangang
This is not my case. Thank you.

Re: IntervalJoin is stucked in rocksdb'seek for too long time in flink-1.6.2

2018-11-22 Thread Stefan Richter
Hi, are your RocksDB instances running on local SSDs or on something like EBS? I have previously seen cases where this happened because some EBS quota was exhausted and the performance got throttled.

Best, Stefan

> On 22. Nov 2018, at 09:51, liujiangang wrote:
>
> Thank you very much. I

Re: IntervalJoin is stucked in rocksdb'seek for too long time in flink-1.6.2

2018-11-22 Thread liujiangang
Thank you very much. I have something to add. Each record is 20 KB. The parallelism is 500 and each TaskManager has 10 GB of memory. The memory is sufficient, and I think the parallelism is high enough. Only the intervalJoin thread is above 100% CPU usage, because of RocksDB's seek. I am confused about why RocksDB's seek

Re: IntervalJoin is stucked in rocksdb'seek for too long time in flink-1.6.2

2018-11-21 Thread Xingcan Cui
Hi Jiangang, The IntervalJoin is actually the DataStream-level implementation of the SQL time-windowed join[1]. To ensure the completeness of the join results, we have to cache all the records (from both sides) in the most recent time interval. That may lead to state backend problems when
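To get a feel for how much state such buffering can imply, here is a rough order-of-magnitude sketch. It assumes both sides are retained for the full interval and ignores serialization and RocksDB overhead. The record size of 20 KB and the 10-minute interval come from this thread; the arrival rates are purely hypothetical, and `JoinStateEstimate` is an illustrative name.

```java
/** Hypothetical back-of-the-envelope estimate of buffered join state. */
public class JoinStateEstimate {

    /**
     * Approximate bytes buffered when both inputs are kept for retentionSec seconds.
     * Rates are records per second over all keys.
     */
    public static long bufferedBytes(double leftPerSec, double rightPerSec,
                                     long retentionSec, long recordBytes) {
        return Math.round((leftPerSec + rightPerSec) * retentionSec * recordBytes);
    }

    public static void main(String[] args) {
        // Assumed rates (not from the thread): 1000 records/s on each side.
        long bytes = bufferedBytes(1000.0, 1000.0, 600L, 20_000L);
        System.out.println(bytes / 1_000_000_000.0 + " GB buffered across all subtasks");
    }
}
```

With these assumed rates the estimate is about 24 GB in total, spread over all parallel subtasks; the real figure depends on the actual rates, the join bounds, and watermark progress.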

IntervalJoin is stucked in rocksdb'seek for too long time in flink-1.6.2

2018-11-20 Thread liujiangang
I am using the IntervalJoin function to join two streams within 10 minutes, as below:

    labelStream.intervalJoin(adLogStream)
        .between(Time.milliseconds(0), Time.milliseconds(60))
        .process(new processFunction())
        .sink(kafkaProducer)

labelStream and adLogStream are
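For reference, the matching condition that `between(lower, upper)` expresses can be sketched in plain Java without Flink. A right-side record joins a left-side record when `left.ts + lower <= right.ts <= left.ts + upper`, with both bounds inclusive by default. The class `IntervalJoinSketch` is a hypothetical illustration that works on bare timestamps in milliseconds and ignores keys and state. Note that `Time.milliseconds(60)` denotes 60 ms, whereas a 10-minute bound would be 600,000 ms.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, Flink-free illustration of the interval join predicate. */
public class IntervalJoinSketch {

    /** True if rightTs lies in [leftTs + lowerMs, leftTs + upperMs], bounds inclusive. */
    public static boolean inInterval(long leftTs, long rightTs, long lowerMs, long upperMs) {
        return leftTs + lowerMs <= rightTs && rightTs <= leftTs + upperMs;
    }

    /** Naive nested-loop join over timestamps; returns matching {left, right} pairs. */
    public static List<long[]> join(long[] left, long[] right, long lowerMs, long upperMs) {
        List<long[]> out = new ArrayList<>();
        for (long l : left) {
            for (long r : right) {
                if (inInterval(l, r, lowerMs, upperMs)) {
                    out.add(new long[] {l, r});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] left = {0L, 100L};
        long[] right = {30L, 130L, 500L};
        // Same bounds as in the snippet above: 0 ms to 60 ms.
        for (long[] pair : join(left, right, 0L, 60L)) {
            System.out.println(pair[0] + " joins " + pair[1]);
        }
    }
}
```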