Yes, you are right. I added a log to record the time of each seek and found
that sometimes it is very slow. Then I used the RocksDB files to test locally,
and the same problem appeared. It is very weird to find that RocksDB's seek
iterates over the data one by one. For now, I have added a cache for RocksDB.
The time is
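(For reference, a read cache like this can be configured through Flink's
RocksDB state backend with an OptionsFactory. A minimal sketch, assuming the
1.6/1.7-era API; the 256 MB size and the class name are illustrative, not the
actual settings used here:)

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Illustrative factory that enables a block cache for reads.
public class CachedOptionsFactory implements OptionsFactory {
    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions; // DB-level options left unchanged
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions.setTableFormatConfig(
                new BlockBasedTableConfig()
                        .setBlockCacheSize(256 * 1024 * 1024)  // assumed 256 MB block cache
                        .setCacheIndexAndFilterBlocks(true));  // also cache index/filter blocks
    }
}

// Usage (assuming a RocksDBStateBackend instance):
// rocksDbStateBackend.setOptions(new CachedOptionsFactory());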
Btw how did you make sure that it is stuck in the seek call and that the trace
does not show different invocations of seek? This can indicate that seek is
slow, but is not yet proof that you are stuck.
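One way to verify this offline is to time individual seek calls against a copy
of the SST files, for example (a minimal sketch with the RocksDB Java API; the
database path and key are placeholders):

import java.nio.charset.StandardCharsets;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksIterator;

public class SeekTimer {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        try (Options opts = new Options();
             RocksDB db = RocksDB.openReadOnly(opts, "/path/to/db"); // placeholder path
             RocksIterator it = db.newIterator()) {
            byte[] key = "some-key".getBytes(StandardCharsets.UTF_8); // placeholder key
            long start = System.nanoTime();
            it.seek(key);                                             // the call under suspicion
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("seek took " + micros + " us, valid=" + it.isValid());
        }
    }
}

If every single seek is fast but the trace keeps sampling seek, that points to
many invocations rather than one stuck call.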
> On 22. Nov 2018, at 13:01, liujiangang wrote:
>
> This is not my case. Thank you.
This is not my case. Thank you.
Hi,
are your RocksDB instances running on local SSDs or on something like EBS? I
have previously seen cases where this happened because some EBS quota was
exhausted and the performance got throttled.
Best,
Stefan
> On 22. Nov 2018, at 09:51, liujiangang wrote:
>
> Thank you very much. I
Thank you very much. I have something to add. Each record is 20 KB. The
parallelism is 500 and each taskmanager has 10 GB of memory. The memory is
enough, and I think the parallelism is big enough. Only the intervalJoin
thread is above 100% CPU because of RocksDB's seek. I am confused about why
RocksDB's seek
Hi Jiangang,
The IntervalJoin is actually the DataStream-level implementation of the SQL
time-windowed join[1].
To ensure the completeness of the join results, we have to cache all the
records (from both sides) in the most recent time interval. That may lead to
state backend problems when
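(To make that concrete with a back-of-the-envelope estimate: at the 20 KB
record size mentioned in this thread, an assumed 1,000 records per second on
each side, and a 10-minute interval, the cached state is roughly
2 x 1,000/s x 20 KB x 600 s ≈ 24 GB across the job, before any RocksDB
overhead. The rate is an illustrative assumption, not a figure from this
thread.)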
I am using the IntervalJoin function to join two streams within 10 minutes,
as below:

labelStream.intervalJoin(adLogStream)
    .between(Time.milliseconds(0), Time.milliseconds(60))
    .process(new processFunction())
    .addSink(kafkaProducer);
labelStream and adLogStream are
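(For readers following along, a self-contained sketch of what such an interval
join looks like with the DataStream API. The tuple types, key selectors,
timestamps, and print sink are illustrative assumptions, not the actual job:)

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class IntervalJoinSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); // interval joins need event time

        // Hypothetical stand-ins for labelStream and adLogStream: (key, payload) pairs.
        DataStream<Tuple2<String, String>> labelStream = env
                .fromElements(Tuple2.of("ad-1", "label"))
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, String>>() {
                    @Override
                    public long extractAscendingTimestamp(Tuple2<String, String> e) { return 1_000L; }
                });
        DataStream<Tuple2<String, String>> adLogStream = env
                .fromElements(Tuple2.of("ad-1", "impression"))
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, String>>() {
                    @Override
                    public long extractAscendingTimestamp(Tuple2<String, String> e) { return 1_000L; }
                });

        labelStream
                .keyBy(t -> t.f0)                                // intervalJoin requires keyed streams
                .intervalJoin(adLogStream.keyBy(t -> t.f0))
                .between(Time.milliseconds(0), Time.minutes(10)) // 10-minute interval, as described
                .process(new ProcessJoinFunction<Tuple2<String, String>, Tuple2<String, String>, String>() {
                    @Override
                    public void processElement(Tuple2<String, String> label,
                                               Tuple2<String, String> adLog,
                                               Context ctx,
                                               Collector<String> out) {
                        out.collect(label.f1 + "|" + adLog.f1);  // illustrative join result
                    }
                })
                .print();                                        // stand-in for the Kafka sink

        env.execute("interval-join-sketch");
    }
}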