Github user sihuazhou commented on the issue:
https://github.com/apache/flink/pull/5979
Thank you all @StefanRRichter @StephanEwen @bowenli86
---
Github user StefanRRichter commented on the issue:
https://github.com/apache/flink/pull/5979
LGTM, will merge.
---
Github user sihuazhou commented on the issue:
https://github.com/apache/flink/pull/5979
Hi @StefanRRichter I rebased the PR, could you please have a look?
---
Github user sihuazhou commented on the issue:
https://github.com/apache/flink/pull/5979
Hi @StefanRRichter, I'm not sure whether we can do that without `seek()`,
because the length of the `key bytes` is not fixed, which may lead to
deleting the wrong entries. What do you think?
Sure,
Github user StefanRRichter commented on the issue:
https://github.com/apache/flink/pull/5979
With the approach I outlined, we would not require any `seek()` to the last
key, we can simply create the exclusive end key. Nevertheless, you are right
about the comment that is only in the
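
For illustration only: a minimal sketch of one way an exclusive end key could be derived from a serialized key prefix, assuming keys are compared bytewise as unsigned bytes (as RocksDB's default comparator does). The class and method names are hypothetical and are not taken from the PR.

```java
import java.util.Arrays;

public class PrefixEndKey {

    /**
     * Returns the smallest key that is strictly greater than every key
     * starting with the given prefix, under unsigned bytewise ordering.
     * Returns null if no such key exists (empty prefix or all bytes 0xFF).
     */
    static byte[] exclusiveEndKey(byte[] prefix) {
        // Work on a copy so the caller's array is left untouched.
        byte[] end = Arrays.copyOf(prefix, prefix.length);
        for (int i = end.length - 1; i >= 0; i--) {
            if (end[i] != (byte) 0xFF) {
                // Increment the last byte that can still grow and
                // drop everything after it.
                end[i]++;
                return Arrays.copyOf(end, i + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] prefix = {0x01, 0x02, (byte) 0xFF};
        System.out.println(Arrays.toString(exclusiveEndKey(prefix))); // prints [1, 3]
    }
}
```

Under that ordering assumption, every key that starts with the prefix sorts in the range `[prefix, exclusiveEndKey(prefix))`, so no lookup of the last existing key is needed to build the range bounds.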
Github user sihuazhou commented on the issue:
https://github.com/apache/flink/pull/5979
Hmm... there is another reason: indeed, the main performance overhead is
the `seek()`. Even if we use `deleteRange()` to implement this, we also
need to get the last key of the entries
Github user sihuazhou commented on the issue:
https://github.com/apache/flink/pull/5979
@StefanRRichter , the reason I prefer this approach is that:
- From the comment in RocksDB's source we can see that `deleteRange()`
should be used for deleting a big range; what if the
Github user StefanRRichter commented on the issue:
https://github.com/apache/flink/pull/5979
@sihuazhou I wonder why you would choose iterator + batched write over
simply calling `db.deleteRange(...)` where start key is
`serializeCurrentKeyAndNamespace()` and end key is increasing the
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/5979
Okay, looks really good from my side.
Would be good if @StefanRRichter or @azagrebin could double check the change,
otherwise good to go.
---
Github user sihuazhou commented on the issue:
https://github.com/apache/flink/pull/5979
@StephanEwen, I ran a micro-benchmark; here is the result:
```
-> Batch VS Put <
BATCH: end insert - duration:255
PUT: end insert - duration:545
```
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/5979
Could you share some micro-benchmark numbers?
When we change something that we know works well to something new, would be
good to understand what benefits we are talking about.
---
Github user bowenli86 commented on the issue:
https://github.com/apache/flink/pull/5979
LGTM +1
---
Github user sihuazhou commented on the issue:
https://github.com/apache/flink/pull/5979
cc @StefanRRichter (This is for 1.6; I just completed it now because I
currently have time)
---