[jira] [Updated] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound
[ https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11050: --- Labels: auto-deprioritized-major auto-deprioritized-minor performance pull-request-available (was: auto-deprioritized-major performance pull-request-available stale-minor) Priority: Not a Priority (was: Minor) This issue was labeled "stale-minor" 7 days ago and has not received any updates, so it is being deprioritized. If this ticket is actually Minor, please raise the priority and ask a committer to assign you the issue or revive the public discussion. > When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound > -- > > Key: FLINK-11050 > URL: https://issues.apache.org/jira/browse/FLINK-11050 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends > Affects Versions: 1.6.2, 1.7.0 > Reporter: Liu > Priority: Not a Priority > Labels: auto-deprioritized-major, auto-deprioritized-minor, performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > During an IntervalJoin, fetching the left or right buffer's entries is very slow, because we have to scan all of the buffer's values, including deleted values that are outside the time range. Processing these deleted values consumes too much time in RocksDB's level 0. Since lowerBound is known, the lookup can be optimized by seeking from the timestamp of lowerBound. > Our usage is like below: > {code:java} > labelStream.keyBy(uuid).intervalJoin(adLogStream.keyBy(uuid)) >     .between(Time.milliseconds(0), Time.milliseconds(60)) >     .process(new processFunction()) >     .sink(kafkaProducer) > {code} > Our data is huge. The job typically runs for about an hour and then gets stuck in RocksDB's seek while fetching the buffer's entries. We used RocksDB's data to reproduce the problem and found that too much time is spent on deleted values. So we decided to optimize the lookup by seeking from the assigned lowerBound instead of doing a global scan.
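The idea behind the proposed optimization can be illustrated with a minimal sketch that is independent of Flink and RocksDB: instead of iterating a key's buffer from the beginning (which would also walk entries below the join interval, including deleted ones), seek directly to the first entry at or above lowerBound and stop once upperBound is passed. The class and method names (`LowerBoundSeek`, `entriesInRange`) are hypothetical, and a `TreeMap` stands in for the sorted, timestamp-keyed state; this is not Flink's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class LowerBoundSeek {
    // Buffer entries keyed by timestamp, mimicking the interval join's per-key state.
    static List<String> entriesInRange(NavigableMap<Long, String> buffer,
                                       long lowerBound, long upperBound) {
        List<String> result = new ArrayList<>();
        // Seek directly to lowerBound instead of scanning from the start of the
        // buffer; entries below the interval are never touched.
        for (Map.Entry<Long, String> e : buffer.tailMap(lowerBound, true).entrySet()) {
            if (e.getKey() > upperBound) {
                break; // past the interval, stop early
            }
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> buffer = new TreeMap<>();
        buffer.put(10L, "a");
        buffer.put(50L, "b");
        buffer.put(70L, "c");
        // Join interval [40, 60]: only the entry at timestamp 50 qualifies.
        System.out.println(entriesInRange(buffer, 40L, 60L)); // prints [b]
    }
}
```

In RocksDB terms, the same effect would come from calling the iterator's seek with a key prefixed by the lowerBound timestamp rather than iterating from the first key for the record, which is what lets the lookup skip the tombstoned values the ticket complains about.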
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound
[ https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11050: --- Labels: auto-deprioritized-major performance pull-request-available stale-minor (was: auto-deprioritized-major performance pull-request-available) I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issue has been marked as Minor but is unassigned, and neither it nor its Sub-Tasks have been updated for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is still Minor, please either assign yourself or give an update. Afterwards, please remove the label, or in 7 days the issue will be deprioritized.
[jira] [Updated] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound
[ https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11050: --- Labels: auto-deprioritized-major performance pull-request-available (was: performance pull-request-available stale-major)
[jira] [Updated] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound
[ https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11050: --- Priority: Minor (was: Major)
[jira] [Updated] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound
[ https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flink Jira Bot updated FLINK-11050: --- Labels: performance pull-request-available stale-major (was: performance pull-request-available)
[jira] [Updated] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound
[ https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu updated FLINK-11050: Fix Version/s: (was: 1.7.1)
[jira] [Updated] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound
[ https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-11050: --- Labels: performance pull-request-available (was: performance)
[jira] [Updated] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound
[ https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu updated FLINK-11050: Description: When IntervalJoin, it is very slow to get left or right buffer's entries. Because we have to scan all buffer's values, including the deleted values which are out of time range. These deleted values's processing consumes too much time in RocksDB's level 0. Since lowerBound is known, it can be optimized by seek from the timestamp of lowerBound. Our usage is like below: {code:java} labelStream.keyBy(uuid).intervalJoin(adLogStream.keyBy(uuid)) .between(Time.milliseconds(0), Time.milliseconds(60)) .process(new processFunction()) .sink(kafkaProducer) {code} Our data is huge. The job always runs for an hour and is stuck by RocksDB's seek when get buffer's entries. We use rocksDB's data to simulate the problem RocksDB and find that it takes too much time in deleted values. So we decide to optimize it by assigning the lowerBound instead of global search. was: When IntervalJoin, it is very slow to get left or right buffer's entries. Because we have to scan all buffer's values, including the deleted values which are out of time range. These deleted values's processing consumes too much time in RocksDB's level 0. Since lowerBound is known, it can be optimized by seek from the timestamp of lowerBound. Our usage is like below: {code:java} labelStream.keyBy(uuid).intervalJoin(adLogStream.keyBy(uuid)) .between(Time.milliseconds(0), Time.milliseconds(60)) .process(new processFunction()) .sink(kafkaProducer) {code} Our data is huge. The job always runs for an hour and is stuck by RocksDB's seek when get buffer's entries. We use rocksDB's data to simulate the problem RocksDB. 
[jira] [Updated] (FLINK-11050) When IntervalJoin, get left or right buffer's entries more quickly by assigning lowerBound
[ https://issues.apache.org/jira/browse/FLINK-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu updated FLINK-11050: Description: When IntervalJoin, it is very slow to get left or right buffer's entries. Because we have to scan all buffer's values, including the deleted values which are out of time range. These deleted values's processing consumes too much time in RocksDB's level 0. Since lowerBound is known, it can be optimized by seek from the timestamp of lowerBound. Our usage is like below: {code:java} labelStream.keyBy(uuid).intervalJoin(adLogStream.keyBy(uuid)) .between(Time.milliseconds(0), Time.milliseconds(60)) .process(new processFunction()) .sink(kafkaProducer) {code} Our data is huge. The job always runs for an hour and is stuck by RocksDB's seek when get buffer's entries. We use rocksDB's data to simulate the problem RocksDB. was: When IntervalJoin, it is very slow to get left or right buffer's entries. Because we have to scan all buffer's values, including the deleted values which are out of time range. These deleted values's processing consumes too much time in RocksDB's level 0. Since lowerBound is known, it can be optimized by seek from the timestamp of lowerBound.