[
https://issues.apache.org/jira/browse/OAK-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194484#comment-16194484
]
Julian Reschke commented on OAK-6789:
-------------------------------------
Further tests show that it's indeed the condition on MODIFIED that triggers the
behavior.
Removing it (breaking the API contract...) fixes the performance problem. So
does adding an index on MODIFIED:
{noformat}
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 122 |
20 (0)| 00:00:01 |
| 1 | DELETE | NODES | | |
| |
| 2 | INLIST ITERATOR | | | |
| |
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| NODES | 1 | 122 |
20 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | MODIFIED_IDX | 8 | |
18 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR "ID"
='x11' AND
"MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "ID"='x13' AND "M
ODIFIED"=13 OR "ID"='x14'
AND "MODIFIED"=14 OR "ID"='x15' AND "MODIFIED"=15 OR "ID"='x16' AN
D "MODIFIED"=16 OR
"ID"='x2' AND "MODIFIED"=2 OR "ID"='x3' AND "MODIFIED"=3 OR "ID"='
x4' AND "MODIFIED"=4 OR
"ID"='x5' AND "MODIFIED"=5 OR "ID"='x6' AND "MODIFIED"=6 OR "ID"='
x7' AND "MODIFIED"=7 OR
"ID"='x8' AND "MODIFIED"=8 OR "ID"='x9' AND "MODIFIED"=9)
4 - access("MODIFIED"=1 OR "MODIFIED"=10 OR "MODIFIED"=11 OR "MODIFIED"=12 OR
"MODIFIED"=13 OR "MODIFIED"=14 OR "MODIFIED"=15 OR "MODIFIED"=16 O
R "MODIFIED"=2 OR
"MODIFIED"=3 OR "MODIFIED"=4 OR "MODIFIED"=5 OR "MODIFIED"=6 OR "M
ODIFIED"=7 OR "MODIFIED"=8
OR "MODIFIED"=9)
{noformat}
Note that this switches step 4 to an index scan on MODIFIED_IDX rather than the
primary key index. This fixes the performance in the test case, but might be
undesirable in large repos where many rows have the same modified date.
> RDB: RevisionGC performance on Oracle
> -------------------------------------
>
> Key: OAK-6789
> URL: https://issues.apache.org/jira/browse/OAK-6789
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: rdbmk
> Reporter: Julian Reschke
> Assignee: Julian Reschke
>
> In RevisionGC on Oracle, performance of deletes is bad. Using RevisionGCTest:
> {noformat}
> VersionGCStats{ignoredGCDueToCheckPoint=false, canceled=false,
> deletedDocGCCount=92092 (of which leaf: 92000), updateResurrectedGCCount=0,
> splitDocGCCount=1, intermediateSplitDocGCCount=0, iterationCount=2,
> timeActive=7.429 min, timeToCollectDeletedDocs=1394 ms,
> timeToCheckDeletedDocs=250.4 ms, timeToSortDocIds=374.0 us,
> timeTakenToUpdateResurrectedDocs=17.00 us, timeTakenToDeleteDeletedDocs=7.399
> min, timeTakenToCollectAndDeleteSplitDocs=97.95 ms}
> Performed RevisionGC in 7,429 min
> {noformat}
> Compared with DB2:
> {noformat}
> VersionGCStats{ignoredGCDueToCheckPoint=false, canceled=false,
> deletedDocGCCount=96096 (of which leaf: 96000), updateResurrectedGCCount=0,
> splitDocGCCount=1, intermediateSplitDocGCCount=0, iterationCount=2,
> timeActive=8.240 s, timeToCollectDeletedDocs=1780 ms,
> timeToCheckDeletedDocs=259.7 ms, timeToSortDocIds=237.0 us,
> timeTakenToUpdateResurrectedDocs=19.00 us, timeTakenToDeleteDeletedDocs=4.552
> s, timeTakenToCollectAndDeleteSplitDocs=685.4 ms}
> Performed RevisionGC in 8,243 s
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)