Stefan Egli created OAK-9880:
--------------------------------
Summary: Simplify rgc query
Key: OAK-9880
URL: https://issues.apache.org/jira/browse/OAK-9880
Project: Jackrabbit Oak
Issue Type: Task
Components: mongomk
Reporter: Stefan Egli
Assignee: Stefan Egli
We have again seen long-running revision garbage collection (rgc) *remove*
operations, similar to what was described in OAK-8351.
This time the slow query is the one generated by
[queryForDefaultNoBranch|https://github.com/apache/jackrabbit-oak/blob/99b250a05ffe490f66de67374125fabee17f6fda/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/mongo/MongoVersionGCSupport.java#L213-L242],
with a query shape similar to, for example:
{noformat}
{
  "_sdType" : 70,
  "_sdMaxRevTime" : {
    "$lt" : NumberLong(1603030303)
  },
  "$or" : [
    {
      "$or" : [
        {
          "_id" : /.*-1\/0/
        },
        {
          "_id" : /[^-]*/,
          "_path" : /.*-1\/0/
        }
      ],
      "_sdMaxRevTime" : {
        "$lt" : NumberLong(1602020202)
      }
    },
    {
      "$or" : [
        {
          "_id" : /.*-2\/0/
        },
        {
          "_id" : /[^-]*/,
          "_path" : /.*-2\/0/
        }
      ],
      "_sdMaxRevTime" : {
        "$lt" : NumberLong(1601010101)
      }
    }
  ]
}
{noformat}
While setting an index filter for this query shape in MongoDB is one option, we
could additionally look into splitting the above query into multiple simpler
queries: e.g. 1 query per clusterNodeId, with the {{_sdMaxRevTime}} condition
simplified accordingly (the outer and the per-clusterNodeId bounds are ANDed,
so only the lower of the two needs to be kept). The above would then
translate into the following 2 queries (in the hope that MongoDB then finds the
optimal query plan):
{noformat}
{
  "_sdType" : 70,
  "_sdMaxRevTime" : {
    "$lt" : NumberLong(1602020202)
  },
  "$or" : [
    {
      "_id" : /.*-1\/0/
    },
    {
      "_id" : /[^-]*/,
      "_path" : /.*-1\/0/
    }
  ]
}
{noformat}
and
{noformat}
{
  "_sdType" : 70,
  "_sdMaxRevTime" : {
    "$lt" : NumberLong(1601010101)
  },
  "$or" : [
    {
      "_id" : /.*-2\/0/
    },
    {
      "_id" : /[^-]*/,
      "_path" : /.*-2\/0/
    }
  ]
}
{noformat}
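As a rough illustration of the split, the following is a minimal sketch that builds one filter per clusterNodeId using plain Java maps. It is not the existing MongoVersionGCSupport code: the class name {{PerClusterNodeQueries}}, the method {{build}} and its parameters are hypothetical, and the regular expressions are written in the {{$regex}} string form (where {{/}} needs no escaping) instead of the shell literal form used above. Running the resulting queries one after the other should cover the same documents as the original top-level {{$or}}.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only, not part of Oak.
class PerClusterNodeQueries {

    /**
     * Builds one filter document per clusterNodeId.
     *
     * @param globalMaxRevTime  the outer _sdMaxRevTime bound (assumed input)
     * @param perNodeMaxRevTime clusterNodeId -> per-node _sdMaxRevTime bound (assumed input)
     */
    static List<Map<String, Object>> build(long globalMaxRevTime,
                                           Map<Integer, Long> perNodeMaxRevTime) {
        List<Map<String, Object>> queries = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : perNodeMaxRevTime.entrySet()) {
            // Matches ids/paths ending in a revision of this clusterNodeId, e.g. ".*-1/0"
            String suffix = ".*-" + e.getKey() + "/0";
            // Both bounds are ANDed in the original query, so the lower one subsumes the other
            long bound = Math.min(globalMaxRevTime, e.getValue());

            Map<String, Object> byId = new LinkedHashMap<>();
            byId.put("_id", regex(suffix));

            Map<String, Object> byPath = new LinkedHashMap<>();
            byPath.put("_id", regex("[^-]*"));
            byPath.put("_path", regex(suffix));

            Map<String, Object> query = new LinkedHashMap<>();
            query.put("_sdType", 70);
            query.put("_sdMaxRevTime", Map.of("$lt", bound));
            query.put("$or", List.of(byId, byPath));
            queries.add(query);
        }
        return queries;
    }

    // Wraps a pattern in the {"$regex": ...} operator form
    static Map<String, Object> regex(String pattern) {
        return Map.of("$regex", pattern);
    }

    public static void main(String[] args) {
        Map<Integer, Long> perNode = new LinkedHashMap<>();
        perNode.put(1, 1602020202L);
        perNode.put(2, 1601010101L);
        // Prints one filter per clusterNodeId
        build(1603030303L, perNode).forEach(System.out::println);
    }
}
```

Whether issuing N smaller queries actually leads MongoDB to pick a better plan than the single nested {{$or}} would still need to be verified, e.g. with {{explain()}} against a representative data set.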
--
This message was sent by Atlassian Jira
(v8.20.10#820010)