[ https://issues.apache.org/jira/browse/OAK-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197531#comment-15197531 ]
Thomas Mueller edited comment on OAK-4119 at 3/18/16 1:32 PM: -------------------------------------------------------------- As for which index is used, I made some test with a relatively recent version of Oak. The result is, it depends a lot on the data, and sometimes the query engine doesn't take the index one might expect. In general it's OK, but there is room for improvement. {noformat} Uses the Reference index; depending on the number of entries for 'c500b58d-122a-3651-9c69-8b889a8a4771' there this may be good or not so good. Traverse would probably be better, as it's just one node. Maybe it's better to just load that node directly instead, or use *just* ISSAMENODE: SELECT * FROM [rep:MemberReferences] AS s WHERE ISSAMENODE ('/rep:security/rep:authorizables/rep:groups/t/te/testGroup0') AND PROPERTY([rep:members], 'WeakReference') = 'c500b58d-122a-3651-9c69-8b889a8a4771' Uses traversal, probably because there is not much to traverse: SELECT * FROM [rep:MemberReferences] AS s WHERE ISDESCENDANTNODE ('/rep:security/rep:authorizables/rep:groups/t/te/testGroup0') AND PROPERTY([rep:members], 'WeakReference') = 'c500b58d-122a-3651-9c69-8b889a8a4771' Would use the reference index: SELECT * FROM [rep:MemberReferences] AS s WHERE PROPERTY([rep:members], 'WeakReference') = 'c500b58d-122a-3651-9c69-8b889a8a4771' Uses the repMembers index: SELECT * FROM [rep:MemberReferences] AS s WHERE ISDESCENDANTNODE ('/rep:security/rep:authorizables/rep:groups/t/te/testGroup0') AND [rep:members] = 'c500b58d-122a-3651-9c69-8b889a8a4771' {noformat} was (Author: tmueller): As for which index is used, I made some test with a relatively recent version of Oak. The result is, it depends a lot on the data, and sometimes the query engine doesn't take the index one might expect. In general it's OK, but there is room for improvement. {noformat} Uses the Reference index; depending on the number of entries for 'c500b58d-122a-3651-9c69-8b889a8a4771' there this may be good or not so good. Traverse would probably be better, as it's just one node. Maybe it's better to just load that node directly instead, or use *just* ISSAMENODE: SELECT * FROM [rep:MemberReferences] AS s WHERE ISSAMENODE ('/rep:security/rep:authorizables/rep:groups/t/te/testGroup0') AND PROPERTY([rep:members], 'WeakReference') = 'c500b58d-122a-3651-9c69-8b889a8a4771' Uses traversal, probably because there is not much to traverse: SELECT * FROM [rep:MemberReferences] AS s WHERE ISDESCENDANTNODE ('/rep:security/rep:authorizables/rep:groups/t/te/testGroup0') AND PROPERTY([rep:members], 'WeakReference') = 'c500b58d-122a-3651-9c69-8b889a8a4771' Would use the reference index: SELECT * FROM [rep:MemberReferences] AS s WHERE PROPERTY([rep:members], 'WeakReference') = 'c500b58d-122a-3651-9c69-8b889a8a4771' Uses the repMembers index: SELECT * FROM [rep:MemberReferences] AS s WHERE ISDESCENDANTNODE ('/rep:security/rep:authorizables/rep:groups/t/te/testGroup0') AND [rep:members] = 'c500b58d-122a-3651-9c69-8b889a8a4771' {noformat} > Improvements Take 1 > ------------------- > > Key: OAK-4119 > URL: https://issues.apache.org/jira/browse/OAK-4119 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core > Reporter: angela > Assignee: angela > Attachments: AddMemberTest_20160314_152515.csv, > AddMemberTest_with_batch_20160315_150518.csv, > AddMembersTest_besteffort_20160314_163025.csv, > IsDeclaredMemberTest_notNested_profiling_20160314_172728.csv, > IsMemberTest_nested_profiling_20160315_134252.csv, > IsMemberTest_notNested_profiling_20160315_115809.csv, > MemberDeclaredMemberOf_nested_profiling_20160314_111037.csv, > MemberDeclaredMemberOf_notNested_profiling_20160314_103003.csv, > MemberIsDeclaredMember_notNested_profiling_20160314_115026.csv, > MemberIsMember_nested_profiling_20160314_142612.csv, > MemberIsMember_notNested_profiling_20160314_123150.csv, > MemberMemberOf_nested_profiling_20160314_085505.csv, > MemberMemberOf_notNested_profiling_20160314_094452.csv, OAK-4119.patch, > OAK-4119_2.patch, OAK-4119_3.patch, OAK-4119_tests.patch > > > First bunch of potential improvements to be tested/verified using the > benchmarks: > h4. General > - dedicated index for {{rep:members}} property defined by nt > {{rep:MemberReferences}} (=> mod in {{UserInitializer}}) > - -adjust nt-handling in {{IdentifierManager.getReferences}} : if a single nt > name is specified in the {{nodeTypeNames}} param it is inserted in the query > statement instead of looking for {{nt:base}}- > - simplified, new version of {{IdentifierManager.getReferences}} to match > needs of membership lookup where propName and a single nt-name is given. see > below (in OAK-4119_3.patch) > - introduce {{hasAnyReferenceInPath}}, which allows to search for any > references being located in a give subtree of the repository (escaping fixed > in OAK-4119_3.patch) > - {{MembershipProvider.MemberReferenceIterator}}: keeping track of processed > references is optional i.e. an implementation detail > - {{MembershipProvider.AbstractMemberIterator}}: no more protected fields > (patch OAK-4119_2) > - {{MembershipProvider.AbstractMemberIterator}}: breadth-first iteration > before performing queries for the inherited members/membership (patch > OAK-4119_2) > h4. Group.isMember > - {{MembershipProvider}}: introduce shortcut if a Group doesn't have any > members > - {{MembershipProvider}}: keep old logic if the target Group has > (potentially) pending changes > - {{MembershipProvider}}: for unmodified Groups use reverse membership > lookup of the target authorizable instead (patch OAK-4119_2: further > optimizes 'hasMembership' as the {{AbstractMemberIterator}} doesn't need to > search for the complete membership; limiting the number of queries to be > executed. see above) > - {{GroupImpl}}: add shortcut if member to be tested is a Group representing > the {{EveryonePrincipal}} > h4. Group.isDeclaredMember > - introduce shortcut if a Group doesn't have any members > - keep old logic if target Group (potentially) has pending changes or if the > number of members is small (=> threshold with {{MembershipProvider}}) > - minor change with the old behavior, which does no longer keep track of > processed references, which is not needed here > - for unmodified Groups with plenty of members use > {{IdentifierManager.hasAnyReferenceInPath}} > - All modifications in {{MembershipProvider}} > h4. Group.addMember(Authorizable) > - {{GroupImpl}}: drop extra test for circular membership, which is anyway > enforced by the {{UserValidator}} => needs adjustment of test-cases that > expect this to be detected immediately => intoducing save-call to verify if > cycles are properly detected latest during commit. > - {{UserValidator}}: if {{isMember}} is improved by the changes above, the > check for circular membership would be faster as well > h4. Group.addMembers(String...) > - introduce new methods on {{MembershipProvider}} and {{MembershipWriter}} > that don't take a single contentID but rather a Map of contentID:memberID and > returns the set of memberIDs that could not be processed successfully. > - consequently finding the best {{rep:members property}} and avoiding > duplicate membership is performed only once for the batch of ids to be added > - Note: once best-property reaches threshold new member-ref nodes will be > created > h4. Group.removeMembers(String...) > - introduce new methods on {{MembershipProvider}} and {{MembershipWriter}} > that don't take a single contentID but rather a Map of contentID:memberID and > returns the set of memberIDs that could not be processed successfully. > - consequently for a given batch of ids the declared member are traversed > only once -- This message was sent by Atlassian JIRA (v6.3.4#6332)