RE: Is it faster/better to include one objectclass or all in query?

Carlo.Accorsi Wed, 14 Mar 2012 09:16:00 -0700

Alex - Thank you for your detailed description of the search algorithm. This is 
most helpful.

Regards,
Carlo Accorsi

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Alex 
Karasulu
Sent: Wednesday, March 14, 2012 12:08 PM
To: [email protected]
Subject: Re: Is it faster/better to include one objectclass or all in query?

On Wed, Mar 14, 2012 at 4:51 PM, <[email protected]> wrote:

> Hi, when searching for a user having this objectclass hierarchy
>
> top
>  |_person
>         |_organizationalPerson
>                    |_inetOrgPerson
>
> and uid = 'jsmith'
>
> Which query would be less expensive or better/faster?  Thanks!
>
> (&
>   (objectclass=inetOrgPerson)
>   (uid=jsmith)
> )
>

This would be faster and more efficient, since the evaluation is on a more 
specific objectClass, which reduces the search space from the get-go.

To understand this you need to know how the optimizer works with the scan 
counts that the indices return. LDAP search filters are expanded into an AST 
(abstract syntax tree) in which the leaves are assertions and the branch nodes 
are operators. The optimizer then annotates this AST with scan counts, which 
basically means asking each index, "Hey, how many results would you return 
for this assertion?" So the more specific inetOrgPerson is likely to return a 
smaller scan count.
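
The annotation step can be sketched roughly as follows. This is a minimal 
illustration, not the server's actual code: the Leaf and And classes, the 
INDEX_COUNTS numbers, the fallback to the total entry count for an unindexed 
attribute, and the rule that an AND node takes the smallest count of its 
children are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical index statistics: (attribute, value) -> number of matching entries.
INDEX_COUNTS = {
    ("objectclass", "top"): 10000,
    ("objectclass", "person"): 6000,
    ("objectclass", "inetorgperson"): 5000,
    ("uid", "jsmith"): 1,
}
TOTAL_ENTRIES = 10000  # assumed worst case when no index can answer


@dataclass
class Leaf:
    """An assertion such as (uid=jsmith)."""
    attr: str
    value: str
    scan_count: Optional[int] = None


@dataclass
class And:
    """A branch node for the & operator."""
    children: List[Union["Leaf", "And"]]
    scan_count: Optional[int] = None


def annotate(node):
    """Walk the tree and attach a scan count to every node."""
    if isinstance(node, Leaf):
        key = (node.attr.lower(), node.value.lower())
        # "Hey index, how many results would you return for this assertion?"
        node.scan_count = INDEX_COUNTS.get(key, TOTAL_ENTRIES)
    else:
        for child in node.children:
            annotate(child)
        # Assumption: an AND can match no more than its smallest child.
        node.scan_count = min(child.scan_count for child in node.children)
    return node


tree = annotate(And([Leaf("objectClass", "inetOrgPerson"), Leaf("uid", "jsmith")]))
for leaf in tree.children:
    print(leaf.attr, leaf.value, leaf.scan_count)
```

With these made-up numbers the inetOrgPerson leaf is annotated with 5000 and 
the uid leaf with 1.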

Now if you have an index on uid then the scan count on this will be 1, since 
uid should be unique (our DSA does not enforce this, though). Once the 
optimizer is done annotating, one leaf node in the entire AST is selected to 
act as the candidate generator and is used for iteration. The leaf node with 
the smallest scan count is selected for this, because it is cheaper to iterate 
over and look up fewer candidates than more. The rest of the leaf assertion 
nodes are used by lookup-based assertion evaluators. So in this case, with a 
uid index, the server will use uid=jsmith to return one candidate and then do 
a lookup to see whether that candidate is also matched by 
objectClass=inetOrgPerson. In this case I would just use (uid=jsmith) since 
you have the uid index. It avoids the extra lookup to check whether the entry 
is an inetOrgPerson. If uids are unique and your people are all 
inetOrgPersons then this is the best filter for you.
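
Here is a rough sketch of the selection step described above, assuming a toy 
in-memory directory. The ENTRIES data, the dictionary-based indices, and the 
function names are hypothetical; the point is only to show the leaf with the 
smallest scan count driving the iteration while the other leaves act as 
lookup checks.

```python
# Toy directory: entry id -> attributes (made-up data for illustration).
PERSON_OCS = ["top", "person", "organizationalPerson", "inetOrgPerson"]
ENTRIES = {
    1: {"uid": ["jsmith"], "objectclass": PERSON_OCS},
    2: {"uid": ["adoe"], "objectclass": PERSON_OCS},
    3: {"uid": ["printer1"], "objectclass": ["top", "device"]},
}


def build_index(attr):
    """Map each lowercased value of attr to the set of entry ids holding it."""
    index = {}
    for entry_id, attrs in ENTRIES.items():
        for value in attrs.get(attr, []):
            index.setdefault(value.lower(), set()).add(entry_id)
    return index


INDEXES = {"uid": build_index("uid"), "objectclass": build_index("objectclass")}


def candidates(assertion):
    """Entry ids the index returns for one (attribute, value) assertion."""
    attr, value = assertion
    return INDEXES[attr.lower()].get(value.lower(), set())


def search_and(assertions):
    """Evaluate an AND filter over indexed equality assertions."""
    # The leaf with the smallest scan count becomes the candidate generator.
    generator = min(assertions, key=lambda a: len(candidates(a)))
    # Every other leaf is only used to check each candidate by lookup.
    evaluators = [a for a in assertions if a is not generator]
    matches = [
        entry_id
        for entry_id in sorted(candidates(generator))
        if all(entry_id in candidates(a) for a in evaluators)
    ]
    return generator, matches


generator, matches = search_and([("objectclass", "inetOrgPerson"), ("uid", "jsmith")])
print(generator, matches)
```

Calling search_and([("uid", "jsmith")]) on the same data gives the same match 
with an empty evaluator list, which is the lookup that the plain (uid=jsmith) 
filter saves.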

If you do not have an index on uid I suggest you index it. But if you don't, 
then the candidates will be generated off the objectClass index, which always 
exists since it is a system index. The server will then iterate through the 
entire set of inetOrgPersons in your DIB, de-serialize each entry from the 
master table, and check (after normalizing the uid attribute) whether it is in 
fact equal to jsmith. That set could be huge.
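
To make the cost of the unindexed case concrete, here is a small sketch under 
stated assumptions: MASTER stands in for the master table with JSON as the 
serialization format, OBJECTCLASS_INDEX stands in for the system index, and 
normalize() is a simple trim-and-lowercase placeholder for the real matching 
rule. None of these names come from the server itself.

```python
import json

INET_OCS = ["top", "person", "organizationalPerson", "inetOrgPerson"]

# Hypothetical master table: entry id -> serialized entry.
MASTER = {
    1: json.dumps({"uid": ["JSmith "], "objectclass": INET_OCS}),
    2: json.dumps({"uid": ["adoe"], "objectclass": INET_OCS}),
    3: json.dumps({"uid": ["bwayne"], "objectclass": INET_OCS}),
    4: json.dumps({"cn": ["printer1"], "objectclass": ["top", "device"]}),
}

# Hypothetical objectClass index: lowercased value -> entry ids.
OBJECTCLASS_INDEX = {"inetorgperson": [1, 2, 3], "device": [4]}


def normalize(value):
    """Placeholder normalization: trim whitespace and ignore case."""
    return value.strip().lower()


def search_without_uid_index(object_class, uid):
    """Scan every candidate from the objectClass index and compare uids."""
    wanted = normalize(uid)
    fetched = 0
    matches = []
    for entry_id in OBJECTCLASS_INDEX.get(object_class.lower(), []):
        entry = json.loads(MASTER[entry_id])  # de-serialize from the master table
        fetched += 1
        if any(normalize(v) == wanted for v in entry.get("uid", [])):
            matches.append(entry_id)
    return matches, fetched


matches, fetched = search_without_uid_index("inetOrgPerson", "jsmith")
print(matches, fetched)
```

In this toy data all three inetOrgPerson entries are de-serialized to find a 
single match; in a real DIB that number is the size of the whole 
inetOrgPerson population.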

So index your uids and don't bother with the objectClass stuff if you don't 
vary the OC of the people in your DIB.

Cheers,
Alex

>
> OR
>
> (&
>   (&
>     (objectclass=top)
>     (objectclass=person)
>     (objectclass=organizationalPerson)
>     (objectclass=inetOrgPerson)
>   )
>   (uid=jsmith)
> )
>
>
>

--
Best Regards,
-- Alex
