Hi Apache Syncope team!

Currently, we're using the Apache Syncope 2.0.12 version.

AS provides the wide rest API and our team would like to have the 
org.apache.syncope.common.rest.api.service.AnyService#search working faster as 
we do the search of groups and users using FIQL queries.

Here is an example of the URI with a request path that has the FIQL query:

https://hostname/syncope/rest/users?fiql=%28realm%3D%3Dea696a4f-e77a-4ef1-be67-8f8093bc8686%3BlastChangeDate%3Dle%3D2019-10-10T13%253A57%253A00%252B0300%29&size=50&orderby=username+ASC&details=true&page=1


During the quick investigation of how this endpoint works under the hood, I 
found the places that potentially can be optimized.

The result of generated sql query that searches for AnyTO keys looks like this 
(in a particular case I'm taking into consideration the User anyTO):

{code:sql}
SELECT u.any_id,sv.username FROM (SELECT DISTINCT any_id FROM user_search WHERE 
(realm_id=? AND any_id IN ( SELECT DISTINCT any_id FROM user_search WHERE 
lastChangeDate<=?))) u,user_search sv WHERE u.any_id=sv.any_id AND u.any_id IN 
(SELECT any_id FROM user_search WHERE realm_id IN (SELECT id AS realm_id FROM 
Realm WHERE id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR 
id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR 
id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR id=? OR 
id=?)) ORDER BY sv.username ASC
{code}

Firstly, we can optimize the multiple realm OR-queries that is used for 
effective realms to [1st Optimization]:
{code:sql}
SELECT u.any_id, sv.username
FROM (SELECT DISTINCT any_id
      FROM user_search
      WHERE (realm_id=? AND any_id IN (SELECT DISTINCT any_id FROM user_search 
WHERE lastChangeDate<=?))) u,
     user_search sv
WHERE u.any_id = sv.any_id
  AND u.any_id IN (SELECT any_id
                   FROM user_search
                  WHERE realm_id IN (SELECT id AS realm_id
                                      FROM Realm
                                      WHERE realm_id IN (?, ?, ?, ?, ?, ?, ?, 
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)))
ORDER BY sv.username ASC
{code}

Also, it is not clear for me why there is an inner join of two tables "u" and 
"sv"? As far as I can see the effective realms clause and fiql query can be 
done without inner join.  Thus, the query can be optimized to [2nd 
Optimization]:
{code:sql}
SELECT DISTINCT any_id, username
FROM user_search
WHERE (realm_id=? AND any_id IN (SELECT DISTINCT any_id FROM user_search WHERE 
lastChangeDate<=?))
  and realm_id IN
      (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
?,
       ?, ?, ?, ?) ORDER BY user_search.username ASC;
{code}


Is it possbile that in "user_search" view two rows with the same 
`user_search.any_id` can present? If not then the query can be optimized to  
[3rd Optimization]:
{code:sql}
SELECT any_id, username
FROM user_search
WHERE (realm_id=? AND any_id IN (SELECT any_id FROM user_search WHERE 
lastChangeDate<=?))
  and realm_id IN
      (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
?,
       ?, ?, ?, ?) ORDER BY user_search.username ASC;
{code}


After fetching the any_id <-> username pairs 
org.apache.syncope.core.persistence.jpa.dao.AbstractAnySearchDAO#buildResult 
method plays its role.
Here the code selects each entity one by one in the loop by sorted `any_id` 
order keys.
{code:sql}
for (Object anyKey : raw) {
            String actualKey = anyKey instanceof Object[]
                    ? (String) ((Object[]) anyKey)[0]
                    : ((String) anyKey);

            @SuppressWarnings("unchecked")
            T any = kind == AnyTypeKind.USER
                    ? (T) userDAO.find(actualKey)
                    : kind == AnyTypeKind.GROUP
                            ? (T) groupDAO.find(actualKey)
                            : (T) anyObjectDAO.find(actualKey);
            if (any == null) {
                LOG.error("Could not find {} with id {}, even if returned by 
native query", kind, actualKey);
            } else if (!result.contains(any)) {
                result.add(any);
            }
        }
{code}

What if we can optimize this logic and instead of producing the multiple sql 
requests, where each sql request fetches only one user, we will use the IN 
clause? For this purpose we can add an additional method "findByKeys" into 
org.apache.syncope.core.persistence.jpa.dao.AbstractAnyDAO. It can look like 
this [4th Optimization]:
{code:sql}
    @Transactional(readOnly = true)
    @Override
    public Collection<A> findByKeys(Set<String> keys) {
        validateKeys(keys);

        Class<A> entityClass = anyUtils().anyClass();
        TypedQuery<A> query = entityManager().createQuery(
                "SELECT e FROM " + entityClass.getSimpleName()
                        + " e WHERE e.id IN (:keys)",
                entityClass);
        query.setParameter("keys", keys);
        List<A> foundAnys = query.getResultList();

        validateAllAnysPresent(keys, foundAnys);

        // TODO: adopt to multiple keys
        for (A any : foundAnys) {
            securityChecks(any);
        }

        return foundAnys;
    }
{code}
Note that this is a quick implementation and method like 
org.apache.syncope.core.persistence.jpa.dao.AbstractAnyDAO#securityChecks can 
be also optimized and adopted to consume multiple entities.
And we can use the new method "findByKeys" instead of the loop.
After all, entities are fetched by keys in a single query we can sort the 
entities in Java code using the sorted keys.

I prepared the PR where I implemented quickly 
https://github.com/DmitriyBrashevets/syncope/pull/1:
* [1st Optimization]
* [4th Optimization]


What I also want to achieve in result:
1) Understand if Apache Syncope team also sees the benefits in these 
optimizations and to ensure that everybody agrees with them and I haven't 
missed anything and everything written above is correct.
2) If everything is correct we can find other places in the code and also 
optimize them
3) Finally, we can measure the performance if AS team is agree with these 
changes. I hope that AS team already have an infrastructure with integration or 
e2e tests that do measure the performance.


Thanks in advance,
Dmitriy Brashevets

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to