[jira] [Created] (UNOMI-204) Optimize pastEvents conditions execution and count

Thomas Draier (JIRA) Tue, 09 Oct 2018 09:09:13 -0700

Thomas Draier created UNOMI-204:
-----------------------------------

             Summary: Optimize pastEvents conditions execution and count
                 Key: UNOMI-204
                 URL: https://issues.apache.org/jira/browse/UNOMI-204
             Project: Apache Unomi
          Issue Type: Improvement
            Reporter: Thomas Draier



Past event condition query execution is based on a terms aggregation on events
to collect all matching profile ids, followed by an ids query on profiles built
from those ids. This leads to several issues:
- The terms aggregation is limited to 5000 buckets by default (configurable
thanks to UNOMI-119), so the condition will never return more than 5000
users, which is an issue for updateExistingProfilesForSegment. The limit is
necessary to avoid running out of memory, but we still need the full list of
profiles. Using aggregation filtering/partitioning should help retrieve all
items.
- The ids query can be huge (millions of ids?), even if in the end we have a
limit on the number of results we want. This is unfortunately difficult to
optimize, because 1/ we don't know whether a limit will be used, and 2/ the
condition can be part of an "and" boolean condition, which would require an
unknown minimum number of ids.
- The "count" method is not optimal: it executes the full query and reads the
number of results, whereas in some cases it could be optimized. For
pastEventCondition, we generate an IdQuery with a list of ids just to get the
count of profiles. Counting the ids should be enough, and in some cases we
could even use a cardinality aggregation to get the count directly. In all
cases, keeping the list of all ids in memory should not be needed for counting.
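
To make the first and third points more concrete, here is a rough sketch in
Python of what the Elasticsearch request bodies could look like. It is not the
Unomi implementation: the field name "profileId", the aggregation names and the
"search" callable (anything that takes a request body and returns the
Elasticsearch response as a dict) are assumptions made for illustration. It
relies on the terms aggregation "include" partitioning option and on the
cardinality aggregation, whose result is approximate.

```python
from typing import Callable, Dict, Iterator

# Assumption: event documents carry the owning profile id in this field.
PROFILE_ID_FIELD = "profileId"


def partitioned_terms_body(event_filter: Dict, partition: int,
                           num_partitions: int, bucket_size: int) -> Dict:
    """Build the request for one partition of a terms aggregation on profile ids."""
    return {
        "size": 0,  # only the aggregation is needed, not the event hits
        "query": event_filter,
        "aggs": {
            "profiles": {
                "terms": {
                    "field": PROFILE_ID_FIELD,
                    # Terms are split into num_partitions groups by hash;
                    # each request only returns the terms of one group.
                    "include": {"partition": partition,
                                "num_partitions": num_partitions},
                    "size": bucket_size,
                }
            }
        },
    }


def cardinality_body(event_filter: Dict) -> Dict:
    """Build a request that counts distinct profile ids without listing them."""
    return {
        "size": 0,
        "query": event_filter,
        "aggs": {
            "profile_count": {"cardinality": {"field": PROFILE_ID_FIELD}}
        },
    }


def iter_profile_ids(search: Callable[[Dict], Dict], event_filter: Dict,
                     num_partitions: int,
                     bucket_size: int = 5000) -> Iterator[str]:
    """Yield profile ids one partition at a time, so that no single request
    needs more than bucket_size buckets. The partitions are hash based and
    therefore uneven, so num_partitions should leave some headroom."""
    for partition in range(num_partitions):
        body = partitioned_terms_body(event_filter, partition,
                                      num_partitions, bucket_size)
        response = search(body)
        for bucket in response["aggregations"]["profiles"]["buckets"]:
            yield bucket["key"]


def count_profiles(search: Callable[[Dict], Dict], event_filter: Dict) -> int:
    """Return the (approximate) number of distinct profiles matching the filter."""
    response = search(cardinality_body(event_filter))
    return response["aggregations"]["profile_count"]["value"]
```

Because iter_profile_ids is a generator, a caller that only needs a limited
number of results can stop iterating early, and a caller that only needs a
count never has to hold the id list in memory. The cardinality shortcut only
applies when the condition does not constrain the number of events per profile.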




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
