[ 
https://issues.apache.org/jira/browse/UNOMI-204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Draier resolved UNOMI-204.
---------------------------------
    Resolution: Fixed

> Optimize pastEvents conditions execution and count
> --------------------------------------------------
>
>                 Key: UNOMI-204
>                 URL: https://issues.apache.org/jira/browse/UNOMI-204
>             Project: Apache Unomi
>          Issue Type: Improvement
>            Reporter: Thomas Draier
>            Priority: Major
>
> Past event condition queries are executed by running an aggregation on events
> to collect all matching profile ids, then generating an id query on profiles
> from those ids. This leads to several issues:
> - The terms aggregation is limited to 5000 buckets by default (configurable 
> thanks to UNOMI-119), so the condition will never return more than 5000 
> users, which is an issue for updateExistingProfilesForSegment. The limit is 
> necessary to avoid running out of memory, but we still need the full list of 
> profiles; using aggregation filtering/partitioning should help retrieve all 
> items.
> - The id query can be huge (millions of ids?), even if in the end we only 
> want a limited number of results. This is unfortunately difficult to 
> optimize, as 1/ we don't know whether a limit will be used and 2/ the 
> condition can be part of an "and" boolean condition, which would require an 
> unknown minimal number of ids.
> - The "count" method is not optimal, as it executes the full query and then 
> gets the number of results, whereas in some cases it can be optimized. For 
> pastEventCondition, we generate an IdQuery with a list of ids just to get the 
> count of profiles; counting the ids should be enough, and in some cases we 
> could even use a cardinality aggregation to get the count directly. In all 
> cases, keeping the list of all ids in memory should not be needed for counting.
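
The partitioning and counting ideas from the description can be sketched in plain Java. This is only an in-memory illustration, not Unomi code: the class PartitionedProfileCount and all of its methods are hypothetical, and a list of strings stands in for the event index. The assumption is that profile ids are split into a fixed number of partitions by hash, similar in spirit to the partition filtering that the Elasticsearch terms aggregation offers, so that only one partition's ids are held in memory at a time.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: an in-memory stand-in for the event index,
// where each list entry is the profile id attached to one matching event.
final class PartitionedProfileCount {

    // Assign a profile id to one of numPartitions buckets.
    // Math.floorMod keeps the result non-negative for negative hash codes.
    static int partitionOf(String profileId, int numPartitions) {
        return Math.floorMod(profileId.hashCode(), numPartitions);
    }

    // Collect the distinct profile ids that fall into a single partition.
    static Set<String> idsInPartition(List<String> eventProfileIds,
                                      int partition, int numPartitions) {
        Set<String> ids = new HashSet<>();
        for (String id : eventProfileIds) {
            if (partitionOf(id, numPartitions) == partition) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Count distinct profiles by summing per-partition sizes. Every id lands in
    // exactly one partition, so the sum equals the overall distinct count, and
    // only one partition's ids are kept in memory at any time.
    static long countDistinctProfiles(List<String> eventProfileIds, int numPartitions) {
        long total = 0;
        for (int p = 0; p < numPartitions; p++) {
            total += idsInPartition(eventProfileIds, p, numPartitions).size();
        }
        return total;
    }
}
```

With more partitions, each pass holds fewer ids, at the cost of more passes over the events. A cardinality aggregation would avoid the passes entirely, but in Elasticsearch it returns an approximate count, so whether it is acceptable depends on the caller.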



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
