[ https://issues.apache.org/jira/browse/UNOMI-204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Draier resolved UNOMI-204.
---------------------------------
    Resolution: Fixed

> Optimize pastEvents conditions execution and count
> --------------------------------------------------
>
>                 Key: UNOMI-204
>                 URL: https://issues.apache.org/jira/browse/UNOMI-204
>             Project: Apache Unomi
>          Issue Type: Improvement
>            Reporter: Thomas Draier
>            Priority: Major
>
> Past event condition query execution is based on an aggregate on events to
> get all profile ids, and then generates an id query on profiles with each
> id. This leads to several issues:
> - The terms aggregate is limited to 5000 buckets by default (configurable
> thanks to UNOMI-119), so the condition will never return more than 5000
> users (which is an issue for updateExistingProfilesForSegment). The limit is
> necessary to avoid running out of memory, but we still need the full list of
> profiles - using aggregate filter/partition should help to get all items.
> - The id query can be huge (millions of ids?) - even if, in the end, we have
> a limit on the size of the results we want. This is unfortunately difficult
> to optimize, as 1/ we don't know whether a limit will be used or not and
> 2/ the condition can be part of an "and" boolean condition, which would
> require an unknown minimal number of ids.
> - The "count" method is not optimal, as it executes the full query and gets
> the number of results, although in some cases this could be optimized. For
> pastEventCondition, we generate an IdQuery with a list of ids just to get
> the count of profiles - counting the ids should be enough, and in some cases
> we could even use a cardinality aggregate to get the count directly. In all
> cases, keeping the list of all ids in memory should not be needed for
> counting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
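
The partition and counting ideas in the description above can be illustrated with a small in-memory sketch. This is not Unomi's implementation and does not call Elasticsearch: the event dictionaries, the field names "profileId" and "eventType", and the helper names are all assumptions made for illustration. Each pass collects only the profile ids that fall into one partition, which is roughly what partition filtering on a terms aggregate would do, and the count is obtained by summing partition sizes, so the full id list is never held in memory at once.

```python
import zlib
from typing import Dict, Iterator, List, Set

Event = Dict[str, str]  # hypothetical event shape: {"profileId": ..., "eventType": ...}


def partition_of(profile_id: str, num_partitions: int) -> int:
    """Map an id to a stable partition (crc32, since hash() is salted per process)."""
    return zlib.crc32(profile_id.encode("utf-8")) % num_partitions


def profile_ids_by_partition(
    events: List[Event], event_type: str, num_partitions: int
) -> Iterator[Set[str]]:
    """Yield the distinct matching profile ids one partition at a time."""
    for partition in range(num_partitions):
        yield {
            e["profileId"]
            for e in events
            if e["eventType"] == event_type
            and partition_of(e["profileId"], num_partitions) == partition
        }


def count_profiles(events: List[Event], event_type: str, num_partitions: int = 4) -> int:
    """Count distinct matching profiles without building one big id list."""
    # Every id belongs to exactly one partition, so partition sizes add up
    # to the number of distinct profiles.
    return sum(
        len(ids) for ids in profile_ids_by_partition(events, event_type, num_partitions)
    )
```

In this sketch the events are scanned once per partition, trading extra passes for a bounded working set. A cardinality aggregate, as mentioned in the issue, would avoid even the per-partition sets, at the cost of returning an approximate count.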