fabriziofortino opened a new pull request, #1791: URL: https://github.com/apache/jackrabbit-oak/pull/1791
In #1217, we improved statistical facets by introducing a random score script that retrieves a random sample of results from an index without iterating over a large number of entries. Using a seed ensured that tests produce consistent results across executions. This approach worked well compared to the previous solution. However, for very large result sets (over 1 million entries), the strategy can become inefficient, and queries time out when they do not return within 15 seconds.

To address this, the current PR precomputes consistent random values based on the `path` field and stores them in a dedicated index field. The field is of type `short`, which reduces disk usage and improves query performance (see [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html#_which_type_should_i_use)). As a result, queries now execute in milliseconds, regardless of the size of the result set.
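
To illustrate the idea of a consistent random value derived from the `path` field, here is a minimal sketch. It is not the code from this PR: the class name `PathRandomValue`, the method `fromPath`, and the choice of CRC32 as the hash are all assumptions made for illustration. The only properties it is meant to show are that the same path always maps to the same value and that the value fits in a `short`.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not taken from the PR: maps a node path to a
// stable pseudo-random value that fits in an index field of type short.
final class PathRandomValue {

    private PathRandomValue() {
    }

    // Assumption: CRC32 stands in for whatever hash the PR actually uses.
    static short fromPath(String path) {
        CRC32 crc = new CRC32();
        crc.update(path.getBytes(StandardCharsets.UTF_8));
        // The narrowing cast keeps the low 16 bits of the checksum,
        // so the result is always within [-32768, 32767].
        return (short) crc.getValue();
    }

    public static void main(String[] args) {
        // Example paths are made up for the demonstration.
        String[] paths = {"/content/a", "/content/b", "/content/a"};
        for (String path : paths) {
            // The first and third lines print the same value.
            System.out.println(path + " -> " + fromPath(path));
        }
    }
}
```

Because the value depends only on the path, it can be computed once at indexing time and does not change between runs, which matches the consistency that the seeded script provided for tests.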
