fabriziofortino opened a new pull request, #1791:
URL: https://github.com/apache/jackrabbit-oak/pull/1791

   In #1217, we improved statistical facets by introducing a random score 
script to retrieve a random sample of results from an index without iterating 
over a large number of entries. By using a seed, we ensured that tests produce 
consistent results across different executions.
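   To illustrate the idea behind a seeded random score, here is a minimal, hypothetical Java sketch. It is not the script from #1217 and does not reproduce Elasticsearch's `random_score` internals. The class `SeededRandomSample`, the method `sample`, and the use of `java.util.Random` seeded from the seed and the entry's hash are all assumptions made for illustration. Every matching entry receives a score that depends only on the seed and the entry itself, and the `k` highest scores form the sample, so a fixed seed yields the same sample on every run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Hypothetical illustration of seeded random-score sampling (not the #1217 script). */
public final class SeededRandomSample {

    private SeededRandomSample() {}

    /** Pairs a hit with its pseudo-random score. */
    private static final class Scored {
        final String hit;
        final double score;

        Scored(String hit, double score) {
            this.hit = hit;
            this.score = score;
        }
    }

    /**
     * Returns up to k hits with the highest seeded pseudo-random scores.
     * Each score depends only on the seed and the hit, so the same seed
     * yields the same sample regardless of iteration order.
     * Every hit must be scored: cost grows linearly with the result set.
     */
    public static List<String> sample(List<String> hits, int k, long seed) {
        // Min-heap on score: the head is the lowest score currently kept.
        PriorityQueue<Scored> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Scored s) -> s.score));
        for (String hit : hits) {
            double score = new Random(seed ^ hit.hashCode()).nextDouble();
            heap.offer(new Scored(hit, score));
            if (heap.size() > k) {
                heap.poll(); // drop the lowest score to keep only the top k
            }
        }
        List<String> result = new ArrayList<>();
        for (Scored s : heap) {
            result.add(s.hit);
        }
        Collections.sort(result); // deterministic output order
        return result;
    }
}
```

Because the sketch scores every hit before keeping the top `k`, its cost grows linearly with the size of the result set.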
   
   This approach performed better than the previous solution. However, for 
very large result sets (more than 1 million entries), it could become 
inefficient and cause timeouts when the query did not complete within 15 
seconds.
   
   To address this, this PR precomputes consistent random values based on the 
`path` field and stores them in a dedicated index field of type `short`, which 
reduces disk usage and improves query performance (see [Elasticsearch 
documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html#_which_type_should_i_use)). 
As a result, queries now execute in milliseconds, regardless of the size of 
the result set.
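   A minimal, hypothetical Java sketch of precomputing such a value at index time follows. The description above does not specify the hash function or how the field is queried, so the class `PathRandomValue`, the use of `String.hashCode()` with a bit fold, and the range-based `inSample` rule are assumptions. The point it shows: the value depends only on `path`, so it is identical across runs and can be stored once in a `short` field instead of being computed per query.

```java
import java.util.List;

/** Hypothetical sketch: a stable pseudo-random short derived from a document path. */
public final class PathRandomValue {

    private PathRandomValue() {}

    /**
     * Maps a path to a value in [Short.MIN_VALUE, Short.MAX_VALUE].
     * String.hashCode() is fully specified by the Java platform, so the
     * result is identical across JVMs and re-indexing runs.
     */
    public static short forPath(String path) {
        int h = path.hashCode();
        // Fold the high 16 bits into the low 16 bits before truncating,
        // so both halves of the hash influence the stored value.
        h ^= (h >>> 16);
        return (short) h;
    }

    /**
     * Hypothetical sampling rule: keep a document if its stored value is
     * below a cut-off, which selects roughly the given fraction of documents
     * when values are spread evenly over the short range.
     */
    public static boolean inSample(short value, double fraction) {
        int range = Short.MAX_VALUE - Short.MIN_VALUE + 1; // 65536 possible values
        int cutoff = Short.MIN_VALUE + (int) Math.round(fraction * range);
        return value < cutoff;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/content/a", "/content/b", "/content/c");
        for (String p : paths) {
            short v = forPath(p);
            System.out.println(p + " -> " + v + ", in 10% sample: " + inSample(v, 0.10));
        }
    }
}
```

With values spread roughly evenly over the `short` range, a filter such as `value < cutoff` would select about `fraction` of the documents without scoring each one; whether the PR uses a range filter, a sort, or another mechanism is not stated here.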


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.