Integrating HLL cardinality estimates with join operator estimation

Abhishek Kumar Mon, 25 Nov 2024 03:00:26 -0800

Dear PostgreSQL hackers,

I am writing to seek guidance and potential collaboration on a project
involving cardinality estimation improvements in PostgreSQL. The project
aims to enhance join result cardinality estimation by incorporating
HyperLogLog (HLL) estimates alongside the existing join operator framework.


Project Overview:


   - Goal: Improve the accuracy of join cardinality estimation using HLL
   sketches
   - Scope: Modify the existing join estimation logic to consider HLL-based
   distinct count estimates
   - Expected benefit: More accurate query plans for joins on
   high-cardinality columns
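
As a rough illustration of the idea (a sketch under stated assumptions, not
PostgreSQL code): the snippet below builds a toy HyperLogLog sketch and feeds
its distinct-count estimates into the textbook equi-join formula
rows_outer * rows_inner / max(nd_outer, nd_inner). The names HyperLogLog and
estimate_join_rows are hypothetical, and the formula deliberately ignores
null fractions and most-common-value statistics, which a real estimator
would also have to consider.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HLL sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # Derive a 64-bit hash from the value's string form.
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        # The top p bits select a register.
        idx = h >> (64 - self.p)
        # The remaining 64 - p bits determine the rank: the position of
        # the leftmost 1-bit, counting from 1.
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        m = self.m
        # Bias-correction constant; this form is valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)
        return raw


def estimate_join_rows(rows_outer, rows_inner, nd_outer, nd_inner):
    """Simplified equi-join estimate: |R| * |S| / max(nd_R, nd_S)."""
    return rows_outer * rows_inner / max(nd_outer, nd_inner, 1.0)


if __name__ == "__main__":
    outer, inner = HyperLogLog(), HyperLogLog()
    for i in range(100_000):
        outer.add(i % 50_000)   # 50,000 distinct join keys
    for i in range(20_000):
        inner.add(i % 5_000)    # 5,000 distinct join keys
    print(estimate_join_rows(100_000, 20_000, outer.count(), inner.count()))
```

With p = 12 the sketch keeps 4096 registers, for a typical relative error of
about 1.04 / sqrt(4096), or roughly 1.6%, on the distinct count.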

Technical Areas of Interest:

   1. Current implementation of join selectivity estimation in
   src/backend/optimizer
   2. Integration points for HLL sketches within the existing statistics
   framework
   3. Potential modifications needed to the join operator logic

Questions for the Community:

   - Has similar work been attempted or discussed previously?
   - What would be the preferred approach to integrate HLL estimates with
   the existing join estimation framework?
   - Are there specific areas of the codebase I should focus on initially?
   - Would this enhancement align with the project's current direction for
   query optimization?

I have previously worked on the buffer replacement policy in Postgres,
where I implemented a LazyBufferReplacementPolicy based on FIFO queues in
place of the clock sweep algorithm, so I have some familiarity with the
Postgres codebase.

I would greatly appreciate any guidance, feedback, or suggestions from the
community. I'm happy to provide more detailed information about the
proposed approach or clarify any aspects of the project.

Thank you for your time and consideration.

Best regards,
Abhishek Kumar
