I found some comments which are not implemented.

As far as I have examined, these comments refer to min_group_size,
but min_group_size was decided not to adopt and removed[1], so
it seems these comments also should be removed.

[1] https://www.postgresql.org/message-id/cakjs1f_vuck+qbxqsh4m_fns+d4xa2kpyvvh0zymjvg-eeh...@mail.gmail.com


I'm not sure we still need the min_group_size, when evaluating
dependencies. It was meant to deal with 'noisy' data, but I think it
after switching to the 'degree' it might actually be a bad idea.

Yeah, I'd wondered about this when I first started testing the patch.
I failed to get any functional dependencies because my values were too
unique. Seems I'd gotten a bit used to it, and in the end thought that
if the values are unique enough then they won't suffer as much from
the underestimation problem you're trying to solve here.

I've removed that part of the code now.

Attached patch removes the comments about min_group_size.


Atsushi Torikoshi
NTT Open Source Software Center
diff --git a/src/backend/statistics/README.dependencies 
index 702d34e..6c446bd 100644
--- a/src/backend/statistics/README.dependencies
+++ b/src/backend/statistics/README.dependencies
@@ -71,10 +71,6 @@ To count the rows consistent with the dependency (a => b):
  (c) If there's a single distinct value in 'b', the rows are consistent with
      the functional dependency, otherwise they contradict it.
-The algorithm also requires a minimum size of the group to consider it
-consistent (currently 3 rows in the sample). Small groups make it less likely
-to break the consistency.
 Clause reduction (planner/optimizer)
diff --git a/src/backend/statistics/dependencies.c 
index 89dece3..2e7c0ad 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -286,14 +286,6 @@ dependency_degree(int numrows, HeapTuple *rows, int k, 
AttrNumber *dependency,
         * first (k-1) columns. If there's a single value in the last column, we
         * count the group as 'supporting' the functional dependency. Otherwise 
         * count it as contradicting.
-        *
-        * We also require a group to have a minimum number of rows to be
-        * considered useful for supporting the dependency. Contradicting groups
-        * may be of any size, though.
-        *
-        * XXX The minimum size requirement makes it impossible to identify case
-        * when both columns are unique (or nearly unique), and therefore
-        * trivially functionally dependent.
        /* start with the first row forming a group */
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to