Re: [HACKERS] Remove old comments in dependencies.c and README.dependencies
> Agreed. Removed those comments. Thanks for the patch. Thanks! On 2017/07/27 0:44, Alvaro Herrera wrote: atorikoshi wrote: Attached patch removes the comments about min_group_size. Agreed. Removed those comments. Thanks for the patch. -- Atsushi Torikoshi NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Remove old comments in dependencies.c and README.dependencies
Hi, I found some comments which are not implemented. As far as I have examined, these comments refer to min_group_size, but min_group_size was decided not to adopt and removed[1], so it seems these comments also should be removed. [1] https://www.postgresql.org/message-id/cakjs1f_vuck+qbxqsh4m_fns+d4xa2kpyvvh0zymjvg-eeh...@mail.gmail.com 1) DEPENDENCY_MIN_GROUP_SIZE I'm not sure we still need the min_group_size, when evaluating dependencies. It was meant to deal with 'noisy' data, but I think it after switching to the 'degree' it might actually be a bad idea. Yeah, I'd wondered about this when I first started testing the patch. I failed to get any functional dependencies because my values were too unique. Seems I'd gotten a bit used to it, and in the end thought that if the values are unique enough then they won't suffer as much from the underestimation problem you're trying to solve here. I've removed that part of the code now. Attached patch removes the comments about min_group_size. Regards, -- Atsushi Torikoshi NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center diff --git a/src/backend/statistics/README.dependencies b/src/backend/statistics/README.dependencies index 702d34e..6c446bd 100644 --- a/src/backend/statistics/README.dependencies +++ b/src/backend/statistics/README.dependencies @@ -71,10 +71,6 @@ To count the rows consistent with the dependency (a => b): (c) If there's a single distinct value in 'b', the rows are consistent with the functional dependency, otherwise they contradict it. -The algorithm also requires a minimum size of the group to consider it -consistent (currently 3 rows in the sample). Small groups make it less likely -to break the consistency. - Clause reduction (planner/optimizer) diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c index 89dece3..2e7c0ad 100644 --- a/src/backend/statistics/dependencies.c +++ b/src/backend/statistics/dependencies.c @@ -286,14 +286,6 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency, * first (k-1) columns. If there's a single value in the last column, we * count the group as 'supporting' the functional dependency. Otherwise we * count it as contradicting. -* -* We also require a group to have a minimum number of rows to be -* considered useful for supporting the dependency. Contradicting groups -* may be of any size, though. -* -* XXX The minimum size requirement makes it impossible to identify case -* when both columns are unique (or nearly unique), and therefore -* trivially functionally dependent. */ /* start with the first row forming a group */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix a typo in README.dependencies
On 2017/06/23 6:28, Alvaro Herrera wrote: Ashutosh Bapat wrote: On Mon, Jun 5, 2017 at 8:22 AM, atorikoshi wrote: Hi, I found below formula to compute selectivities, but I think the last Probability 'P(b=?)' should be 'P(c=?)'. P(a=?,b=?,c=?) = P(a=?,b=?) * (d + (1-d)*P(b=?)) Attached patch fixes it, and it also adds some spaces following another formula which is on line 86 and computes P(a=?, b=?). Agree. Also using "d" for "degree of functional dependence (b=>c) as well is confusing. We are already using "d" for "degree of functional dependence (a=>b). Here' patch to use "d'" instead of "d". Since the surrounding text uses single quotes to talk about each letter, I thought it was better to use a new letter (e) so that we don't require the "prime" notation, which would end up being either inconsistent, confusing, stupid-looking, or combinations thereof. Also, your proposed text had a slight mistake: it's not (b=>c) that d' is for, but (a,b=>c). Pushed with those corrections. Thanks for the reports and patches! Thanks! -- Atsushi Torikoshi NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fix a typo in README.dependencies
Hi, I found below formula to compute selectivities, but I think the last Probability 'P(b=?)' should be 'P(c=?)'. P(a=?,b=?,c=?) = P(a=?,b=?) * (d + (1-d)*P(b=?)) Attached patch fixes it, and it also adds some spaces following another formula which is on line 86 and computes P(a=?, b=?). Regards, -- Atsushi Torikoshi NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center diff --git a/src/backend/statistics/README.dependencies b/src/backend/statistics/README.dependencies index 59f9d57..d10e4d0 100644 --- a/src/backend/statistics/README.dependencies +++ b/src/backend/statistics/README.dependencies @@ -90,7 +90,7 @@ Where 'd' is the degree of functional dependence (a=>b). With more than two equality clauses, this process happens recursively. For example for (a,b,c) we first use (a,b=>c) to break the computation into -P(a=?,b=?,c=?) = P(a=?,b=?) * (d + (1-d)*P(b=?)) +P(a=?,b=?,c=?) = P(a=?,b=?) * (d + (1-d) * P(c=?)) and then apply (a=>b) the same way on P(a=?,b=?). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers