Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-11-28 Thread Bruce Momjian
Alexander Korotkov wrote:
 Version of patch with few more comments and some fixes.

Where are we on the idea of better statistics for arrays?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-11-28 Thread Nathan Boley
 Version of patch with few more comments and some fixes.

 Where are we on the idea of better statistics for arrays?

I need to finish reviewing the patch: I'll try to send in something
tomorrow night. So far it looks good.

Best,
Nathan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-10-15 Thread Nathan Boley
 Looking now, I see that Alexander wasn't Cc'd on the review, so it's
 possible he missed the message?


We've corresponded off list and have discussed my review at some length.

Alex submitted an updated patch on Sep 22 to me personally ( although
not to the list? Alex? ), with the promise of a new version with
improved comments.

Best,
Nathan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-10-15 Thread Alexander Korotkov
Hi!

Thanks for your attention to my patch!

On Sat, Oct 15, 2011 at 2:47 PM, Nathan Boley npbo...@gmail.com wrote:

  Looking now, I see that Alexander wasn't Cc'd on the review, so it's
  possible he missed the message?
 

 We've corresponded off list and have discussed my review at some length.

 Alex submitted an updated patch on Sep 22 to me personally ( although
 not to the list? Alex? ), with the promise of a new version with
 improved comments.


Oh, I didn't noticed that I've posted updated patch off-list. So, there is
repost of that version of patch. List of changes is below:
1) Distinct slot is used for length histogram.
2) Standard statistics is collected for arrays.
3) Most common values and most common elements are mapped to distinct
columns of pg_stats view, because both of them are calculated for arrays.

Now, it's hard for me to give to it as much time as I would like to. But I
hope to present improved comments and testing until end of october.

--
With best regards,
Alexander Korotkov.


arrayanalyze-0.5.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-10-14 Thread Bruce Momjian
Alexander Korotkov wrote:
 Version of patch with few more comments and some fixes.

Where are we on this?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-10-14 Thread Robert Haas
On Fri, Oct 14, 2011 at 3:24 PM, Bruce Momjian br...@momjian.us wrote:
 Alexander Korotkov wrote:
 Version of patch with few more comments and some fixes.

The CommitFest app page is here.

https://commitfest.postgresql.org/action/patch_view?id=539

It was reviewed in July by Nate Boley, and never updated.

Looking now, I see that Alexander wasn't Cc'd on the review, so it's
possible he missed the message?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-06-14 Thread Alexander Korotkov
Version of patch with few more comments and some fixes.

--
With best regards,
Alexander Korotkov.


arrayanalyze-0.4.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-06-13 Thread Alexander Korotkov
On Mon, Jun 13, 2011 at 8:16 AM, Robert Haas robertmh...@gmail.com wrote:

 If the data type is hashable, you could consider building a hash table
 on the MCVs and then do a probe for each element in the array.  I
 think that's better than the other way around because there can't be
 more than 10k MCVs, whereas the input constant could be arbitrarily
 long.  I'm not entirely sure whether this case is important enough to
 be worth spending a lot of code on, but then again it might not be
 that much code.

Unfortunately, most time consuming operation isn't related to elements
comparison. It is caused by complex computations in calc_distr function.


 Another option is to bound the number of operations you're willing to
 perform to some reasonable limit, say, 10 * default_statistics_target.
  Work out ceil((10 * default_statistics_target) /
 number-of-elements-in-const) and consider at most that many MCVs.
 When this limit kicks in you'll get a less-accurate selectivity
 estimate, but that's a reasonable price to pay for not blowing out
 planning time.

 Good option. I'm going to add such condition to my patch.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-06-12 Thread Alexander Korotkov
On Fri, Jun 10, 2011 at 9:03 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Initial comments are that the code is well structured and I doubt
 there will be problems at the code level. Looks like a good patch.

I'm worrying about perfomance of column @ const estimation. It takes
O(m*(n+m)) of time, where m - const length and n - statistics target.
Probably, it can be too slow is some some cases.


 At the moment I see no tests. If this code will be exercised by
 existing tests then you should put some notes with the patch to
 explain that, or at least provide some pointers as to how I might test
 this.

I didn't find in existing tests which check selectivity estimation accuracy.
And I found difficult to create them because regression tests gives binary
result while estimation accuracy is quantitative value. Existing regression
tests covers case if typanalyze or selectivity estimation function falls
down. I've added ANALYZE array_op_test; line into array test in order to
these tests covers falldown case for this patch functions too.
Seems that, selectivity estimation accuracy should be tested manually on
various distributions. I've done very small amount of such tests.
Unfortunately, few months pass before I got idea about column @ const
case. And now, I don't have sufficient time for it due to my GSoC project.
It would be great if you can help me with this tests.


 Also, I'd like to see some more explanation. Either in comments, or
 just as a post to hackers. That saves me time, but we need to be clear
 about what this does and does not do, what it might do in the future
 etc.. 3+ years from now we need to be able to remember what the code
 was supposed to do. You will forget yourself in time, if you write
 enough patches. Based on this, I think you'll be writing quite a few
 more.

I've added some more comments. I'm afraid that it should be completely
rewritten before committing due to my english. If some particular points
should be clarified more, please, specify them.


 And of course, a few lines for the docs also.

I found that in statistics patch for tsvector only article about pg_stats
view was corrected. I've corrected this article a little bit too.

--
With best regards,
Alexander Korotkov.


arrayanalyze-0.3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-06-12 Thread Robert Haas
On Sun, Jun 12, 2011 at 12:17 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 I'm worrying about perfomance of column @ const estimation. It takes
 O(m*(n+m)) of time, where m - const length and n - statistics target.
 Probably, it can be too slow is some some cases.

Hmm, that doesn't sound terribly promising.  The best thing to do here
is probably to construct a pessimal case and test it.  So, say, look
for a column @ a 100-element array with a statistics target of 400...
 once you know how bad it is, then we can make intelligent decisions
about where to go with it.

If the data type is hashable, you could consider building a hash table
on the MCVs and then do a probe for each element in the array.  I
think that's better than the other way around because there can't be
more than 10k MCVs, whereas the input constant could be arbitrarily
long.  I'm not entirely sure whether this case is important enough to
be worth spending a lot of code on, but then again it might not be
that much code.

Another option is to bound the number of operations you're willing to
perform to some reasonable limit, say, 10 * default_statistics_target.
 Work out ceil((10 * default_statistics_target) /
number-of-elements-in-const) and consider at most that many MCVs.
When this limit kicks in you'll get a less-accurate selectivity
estimate, but that's a reasonable price to pay for not blowing out
planning time.

 At the moment I see no tests. If this code will be exercised by
 existing tests then you should put some notes with the patch to
 explain that, or at least provide some pointers as to how I might test
 this.

 I didn't find in existing tests which check selectivity estimation accuracy.
 And I found difficult to create them because regression tests gives binary
 result while estimation accuracy is quantitative value. Existing regression
 tests covers case if typanalyze or selectivity estimation function falls
 down. I've added ANALYZE array_op_test; line into array test in order to
 these tests covers falldown case for this patch functions too.
 Seems that, selectivity estimation accuracy should be tested manually on
 various distributions. I've done very small amount of such tests.
 Unfortunately, few months pass before I got idea about column @ const
 case. And now, I don't have sufficient time for it due to my GSoC project.
 It would be great if you can help me with this tests.

AFAIK, we don't have regression tests for any other selectivity
estimator; except perhaps to the extent that we verify they don't
crash.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-06-10 Thread Simon Riggs
On Mon, May 23, 2011 at 6:54 PM, Alexander Korotkov
aekorot...@gmail.com wrote:

 Second version of patch attached. Main changes:
 1) ANY and ALL keywords handling.
 2) Collecting statistics of array length. It is used in column @ const.
 3) Relatively accurate estimation of column @ const selectivity. This
 estimation is most hard part of patch. I'm warrying that there could be too
 expensibe calculations for estimation. Also, likely current comments don't
 clearify anything...

Just had a brief look at this ahead of starting review.

Initial comments are that the code is well structured and I doubt
there will be problems at the code level. Looks like a good patch.

At the moment I see no tests. If this code will be exercised by
existing tests then you should put some notes with the patch to
explain that, or at least provide some pointers as to how I might test
this.

Also, I'd like to see some more explanation. Either in comments, or
just as a post to hackers. That saves me time, but we need to be clear
about what this does and does not do, what it might do in the future
etc.. 3+ years from now we need to be able to remember what the code
was supposed to do. You will forget yourself in time, if you write
enough patches. Based on this, I think you'll be writing quite a few
more.

And of course, a few lines for the docs also.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-05-23 Thread Alexander Korotkov
Second version of patch attached. Main changes:
1) ANY and ALL keywords handling.
2) Collecting statistics of array length. It is used in column @ const.
3) Relatively accurate estimation of column @ const selectivity. This
estimation is most hard part of patch. I'm warrying that there could be too
expensibe calculations for estimation. Also, likely current comments don't
clearify anything...

--
With best regards,
Alexander Korotkov.


arrayanalyze-0.2.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-05-23 Thread Alexander Korotkov
I forgot to commit before diff. Here is correct version.

--
With best regards,
Alexander Korotkov.


arrayanalyze-0.2.1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: collect frequency statistics for arrays

2011-02-24 Thread Robert Haas
On Wed, Feb 23, 2011 at 10:00 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 WIP patch of statistics collection for arrays is attached.

Please add this patch to the currently open CommitFest at

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers