Re: [HACKERS] Collect frequency statistics for arrays

2012-03-12 Thread Alexander Korotkov
On Thu, Mar 8, 2012 at 4:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: True. If (max count - min count + 1) is small, enumerating of frequencies is both more compact and more precise representation. Simultaneously, if (max count - min count + 1)
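The alternative described here, storing one frequency per possible count when (max count - min count + 1) is small, can be sketched as follows. This is a hypothetical helper, not code from the patch. The name `enumerate_count_frequencies`, the `max_slots` cutoff, and the "return 0 so the caller falls back to a histogram" behavior are all assumptions for illustration. Frequencies are expressed as fractions of the analyzed arrays.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not PostgreSQL code).  counts[] holds the
 * distinct-element count (DEC) of each of nrows analyzed arrays.  If the
 * counts span at most max_slots possible values, write the fraction of
 * arrays having count (min + i) into freqs[i], store min in *min_out, and
 * return the number of slots used.  Otherwise return 0, meaning the caller
 * should fall back to an equi-depth histogram.
 */
static size_t
enumerate_count_frequencies(const int *counts, size_t nrows,
                            double *freqs, size_t max_slots, int *min_out)
{
    int    min_count, max_count;
    size_t nslots, i;

    if (nrows == 0)
        return 0;

    /* Find the range of counts present. */
    min_count = max_count = counts[0];
    for (i = 1; i < nrows; i++)
    {
        if (counts[i] < min_count)
            min_count = counts[i];
        if (counts[i] > max_count)
            max_count = counts[i];
    }

    /* One slot per possible count value in [min_count, max_count]. */
    nslots = (size_t) (max_count - min_count) + 1;
    if (nslots > max_slots)
        return 0;

    for (i = 0; i < nslots; i++)
        freqs[i] = 0.0;
    for (i = 0; i < nrows; i++)
        freqs[counts[i] - min_count] += 1.0 / (double) nrows;

    *min_out = min_count;
    return nslots;
}
```

For seven one-element arrays, two two-element arrays and one three-element array, this yields three slots (0.7, 0.2, 0.1) starting at count 1. If `max_slots` were 2, it would return 0 instead.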

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-08 Thread Noah Misch
On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote: Alexander Korotkov aekorot...@gmail.com writes: True. If (max count - min count + 1) is small, enumerating of frequencies is both more compact and more precise representation. Simultaneously, if (max count - min count + 1) is large,

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-08 Thread Tom Lane
Noah Misch n...@leadboat.com writes: On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote: On reflection my idea above is wrong; for example assume that we have a column with 900 arrays of length 1 and 100 arrays of length 2. Going by what I said, we'd reduce the histogram to {1,2}, which
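The counterexample becomes concrete once numbers are worked through. Assume N = 1000 analyzed arrays and an equi-depth histogram of k = 11 entries; k is an arbitrary choice for illustration. Also assume entry i is taken from position floor(i(N-1)/(k-1)) of the sorted list of per-row distinct-element counts (DEC), which is likewise an assumption here. The histogram then comes out as:

```latex
h_i = \mathrm{DEC}\!\left[\left\lfloor \frac{i\,(N-1)}{k-1} \right\rfloor\right],
\qquad i = 0, \dots, k-1

N = 1000,\; k = 11:\quad
\left\lfloor \tfrac{999\,i}{10} \right\rfloor = 0, 99, 199, 299, 399, 499, 599, 699, 799, 899, 999

\mathrm{DEC}[j] = \begin{cases} 1 & 0 \le j \le 899 \\ 2 & 900 \le j \le 999 \end{cases}
\;\Longrightarrow\;
h = \{1,1,1,1,1,1,1,1,1,1,2\}
```

Ten of the eleven entries are 1, so the full histogram still reflects that 90% of the arrays have one element. Collapsing it to the distinct values {1,2} discards that weighting.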

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-08 Thread Noah Misch
On Thu, Mar 08, 2012 at 11:30:52AM -0500, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote: On reflection my idea above is wrong; for example assume that we have a column with 900 arrays of length 1 and 100 arrays of length 2.

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-07 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes: On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: Couldn't we reduce the histogram size when there aren't many different counts? It seems fairly obvious to me that we could bound the histogram size with (max count - min count

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-05 Thread Alexander Korotkov
On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, one other thing about the count histogram: seems like we are frequently generating uselessly large ones. For instance, do ANALYZE in the regression database and then run select tablename,attname,elem_count_histogram

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-04 Thread Alexander Korotkov
On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: 1. I'm still unhappy about the loop that fills the count histogram, as I noted earlier today. It at least needs a decent comment and some overflow protection, and I'm not entirely convinced that it doesn't have more bugs than

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-04 Thread Alexander Korotkov
On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: 2. The tests in the above-mentioned message show that in most cases where mcelem_array_contained_selec falls through to the rough estimate, the resulting rowcount estimate is just 1, ie we are coming out with very small

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-04 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes: On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: 2. The tests in the above-mentioned message show that in most cases where mcelem_array_contained_selec falls through to the rough estimate, the resulting rowcount estimate is just

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-04 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes: On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: 1. I'm still unhappy about the loop that fills the count histogram, as I noted earlier today. It at least needs a decent comment and some overflow protection, and I'm not

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-04 Thread Tom Lane
BTW, one other thing about the count histogram: seems like we are frequently generating uselessly large ones. For instance, do ANALYZE in the regression database and then run select tablename,attname,elem_count_histogram from pg_stats where elem_count_histogram is not null; You get lots of

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-03 Thread Tom Lane
... BTW, could you explain exactly how that Fill histogram by hashtab loop works? It's way too magic for my taste, and does in fact have bugs in the currently submitted patch. I've reworked it to this: /* Fill histogram by hashtab. */ delta = analyzed_rows - 1;
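One way to write such a loop is sketched below. It is an illustration, not the committed code, and rests on these assumptions:

- The per-row counts have already been collapsed into runs of (count, frequency), sorted by count.
- Entry i should be the value at position i*(analyzed_rows - 1)/(num_hist - 1) of the notional sorted per-row list.
- num_hist >= 2.

`DecRun` and `fill_count_histogram` are hypothetical names. The running quantity `frac` is kept in 64-bit arithmetic, which is one way to supply the overflow protection mentioned elsewhere in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* One run of the run-length-compressed, sorted list of per-row counts. */
typedef struct
{
    int count;      /* distinct-element count shared by this run */
    int frequency;  /* number of analyzed arrays having that count */
} DecRun;

/*
 * Hypothetical sketch: fill hist[0..num_hist-1] so that hist[i] equals
 * DECs[i * (analyzed_rows - 1) / (num_hist - 1)] of the notional sorted
 * per-row array.  runs[] must be sorted by count with frequencies summing
 * to analyzed_rows, and num_hist must be >= 2.
 */
static void
fill_count_histogram(const DecRun *runs, int nruns, int analyzed_rows,
                     int *hist, int num_hist)
{
    int64_t delta = analyzed_rows - 1;
    int64_t frac;
    int     i, j = 0;

    /*
     * frac = (x - y) * (num_hist - 1), where x is the notional index just
     * past the current run and y is the notional index of the next value to
     * emit.  We must advance to the next run whenever x <= y (frac <= 0).
     */
    frac = (int64_t) runs[0].frequency * (num_hist - 1);
    for (i = 0; i < num_hist; i++)
    {
        while (frac <= 0)
        {
            j++;
            assert(j < nruns);
            frac += (int64_t) runs[j].frequency * (num_hist - 1);
        }
        hist[i] = runs[j].count;
        frac -= delta;          /* move y forward for the next i */
    }
}
```

With runs {count 1: 900 rows, count 2: 100 rows} and num_hist = 11, this yields ten 1s followed by a single 2.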

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-03 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes: [ array statistics patch ] I've committed this after a fair amount of editorialization. There are still some loose ends to deal with, but I felt it was ready to go into the tree for wider testing. The main thing I changed that wasn't in the

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-02 Thread Tom Lane
Still working through this patch ... there are some things that bother me about the entries being made in pg_statistic: 1. You re-used STATISTIC_KIND_MCELEM for something that, while similar to tsvector's usage, is not the same. In particular, tsvector adds two extra elements to the stanumbers

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-02 Thread Tom Lane
I wrote: ... So my preference is to align the two definitions of STATISTIC_KIND_MCELEM by adding a null-element frequency to tsvector's usage (where it'll always be zero) and getting rid of the average distinct element count here. Actually, there's a way we can do this without code changes in
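Under the aligned definition proposed here, a reader can handle both layouts by treating the null-element frequency as an optional trailing member. The sketch assumes `stanumbers` holds one frequency per element in `stavalues`, followed by two extra members. It further assumes these are the minimum and maximum element frequencies, optionally followed by the null-element frequency, which is taken as zero when absent. `McelemExtras` and `read_mcelem_extras` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical holder for the trailing members of an MCELEM stanumbers. */
typedef struct
{
    double min_freq;
    double max_freq;
    double null_elem_freq;
} McelemExtras;

/*
 * numbers[0..nvalues-1] are per-element frequencies (assumed layout).
 * Returns 1 on success, 0 if the array length matches neither layout.
 */
static int
read_mcelem_extras(const float *numbers, int nnumbers, int nvalues,
                   McelemExtras *out)
{
    if (nnumbers != nvalues + 2 && nnumbers != nvalues + 3)
        return 0;

    out->min_freq = numbers[nvalues];
    out->max_freq = numbers[nvalues + 1];

    /* A missing null-element slot means "no null elements". */
    out->null_elem_freq = (nnumbers == nvalues + 3) ? numbers[nvalues + 2] : 0.0;
    return 1;
}
```

Because the null-element slot is optional, the two-extra-member layout that tsvector already produces would read as "no null elements" without any change on the writing side.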

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alexander Korotkov
On Thu, Mar 1, 2012 at 1:19 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: That seems like a pretty narrow, uncommon use-case. Also, to get accurate stats for such queries that way, you'd need really enormous histograms.

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Robert Haas
On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Nathan Boley npbo...@gmail.com writes: On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now.  I'm wondering exactly why the decision was made to continue storing

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012: On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: No, just that we'd no longer have statistics relevant to that, and would have to fall back on default selectivity assumptions. Do you think that such

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012: On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: I confess I am nervous about ripping this out. I am pretty sure we will get complaints about it.

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Thu Mar 01 18:51:38 -0300 2012: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012: On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: I confess I am nervous about

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Nathan Boley
What about MCVs? Will those be removed as well? Sure. Those seem even less useful. Ya, this will destroy the performance of several queries without some heavy tweaking. Maybe this is bad design, but I've gotten in the habit of storing sequences as arrays and I commonly join on them. I

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Nathan Boley npbo...@gmail.com writes: Maybe this is bad design, but I've gotten in the habit of storing sequences as arrays and I commonly join on them. I looked through my code this morning, and I only have one 'range' query ( of the form described up-thread ), but there are tons of the form

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of Thu Mar 01 18:51:38 -0300 2012: How would we make it optional? There's noplace I can think of to stick such a knob ... Uhm, attoptions? Oh, I had forgotten we had that mechanism already. Yeah, that might

Re: [HACKERS] Collect frequency statistics for arrays

2012-03-01 Thread Nathan Boley
[ sorry Tom, reply all this time... ] What do you mean by storing sequences as arrays? So, a simple example is, for transcripts ( sequences of DNA that are turned into proteins ), we store each of the connected components as an array of the form: exon_type in [1,6] splice_type = [1,3] and

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes: On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch n...@leadboat.com wrote: I've attached a new version that includes the UINT64_FMT fix, some edits of your newest comments, and a rerun of pgindent on the new files. I see no other issues precluding

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Alexander Korotkov
On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now. I'm wondering exactly why the decision was made to continue storing btree-style statistics for arrays, in addition to the new stuff. The pg_statistic rows for array columns tend to

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes: On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now. I'm wondering exactly why the decision was made to continue storing btree-style statistics for arrays, Probably, btree statistics

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Alexander Korotkov
On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now. I'm wondering exactly why the decision was made to continue

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Nathan Boley
On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch n...@leadboat.com wrote: I've attached a new version that includes the UINT64_FMT fix, some edits of your newest comments, and a

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Nathan Boley npbo...@gmail.com writes: On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now.  I'm wondering exactly why the decision was made to continue storing btree-style statistics for arrays, in addition to the new stuff. If I

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Nathan Boley
On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Nathan Boley npbo...@gmail.com writes: On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now.  I'm wondering exactly why the decision was made to continue storing

Re: [HACKERS] Collect frequency statistics for arrays

2012-02-29 Thread Tom Lane
Nathan Boley npbo...@gmail.com writes: On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Nathan Boley npbo...@gmail.com writes: If I understand your suggestion, queries of the form SELECT * FROM rel WHERE ARRAY[ 1,2,3,4 ] <= x AND x <= ARRAY[ 1, 2, 3, 1000]; would no

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-23 Thread Noah Misch
On Mon, Jan 23, 2012 at 01:21:20AM +0400, Alexander Korotkov wrote: Updated patch is attached. I've updated the comment of mcelem_array_contained_selec with a more detailed description of the probability distribution assumption. Also, I found that the behavior of rest should be better described by Poisson

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-23 Thread Alexander Korotkov
On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch n...@leadboat.com wrote: + /* Take care about events with low probabilities. */ + if (rest > DEFAULT_CONTAIN_SEL) + { Why the change from rest > 0 to this in the latest version? Earlier, addition of the rest distribution required O(m) time.

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-22 Thread Alexander Korotkov
Hi! Updated patch is attached. I've updated the comment of mcelem_array_contained_selec with a more detailed description of the probability distribution assumption. Also, I found that the behavior of rest is better described by a Poisson distribution; the relevant changes were made. On Tue, Jan 17, 2012 at
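As a rough illustration of combining the two distributions, the sketch below treats the number of non-MCE elements as Poisson with mean `rest` and assumes it is independent of the MCE matches. It then convolves that Poisson distribution with an already-computed distribution over the number of matching MCEs. This is not the patch's code. The function names, the `MAX_DIST` bound and the `min_rest` cutoff (standing in for a threshold like DEFAULT_CONTAIN_SEL) are hypothetical. To avoid depending on libm, the Poisson terms are normalized by their own summed series, which equals e^rest. That approach is adequate only for modest means.

```c
#include <assert.h>

#define MAX_DIST 256            /* hypothetical upper bound on m + 1 */
#define MAX_SERIES_TERMS 100000 /* safety cap on the series loop */

/*
 * Hypothetical: pmf[k] = P(K = k) for k = 0..m, with K ~ Poisson(lambda),
 * lambda >= 0.  Terms lambda^k / k! come from t_k = t_{k-1} * lambda / k and
 * are divided by their converged sum (e^lambda), so exp() is not needed.
 * The summed series overflows a double once lambda exceeds roughly 700.
 */
static void
poisson_pmf(double lambda, double *pmf, int m)
{
    double term = 1.0;          /* t_0 = lambda^0 / 0! */
    double sum = 0.0;
    int    k;

    assert(m >= 0 && m < MAX_DIST);
    for (k = 0; k < MAX_SERIES_TERMS; k++)
    {
        if (k > 0)
            term *= lambda / k;
        if (k <= m)
            pmf[k] = term;
        sum += term;
        /* Stop once every pmf slot is filled and the tail is negligible. */
        if (k >= m && k > lambda && term < sum * 1e-17)
            break;
    }
    for (k = 0; k <= m; k++)
        pmf[k] /= sum;
}

/*
 * result[k] = sum_{j<=k} mce_dist[j] * P(rest count = k - j), k = 0..m.
 * If rest <= min_rest, the non-MCE contribution is treated as negligible
 * and mce_dist is copied through unchanged, skipping the convolution.
 */
static void
add_rest_distribution(const double *mce_dist, int m, double rest,
                      double min_rest, double *result)
{
    double pmf[MAX_DIST];
    int    j, k;

    assert(m >= 0 && m < MAX_DIST);
    if (rest <= min_rest)
    {
        for (k = 0; k <= m; k++)
            result[k] = mce_dist[k];
        return;
    }
    poisson_pmf(rest, pmf, m);
    for (k = 0; k <= m; k++)
    {
        result[k] = 0.0;
        for (j = 0; j <= k; j++)
            result[k] += mce_dist[j] * pmf[k - j];
    }
}
```

The cutoff mirrors the trade-off raised in the review: when `rest` is tiny, the convolution changes almost nothing, so skipping it saves work at little cost in accuracy.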

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-17 Thread Alexander Korotkov
Hi! Thanks for your fixes to the patch. They look correct to me. I made some further fixes to the patch. The proof of some concepts is still needed. I'm going to provide it in a few days. On Thu, Jan 12, 2012 at 3:06 PM, Noah Misch n...@leadboat.com wrote: I'm not sure about shared lossy counting

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-17 Thread Noah Misch
On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote: Thanks for your fixes to the patch. They look correct to me. I made some further fixes to the patch. The proof of some concepts is still needed. I'm going to provide it in a few days. Your further fixes look good. Could you also

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-07 Thread Alexander Korotkov
Hi! A patch with most of the issues fixed is attached. On Thu, Dec 29, 2011 at 8:35 PM, Noah Misch n...@leadboat.com wrote: I find distressing the thought of having two copies of the lossy sampling code, each implementing the algorithm with different variable names and levels of
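For readers unfamiliar with the lossy counting referred to here, the sketch below shows roughly what a single shared implementation of Manku and Motwani's Lossy Counting, the algorithm PostgreSQL's tsvector statistics code already uses, could look like. It is an assumption-laden illustration rather than PostgreSQL code. Items are plain ints, and a fixed-size array with linear search stands in for a hash table. `LossyCounter`, `lc_init`, `lc_add` and `lc_count` are hypothetical names. With bucket width w = ceil(1/epsilon), each reported count is never above the true count and at most epsilon times the number of items seen below it.

```c
#include <assert.h>

#define LC_MAX_ENTRIES 1024     /* hypothetical fixed table size */

typedef struct
{
    int item;
    int frequency;              /* f: occurrences counted since insertion */
    int delta;                  /* maximum possible undercount of f */
} LcEntry;

typedef struct
{
    LcEntry entries[LC_MAX_ENTRIES];
    int     nentries;
    int     bucket_width;       /* w = ceil(1 / epsilon) */
    int     b_current;          /* current bucket number, starting at 1 */
    long    n;                  /* items processed so far */
} LossyCounter;

static void
lc_init(LossyCounter *lc, int bucket_width)
{
    assert(bucket_width > 0);
    lc->nentries = 0;
    lc->bucket_width = bucket_width;
    lc->b_current = 1;
    lc->n = 0;
}

/* Count one occurrence of item, pruning at each bucket boundary. */
static void
lc_add(LossyCounter *lc, int item)
{
    int i;

    for (i = 0; i < lc->nentries; i++)
        if (lc->entries[i].item == item)
            break;

    if (i < lc->nentries)
        lc->entries[i].frequency++;
    else
    {
        /* New item: it may have been seen and pruned in earlier buckets. */
        assert(lc->nentries < LC_MAX_ENTRIES);
        lc->entries[i].item = item;
        lc->entries[i].frequency = 1;
        lc->entries[i].delta = lc->b_current - 1;
        lc->nentries++;
    }

    lc->n++;
    if (lc->n % lc->bucket_width == 0)
    {
        int kept = 0;

        /* End of bucket: drop entries with f + delta <= b_current. */
        for (i = 0; i < lc->nentries; i++)
            if (lc->entries[i].frequency + lc->entries[i].delta > lc->b_current)
                lc->entries[kept++] = lc->entries[i];
        lc->nentries = kept;
        lc->b_current++;
    }
}

/* Estimated count of item (0 if it is not currently tracked). */
static int
lc_count(const LossyCounter *lc, int item)
{
    int i;

    for (i = 0; i < lc->nentries; i++)
        if (lc->entries[i].item == item)
            return lc->entries[i].frequency;
    return 0;
}
```

Consider 1000 items in which 42 occurs 500 times and every other item is unique, counted with w = 10. Only 42 remains in the table, with its exact count, because each unique item is pruned at the end of its own bucket.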

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-06 Thread Noah Misch
Corrections: On Thu, Dec 29, 2011 at 11:35:00AM -0500, Noah Misch wrote: On Wed, Nov 09, 2011 at 08:49:35PM +0400, Alexander Korotkov wrote: + *We set s to be the estimated frequency of the K'th element in a natural + *language's frequency table, where K is the target

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-03 Thread Alexander Korotkov
Hi! Thanks for your great work on reviewing this patch. Now I'm trying to find the memory corruption bug. Unfortunately, it doesn't appear on my system. Can you check whether this bug remains in the attached version of the patch? If so, please provide me information about the system you're running (processor, OS

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-03 Thread Noah Misch
On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote: Thanks for your great work on reviewing this patch. Now I'm trying to find the memory corruption bug. Unfortunately, it doesn't appear on my system. Can you check whether this bug remains in the attached version of the patch? If so, please

Re: [HACKERS] Collect frequency statistics for arrays

2012-01-03 Thread Alexander Korotkov
On Wed, Jan 4, 2012 at 12:33 AM, Noah Misch n...@leadboat.com wrote: On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote: Thanks for your great work on reviewing this patch. Now I'm trying to find the memory corruption bug. Unfortunately, it doesn't appear on my system. Can you

Re: [HACKERS] Collect frequency statistics for arrays

2011-12-27 Thread Noah Misch
On Tue, Dec 20, 2011 at 04:37:37PM +0400, Alexander Korotkov wrote: On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley npbo...@gmail.com wrote: FYI, I've added myself as the reviewer for the current commitfest. How is the review going? I will examine this patch within the week. -- Sent via

Re: [HACKERS] Collect frequency statistics for arrays

2011-12-20 Thread Alexander Korotkov
Hi! On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley npbo...@gmail.com wrote: FYI, I've added myself as the reviewer for the current commitfest. How is the review going? -- With best regards, Alexander Korotkov.

Re: [HACKERS] Collect frequency statistics for arrays

2011-11-15 Thread Nathan Boley
Rebased with head. FYI, I've added myself as the reviewer for the current commitfest. Best, Nathan Boley

Re: [HACKERS] Collect frequency statistics for arrays

2011-11-09 Thread Alexander Korotkov
Rebased with head. -- With best regards, Alexander Korotkov. arrayanalyze-0.7.patch.gz Description: GNU Zip compressed data