### Re: [HACKERS] Collect frequency statistics for arrays

On Thu, Mar 8, 2012 at 4:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: True. If (max count - min count + 1) is small, enumerating the frequencies is both a more compact and a more precise representation. Simultaneously, if (max count - min count + 1)

### Re: [HACKERS] Collect frequency statistics for arrays

On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote: Alexander Korotkov aekorot...@gmail.com writes: True. If (max count - min count + 1) is small, enumerating the frequencies is both a more compact and a more precise representation. Simultaneously, if (max count - min count + 1) is large,

### Re: [HACKERS] Collect frequency statistics for arrays

Noah Misch n...@leadboat.com writes: On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote: On reflection my idea above is wrong; for example assume that we have a column with 900 arrays of length 1 and 100 arrays of length 2. Going by what I said, we'd reduce the histogram to {1,2}, which

### Re: [HACKERS] Collect frequency statistics for arrays

On Thu, Mar 08, 2012 at 11:30:52AM -0500, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote: On reflection my idea above is wrong; for example assume that we have a column with 900 arrays of length 1 and 100 arrays of length 2.

### Re: [HACKERS] Collect frequency statistics for arrays

Alexander Korotkov aekorot...@gmail.com writes: On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: Couldn't we reduce the histogram size when there aren't many different counts? It seems fairly obvious to me that we could bound the histogram size with (max count - min count

### Re: [HACKERS] Collect frequency statistics for arrays

On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, one other thing about the count histogram: seems like we are frequently generating uselessly large ones. For instance, do ANALYZE in the regression database and then run select tablename,attname,elem_count_histogram

### Re: [HACKERS] Collect frequency statistics for arrays

On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: 1. I'm still unhappy about the loop that fills the count histogram, as I noted earlier today. It at least needs a decent comment and some overflow protection, and I'm not entirely convinced that it doesn't have more bugs than

### Re: [HACKERS] Collect frequency statistics for arrays

On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: 2. The tests in the above-mentioned message show that in most cases where mcelem_array_contained_selec falls through to the rough estimate, the resulting rowcount estimate is just 1, ie we are coming out with very small

### Re: [HACKERS] Collect frequency statistics for arrays

Alexander Korotkov aekorot...@gmail.com writes: On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: 2. The tests in the above-mentioned message show that in most cases where mcelem_array_contained_selec falls through to the rough estimate, the resulting rowcount estimate is just

### Re: [HACKERS] Collect frequency statistics for arrays

Alexander Korotkov aekorot...@gmail.com writes: On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: 1. I'm still unhappy about the loop that fills the count histogram, as I noted earlier today. It at least needs a decent comment and some overflow protection, and I'm not

### Re: [HACKERS] Collect frequency statistics for arrays

BTW, one other thing about the count histogram: seems like we are frequently generating uselessly large ones. For instance, do ANALYZE in the regression database and then run `select tablename,attname,elem_count_histogram from pg_stats where elem_count_histogram is not null;` You get lots of

### Re: [HACKERS] Collect frequency statistics for arrays

... BTW, could you explain exactly how that "Fill histogram by hashtab" loop works? It's way too magic for my taste, and does in fact have bugs in the currently submitted patch. I've reworked it to this: `/* Fill histogram by hashtab. */ delta = analyzed_rows - 1;`

### Re: [HACKERS] Collect frequency statistics for arrays

Alexander Korotkov aekorot...@gmail.com writes: [ array statistics patch ] I've committed this after a fair amount of editorialization. There are still some loose ends to deal with, but I felt it was ready to go into the tree for wider testing. The main thing I changed that wasn't in the

### Re: [HACKERS] Collect frequency statistics for arrays

Still working through this patch ... there are some things that bother me about the entries being made in pg_statistic: 1. You re-used STATISTIC_KIND_MCELEM for something that, while similar to tsvector's usage, is not the same. In particular, tsvector adds two extra elements to the stanumbers

### Re: [HACKERS] Collect frequency statistics for arrays

I wrote: ... So my preference is to align the two definitions of STATISTIC_KIND_MCELEM by adding a null-element frequency to tsvector's usage (where it'll always be zero) and getting rid of the average distinct element count here. Actually, there's a way we can do this without code changes in

### Re: [HACKERS] Collect frequency statistics for arrays

On Thu, Mar 1, 2012 at 1:19 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: That seems like a pretty narrow, uncommon use-case. Also, to get accurate stats for such queries that way, you'd need really enormous histograms.

### Re: [HACKERS] Collect frequency statistics for arrays

On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Nathan Boley npbo...@gmail.com writes: On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now.  I'm wondering exactly why the decision was made to continue storing

### Re: [HACKERS] Collect frequency statistics for arrays

Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012: On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: No, just that we'd no longer have statistics relevant to that, and would have to fall back on default selectivity assumptions. Do you think that such

### Re: [HACKERS] Collect frequency statistics for arrays

Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012: On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: I confess I am nervous about ripping this out. I am pretty sure we will get complaints about it.

### Re: [HACKERS] Collect frequency statistics for arrays

Excerpts from Tom Lane's message of Thu Mar 01 18:51:38 -0300 2012: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012: On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: I confess I am nervous about

### Re: [HACKERS] Collect frequency statistics for arrays

What about MCV's? Will those be removed as well? Sure.  Those seem even less useful. Ya, this will destroy the performance of several queries without some heavy tweaking. Maybe this is bad design, but I've gotten in the habit of storing sequences as arrays and I commonly join on them. I

### Re: [HACKERS] Collect frequency statistics for arrays

Nathan Boley npbo...@gmail.com writes: Maybe this is bad design, but I've gotten in the habit of storing sequences as arrays and I commonly join on them. I looked through my code this morning, and I only have one 'range' query ( of the form described up-thread ), but there are tons of the form

### Re: [HACKERS] Collect frequency statistics for arrays

Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of Thu Mar 01 18:51:38 -0300 2012: How would we make it optional? There's no place I can think of to stick such a knob ... Uhm, attoptions? Oh, I had forgotten we had that mechanism already. Yeah, that might

### Re: [HACKERS] Collect frequency statistics for arrays

[ sorry Tom, reply all this time... ] What do you mean by storing sequences as arrays? So, a simple example is, for transcripts ( sequences of DNA that are turned into proteins ), we store each of the connected components as an array of the form: exon_type in [1,6] splice_type = [1,3] and

### Re: [HACKERS] Collect frequency statistics for arrays

Alexander Korotkov aekorot...@gmail.com writes: On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch n...@leadboat.com wrote: I've attached a new version that includes the UINT64_FMT fix, some edits of your newest comments, and a rerun of pgindent on the new files. I see no other issues precluding

### Re: [HACKERS] Collect frequency statistics for arrays

On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now. I'm wondering exactly why the decision was made to continue storing btree-style statistics for arrays, in addition to the new stuff. The pg_statistic rows for array columns tend to

### Re: [HACKERS] Collect frequency statistics for arrays

Alexander Korotkov aekorot...@gmail.com writes: On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now. I'm wondering exactly why the decision was made to continue storing btree-style statistics for arrays, Probably, btree statistics

### Re: [HACKERS] Collect frequency statistics for arrays

On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now. I'm wondering exactly why the decision was made to continue

### Re: [HACKERS] Collect frequency statistics for arrays

On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch n...@leadboat.com wrote: I've attached a new version that includes the UINT64_FMT fix, some edits of your newest comments, and a

### Re: [HACKERS] Collect frequency statistics for arrays

Nathan Boley npbo...@gmail.com writes: On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now.  I'm wondering exactly why the decision was made to continue storing btree-style statistics for arrays, in addition to the new stuff. If I

### Re: [HACKERS] Collect frequency statistics for arrays

On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Nathan Boley npbo...@gmail.com writes: On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: I am starting to look at this patch now.  I'm wondering exactly why the decision was made to continue storing

### Re: [HACKERS] Collect frequency statistics for arrays

Nathan Boley npbo...@gmail.com writes: On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Nathan Boley npbo...@gmail.com writes: If I understand your suggestion, queries of the form `SELECT * FROM rel WHERE ARRAY[ 1,2,3,4 ] <= x AND x <= ARRAY[ 1, 2, 3, 1000];` would no

### Re: [HACKERS] Collect frequency statistics for arrays

On Mon, Jan 23, 2012 at 01:21:20AM +0400, Alexander Korotkov wrote: Updated patch is attached. I've updated the comment of mcelem_array_contained_selec with a more detailed description of the probability distribution assumption. Also, I found that the behaviour of rest is better described by a Poisson

### Re: [HACKERS] Collect frequency statistics for arrays

On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch n...@leadboat.com wrote: + /* Take care about events with low probabilities. */ + if (rest DEFAULT_CONTAIN_SEL) + { Why the change from rest 0 to this in the latest version? Earlier, addition of the rest distribution required O(m) time.

### Re: [HACKERS] Collect frequency statistics for arrays

Hi! Updated patch is attached. I've updated the comment of mcelem_array_contained_selec with a more detailed description of the probability distribution assumption. Also, I found that the behaviour of rest is better described by a Poisson distribution; relevant changes were made. On Tue, Jan 17, 2012 at

### Re: [HACKERS] Collect frequency statistics for arrays

Hi! Thanks for your fixes to the patch. They look correct to me. I made some fixes to the patch. The proof of some concepts is still needed; I'm going to provide it in a few days. On Thu, Jan 12, 2012 at 3:06 PM, Noah Misch n...@leadboat.com wrote: I'm not sure about shared lossy counting

### Re: [HACKERS] Collect frequency statistics for arrays

On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote: Thanks for your fixes to the patch. They look correct to me. I made some fixes to the patch. The proof of some concepts is still needed; I'm going to provide it in a few days. Your further fixes look good. Could you also

### Re: [HACKERS] Collect frequency statistics for arrays

Hi! A patch with most of the issues fixed is attached. On Thu, Dec 29, 2011 at 8:35 PM, Noah Misch n...@leadboat.com wrote: I find distressing the thought of having two copies of the lossy sampling code, each implementing the algorithm with different variable names and levels of

### Re: [HACKERS] Collect frequency statistics for arrays

Corrections: On Thu, Dec 29, 2011 at 11:35:00AM -0500, Noah Misch wrote: On Wed, Nov 09, 2011 at 08:49:35PM +0400, Alexander Korotkov wrote: + *We set s to be the estimated frequency of the K'th element in a natural + *language's frequency table, where K is the target

### Re: [HACKERS] Collect frequency statistics for arrays

Hi! Thanks for your great work on reviewing this patch. Now I'm trying to find the memory corruption bug. Unfortunately, it doesn't appear on my system. Can you check whether this bug remains in the attached version of the patch? If so, please provide me information about the system you're running (processor, OS

### Re: [HACKERS] Collect frequency statistics for arrays

On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote: Thanks for your great work on reviewing this patch. Now I'm trying to find the memory corruption bug. Unfortunately, it doesn't appear on my system. Can you check whether this bug remains in the attached version of the patch? If so, please

### Re: [HACKERS] Collect frequency statistics for arrays

On Wed, Jan 4, 2012 at 12:33 AM, Noah Misch n...@leadboat.com wrote: On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote: Thanks for your great work on reviewing this patch. Now I'm trying to find the memory corruption bug. Unfortunately, it doesn't appear on my system. Can you

### Re: [HACKERS] Collect frequency statistics for arrays

On Tue, Dec 20, 2011 at 04:37:37PM +0400, Alexander Korotkov wrote: On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley npbo...@gmail.com wrote: FYI, I've added myself as the reviewer for the current commitfest. How is the review going now? I will examine this patch within the week. -- Sent via

### Re: [HACKERS] Collect frequency statistics for arrays

Hi! On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley npbo...@gmail.com wrote: FYI, I've added myself as the reviewer for the current commitfest. How is the review going now? -- With best regards, Alexander Korotkov.

### Re: [HACKERS] Collect frequency statistics for arrays

Rebased with head. FYI, I've added myself as the reviewer for the current commitfest. Best, Nathan Boley -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

### Re: [HACKERS] Collect frequency statistics for arrays

Rebased with head. -- With best regards, Alexander Korotkov. Attachment: arrayanalyze-0.7.patch.gz