On Thu, Mar 8, 2012 at 4:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Alexander Korotkov aekorot...@gmail.com writes:
True. If (max count - min count + 1) is small, enumerating frequencies
is both a more compact and a more precise representation. Simultaneously,
if (max count - min count + 1)
On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
Alexander Korotkov aekorot...@gmail.com writes:
True. If (max count - min count + 1) is small, enumerating frequencies
is both a more compact and a more precise representation. Simultaneously,
if (max count - min count + 1) is large,
Noah Misch n...@leadboat.com writes:
On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
On reflection my idea above is wrong; for example assume that we have a
column with 900 arrays of length 1 and 100 arrays of length 2. Going by
what I said, we'd reduce the histogram to {1,2}, which
On Thu, Mar 08, 2012 at 11:30:52AM -0500, Tom Lane wrote:
Noah Misch n...@leadboat.com writes:
On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:
On reflection my idea above is wrong; for example assume that we have a
column with 900 arrays of length 1 and 100 arrays of length 2.
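A rough sketch of the example quoted above, not code from the patch: it builds an equal-frequency histogram over array lengths, so that it can be compared with the bare set of distinct lengths {1,2}. The helper name `equal_freq_histogram` and the choice of 11 bounds are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): fill hist[0..nbounds-1] with
 * equal-frequency bounds taken from an ascending array of element counts.
 * Bound i is the value found at position i * (n - 1) / (nbounds - 1).
 */
static void
equal_freq_histogram(const int *sorted, size_t n, int *hist, size_t nbounds)
{
    size_t i;

    /* Need at least one sample and two bounds (min and max). */
    assert(n > 0 && nbounds >= 2);

    for (i = 0; i < nbounds; i++)
        hist[i] = sorted[i * (n - 1) / (nbounds - 1)];
}
```

Fed 900 ones followed by 100 twos with 11 bounds, this yields ten bounds equal to 1 and a final bound equal to 2. The repetition is what encodes the 90/10 split; {1,2} alone records which lengths occur but not how often.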
Alexander Korotkov aekorot...@gmail.com writes:
On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Couldn't we reduce the histogram size when there aren't many
different counts?
It seems fairly obvious to me that we could bound the histogram
size with (max count - min count
On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
BTW, one other thing about the count histogram: seems like we are
frequently generating uselessly large ones. For instance, do ANALYZE
in the regression database and then run
select tablename,attname,elem_count_histogram
On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
1. I'm still unhappy about the loop that fills the count histogram,
as I noted earlier today. It at least needs a decent comment and some
overflow protection, and I'm not entirely convinced that it doesn't have
more bugs than
On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
2. The tests in the above-mentioned message show that in most cases
where mcelem_array_contained_selec falls through to the rough
estimate, the resulting rowcount estimate is just 1, ie we are coming
out with very small
Alexander Korotkov aekorot...@gmail.com writes:
On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
2. The tests in the above-mentioned message show that in most cases
where mcelem_array_contained_selec falls through to the rough
estimate, the resulting rowcount estimate is just
Alexander Korotkov aekorot...@gmail.com writes:
On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
1. I'm still unhappy about the loop that fills the count histogram,
as I noted earlier today. It at least needs a decent comment and some
overflow protection, and I'm not
BTW, one other thing about the count histogram: seems like we are
frequently generating uselessly large ones. For instance, do ANALYZE
in the regression database and then run
select tablename,attname,elem_count_histogram from pg_stats
where elem_count_histogram is not null;
You get lots of
... BTW, could you explain exactly how that Fill histogram by hashtab
loop works? It's way too magic for my taste, and does in fact have bugs
in the currently submitted patch. I've reworked it to this:
/* Fill histogram by hashtab. */
delta = analyzed_rows - 1;
Alexander Korotkov aekorot...@gmail.com writes:
[ array statistics patch ]
I've committed this after a fair amount of editorialization. There are
still some loose ends to deal with, but I felt it was ready to go into
the tree for wider testing.
The main thing I changed that wasn't in the
Still working through this patch ... there are some things that bother
me about the entries being made in pg_statistic:
1. You re-used STATISTIC_KIND_MCELEM for something that, while similar
to tsvector's usage, is not the same. In particular, tsvector adds two
extra elements to the stanumbers
I wrote:
... So my preference is to align the two
definitions of STATISTIC_KIND_MCELEM by adding a null-element frequency
to tsvector's usage (where it'll always be zero) and getting rid of the
average distinct element count here.
Actually, there's a way we can do this without code changes in
On Thu, Mar 1, 2012 at 1:19 AM, Alexander Korotkov aekorot...@gmail.com wrote:
On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
That seems like a pretty narrow, uncommon use-case. Also, to get
accurate stats for such queries that way, you'd need really enormous
histograms.
On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Nathan Boley npbo...@gmail.com writes:
On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I am starting to look at this patch now. I'm wondering exactly why the
decision was made to continue storing
Excerpts from Robert Haas's message of jue mar 01 12:00:08 -0300 2012:
On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
No, just that we'd no longer have statistics relevant to that, and would
have to fall back on default selectivity assumptions. Do you think that
such
Alvaro Herrera alvhe...@commandprompt.com writes:
Excerpts from Robert Haas's message of jue mar 01 12:00:08 -0300 2012:
On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I confess I am nervous about ripping this out. I am pretty sure we
will get complaints about it.
Excerpts from Tom Lane's message of jue mar 01 18:51:38 -0300 2012:
Alvaro Herrera alvhe...@commandprompt.com writes:
Excerpts from Robert Haas's message of jue mar 01 12:00:08 -0300 2012:
On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I confess I am nervous about
What about MCV's? Will those be removed as well?
Sure. Those seem even less useful.
Ya, this will destroy the performance of several queries without some
heavy tweaking.
Maybe this is bad design, but I've gotten in the habit of storing
sequences as arrays and I commonly join on them. I
Nathan Boley npbo...@gmail.com writes:
Maybe this is bad design, but I've gotten in the habit of storing
sequences as arrays and I commonly join on them. I looked through my
code this morning, and I only have one 'range' query ( of the form
described up-thread ), but there are tons of the form
Alvaro Herrera alvhe...@commandprompt.com writes:
Excerpts from Tom Lane's message of jue mar 01 18:51:38 -0300 2012:
How would we make it optional? There's noplace I can think of to stick
such a knob ...
Uhm, attoptions?
Oh, I had forgotten we had that mechanism already. Yeah, that might
[ sorry Tom, reply all this time... ]
What do you mean by storing sequences as arrays?
So, a simple example is, for transcripts ( sequences of DNA that are
turned into proteins ), we store each of the connected components as
an array of the form:
exon_type in [1,6]
splice_type = [1,3]
and
Alexander Korotkov aekorot...@gmail.com writes:
On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch n...@leadboat.com wrote:
I've attached a new version that includes the UINT64_FMT fix, some edits of
your newest comments, and a rerun of pgindent on the new files. I see no
other issues precluding
On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote:
I am starting to look at this patch now. I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,
in addition to the new stuff. The pg_statistic rows for array columns
tend to
Alexander Korotkov aekorot...@gmail.com writes:
On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote:
I am starting to look at this patch now. I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,
Probably, btree statistics
On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Alexander Korotkov aekorot...@gmail.com writes:
On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote:
I am starting to look at this patch now. I'm wondering exactly why the
decision was made to continue
On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Alexander Korotkov aekorot...@gmail.com writes:
On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch n...@leadboat.com wrote:
I've attached a new version that includes the UINT64_FMT fix, some edits of
your newest comments, and a
Nathan Boley npbo...@gmail.com writes:
On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I am starting to look at this patch now. I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,
in addition to the new stuff.
If I
On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Nathan Boley npbo...@gmail.com writes:
On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I am starting to look at this patch now. I'm wondering exactly why the
decision was made to continue storing
Nathan Boley npbo...@gmail.com writes:
On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Nathan Boley npbo...@gmail.com writes:
If I understand your suggestion, queries of the form
SELECT * FROM rel
WHERE ARRAY[ 1,2,3,4 ] <= x
AND x <=ARRAY[ 1, 2, 3, 1000];
would no
On Mon, Jan 23, 2012 at 01:21:20AM +0400, Alexander Korotkov wrote:
Updated patch is attached. I've updated the comment
of mcelem_array_contained_selec with a more detailed description of the
probability distribution assumption. Also, I found that rest behavior
should be better described by Poisson
On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch n...@leadboat.com wrote:
+ /* Take care about events with low probabilities. */
+ if (rest > DEFAULT_CONTAIN_SEL)
+ {
Why the change from rest > 0 to this in the latest version?
Earlier, the addition of the rest distribution required O(m) time.
Hi!
Updated patch is attached. I've updated the comment
of mcelem_array_contained_selec with a more detailed description of the
probability distribution assumption. Also, I found that rest behavior
should be better described by a Poisson distribution; relevant changes were
made.
On Tue, Jan 17, 2012 at
Hi!
Thanks for your fixes to the patch. They look correct to me. I did some
fixes in the patch. The proof of some concepts is still needed. I'm going
to provide it in a few days.
On Thu, Jan 12, 2012 at 3:06 PM, Noah Misch n...@leadboat.com wrote:
I'm not sure about shared lossy counting
On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote:
Thanks for your fixes to the patch. They look correct to me. I did some
fixes in the patch. The proof of some concepts is still needed. I'm going
to provide it in a few days.
Your further fixes look good. Could you also
Hi!
A patch in which most of the issues are fixed is attached.
On Thu, Dec 29, 2011 at 8:35 PM, Noah Misch n...@leadboat.com wrote:
I find distressing the thought of having two copies of the lossy sampling
code, each implementing the algorithm with different variable names and
levels of
Corrections:
On Thu, Dec 29, 2011 at 11:35:00AM -0500, Noah Misch wrote:
On Wed, Nov 09, 2011 at 08:49:35PM +0400, Alexander Korotkov wrote:
+ *We set s to be the estimated frequency of the K'th element in a natural
+ *language's frequency table, where K is the target
Hi!
Thanks for your great work on reviewing this patch. Now I'm trying to find
the memory corruption bug. Unfortunately, it doesn't appear on my system. Can
you check whether this bug remains in the attached version of the patch? If so, please
provide me information about the system you're running (processor, OS
On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote:
Thanks for your great work on reviewing this patch. Now I'm trying to find
the memory corruption bug. Unfortunately, it doesn't appear on my system. Can
you check whether this bug remains in the attached version of the patch? If so, please
On Wed, Jan 4, 2012 at 12:33 AM, Noah Misch n...@leadboat.com wrote:
On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote:
Thanks for your great work on reviewing this patch. Now I'm trying to find
the memory corruption bug. Unfortunately, it doesn't appear on my system. Can
you
On Tue, Dec 20, 2011 at 04:37:37PM +0400, Alexander Korotkov wrote:
On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley npbo...@gmail.com wrote:
FYI, I've added myself as the reviewer for the current commitfest.
How is the review going now?
I will examine this patch within the week.
Hi!
On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley npbo...@gmail.com wrote:
FYI, I've added myself as the reviewer for the current commitfest.
How is the review going now?
--
With best regards,
Alexander Korotkov.
Rebased with head.
FYI, I've added myself as the reviewer for the current commitfest.
Best,
Nathan Boley
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Rebased with head.
--
With best regards,
Alexander Korotkov.
arrayanalyze-0.7.patch.gz
Description: GNU Zip compressed data