Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-09 Thread Guillaume Smet
Tom,

Just to confirm you that your last commit fixed the problem:

lbo=# explain analyze select * from cms_items where ancestors LIKE '1062/%';
  QUERY PLAN
---
 Seq Scan on cms_items  (cost=0.00..688.26 rows=*9097* width=103)
(actual time=0.011..22.605 rows=11326 loops=1)
   Filter: ((ancestors)::text ~~ '1062/%'::text)
 Total runtime: 30.022 ms
(3 rows)

Thanks for your time.

--
Guillaume

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-09 Thread Guillaume Smet
On Nov 9, 2007 5:33 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> he's got no MCVs, presumably because the field
> is unique.

It is. The ancestors field contains the current folder itself so the
id of the folder (which is the primary key) is in it.

--
Guillaume

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-09 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Could we not use the bogus range to calculate the histogram estimate
> but apply the LIKE pattern directly to the most-frequent-values
> instead of applying the bogus range? Or would that be too much code
> re-organization for now?

We have already done that for quite some time.  It won't help
Guillaume's case anyhow: he's got no MCVs, presumably because the field
is unique.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-09 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> This rule works for all the locales I have installed ... but I don't
> have any Far Eastern locales installed.  Also, my test cases are only
> covering ASCII characters, and I believe many locales have some non-ASCII
> letters that sort after 'Z'.  I'm not sure how hard we need to try to
> cover those corner cases, though.  It is ultimately only an estimate...

If I understand correctly what we're talking about it's generating estimates
for LIKE 'foo%' using the algorithm which makes sense for C locale which means
generating the next range of values which start with 'foo%'.

It seems to me the problematic situations is when the most-frequent-values
come into play. Being off slightly in the histogram isn't going to generate
very inaccurate estimates but including or not a most-frequent-value could
throw off the estimate severely.

Could we not use the bogus range to calculate the histogram estimate but apply
the LIKE pattern directly to the most-frequent-values instead of applying the
bogus range? Or would that be too much code re-organization for now?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-08 Thread Guillaume Smet
On Nov 9, 2007 3:08 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> This rule works for all the locales I have installed ... but I don't
> have any Far Eastern locales installed.  Also, my test cases are only
> covering ASCII characters, and I believe many locales have some non-ASCII
> letters that sort after 'Z'.  I'm not sure how hard we need to try to
> cover those corner cases, though.  It is ultimately only an estimate...

My opinion is that it's acceptable to fix the problem for most cases
in most locales because, as you said, it's only an estimate. We didn't
have any report of this problem for years so it seems that it's not a
common case or at least it's not common that the bad estimate leads to
noticeably bad plans.

As far as I understand what you plan to do, it doesn't seem to be
something that prevents us to fix the problem afterwards if someone
comes with an example which doesn't fit in the schema you're proposing
and has a real performance problem with it.

--
Guillaume

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-08 Thread Tom Lane
I wrote:
> I did do some experimentation and found that among the ASCII characters
> (ie, codes 32-126), nearly all the non-C locales on my Fedora machine
> sort Z last and z next-to-last or vice versa.  Most of the remainder
> sort digits last and z or Z as the last non-digit character.  Since Z is
> not that close to the end of the sort order in C locale, however, z
> seems the best bet.

With still further experimentation, it seems that doesn't work very
well, because the locales that sort digits last also seem not to
discriminate against digits in their first pass.  What did seem to work
was:

* Determine which of the strings "Z", "z", "y", "9" is seen as largest
by strcoll().

* Append this string to the given input.

* Search (using the CVS-HEAD make_greater_string logic) for a string
greater than that.

This rule works for all the locales I have installed ... but I don't
have any Far Eastern locales installed.  Also, my test cases are only
covering ASCII characters, and I believe many locales have some non-ASCII
letters that sort after 'Z'.  I'm not sure how hard we need to try to
cover those corner cases, though.  It is ultimately only an estimate...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend