Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-11 Thread Andrea Arcangeli
On Tue, Jan 10, 2006 at 10:46:53AM -0500, Tom Lane wrote: Not with that data, but maybe if you increased the statistics target for the column to 100 or so, you'd catch enough values to get reasonable results. Sorry, I'm not expert with postgresql, could you tell me how to increase the

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-11 Thread Simon Riggs
On Tue, 2006-01-10 at 22:40 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I meant use the same sampling approach as I was proposing for ANALYZE, but do this at plan time for the query. That way we can apply the function directly to the sampled rows and estimate selectivity.

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-11 Thread Andrea Arcangeli
On Wed, Jan 11, 2006 at 09:07:45AM +, Simon Riggs wrote: I would suggest we do this only when all of these are true - when accessing more than one table, so the selectivity could effect a join result FWIW my problem only happens if I join: on the main table where the kernel_version string

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-11 Thread Jim C. Nasby
On Tue, Jan 10, 2006 at 02:44:47AM +0100, Andrea Arcangeli wrote: cooperative runs WHERE kernel_version NOT LIKE '%% PREEMPT %%', while preempt runs WHERE kernel_version LIKE '%% PREEMPT %%'. The only difference One thing you could do is change the like to: WHERE position(' PREEMPT ' in

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-11 Thread Andrea Arcangeli
On Wed, Jan 11, 2006 at 12:40:32PM -0600, Jim C. Nasby wrote: On Tue, Jan 10, 2006 at 02:44:47AM +0100, Andrea Arcangeli wrote: cooperative runs WHERE kernel_version NOT LIKE '%% PREEMPT %%', while preempt runs WHERE kernel_version LIKE '%% PREEMPT %%'. The only difference One thing you

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-11 Thread Andrea Arcangeli
On Wed, Jan 11, 2006 at 09:39:47PM +0100, Andrea Arcangeli wrote: On Wed, Jan 11, 2006 at 12:40:32PM -0600, Jim C. Nasby wrote: On Tue, Jan 10, 2006 at 02:44:47AM +0100, Andrea Arcangeli wrote: cooperative runs WHERE kernel_version NOT LIKE '%% PREEMPT %%', while preempt runs WHERE

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-11 Thread Jim C. Nasby
On Wed, Jan 11, 2006 at 09:39:47PM +0100, Andrea Arcangeli wrote: CREATE INDEX indexname ON tablename ( position(' PREEMPT ' in kernel_version) ); The index only helps the above query with = 0 and not the one with != 0, but it seems not needed in practice. Hrm. If you need indexing then,

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-11 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes: On Wed, Jan 11, 2006 at 09:39:47PM +0100, Andrea Arcangeli wrote: The index only helps the above query with = 0 and not the one with != 0, but it seems not needed in practice. I suspect this is because of a lack of stats for functional indexes. No, it's

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-11 Thread Simon Riggs
On Tue, 2006-01-10 at 17:21 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I think its OK to use the MCV, but I have a problem with the current heuristics: they only work for randomly generated strings, since the selectivity goes down geometrically with length. We could

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-10 Thread Matteo Beccati
Hi, I did just think of something we could improve though. The pattern selectivity code doesn't make any use of the statistics about most common values. For a constant pattern, we could actually apply the pattern test with each common value and derive answers that are exact for the portion of

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-10 Thread Andrea Arcangeli
On Tue, Jan 10, 2006 at 10:11:18AM -0500, Greg Stark wrote: Andrea Arcangeli [EMAIL PROTECTED] writes: Fixing this with proper stats would be great indeed. What would be the most common value for the kernel_version? You can see samples of the kernel_version here

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-10 Thread Greg Stark
Andrea Arcangeli [EMAIL PROTECTED] writes: Fixing this with proper stats would be great indeed. What would be the most common value for the kernel_version? You can see samples of the kernel_version here http://klive.cpushare.com/2.6.15/ . That's the string that is being searched against

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-10 Thread Tom Lane
Andrea Arcangeli [EMAIL PROTECTED] writes: There's only one preempt near the end, not sure if it would work? Not with that data, but maybe if you increased the statistics target for the column to 100 or so, you'd catch enough values to get reasonable results. regards,

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-10 Thread Tom Lane
Matteo Beccati [EMAIL PROTECTED] writes: I did just think of something we could improve though. The pattern selectivity code doesn't make any use of the statistics about most common values. For a constant pattern, we could actually apply the pattern test with each common value and derive

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-10 Thread Simon Riggs
On Tue, 2006-01-10 at 12:49 -0500, Tom Lane wrote: Matteo Beccati [EMAIL PROTECTED] writes: I did just think of something we could improve though. The pattern selectivity code doesn't make any use of the statistics about most common values. For a constant pattern, we could actually apply

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-10 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: I think its OK to use the MCV, but I have a problem with the current heuristics: they only work for randomly generated strings, since the selectivity goes down geometrically with length. We could certainly use a less aggressive curve for that. You got a

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-10 Thread Simon Riggs
On Tue, 2006-01-10 at 17:21 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I think its OK to use the MCV, but I have a problem with the current heuristics: they only work for randomly generated strings, since the selectivity goes down geometrically with length. We could

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-10 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: I meant use the same sampling approach as I was proposing for ANALYZE, but do this at plan time for the query. That way we can apply the function directly to the sampled rows and estimate selectivity. I think this is so unlikely to be a win as to not even

[PERFORM] NOT LIKE much faster than LIKE?

2006-01-09 Thread Andrea Arcangeli
Hello, I've a performance problem with the planner algorithm choosen in a website. See the difference between this: http://klive.cpushare.com/?scheduler=cooperative and this: http://klive.cpushare.com/?scheduler=preemptive (note, there's much less data to show with preemptive,

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-09 Thread Tom Lane
Andrea Arcangeli [EMAIL PROTECTED] writes: It just makes no sense to me that the planner takes a difference decision based on a not. Why in the world would you think that? In general a NOT will change the selectivity of the WHERE condition tremendously. If the planner weren't sensitive to

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-09 Thread Christopher Kings-Lynne
UNLIKELY string LIKE '%% PREEMPT %%' or: LIKELY string NOT LIKE '%% PREEMPT %%' You should be using contrib/tsearch2 for an un-anchored text search perhaps? ---(end of broadcast)--- TIP 4: Have you searched our list archives?

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-09 Thread Andrea Arcangeli
On Tue, Jan 10, 2006 at 10:29:05AM +0800, Christopher Kings-Lynne wrote: UNLIKELY string LIKE '%% PREEMPT %%' or: LIKELY string NOT LIKE '%% PREEMPT %%' You should be using contrib/tsearch2 for an un-anchored text search perhaps? If I wanted to get the fastest speed possible,

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-09 Thread Tom Lane
Andrea Arcangeli [EMAIL PROTECTED] writes: If you don't know the data, I think it's a bug that LIKE is assumed to have a selectivity above 50%. Extrapolating from the observation that the heuristics don't work well on your data to the conclusion that they don't work for anybody is not good

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-09 Thread Stephan Szabo
On Tue, 10 Jan 2006, Andrea Arcangeli wrote: I see. I can certainly fix it by stopping using LIKE. But IMHO this remains a bug, since until the statistics about the numberof matching rows isn't estimated well, you should not make assumptions on LIKE/NOT LIKE. I think you can change the code

Re: [PERFORM] NOT LIKE much faster than LIKE?

2006-01-09 Thread Andrea Arcangeli
On Mon, Jan 09, 2006 at 09:54:44PM -0500, Tom Lane wrote: Extrapolating from the observation that the heuristics don't work well on your data to the conclusion that they don't work for anybody is not good logic. Replacing that code with a flat 50% is not going to happen (or if it does, I'll