Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-10 Thread Bruce Momjian
scott.marlowe wrote:
 On Fri, 6 Jun 2003, Bruce Momjian wrote:
 
  scott.marlowe wrote:
   On Fri, 6 Jun 2003, Peter Eisentraut wrote:
   
scott.marlowe writes:

 If indexes on text worked right in other locales it would be no big deal.

They will in version 7.4, so all these concerns about trading off locale
use vs. performance will become obsolete.
   
   Oh!  I thought there were still issues that couldn't be worked out on that 
   front.  In that case, heck yeah, set the locale on initdb to the current 
   system locale.  sweet.
  
  The problems go away _if_ the user knows about the new way of indexing
  LIKE on non-C locales.
 
 Should we have something about this mentioned in 
 
 http://developer.postgresql.org/docs/postgres/install-upgrading.html
 
 for the 7.4 release?  Or is there a more appropriate place?

Yes, at a minimum, plus we need someone coming fresh to 7.5 to know this
is an issue.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-09 Thread scott.marlowe
On Fri, 6 Jun 2003, Bruce Momjian wrote:

 scott.marlowe wrote:
  On Fri, 6 Jun 2003, Peter Eisentraut wrote:
  
   scott.marlowe writes:
   
If indexes on text worked right in other locales it would be no big deal.
   
   They will in version 7.4, so all these concerns about trading off locale
   use vs. performance will become obsolete.
  
  Oh!  I thought there were still issues that couldn't be worked out on that 
  front.  In that case, heck yeah, set the locale on initdb to the current 
  system locale.  sweet.
 
 The problems go away _if_ the user knows about the new way of indexing
 LIKE on non-C locales.

Should we have something about this mentioned in 

http://developer.postgresql.org/docs/postgres/install-upgrading.html

for the 7.4 release?  Or is there a more appropriate place?




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-08 Thread Alvaro Herrera Munoz
On Thu, Jun 05, 2003 at 09:44:21AM -0600, scott.marlowe wrote:
 On Thu, 5 Jun 2003, Nigel J. Andrews wrote:

 Everything Nigel just wrote plus one thing.
 
 If it comes down to it, we could always require a --locale setting and 
 refuse to initdb without it.  That way, whether it's in an RPM or from 
 source, somebody somewhere along the line has to choose something.

Yeah, that way the RPM guys would put the --locale taking the locale
from the environment and you're back to ground zero.

There's no point in forcing things down the throat of users using this
kind of mechanism, because someone is going to automate the thing 
along the way.  What is needed is a way to make the user aware of his
system's configuration.

-- 
Alvaro Herrera ([EMAIL PROTECTED])



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-08 Thread scott.marlowe
On Thu, 5 Jun 2003, Peter Eisentraut wrote:

 scott.marlowe writes:
 
  If it comes down to it, we could always require a --locale setting and
  refuse to initdb without it.  That way, whether it's in an RPM or from
  source, somebody somewhere along the line has to choose something.
 
 By default, you choose when you install or configure your operating
 system.  In most cases, the region where you install your operating system
 and the region where you run your database is the same, so equating these
 settings by default is reasonable.

But it's not that simple.  If one could flip a switch and change a 
postgresql installation from one locale to another, then hey, no big deal.  
If indexes on text worked right in other locales it would be no big deal.  

If you don't choose locale=C when doing initdb, then you 
have to back up the whole database, re-initdb, and restore it in order to 
switch to it.

If the postgresql engine could use indexes well in all 
locales then it would be reasonable to pick up the environmental locale.  
As long as locale C is the only one that can use indexes on text, it's not 
reasonable to use the locale of the environment without knowing what the 
user really wants to do with the database.

Especially since most of the folks I know who download it are going to 
prefer a locale of C to en_US or whatnot, since they'll likely want fast 
indexed access on text types.

I would at least suggest that certain locales default to be coerced to C 
if the user doesn't pick one.




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-08 Thread Peter Eisentraut
scott.marlowe writes:

 If it comes down to it, we could always require a --locale setting and
 refuse to initdb without it.  That way, whether it's in an RPM or from
 source, somebody somewhere along the line has to choose something.

By default, you choose when you install or configure your operating
system.  In most cases, the region where you install your operating system
and the region where you run your database is the same, so equating these
settings by default is reasonable.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-08 Thread Peter Eisentraut
scott.marlowe writes:

 If indexes on text worked right in other locales it would be no big deal.

They will in version 7.4, so all these concerns about trading off locale
use vs. performance will become obsolete.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-07 Thread scott.marlowe
On Thu, 5 Jun 2003, Alvaro Herrera Munoz wrote:

 On Thu, Jun 05, 2003 at 09:44:21AM -0600, scott.marlowe wrote:
  On Thu, 5 Jun 2003, Nigel J. Andrews wrote:
 
  Everything Nigel just wrote plus one thing.
  
  If it comes down to it, we could always require a --locale setting and 
  refuse to initdb without it.  That way, whether it's in an RPM or from 
  source, somebody somewhere along the line has to choose something.
 
 Yeah, that way the RPM guys would put the --locale taking the locale
 from the environment and you're back to ground zero.
 
 There's no point in forcing things down the throat of users using this
 kind of mechanism, because someone is going to automate the thing 
 along the way.  What is needed is a way to make the user aware of his
 system's configuration.

But initdb IS different since it takes so much effort to change locales 
once you've set up a cluster.

Unless the other locales can offer similar performance to the C locale, I 
would suggest that we make the C locale the default.  IF they need 
something else they can change it after initdb. 

If you're an old time user, you know how to set locale, and the 
implications of a non-C locale, so a default of C is no big deal, and 
you're likely to be looking at initdb to see the message telling you it's 
using C.

If you're a beginner you likely need or want a locale of C, but don't know 
it, and don't know that you can't change it without reinitdbing.

My only concern with going with a default locale of C is if it causes a 
problem with data integrity (i.e. constraints that only behave right in a 
certain locale).




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-06 Thread scott.marlowe
On Thu, 5 Jun 2003, Nigel J. Andrews wrote:

 On Wed, 4 Jun 2003, Tom Lane wrote:
 
  Bruce Momjian [EMAIL PROTECTED] writes:
   That is one thing I liked about the initdb mention --- it clearly told
   them to watch out for something they might not have been looking for.
  
  Only if they read the message, though.  People who are running RPM
  installations probably never get to see what initdb has to say ...
  so I can't put much faith in the usefulness of warnings emitted by
  initdb.
  
 
 Yes, I mentioned this when this thread was going a few weeks ago. I only caught
 the locale setting being wrong on a system before it went into production
 because I happened to install on another system and noticed the message. I then
 had to ask the hosting company's SA to first check and then re-initdb. I was
 even sat watching/directing what he was doing and missed it. He was using
 Redhat with RPMs; I was doing it properly from source.
 
 Those RPMs are dangerous, they turn your mind off.
 
 I voted for setting 'C' by default.

Everything Nigel just wrote plus one thing.

If it comes down to it, we could always require a --locale setting and 
refuse to initdb without it.  That way, whether it's in an RPM or from 
source, somebody somewhere along the line has to choose something.

Or would that break RPM / automated installs in too nasty a way?




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-06 Thread scott.marlowe
On Fri, 6 Jun 2003, Peter Eisentraut wrote:

 scott.marlowe writes:
 
  If indexes on text worked right in other locales it would be no big deal.
 
 They will in version 7.4, so all these concerns about trading off locale
 use vs. performance will become obsolete.

Oh!  I thought there were still issues that couldn't be worked out on that 
front.  In that case, heck yeah, set the locale on initdb to the current 
system locale.  sweet.




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-06 Thread Bruce Momjian
scott.marlowe wrote:
 On Fri, 6 Jun 2003, Peter Eisentraut wrote:
 
  scott.marlowe writes:
  
   If indexes on text worked right in other locales it would be no big deal.
  
  They will in version 7.4, so all these concerns about trading off locale
  use vs. performance will become obsolete.
 
 Oh!  I thought there were still issues that couldn't be worked out on that 
 front.  In that case, heck yeah, set the locale on initdb to the current 
 system locale.  sweet.

The problems go away _if_ the user knows about the new way of indexing
LIKE on non-C locales.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-05 Thread Nigel J. Andrews
On Wed, 4 Jun 2003, Tom Lane wrote:

 Bruce Momjian [EMAIL PROTECTED] writes:
  That is one thing I liked about the initdb mention --- it clearly told
  them to watch out for something they might not have been looking for.
 
 Only if they read the message, though.  People who are running RPM
 installations probably never get to see what initdb has to say ...
 so I can't put much faith in the usefulness of warnings emitted by
 initdb.
 

Yes, I mentioned this when this thread was going a few weeks ago. I only caught
the locale setting being wrong on a system before it went into production
because I happened to install on another system and noticed the message. I then
had to ask the hosting company's SA to first check and then re-initdb. I was
even sat watching/directing what he was doing and missed it. He was using
Redhat with RPMs; I was doing it properly from source.

Those RPMs are dangerous, they turn your mind off.

I voted for setting 'C' by default.


-- 
Nigel J. Andrews




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-05 Thread Alvaro Herrera
On Wed, Jun 04, 2003 at 11:05:03PM -0400, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  That is one thing I liked about the initdb mention --- it clearly told
  them to watch out for something they might not have been looking for.
 
 Only if they read the message, though.  People who are running RPM
 installations probably never get to see what initdb has to say ...
 so I can't put much faith in the usefulness of warnings emitted by
 initdb.

It'd be nice if the RPM installation mailed initdb's messages to someone
([EMAIL PROTECTED] maybe).  It's not impossible, and while it's likely that
RedHat would remove the feature, at least PGDG's RPM would do it.  Same
for DEBs and other binary packages...

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Nunca confiaré en un traidor.  Ni siquiera si el traidor lo he creado yo
(Barón Vladimir Harkonnen)



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-05 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  That is one thing I liked about the initdb mention --- it clearly told
  them to watch out for something they might not have been looking for.
 
 Only if they read the message, though.  People who are running RPM
 installations probably never get to see what initdb has to say ...
 so I can't put much faith in the usefulness of warnings emitted by
 initdb.

True, but for people who _do_ see initdb output, is it helpful, and what
other places can we put it?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-05 Thread Bruce Momjian
Peter Eisentraut wrote:
 Bruce Momjian writes:
 
  How are people going to know to use these special LIKE indexes?
 
 The same way they presumably find out about anything else: RTFM.  A couple
 of more cross-references and index entries need to be added, though.

Well, this isn't one of those "How do I do X" questions, but rather something they
will only know they need if they wonder why their LIKE queries are slow
--- that isn't going to be obvious to too many people.  An FAQ may be
required for this --- fortunately we already have an item for indexes.

That is one thing I liked about the initdb mention --- it clearly told
them to watch out for something they might not have been looking for.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-05 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 That is one thing I liked about the initdb mention --- it clearly told
 them to watch out for something they might not have been looking for.

Only if they read the message, though.  People who are running RPM
installations probably never get to see what initdb has to say ...
so I can't put much faith in the usefulness of warnings emitted by
initdb.

regards, tom lane



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-02 Thread Peter Eisentraut
Tom Lane writes:

 I think that a more general solution would be the ability to select a
 locale (and hence a sort order) per-column, as the SQL spec envisions.

It is a general solution, but not for this problem.  The problem was to
make all locales equally suitable for certain optimizations, not to make
locales available in more places.  I won't pretend to anyone that this
little change will bring us anywhere closer to a solution for that other
problem.

 Then you'd just select C locale for columns you wanted to do pattern
 matching for.

That's wrong, for a number of reasons:

First of all, I don't agree at all that cases where you want both pattern
matching and collation are rare; in fact, I rarely see a case where you
don't want both.  Designing a system on that assumption is not sound,
because all operations should be equally possible in all situations.

Second, we will eventually want pattern matching operations to be locale
aware.  Case-insensitive matching needs this, because case mappings depend
on the locale.  The character class features of POSIX regexps also need
this.  So you cannot make locales and well-performing pattern matching
mutually exclusive.
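
As a small illustration of "case mappings depend on the locale", here is a Python sketch, not PostgreSQL code. The `upper_for` helper and its two-letter language tags are hypothetical names invented for this example. The only fact it relies on is that Turkish uppercases `i` to `İ` (U+0130) and `ı` (U+0131) to `I`.

```python
def upper_for(text, lang):
    """Uppercase text with a tiny, hypothetical locale tailoring."""
    if lang == "tr":
        # Turkish: dotted i -> İ (U+0130), dotless ı (U+0131) -> I.
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    # str.upper() itself is locale-independent in Python.
    return text.upper()


def equal_ignoring_case(a, b, lang):
    """Compare two strings after applying the language's case mapping."""
    return upper_for(a, lang) == upper_for(b, lang)


print(equal_ignoring_case("izmir", "IZMIR", "en"))
print(equal_ignoring_case("izmir", "IZMIR", "tr"))
```

The same pair of strings matches under one case mapping and not under the other, so a case-insensitive pattern match cannot be decided without knowing the locale.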

Third, keep in mind that datums with different locales cannot be combined
liberally.  So systems built the way you propose become crippled in ways
that will be hard to understand and justify.

Finally, the locale of a datum should be a property that describes the
language of the stored data and that can be used for that specific purpose
without concerns and tradeoffs with the internal doings of the
optimization engine.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-02 Thread Peter Eisentraut
Bruce Momjian writes:

 Our default indexes will be able to do =, <, >, ORDER BY, and the
 special index will be able to do LIKE, ORDER BY, and maybe equals.  Do I
 have that correct?

The default operator class supports comparisons (=, <, >, etc.) and ORDER
BY based on those operators.  The other operator class supports pattern
matching operations (LIKE, SIMILAR, POSIX regexps).
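
A rough Python sketch of why the two jobs want differently ordered indexes. This is an illustration only, not how the backend is implemented: `space_blind_key` is a made-up stand-in for a non-C collation (it ignores spaces on the first pass), and `prefix_range_scan` emulates turning `LIKE 'prefix%'` into a range scan over a sorted index.

```python
from bisect import bisect_left


def c_key(s):
    """Code-point order, which is how the C locale sorts ASCII text."""
    return s


def space_blind_key(s):
    """Hypothetical locale-like order: ignore spaces first, then break ties."""
    return (s.replace(" ", ""), s)


def prefix_range_scan(values, prefix, key):
    """Emulate an index scan for LIKE 'prefix%' on an index sorted by key."""
    ordered = sorted(values, key=key)
    keys = [key(v) for v in ordered]
    # Upper bound judged by code points: bump the last character by one.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect_left(keys, key(prefix))
    hi = bisect_left(keys, key(upper))
    return ordered[lo:hi]


column = ["ab", "a b", "abc", "a c", "ac", "a"]
expected = sorted(v for v in column if v.startswith("a "))

print(prefix_range_scan(column, "a ", c_key) == expected)
print(prefix_range_scan(column, "a ", space_blind_key) == expected)
```

With the code-point ordering the range scan finds exactly the rows that start with `a `. With the locale-like ordering the same bounds miss them, which is the reason a separate, character-by-character ordering is needed for pattern matching.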

 Looking at CVS, I see the warning about non-C locales has been removed.
 Should we instead mention the new LIKE index method?

I don't see a need.  The old warning was mainly because once you
initdb'ed, you were basically stuck with your choice.  Now we have plenty
of options to query and adjust things later.

 Doing LIKE with single-byte encodings would be easy because it would be
 only 256 compares to find the min/max char values, but that doesn't work
 with multi-byte encodings, right?

This has nothing to do with encodings.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-02 Thread Peter Eisentraut
Tom Lane writes:

 Are there any locales that claim that not-physically-identical strings
 are equal?

In Unicode there are plenty such combinations.
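
One concrete instance, sketched with Python's standard `unicodedata` module (the variable names are just for this example): the letter é can be stored as a single code point or as `e` followed by a combining accent. The two forms are canonically equivalent but not physically identical.

```python
import unicodedata

composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# Physical comparison: the code point sequences (and UTF-8 bytes) differ.
physically_equal = composed == decomposed

# Comparison after normalizing both sides to the same form (NFC).
canonically_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(physically_equal, canonically_equal)
```

Whether a given locale's comparison function actually treats such pairs as equal depends on the platform, so this only shows that the question is not hypothetical.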

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-02 Thread Bruce Momjian
Peter Eisentraut wrote:
 Bruce Momjian writes:
 
  Our default indexes will be able to do =, <, >, ORDER BY, and the
  special index will be able to do LIKE, ORDER BY, and maybe equals.  Do I
  have that correct?
 
 The default operator class supports comparisons (=, <, >, etc.) and ORDER
 BY based on those operators.  The other operator class supports pattern
 matching operations (LIKE, SIMILAR, POSIX regexps).
 
  Looking at CVS, I see the warning about non-C locales has been removed.
  Should we instead mention the new LIKE index method?
 
 I don't see a need.  The old warning was mainly because once you
 initdb'ed, you were basically stuck with your choice.  Now we have plenty
 of options to query and adjust things later.

How are people going to know to use these special LIKE indexes?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-01 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 I don't understand why you call this a hack.  Pattern matching and string
 comparison simply work differently, so the proper solution is to use
 different operator classes.  After all, that's what operator classes exist
 for.  What is left to be desired?

I think that a more general solution would be the ability to select a
locale (and hence a sort order) per-column, as the SQL spec envisions.
Then you'd just select C locale for columns you wanted to do pattern
matching for.

Admittedly, you'd still need the opclass-based approach for cases where
you wanted both pattern matching and a non-C-locale sort order ... but
I doubt that constitutes the majority of cases.

I guess my main concern is that we should not feel that this approach
takes the heat off us to support multiple locales.  As a solution to the
narrow problem of LIKE performance, it's okay --- but it's not getting
us any nearer to a solution to the general locale problem.

regards, tom lane



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-01 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Has the single-byte LIKE penalty been eliminated, so we don't need to
  consider using C as the default locale for initdb, right?
 
 I'm still of the opinion that we should make C the default locale.
 But I'm not sure where the consensus is, so I've not made the change.
 
  If fixed, how was it done?
 
 Peter has provided a hack whereby one can create a LIKE-supporting index
 in a non-C locale.  But a *default* index in a non-C locale is still not
 going to support LIKE ... and the hacked index will not support ordinary
 comparison or ordering operators.  So I think there's still a lot left
 to be desired here.

So, my understanding is that you would create something such as:

CREATE INDEX iix ON tab (LIKE col)

and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
can't be used for = or ORDER BY, but it can be used for equality tests?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-01 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 So, my understanding is that you would create something such as:
   CREATE INDEX iix ON tab (LIKE col)
 and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
 can't be used for = or ORDER BY, but it can be used for equality tests?

Hm.  Right at the moment, it wouldn't be used for equality tests unless
you spelled equality as a ~=~ b.  I wonder whether that's necessary
though; couldn't we dispense with that operator and use ordinary
equality as the BTEqual member of these opclasses?  Are there any
locales that claim that not-physically-identical strings are equal?

regards, tom lane



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-06-01 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  So, my understanding is that you would create something such as:
  CREATE INDEX iix ON tab (LIKE col)
  and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
  can't be used for = or ORDER BY, but it can be used for equality tests?
 
 Hm.  Right at the moment, it wouldn't be used for equality tests unless
 you spelled equality as a ~=~ b.  I wonder whether that's necessary
 though; couldn't we dispense with that operator and use ordinary
 equality as the BTEqual member of these opclasses?  Are there any
 locales that claim that not-physically-identical strings are equal?

Let me see if I understand.  

Our default indexes will be able to do =, <, >, ORDER BY, and the
special index will be able to do LIKE, ORDER BY, and maybe equals.  Do I
have that correct?

Looking at CVS, I see the warning about non-C locales has been removed. 
Should we instead mention the new LIKE index method?

# (Be sure to maintain the correspondence with locale_is_like_safe() in selfuncs.c.)
if test x`pg_getlocale COLLATE` != xC && test x`pg_getlocale COLLATE` != xPOSIX; then
    echo "This locale setting will prevent the use of indexes for pattern matching"
    echo "operations.  If that is a concern, rerun $CMDNAME with the collation order"
    echo "set to \"C\".  For more information see the Administrator's Guide."
fi
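
For readers who do not read shell, the logic of that removed check can be restated as a short Python sketch. This is a paraphrase for illustration, not the code in initdb or selfuncs.c. The function names `collation_is_like_safe` and `initdb_locale_warning` and the `cmdname` default are assumptions made for this example.

```python
def collation_is_like_safe(collate_name):
    """True when the collation is plain byte order (C or POSIX)."""
    return collate_name in ("C", "POSIX")


def initdb_locale_warning(collate_name, cmdname="initdb"):
    """Return the warning text, or None when no warning is needed."""
    if collation_is_like_safe(collate_name):
        return None
    return (
        "This locale setting will prevent the use of indexes for pattern matching\n"
        f"operations.  If that is a concern, rerun {cmdname} with the collation order\n"
        'set to "C".  For more information see the Administrator\'s Guide.'
    )


print(initdb_locale_warning("en_US"))
```

Only the C and POSIX collations pass silently; any other collation name produces the warning.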

Doing LIKE with single-byte encodings would be easy because it would be
only 256 compares to find the min/max char values, but that doesn't work
with multi-byte encodings, right?

This LIKE/encoding problem is a tricky one because it gives poor
performance with little warning to users.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-05-31 Thread Bruce Momjian

Has the single-byte LIKE penalty been eliminated, so we don't need to
consider using C as the default locale for initdb, right?

If fixed, how was it done?

---

Peter Eisentraut wrote:
 Tom Lane writes:
 
  I recall someone floating a proposal that initdb should by default
  initialize the database in C locale, not whatever-it-finds-in-the-
  environment.  To get a non-C locale you'd have to give an explicit
  command-line switch --- essentially, reversing the sense of the present
  initdb --no-locale option.
 
 If you're concerned about speed, let's think about fixing the real
 problems, not about disabling the feature altogether.  A while ago I
 proposed an easy solution that made LIKE use an index based on strxfrm
 order instead.  It was rejected on the grounds that it would prevent a
 future enhancement of the LIKE mechanism to use the locale-enabled
 collation order, but no one seems to be seriously interested in
 implementing that.  I still have the patch; we can reconsider it if you
 like.
 
 (Btw., LIKE using the locale-enabled collation sequence is hardly going to
 work, because most locales compare strings backwards from the end to the
 start in the second pass, so something like LIKE 'foo%' can easily give
 inconsistent results, since you don't know what the end of the string
 really is.  It's better to think of pattern matching as
 character-by-character matching.)
 
 -- 
 Peter Eisentraut   [EMAIL PROTECTED]
 
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-05-31 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Has the single-byte LIKE penalty been eliminated, so we don't need to
 consider using C as the default locale for initdb, right?

I'm still of the opinion that we should make C the default locale.
But I'm not sure where the consensus is, so I've not made the change.

 If fixed, how was it done?

Peter has provided a hack whereby one can create a LIKE-supporting index
in a non-C locale.  But a *default* index in a non-C locale is still not
going to support LIKE ... and the hacked index will not support ordinary
comparison or ordering operators.  So I think there's still a lot left
to be desired here.

regards, tom lane



Re: [HACKERS] default locale considered harmful? (was Re: [GENERAL]

2003-05-31 Thread Peter Eisentraut
Tom Lane writes:

 Peter has provided a hack whereby one can create a LIKE-supporting index
 in a non-C locale.  But a *default* index in a non-C locale is still not
 going to support LIKE ... and the hacked index will not support ordinary
 comparison or ordering operators.  So I think there's still a lot left
 to be desired here.

I don't understand why you call this a hack.  Pattern matching and string
comparison simply work differently, so the proper solution is to use
different operator classes.  After all, that's what operator classes exist
for.  What is left to be desired?

-- 
Peter Eisentraut   [EMAIL PROTECTED]

