On fre, 2012-02-17 at 10:19 -0500, Tom Lane wrote:
What if you did this ONCE and wrote the results to a file someplace?
That's still a cache, you've just defaulted on your obligation to think
about what conditions require the cache to be flushed. (In the case at
hand, the trigger for a
Peter Eisentraut pete...@gmx.net writes:
On fre, 2012-02-17 at 10:19 -0500, Tom Lane wrote:
That's still a cache, you've just defaulted on your obligation to think
about what conditions require the cache to be flushed. (In the case at
hand, the trigger for a cache rebuild would probably need
I don't believe it is valid to ignore CJK characters above U+2.
If it is used for names, it will be stored in the database.
If the behaviour is different from characters below U+, you will
get a bug report in meanwhile.
see
CJK Extension B, C, and D
from
http://www.unicode.org/charts/
NISHIYAMA Tomoaki tomoa...@staff.kanazawa-u.ac.jp writes:
I don't believe it is valid to ignore CJK characters above U+2.
If it is used for names, it will be stored in the database.
If the behaviour is different from characters below U+, you will
get a bug report in meanwhile.
I am
Tom Lane t...@sss.pgh.pa.us writes:
Yeah, it's conceivable that we could implement something whereby
characters with codes above some cutoff point are handled via runtime
calls to iswalpha() and friends, rather than being included in the
statically-constructed DFA maps. The cutoff point could
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
Tom Lane t...@sss.pgh.pa.us writes:
Yeah, it's conceivable that we could implement something whereby
characters with codes above some cutoff point are handled via runtime
calls to iswalpha() and friends, rather than being included in the
I wrote:
And here's a poorly-tested draft patch for that.
I've done some more testing now, and am satisfied that this works as
intended. However, some crude performance testing suggests that people
might be annoyed with it. As an example, in 9.1 with pl_PL.utf8 locale,
I see this:
On Sat, Feb 18, 2012 at 7:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Yeah, it's conceivable that we could implement something whereby
characters with codes above some cutoff point are handled via runtime
calls to iswalpha() and friends, rather than being included in the
statically-constructed
On Sun, Feb 19, 2012 at 04:33, Robert Haas robertmh...@gmail.com wrote:
On Sat, Feb 18, 2012 at 7:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Yeah, it's conceivable that we could implement something whereby
characters with codes above some cutoff point are handled via runtime
calls to
On Sat, Feb 18, 2012 at 10:38 PM, Vik Reykja vikrey...@gmail.com wrote:
Does it make sense for regexps to have collations?
As I understand it, collations determine the sort-ordering of strings.
Regular expressions don't care about that. Why do you ask?
--
Robert Haas
EnterpriseDB:
On Sun, Feb 19, 2012 at 05:03, Robert Haas robertmh...@gmail.com wrote:
On Sat, Feb 18, 2012 at 10:38 PM, Vik Reykja vikrey...@gmail.com wrote:
Does it make sense for regexps to have collations?
As I understand it, collations determine the sort-ordering of strings.
Regular expressions
Robert Haas robertmh...@gmail.com writes:
In theory you can imagine a regular expression engine where these
decisions can be postponed until we see the string we're matching
against. IOW, your DFA ends up with state transitions for characters
specifically named, plus a state transition for
Vik Reykja vikrey...@gmail.com writes:
On Sun, Feb 19, 2012 at 05:03, Robert Haas robertmh...@gmail.com wrote:
On Sat, Feb 18, 2012 at 10:38 PM, Vik Reykja vikrey...@gmail.com wrote:
Does it make sense for regexps to have collations?
As I understand it, collations determine the sort-ordering
On Sat, Feb 18, 2012 at 11:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
In theory you can imagine a regular expression engine where these
decisions can be postponed until we see the string we're matching
against. IOW, your DFA ends up with state
On 16.02.2012 01:06, Tom Lane wrote:
In bug #6457 it's pointed out that we *still* don't have full
functionality for locale-dependent regexp behavior with UTF8 encoding.
The reason is that there's old crufty code in regc_locale.c that only
considers character codes up to 255 when searching for
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
Here's a wild idea: keep the class of each codepoint in a hash table.
Initialize it with all codepoints up to 0x. After that, whenever a
string contains a character that's not in the hash table yet, query the
class of that
On 02/17/2012 09:39 AM, Tom Lane wrote:
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
Here's a wild idea: keep the class of each codepoint in a hash table.
Initialize it with all codepoints up to 0x. After that, whenever a
string contains a character that's not in the
On Fri, Feb 17, 2012 at 3:48 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Here's a wild idea: keep the class of each codepoint in a hash table.
Initialize it with all codepoints up to 0x. After that, whenever a
string contains a character that's not in the hash table
Robert Haas robertmh...@gmail.com writes:
On Fri, Feb 17, 2012 at 3:48 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
Recompiling is expensive, but if you cache the results for the session, it
would probably be acceptable.
What if you did this ONCE and wrote the results to
On Fri, Feb 17, 2012 at 10:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
What if you did this ONCE and wrote the results to a file someplace?
That's still a cache, you've just defaulted on your obligation to think
about what conditions require the cache to be flushed.
Yep. Unfortunately, I don't
Robert Haas robertmh...@gmail.com writes:
On Fri, Feb 17, 2012 at 10:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Before going much further with this, we should probably do some timings
of 64K calls of iswupper and friends, just to see how bad a dumb
implementation will be.
Can't hurt.
The
I wrote:
The answer, on a reasonably new desktop machine (2.0GHz Xeon E5503)
running Fedora 16 in en_US.utf8 locale, is that 64K iterations of
pg_wc_isalpha or sibling functions requires a shade under 2ms.
So this definitely justifies caching the values to avoid computing
them more than once
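
A minimal sketch of the caching idea discussed above, combined with the cutoff approach from earlier in the thread: code points below a cutoff are probed once and remembered in a bitmap, and anything above the cutoff goes to the libc classifier at runtime. This is not the code from the patch. The names ClassCache, class_cache_fill, class_cache_lookup, and the 0x10000 cutoff are assumptions made for illustration, and the sketch assumes a platform where wint_t can hold the code points being tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical cutoff: 64K code points, the size used in the timing test. */
#define CLASS_CACHE_SIZE 0x10000

/* One bit per code point below the cutoff, for one character class. */
typedef struct
{
    bool    filled;                       /* false until the first lookup */
    uint8_t bits[CLASS_CACHE_SIZE / 8];
} ClassCache;

/* Probe every code point below the cutoff once and record the answers. */
static void
class_cache_fill(ClassCache *cache, int (*probe) (wint_t))
{
    assert(cache != NULL && probe != NULL);
    memset(cache->bits, 0, sizeof(cache->bits));
    for (uint32_t c = 0; c < CLASS_CACHE_SIZE; c++)
    {
        if (probe((wint_t) c))
            cache->bits[c >> 3] |= (uint8_t) (1u << (c & 7));
    }
    cache->filled = true;
}

/*
 * Answer "is c in this class?".  Below the cutoff the cached bit is used,
 * filling the cache on first use.  Above the cutoff the probe is called
 * directly each time.
 *
 * The caller must set cache->filled = false whenever the locale that the
 * probe depends on changes; this sketch does not detect that by itself.
 */
static bool
class_cache_lookup(ClassCache *cache, int (*probe) (wint_t), uint32_t c)
{
    if (c >= CLASS_CACHE_SIZE)
        return probe((wint_t) c) != 0;
    if (!cache->filled)
        class_cache_fill(cache, probe);
    return ((cache->bits[c >> 3] >> (c & 7)) & 1) != 0;
}
```

A caller would keep one ClassCache per character class and per locale, and pass the matching libc function, for example class_cache_lookup(&alpha_cache, iswalpha, c). The fill happens lazily, so a session that never runs a locale-dependent regexp never pays for the 64K probe calls.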
In bug #6457 it's pointed out that we *still* don't have full
functionality for locale-dependent regexp behavior with UTF8 encoding.
The reason is that there's old crufty code in regc_locale.c that only
considers character codes up to 255 when searching for characters that
should be considered