Peter Eisentraut writes:
> On Fri, 2012-02-17 at 10:19 -0500, Tom Lane wrote:
>> That's still a cache, you've just defaulted on your obligation to think
>> about what conditions require the cache to be flushed. (In the case at
>> hand, the trigger for a cache rebuild would probably need to be a g
On Fri, 2012-02-17 at 10:19 -0500, Tom Lane wrote:
> > What if you did this ONCE and wrote the results to a file someplace?
>
> That's still a cache, you've just defaulted on your obligation to think
> about what conditions require the cache to be flushed. (In the case at
> hand, the trigger for
On Sat, Feb 18, 2012 at 11:16 PM, Tom Lane wrote:
> Robert Haas writes:
>> In theory you can imagine a regular expression engine where these
>> decisions can be postponed until we see the string we're matching
>> against. IOW, your DFA ends up with state transitions for characters
>> specificall
Vik Reykja writes:
> On Sun, Feb 19, 2012 at 05:03, Robert Haas wrote:
>> On Sat, Feb 18, 2012 at 10:38 PM, Vik Reykja wrote:
>>> Does it make sense for regexps to have collations?
>> As I understand it, collations determine the sort-ordering of strings.
>> Regular expressions don't care about
Robert Haas writes:
> In theory you can imagine a regular expression engine where these
> decisions can be postponed until we see the string we're matching
> against. IOW, your DFA ends up with state transitions for characters
> specifically named, plus a state transition for "anything else that'
On Sun, Feb 19, 2012 at 05:03, Robert Haas wrote:
> On Sat, Feb 18, 2012 at 10:38 PM, Vik Reykja wrote:
> > Does it make sense for regexps to have collations?
>
> As I understand it, collations determine the sort-ordering of strings.
> Regular expressions don't care about that. Why do you ask?
On Sat, Feb 18, 2012 at 10:38 PM, Vik Reykja wrote:
> Does it make sense for regexps to have collations?
As I understand it, collations determine the sort-ordering of strings.
Regular expressions don't care about that. Why do you ask?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
T
On Sun, Feb 19, 2012 at 04:33, Robert Haas wrote:
> On Sat, Feb 18, 2012 at 7:29 PM, Tom Lane wrote:
> >> Yeah, it's conceivable that we could implement something whereby
> >> characters with codes above some cutoff point are handled via runtime
> >> calls to iswalpha() and friends, rather than
On Sat, Feb 18, 2012 at 7:29 PM, Tom Lane wrote:
>> Yeah, it's conceivable that we could implement something whereby
>> characters with codes above some cutoff point are handled via runtime
>> calls to iswalpha() and friends, rather than being included in the
>> statically-constructed DFA maps. T
I wrote:
> And here's a poorly-tested draft patch for that.
I've done some more testing now, and am satisfied that this works as
intended. However, some crude performance testing suggests that people
might be annoyed with it. As an example, in 9.1 with pl_PL.utf8 locale,
I see this:
sele
Dimitri Fontaine writes:
> Tom Lane writes:
>> Yeah, it's conceivable that we could implement something whereby
>> characters with codes above some cutoff point are handled via runtime
>> calls to iswalpha() and friends, rather than being included in the
>> statically-constructed DFA maps. The c
Tom Lane writes:
> Yeah, it's conceivable that we could implement something whereby
> characters with codes above some cutoff point are handled via runtime
> calls to iswalpha() and friends, rather than being included in the
> statically-constructed DFA maps. The cutoff point could likely be a lo
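
A minimal sketch of the cutoff idea quoted above, not the code from any actual patch: code points below a cutoff are classified once into a static map, and anything at or above it falls through to a runtime iswalpha() call. The names CUTOFF, alpha_map, build_alpha_map and cp_isalpha are hypothetical, the cutoff value is arbitrary, and the sketch assumes a platform where wchar_t can hold a full code point (as on glibc).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <wctype.h>

/* Hypothetical cutoff; the thread does not settle on a value here. */
#define CUTOFF 0x800u

static bool alpha_map[CUTOFF];   /* precomputed answers for cp < CUTOFF */
static bool map_built = false;

/* Fill the static map once, using whatever LC_CTYPE locale is active. */
static void build_alpha_map(void)
{
    for (uint32_t cp = 0; cp < CUTOFF; cp++)
        alpha_map[cp] = iswalpha((wint_t) cp) != 0;
    map_built = true;
}

/* Low code points use the map; high ones pay for a runtime libc call. */
static bool cp_isalpha(uint32_t cp)
{
    assert(map_built);
    if (cp < CUTOFF)
        return alpha_map[cp];
    return iswalpha((wint_t) cp) != 0;
}
```

The map has to be rebuilt whenever the active locale changes, which is the same cache-invalidation question raised elsewhere in this thread.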
NISHIYAMA Tomoaki writes:
> I don't believe it is valid to ignore CJK characters above U+2.
> If it is used for names, it will be stored in the database.
> If the behaviour is different from characters below U+, you will
> get a bug report in the meantime.
I am skeptical that there is enough
I don't believe it is valid to ignore CJK characters above U+2.
If it is used for names, it will be stored in the database.
If the behaviour is different from characters below U+, you will
get a bug report in the meantime.
See CJK Extension B, C, and D from http://www.unicode.org/charts/
Al
I wrote:
> The answer, on a reasonably new desktop machine (2.0GHz Xeon E5503)
> running Fedora 16 in en_US.utf8 locale, is that 64K iterations of
> pg_wc_isalpha or sibling functions requires a shade under 2ms.
> So this definitely justifies caching the values to avoid computing
> them more than o
Robert Haas writes:
> On Fri, Feb 17, 2012 at 10:19 AM, Tom Lane wrote:
>> Before going much further with this, we should probably do some timings
>> of 64K calls of iswupper and friends, just to see how bad a dumb
>> implementation will be.
> Can't hurt.
The answer, on a reasonably new desktop
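
For anyone wanting to repeat this kind of measurement, here is a rough sketch of a timing harness. It is an assumption about how such a test could be written, not the test that was actually run: it calls the libc iswalpha() directly instead of pg_wc_isalpha, and count_alpha and time_pass_ms are hypothetical names. Unless the caller invokes setlocale(LC_CTYPE, "") first, the program runs in the "C" locale, which may classify far fewer characters as alphabetic.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <wctype.h>

/* Classify code points 0 .. n-1 and return how many are alphabetic. */
static size_t count_alpha(unsigned long n)
{
    size_t hits = 0;

    assert(n <= 0x110000UL);     /* stay within the Unicode code space */
    for (unsigned long cp = 0; cp < n; cp++)
        if (iswalpha((wint_t) cp))
            hits++;
    return hits;
}

/* CPU time in milliseconds for one pass over n code points. */
static double time_pass_ms(unsigned long n, size_t *hits_out)
{
    clock_t start = clock();
    size_t hits = count_alpha(n);
    clock_t end = clock();

    if (hits_out)
        *hits_out = hits;
    return 1000.0 * (double) (end - start) / CLOCKS_PER_SEC;
}
```

Calling time_pass_ms(65536, &hits) covers the 64K case discussed here. Since clock() can be coarse, repeating the pass many times and averaging will give a steadier figure than a single run.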
On Fri, Feb 17, 2012 at 10:19 AM, Tom Lane wrote:
>> What if you did this ONCE and wrote the results to a file someplace?
>
> That's still a cache, you've just defaulted on your obligation to think
> about what conditions require the cache to be flushed.
Yep. Unfortunately, I don't have a good i
Robert Haas writes:
> On Fri, Feb 17, 2012 at 3:48 AM, Heikki Linnakangas wrote:
>> Recompiling is expensive, but if you cache the results for the session, it
>> would probably be acceptable.
> What if you did this ONCE and wrote the results to a file someplace?
That's still a cache, you've j
On Fri, Feb 17, 2012 at 3:48 AM, Heikki Linnakangas wrote:
> Here's a wild idea: keep the class of each codepoint in a hash table.
> Initialize it with all codepoints up to 0x. After that, whenever a
> string contains a character that's not in the hash table yet, query the
> class of that char
On 02/17/2012 09:39 AM, Tom Lane wrote:
> Heikki Linnakangas writes:
>> Here's a wild idea: keep the class of each codepoint in a hash table.
>> Initialize it with all codepoints up to 0x. After that, whenever a
>> string contains a character that's not in the hash table yet, query the
>> class of that
Heikki Linnakangas writes:
> Here's a wild idea: keep the class of each codepoint in a hash table.
> Initialize it with all codepoints up to 0x. After that, whenever a
> string contains a character that's not in the hash table yet, query the
> class of that character, and add it to the hash
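
The hash-table idea quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread or from PostgreSQL: the class bits, ClassCache, cache_init, classify and cache_lookup are all hypothetical names, the table is fixed-size with linear probing, it is filled purely on demand (the pre-initialization step is left out), and wchar_t is assumed to hold a full code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wctype.h>

/* Hypothetical class bits, not PostgreSQL's actual representation. */
enum { CC_ALPHA = 1, CC_DIGIT = 2, CC_UPPER = 4, CC_LOWER = 8, CC_SPACE = 16 };

#define CACHE_SLOTS 4096u        /* power of two, fixed size for brevity */
#define SLOT_EMPTY  UINT32_MAX   /* never a valid code point */

typedef struct {
    uint32_t cp[CACHE_SLOTS];    /* key: code point, or SLOT_EMPTY */
    uint8_t  cls[CACHE_SLOTS];   /* value: class bits for that code point */
    size_t   used;               /* occupied slots */
    size_t   misses;             /* lookups that had to call libc */
} ClassCache;

static void cache_init(ClassCache *c)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        c->cp[i] = SLOT_EMPTY;
    memset(c->cls, 0, sizeof c->cls);
    c->used = 0;
    c->misses = 0;
}

/* Ask libc for the character's classes under the active LC_CTYPE locale. */
static uint8_t classify(uint32_t cp)
{
    wint_t wc = (wint_t) cp;
    uint8_t bits = 0;

    if (iswalpha(wc)) bits |= CC_ALPHA;
    if (iswdigit(wc)) bits |= CC_DIGIT;
    if (iswupper(wc)) bits |= CC_UPPER;
    if (iswlower(wc)) bits |= CC_LOWER;
    if (iswspace(wc)) bits |= CC_SPACE;
    return bits;
}

/* Return cached class bits, computing and storing them on first sight. */
static uint8_t cache_lookup(ClassCache *c, uint32_t cp)
{
    size_t i = (cp * 2654435761u) & (CACHE_SLOTS - 1);
    uint8_t bits;

    assert(cp != SLOT_EMPTY);
    while (c->cp[i] != SLOT_EMPTY) {
        if (c->cp[i] == cp)
            return c->cls[i];                 /* hit */
        i = (i + 1) & (CACHE_SLOTS - 1);      /* linear probe */
    }
    c->misses++;
    bits = classify(cp);
    if (c->used < CACHE_SLOTS / 2) {          /* keep probe chains short */
        c->cp[i] = cp;
        c->cls[i] = bits;
        c->used++;
    }
    return bits;
}
```

Once the table is half full, this sketch simply stops caching new entries instead of growing. As with any cache of this kind, the contents are only valid for the locale that was active when they were computed.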
On 16.02.2012 01:06, Tom Lane wrote:
> In bug #6457 it's pointed out that we *still* don't have full
> functionality for locale-dependent regexp behavior with UTF8 encoding.
> The reason is that there's old crufty code in regc_locale.c that only
> considers character codes up to 255 when searching for ch
In bug #6457 it's pointed out that we *still* don't have full
functionality for locale-dependent regexp behavior with UTF8 encoding.
The reason is that there's old crufty code in regc_locale.c that only
considers character codes up to 255 when searching for characters that
should be considered "let