Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread db
>> Try the sequence below. Then, try to dump and then reload the database.
>> When you try to reload it, you will get an error:
>>
>> ERROR:  invalid byte sequence for encoding "UTF8": 0xbd
>
> I know this could be a problem (like chr() with invalid byte pattern).

And that's enough of a problem already. We don't need more problems.

> What I really want to know is whether a read query like this:
>
> SELECT * FROM japanese_table ORDER BY convert(japanese_text using
> utf8_to_euc_jp);
>
> could be a problem (I assume we use C locale).

If convert() produces a sequence of bytes that can't be interpreted as a
string in the server encoding, then it's broken. IMHO convert() should
return a bytea value. If we had good encoding/charset support we could do
better, but we can't today.

The above example would work fine if convert() returned a bytea. In the C
locale the string would be compared byte for byte and that's what you get
with bytea values as well.
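
To illustrate the point about ordering, here is a minimal Python sketch (not
PostgreSQL code). The sample words are made up, since the thread does not show
the contents of japanese_table. It sorts by the EUC-JP bytes of each value,
which is the ordering a byte-for-byte comparison of a bytea result would give.

```python
# Hypothetical sample data; the real table contents are not shown in the thread.
words = ["日本語", "かな", "カナ", "漢字"]

# Roughly what convert(... using utf8_to_euc_jp) would hand back if it
# returned bytea: plain byte strings holding the EUC-JP encoding.
euc_keys = {w: w.encode("euc_jp") for w in words}

# Python compares bytes objects byte for byte as unsigned values, the same
# kind of comparison the C locale implies.
ordered = sorted(words, key=lambda w: euc_keys[w])

for w in ordered:
    print(w, euc_keys[w].hex())
```

The sort key never has to be a valid string in the server encoding; it is
only ever treated as bytes.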

Strings are not sequences of bytes that can be interpreted in different
ways. That's what bytea values are. Strings are in a specific encoding
always, and in pg that encoding is fixed to a single one for a whole
cluster at initdb time. We should not confuse text with bytea.
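
A small Python sketch of the text/bytea distinction, using the 0xbd byte from
the error message quoted above. Python's bytes and str types stand in for
bytea and text here; that mapping is an analogy of mine, not something from
the server code.

```python
raw = b"\xbd"  # the byte reported in the "invalid byte sequence" error

# Viewed as bytes (the bytea view) this is a perfectly good one-byte value.
assert len(raw) == 1

# Viewed as UTF-8 text it is invalid: 0xbd is a continuation byte that has
# no lead byte in front of it.
try:
    raw.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False

# The same byte is valid text in a different encoding, so bytes only become
# a string once the encoding is known.
as_latin1 = raw.decode("latin-1")

print(valid_as_utf8, repr(as_latin1))
```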

/Dennis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread db
>>> I think the concern is when they use only one slash, like:
>>>   E'\377\000\377'::bytea
>>> which, as I mentioned before, is not correct anyway.
>
> Wait, why would this be wrong? How would you enter the three-byte bytea
> consisting of those three bytes described above?

Either as

E'\\377\\000\\377'

or

'\377\000\377'
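
The two forms differ only in which layer consumes the backslashes. Below is a
simplified Python model of the two layers. Both functions are hypothetical
stand-ins written for this note, not the server's actual parsers, and the
second form assumes a string literal in which backslashes are not treated as
escapes.

```python
def unescape_e_string(body: str) -> str:
    # Simplified model of E'...' processing: only a doubled backslash is
    # handled, and it collapses to a single backslash.
    return body.replace("\\\\", "\\")


def bytea_in(text: str) -> bytes:
    # Simplified model of bytea escape-format input: a backslash followed by
    # three octal digits is one byte, a doubled backslash is one backslash.
    # Assumes every other character is plain ASCII.
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1:i + 2] == "\\":
            out.append(0x5C)
            i += 2
        else:
            digits = text[i + 1:i + 4]
            if len(digits) != 3 or any(d not in "01234567" for d in digits):
                raise ValueError("invalid escape in bytea input")
            out.append(int(digits, 8))
            i += 4
    return bytes(out)


# Body of E'\\377\\000\\377': the string layer removes one level of escaping.
via_e_string = bytea_in(unescape_e_string(r"\\377\\000\\377"))

# Body of '\377\000\377': passed through to bytea input unchanged.
via_plain = bytea_in(r"\377\000\377")

print(via_e_string, via_plain)
```

In this model both spellings reach bytea input as the same text, so both
produce the same three bytes.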

/Dennis





Re: [HACKERS] like/ilike improvements

2007-05-23 Thread db
> And Dennis said:
>
>> It is only when you have a pattern like '%_' when this is a problem
>> and we could detect this and do byte by byte when it's not. Now we
>> check (*p == '\\') || (*p == '_') in each iteration when we scan over
>> characters for '%', and we could do it once and have different loops
>> for the two cases.
>
> That's pretty much what the patch does now. It never tries to match a
> single byte when it sees "_", whether or not preceded by "%".

My comment was about UTF-8 since I thought we were making a special
version for UTF-8. I don't know what properties other multibyte encodings
have.
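
The UTF-8 property I had in mind: lead bytes and continuation bytes never
overlap, so a literal from a valid pattern cannot start matching in the middle
of a character. A rough Python sketch of the two-loop idea follows. It is not
the patch under discussion; the function names are made up, and both inputs
are assumed to be valid UTF-8.

```python
PCT, UND, ESC = ord("%"), ord("_"), ord("\\")


def utf8_char_len(lead: int) -> int:
    # Length of a UTF-8 character, derived from its lead byte.
    if lead < 0x80:
        return 1
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    if lead >= 0xC0:
        return 2
    raise ValueError("continuation byte where a lead byte was expected")


def like(text: bytes, pat: bytes) -> bool:
    t = p = 0
    while p < len(pat):
        c = pat[p]
        if c == PCT:
            while p < len(pat) and pat[p] == PCT:
                p += 1
            if p == len(pat):
                return True  # a trailing % matches whatever is left
            rest = pat[p:]
            if rest[0] == UND:
                # '%_' case: only character boundaries are candidates.
                while t < len(text):
                    if like(text[t:], rest):
                        return True
                    t += utf8_char_len(text[t])
                return False
            # A literal follows: stepping byte by byte is safe, because a
            # continuation byte never equals the literal's first byte.
            for start in range(t, len(text)):
                if like(text[start:], rest):
                    return True
            return False
        if t >= len(text):
            return False
        if c == UND:
            t += utf8_char_len(text[t])  # '_' consumes one whole character
            p += 1
            continue
        if c == ESC and p + 1 < len(pat):
            p += 1
            c = pat[p]  # escaped character is matched literally
        if text[t] != c:
            return False
        t += 1
        p += 1
    return t == len(text)


print(like("日本語".encode(), "日_語".encode()))
```

Whether the same trick is safe for other multibyte encodings depends on
whether they have a comparable lead/continuation split.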

/Dennis




Re: [HACKERS] Money type todos?

2007-03-21 Thread db
> Dennis Bjorklund <[EMAIL PROTECTED]> writes:

>> What is the reason to keep it?
>
> The words-of-one-syllable answer is that D'Arcy Cain is still willing
> to put work into supporting the money type, and if it still gets the
> job done for him then it probably gets the job done for some other
> people too.
>
> Personally, as a former currency trader I've not seen any proposals on
> this list for a "money" type that I'd consider 100% feature complete.
> The unit-identification part of it is interesting, but pales into
> insignificance compared to the problem that the unit values vary
> constantly

The unit (currency) part is what I don't like about the money type.

A fast, size-limited fixed-point type is something I think is good. It
could very well be called money if we want, or we can give it a more
neutral name.
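
As a rough picture of what I mean, here is a Python sketch of a fixed-point
value stored as a scaled 64-bit integer with no currency unit attached. The
class name, the two fractional digits and the int64 limit are assumptions made
for illustration, not a description of the existing money type.

```python
from dataclasses import dataclass

DIGITS = 2                # assumption: two fractional digits
SCALE = 10 ** DIGITS
INT64_MIN, INT64_MAX = -2 ** 63, 2 ** 63 - 1


@dataclass(frozen=True, order=True)
class Fixed:
    units: int  # the value multiplied by SCALE; must fit in a signed int64

    def __post_init__(self):
        if not INT64_MIN <= self.units <= INT64_MAX:
            raise OverflowError("value does not fit in 64 bits")

    @classmethod
    def parse(cls, s: str) -> "Fixed":
        # Extra fractional digits are truncated in this sketch, not rounded.
        sign = -1 if s.startswith("-") else 1
        whole, _, frac = s.lstrip("+-").partition(".")
        frac = (frac + "0" * DIGITS)[:DIGITS]
        return cls(sign * (int(whole or "0") * SCALE + int(frac)))

    def __add__(self, other: "Fixed") -> "Fixed":
        # Integer addition, so there is no binary floating-point error.
        return Fixed(self.units + other.units)

    def __str__(self) -> str:
        sign = "-" if self.units < 0 else ""
        whole, frac = divmod(abs(self.units), SCALE)
        return f"{sign}{whole}.{frac:0{DIGITS}d}"


total = Fixed.parse("0.10") + Fixed.parse("0.20")
print(total)
```

Arithmetic and comparison are plain integer operations, which is where the
speed and the fixed size come from.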

/Dennis

