Re: Uppercase version of ß desired

2023-03-14 Thread Celia McInnis
Thanks Philip. Certainly interesting. As long as PostgreSQL and Python
return something different for the upper and lower case versions of these
letters, and Python indexing of the string picks out the characters
"properly", then it might not require a Python fix for me.

What a can of worms!  But at least PostgreSQL and Python do far better with
Unicode than MySQL and Perl did! :-)

Celia McInnis

On Tue, Mar 14, 2023 at 9:12 AM Philip Semanchuk <
phi...@americanefficient.com> wrote:

>
>
> > On Mar 13, 2023, at 5:38 PM, Celia McInnis 
> wrote:
> >
> > HI:
> >
> > I would be really happy if postgresql  had an upper case version of the
> ß german character. The wiki page
> > https://en.wikipedia.org/wiki/%C3%9F
> >
> > indicates that the capital (U+1E9E ẞ LATIN CAPITAL LETTER SHARP S) was
> encoded by ISO 10646 in 2008.
> >
> > BTW the reason that I'd like upper('ß') to give something different than
> 'ß'  is because I have written a simple substitution puzzle for a large
> number of languages where I show the encrypted lower case words in upper
> case and the successful letter substitution submissions in lower case - so
> I need the upper and lower case versions of each letter to be different!
> >
> > Thanks for any assistance! Maybe I can hack what I want in python (which
> is what I am using for the puzzle).
>
> Hi Celia,
> I ran into this too back when we were transitioning from Python 2 to 3 (2
> behaved differently from 3). While researching it I discovered this Python
> issue which maybe sheds some additional light on the subject:
> https://github.com/python/cpython/issues/74993
>
> We ultimately found 90 characters that (under Python 3) grew longer when
> uppercased.
>
> python -c "print([c for c in range(0x80, 0x22ff) if len(chr(c)) != len(chr(c).upper())])"
>
>
> I hope this is at least interesting. :-)
>
> Cheers
> Philip
>
>
>


Re: Uppercase version of ß desired

2023-03-14 Thread Kip Cole
The relevant Unicode reference is 
https://unicode.org/faq/casemap_charprop.html#11

It basically says that since Unicode 5.0 (it's now at Unicode 15.0) case-mapping
stability is guaranteed, so upper-casing ß to U+1E9E ẞ LATIN CAPITAL LETTER SHARP S
is optional.
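
To see what that means in practice, here is a minimal Python sketch. Python is
an assumption carried over from Celia's puzzle code, the constant names are
made up for illustration, and only the standard unicodedata module is used. It
checks that U+1E9E is an encoded character that lowercases to ß, while the
default uppercase mapping of ß does not lead back to it:

```python
import unicodedata

CAPITAL_SHARP_S = "\u1e9e"  # ẞ
SMALL_SHARP_S = "\u00df"    # ß

# The capital letter is a named, encoded character.
print(unicodedata.name(CAPITAL_SHARP_S))

# Lowercasing the capital form gives the ordinary ß ...
print(CAPITAL_SHARP_S.lower() == SMALL_SHARP_S)

# ... but the default uppercase mapping of ß does not give ẞ back.
print(SMALL_SHARP_S.upper() == CAPITAL_SHARP_S)
```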

> On 14 Mar 2023, at 9:12 pm, Philip Semanchuk  
> wrote:
> 
> 
> 
>> On Mar 13, 2023, at 5:38 PM, Celia McInnis  wrote:
>> 
>> HI:
>> 
>> I would be really happy if postgresql  had an upper case version of the ß 
>> german character. The wiki page 
>> https://en.wikipedia.org/wiki/%C3%9F
>> 
>> indicates that the capital (U+1E9E ẞ LATIN CAPITAL LETTER SHARP S) was 
>> encoded by ISO 10646 in 2008.
>> 
>> BTW the reason that I'd like upper('ß') to give something different than 'ß' 
>>  is because I have written a simple substitution puzzle for a large number 
>> of languages where I show the encrypted lower case words in upper case and 
>> the successful letter substitution submissions in lower case - so I need the 
>> upper and lower case versions of each letter to be different!
>> 
>> Thanks for any assistance! Maybe I can hack what I want in python (which is 
>> what I am using for the puzzle).
> 
> Hi Celia,
> I ran into this too back when we were transitioning from Python 2 to 3 (2 
> behaved differently from 3). While researching it I discovered this Python 
> issue which maybe sheds some additional light on the subject: 
> https://github.com/python/cpython/issues/74993 
> 
> 
> We ultimately found 90 characters that (under Python 3) grew longer when 
> uppercased. 
> 
> python -c "print([c for c in range(0x80, 0x22ff) if len(chr(c)) != len(chr(c).upper())])"
> 
> 
> I hope this is at least interesting. :-)
> 
> Cheers
> Philip



Re: Uppercase version of ß desired

2023-03-14 Thread Philip Semanchuk



> On Mar 13, 2023, at 5:38 PM, Celia McInnis  wrote:
> 
> HI:
> 
> I would be really happy if postgresql  had an upper case version of the ß 
> german character. The wiki page 
> https://en.wikipedia.org/wiki/%C3%9F
> 
> indicates that the capital (U+1E9E ẞ LATIN CAPITAL LETTER SHARP S) was 
> encoded by ISO 10646 in 2008.
> 
> BTW the reason that I'd like upper('ß') to give something different than 'ß'  
> is because I have written a simple substitution puzzle for a large number of 
> languages where I show the encrypted lower case words in upper case and the 
> successful letter substitution submissions in lower case - so I need the 
> upper and lower case versions of each letter to be different!
> 
> Thanks for any assistance! Maybe I can hack what I want in python (which is 
> what I am using for the puzzle).

Hi Celia,
I ran into this too back when we were transitioning from Python 2 to 3 (2 
behaved differently from 3). While researching it I discovered this Python 
issue which maybe sheds some additional light on the subject: 
https://github.com/python/cpython/issues/74993

We ultimately found 90 characters that (under Python 3) grew longer when 
uppercased. 

python -c "print([c for c in range(0x80, 0x22ff) if len(chr(c)) != len(chr(c).upper())])"
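
The one-liner above can also be written as a small script that shows what each
character turns into. This is only a sketch: the range bounds are taken from
the command above, the function name grows_when_uppercased is made up for
illustration, and the exact count may depend on the Unicode data bundled with
the Python in use, so no count is asserted here.

```python
def grows_when_uppercased(start=0x80, stop=0x22FF):
    """Return (char, uppercased) pairs whose uppercase form differs in length."""
    found = []
    for code_point in range(start, stop):
        ch = chr(code_point)
        upper = ch.upper()
        # Same test as the one-liner: the length changes when uppercasing.
        if len(ch) != len(upper):
            found.append((ch, upper))
    return found


if __name__ == "__main__":
    for ch, upper in grows_when_uppercased():
        print(f"U+{ord(ch):04X} {ch!r} -> {upper!r}")
```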


I hope this is at least interesting. :-)

Cheers
Philip






Re: Uppercase version of ß desired

2023-03-14 Thread Thorsten Glaser
On Tue, 14 Mar 2023, Celia McInnis wrote:

>uc_alphabet = lc_alphabet.replace('ß', 'ẞ').upper()

That’s probably for the best. The uppercase Eszett was only added
to Unicode on the condition that the lowercase Eszett’s case rules
stay unchanged, and the uppercase form is normally expected to be
typed manually only.

Of course, the grammar rules about uppercasing ß have since changed,
but since there are two valid ways, choosing is the application’s duty.
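
A sketch of what that choice could look like in Python. The function name
upper_german and the capital_eszett flag are hypothetical; the only library
behaviour relied on is that str.upper() maps ß to SS and leaves ẞ (U+1E9E)
alone.

```python
def upper_german(text, capital_eszett=False):
    """Uppercase text, letting the caller choose how ß is handled.

    capital_eszett=False: ß becomes SS (what str.upper() does by default)
    capital_eszett=True:  ß becomes ẞ (U+1E9E), substituted before uppercasing
    """
    if capital_eszett:
        text = text.replace("\u00df", "\u1e9e")
    return text.upper()


print(upper_german("straße"))
print(upper_german("straße", capital_eszett=True))
```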

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)




Re: Uppercase version of ß desired

2023-03-14 Thread Celia McInnis
Hmmm. Yes, the Unicode rules seem to be a little strict on conforming to
the past! I just made the following fix to my Python code in forming the
upper case alphabet from the lower case one:

uc_alphabet = lc_alphabet.replace('ß', 'ẞ').upper()

So far, German is the only language I have found with a lower case letter
whose upper-cased form is the same as the letter itself.
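
A slightly more defensive variant of the same idea, as a sketch only. The names
OVERRIDES and upper_alphabet are hypothetical, and the override table holds
just the one substitution mentioned in this thread. It raises an error for any
letter that lacks a distinct one-character upper case form, so a new language
with a similar problem is noticed early:

```python
# Per-letter substitutions applied instead of str.upper().
OVERRIDES = {"\u00df": "\u1e9e"}  # ß -> ẞ


def upper_alphabet(lc_alphabet):
    """Map each lower case letter to a distinct, single upper case character."""
    result = []
    for ch in lc_alphabet:
        up = OVERRIDES.get(ch, ch.upper())
        # The puzzle needs exactly one character, different from the input.
        if len(up) != 1 or up == ch:
            raise ValueError(f"no distinct one-character upper case for {ch!r}")
        result.append(up)
    return "".join(result)


print(upper_alphabet("abcdefghijklmnopqrstuvwxyzäöüß"))
```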

Thanks,
Celia McInnis


On Mon, Mar 13, 2023 at 6:54 PM Tom Lane  wrote:

> "Peter J. Holzer"  writes:
> > On 2023-03-13 17:38:51 -0400, Celia McInnis wrote:
> >> I would be really happy if postgresql had an upper case version of the ß
> >> german character.
>
> > But the 'ß' is a bit special as it is usually uppercased to 'SS'
> > (although 'ẞ' is now officially allowed, too).
> > Apparently your (and my) locale doesn't uppercase ß at all, which isn't
> > correct according to German spelling rules but was very common in the
> > last decades.
>
> Our code for libc locales doesn't support upcasing 'ß' to 'SS',
> because it uses towupper() which can only manage
> one-character-to-one-character transformations.  It should work for
> upcasing to 'ẞ', but as you say, you need to find a locale that thinks
> that should happen.
>
> You might have better luck if you have a version of Postgres that
> supports ICU and you can use an ICU locale.  That code path doesn't
> appear to have any hard-wired assumption about how many characters
> in convert to how many out.
>
> regards, tom lane
>


Re: Uppercase version of ß desired

2023-03-13 Thread Tom Lane
"Peter J. Holzer"  writes:
> On 2023-03-13 17:38:51 -0400, Celia McInnis wrote:
>> I would be really happy if postgresql had an upper case version of the ß
>> german character.

> But the 'ß' is a bit special as it is usually uppercased to 'SS'
> (although 'ẞ' is now officially allowed, too).
> Apparently your (and my) locale doesn't uppercase ß at all, which isn't
> correct according to German spelling rules but was very common in the
> last decades.

Our code for libc locales doesn't support upcasing 'ß' to 'SS',
because it uses towupper() which can only manage
one-character-to-one-character transformations.  It should work for
upcasing to 'ẞ', but as you say, you need to find a locale that thinks
that should happen.
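
To picture the limitation: a one-character-to-one-character function returns
exactly one character per input character, so 'SS' is out of reach. A rough
Python analogy follows. It is not PostgreSQL's code, and upper_one_to_one is a
made-up name. Leaving a character unchanged when its full upper case form is
longer than one character is an assumption, chosen because it matches the
upper('ß') result reported earlier in the thread.

```python
def upper_one_to_one(text):
    """Uppercase character by character, never changing the length."""
    out = []
    for ch in text:
        up = ch.upper()
        # Only accept mappings that produce exactly one character.
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


print(upper_one_to_one("straße"))  # ß is left as it is
print("straße".upper())            # full mapping: ß becomes SS
```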

You might have better luck if you have a version of Postgres that
supports ICU and you can use an ICU locale.  That code path doesn't
appear to have any hard-wired assumption about how many characters
in convert to how many out.

regards, tom lane




Re: Uppercase version of ß desired

2023-03-13 Thread Peter J. Holzer
On 2023-03-13 17:38:51 -0400, Celia McInnis wrote:
> I would be really happy if postgresql  had an upper case version of the ß
> german character. The wiki page
> https://en.wikipedia.org/wiki/%C3%9F
> 
> indicates that the capital (U+1E9E ẞ LATIN CAPITAL LETTER SHARP S) was encoded
> by ISO 10646 in 2008.

The character is there, of course, and lower-casing it works as
expected:

hjp=> select 'ẞ', lower('ẞ');
╔══════════╤═══════╗
║ ?column? │ lower ║
╟──────────┼───────╢
║ ẞ        │ ß     ║
╚══════════╧═══════╝
(1 row)

But the 'ß' is a bit special as it is usually uppercased to 'SS'
(although 'ẞ' is now officially allowed, too).

Apparently your (and my) locale doesn't uppercase ß at all, which isn't
correct according to German spelling rules but was very common in the
last decades.

You can specify an alternate locale:

hjp=> select upper('ß');
╔═══════╗
║ upper ║
╟───────╢
║ ß     ║
╚═══════╝
(1 row)


hjp=> select upper('ß' collate "de-AT-x-icu");
╔═══════╗
║ upper ║
╟───────╢
║ SS    ║
╚═══════╝
(1 row)


The challenge now is to find a locale which uppercases ß to ẞ.

hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




Uppercase version of ß desired

2023-03-13 Thread Celia McInnis
Hi:

I would be really happy if PostgreSQL had an upper case version of the
German ß character. The wiki page
https://en.wikipedia.org/wiki/%C3%9F

indicates that the capital (U+1E9E ẞ LATIN CAPITAL LETTER SHARP S) was
encoded by ISO 10646 in 2008.

BTW the reason that I'd like upper('ß') to give something different than
'ß'  is because I have written a simple substitution puzzle for a large
number of languages where I show the encrypted lower case words in upper
case and the successful letter substitution submissions in lower case - so
I need the upper and lower case versions of each letter to be different!

Thanks for any assistance! Maybe I can hack what I want in Python (which is
what I am using for the puzzle).

Celia McInnis