Re: Swapcase for Titlecase characters

Marcel Schneider Sat, 19 Mar 2016 09:47:14 -0700

On Sat Mar 19, 2016 12:54:51, Martin J. Dürst  wrote:

> On 2016/03/19 04:33, Marcel Schneider wrote:
> > On Fri, Mar 18, 2016, 08:43:56, Martin J. Dürst wrote:
> 
> >> b) Convert to upper (or lower), which may simplify implementation.
> 
> >> For example, 'ǅinsi' (jeans) would become 'ǅINSI' with a), 'ǄINSI' (or
> >> 'ǆinsi') with b), and 'dŽINSI' with c). For another example, 'ᾨδή' would
> >> become 'ᾨΔΉ' with a), 'ὨΙΔΉ' (or 'ᾠΔΉ') with b), and 'ὠΙΔΉ' with c).
> 
> > Looking at your examples, I would add a case that typically occurs for 
> > swapcase to be applied:
> 
> > ‘ᾠΔΉ’ (cited [erroneously] as a result of option b) that is to be converted 
> > to ‘ᾨδή’, and ‘ǆINSI’, that is to become ‘ǅinsi’.
> 
> First, what do you mean with "erroneously"?


The bracketed word was only meant to acknowledge that when ‘ᾨδή’ is converted 
to lower case, as assumed in option “b-lower”, it becomes ‘ᾠδή’. ‘ᾠΔΉ’, on the 
other hand, is a typical candidate for swapcase, so I could reuse it “as is” to 
illustrate the fourth case.
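The case forms involved can be checked with a short Python sketch. It assumes 
that Python's built-in str methods apply the full Unicode case mappings (as 
CPython 3 does); the code points are written as escapes so that the forms stay 
unambiguous, and the variable names are only illustrative.

```python
# Case forms of the example word, written with escapes.
# U+1FA8 is the titlecase omega with psili and prosgegrammeni.
word = "\u1fa8\u03b4\u03ae"      # capitalized form of the example word

lowered = word.lower()           # option "b-lower": U+1FA0 U+03B4 U+03AE
uppered = word.upper()           # option "b-upper": U+1F68 U+0399 U+0394 U+0389

# A typical swapcase candidate: lowercase initial, uppercase body.
candidate = "\u1fa0" + word[1:].upper()

print(lowered, uppered, candidate)
```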

> 
> Second, did I get this right that your additional case (let's call it 
> d)) would cycle through the three options where available:
> lower -> title -> upper -> lower.

I’m afraid that swapcase as I see it is not a roundtrip method, which gave me 
some awkward moments today when I thought about how to implement it. As far as 
I can see, there are two pairs:

I: lowercase → titlecase (needed to correct the initials where the user pressed 
the shift modifier)
II: uppercase → lowercase (needed to correct the body of the words input while 
caps lock was on)

That typically matches what happens when caps lock is accidentally on and the 
user writes normally―on a keyboard that includes digraphs and uses the SGCaps 
feature for them, like this:

Modifier        None     Shift
CapsLock off    Lower    Title
CapsLock on     Upper    Lower
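The two pairs can be sketched in Python as follows. This is only a minimal 
illustration, not what any word processor actually does: the function name 
fix_capslock is hypothetical, and the sketch assumes that Python's str.title() 
and str.lower() apply the Unicode titlecase and lowercase mappings to a single 
character.

```python
import unicodedata

def fix_capslock(word: str) -> str:
    """Undo input typed with the wrong caps lock state (hypothetical helper).

    Pair I:  lowercase letters become titlecase.
    Pair II: uppercase letters become lowercase.
    Anything else (including titlecase letters) is left unchanged.
    """
    out = []
    for ch in word:
        category = unicodedata.category(ch)
        if category == "Ll":
            out.append(ch.title())   # pair I
        elif category == "Lu":
            out.append(ch.lower())   # pair II
        else:
            out.append(ch)
    return "".join(out)

# U+01C6 (lowercase digraph) + "INSI" is corrected to U+01C5 + "insi".
corrected = fix_capslock("\u01c6INSI")

# For comparison, the built-in str.swapcase() maps lowercase to uppercase,
# so the initial digraph becomes U+01C4 instead of the titlecase U+01C5.
builtin = "\u01c6INSI".swapcase()

print(corrected, builtin)
```

For ordinary letters the titlecase and uppercase mappings coincide, so the 
sketch behaves like a plain swapcase there.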

Correcting keyboard input done with the wrong caps lock state is the only 
situation I can see where swapcase is needed and thus is likely to be used. 
This is why the swapcase method is implemented in word processors, as a part of 
an optional autocorrect feature that neutralizes the effect of starting a 
sentence normally while caps lock is on: After completing the input of an 
uppercase word with an initial lowercase letter, the word is automatically 
swapcased and caps lock is turned off.

However now that I tested it with the digraph of the examples (input through 
the composer of the keyboard layout), it doesnʼt work at all in one word 
processor, while in another one it works but uppercases the initial lowercase 
digraph instead of titlecasing it. [That may be considered an effect of 
“streamlined” implementations that drop the less frequent cases.]


I donʼt believe that it would be useful to make swapcase a roundtrip method, 
and anyway it would be weird because of the letters with three case forms. The 
case conversion cycle you outline above usually applies to whole words (and 
doesnʼt work correctly in either of the two tested word processors when an 
initial Ǳ digraph is present). Most letters, however, have identical values 
for Titlecase_Mapping and Uppercase_Mapping, and usually there is no means to 
flag them with a “Titlecase_State”. This might be one more reason why current 
implementations of swapcase donʼt match the expected behavior for digraphs.
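A small Python sketch makes the difference between the two kinds of letters 
visible. It assumes Python's built-in case methods and the unicodedata module; 
the helper names case_forms and is_titlecase_letter are hypothetical.

```python
import unicodedata

def case_forms(ch: str) -> tuple:
    """Return (lower, title, upper) using Python's built-in mappings."""
    return ch.lower(), ch.title(), ch.upper()

def is_titlecase_letter(ch: str) -> bool:
    # General_Category Lt marks the few dedicated titlecase letters.
    return unicodedata.category(ch) == "Lt"

# An ordinary letter, a lowercase digraph, and a Greek letter with iota.
for ch in ("a", "\u01c6", "\u1fa0"):
    lower, title, upper = case_forms(ch)
    kind = "distinct titlecase" if title != upper else "title == upper"
    print(f"U+{ord(ch):04X}: {kind}")
```

Only a character that already is a titlecase letter can be recognized as such 
(category Lt); for a letter like "A" nothing in the character itself tells 
whether it stands for an uppercase or a titlecase form.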


> 
> > As about decomposing digraphs and ypogegrammeni to apply swapcase: That 
> > probably would be doing no good,
> > as itʼs unnecessary and users wonʼt expect it.
> 
> Why do you say "users won't expect it"? For those users not aware of the 
> encoding internals, I'd indeed guess that's what users would expect, at 
> least in the Croatian case.

That depends on the expected result. If the swapcase method is meant to 
correct inverted casing, users wouldnʼt like to see the digraphs decomposed, 
all the less so since in the languages concerned the Ǳ digraph is part of the 
alphabet, between ‘D’ and ‘Đ’, so users are well aware of it.

> For Greek, it may be different; it depends 
> on the extent to which the iota is seen as a letter vs. seen as a mark.

Here again the user inputs a precomposed letter with iota subscript, because he 
just wants a capitalized word, not an uppercase one. And here again the 
autocorrect doesnʼt work in one word processor, while the other one uppercases 
the letter with an uppercase iota adscript (while the rest of the word is 
lowercase) instead of capitalizing it with a lowercase iota adscript or an iota 
subscript, depending on conventions and preferences.

Letʼs take that as proof of how hard it is to implement swapcase with digraph 
support.

I canʼt conclude this reply better than with Asmus Freytagʼs words of Fri, 1st 
Jan 2016 12:09:13 -0800: [1]

> Unicode aims to be expressive enough to model all plain text. That means, it 
> inherits the non-reducible complexity of text. Even the insight that the 
> complexity is non-reducible would be a big step forward.

Regards,

Marcel

[1] Re: Unicode in the Curriculum? from Asmus Freytag (t) on 2016-01-01. 
http://www.unicode.org/mail-arch/unicode-ml/y2016-m01/0001.html
