I'm working on extending the case conversion methods for the programming language Ruby from the current ASCII only to cover all of Unicode.

Ruby comes with four methods for case conversion. Three of them, upcase, downcase, and capitalize, are quite clear. But we have hit a question for the fourth method, swapcase.

What swapcase does is swap upper and lower case, so that e.g.

'Unicode Standard'.swapcase => 'uNICODE sTANDARD'

I'm not sure myself where this method is actually used, but it also exists in Python (and maybe Ruby got it from there).


Now the question I have is: What should be done with titlecase characters? Several possibilities have already been floated:

a) Leave as is, because they are neither upper nor lower case.

b) Convert to upper (or lower), which may simplify implementation.

c) Decompose the character into upper and lower case components, and apply swapcase to these.


For example, 'ǅinsi' (jeans) would become 'ǅINSI' with a), 'ǄINSI' (or 'ǆINSI') with b), and 'dŽINSI' with c). For another example, 'ᾨδή' would become 'ᾨΔΉ' with a), 'ὨΙΔΉ' (or 'ᾠΔΉ') with b), and 'ὠΙΔΉ' with c).
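
To make the three options concrete, here is a rough sketch of how they could be compared side by side. It is not the proposed implementation: it assumes Ruby 2.4 or later (where upcase and downcase already cover Unicode), and the names swap_plain, swapcase_with, and TITLECASE_SWAP, as well as the small hand-written table for c), are made up for illustration only.

```ruby
# Sketch only: assumes Ruby 2.4 or later, where upcase/downcase cover Unicode.

# Hypothetical, partial table for option c): each titlecase character is
# mapped to its components with the case of each component swapped.
TITLECASE_SWAP = {
  "\u01C5" => "d\u017D",      # titlecase Dz-with-caron digraph -> d + Z-with-caron
  "\u01C8" => "lJ",           # titlecase Lj digraph -> l + J
  "\u01CB" => "nJ",           # titlecase Nj digraph -> n + J
  "\u01F2" => "dZ",           # titlecase Dz digraph -> d + Z
  "\u1FA8" => "\u1F60\u0399"  # omega with psili and prosgegrammeni -> small omega with psili + capital iota
}.freeze

# Swap the case of a single character that is not titlecase.
def swap_plain(ch)
  if ch.match?(/\p{Lu}/)
    ch.downcase
  elsif ch.match?(/\p{Ll}/)
    ch.upcase
  else
    ch
  end
end

# policy selects the treatment of titlecase characters:
#   :keep (option a), :upper or :lower (option b), :decompose (option c).
def swapcase_with(str, policy)
  str.each_char.map do |ch|
    next swap_plain(ch) unless ch.match?(/\p{Lt}/)

    case policy
    when :keep      then ch
    when :upper     then ch.upcase
    when :lower     then ch.downcase
    when :decompose then TITLECASE_SWAP.fetch(ch, ch) # unknown entries are left as is
    else raise ArgumentError, "unknown policy: #{policy}"
    end
  end.join
end
```

With this, swapcase_with("\u01C5insi", :decompose) corresponds to option c) for the first example above, and :keep, :upper, and :lower give the a) and b) variants. A real implementation of c) would need an entry for every titlecase character, not just the five listed here.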

It looks like Python 3 (3.4.3 in my case) is doing a). My guess is that from a user expectation point of view, c) is best, so I'm leaning towards c). There is no existing data from the Unicode Standard for this, but it seems pretty straightforward.

But before I just implement something, I'd appreciate additional input, in particular from users closer to the affected language communities.

Regards,   Martin.
