Currently it is not possible to use Unicode codepoints > 0xFF on the console, because our UTF-8 decoding logic is badly broken.
The code in question is in wsemul_subr.c, wsemul_getchar(). The problem is that we calculate the number of bytes in a multi-byte sequence by just looking at the high bits in turn:

	if (frag & 0x20) {
		frag &= ~0x20;
		mbleft++;
	}
	if (frag & 0x10) {
		frag &= ~0x10;
		mbleft++;
	}
	if (frag & 0x08) {
		frag &= ~0x08;
		mbleft++;
	}
	if (frag & 0x04) {
		frag &= ~0x04;
		mbleft++;
	}

This is wrong, for several reasons.

Firstly, since about 20 years ago, the maximum number of bytes in a UTF-8 sequence has been four, so we shouldn't be checking 0x08 and 0x04 (or rather, we should only check that 0x08 is 0 when 0x10 is 1, to indicate a four-byte sequence).

Secondly, the check for 0x10 should only be performed when 0x20 is also set.

By chance, the current logic successfully decodes UTF-8 encodings of Unicode codepoints 0x80 - 0xFF, because these don't touch bits 2-4 of the first byte (their lead byte is always 0xC2 or 0xC3). However, to use console fonts with more than 256 characters we need this fixed.

I created a font with an extra glyph at position 0x100, and was able to use it once I had applied the attached patch.

The UTF-8 decoder still needs more work to reject invalid sequences such as overlong encodings and the UTF-16 surrogates. But it would be nice to get at least this fix in, as it is trivial and allows further experimentation with UTF-8 on the console using fonts with more than 256 glyphs.

I'll do a more detailed write-up about this at some point, but since I started posting the console patches I've already had questions off-list about "why OpenBSD doesn't support more than 256 characters in a font", so I thought it would be good to get this patch out there.
--- wsemul_subr.c.dist	Fri Oct 18 19:06:41 2013
+++ wsemul_subr.c	Sat Feb 25 13:58:00 2023
@@ -125,20 +125,11 @@
 			if (frag & 0x20) {
 				frag &= ~0x20;
 				mbleft++;
+				if (frag & 0x10) {
+					frag &= ~0x10;
+					mbleft++;
+				}
 			}
-			if (frag & 0x10) {
-				frag &= ~0x10;
-				mbleft++;
-			}
-			if (frag & 0x08) {
-				frag &= ~0x08;
-				mbleft++;
-			}
-			if (frag & 0x04) {
-				frag &= ~0x04;
-				mbleft++;
-			}
-
 			tmpchar = frag;
 		}
 	}