On Fri, Nov 25, 2005 at 09:58:21AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > ... There is a fair
> > bit of encoding code in there already but the information you need here
> > is specifically: is this char a control character and in particular, is
> > it a newline.
>
> The appropriate test for that is just "ch == '\n'", in all the encodings
> we support (although you do have to be careful that ch isn't a non-first
> byte of a multibyte character). If that's all you need then inventing
> a bunch of additional infrastructure is really inappropriate.

Well, not quite. What about \r (on Windows in particular) or \t or
Unicode 0xFEFF (zero width non-breaking space)? For Latin-1 and other
locales based upon ASCII I use (ch < 32) for control characters, which is
not entirely correct but close. For UTF-8 I use the function
ucs_wcwidth(), which already existed in psql. It already knows what the
control characters are, though I have no idea how up-to-date its table
is.

The reason I need to know whether a character is a control character or
not is so it can be printed as a \uNNNN string. Otherwise any of those
characters will destroy the alignment on the screen, bringing us right
back to what psql does now: splatter across the screen when there are
control characters in the output.

I didn't add any new information to psql, I just took advantage of what
was already there. The comment about using PQdsplen() is good, but the
routines in libpq are totally inadequate; the utf8 function has a huge
FIXME next to it.

Would people prefer a patch that brought the libpq routines up to
scratch and have mbprint use that?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
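
For illustration only, here is a minimal sketch of the single-byte case
described above: any byte below 32 is written out as a \uNNNN escape so
it cannot disturb the column alignment. This is not the code from the
patch. The function name escape_control_chars(), its signature, and the
choice of uppercase hex digits are all assumptions made for the example.
The UTF-8 path, which relies on ucs_wcwidth(), is not shown.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not taken from psql): copy a NUL-terminated string
 * in an ASCII-based single-byte encoding such as Latin-1 into out,
 * replacing every byte below 32 with a six-character "\uNNNN" escape.
 *
 * Returns the number of bytes written, not counting the terminating NUL,
 * or -1 if out (of size outsz) is too small.
 */
static int
escape_control_chars(const char *in, char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;              /* no room even for the terminating NUL */

    for (; *in != '\0'; in++)
    {
        unsigned char ch = (unsigned char) *in;
        size_t need = (ch < 32) ? 6 : 1;

        /* keep one byte in reserve for the terminating NUL */
        if (used + need + 1 > outsz)
            return -1;

        if (ch < 32)
            /* writes 6 characters plus a NUL that is overwritten later */
            snprintf(out + used, need + 1, "\\u%04X", (unsigned int) ch);
        else
            out[used] = (char) ch;

        used += need;
    }

    out[used] = '\0';
    return (int) used;
}
```

With this sketch, the input "a\tb\r\n" (tab, carriage return, newline)
comes out as a\u0009b\u000D\u000A, which occupies a predictable 20
columns instead of breaking the line or jumping to a tab stop.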