On Fri, Nov 25, 2005 at 09:58:21AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > ... There is a fair
> > bit of encoding code in there already but the information you need here
> > is specifically: is this char a control character and in particular, is
> > it a newline.
> 
> The appropriate test for that is just "ch == '\n'", in all the encodings
> we support (although you do have to be careful that ch isn't a non-first
> byte of a multibyte character).  If that's all you need then inventing
> a bunch of additional infrastructure is really inappropriate.

Well, not quite. What about \r (on Windows in particular), \t, or
Unicode U+FEFF (zero width no-break space)? For Latin-1 and other
ASCII-based encodings I use (ch < 32) to detect control characters,
which is not entirely correct but close.
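
To make the single-byte test concrete, here is a minimal sketch. The
function name is_control_sb() is made up for illustration and is not
taken from the patch. It also shows why the test is only "close": DEL
(0x7F) and the Latin-1 C1 range (0x80-0x9F) slip through.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): control-character test for
 * ASCII-based single-byte encodings such as Latin-1.
 *
 * Catches \n, \r, \t and the other C0 controls.  It deliberately mirrors
 * the (ch < 32) approximation, so DEL (0x7F) and the Latin-1 C1 controls
 * (0x80-0x9F) are NOT reported as control characters.
 */
static bool
is_control_sb(unsigned char ch)
{
    return ch < 32;
}
```

Taking an unsigned char avoids surprises on platforms where plain char
is signed and bytes above 0x7F would otherwise compare as negative.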

For UTF-8 I use the function ucs_wcwidth() which already existed in
psql. It already knows what the control characters are, though I have
no idea how up-to-date its table is.

The reason I need to know whether a character is a control character
is so that it can be printed as a \uNNNN string. Otherwise any of those
characters will destroy the alignment on the screen, bringing us right
back to what psql does now: output splattered across the screen
whenever it contains control characters.
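
Roughly, the escaping step could look like the sketch below. Both
needs_escape() and escape_codepoint() are hypothetical names, and the
predicate is a stand-in for whatever the width table reports; it is
not the actual mbprint code. The point is that an escaped character
has a known, fixed width of six columns, so alignment can be computed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical predicate: which code points get escaped.  Assumption for
 * this sketch only: C0 controls plus U+FEFF.  A real implementation would
 * consult the character width table instead.
 */
static bool
needs_escape(unsigned int cp)
{
    return cp < 32 || cp == 0xFEFF;
}

/*
 * Hypothetical helper: if cp needs escaping, write it to buf as \uNNNN
 * (NUL-terminated) and return the number of screen columns used (6 for
 * the code points accepted above).  Returns 0 if cp should be printed
 * unchanged, or -1 if buf is too small.
 */
static int
escape_codepoint(unsigned int cp, char *buf, size_t buflen)
{
    int n;

    if (!needs_escape(cp))
        return 0;

    n = snprintf(buf, buflen, "\\u%04X", cp);
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return n;
}
```

With this, a newline is written as the six characters \u000A and
U+FEFF as \uFEFF, and the caller adds 6 to the column width instead of
guessing what the terminal will do.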

I didn't add any new information to psql; I just took advantage of what
was already there. The suggestion to use PQdsplen() is good, but the
routines in libpq are totally inadequate: the UTF-8 function has a huge
FIXME next to it.

Would people prefer a patch that brings the libpq routines up to
scratch and has mbprint use them?

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
