Re: [PATCHES] [PATCH] Formatting patch for psql
On Fri, Nov 25, 2005 at 11:08:45AM -0500, Tom Lane wrote: > Martijn van Oosterhout writes: > > Would people prefer a patch that brought the libpq routines up to > > scratch and have mbprint use that? > > I would. ACK. Will look into it. -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgp3I4wyAIv9z.pgp Description: PGP signature
Re: [PATCHES] [PATCH] Formatting patch for psql
Martijn van Oosterhout writes: > Would people prefer a patch that brought the libpq routines up to > scratch and have mbprint use that? I would. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] [PATCH] Formatting patch for psql
On Fri, Nov 25, 2005 at 09:58:21AM -0500, Tom Lane wrote: > Martijn van Oosterhout writes: > > ... There is a fair > > bit of encoding code in there already but the information you need here > > is specifically: is this char a control character and in particular, is > > it a newline. > > The appropriate test for that is just "ch == '\n'", in all the encodings > we support (although you do have to be careful that ch isn't a non-first > byte of a multibyte character). If that's all you need then inventing > a bunch of additional infrastructure is really inappropriate. Well, not quite. What about \r (on windows in particular) or \t or Unicode 0xFEFF (zero width non-breaking space)? For Latin-1 and other locales based upon ASCII I use (ch < 32) for control characters, not entirely correct but close. For UTF-8 I use the function ucs_wcwidth() which already existed in psql. It already knows what the control characters are, though I have no idea how up-to-date its table is. The reason I need to know whether a character is a control character or not is so it can be printed as a \u string. Otherwise any of those characters will destroy the alignment on the screen bringing us right back to what psql does now: splatter across the screen when there are control characters in the output. I didn't add any new information to psql, I just took advantage of what was already there. The comment about using PQdsplen() is good, but the routines in libpq are totally inadequate, the utf8 function has a huge FIXME next to it. Would people prefer a patch that brought the libpq routines up to scratch and have mbprint use that? Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgpCcDuvC8cyG.pgp Description: PGP signature
Re: [PATCHES] [PATCH] Formatting patch for psql
Martijn van Oosterhout writes: > ... There is a fair > bit of encoding code in there already but the information you need here > is specifically: is this char a control character and in particular, is > it a newline. The appropriate test for that is just "ch == '\n'", in all the encodings we support (although you do have to be careful that ch isn't a non-first byte of a multibyte character). If that's all you need then inventing a bunch of additional infrastructure is really inappropriate. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [PATCH] Formatting patch for psql
On Thu, Nov 24, 2005 at 09:23:55PM -0500, Tom Lane wrote: > Martijn van Oosterhout writes: > > And what I'm saying is that you can only do that if all single byte > > encodings are ASCII compatable and I'm not prepared to say that... > > psql already assumes that; see the flex processor. Hmm, Ok, I didn't realise that. > > So in the switch statement I listed the encodings I know to be ASCII > > compatable and at some point someone who actually knows the other > > encodings will explain what needs to be done. > > Even if this were an appropriate solution, the formatting code is not > the place to put such knowledge... It adds two functions to mbprint.c: pg_wcssize - To calculate the screen size required by the string (rows&cols) pg_wcsformat - To actually format the text in the given lines I agree that these should actually be operating system functions but we can't rely on them. psql already has functions there for validating UTF-8 and converting UTF-8 to UCS so it's in the right place in that sense. If you think it's better moved to libpq I can do that. There is a fair bit of encoding code in there already but the information you need here is specifically: is this char a control character and in particular, is it a newline. Most standard encoding implementations return a screen width of zero for control characters, but we dont do that. I am hopeful once we have ICU support we can do away with the enumeration since we can simply read the Unicode Character database and get consistant results everywhere for all encodings. But until that is done I think this is the best we can do. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgp9oXjPYhRNr.pgp Description: PGP signature
Re: [PATCHES] [PATCH] Formatting patch for psql
Martijn van Oosterhout writes: > And what I'm saying is that you can only do that if all single byte > encodings are ASCII compatable and I'm not prepared to say that... psql already assumes that; see the flex processor. > So in the switch statement I listed the encodings I know to be ASCII > compatable and at some point someone who actually knows the other > encodings will explain what needs to be done. Even if this were an appropriate solution, the formatting code is not the place to put such knowledge... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] [PATCH] Formatting patch for psql
On Thu, Nov 24, 2005 at 11:35:11PM +0100, Peter Eisentraut wrote: > Martijn van Oosterhout wrote: > > On Thu, Nov 24, 2005 at 03:45:12PM +0100, Peter Eisentraut wrote: > > > Shouldn't you use PQmblen() to determine whether an encoding is > > > single-byte, rather than having an enumerated list? > > > > Actually, thinking about it the answer is no. The code is relying on > > the fact that it's a charset that has ASCII as a subset. > > Which code does that? Well, the code you're referring to: the two places where it enumerates the encodings in my patch. That code basically understands \n as going to the next line, but that's ASCII. What I beleive you were suggesting was to replace the switch statement with something like if(PQmblen(...) == 1) { code for latin1 } else if( encoding == UTF8 ) { code for utf8 } else {etc} And what I'm saying is that you can only do that if all single byte encodings are ASCII compatable and I'm not prepared to say that... So in the switch statement I listed the encodings I know to be ASCII compatable and at some point someone who actually knows the other encodings will explain what needs to be done. Hope this helps, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgpmwDQPu5UV1.pgp Description: PGP signature
Re: [PATCHES] [PATCH] Formatting patch for psql
Martijn van Oosterhout wrote: > On Thu, Nov 24, 2005 at 03:45:12PM +0100, Peter Eisentraut wrote: > > Shouldn't you use PQmblen() to determine whether an encoding is > > single-byte, rather than having an enumerated list? > > Actually, thinking about it the answer is no. The code is relying on > the fact that it's a charset that has ASCII as a subset. Which code does that? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] [PATCH] Formatting patch for psql
On Thu, Nov 24, 2005 at 03:45:12PM +0100, Peter Eisentraut wrote: > Shouldn't you use PQmblen() to determine whether an encoding is > single-byte, rather than having an enumerated list? Actually, thinking about it the answer is no. The code is relying on the fact that it's a charset that has ASCII as a subset. Unless someone can say that every single-byte encoding we support has an ASCII subset (EBCDIC being the obvious counter-example for not supported encodings) then I think we have to stick to enumerating them. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgpP2bkwtS9pZ.pgp Description: PGP signature
Re: [PATCHES] [PATCH] Formatting patch for psql
On Thu, Nov 24, 2005 at 03:45:12PM +0100, Peter Eisentraut wrote: > Martijn van Oosterhout wrote: > > Here is my patch to make psql handle multi-line output sanely. > > Instead of embedded newlines splattering your output across the > > screen, everything gets indented to the right column. > > Shouldn't you use PQmblen() to determine whether an encoding is > single-byte, rather than having an enumerated list? Probably. I went through psql to make sure there wasn't a function that did it, I didn't think of checking for undocumented libpq functions. Actually, I see there may be another there that would be nice: PQdsplen. Will look into it, thanks. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgp2eqSEdUvfY.pgp Description: PGP signature
Re: [PATCHES] [PATCH] Formatting patch for psql
Martijn van Oosterhout wrote: > Here is my patch to make psql handle multi-line output sanely. > Instead of embedded newlines splattering your output across the > screen, everything gets indented to the right column. Shouldn't you use PQmblen() to determine whether an encoding is single-byte, rather than having an enumerated list? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster