Re: [PATCH] console UTF-8 fixes

2007-06-19 Thread Bodo Eggert
On Tue, 19 Jun 2007, Egmont Koblinger wrote: > On Tue, Jun 19, 2007 at 03:54:52PM +0200, Bodo Eggert wrote: > > > Does the FLUSH DTRT by design, or does it just shrink and hide the original > > race? > But you may be right: yes, it might be a bug (or misfeature) in the FB code, > too. Could you

Re: [PATCH] console UTF-8 fixes

2007-06-19 Thread Egmont Koblinger
On Tue, Jun 19, 2007 at 03:54:52PM +0200, Bodo Eggert wrote: > Does the FLUSH DTRT by design, or does it just shrink and hide the original > race? I haven't deeply studied this aspect of the source, don't know what _exactly_ this FLUSH does, but of course I have an inner feeling about this. Kind

Re: [PATCH] console UTF-8 fixes

2007-06-19 Thread Bodo Eggert
Egmont Koblinger <[EMAIL PROTECTED]> wrote: > 2. My patch introduced "question mark with inverted color attributes" as a > last resort fallback glyph. Though it perfectly works on VGA console, on > framebuffer you may end up with question marks that are highlighted but > shouldn't be, and

Re: [PATCH] console UTF-8 fixes

2007-06-19 Thread Egmont Koblinger
Hi folks, Recently my console UTF-8 patch went mainline. Here I send a very small additional patch that fixes two nasty issues and improves a third one, namely: 1. My patch changed the behavior if a glyph is not found in the Unicode mapping table. Previously for Unicode values less than 256 or

[PATCH] console UTF-8 fixes

2007-04-17 Thread Egmont Koblinger
Hi Andrew, I've been told to send this patch to you for inclusion in your patchset, and hopefully sooner or later in the mainline kernel too. It has been discussed on lkml and finally found to be OK by HPA and Jan. The UTF-8 part of the vt driver suffers from the following issues which are addres

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Jan Engelhardt
On Apr 12 2007 18:55, Egmont Koblinger wrote: > >> >> I've been thinking on it and I'm not sure which one the right >> >> way is. The reason for choosing this was probably that this way >> >> information that is not used by the code can be omitted by the >> >> compiler. >> > >> > Then let's leave

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Roman Zippel
Hi, On Thu, 12 Apr 2007, Egmont Koblinger wrote: > On Thu, Apr 12, 2007 at 05:52:49PM +0200, Roman Zippel wrote: > > > Well, it often doesn't end there, other users may report these as bugs and > > want to get them fixed, so we have to look ahead a little for possible > > problems. > > They m

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread H. Peter Anvin
Egmont Koblinger wrote: On Thu, Apr 12, 2007 at 10:35:24AM -0700, H. Peter Anvin wrote: Yes, I didn't realize at the time that that was dead code. :- Version 4 of the patch follows. Dead code omitted from version 3. Looks good to me. -hpa

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Egmont Koblinger
On Thu, Apr 12, 2007 at 10:35:24AM -0700, H. Peter Anvin wrote: > Yes, I didn't realize at the time that that was dead code. :- Version 4 of the patch follows. Dead code omitted from version 3. Signed-off-by: Egmont Koblinger <[EMAIL PROTECTED]> diff -Naur linux-2.6.20.orig/drivers/char/c

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread H. Peter Anvin
Egmont Koblinger wrote: On Thu, Apr 12, 2007 at 09:58:38AM -0700, H. Peter Anvin wrote: Not leaving dead code in the kernel is long-standing policy; it's nothing new. We constantly remove #if 0'd code that the authors have left in. I see. However, you wrote it recently: Besides, would it

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Egmont Koblinger
On Thu, Apr 12, 2007 at 09:58:38AM -0700, H. Peter Anvin wrote: > Not leaving dead code in the kernel is long-standing policy; it's > nothing new. We constantly remove #if 0'd code that the authors have > left in. I see. However, you wrote it recently: > Besides, would it not make more sense

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread H. Peter Anvin
Egmont Koblinger wrote: We've arrived at another coding policy :) There are two possible behaviors, each have pros and cons. HPA prefers one, while Jan and me would prefer the other. The difference is one function that contains a large table and an invocation of that function in a small if bran

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Egmont Koblinger
On Thu, Apr 12, 2007 at 06:41:22PM +0200, Jan Engelhardt wrote: > >> I've been thinking on it and I'm not sure which one the right way is. The > >> reason for choosing this was probably that this way information that is > >> not > >> used by the code can be omitted by the compiler. > > > > Then le

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Jan Engelhardt
On Apr 12 2007 08:36, H. Peter Anvin wrote: > Egmont Koblinger wrote: > >> > Besides, would it not make more sense to have a single table with the >> > width information, if you insist on having one, instead of multiple >> > ones? >> >> I've been thinking on it and I'm not sure which one the righ

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Egmont Koblinger
On Thu, Apr 12, 2007 at 05:52:49PM +0200, Roman Zippel wrote: > Well, it often doesn't end there, other users may report these as bugs and > want to get them fixed, so we have to look ahead a little for possible > problems. They may even report that the current behavior of not knowing anything

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Roman Zippel
Hi, On Thu, 12 Apr 2007, Egmont Koblinger wrote: > > Considering this possible volatility I'm not certain we really need this > > in the kernel. > > The other point is that I have problems imagining, that this should be > > enough to edit random text files with a random editor without problems.

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread H. Peter Anvin
Egmont Koblinger wrote: I don't think width information for characters in BMP is going to change that often. By the way, a note about the size: the larger one of the two tables is unused and hence optimised away by the compiler. I just left in the source so that it only takes a minor modificati

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Egmont Koblinger
On Thu, Apr 12, 2007 at 04:38:54PM +0200, Roman Zippel wrote: > Considering this possible volatility I'm not certain we really need this > in the kernel. > The other point is that I have problems imagining, that this should be > enough to edit random text files with a random editor without probl

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Roman Zippel
Hi, On Thu, 12 Apr 2007, Egmont Koblinger wrote: > I tried to create such a script using ideas for regexps from glibc's > charmaps/UTF-8, but it seemed to be quite hopeless to create a small table. > It seems that Markus probably performed some reasonable manual optimisations > that cannot really b

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Egmont Koblinger
On Thu, Apr 12, 2007 at 02:13:06PM +0100, Alan Cox wrote: > You can pack them a little differently and they'll shrink a lot. The smaller table would actually slightly grow instead of shrinking. In my patch there are 11 intervals, each consume 2*4 bytes, that's 88 bytes. Your variant would store e
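The size figure quoted above (11 intervals at 2*4 bytes each, 88 bytes) can be checked with a few lines of C. This is only a sketch: `struct interval32`, `INTERVAL_COUNT` and `table_bytes` are hypothetical names chosen for illustration, and fixed-width `int32_t` fields stand in for the 4-byte `int` the message assumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the patch's interval struct: two 4-byte bounds. */
struct interval32 {
	int32_t first;
	int32_t last;
};

/* Number of intervals in the smaller table, as stated in the message. */
enum { INTERVAL_COUNT = 11 };

/* Each entry is 2 * 4 bytes; fail the build if the compiler adds padding. */
static_assert(sizeof(struct interval32) == 8, "unexpected padding in interval32");

/* Total size of a table holding INTERVAL_COUNT entries. */
static size_t table_bytes(void)
{
	return INTERVAL_COUNT * sizeof(struct interval32);
}
```

Calling `table_bytes()` yields 88 under these assumptions, matching the arithmetic in the message.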

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Alan Cox
> So please accept these hard-coded tables in the first round. Maybe one day > somebody will come up with a better solution. This one should be okay until > then. (I can also send the script I've written so he can improve it.) You can pack them a little differently and they'll shrink a lot. First

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Egmont Koblinger
Hi, I send the third version. No major modifications from the second version, only small cleanups, coding style... H. Peter Anvin wrote: > I'm still unhappy about these large search tables in the kernel, not > because they take a huge amount of space (it's not that much), but > because they're

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Egmont Koblinger
On Wed, Apr 11, 2007 at 09:00:49PM +0200, Jan Engelhardt wrote:
> >+struct interval {
> >+	int first;
> >+	int last;
> >+};
>
> CodingStyle? uint16_t instead of int?
> >+{ 0x1D173, 0x1D182 }, { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD },
> >+{ 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F }, {
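The preview cuts off before any reply to the uint16_t question, so what follows is not taken from the thread. It is a small sketch of one relevant fact: the quoted table entries such as 0x1D173 and 0xE0020 lie above U+FFFF and therefore cannot be stored in a 16-bit field. The helper name `fits_in_u16` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The quoted table contains code points beyond the 16-bit range. */
static_assert(0x1D173 > UINT16_MAX, "0x1D173 does not fit in 16 bits");

/* Return 1 if the code point survives a round trip through uint16_t, else 0. */
static int fits_in_u16(uint32_t ucs)
{
	return (uint32_t)(uint16_t)ucs == ucs;
}
```

With this helper, a BMP value such as 0xFFFD fits, while 0x1D173 and 0xE007F from the quoted table do not.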

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Egmont Koblinger
On Wed, Apr 11, 2007 at 11:36:40AM -0700, H. Peter Anvin wrote:
> Egmont Koblinger wrote:
> >+static int is_zero_width(long ucs)
> >+{
> >+	static const struct interval zero_width[] = {
> /* lots */
> >+	};
>
> I'm still unhappy about these large search tables in the kernel, not
> because they

Re: [PATCH] console UTF-8 fixes

2007-04-12 Thread Jan Engelhardt
On Apr 11 2007 19:36, Pavel Machek wrote:
>
>> +	while (max >= min) {
>> +		mid = (min + max) / 2;
>> +		if (ucs > table[mid].last)
>> +			min = mid + 1;
>> +		else if (ucs < table[mid].first)
>> +			max = mid - 1;
>> +		else
>> +			return 1;
>> +	}
>> +
>> +	return 0;
>> +}
>
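The loop quoted above is a binary search over a sorted table of closed intervals. A self-contained user-space sketch of the same technique might look like this. The function name `in_table` and the table `sample_wide` are assumptions made for illustration: the table holds three Unicode block ranges picked as examples and is not the table from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Closed code point range [first, last], as in the struct quoted in the thread. */
struct interval {
	int first;
	int last;
};

/*
 * Hypothetical sample table (NOT the patch's table): a few ranges,
 * sorted by code point and non-overlapping, as the search requires.
 */
static const struct interval sample_wide[] = {
	{ 0x1100, 0x115F },	/* Hangul Jamo leading consonants */
	{ 0x3040, 0x309F },	/* Hiragana block */
	{ 0xAC00, 0xD7A3 },	/* Hangul Syllables */
};

/* Return 1 if ucs lies inside any interval of the sorted table, else 0. */
static int in_table(int ucs, const struct interval *table, size_t count)
{
	int min = 0;
	int max = (int)count - 1;

	assert(table != NULL || count == 0);

	while (max >= min) {
		/* Same midpoint as (min + max) / 2, written to avoid overflow. */
		int mid = min + (max - min) / 2;

		if (ucs > table[mid].last)
			min = mid + 1;		/* search the upper half */
		else if (ucs < table[mid].first)
			max = mid - 1;		/* search the lower half */
		else
			return 1;		/* first <= ucs <= last */
	}
	return 0;
}
```

For example, `in_table(0x3042, sample_wide, 3)` finds the Hiragana range on the first probe, while `in_table('A', sample_wide, 3)` narrows to an empty range after two probes and reports no match.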

Re: [PATCH] console UTF-8 fixes

2007-04-11 Thread Pavel Machek
Hi! > I hope you like it. :) Well, more or less... but you need signed-off-by line, and > @@ -70,6 +70,16 @@ > * malformed UTF sequences represented as sequences of replacement glyphs, > * original codes or '?' as a last resort if replacement glyph is undefined > * by Adam Tla/lka <[EMAIL

Re: [PATCH] console UTF-8 fixes

2007-04-11 Thread Jan Engelhardt
On Apr 11 2007 20:28, Egmont Koblinger wrote: >I send a reworked version of the patch. > >Removed from the first version: > - any sign of '.' as substitute glyph > - don't ignore zero-width characters (except for a few zero-width spaces >that are ignored in the current kernel too). However,

Re: [PATCH] console UTF-8 fixes

2007-04-11 Thread H. Peter Anvin
Egmont Koblinger wrote: +static int is_zero_width(long ucs) +{ + static const struct interval zero_width[] = { /* lots */ + }; I'm still unhappy about these large search tables in the kernel, not because they take a huge amount of space (it's not that much), but because they're invariably

Re: [PATCH] console UTF-8 fixes

2007-04-11 Thread Egmont Koblinger
Hi, I send a reworked version of the patch. Removed from the first version: - any sign of '.' as substitute glyph - don't ignore zero-width characters (except for a few zero-width spaces that are ignored in the current kernel too). However, I kept the code organized and commented so t

Re: [PATCH] console UTF-8 fixes

2007-04-11 Thread Jan Engelhardt
On Apr 10 2007 20:51, Egmont Koblinger wrote: >On Tue, Apr 10, 2007 at 10:30:07AM -0700, H. Peter Anvin wrote: > >> Really? Why is CJK so much more fundamental than, say, Arabic? > >Not more fundamental at all. It's just perhaps easier to "support" (I mean >keep track of the cursor, not to really

Re: [PATCH] console UTF-8 fixes

2007-04-10 Thread Egmont Koblinger
On Tue, Apr 10, 2007 at 10:30:07AM -0700, H. Peter Anvin wrote: > Really? Why is CJK so much more fundamental than, say, Arabic? Not more fundamental at all. It's just perhaps easier to "support" (I mean keep track of the cursor, not to really support them of course). I can't see any reason why

Re: [PATCH] console UTF-8 fixes

2007-04-10 Thread H. Peter Anvin
Alan Cox wrote: What do you exactly mean by this? Doing a binary search in a table of 11 intervals to find out whether a character is double-wide? Adding approximately 30 lines of code (including the table and the binary search routine) to the kernel to handle this case? I don't think it's bloat.

Re: [PATCH] console UTF-8 fixes

2007-04-10 Thread Alan Cox
> What do you exactly mean by this? Doing a binary search in a table of 11 > intervals to find out whether a character is double-wide? Adding > approximately 30 lines of code (including the table and the binary search > routine) to the kernel to handle this case? I don't think it's bloat. It's a I

Re: [PATCH] console UTF-8 fixes

2007-04-10 Thread H. Peter Anvin
Egmont Koblinger wrote: On Tue, Apr 10, 2007 at 08:43:14AM -0700, H. Peter Anvin wrote: I don't see the point in dealing with one particular corner case, I wouldn't really call CJK a *corner* case, just think of how many people use these writing systems. Theoretically it's just one particula

Re: [PATCH] console UTF-8 fixes

2007-04-10 Thread Egmont Koblinger
On Tue, Apr 10, 2007 at 08:43:14AM -0700, H. Peter Anvin wrote: > I don't see the point in dealing with one particular corner case, I wouldn't really call CJK a *corner* case, just think of how many people use these writing systems. Theoretically it's just one particular case, I agree. In practi

Re: [PATCH] console UTF-8 fixes

2007-04-10 Thread H. Peter Anvin
Egmont Koblinger wrote: I know that correctly handling all Unicode scripts, including CJK, Hebrew, Arabic, Indic are a much more complicated story and it's way beyond the scope of kernel. I don't even know whether there's any graphical user-space application handling all these issues perfectly.

Re: [PATCH] console UTF-8 fixes

2007-04-10 Thread Egmont Koblinger
On Sat, Apr 07, 2007 at 10:59:19AM -0700, H. Peter Anvin wrote: > As far as width handling -- in order to make all the text line up under > all circumstances you need more than width handling. [...] > > is is ridiculous. It's much better to draw a line in the sand and say > that this is beyon

Re: [PATCH] console UTF-8 fixes

2007-04-07 Thread H. Peter Anvin
Egmont Koblinger wrote: On Sat, Apr 07, 2007 at 01:00:48PM +0200, Jan Engelhardt wrote: Hi, Please, no dot, and no inverse color. Imagine someone had the following bitmap for : No dot, I'm already convinced. To clarify the inverse thingy: This is what the current kernel does: 1) tries to

Re: [PATCH] console UTF-8 fixes

2007-04-07 Thread Egmont Koblinger
On Sat, Apr 07, 2007 at 01:00:48PM +0200, Jan Engelhardt wrote: Hi, > Please, no dot, and no inverse color. > Imagine someone had the following bitmap for : No dot, I'm already convinced. To clarify the inverse thingy: This is what the current kernel does: 1) tries to display the desired symb

Re: [PATCH] console UTF-8 fixes

2007-04-07 Thread Jan Engelhardt
Hi, I just wanted to give my opinion on things... (and enable utf8 to read this properly) On Apr 7 2007 11:24, Egmont Koblinger wrote: > >> I strongly disagree. First of all, you're changing the semantics of a >> 13-year-old API. The semantics of the Linux console is that by >> specifying U

Re: [PATCH] console UTF-8 fixes

2007-04-07 Thread Egmont Koblinger
On Fri, Apr 06, 2007 at 12:43:03PM -0700, H. Peter Anvin wrote: Hi, > I strongly disagree. First of all, you're changing the semantics of a > 13-year-old API. The semantics of the Linux console is that by > specifying U+FFFD SUBSTITUTION GLYPH in your unicode table, you have > specified the

Re: [PATCH] console UTF-8 fixes

2007-04-06 Thread H. Peter Anvin
Egmont Koblinger wrote: - If a certain (otherwise valid UTF-8) character is not found in the glyph table, the current code does one of these two (depending on other circumstances): - Either it displays the replacement character U+FFFD, falling back to a simple question mark. Note that

[PATCH] console UTF-8 fixes

2007-04-06 Thread Egmont Koblinger
Hi folks, I send a patch to the UTF-8 part of the vt driver. I know that this code has recently been updated to be better than it had been previously, but it still suffers from plenty of bugs. My patch addresses all the issues known by me. The difference in the behavior can easily be seen by tryin