Re: Experiments with classical Greek keyboard input

2006-05-09 Thread Vasilis Vasaitis
  Since I'm still cc'ed here...

On Tue, May 09, 2006 at 09:04:52PM +0300, Joe Schaffner wrote:

..[snip]..

 dead_acute is on the semi-colon key and dead_horn is on the same
 key, shifted, the colon key.
 
 dead_grave is on the single-quote key and dead_ogonek is on the
 double-quote key.
 
 That's a pretty good layout. I like it.
 
 Why not name these keysyms dead_psili and dead_dasia?

  Because the list of keysyms is fixed, as defined in
/usr/include/X11/keysymdef.h. At the time, using arbitrary existing
keysyms made more sense than petitioning for correctly-named new
ones. It works, after all. But OK, now maybe it's time to ask for a
few new names if people are annoyed by the current state of affairs.

 Anyway, I activate the gr keymap like this:
 
 setxkbmap us,gr(polytonic) -option grp:alt_shift_toggle
 
 The command syntax is troublesome. There seem to be other ways of
 doing it. Maybe I'm wrong, but it seems to work.

  The canonical invocation would be:

setxkbmap -layout us,gr -variant ,polytonic \
-option grp:alt_shift_toggle
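
  The -layout and -variant lists are matched position by position, so
an empty entry keeps a layout's default variant. A minimal Python
sketch of how such a command line could be assembled (the helper name
setxkbmap_args is made up for illustration and is not part of any X11
tool):

```python
def setxkbmap_args(layouts, options=()):
    """Build a setxkbmap argument list from (layout, variant) pairs.

    An empty variant string keeps the layout's default variant. The
    comma-separated -layout and -variant lists line up by position.
    """
    names = ",".join(layout for layout, _ in layouts)
    variants = ",".join(variant for _, variant in layouts)
    args = ["setxkbmap", "-layout", names, "-variant", variants]
    for option in options:
        args += ["-option", option]
    return args


# The invocation discussed above: plain "us" plus "gr" with "polytonic".
ARGS = setxkbmap_args([("us", ""), ("gr", "polytonic")],
                      ["grp:alt_shift_toggle"])
```

The resulting list can be handed to subprocess.run(ARGS) as is, which
also sidesteps any shell quoting of commas or parentheses.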

 Yes, the keymap is there, I can see it on the task bar. To switch to
 another group, I can use the alt_shift combination (another meta
 symbol? Where are all these symbols defined?).

  In /etc/X11/xkb, rules/xorg transforms grp:alt_shift_toggle to
group(alt_shift_toggle). So you can look at the relevant section in
symbols/group to see how this implements the layout switching. It all
boils down to the generation of the ISO_Next_Group and ISO_Prev_Group
keysyms.

 Yes, I can enter greek characters. The dead_acute seems to work, but
 I am not sure if it is outputting a tonos or an acute. It's probably a
 tonos.
 
 None of the other dead keys seem to work.
 
 Any ideas?

  For the others to work, you need to have at least
LC_CTYPE=el_GR.UTF-8. On my system, with LANG=el_GR.UTF-8, everything
is working as it should. Keep in mind that for GTK+ applications you
also need GTK_IM_MODULE=xim defined (or else you have to right-click
on each textbox and select Input Methods -> X Input Method).
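
  To check an environment against these two requirements, something
like the following Python sketch could be used. It relies on the usual
POSIX precedence (LC_ALL over LC_CTYPE over LANG); the function names
and the wording of the messages are illustrative assumptions only:

```python
def effective_ctype(env):
    """Return the locale name that governs LC_CTYPE.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty values count as unset; with nothing set the locale is "C".
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def polytonic_problems(env):
    """List what is missing for the polytonic dead keys to work."""
    problems = []
    ctype = effective_ctype(env)
    # Accept both spellings of the codeset, e.g. "UTF-8" and "utf8".
    if ctype.lower().replace("-", "") != "el_gr.utf8":
        problems.append("LC_CTYPE resolves to %s, not el_GR.UTF-8" % ctype)
    if env.get("GTK_IM_MODULE") != "xim":
        problems.append("GTK_IM_MODULE is not set to xim")
    return problems
```

Calling polytonic_problems(os.environ) from the shell that starts the
application returns an empty list when both settings are in place.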


-- 
Vasilis Vasaitis
A man is well or woe as he thinks himself so.



--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/



Re: Text printing with Openoffice

2005-11-14 Thread Vasilis Vasaitis
On Mon, Nov 14, 2005 at 09:51:27AM +0100, Koblinger Egmont wrote:
 On Mon, Nov 14, 2005 at 12:52:23AM +0800, Abel Cheung wrote:
 
  On 11/13/05, Koblinger Egmont [EMAIL PROTECTED] wrote:
   fancy, just the good old fixed-width fonts with 80 columns, but the 
   accented
   (NFC) letters are okay.
  
  ... While all multibyte characters become junk. (since 2001)
 
 What do you mean by multibyte characters? Of course all the accented letters
 are multibyte characters in UTF-8. I created several simple text files in
 UTF-8 encoding, containing standard accented letters that are also part of
 latin-1 or latin-2 (e.g. e with acute grave, e with acute accent, o with
 double acute) as well as euro symbol, low-99 and high-99 quote marks etc.,
 sent them to the printer with lpr filename (with LANG=hu_HU.UTF-8 and no
 other LC_* variables) and they all got printed correctly.

  I tried printing a simple UTF-8 text file with greek text, and the
result was quite inadequate. It managed to get the simple letters from
the Symbol font (I assume), but the accented letters did not get
printed out at all. The result is both ugly and unreadable for the
most part.

  The OOo method, on the other hand, handled it fine.

 What I didn't test is double-width (cjk) characters, combining symbols,
 non-printable characters, invalid UTF-8 sequences and other similar more
 tricky files. It's easily possible that OOo is better in this respect.


-- 
Vasilis Vasaitis
A man is well or woe as he thinks himself so.






Re: Unicode Keyboard Input Linux

2004-06-15 Thread Vasilis Vasaitis
On Mon, Jun 14, 2004 at 11:39:44PM +0200, Pablo Saratxaga wrote:
 Kaixo!
 
 On Sat, Jun 12, 2004 at 09:56:52AM -0700, Elvis Presley wrote:

..[snip]..

  This is about as complicated as it gets in polytonic
  Greek, three dead keys, two pre-position, one
  post-position, 'w' representing omega, and an 'i' for
  iota subscript. 
 
 No, dead keys cannot be post-position; they must always be typed
 *before* the key they modify; that is in fact the very definition
 of a dead_key: they modify the behaviour of what is typed after them.
 If it is typed after it is not a dead key, but just a regular key.
 
 The ways already defined in the el_GR.UTF-8 X11 Compose file for U1fa2
 (ᾢ, omega with psili, varia and ypogegrammeni) are:
 
 Multi_key bar greater grave Greek_omega   :   U1fa2
 Multi_key bar grave greater Greek_omega   :   U1fa2
 Multi_key greater bar grave Greek_omega   :   U1fa2
 Multi_key greater grave bar Greek_omega   :   U1fa2
 Multi_key grave bar greater Greek_omega   :   U1fa2
 Multi_key grave greater bar Greek_omega   :   U1fa2
 dead_iota dead_horn dead_grave Greek_omega  :   U1fa2
 dead_iota dead_grave dead_horn Greek_omega  :   U1fa2
 dead_horn dead_iota dead_grave Greek_omega  :   U1fa2
 dead_horn dead_grave dead_iota Greek_omega  :   U1fa2
 dead_grave dead_iota dead_horn Greek_omega  :   U1fa2
 dead_grave dead_horn dead_iota Greek_omega  :   U1fa2
 
 6 ways to type it with dead keys (corresponding to the six
 possible orderings of the three dead keys, with the dead keys
 always before the letter)
 and 6 ways to type it with Multi_key (you press Multi_key, then
 the following keys in the given order).

  Note that even the Multi_key sequences always have the letter last,
so that, when a letter arrives, it is certain that the sequence is
complete. See my comments below.

 What you would like would be in fact:
 
 dead_horn dead_grave Greek_omega U0345 :   U1fa2
 dead_grave dead_horn Greek_omega U0345 :   U1fa2
 
 (that is, two dead keys, followed by two normal keys; a key sending
 Greek_omega and a key sending U0345 (COMBINING GREEK YPOGEGRAMMENI))
 
 I haven't tested it but if it works, it could indeed be added for
 all the cases and a layout with U0345 instead of dead_iota, if
 that is more intuitive to type.
 
  The keyboard map is therefore more than a map, it is a
  fsm, a stateful-map.
 
 That is not supported at all.
 If you need that, you actually need to develop an input method
 (like Japanese or Vietnamese use), that is, a program that interprets
 what you type and produces a different input.
 
 Yes, there is something of that in the console (but very limited) and
 in X11 (more powerful), but it is always linear.
 
 (also, I'm not sure if it is possible to have, for example,
 dead_horn dead_grave Greek_omega U0345 and
 dead_horn dead_grave Greek_omega sequences (that is, sequences
 where one is a prefix of another))

  You can't. The problem with that is that, if you wanted to type the
second sequence, the composition engine wouldn't know whether to stop
there and emit the symbol, or to wait for another symbol to complete
the sequence. So it waits. This could probably be fixed (partly): when
a symbol comes that causes the sequence to become invalid, the engine
could check the compose sequence just before the arrival of that
symbol, and emit the result. But this is not the current behaviour.
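
  The ambiguity can be shown with a small Python model. This is only a
sketch of the waiting behaviour described above, not the actual Xlib
code; the wait/emit/drop labels, the function names and the two-entry
table are all made up for illustration:

```python
def prefix_conflicts(sequences):
    """Return (shorter, longer) pairs where shorter is a proper prefix."""
    seqs = sorted(sequences)
    return [(a, b) for a in seqs for b in seqs
            if len(a) < len(b) and b[:len(a)] == a]


def step(sequences, typed):
    """Classify the keys typed so far the way a prefix-driven engine would."""
    typed = tuple(typed)
    # Some longer sequence starts with what was typed: keep waiting,
    # even if what was typed is itself a complete sequence.
    if any(len(s) > len(typed) and s[:len(typed)] == typed
           for s in sequences):
        return "wait"
    if typed in sequences:
        return "emit " + sequences[typed]
    return "drop"


# Illustrative table holding both of the sequences discussed above.
SHORT = ("dead_horn", "dead_grave", "Greek_omega")
LONG = SHORT + ("U0345",)
COMPOSE = {SHORT: "U1F62", LONG: "U1FA2"}
```

With both entries present, step(COMPOSE, SHORT) can only answer "wait",
so the shorter sequence never produces anything on its own, and
prefix_conflicts(COMPOSE) reports the offending pair.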

  If I change keyboards in
  midstream (using alt-a, for example), the fsm would
  output the components of an unaccepted character
  individually. How far will keymaps go?
 
 You can't.
 Pressing Alt-A (or any other key) means you broke the sequence.
 In such a case you simply lose what you typed in the incomplete sequence.

  Indeed.


-- 
Vasilis Vasaitis
A man is well or woe as he thinks himself so.






Re: Unicode Keyboard Input Linux

2004-06-15 Thread Vasilis Vasaitis
On Mon, Jun 14, 2004 at 08:43:38AM -0700, Elvis Presley wrote:

..[snip]..

 Comparing characters would be easy, they compare as
 unsigned integers, but sorting them would be a
 problem, because you'd want to group all the
 (accented) vowels together, according to language
 specific rules. In Greek, this wouldn't be a problem,
 because monotonic vowels and polytonic vowels, though
 occupying different code ranges, are not mixed in the
 same word: they are essentially different languages. A
 'tonos' is not a 'oxia' or a 'varia'.

  Actually, tonos and oxia are treated as equivalents in Unicode.
Nevertheless, sorting wouldn't be a problem indeed, because it is done
according to the base letter only; the diacritics are irrelevant.
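
  Both points can be checked with Python's unicodedata module. The
sketch below is an illustration under simple assumptions: base_letters
is a made-up helper, and comparing base letters by code point is only a
rough stand-in for real Greek collation rules:

```python
import unicodedata

# U+1F71 (alpha with oxia) is canonically equivalent to U+03AC
# (alpha with tonos): NFC normalisation maps the former to the latter.
ALPHA_OXIA = "\u1f71"
ALPHA_TONOS = "\u03ac"
SAME_AFTER_NFC = unicodedata.normalize("NFC", ALPHA_OXIA) == ALPHA_TONOS


def base_letters(word):
    """Strip combining marks so only the base letters remain."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch))


def sort_by_base(words):
    """Sort words by their base letters, ignoring accents and breathings."""
    return sorted(words, key=base_letters)
```

Monotonic and polytonic spellings of the same word then compare equal
under base_letters, whichever code range their vowels come from.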

 Why do Greek newspapers still use ISO 8859-7?

  If it ain't broke, don't fix it.

 nightmare), but if you're only working in Greek, why
 not stick with what you know?

  Exactly. Nothing to do with size issues, and everything to do with
that. Plus, a major operating system doesn't really support UTF-8, and
instead concentrates on UTF-16, which is unusable in UNIX/GNU systems
for most practical purposes.

 My Microsoft browser(=IE) has problems with ISO Greek
 and Windows Greek, especially capital Alpha with
 tonos: it gets confused, and displays a box.

  Well actually, this particular letter is one of the very few
incompatibilities between the two character sets. In ISO-8859-7, this
letter occupies the code point (0xB6) that MS Word once had hardcoded
as representing the paragraph symbol. So for Windows-1253, Microsoft
put the paragraph symbol there and moved capital Alpha with tonos
elsewhere (to 0xA2).
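
  The clash is easy to reproduce with Python's bundled codecs
(iso8859_7 and cp1253). The sketch below decodes the same byte both
ways; differing_bytes is a made-up helper that simply lists every
position in the upper half where the two tables disagree:

```python
CAPITAL_ALPHA_TONOS = "\u0386"
PILCROW = "\u00b6"

# The same byte means different things in the two character sets.
AS_ISO = b"\xb6".decode("iso8859_7")
AS_WINDOWS = b"\xb6".decode("cp1253")


def differing_bytes(first="iso8859_7", second="cp1253", start=0xA0):
    """Map each byte from start to 0xFF that decodes differently."""
    diffs = {}
    for value in range(start, 0x100):
        raw = bytes([value])
        # Unassigned positions decode to U+FFFD instead of raising.
        a = raw.decode(first, errors="replace")
        b = raw.decode(second, errors="replace")
        if a != b:
            diffs[value] = (a, b)
    return diffs
```

A page labelled with one charset but written in the other will
therefore show the wrong glyph (or a box) exactly at this letter.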



-- 
Vasilis Vasaitis
A man is well or woe as he thinks himself so.






Re: JOE editor has just added UTF-8 support

2004-05-04 Thread Vasilis Vasaitis
On Mon, May 03, 2004 at 10:27:10PM +0900, Derek Martin wrote:

..[snip]..

 in Gaim.  =8^)  Now if only Mutt will work properly with UTF-8...

  Err... I'm reading these messages inside mutt, which in turn runs
under a UTF-8 enabled xterm (uxterm), with the el_GR.UTF-8 locale. And
let me tell you, it works great, and in fact it's been supporting
UTF-8 for a long time now.

  Make sure that you have a fairly recent version of mutt, and that
it's compiled against ncursesw, not plain ncurses or slang, and you
should be set. The Debian unstable package is what I'm using, BTW.


-- 
Vasilis Vasaitis
A man is well or woe as he thinks himself so.






Re: JOE editor has just added UTF-8 support

2004-05-03 Thread Vasilis Vasaitis
On Mon, May 03, 2004 at 07:32:24PM +0200, Jan Willem Stumpel wrote:

 Thanks very much, this clears up a lot. A few more questions:
 
 1. Most GTK+ programs allow right-clicking in text boxes to change the
 input method, but Mozilla, unfortunately, does not. But it *is* affected
 by the GTK_IM_MODULE=xim environment variable, so it appears to be a
 GTK+ program all right. In fact, starting Mozilla (from the command
 line) with
 
 GTK_IM_MODULE=im-ja mozilla
 
 works, resulting in a Mozilla which accepts Japanese input -- without
 using kinput2!
 
 Now it would be extremely nice if this could also be done (somehow)
 dynamically, on the fly. GTK+ programs seem to be able to do this, so
 (I think) Mozilla should be able to do it also. Is there a way to
 achieve this?

  Mozilla twists and turns GTK+ to its whims, so the result is very
different than usual GTK+ applications. AFAIK, there's no equivalent
method to switch input method; if it's important for you, you could
try filing a bug report in their bugzilla, asking for the Input
Methods submenu to be added to the input box context menu.

 2. In programs which *do* allow right-clicking for input method
 selection, the default input method is (apparently) less useful than
 the xim input method, because of the less-than-perfect Compose
 implementation in default. Is there a way to make xim the default?
 Anyway, what exactly is the default in the input methods menu? Where
 is it defined?

  It's already been stated that you can specify GTK_IM_MODULE=xim in
your environment, which makes all applications use that by default.
Isn't that good enough for you?


-- 
Vasilis Vasaitis
A man is well or woe as he thinks himself so.






Re: mp3-tags, zip-archives, tool to convert filenames to UTF

2003-02-14 Thread Vasilis Vasaitis
On Fri, Feb 14, 2003 at 07:01:56PM +0100, Helge Hielscher wrote:

 success. Is there a way to convert all ID3-Tags to Unicode? How does 
 ogg-Vorbis handle this issue?

  Ogg Vorbis comment values are always encoded in UTF-8, as the
specification requires; see:

http://www.xiph.org/ogg/vorbis/doc/v-comment.html

 3) Do .zip Files store the encoding of the filenames somewhere and will 
 unzip convert the encodings to utf8? How about utf8 and the other 
 packers/archivers (tar,ace,rar)? Are there any known problems?

  I would expect these to behave pretty much like filesystems
themselves do.

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]
+306976604701






Emacs automatic UTF-8 setup

2002-12-07 Thread Vasilis Vasaitis
  Hello,

  Lately, I've started to slowly migrate my environment to
UTF-8. Since I don't feel ready to do a complete switch yet, as the
environment doesn't seem to be mature enough, I like to have a
transition period, when I can run applications in either a UTF-8 or
a non-UTF-8 locale, as needed. More specifically, I want to be able to
constantly switch back and forth between el_GR.ISO8859-7 and
el_GR.UTF-8.

  Most programs don't need any particular setup for this. Emacs [0],
however, is a notable exception. In its default setup, it doesn't
completely support a UTF-8 environment, the main problem being that it
doesn't recognise UTF-8 keyboard input. So I set out to discover the
minimum configuration possible, so that it would fully support the
UTF-8 locale, without creating any problems at the ISO8859-7 locale at
the same time. In addition, it would have to work both in X11 and
terminal mode, and in the latter, both on the Linux console and inside
an xterm. The result isn't the most obvious setup, so I thought I'd
post it here, in the hope that others find it useful as well
(esp. Emacs developers).

  First of all, I wanted to make sure that Emacs automatically sets
the language environment to Greek in all cases, without actually
configuring it to be the default. This is accomplished with the
following line in .emacs:

(setq locale-language-names (cdr locale-language-names))

  The variable locale-language-names is a list of patterns that match
locale names to names of language environments. In my version of
Emacs, the first entry inhibits all UTF-8 locales from setting any
language environment. In my case, this seems to cause more harm than
good, so I eliminate that entry with the above command.

  In addition, I want to set the various coding systems for each
locale to sane values. This is achieved with the following piece of
code:

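;; Prefer UTF-8 whenever the locale name ends in ".utf-8".
(setq locale-preferred-coding-systems
  (cons (cons ".*\\.utf-8" 'utf-8) locale-preferred-coding-systems))
;; Re-run locale detection and pass the value it returns to the keyboard
;; coding system, and to the terminal coding system when it is non-nil.
((lambda (cs)
   (set-keyboard-coding-system cs)
   (if cs (set-terminal-coding-system cs)))
 (set-locale-environment nil))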

  This makes UTF-8 the preferred coding system for UTF-8 locales, and
sets the various coding systems according to the current locale
settings. Now Emacs behaves just like most other applications: assumes
an 8-bit, ISO8859-7 environment under the el_GR.ISO8859-7 locale, and
a multi-byte, UTF-8 environment when run under el_GR.UTF-8.


[0] I use GNU Emacs 21.2-5, the latest version in Debian unstable.

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]
+30976604701






Re: [Devel] Re: Linux Console in UTF-8 - current state

2002-10-07 Thread Vasilis Vasaitis

On Sun, Oct 06, 2002 at 07:35:53PM +0400, Vadim Plessky wrote:
 
 Let's also clarify a few things here.
 Do you speak about CJK fonts, or about Latin+Greek+Cyrillic fonts, right?
 There are some rather good fonts available *for free* for Latin+Greek+Cyrillic 

  Not in my experience. There seem to be a lot of Latin+Cyrillic fonts
around, and I would guess that at least some of them have quite good
quality, but for Greek fonts this is not the case. OTOH, there is a
serious shortage of Greek fonts that are (a) indisputably free, and
(b) of adequate quality so as not to pop my eyeballs off their
sockets. No, come to think of it, there is even a shortage of Greek
fonts that only satisfy the (a) condition.

 alphabet. And I am working on a set of my own fonts (Latin+Cyrillic, Greek can 
 follow, as it's not extremely difficult to add if you already have Cyrillic 
 glyphs)

  Well, it's not that easy either. Capital letters are almost ready if
you have capital Latin & Cyrillic ones, but a lot of the small letters
are *very* different from anything else in the other scripts.

  Anyway, I don't intend to criticise anyone, just wanted to clarify
some things...

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]






Re: UTF-8 file to ASCII file converter

2002-04-12 Thread Vasilis Vasaitis

On Thu, Apr 11, 2002 at 12:04:18AM -0700, Pedro Ferreira wrote:
 I already have a perl script (thanks to Oyvind A.
 Holm) that converts an ascii file with U+ unicode
 codes to an utf-8 file.
 Now I would like to do the oposite, convert an utf-8
 file to an ascii file, each utf-8 character would be
 encoded back to U+. Many thanks in advance for any
 help!

  Just like in the case of the opposite conversion, this conversion can also
be easily achieved with a one-liner. The following seems to be able to do
the job:

  perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]

Don't do drugs. Santa Claus is watching.
-- winamp.com






Re: UTF-8 file to ASCII file converter

2002-04-12 Thread Vasilis Vasaitis

On Fri, Apr 12, 2002 at 11:11:41AM -0400, [EMAIL PROTECTED] wrote:
 On Fri, 12 Apr 2002, Vasilis Vasaitis wrote:
 
Just like in the case of the opposite conversion, this conversion can also
  be easily achieved with an one-liner. The following seems to be able to do
  the job:
  
  perl -ne 'for (unpack "U*", $_) { printf $_ > 255 ? "U+%04X" : "%c", $_ }'
 
  Unless you regard ISO-8859-1 as a synonym to US-ASCII, '255' has to
 be '127' :-) 

  Er, right. That's what I meant, actually, but I guess I wasn't thinking
much at that moment :^). And since I only tested this with an iconv'ed
ISO-8859-7 text to UTF-8, I didn't even notice...
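
  For comparison, the same conversion with the threshold at 127 can be
sketched in Python. The function names are made up for illustration,
and the input is assumed to be valid UTF-8:

```python
def escape_non_ascii(text, limit=127):
    """Replace every character above limit with its U+XXXX notation."""
    return "".join(ch if ord(ch) <= limit else "U+%04X" % ord(ch)
                   for ch in text)


def utf8_to_ascii(data):
    """Convert UTF-8 bytes to pure ASCII bytes with U+XXXX escapes."""
    return escape_non_ascii(data.decode("utf-8")).encode("ascii")
```

Passing limit=255 reproduces the behaviour of the one-liner as first
posted, which lets Latin-1 characters through unescaped.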

Cheers,

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]

Don't do drugs. Santa Claus is watching.
-- winamp.com






Re: ASCII file to UTF-8 file converter

2002-03-26 Thread Vasilis Vasaitis

On Tue, Mar 26, 2002 at 06:58:58AM -0800, Pedro Ferreira wrote:
 Please, what is the best tool to convert an ascii file
 with unicode character codes like this:
 U+3400
 U+3405
 to another UTF-8 file with the corresponding unicode
 characters?
 Many thanks!
 Pedro Ferreira

  Perl, of course! Try something like this:

perl -pe 's/U\+([0-9A-Fa-f]{4})/pack "U", hex $1/ge'

  No wonder they call it the UNIX swiss army chainsaw...
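
  The same substitution can be sketched in Python, for those who prefer
it. Like the one-liner, it only recognises exactly four hex digits
after U+; the function names are made up for illustration:

```python
import re

# "U+" followed by four hex digits, as in the one-liner's pattern.
_CODE = re.compile(r"U\+([0-9A-Fa-f]{4})")


def unescape(text):
    """Replace each U+XXXX code with the character it names."""
    return _CODE.sub(lambda match: chr(int(match.group(1), 16)), text)


def ascii_to_utf8(data):
    """Convert ASCII bytes containing U+XXXX codes to UTF-8 bytes."""
    return unescape(data.decode("ascii")).encode("utf-8")
```

Text that does not match the pattern, such as a bare "U+34", passes
through unchanged.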

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]

Don't do drugs. Santa Claus is watching.
-- winamp.com






Re: SI/SO G0/G1 in linux console

2002-02-28 Thread Vasilis Vasaitis

On Thu, Feb 28, 2002 at 03:26:38PM +0100, Erika Pacholleck wrote:
 When doing some experimenting with the acm/screen maps I discovered
 some strange things, as to not-working and vice-versa-working. And
 I would need an advice where to hook in for fixing, please.
 
 system information: 
 - running linux-2.4.10, compiled with/against 2.4.5
 - kbd-1.06, ncurses-5.2
 state after system start:
 - with matrox fb
 - with keymap de-latin-nodeadkeys
 - without font loading/changing
 - and $TERM=linux
 - and dumpkeys saying 0x000f=control_o 0x000e=control_n
 - and inputrc allowing 8-bits
 - and locale charmap as ISO-8859-1

  So far so good.

 expected behaviour from this:
 - G0 set to default latin1 and G1 set to VT100 graphics
 - typing [Ctrl]+[o] sending Control_o (switch to G0)
 - typing [Ctrl]+[n] sending Control_n (switch to G1)

  Expected behaviour where? You can't expect terminal applications (I assume
you were using the shell here) to just echo every character they receive to
the terminal. That defeats the whole point of keyboard handling. For
example, a lot of programs, upon receiving ^L, will redraw the screen, and
not send that character to the terminal. A notable exception to this is
cat(1), which will just echo at its output whatever it receives in its
input. In fact, cat is very useful for tests like this.

 here is what happens in reality:
 docs      hit keys    echo -e orders
 SI=G0=^O  [Ctrl]+[o]  \\033o  \\x1bo  \\x0e  \\016
 SO=G1=^N  [Ctrl]+[n]  \\033n  \\x1bn  \\x0f  \\017
 results   negative    neg.    neg.    both switched
 
 Ctrl+o looks like CR is sent; Ctrl+n beeps

  As I said, just because you press ^N or ^O at the bash prompt, doesn't
mean that it will echo them to the terminal. Use cat and it will do what you
expect.

 \\033o  \\x1bo and according n's don't show any change

  These sequences are ESC n and ESC o (the escape character followed by a
letter), completely different from the control characters you are talking
about.

 \\x0e  \\016 look like switched to G1 VT100 graphics
 \\x0f  \\017 look like switched to G0 latin1
 
 So only echoing the hex/oct values seem to get at least the G0/G1 maps
 -- only that they now are just the other way round than it should be.

  Actually, the values you are using should be the other way round, as ^N is
ASCII 14 and ^O is ASCII 15. As a rule of thumb, the ASCII value for ^X,
where X is any letter of the English alphabet, equals the position of that
letter in the alphabet.
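
  The rule of thumb is a one-line computation; here it is as a Python
sketch (ctrl is a made-up name):

```python
def ctrl(letter):
    """Return the ASCII code of Control+letter for a letter A to Z."""
    return ord(letter.upper()) - ord("A") + 1


SO = ctrl("n")  # shift out: switch to G1
SI = ctrl("o")  # shift in: switch back to G0
```

So \x0e (octal 016) is ^N and \x0f (octal 017) is ^O, which is why the
two rows of the table above appear to be swapped.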

 And there seems to be no program to report the status (like say, tty2:
 using G0, currently set to latin1) so I can only judge from what I see.

  Indeed, if there is a way I'm not aware of it.

 I guess the keymap will need change to make the key-hitting work the
 way it should be, but what do I check for the vice-versa 0e/0f signal?
 
 Thanks for your comments.

  To summarize, remember that ^N and ^O, like most other codes, change the
state of the terminal when they are output to it, not when they are input
from it.
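
  In other words, a program has to write the codes itself. A minimal
Python sketch, assuming a terminal whose G1 set holds the VT100
graphics characters (the helper name in_g1 is made up):

```python
SO = "\x0e"  # ^N: when output, the terminal switches to G1
SI = "\x0f"  # ^O: when output, the terminal switches back to G0


def in_g1(text):
    """Wrap text so that the terminal renders it with its G1 set."""
    return SO + text + SI
```

Writing in_g1("lqqqk") followed by a newline to such a terminal should
draw the top edge of a box instead of the letters themselves, since in
the VT100 graphics set l, q and k are line-drawing characters.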

-- 
Vasilis Vasaitis
[EMAIL PROTECTED]

Don't do drugs. Santa Claus is watching.
-- winamp.com

