Re: matching base character

2006-07-13 Thread Danilo Segan
On Monday at 10:58, Pádraig Brady wrote: > How does one compare characters using only primary weights? > > For example I would like "a á â" to be equal according to the locale. > I think that strcoll will treat them as equal using primary weight, > but then iterate again to compare with secondary
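
A minimal sketch of the "base character" comparison asked about above. It assumes Unicode canonical decomposition as a stand-in for the locale's primary weights, so it is not what strcoll itself does, and the names base_form and same_base are hypothetical:

```python
import unicodedata


def base_form(text):
    """Return text with combining marks removed after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every code point that has a non-zero combining class (accents etc.).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_base(a, b):
    """Compare two strings while ignoring accents (an approximation only)."""
    return base_form(a) == base_form(b)
```

With this approximation "a", "á" and "â" compare equal, while letters that a particular locale treats as distinct at the primary level would need real collation data instead.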

Re: Experiments with classical Greek keyboard input

2006-05-10 Thread Danilo Segan
Hi Jan, Today at 13:02, Jan Willem Stumpel wrote: >> key.type = "THREE_LEVEL"; >> >> key {[], [ dead_tilde, dead_diaeresis, dead_macron ]}; >> key {[], [ dead_iota, VoidSymbol, dead_breve ]}; >> >> key {[], [ dead_acute, dead_horn ]}; >> key {[], [ dead_grave, dead_ogonek

Re: Experiments with classical Greek keyboard input

2006-05-10 Thread Danilo Segan
Today at 1:00, Vasilis Vasaitis wrote: > For the others to work, you need to have at least > LC_CTYPE=el_GR.UTF-8. In my system, with LANG=el_GR.UTF-8, everything > is working as it should. Keep in mind that for GTK+ applications you > also need GTK_IM_MODULE=xim defined (or else you have to rig

Re: question on Linux UTF8 support

2006-02-01 Thread Danilo Segan
Yesterday at 15:42, 問答無用 wrote: > You can prevent just by only having UTF-8 locales on the machine. GNU systems allow users to install their own locales wherever they wish (even in $HOME) by setting environment variable LOCPATH (and I18NPATH for locale source files themselves). Basically, you wa

Re: question on Linux UTF8 support

2005-08-06 Thread Danilo Segan
Last Wednesday at 20:36, Bruno Haible wrote: >> Or even >> worse, what if administrator provides some dirs for the user in an >> encoding different from the one user wants to use? >> >> Eg. imagine having a global "/Müsik" in ISO-8859-1, and user desires >> to use UTF-8 or ISO-8859-5. > > For this

Re: question on Linux UTF8 support

2005-08-03 Thread Danilo Segan
Hi Bruno, Today at 17:24, Bruno Haible wrote: > This will mess up users who have their LC_CTYPE set to a non-UTF-8 encoding. > It is weird if a user, in an application, enters a new file name "Süß", > and then in a terminal, the filename appears as "Süà " (wow, it even > hangs my xterm!). Oh, i

Re: question on Linux UTF8 support

2005-07-31 Thread Danilo Segan
Yesterday at 13:07, praveen kumar sivapuram wrote: > I am developing an application that needs to know the format of > the file name. Based on the documents referenced on the web, I > understood that the Linux file system does not impose any specific > format for the file system. Users can create fi

Re: How to detect the encoding of a string?

2005-06-02 Thread Danilo Segan
Hi Simos, It's completely impossible to detect which of the 8-bit encodings is used without any further knowledge (for instance, of the language in use). To be able to actually decide for one of the many 8-bit encodings suitable for a language, one would also need to know language properties (
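
To illustrate the point, here is a small sketch, assuming Python's built-in codecs: UTF-8 can be rejected by validation, but the same bytes decode without error under more than one 8-bit encoding. The helper names is_valid_utf8 and decodable_as are hypothetical:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decodable_as(data: bytes, encodings):
    """Return the subset of encodings under which data decodes without error."""
    accepted = []
    for name in encodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        accepted.append(name)
    return accepted


sample = "Süß".encode("iso-8859-1")
# Both 8-bit encodings accept the bytes, but they yield different text.
candidates = decodable_as(sample, ["iso-8859-1", "iso-8859-5"])
```

Since every candidate 8-bit encoding accepts the bytes, choosing between them needs outside knowledge such as the language of the text.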

Re: linux and utf-8

2005-02-05 Thread Danilo Segan
Please, don't try to start flames. Thank you. Today at 15:24, srintuar wrote: > Try Fedora Core 3, it's the only way to fly. > > Stay away from Emacs. (I've been editing UTF-8 text in > vim for several years now, no problem) Use GNOME > Terminal as your shell. (Stay away from the Linux > console i

Re: Weird behaviour of emacs

2005-02-04 Thread Danilo Segan
Today at 12:08, David Sumbler wrote: > If I run Emacs (version 21.3) under X, (a) display correctly, but (b) > appear as hollow boxes. On the other hand, if I run Emacs, as I > usually do, on a console (unicode enabled), (b) display correctly, and > (a) appear as the "replacement character" 0xfff

Re: Unicode and the Linux console (again)

2005-01-19 Thread Danilo Segan
Hi Richard, Today at 13:11, Richard Jones wrote: > On Fri, Jan 14, 2005 at 02:58:27PM -0500, Behdad Esfahbod wrote: >> "A Japanese system administrator should be able to read the error >> messages in Japanese." > > Then the Japanese administrator doesn't understand the error message, > so they cu

Re: Unicode and the Linux console (again)

2005-01-15 Thread Danilo Segan
Yesterday at 22:24, Pablo Saratxaga wrote: > Is there some fallback very raw mechanism in case that "console" fails > to load (at least to be able to say "failed to load console server" > (I think "server" is the used terminology in the Hurd)? Yes, there's a built-in raw console in GNU Mach I thi

Re: Unicode and the Linux console (again)

2005-01-15 Thread Danilo Segan
Today at 2:12, Simon wrote: > However, how does this user-space software for console look like? > What exactly should it do? What's to remove/modify from the kernel? > Any proof-of-concept code that can show the validity of direction? As I said already, look at console-server and console-client i

Re: Unicode and the Linux console (again)

2005-01-14 Thread Danilo Segan
Today at 20:58, Behdad Esfahbod wrote: > I would very much like to agree with you that implementing Unicode in > the kernel is not feasible at all, but I would much prefer to see the > console refactored and implemented as a user-space application > instead, which then supports all the good stuff. The reason for that

Re: Where did my UTF encodings go?

2005-01-06 Thread Danilo Segan
Hi Peter, Today at 18:25, Peter B. Steiger wrote: > Can anyone suggest a runtime or build configuration option I can change > to bring Unicode back? I don't need keyboard support, just font > displays in X. I can send anything you need to see out of my /etc/X11 > or other /etc files (except gro

Re: Unicode: endpoint of evolution of encodings?

2004-11-19 Thread Danilo Segan
Today at 12:52, Pablo Saratxaga wrote: >> encodings). I want to type "letters", and display it using any of >> the scripts simply by changing a font. I'm native Serbian, and most >> native Serbian speakers tend to think of it as a display property (you > > Do they? > Non-native names are written

Re: Unicode: endpoint of evolution of encodings?

2004-11-18 Thread Danilo Segan
Hi Pablo, Today at 23:17, Pablo Saratxaga wrote: > It is indeed a good feature to do so; > but the *smallest* unit for which language information is useful > are *words*, not characters/letters. Indeed. But how do you achieve that? It's easiest to have characters hold language information. O

Re: questions with combining characters

2004-11-18 Thread Danilo Segan
Hi Henry, Yesterday at 15:21, Henry Spencer wrote: > Do you say "a-acute" or "acute-a"? Imagine asking a German: Do you say twenty-one or one-and-twenty? :) Ein-und-zwanzig, natürlich. :) Not arguing here on any side, just pointing out that there're always exceptions :) Cheers, Danilo --

Re: Unicode: endpoint of evolution of encodings?

2004-11-18 Thread Danilo Segan
Hi Antoine, Yesterday at 13:37, Antoine Leca wrote: > srintuar wrote: >> FWIW, I'd assert that "j" in Spanish is not the same thing as >> "j" in English (and that one is easily proved), apart from them being >> represented with the same *glyph*. > > You picked (certainly involuntarily) a very ins

Re: Unicode: endpoint of evolution of encodings?

2004-11-17 Thread Danilo Segan
Hi, Please don't use HTML mail, I have problems reading it, and it messes up encoding for me (since I have to use sort of "view source"). Today at 1:31, srintuar wrote: > This may be more of a practical issue: for some scripts such as Korean, > representing every possible character and partial c

Unicode: endpoint of evolution of encodings? (was Re: gcc and utf-8 source)

2004-11-16 Thread Danilo Segan
Hi, Today at 13:44, srintuar wrote: > As for Serbian, I don't think that really has much to do with Unicode > itself. You could apply a special folding algorithm when doing > searches in a Serbian context, but I don't think you would want to make > the script ambiguous. I'd rather make script amb

Re: gcc and utf-8 source

2004-11-15 Thread Danilo Segan
Today at 13:03, Bruno Haible wrote: > srintuar wrote: >> > 1) For printf("%s\n", "Schöne Grüße"); >> ... >> Being that UTF-8 is sort of an endpoint in the evolution of encodings, >> I also consider option 1 to be perfectly valid. > > I would be careful with such statements. We don't know what

Re: gcc and utf-8 source

2004-11-15 Thread Danilo Segan
Today at 13:09, Egmont Koblinger wrote: > I agree with you, and though I haven't thoroughly read the manpage, I'm > pretty sure that gcc does this. gcc is, as far as I see, the one and only > gnu project that is maintained correctly and the developers know where > they're going, they have systemat

Re: Unicode Font Guide For Free/Libre Open Source Operating Systems Announcement

2004-11-05 Thread Danilo Segan
Hi Edward, Yesterday at 20:55, Edward H. Trager wrote: > I hope to maintain and expand this resource in the future so that > it can serve as a hub for those searching for high-quality, legally-downloadable > fonts for various scripts for use on their Open Source-based computers. You can find Ver

Re: Unicode Keyboard Input Linux

2004-06-15 Thread Danilo Segan
Today at 20:13, Elvis Presley wrote: > > I haven't been able to determine exactly what vmware does from their website, > too proprietary, too hush-hush, but I assume they write VxDs which map the > Linux kernel to the Windows VMM, and the real hardware. Someone once told me > their product ran on t

Re: grep is horribly slow in UTF-8 locales

2003-11-08 Thread Danilo Segan
Markus Kuhn <[EMAIL PROTECTED]> writes: > $ grep --version > grep (GNU grep) 2.5.1 This doesn't happen with: $ grep --version grep (GNU grep) 2.4.2 $ LC_ALL=POSIX time grep XYZ test.txt Command exited with non-zero status 1 0.03user 0.07system 0:00.36elapsed 27%CPU (0avgtext+0avgdata 0maxreside

Re: Linux console internationalization

2003-08-14 Thread Danilo Segan
Tuesday, 12 August 2003, 19:22:23 CEST, Beni Cherniavsky wrote: - This could be approximated at a low level by not remapping keys with modifiers and I'm gonna do precisely this when I learn XKB). I've tried to do something along these lines (see srpski.org/dunav, it's a Serb

Re: [Translation-i18n] Proposal for declinations in gettext

2003-06-15 Thread Danilo Segan
Bruno Haible wrote: Danilo Segan wrote: The usual practice among English-speaking programmers is to "compose" strings out of smaller parts. You need to educate the programmer to use entire sentences. You can refer them to the gettext documentation, section "Preparing Trans

Re: Proposal for declinations in gettext

2003-06-14 Thread Danilo Segan
Hi, Edward H Trager wrote: Why not just use another index for plural forms instead of "msgid_plural"? msgid and msgid_plural are used for "default" strings (English) which are to be translated, and translations which can contain arbitrary number of plural-forms are ATM described as: msgstr[0

Re: Proposal for declinations in gettext

2003-06-14 Thread Danilo Segan
Miloslav Trmac wrote: Approach with [context] markers instead of format strings might work for many languages, but it wouldn't work for all -- actually, it would be wrong in some. So, I believe this kind of context information belongs in comments-to-translators, which xgettext also extracts wi

Re: Proposal for declinations in gettext

2003-06-13 Thread Danilo Segan
Miloslav Trmac wrote: Hello, On Fri, Jun 13, 2003 at 10:14:25PM +0200, Danilo Segan wrote: msgid "king" msgstr<0> "kralj" msgstr<3> "kralja" msgstr<5> "kraljem" msgid "move %s" msgstr "premesti %<3>s" ,
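
The quoted msgstr<N> and %<N>s notation can be emulated roughly as follows. This is only a sketch of the idea under stated assumptions, not gettext behaviour: the in-memory CATALOG, the functions translate and format_declined, and the rule that an unnumbered %s means form 0 are all hypothetical.

```python
import re

# Hypothetical catalog: msgid -> {declension index: translated form},
# filled with the forms from the quoted example.
CATALOG = {
    "king": {0: "kralj", 3: "kralja", 5: "kraljem"},
    "move %s": {0: "premesti %<3>s"},
}

# Matches a plain "%s" or an indexed "%<N>s" placeholder.
_PLACEHOLDER = re.compile(r"%(?:<(\d+)>)?s")


def translate(msgid, form=0):
    """Look up one declension form, falling back to form 0 or the msgid."""
    forms = CATALOG.get(msgid)
    if forms is None:
        return msgid
    return forms.get(form, forms[0])


def format_declined(fmt_msgid, arg_msgid):
    """Translate the format, then insert the argument in the requested form."""
    template = translate(fmt_msgid)

    def substitute(match):
        index = int(match.group(1) or 0)
        return translate(arg_msgid, index)

    return _PLACEHOLDER.sub(substitute, template)
```

Note that in this sketch the argument is passed as an untranslated msgid, because the required form is only known once the translated format string has been looked up.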

Re: Proposal for declinations in gettext

2003-06-13 Thread Danilo Segan
[EMAIL PROTECTED] wrote: I'm curious how you would implement this. If I call the following in some preexisting application: printf( gettext("Please move %s."), gettext("king") ); the order of operations is: gettext("king"); gettext("Please move %s."); printf( %s , %s ); So how would the first g

Re: [Translation-i18n] Proposal for declinations in gettext

2003-06-13 Thread Danilo Segan
Veronica Loell wrote: >Are you talking about machine translation here? From my perspective as a >computational linguist this is not something that should be part of gettext, rather >in the tools that use gettext or the tools you use to work with gettext material. > > Nope, I'm talking about ma

Proposal for declinations in gettext

2003-06-13 Thread Danilo Segan
Hi, first, sorry for cross-posting (some of you will receive multiple messages :-(). I'd like to propose a simple gettext extension which would work at least for Serbian, but I hope it would work for many other languages. *Background:* Serbian language has 7 declinations of a word (nouns, prono