Re: patch: Enabling utf-8 hangul input.

Bram Moolenaar Sun, 29 May 2011 03:33:26 -0700

Shawn Y.H. Kim wrote:

> >> In response to the following comment made by Bram on Aug 2, 2007:
> >> (can be viewed at 
> >> http://groups.google.com/group/vim_dev/browse_thread/thread/3b73a504c77ba803/)
> >> 
> >>> I hesitate removing the Hangul support without knowing for sure that it
> >>> is not needed.  Browsing through the messages I do see remarks that it
> >>> might still be useful to a few people.
> >>> 
> >>> Perhaps the Hangul support can be changed to also work for UTF-8?
> >> 
> >> I (finally) made a patch that enables the hangul-input module to work
> >> with UTF-8.
> > 
> > Thanks.  I'm glad to finally see this implemented.
> > It still needs some work though.
> > 
> >> Finally, hg diff:
> >> ... It is too long, but I cannot find a way to attach a file, so here
> >> goes the diff:
> > 
> > Please do send this as an attachment.  Long lines got wrapped, making it
> > impossible to apply.
> > 
> > The change to getchar.c should not be there.  Perhaps you are not
> > encoding the strings that go into the input buffer correctly?  A CSI
> > should be put there as three characters: CSI KS_EXTRA KE_CSI.
> > I guess fix_input_buffer() can be used in push_raw_key().
> 
> 1. I looked into fix_input_buffer() and used it to "fix" the hangul input
> buffer, but fix_input_buffer() did not do anything.
> It escapes CSI into the K_SPECIAL KS_EXTRA KE_CSI sequence
> only when the first byte of the input buffer is CSI.
> But the hangul codes in question have 0x9b in the middle or at the end,
> e.g. EB A0 9B,
> so the function never gets a chance to "fix" the buffer.


I think that when CSI appears halfway through a UTF-8 byte sequence it
doesn't need to be escaped.  Only when it's at the start of a character
does it need to be escaped, to avoid it being interpreted as a special
key byte sequence.
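For illustration: in valid UTF-8, 0x9b (binary 10011011) always has the 10xxxxxx form of a continuation byte, so in a sequence such as EB A0 9B (U+B81B) it never starts a character.  Below is a minimal sketch, assuming UTF-8 input, that walks a buffer character by character to decide whether a given CSI byte is at the start of a character and would therefore need escaping under this rule.  The helper names are hypothetical, not Vim functions, and counting a stray byte that cannot start a sequence as a one-byte character is an assumption made so the scan always advances.

```c
#include <assert.h>
#include <stddef.h>

/* CSI is the byte 0x9b discussed in this thread. */
#define CSI 0x9b

/*
 * Hypothetical helper: length of a UTF-8 sequence judged from its first
 * byte.  A byte that cannot start a sequence (e.g. a stray 0x80-0xbf
 * continuation byte) is counted as a one-byte character, an assumption
 * that keeps the scan below moving forward.
 */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if (lead >= 0xc0 && lead < 0xe0)
        return 2;
    if (lead >= 0xe0 && lead < 0xf0)
        return 3;
    if (lead >= 0xf0 && lead < 0xf8)
        return 4;
    return 1;
}

/*
 * Hypothetical helper: return 1 if buf[pos] is a CSI byte that starts a
 * character (so it would need escaping under the rule above), 0 if it is
 * not CSI or sits inside a multi-byte sequence.
 */
static int csi_needs_escape(const unsigned char *buf, size_t len, size_t pos)
{
    size_t i = 0;

    assert(pos < len);
    if (buf[pos] != CSI)
        return 0;
    /* Hop from character start to character start until we reach or
     * pass pos. */
    while (i < pos)
        i += utf8_seq_len(buf[i]);
    return i == pos;
}
```

For the bytes EB A0 9B, csi_needs_escape(buf, 3, 2) returns 0 because the scan jumps from offset 0 straight past offset 2; for a lone 0x9b after an ASCII byte it returns 1.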

> 2. 0x9b in hangul codes is valid. I encoded the strings correctly:
> 0x9b (CSI) is part of the UTF-8 encoded hangul code.

The encoding in the input buffer is a bit weird: it includes special
byte sequences, so what the user types has to be escaped to avoid it
being mistaken for one of those sequences.
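A minimal sketch of that kind of escaping, assuming the three-byte form CSI KS_EXTRA KE_CSI mentioned earlier in the thread.  The function names are hypothetical, KS_EXTRA and KE_CSI are given placeholder values (the real ones are defined in Vim's own source), and this version escapes every CSI byte regardless of where it falls in a character.

```c
#include <assert.h>
#include <stddef.h>

/* CSI is the byte 0x9b.  KS_EXTRA and KE_CSI are placeholder values for
 * this sketch only; Vim defines the real ones in its own headers. */
#define CSI      0x9b
#define KS_EXTRA 0xfd
#define KE_CSI   0x58

/*
 * Hypothetical helper: copy src[0..len) to dst, writing every CSI byte as
 * the three bytes CSI KS_EXTRA KE_CSI.  dst must have room for 3 * len
 * bytes.  Returns the number of bytes written.
 */
static size_t escape_csi(const unsigned char *src, size_t len,
                         unsigned char *dst)
{
    size_t out = 0;

    assert(dst != NULL);
    for (size_t i = 0; i < len; i++) {
        dst[out++] = src[i];
        if (src[i] == CSI) {
            dst[out++] = KS_EXTRA;
            dst[out++] = KE_CSI;
        }
    }
    return out;
}

/*
 * Hypothetical helper, the reverse of escape_csi(): collapse each
 * CSI KS_EXTRA KE_CSI back into a single CSI byte.  Any other sequence
 * starting with CSI is copied unchanged; a real reader would interpret
 * it as a special key code instead.
 */
static size_t unescape_csi(const unsigned char *src, size_t len,
                           unsigned char *dst)
{
    size_t out = 0;

    assert(dst != NULL);
    for (size_t i = 0; i < len; i++) {
        dst[out++] = src[i];
        if (src[i] == CSI && i + 2 < len
                && src[i + 1] == KS_EXTRA && src[i + 2] == KE_CSI)
            i += 2;             /* skip the two escape bytes */
    }
    return out;
}
```

Escaping EB A0 9B yields five bytes, EB A0 9B KS_EXTRA KE_CSI, and unescaping them gives back the original three.  A reader can tell this escaped CSI apart from a genuine special key sequence because KS_EXTRA KE_CSI follows it.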

> 3. Question: I guess that CSI is some kind of special character that
> indicates the subsequent characters have a special meaning, right? Then,
> in GUI mode, in what case can a user generate a CSI code?
> If I knew what CSI does and when it is generated, it would be
> much easier for me to do the job.

In the GUI it's a bit different: we don't read raw bytes from what the
user types, but create a byte stream from events, e.g. in
key_press_event() in src/gui_gtk_x11.c.
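A sketch of the encoding step a GUI key handler has to perform, assuming the toolkit hands over the typed character as a Unicode code point.  utf8_encode() is a hypothetical helper, not the code in key_press_event().  Encoding U+B81B gives EB A0 9B, with 0x9b as the last byte, so depending on the escaping rule chosen the result may still need CSI handling before it is added to the input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode code point cp as UTF-8 into out, which must
 * hold at least 4 bytes.  Returns the number of bytes written, or 0 for a
 * surrogate or a value above U+10FFFF.
 */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xc0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp >= 0xd800 && cp <= 0xdfff)
        return 0;               /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xe0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (cp & 0x3f));
        return 3;
    }
    if (cp <= 0x10ffff) {
        out[0] = (unsigned char)(0xf0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[3] = (unsigned char)(0x80 | (cp & 0x3f));
        return 4;
    }
    return 0;
}
```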

> Now I'm working on the advice you gave before :-)
> As soon as you shed some light on the secret of CSI, I will work on it.
> 
> Looking forward to your kind advice.

I hope this helps.

-- 
hundred-and-one symptoms of being an internet addict:
120. You ask a friend, "What's that big shiny thing?" He says, "It's the sun."

 /// Bram Moolenaar -- http://www.Moolenaar.net                         \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
