Re: patch: Enabling utf-8 hangul input.

Bram Moolenaar Sun, 29 May 2011 09:31:51 -0700

Shawn Kim wrote:

> >>>> In response to the following comment made by Bram on Aug 2, 2007:
> >>>> (can be viewed at 
> >>>> http://groups.google.com/group/vim_dev/browse_thread/thread/3b73a504c77ba803/)
> >>>> 
> >>>>> I hesitate removing the Hangul support without knowing for sure that it
> >>>>> is not needed.  Browsing through the messages I do see remarks that it
> >>>>> might still be useful to a few people.
> >>>>> 
> >>>>> Perhaps the Hangul support can be changed to also work for UTF-8?
> >>>> 
> >>>> I made (finally) a patch that enables hangul-input module to work for
> >>>> UTF-8.
> >>> 
> >>> Thanks.  I'm glad to finally see this implemented.
> >>> It still needs some work though.
> >>> 
> >>>> Finally, hg diff:
> >>>> ... It is too long. But I cannot find a way to attach a file, so, here
> >>>> goes the diff:
> >>> 
> >>> Please do send this as an attachment.  Long lines got wrapped, making it
> >>> impossible to apply.
> >>> 
> >>> The change to getchar.c should not be there.  Perhaps you are not
> >>> encoding the strings that go into the input buffer correctly?  A CSI
> >>> should be put there as three characters: CSI KS_EXTRA KE_CSI.
> >>> I guess fix_input_buffer() can be used in push_raw_key().
> >> 
> >> 1. I took a look into fix_input_buffer() and used it to "fix" hangul input 
> >> buffer.
> >> But fix_input_buffer() function did not do anything.
> >> It escapes CSI into K_SPECIAL KS_EXTRA KE_CSI sequence 
> >> only when the first byte of the input buffer is CSI.
> >> But the hangul codes in question have 0x9b in the middle or at the end,
> >> e.g. EB A0 9B.
> >> The function does not have any chance to "fix" the buffer.
> > 
> > I think that when CSI appears halfway through a utf-8 byte sequence it
> > doesn't need to be escaped.  Only when it's at the start of a
> > character does it need to be escaped, to avoid it being interpreted as
> > a special key byte sequence.
> 
> Yes, I also believe the 0x9b in the middle of an encoded byte sequence
> does not need to be escaped. It's part of a valid code.


I was wrong, it does need to be escaped.  But for the GUI this happens
early on, not in fix_input_buffer().  See key_press_event(), first use
of CSI.
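
To make the problem concrete, here is a small standalone sketch (not Vim
code; the demo_* names are made up for illustration).  It encodes the
code point U+B81B, which corresponds to the EB A0 9B example quoted
above, and shows that its last UTF-8 byte has the same value as CSI:

```c
#include <stddef.h>

/* CSI byte value as discussed in this thread. */
#define DEMO_CSI 0x9b

/*
 * Encode a code point in U+0800..U+FFFF as three UTF-8 bytes.
 * Returns 3 on success, 0 when "cp" is outside that range.
 * Hypothetical helper: surrogates are not rejected, to keep it short.
 */
static size_t demo_utf8_encode3(unsigned int cp, unsigned char out[3])
{
    if (cp < 0x800 || cp > 0xFFFF)
        return 0;
    out[0] = (unsigned char)(0xE0 | (cp >> 12));         /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F)); /* 10xxxxxx */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));        /* 10xxxxxx */
    return 3;
}

/* Return the index of the first byte equal to CSI, or -1 if none. */
static int demo_find_csi(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; ++i)
        if (buf[i] == DEMO_CSI)
            return (int)i;
    return -1;
}
```

Calling demo_utf8_encode3(0xB81B, buf) fills buf with EB A0 9B, and
demo_find_csi(buf, 3) then reports the trailing byte at index 2.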

> >> 2. 0x9b in hangul codes is valid code. I encoded the strings correctly. 
> >> 0x9b(CSI) is part of utf-8 encoded hangul code.
> > 
> > The encoding in the input buffer is a bit weird, it includes special
> > byte sequences, and then what the user types has to be escaped to avoid
> > that byte sequence being handled in the wrong way.
> > 
> >> 3. Question: I guess that the CSI is some kind of special character that
> >> indicates that the subsequent characters have some special meaning, right?
> >> Then, in GUI mode, in what case can a user generate the CSI code?
> >> If I knew what the CSI does and when it is generated, it would be
> >> much easier for me to do the job.
> > 
> > In the GUI it's a bit different, we don't read raw bytes from what the
> > user types, but create a byte stream from events.  E.g. in
> > src/gui_gtk_x11.c in key_press_event().
> 
> The hangul input automaton is started from THAT routine.
> The following is the call stack when the hangul input automaton is running:
> 
> src/gui_gtk_x11.c: key_press_event()
>  --> src/ui.c: add_to_input_buf()
>  --> src/hangulin.c: hangul_input_process() (the automaton)
> 
> or 
> 
> src/gui_x11.c: gui_x11_key_hit_cb()
>  --> src/ui.c: add_to_input_buf()
>  --> src/hangulin.c: hangul_input_process() (the automaton)
> 
> hangul_input_process() creates the hangul code from what the user has
> typed and then puts it in the "inbuf" buffer by calling push_raw_key().
> 
> Later, "inbuf" is processed by vgetc() in src/getchar.c.  That function
> finds the 0x9b (CSI) in the middle of the code, and the part of vgetc()
> that I commented out interprets the 0x9b as a special code and modifies
> "inbuf".  Here it should not be interpreted as a special key; the bytes
> should be preserved as they are.
> 
> Am I missing something? 
> And, what should I do to avoid interpreting 0x9b as CSI?
> 
> Please consider that the hangul input routine is meaningful only when
> the MULTIBYTE and GUI options are enabled.

You need to do the same thing as what happens in the loop in
key_press_event() to escape the CSI characters.

Also see the comment above add_to_input_buf().
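
Roughly, the escaping amounts to something like the sketch below.  This
is only an illustration under assumptions, not the code from
key_press_event(): the function name and the DEMO_* constants are made
up, and the two placeholder values must be replaced by the real
KS_EXTRA and KE_CSI definitions from the Vim sources.  Every CSI byte is
followed by the two extra bytes, so the result reads CSI KS_EXTRA KE_CSI:

```c
#include <stddef.h>

#define DEMO_CSI      0x9b
#define DEMO_KS_EXTRA 0xfd  /* placeholder, use Vim's KS_EXTRA */
#define DEMO_KE_CSI   0x01  /* placeholder, use Vim's KE_CSI */

/*
 * Copy "in" to "out", following every CSI byte with the two extra bytes.
 * "out" must have room for 3 * len bytes.
 * Returns the number of bytes written to "out".
 */
static size_t demo_escape_csi(const unsigned char *in, size_t len,
                              unsigned char *out)
{
    size_t i;
    size_t n = 0;

    for (i = 0; i < len; ++i)
    {
        out[n++] = in[i];
        if (in[i] == DEMO_CSI)
        {
            /* Mark the CSI as literal, wherever it sits in the sequence. */
            out[n++] = DEMO_KS_EXTRA;
            out[n++] = DEMO_KE_CSI;
        }
    }
    return n;
}
```

For the three bytes EB A0 9B this produces five bytes: EB A0 9B followed
by the two placeholder bytes.  The hangul code would apply the same idea
to what it passes on from push_raw_key().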

-- 
hundred-and-one symptoms of being an internet addict:
121. You ask for e-mail addresses instead of telephone numbers.

 /// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
