Re: Input Method Development

2008-02-08 Thread Jeremiah Flerchinger


>> this uim has an embedded scheme interpreter... I don't like that too
>> much for an embedded device...
>
> hmm, I have no idea: is it big, slow?

It's apparently used on Linux Zaurus.


> We could adapt the openmoko soft keyboard to interface with uim, and
> if the API is well designed, the IM module could be changed...

I'm not sure adapting to a soft keyboard would be required.  uim may
simply seize key presses & emit the appropriate UTF-8 key values.  Try
installing it on your desktop & trying it with a few soft keyboards.

> could someone update me on the differences between these kr input
> methods described in the doc?
>
> * Byeoru
> * Hangul (2-beol)
> * Hangul (3-beol)
> * Hangul (Romaja)

I have no clue. Were you intending this for the mailing list? I'm 
assuming so, but only saw this addressed to myself.


> Yeah, I know the patents problem with T9. But what about this one?

What one? uim is open-source, so there aren't patent issues (if that's
your question).
___
OpenMoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: Input Method Development

2008-02-07 Thread Sébastien Lorquet
(keeping [EMAIL PROTECTED] and communitylist in the loop)

Hehe,
This is totally understandable. :) We will explain it to you as best we
can.

If you want to make a Korean keyboard with a key for each character, then
you'll need a keyboard with...

(sit down before next line)

 ... 11,172 syllable blocks, i.e. U+AC00 to U+D7A3, plus 24 single letters,
or jamos!

see here:
http://en.wikipedia.org/wiki/Hangul
http://en.wikipedia.org/wiki/Hangul#Syllabic_blocks

Of course there is a special method.
The Korean keyboard layout is a plain QWERTY-style keyboard, with one jamo
(single letter) on each key.

The 11,172 syllables are made of combined jamos, and a state machine can be
used to track which keys have been pressed and how they have to be
combined. See my previous messages for an explanation.
When I searched the Internet for this, I did not find much information,
so I did the research myself.
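
A minimal sketch of the arithmetic behind those numbers, assuming the standard Unicode layout of the Hangul Syllables block (19 initial consonants x 21 vowels x 28 finals, where "no final" counts as one of the 28). The names `compose`, `LEADS`, `VOWELS` and `TAILS` are made up for this illustration; it is not code from the prototype discussed in this thread.

```python
# Unicode Hangul syllable arithmetic (illustrative sketch).
S_BASE = 0xAC00                          # first syllable block
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # initials, vowels, finals (incl. none)

LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def compose(lead, vowel, tail=""):
    """Combine an initial, a vowel and an optional final into one syllable."""
    l = LEADS.index(lead)
    v = VOWELS.index(vowel)
    t = TAILS.index(tail)
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


if __name__ == "__main__":
    print(L_COUNT * V_COUNT * T_COUNT)               # 11172
    print(hex(S_BASE + 11172 - 1))                   # 0xd7a3
    print(compose(LEADS[18], VOWELS[0], TAILS[4]))   # initial 18, vowel 0, final 4
```

The point of the sketch is only that a syllable block never has to be stored as a key: it can be computed from at most three jamo indices.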

About your soft keyboard, I think the problem can be solved with the use of
plugins, i.e. a method that takes keystrokes and outputs characters. For
European scripts the routine will be quite trivial, but for other scripts
such a routine is really needed. With such plugins, we can also imagine
support for other complex scripts such as Chinese, Indic scripts or
whatever. An input method will then be made of an XML file describing the
layout, and a .so to hold the key-to-character translation algorithm.
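
One way to picture such a plugin, as a rough sketch only: every name here (`InputPlugin`, `feed`, `PassThrough`, `DeadKeyAccent`, `type_text`) is hypothetical and not part of any existing OpenMoko or gtkeypad API. A plugin receives one keystroke at a time and returns the text that is now final plus the text still being composed.

```python
from typing import Protocol, Tuple


class InputPlugin(Protocol):
    """Hypothetical plugin contract: one key in, (committed, preedit) out."""

    def feed(self, key: str) -> Tuple[str, str]: ...


class PassThrough:
    """The 'quite trivial' case: every key is immediately a character."""

    def feed(self, key):
        return key, ""


class DeadKeyAccent:
    """Toy stateful plugin: an apostrophe followed by a vowel gives an acute accent."""

    ACUTE = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}

    def __init__(self):
        self.pending = False

    def feed(self, key):
        if self.pending:
            self.pending = False
            # Unknown combination: give back the apostrophe and the key as typed.
            return self.ACUTE.get(key, "'" + key), ""
        if key == "'":
            self.pending = True
            return "", "'"          # nothing final yet, show the dead key
        return key, ""


def type_text(plugin, keys):
    """Drive a plugin the way a soft keyboard would, one key per press."""
    out, preedit = "", ""
    for key in keys:
        committed, preedit = plugin.feed(key)
        out += committed
    return out + preedit


if __name__ == "__main__":
    print(type_text(PassThrough(), "abc"))
    print(type_text(DeadKeyAccent(), "caf'e"))
```

A Korean plugin would follow the same contract, only with a larger state machine behind `feed`.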

Sebastien

2008/2/7, Jeremiah Flerchinger [EMAIL PROTECTED]:

> Please forgive me for not putting much thought into the problem before.
> I'm really only familiar with European languages & none of this work is
> in my area of expertise.  I have started some reading on IME and other
> input methods.  I'll have to think about the issue and what extensions
> would eventually need to be made.
>
> Jeremiah Flerchinger wrote:
>> The idea behind what I started is that you can make a soft keyboard
>> with any number of rows & keys.  Each key could be assigned any
>> arbitrary utf-8 value.  If more symbols are needed a file can be
>> created describing another keyboard layout with additional keys.
>> Maybe it's impractical to have a key for each character & place them
>> all into different keyboard layouts.  Any insight to the problem and
>> how keyboards are typically used/configured in Korea would be
>> appreciated.  Even though I would be unable to do localization, I
>> could try to rethink the current design to further meet international
>> needs.
>>
>> Sébastien Lorquet wrote:
>>> Hi,
>>>
>>> nicely done, this program is complementary with what I did!
>>>
>>> But a Korean IME engine will require additions to your program, it's
>>> not only a matter of xml layout.
>>> Maybe a plugin system that receives key presses and emits characters.
>>> Basically it's a state machine that will read latin letters and
>>> output Korean letters according to combinations. I have a small
>>> keypress stack and a ~60KB next-states file. For example, you
>>> type b a and you get a "ba" syllable as a single Unicode character.
>>>
>>> Not to mention we will need to add a Korean font on the Neo.
>>>
>>> I'm happy with this project. I'm OK to help, if needed.
>>>
>>> Sebastien
 




-- 
Sébastien LORQUET - 이세영 (李世榮)
Ingénieur ENSPG 2006 / ENSIMAG-ASI 2007


Re: Input Method Development

2008-02-07 Thread Sébastien Lorquet
Sorry for the imprecision, and thanks for the correction. I spoke too fast.

The backtracking should be trivial to implement with my prototype. I'm
cleaning it up a little before giving it to you.

I'm also concerned about copyright. Is there any patent problem with this
input method, as there is with T9 (for mobiles)?

What I did is reverse engineering. I plan to release it under GPLv3 to be
protected from software patents. Will this be useful?
What do you think?

Sebastien.


Re: Input Method Development

2008-02-07 Thread joerg
On Thu, 7 February 2008, dda wrote:
> I agree that a system of plugins/callbacks could do fine, if it can
> handle resetting output: e.g. typing gks bkspc f would output
> successively:
> ㅎ -> 하 -> 한 -> 하 -> 할 [Unicode 0x1112, 0xd558, 0xd55c, 0xd558,
> 0xd560]. Being able to backtrack is quite necessary in this case.
>

The exact behaviour of BS, DEL, and cursor keys is yet to be defined to
make this a complete spec. What happens when I press BS e.g. 8 times? What
happens when I press LEFT DEL END BS?
Obviously BS should edit the last syllable, but can you edit completed
syllables (this needs a back-translation from Unicode to keystrokes!), and
how does cursor movement affect closing syllables?

just my 2 cents
j



Re: Input Method Development

2008-02-07 Thread Sébastien Lorquet
From what I recall, backspace starts by removing *letters* from the last
syllable, then entire syllables.
Moreover, hitting a directional key like LEFT or RIGHT, or any key that
changes the caret position, terminates composition of the current syllable.
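
A small sketch of that behaviour, under loud assumptions: the key map only covers the keys of the "gks bkspc f" example from this thread, the `Composer` class and its method names are invented here, and final consonants never migrate to the next syllable in this simplified version. The idea is to keep the keystrokes of the open syllable and re-render them after every edit.

```python
# Hypothetical mini key map: only the keys used in the example from the thread.
LEAD = {"g": 18, "s": 2, "f": 5}   # initial-consonant indices
VOWEL = {"k": 0}                   # vowel indices
TAIL = {"s": 4, "f": 8}            # final-consonant indices
JAMO = {"g": "\u314e", "s": "\u3134", "f": "\u3139"}  # lone consonants


def render(keys):
    """Text for the keys of one syllable, or None if they do not combine."""
    if not keys:
        return ""
    if keys[0] not in LEAD or len(keys) > 3:
        return None
    if len(keys) == 1:
        return JAMO[keys[0]]
    if keys[1] not in VOWEL:
        return None
    tail = 0
    if len(keys) == 3:
        if keys[2] not in TAIL:
            return None
        tail = TAIL[keys[2]]
    return chr(0xAC00 + (LEAD[keys[0]] * 21 + VOWEL[keys[1]]) * 28 + tail)


class Composer:
    def __init__(self):
        self.committed = ""  # finished syllables
        self.keys = []       # keystrokes of the syllable being composed

    def commit(self):
        self.committed += render(self.keys)
        self.keys = []

    def press(self, key):
        if render(self.keys + [key]) is not None:
            self.keys.append(key)        # the open syllable can be extended
            return
        self.commit()                    # cannot extend: close the syllable
        if render([key]) is not None:    # keys that cannot start one are dropped
            self.keys.append(key)

    def backspace(self):
        if self.keys:
            self.keys.pop()              # remove one letter of the open syllable
        else:
            self.committed = self.committed[:-1]  # remove a whole syllable

    def move_caret(self):
        self.commit()                    # caret movement ends composition

    def text(self):
        return self.committed + render(self.keys)
```

With this design a backspace inside the open syllable needs no back-translation from Unicode to keystrokes; only already committed syllables are deleted as a whole.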

In fact, when you type, the syllable currently being composed appears in
blinking inverted video, and a SPACE or the end of possible composition (i.e.
no more letters can be combined into a valid syllable) terminates the
sequence. At this point, the real syllable is displayed normally as a Unicode
character and composition starts again. I noticed that in some editors where
this blinking character cannot appear (e.g. in a cmd.exe console), the
current combination is displayed in the corner of the current window in a
small frame.

Writing a complete spec is a good idea ;)

seb


Re: Input Method Development

2008-02-06 Thread Sébastien Lorquet
Hi,

I have absolutely no idea how to develop input methods on OpenMoko, but one
day I tried to implement a Hangul IME for Windows CE. It's based on the one
from Windows XP (IME2002), i.e. you type tpqktmxldkd to get 세바스티앙. The UI
does not work yet, but the internal state machine is OK. It's written in C. I
don't know if you have worked on this part, but maybe it can help... (and what
about copyright issues? :( )

Sebastien


Input Method Development

2008-02-06 Thread dda
I just received my Neo1973, and since I require Korean input, I'd like
to do it myself... I'd like to do something similar to the soft qwerty
keyboard -- except that I'd use Korean letters instead of Latin ones.
I've poked around the wiki, but I haven't found any good pointers on
creating and adding input methods.

Any pointers would be greatly appreciated. Thanks

-- 
Didier



Re: Input Method Development

2008-02-06 Thread Jeremiah Flerchinger
I've done a little work on a soft qwerty keyboard written with GTK.  It
is reconfigurable, by use of an xml config file, and I was playing with
support for UTF-8.  I haven't had time in the last couple weeks to work
on it because of my work & school schedule.


A slightly out of date version is available in a tar.gz
http://projects.openmoko.org/projects/gtkeypad/

Feel free to look at it, create a Korean xml config file, and add to the 
source.  I have slightly newer code on my computer that I need to post.  
I also have to get everything into the CVS after Alessandro lurlano told 
me how to correct an authentication issue.


Jeremiah Flerchinger




Re: Input Method Development

2008-02-06 Thread dda
I'll have a look, thanks for that. Putting Korean into an xml file
won't be easy, as the Korean script is a little more complex to
implement than the Latin one. Let's take Sébastien's example:
tpqktmxldkd to get 세바스티앙.

When you type tp you get 세, but when you add q [tpq] you get 셉, not
세ㅂ. Only if and when you type k [tpqk] does the letter ㅂ move away
from 셉 to form a syllable with k/ㅏ. Basically you have the following:
t     -> ㅅ
tp    -> 세
tpq   -> 셉
tpqk  -> 세바
tpqkt -> 세밧
etc...
And it can get trickier.
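
The sequence above can be reproduced with a very small automaton. This is only a sketch under stated assumptions: the key map covers just the eight keys of the example, compound vowels and double final consonants (part of the "trickier" cases) are ignored, and the names `KEYS`, `block` and `display` are invented for this illustration.

```python
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = " ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # index 0 = no final

# Only the keys needed for "tpqktmxldkd" (assumed 2-beol style mapping).
KEYS = {"t": "ㅅ", "p": "ㅔ", "q": "ㅂ", "k": "ㅏ",
        "m": "ㅡ", "x": "ㅌ", "l": "ㅣ", "d": "ㅇ"}


def block(lead, vowel, tail):
    """Render one (possibly incomplete) syllable."""
    if vowel is None:
        return lead                      # a lone consonant is shown as a jamo
    t = TAILS.index(tail) if tail else 0
    return chr(0xAC00 + (LEADS.index(lead) * 21 + VOWELS.index(vowel)) * 28 + t)


def display(keys):
    """Return what should be on screen after typing `keys`."""
    done = ""
    lead = vowel = tail = None
    for key in keys:
        jamo = KEYS[key]
        if jamo in VOWELS:
            if lead and vowel is None:
                vowel = jamo                       # consonant + vowel
            elif tail:
                # The final consonant moves on to start the next syllable.
                done += block(lead, vowel, None)
                lead, vowel, tail = tail, jamo, None
            else:
                # Vowel with nothing to attach to: emit it on its own.
                if lead:
                    done += block(lead, vowel, tail)
                done += jamo
                lead = vowel = tail = None
        else:
            if lead is None:
                lead = jamo
            elif vowel and tail is None and jamo in TAILS:
                tail = jamo                        # tentatively a final consonant
            else:
                done += block(lead, vowel, tail)
                lead, vowel, tail = jamo, None, None
    if lead:
        done += block(lead, vowel, tail)
    return done


if __name__ == "__main__":
    for n in range(1, 6):
        print("tpqkt"[:n], "->", display("tpqkt"[:n]))
```

The important branch is the `elif tail:` one: a consonant is only tentatively a final, and the next vowel can claim it back.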

Also, although a qwerty kbd might be a start [as the Korean keyboard
is overlaid on top of qwerty anyway], I was also thinking of a smaller
keyboard, à la mobile phone kbd -- some of the keyboards available on
Korean phones are quite clever, and require fewer keys.

Anyway, I'll have a look and see what can be done.

Cheers,

-- 
dda


On Feb 6, 2008 11:27 PM, Jeremiah Flerchinger
[EMAIL PROTECTED] wrote:
> I've done a little work on a soft qwerty keyboard written with GTK.  It
> is reconfigurable, by use of an xml config file, and I was playing with
> support for UTF-8.  I haven't had time in the last couple weeks to work
> on it because of my work & school schedule.
>
> A slightly out of date version is available in a tar.gz
> http://projects.openmoko.org/projects/gtkeypad/
>
> Feel free to look at it, create a Korean xml config file, and add to the
> source.  I have slightly newer code on my computer that I need to post.
> I also have to get everything into the CVS after Alessandro lurlano told
> me how to correct an authentication issue.
>
> Jeremiah Flerchinger





Re: Input Method Development

2008-02-06 Thread Sébastien Lorquet
This is the first time I'm publishing a bit of my personal work... wow! :)

Here are my notes on implementing the algorithm that turns jamos into Hangul
syllables the way IME 2002 does, reverse engineered from a number of
keystrokes ;)

Consider them to be under CC-BY-SA, and not totally public domain, as I've
spent a bit of time on them ;)

--8<---
notes about understanding IME2002 algorithm
Sébastien Lorquet, July 2006
Published under CC-BY-SA
===
Now I have

(22:14)
[EMAIL PROTECTED] core]# make run

[EMAIL PROTECTED] core]# make run
./core ../../data/hantree.dat
File size is 58226 bytes , mapped @ 0xb7f25000
Init ok, please type chars, exit to quit

d
stack len=1, contents=[d]
currnode=0 , 33 childs
searching char [d]...qQwWeErRtTyuioOpPasd-found
state=3147 nxtoff=36828 out=
temp disp: [3147] ㅇ

h
you typed h, len is 1
stack len=2, contents=[dh]
currnode=36828 , 14 childs
searching char [h]...yuioOpPh-found
state=C624 nxtoff=37890 out=
temp disp: [C624] 오

t
you typed t, len is 1
stack len=3, contents=[dht]
currnode=37890 , 19 childs
searching char [t]...qwerRt-found
state=C637 nxtoff=0 out=C637
temp disp: [C637] 옷
output: [C637] 옷

h
you typed h, len is 1
stack len=4, contents=[dhth]
currnode=0 , 33 childs
searching char [h]...qQwWeErRtTyuioOpPasdfgh-found
state=3157 nxtoff=0 out=3157
temp disp: [3157] ㅗ
output: [3157] ㅗ



Here is the main bug of my basic method: I'm not using the stack correctly.
A single vowel is not allowed on its own, so I should have cut the syllable
earlier.
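
To make the bug concrete, here is a toy reconstruction of the naive walk seen in the log above. It is a guess at the idea only: the real hantree.dat layout (offsets, node format) is not described in these notes, and `Node`, `add` and `naive_walk` are names invented for this sketch. Each node holds the state to display; when a node has no children, its state is output and the walk restarts at the root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    state: str = ""                               # text shown while composing
    children: dict = field(default_factory=dict)  # next key -> Node


def add(root, keys, state):
    """Register the state reached after typing `keys`."""
    node = root
    for key in keys:
        node = node.children.setdefault(key, Node())
    node.state = state


def naive_walk(root, keys):
    """Follow the tree key by key; emit and restart whenever a leaf is reached."""
    outputs = []
    node = root
    for key in keys:
        node = node.children[key]      # KeyError if the key is unknown here
        if not node.children:          # leaf: nothing more can be combined
            outputs.append(node.state)
            node = root
    return outputs


# A tiny tree holding only the paths exercised by the log.
root = Node()
add(root, "d", "\u3147")      # lone initial consonant
add(root, "dh", "\uc624")     # consonant + vowel
add(root, "dht", "\uc637")    # consonant + vowel + final
add(root, "h", "\u3157")      # lone vowel, reachable from the root

if __name__ == "__main__":
    # Same keys as the log: the final consonant is emitted too early,
    # so the following vowel ends up alone.
    print(naive_walk(root, "dhth"))
```

The walk commits the three-key syllable as soon as it reaches a leaf, which is exactly why the next vowel is left without a consonant.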

What to do


PUSH d -> [d]
-> sequence found, possible output ㅇ

PUSH h -> [dh]
-> sequence found, possible output 오

PUSH t -> [dht]
-> sequence found, possible output 옷

PUSH h -> [dhth]
-> sequence NOT FOUND
-> last (ㅗ) is a vowel, cant be alone.
-> output 오
-> keep th in the stack
-> [th]
-> sequence found, possible output 소

PUSH f -> [thf]
-> sequence found, possible output 솔

PUSH d -> [thfd]
-> sequence NOT FOUND
-> last (ㅇ) is NOT a vowel, CAN be alone.
-> output 솔
-> keep d in the stack
-> [d]
-> sequence found, possible output ㅇ

PUSH l -> [dl]
-> sequence found, possible output 이

PUSH v -> [dlv]
-> sequence found, possible output 잎

etc...

YAY IT'S OK
eat sth then implement that
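
The "What to do" steps above can be sketched as follows. This is not himecore.c: the lookup table is a tiny stand-in for the next-states file, built only from the six keys used in these notes, and `CONS`, `VOWS`, `build_table` and `feed` are hypothetical names. Sequences the notes do not cover (for example a vowel typed first) are rejected rather than guessed at.

```python
# key -> (lone jamo, initial index, final index); assumed mapping for 4 keys
CONS = {"d": ("\u3147", 11, 21), "t": ("\u3145", 9, 19),
        "f": ("\u3139", 5, 8), "v": ("\u314d", 17, 26)}
VOWS = {"h": 8, "l": 20}   # key -> vowel index


def build_table():
    """All valid key sequences (C, CV, CVC) and what they display."""
    table = {}
    for c, (jamo, lead, _) in CONS.items():
        table[c] = jamo
        for v, vowel in VOWS.items():
            base = 0xAC00 + (lead * 21 + vowel) * 28
            table[c + v] = chr(base)
            for t, (_, _, tail) in CONS.items():
                table[c + v + t] = chr(base + tail)
    return table


TABLE = build_table()


def feed(keys):
    """Return (output so far, current state) after pushing every key."""
    stack, output = "", ""
    for key in keys:
        stack += key
        if stack in TABLE:
            continue                       # sequence found, keep composing
        if key in VOWS:
            # A vowel cannot be alone: it keeps the previous consonant.
            cut = len(stack) - 2
        else:
            # A consonant can be alone: it starts the next syllable.
            cut = len(stack) - 1
        if stack[:cut] not in TABLE or stack[cut:] not in TABLE:
            raise ValueError("sequence not covered by this sketch: " + stack)
        output += TABLE[stack[:cut]]
        stack = stack[cut:]
    return output, TABLE.get(stack, "")


if __name__ == "__main__":
    print(feed("dhthfdlv"))
```

Run on the keys from the notes, `feed` emits one syllable each time a sequence is not found and keeps the rest on the stack as the current state.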

here is the result (00:16)

[EMAIL PROTECTED] core]# make run
cc -I../../include   -c -o himecore.o himecore.c
cc himecore.o posix_main.o -o core
./core ../../data/hantree.dat
File size is 58226 bytes, mapped at 0xb7f61000
Init ok, please type chars, exit to quit
d
stack len=1, contents=[d]
sequence found
current state:ㅇ

h
you typed h, len is 1
stack len=2, contents=[dh]
sequence found
current state:오

t
you typed t, len is 1
stack len=3, contents=[dht]
sequence found
current state:옷

h
you typed h, len is 1
stack len=4, contents=[dhth]
last entered char is h
This is a vowel
2 chars remaining on stack
new stack:
stack len=2, contents=[th]
current state:소
output: 오
f
you typed f, len is 1
stack len=3, contents=[thf]
sequence found
current state:솔

d
you typed d, len is 1
stack len=4, contents=[thfd]
last entered char is d
This is not a vowel
1 chars remaining on stack
new stack:
stack len=1, contents=[d]
current state:ㅇ
output: 솔
l
you typed l, len is 1
stack len=2, contents=[dl]
sequence found
current state:이

v
you typed v, len is 1
stack len=3, contents=[dlv]
sequence found
current state:잎



--8<---


Re: Input Method Development

2008-02-06 Thread Sébastien Lorquet
some more notes:
- this is exactly the behaviour of the IME2002 engine, at least as far as it
can be *observed* on a Korean Windows XP computer
- as you can see, there is a state that should be displayed somewhere, and
an output that is only generated from time to time (when a syllable is cut).
- for the curious, I tested the algorithm on my girlfriend's name, whose
syllables are perfect for a last-vowel test :)

Seb


Re: Input Method Development

2008-02-06 Thread Sébastien Lorquet
(sorry for the multiple messages)

I am also very interested in a mobile phone-like Korean typepad, but I've
never seen one, and I don't know how it works.

My idea is to have some common code, and a set of frontends that could be
ported to OpenMoko, Windows Mobile, and possibly others.

Seb