New release of Enscribi - version 0.2.0

2009-05-26 Thread Olof Sjobergh
Hi,

Enscribi, the handwriting recognition input method for Japanese and
Chinese, has gotten bumped to version 0.2.0.

What's new in this version? Not much, but at least there's now:

* The Zinnia recognizer has been moved to its own process, so the GUI
doesn't lock up when doing the recognition.

* Color coding of the different alphabets, so it's easier to find the
character you're after (mostly useful for Japanese, where hiragana,
katakana and kanji are sometimes very similar and hard to tell apart).

A precompiled package for current SHR unstable is hosted at
http://www.opkg.org/package_133.html. This should fix the problem with
the old package that didn't work with the new EFL version.

For more details about installation and usage, please see
http://olofsj.github.com/enscribi/

Best regards,

Olof Sjöbergh

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: First release of Enscribi - handwriting recognition input method

2009-05-14 Thread Olof Sjobergh
On Fri, May 15, 2009 at 5:44 AM, jeremy jozwik  wrote:
> so after searching for a bit i am failing to find a link for
> libecore_evas.so.0 as an ipk.
> closest mention is this:
> http://lists.openmoko.org/nabble.html#nabble-td2587826i60
>
> but i find no link. any ideas out there?

I think it should be the package named ecore-evas. I'm on a business
trip, so I don't have access to my Freerunner and can't confirm this,
but now that I think about it, the EFL libraries changed version
recently. So I think Enscribi has to be recompiled for the new version.
I'll take a look at it and put together a new package when I get
home (I'll probably have time sometime early next week).

Regards,

Olof



Re: First release of Enscribi - handwriting recognition input method

2009-05-14 Thread Olof Sjobergh
On Thu, May 14, 2009 at 6:00 PM, jeremy jozwik  wrote:
> shr-testing 20090502:
> "enlightenment was unable to run the application enscribi the
> application failed to start"
>
> Zinnia and Zinnia-tomoe-zh installed
>

Could you try to run it from the command line? There might be some
relevant error message there.

Open up the terminal, run "enscribi" and post any output here.

Regards,

Olof



Re: First release of Enscribi - handwriting recognition input method

2009-05-13 Thread Olof Sjobergh
On Wed, May 13, 2009 at 12:38 PM, Russell Hay  wrote:
> Hi Olof,  any chance of having english support added to this?
>
> Otherwise (as a non-hacker) can I contribute anything that'd support you in
> adding english support?

Hi,

There's nothing stopping us from adding English support; I've thought
about adding it myself. The only thing needed is stroke data for all
the letters, which are not that many for English. Then it would be
possible to make a new theme for English input as well (the letters
are not as large and complicated as Japanese and Chinese characters,
so you can get away with a smaller drawing area for each one).

So what's needed is stroke data for all the letters. I'm currently
experimenting with writing a stroke editor to make it easy to add new
characters. Once that's done, adding English support would be quite
simple: you'd just have to draw all the letters.

I'm currently on a business trip and won't be able to work on it for
now, but when it's ready I'll post info about it here on the mailing
list. At best, I'd say it's a few weeks away, but that depends on how
much free time I get.

Regards,

Olof



Re: [SHR] Packaging extra fonts

2009-04-06 Thread Olof Sjobergh
On Mon, Apr 6, 2009 at 10:56 AM, Pander  wrote:
> Hi all,
>
> I need some extra fonts to display kanji. I have managed to copy some
> .ttf files manually and that works. Now I would like to package these
> fonts in an .opkg file. Do I need to call some executables to properly
> register the .ttf files, or is putting the files in the correct directory
> sufficient?

Hi,

If you want Japanese fonts, the following two fonts are already
available and packaged for SHR:

ttf-sazanami-mincho
ttf-sazanami-gothic

You can just install them with opkg.

Best regards,

Olof Sjöbergh



Re: [Om2008.x] Wanted StarDict for my Openmoko

2009-03-02 Thread Olof Sjobergh
On Sun, Mar 1, 2009 at 8:51 AM, Matthias Apitz  wrote:
>
> It would be _extremely_ useful for me to have it as well on my FR; Is
> someone working on a port of this? Thx

Hi,

I'm not working on a port of this, but I'm working on a dictionary
program of my own. There's still some stuff to do, so I haven't made
any packages yet. I'd recommend you wait until it's a bit more stable,
but if you feel like checking it out, it's hosted on Github at
http://github.com/olofsj/elexika.

I never really liked the way most dictionary programs present their
results, so I'm trying to write it the way I'd like it. It's written
in Elementary and the dictionary backend is in C for speed. One goal
is for it to be simple to add new dictionaries: you just specify in a
text file how the dictionary is formatted and how the results should be
presented. However, it doesn't (and probably won't) support fuzzy
matching and the like, as StarDict does.

Hope you'll like it when it's released. =)

Best regards,

Olof Sjöbergh



Re: [Om2008.x] Terminal with UTF-8 support wanted

2009-03-02 Thread Olof Sjobergh
Hi,

On Mon, Mar 2, 2009 at 10:43 AM, Matthias Apitz  wrote:
>
> Is there some way to get UTF-8 support? Thx
>

You need to install a UTF-8 locale. To see which locales you have
installed, run

locale -a

Unfortunately, I don't remember the package names for the locales. But
with a UTF-8 locale installed, I know that at least vala-terminal
works and can display UTF-8 encoded text correctly.

Best regards,

Olof Sjöbergh



Re: [SHR] Accent and special characters in sms

2009-02-19 Thread Olof Sjobergh
Hi,

I think this is the same problem I found with Enscribi. I posted a
patch for Ecore a few days ago; I've attached it here again if you want
to try it out. Hopefully Rasterman can add it to Ecore soon.

Best regards,

Olof Sjöbergh


On Thu, Feb 19, 2009 at 7:45 PM, Mark Müller
 wrote:
> There is a ticket #58 in shr trac which describes your issue
> (http://trac.shr-project.org/trac/ticket/58). I mentioned it at irc and
> added a comment to reopen the ticket, but there's been no reaction so far.
>
>
> ---
>
> Mark
>
>
> Gaël HERMET schrieb:
>
>> Hi community,
>>
>> I am using the latest SHR unstable and I can't insert any accent or
>> special character in the sms app.
>>
>> It doesn't work with either the azerty layout or the qwerty layout
>> with the French dictionary.
>>
>> If somebody knows how to fix that, please help; I don't know where to search.
>>
>>
>> ---
>> Gaël HERMET
>>
>>
>>
>>
>
>
>
Index: ecore/src/lib/ecore_x/xlib/ecore_x_events.c
===
--- ecore/src/lib/ecore_x/xlib/ecore_x_events.c	(revision 39016)
+++ ecore/src/lib/ecore_x/xlib/ecore_x_events.c	(working copy)
@@ -1819,10 +1819,11 @@ _ecore_x_event_handle_client_message(XEvent *xeven
 }
 
 void
-_ecore_x_event_handle_mapping_notify(XEvent *xevent __UNUSED__)
+_ecore_x_event_handle_mapping_notify(XEvent *xevent)
 {
_ecore_x_last_event_mouse_move = 0;
-   /* FIXME: handle this event type */
+
+   XRefreshKeyboardMapping((XMappingEvent *)xevent);
 }
 
 void


Re: First release of Enscribi - handwriting recognition input method

2009-02-15 Thread Olof Sjobergh
I'm glad there's a lot of interest in this. =)

As for the poor results for Chinese characters, I suspect the
character data for Chinese is not perfect. Personally I don't know any
Chinese, so it's hard for me to check. For Japanese, however, it works
quite well, although some characters are missing and have to be added.

The character data comes from the Tomoe project (another handwriting
recognition input method), available at http://tomoe.sourceforge.jp.
They also have a stroke editor that can be used to edit existing
characters and add new ones.

Yesterday I found and fixed the problem with input into Edje entry
widgets. I sent the patch to the enlightenment devel list, but have
attached it here as well for anyone interested in testing it. Patching
and recompiling Ecore should make it possible to write in any program
using Elementary or Edje.

There's still a lot to improve, and any suggestions or patches are appreciated.

Best regards,

Olof Sjöbergh


ecore_x_event_mapping_notify.patch
Description: Binary data


Re: First release of Enscribi - handwriting recognition input method

2009-02-15 Thread Olof Sjobergh
Hi,

On Sun, Feb 15, 2009 at 8:26 AM, xiangfu  wrote:
> i just install "Enscribi"
> then:
> r...@om-gta02:~#enscribi
> Enscribi: _cb_move
>
> then nothing.
> the rootfs is FSO milestone 5
>

I should have explained better. After you install Enscribi, first
click on the Illume top bar, then on the wrench in the upper left
corner. Then click on "Keyboard"; there you can choose Enscribi
instead of the default keyboard. Enscribi will then show up whenever
the keyboard is shown.

Best regards,

Olof Sjöbergh



Re: First release of Enscribi - handwriting recognition input method

2009-02-14 Thread Olof Sjobergh
On Sat, Feb 14, 2009 at 4:54 PM, HouYu Li  wrote:
> Hi, It's a nice start...
>
> XiangFu! Have you tried it? I just installed these packages on the latest
> stable FSO. It does recognize input, although not that precisely. But it is
> not able to input Chinese characters into the zhone message input area.
>
> My question is: do we need extra configuration?
>

Hi,

Edje, which is used by zhone, does not support inputting multi-byte
characters, so input in Zhone won't work for now. This has to be
added to Edje, which I'll try to do.

Best regards,

Olof Sjöbergh



First release of Enscribi - handwriting recognition input method

2009-02-14 Thread Olof Sjobergh
Hi,

This is to announce the first release of Enscribi, a new handwriting
recognition input method I've been working on. The main focus, and the
only thing supported for now, is writing Japanese and Chinese
characters (and numbers, but numbers only won't get you far...). It
uses the excellent Zinnia recognition engine for the actual
recognition. If anyone is interested, please take a look.

There's a project page at http://olofsj.github.com/enscribi/ with
screenshots and some more information.

There are packages on opkg.org for trying it out (only tested on FSO
milestone 5). The following packages are available:
http://www.opkg.org/package_133.html - Enscribi
http://www.opkg.org/package_130.html - Zinnia (required dependency)
http://www.opkg.org/package_131.html - Zinnia-tomoe-ja (for Japanese support)
http://www.opkg.org/package_132.html - Zinnia-tomoe-zh (for Chinese support)

Also, you need a Japanese or Chinese font to see the characters
(should be available in the usual repos).

The code is hosted on Github at http://github.com/olofsj/enscribi/tree/master

Best regards,

Olof Sjöbergh



Re: [SHR] illume predictive keyboard is too slow

2009-01-30 Thread Olof Sjobergh
On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler
 wrote:
> On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh  said:
>> But I think a dictionary format in plain utf8 that includes the
>> normalised words as well as any candidates to display would be the
>> best way. Then the dictionary itself could choose which characters to
>> normalise and which to leave as is. So for Swedish, you can leave å, ä
>> and ö as they are but normalise é, à etc. Searching would be as simple
>> as in your original implementation (no need to convert from multibyte
>> format).
>
> the problem is - the dict in utf8 means searching is slow as you do it in utf8
> space. the dict is mmaped() to save ram - if it wasnt it'd need to be 
> allocated
> in non-swappable ram (its a phone - it has no swap) and thus a few mb of your
> ram goes into the kbd dict at all times. by using mmap you leave it to the
> kernels paging system to figure it out.
>
> so as such a dict change will mean a non-ascii format in future for this
> reason. but there will then need to be a tool to generate such a file.

Searching in utf8 doesn't have to be slow. A simple strcmp works fine
on multibyte utf8 strings as well, and should be as fast as the
dictionary was before the multibyte-to-widechar conversions were added.
But if you have some other idea in mind, please don't let me disturb. =)

Best regards,

Olof Sjöbergh



Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread Olof Sjobergh
On Fri, Jan 30, 2009 at 4:25 AM, The Rasterman Carsten Haitzler
 wrote:
> On Thu, 29 Jan 2009 08:30:44 +0100 Olof Sjobergh  said:
>
>> On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler
>>  wrote:
>> > On Wed, 28 Jan 2009 18:59:32 +0100 "Marco Trevisan (Treviño)"
>> >  said:
>> >
>> >> Olof Sjobergh wrote:
>> >> > Unless I missed something big (which I hope I didn't, but I wouldn't
>> >> > be surprised if I did), this is not fixable with the current
>> >> > dictionary lookup design. Raster talked about redesigning the
>> >> > dictionary format, so I guess we have to wait until he gets around to
>> >> > it (or someone else does it).
>> >>
>> >> I think that too. Maybe using something like a "trie" [1] to archive the
>> >> words could help (both for words matching and for compressing the
>> >> dictionary).
>> >> Too hard?
>> >>
>> >> [1] http://en.wikipedia.org/wiki/Trie
>> >
>> > the problem here comes with having multiple displays for a single match.
>> > let me take japanese as an example (i hope you have the fonts to see this
>> > at least - though there is no need to understand beyond knowing that there
>> > are a lot of matches that are visibly different):
>> >
>> > sakana ->
>> >  さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな
>> >
>> > unlike simple decimation of é -> e and ë -> e and è -> e etc. you need 1
>> > ascii input string matching one of MANY very different matches. the
>> > european case of
>> >
>> > vogel -> Vogel Vögel
>> >
>> > is a simplified version of the above. the reason i wanted "decimation to
>> > match a simple roman text (ascii) string is - that this is a pretty
>> > universal thing. thats how japanese, chinese and even some korean input
>> > methods work. it also works for european languages too. europeans are NOT
>> > used to the idea of a dictionary guessing/selecting system when they type -
>> > but the asians are. they are always typing and selecting. the smarts come
>> > with the dictionary system selecting the right one more often than not by
>> > default or the right selection you want being only 1 or 2 keystrokes away.
>> >
>> > i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much
>> > as possible - so you can just type and it will work and offer the
>> > selections as it's trying to guess anyway - it can present the multiple
>> > accented versions too. this limits the need for special keyboards - doesn't
>> > obviate it, but allows more functionality out of the box. in the event
>> > users explicitly select an accented char - ie a non-ascii character, it
>> > should not "decimate". it should try match exactly that char.
>> >
>> > so if you add those keys and use them or flip to another key layout to
>> > select them - you get what you expect. but if i am to redo the dict - the
>> > api is very generic - just the internals and format need changing to be
>> > able to do the above. the cool bit is.. if i manage the above... it has
>> > almost solved asian languages too - and input methods... *IF* the vkbd is
>> > also able to talk to a complex input method (XIM/SCIM/UIM etc.) as
>> > keystroke faking wont let you type chinese characters... :) but in
>> > principle the dictionary and lookup scheme will work - its then just
>> > mechanics of sending the data to the app in a way it can use it.
>> >
>> > so back to the trie... the trie would only be useful for the ascii matching
>> > - i need something more complex. it just combines the data with the match
>> > tree (letters are inline). i need a match tree + lookup table to other
>> > matches to display - and possibly several match entries (all the matches to
>> > display also need to be in the tree pointing to a smaller match list).
>> >
>> > --
>> > - Codito, ergo sum - "I code, therefore I am" --
>> > The Rasterman (Carsten Haitzler)ras...@rasterman.com
>>
>> I think most problems could be solved by using a dictionary format
>> similar to what you describe above, i.e. something like:
>>
>> match : candidate1 candidate2; frequency
>> for example:
>> vogel : Vogel Vögel; 123
>>
>> That would mean you can search on the normalised word where simple
>> strcmp works fine and will be fast enough.

Re: [SHR] illume predictive keyboard is too slow

2009-01-29 Thread Olof Sjobergh
On Thu, Jan 29, 2009 at 10:18 AM, Michal Brzozowski  wrote:
> 2009/1/29 Olof Sjobergh 
>>
>> I think most problems could be solved by using a dictionary format
>> similar to what you describe above, i.e. something like:
>>
>> match : candidate1 candidate2; frequency
>> for example:
>> vogel : Vogel Vögel; 123
>>
>> That would mean you can search on the normalised word where simple
>> strcmp works fine and will be fast enough.
>
> This dictionary would have hundreds of millions of rows even if you take
> only reasonable user inputs. But what to do if the users inputs something
> that's not in the dictionary? Of course I'm assuming you want to correct
> typos, as it's doing now.
>
> vogel: Vogel, Vögel
> vigel: Vogel, Vögel
> vpgel: Vogel, Vögel
> wogel: Vogel, Vögel
> wigel: Vogel, Vögel
> vigem: Vogel, Vögel
> vigwl: Vogel, Vögel
> ...
> ...

I did not mean that all possible misspellings should be included, only
the normalisation which removes accented chars etc. So for normal
English, there would be almost no extra size compared to now. The
current way of correcting typos by checking all combinations of
neighbouring keys would work just like today.

Best regards,

Olof Sjöbergh



Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Olof Sjobergh
On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler
 wrote:
> On Wed, 28 Jan 2009 18:59:32 +0100 "Marco Trevisan (Treviño)" 
> said:
>
>> Olof Sjobergh wrote:
>> > Unless I missed something big (which I hope I didn't, but I wouldn't
>> > be surprised if I did), this is not fixable with the current
>> > dictionary lookup design. Raster talked about redesigning the
>> > dictionary format, so I guess we have to wait until he gets around to
>> > it (or someone else does it).
>>
>> I think that too. Maybe using something like a "trie" [1] to archive the
>> words could help (both for words matching and for compressing the
>> dictionary).
>> Too hard?
>>
>> [1] http://en.wikipedia.org/wiki/Trie
>
> the problem here comes with having multiple displays for a single match. let 
> me
> take japanese as an example (i hope you have the fonts to see this at least -
> though there is no need to understand beyond knowing that there are a lot of
> matches that are visibly different):
>
> sakana ->
>  さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな
>
> unlike simple decimation of é -> e and ë -> e and è -> e etc. you need 1 ascii
> input string matching one of MANY very different matches. the european case of
>
> vogel -> Vogel Vögel
>
> is a simplified version of the above. the reason i wanted "decimation to match
> a simple roman text (ascii) string is - that this is a pretty universal thing.
> thats how japanese, chinese and even some korean input methods work. it also
> works for european languages too. europeans are NOT used to the idea of a
> dictionary guessing/selecting system when they type - but the asians are. they
> are always typing and selecting. the smarts come with the dictionary system
> selecting the right one more often than not by default or the right selection
> you want being only 1 or 2 keystrokes away.
>
> i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as
> possible - so you can just type and it will work and offer the selections as
> it's trying to guess anyway - it can present the multiple accented versions
> too. this limits the need for special keyboards - doesn't obviate it, but
> allows more functionality out of the box. in the event users explicitly select
> an accented char - ie a non-ascii character, it should not "decimate". it
> should try match exactly that char.
>
> so if you add those keys and use them or flip to another key layout to select
> them - you get what you expect. but if i am to redo the dict - the api is very
> generic - just the internals and format need changing to be able to do the
> above. the cool bit is.. if i manage the above... it has almost solved asian
> languages too - and input methods... *IF* the vkbd is also able to talk to a
> complex input method (XIM/SCIM/UIM etc.) as keystroke faking wont let you type
> chinese characters... :) but in principle the dictionary and lookup scheme 
> will
> work - its then just mechanics of sending the data to the app in a way it can
> use it.
>
> so back to the trie... the trie would only be useful for the ascii matching - 
> i
> need something more complex. it just combines the data with the match tree
> (letters are inline). i need a match tree + lookup table to other matches to
> display - and possibly several match entries (all the matches to display also
> need to be in the tree pointing to a smaller match list).
>
> --
> - Codito, ergo sum - "I code, therefore I am" --
> The Rasterman (Carsten Haitzler)ras...@rasterman.com

I think most problems could be solved by using a dictionary format
similar to what you describe above, i.e. something like:

match : candidate1 candidate2; frequency
for example:
vogel : Vogel Vögel; 123

That would mean you can search on the normalised word, where a simple
strcmp works fine and will be fast enough. To keep the dictionary from
getting too large, the following syntax could also be accepted:
eat; 512 // No candidates, just show the match as is
har här hår; 1234 // Also show the match itself as a candidate

If you think this would be good enough, I could try to implement it.

Another problem with languages like Swedish, and also Japanese, is the
heavy use of conjugation. For example, in Japanese the verbs 食べる and
考える can both be conjugated in the same way like this:
食べる 食べました 食べた 食べている 食べていた 食べています 食べていました
考える 考えました 考えた 考えている 考えていた 考えています 考えていました

Another example, the Swedish nouns:
bil bilen bilar bilarna bilens bilarnas

But including all these forms in a dictionary makes it very large,
which is impractical. So some way to indicate possible conjugations
would be good, but it would make the dictionary format a lot more
complex.

Best regards,

Olof Sjöbergh



Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Olof Sjobergh
On Wed, Jan 28, 2009 at 5:50 PM, Helge Hafting  wrote:
> I see. This is done to avoid needing a few extra keys for accents and
> umlauts? Won't that create problems for languages where two words differ
> only in accents?  In Norwegian, there are many such pairs. Examples:
> for/fôr, tå/ta, dør/dor,...

Yes, that's a problem I ran into with Swedish as well. We have for
example har/här/hår etc. But with a good dictionary it actually works
OK, if not optimally: for these words you have to select the one you
want from the matches, which is a little annoying but not a total
show-stopper.

To fix it, either you would need different normalisation tables for
each language, or a new dictionary format. Raster said in an earlier
mail on the list that he'd fix it someday but had a lot of other stuff
to look at now. So I guess we have to be patient for now.

Best regards,

Olof Sjobergh



Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Olof Sjobergh
On Wed, Jan 28, 2009 at 2:05 PM, Helge Hafting  wrote:
> The obvious fix is to store the dictionary in such a format that
> conversions won't be necessary. Not sure why utf16 is being used,
> utf8 is more compact and  works so well for everything else in linux.

Yes, the obvious fix is to change the dictionary format. However, it's
not as simple as you might think.

The dictionary today is stored in utf8, not utf16. But the dictionary
lookup tries to match words that are not exactly the same as the input
word; for example, e should also match é, è and ë. To do this, every
character in the input string, and every character of each word, has
to be "normalised" to ascii. Since in utf8 a single character can take
up multiple bytes, a word is normalised by first converting it to
utf16, where all characters are the same size, so that a simple lookup
table can be used for each character. But converting from multibyte
format every time a string is compared to another adds overhead.

With a different dictionary format where all words are stored already
normalised, there would be no need for all the conversions. But then
you also have to store all possible conversions for each word, so the
format would be more complicated.

Best regards,

Olof Sjobergh



Re: [SHR] illume predictive keyboard is too slow

2009-01-28 Thread Olof Sjobergh
On Wed, Jan 28, 2009 at 11:53 AM, Florian Hackenberger
 wrote:
> That's my UTF8 fix [1] that's causing the slowness, I'm afraid.
> Unfortunately I'm very very busy ATM and therefore I'm unable to work
> on it. It could either be the latin -> UTF16 code which is slow or
> another bug I introduced (causing excessive lookups for example).

I looked into this issue when my Swedish keyboard didn't work
correctly. I found some issues and some parts that could be improved,
and sent a patch with these fixes to the enlightenment devel list.
However, even after fixing everything I could find, it's still a bit
slow. The problem seems to be the conversion to utf16 for each and
every strcmp when doing the lookup.

Unless I missed something big (which I hope I didn't, but I wouldn't
be surprised if I did), this is not fixable with the current
dictionary lookup design. Raster talked about redesigning the
dictionary format, so I guess we have to wait until he gets around to
it (or someone else does it).

Best regards,

Olof Sjobergh



Re: Illume keyboard dictionary sorting and normalization

2009-01-06 Thread Olof Sjobergh
On Tue, Jan 6, 2009 at 11:57 AM, The Rasterman Carsten Haitzler
 wrote:
> sort -f i think does it... i think...

Thanks, that seems to work.

I created a package and uploaded to
http://www.opkg.org/package_90.html for anyone who is interested. The
source is hosted at http://github.com/olofsj/swedish-illume.

> hmm interesting i was just going of german/french and portuguese on this where
> i thought i could get away with simple normalisation and a basic qwerty layout
> - with selecting the matches (Vogel/Vögel for example). making the table part
> of the dictionary does make a lot of sense of course. the dict format does 
> need
> to change to make it a lot faster and intl-char friendly. i avoided this at 
> the
> time as i'd need to efficiently encode a b-tree in the file and be able to 
> mmap
> () it efficiently and use it.

I understand it would make the dictionary format more complicated.
Maybe it could be split into two files: one with general configuration
data such as a normalisation table, an icon etc., and a raw
dictionary file like there is now.

Best regards,

Olof Sjöbergh



Illume keyboard dictionary sorting and normalization

2009-01-06 Thread Olof Sjobergh
Hi,

I'm working on a Swedish dictionary and keyboard for Illume, but I'm
having some trouble with the sorting of utf8 chars in the dictionary.
I can't seem to get the sorting right. Looking at the code, Illume
sorts the dictionary after first normalizing the strings according to
the internal normalization table. Is there any way to reproduce this
sorting with the sort command? I've tried a few different locales
(C, en_US.utf8), which all make the unix sort command behave
differently. But no matter what I try, words don't show up correctly.

Another issue I found is that the built in normalization table is not
very good for typing Swedish text. On a standard Swedish qwerty
layout, we have three additional letters (å, ä and ö). These are used
very frequently in Swedish and there are many common words that have
different meanings if spellt with a, å or ä (for example har, här and
hår are all very common words). But in Illume these are all normalized
to a. Writing Swedish with a US qwerty layout and then having to
select aåä manually after the dictionary lookup is a pain, since many
common words will have to be selected from the lookup list each time.

Instead, what you want is a Swedish qwerty layout (which is very
simple to implement as a .kbd file), and not normalize åäö for the
Swedish dictionary lookup. So the normalization table would really
need to be configurable, either as a part of the dictionary or the
.kbd file. I suppose this problem exists for other languages as well.
If I were to work on such a change, what would be the best approach?

Best regards,

Olof Sjobergh
