New release of Enscribi - version 0.2.0
Hi,

Enscribi, the handwriting recognition input method for Japanese and Chinese, has been bumped to version 0.2.0. What's new in this version? Not much, but at least there's now:

* The Zinnia recognizer has been moved to its own process, so the GUI doesn't lock up during recognition.
* Color coding of the different scripts, so it's easier to find the character you're after (mostly useful for Japanese, where hiragana, katakana and kanji are sometimes very similar and hard to tell apart).

A precompiled package for current SHR unstable is hosted at http://www.opkg.org/package_133.html. This should fix the problem with the old package, which didn't work with the new EFL version.

For more details about installation and usage, please see http://olofsj.github.com/enscribi/

Best regards,

Olof Sjöbergh

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community
Re: First release of Enscribi - handwriting recognition input method
On Fri, May 15, 2009 at 5:44 AM, jeremy jozwik wrote:
> so after searching for a bit i am failing to find a link for
> libecore_evas.so.0 as an ipk.
> closest mention is this:
> http://lists.openmoko.org/nabble.html#nabble-td2587826i60
>
> but i find no link. any ideas out there?

I think it should be the package named ecore-evas. I'm on a business trip, so I don't have access to my Freerunner and can't confirm this, but now that I think about it, the EFL libraries changed version recently. So I think Enscribi has to be recompiled for this version. I'll take a look at it and put together a new package when I get home (I will probably have time sometime early next week).

Regards,

Olof
Re: First release of Enscribi - handwriting recognition input method
On Thu, May 14, 2009 at 6:00 PM, jeremy jozwik wrote:
> shr-testing 20090502:
> "enlightenment was unable to run the application enscribi the
> application failed to start"
>
> Zinnia and Zinnia-tomoe-zh installed

Could you try to run it from the command line? There might be some relevant error message there. Open up the terminal, run "enscribi" and post any output here.

Regards,

Olof
Re: First release of Enscribi - handwriting recognition input method
On Wed, May 13, 2009 at 12:38 PM, Russell Hay wrote:
> Hi Olof, any chance of having english support added to this?
>
> Otherwise (as a non-hacker) can I contribute anything that'd support you in
> adding english support?

Hi,

There's nothing stopping us from adding English support; I've thought about adding it myself. The only thing you need is stroke data for all the letters, which are not so many for English. Then it would also be possible to make a new theme for English input (the letters are not as large and complicated as for Japanese and Chinese, so you can get away with a smaller drawing area for each letter).

I'm currently experimenting with writing a stroke editor to make it easy to add new characters. When that is done, adding English support should be quite simple: you'd just have to draw all the letters. I'm currently on a business trip and won't be able to work on it for now, but when it's ready I'll post info about it here on the mailing list. At best, I'd say a few weeks from now, but that depends on how much free time I get.

Regards,

Olof
Re: [SHR] Packaging extra fonts
On Mon, Apr 6, 2009 at 10:56 AM, Pander wrote:
> Hi all,
>
> I need some extra fonts to display kanji. I have managed to copy some
> .ttf files manually and that works. Now, I would like to package these
> fonts in an .opkg file. Do I need to call some executables to properly
> register the ttf files or is putting the files in the correct directory
> sufficient?

Hi,

If you want Japanese fonts, the following two fonts are already available and packaged for SHR:

ttf-sazanami-mincho
ttf-sazanami-gothic

You can just install them with opkg.

Best regards,

Olof Sjöbergh
Re: [Om2008.x] Wanted StarDict for my Openmoko
On Sun, Mar 1, 2009 at 8:51 AM, Matthias Apitz wrote:
>
> It would be _extremely_ useful for me to have it as well on my FR; Is
> someone working on a port of this? Thx

Hi,

I'm not working on a port of this, but I'm working on a dictionary program of my own. There's still some stuff to do, so I haven't made any packages yet. I'd recommend you wait until it's a bit more stable, but if you feel like checking it out, it's hosted on Github at http://github.com/olofsj/elexika.

I never really liked the way most dictionary programs present the results, so I'm trying to write it the way I'd like it. It's written in Elementary, and the dictionary backend is in C for speed. One goal is for it to be simple to add new dictionaries: you just specify in a text file how the dictionary is formatted and how the results should be presented. However, it doesn't (and probably won't) support fuzzy matching and the like, as StarDict does.

Hope you'll like it when it's released. =)

Best regards,

Olof Sjöbergh
Re: [Om2008.x] Terminal with UTF-8 support wanted
Hi,

On Mon, Mar 2, 2009 at 10:43 AM, Matthias Apitz wrote:
>
> Is there some way to get UTF-8 support? Thx
>

You need to install a UTF-8 locale. To see which locales you have installed, run:

locale -a

Unfortunately, I don't remember the package names for the locales. But with a UTF-8 locale installed, I know that at least vala-terminal works and can display UTF-8 encoded text correctly.

Best regards,

Olof Sjöbergh
Re: [SHR] Accent and special characters in sms
Hi,

I think this is the same problem I found with Enscribi. I posted a patch for Ecore a few days ago; I've attached it here again if you want to try it out. Hopefully Rasterman can add it to Ecore soon.

Best regards,

Olof Sjöbergh

On Thu, Feb 19, 2009 at 7:45 PM, Mark Müller wrote:
> There is a ticket #58 in shr trac which describes your issue
> (http://trac.shr-project.org/trac/ticket/58). I mentioned it at irc and
> added a comment to reopen the ticket, but there's been no reaction so far.
>
> ---
> Mark
>
> Gaël HERMET schrieb:
>> Hi community,
>>
>> I am using the last SHR unstable and I can't insert any accent or
>> special character in the sms app.
>>
>> It doesn't work with either the azerty layout or the qwerty layout with
>> the french dictionary.
>>
>> If somebody knows how to fix that, I don't know where I can search.
>>
>> ---
>> Gaël HERMET

Index: ecore/src/lib/ecore_x/xlib/ecore_x_events.c
===
--- ecore/src/lib/ecore_x/xlib/ecore_x_events.c (revision 39016)
+++ ecore/src/lib/ecore_x/xlib/ecore_x_events.c (working copy)
@@ -1819,10 +1819,11 @@ _ecore_x_event_handle_client_message(XEvent *xeven
 }
 
 void
-_ecore_x_event_handle_mapping_notify(XEvent *xevent __UNUSED__)
+_ecore_x_event_handle_mapping_notify(XEvent *xevent)
 {
    _ecore_x_last_event_mouse_move = 0;
-   /* FIXME: handle this event type */
+
+   XRefreshKeyboardMapping((XMappingEvent *)xevent);
 }
 
 void
Re: First release of Enscribi - handwriting recognition input method
I'm glad there's a lot of interest in this. =)

As for the poor results for Chinese characters, I suspect the character data for Chinese is not perfect. I don't know any Chinese myself, so it's hard for me to check. For Japanese it works quite well, though there are some characters that are missing and have to be added. The character data comes from the Tomoe project (another handwriting recognition method), available at http://tomoe.sourceforge.jp. They also have a stroke editor that can be used to edit and add characters.

Yesterday I found and fixed the problem with inputting into Edje entry widgets. I sent the patch to the enlightenment devel list, but have attached it here as well for anyone interested in testing it. Patching and recompiling Ecore should make it possible to write in any program using Elementary or Edje.

There's still a lot to improve, and any suggestions or patches are appreciated.

Best regards,

Olof Sjöbergh

[Attachment: ecore_x_event_mapping_notify.patch]
Re: First release of Enscribi - handwriting recognition input method
Hi,

On Sun, Feb 15, 2009 at 8:26 AM, xiangfu wrote:
> i just install "Enscribi"
> then:
> r...@om-gta02:~#enscribi
> Enscribi: _cb_move
>
> then nothing.
> the rootfs is FSO milestone 5
>

I should have explained better. After you install Enscribi, first click on the Illume top bar, then on the wrench in the upper left corner. Then, click on "Keyboard". There you can choose Enscribi instead of the default keyboard. Enscribi will then show up whenever the keyboard is shown.

Best regards,

Olof Sjöbergh
Re: First release of Enscribi - handwriting recognition input method
On Sat, Feb 14, 2009 at 4:54 PM, HouYu Li wrote:
> Hi, It's a nice start...
>
> XiangFu! Have you tried it? I just installed these packages on FSO latest
> stable. It does recognize input, although not that precisely. But it is not
> able to input the Chinese character into the zhone message input area.
>
> My question is: Do we need extra configuration??
>

Hi,

Edje, which is used by Zhone, does not support inputting multi-byte characters, so inputting into Zhone won't work for now. This has to be added to Edje, which I'll try to do.

Best regards,

Olof Sjöbergh
First release of Enscribi - handwriting recognition input method
Hi,

This is to announce the first release of Enscribi, a new handwriting recognition input method I've been working on. The main focus, and the only thing supported for now, is writing Japanese and Chinese characters (and numbers, but numbers alone won't get you far...). It uses the excellent Zinnia recognition engine for the actual recognition.

If anyone is interested, please take a look. There's a project page at http://olofsj.github.com/enscribi/ with screenshots and some more information.

There are packages on opkg.org for trying it out (only tested on FSO milestone 5). The following packages are available:

* Enscribi: http://www.opkg.org/package_133.html
* Zinnia (required dependency): http://www.opkg.org/package_130.html
* Zinnia-tomoe-ja (for Japanese support): http://www.opkg.org/package_131.html
* Zinnia-tomoe-zh (for Chinese support): http://www.opkg.org/package_132.html

Also, you need a Japanese or Chinese font to see the characters (should be available in the usual repos).

The code is hosted on Github at http://github.com/olofsj/enscribi/tree/master

Best regards,

Olof Sjöbergh
Re: [SHR] illume predictive keyboard is too slow
On Fri, Jan 30, 2009 at 8:12 PM, The Rasterman Carsten Haitzler wrote:
> On Fri, 30 Jan 2009 08:31:43 +0100 Olof Sjobergh said:
>> But I think a dictionary format in plain utf8 that includes the
>> normalised words as well as any candidates to display would be the
>> best way. Then the dictionary itself could choose which characters to
>> normalise and which to leave as is. So for Swedish, you can leave å, ä
>> and ö as they are but normalise é, à etc. Searching would be as simple
>> as in your original implementation (no need to convert from multibyte
>> format).
>
> the problem is - the dict in utf8 means searching is slow as you do it in utf8
> space. the dict is mmaped() to save ram - if it wasnt it'd need to be allocated
> in non-swappable ram (its a phone - it has no swap) and thus a few mb of your
> ram goes into the kbd dict at all times. by using mmap you leave it to the
> kernels paging system to figure it out.
>
> so as such a dict change will mean a non-ascii format in future for this
> reason. but there will then need to be a tool to generate such a file.

Searching in utf8 doesn't mean it has to be slow. Simple strcmp works fine on multibyte utf8 strings as well, and should be as fast as the dictionary was before the multibyte-to-widechar conversions were added. But if you have some other idea in mind, please don't let me disturb. =)

Best regards,

Olof Sjöbergh
Re: [SHR] illume predictive keyboard is too slow
On Fri, Jan 30, 2009 at 4:25 AM, The Rasterman Carsten Haitzler wrote:
> On Thu, 29 Jan 2009 08:30:44 +0100 Olof Sjobergh said:
>
>> On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler wrote:
>> > On Wed, 28 Jan 2009 18:59:32 +0100 "Marco Trevisan (Treviño)" said:
>> >
>> >> Olof Sjobergh wrote:
>> >> > Unless I missed something big (which I hope I didn't, but I wouldn't
>> >> > be surprised if I did), this is not fixable with the current
>> >> > dictionary lookup design. Raster talked about redesigning the
>> >> > dictionary format, so I guess we have to wait until he gets around to
>> >> > it (or someone else does it).
>> >>
>> >> I think that too. Maybe using something like a "trie" [1] to archive the
>> >> words could help (both for words matching and for compressing the
>> >> dictionary). Too hard?
>> >>
>> >> [1] http://en.wikipedia.org/wiki/Trie
>> >
>> > the problem here comes with having multiple displays for a single match.
>> > let me take japanese as an example (i hope you have the fonts to see this
>> > at least - though there is no need to understand beyond knowing that there
>> > are a lot of matches that are visibly different):
>> >
>> > sakana ->
>> > さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな
>> >
>> > unlike simple decimation of é -> e and ë -> e and è -> e etc. you need 1
>> > ascii input string matching one of MANY very different matches. the
>> > european case of
>> >
>> > vogel -> Vogel Vögel
>> >
>> > is a simplified version of the above. the reason i wanted "decimation to
>> > match a simple roman text (ascii) string is - that this is a pretty
>> > universal thing. thats how japanese, chinese and even some korean input
>> > methods work. it also works for european languages too. europeans are NOT
>> > used to the idea of a dictionary guessing/selecting system when they type -
>> > but the asians are. they are always typing and selecting. the smarts come
>> > with the dictionary system selecting the right one more often than not by
>> > default or the right selection you want being only 1 or 2 keystrokes away.
>> >
>> > i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much
>> > as possible - so you can just type and it will work and offer the
>> > selections as it's trying to guess anyway - it can present the multiple
>> > accented versions too. this limits the need for special keyboards - doesn't
>> > obviate it, but allows more functionality out of the box. in the event
>> > users explicitly select an accented char - ie a non-ascii character, it
>> > should not "decimate". it should try match exactly that char.
>> >
>> > so if you add those keys and use them or flip to another key layout to
>> > select them - you get what you expect. but if i am to redo the dict - the
>> > api is very generic - just the internals and format need changing to be
>> > able to do the above. the cool bit is.. if i manage the above... it has
>> > almost solved asian languages too - and input methods... *IF* the vkbd is
>> > also able to talk to a complex input method (XIM/SCIM/UIM etc.) as
>> > keystroke faking wont let you type chinese characters... :) but in
>> > principle the dictionary and lookup scheme will work - its then just
>> > mechanics of sending the data to the app in a way it can use it.
>> >
>> > so back to the trie... the trie would only be useful for the ascii matching
>> > - i need something more complex. it just combines the data with the match
>> > tree (letters are inline). i need a match tree + lookup table to other
>> > matches to display - and possibly several match entries (all the matches to
>> > display also need to be in the tree pointing to a smaller match list).
>> >
>> > --
>> > - Codito, ergo sum - "I code, therefore I am" --
>> > The Rasterman (Carsten Haitzler)    ras...@rasterman.com
>>
>> I think most problems could be solved by using a dictionary format
>> similar to what you describe above, i.e. something like:
>>
>> match : candidate1 candidate2; frequency
>> for example:
>> vogel : Vogel Vögel; 123
>>
>> That would mean you can search on the normalised word where simple
>> strcmp works fine and will be fast enough.

But I think a dictionary format in plain utf8 that includes the normalised words as well as any candidates to display would be the best way. Then the dictionary itself could choose which characters to normalise and which to leave as is. So for Swedish, you can leave å, ä and ö as they are but normalise é, à etc. Searching would be as simple as in your original implementation (no need to convert from multibyte format).

Best regards,

Olof Sjöbergh
Re: [SHR] illume predictive keyboard is too slow
On Thu, Jan 29, 2009 at 10:18 AM, Michal Brzozowski wrote:
> 2009/1/29 Olof Sjobergh
>>
>> I think most problems could be solved by using a dictionary format
>> similar to what you describe above, i.e. something like:
>>
>> match : candidate1 candidate2; frequency
>> for example:
>> vogel : Vogel Vögel; 123
>>
>> That would mean you can search on the normalised word where simple
>> strcmp works fine and will be fast enough.
>
> This dictionary would have hundreds of millions of rows even if you take
> only reasonable user inputs. But what to do if the user inputs something
> that's not in the dictionary? Of course I'm assuming you want to correct
> typos, as it's doing now.
>
> vogel: Vogel, Vögel
> vigel: Vogel, Vögel
> vpgel: Vogel, Vögel
> wogel: Vogel, Vögel
> wigel: Vogel, Vögel
> vigem: Vogel, Vögel
> vigwl: Vogel, Vögel
> ...

I did not mean that all possible misspellings should be included, only the normalisation which removes accented chars etc. So for normal English, there would be almost no extra size compared to now. The current way of correcting typos, by checking all combinations of neighbouring keys, would work just like today.

Best regards,

Olof Sjöbergh
Re: [SHR] illume predictive keyboard is too slow
On Wed, Jan 28, 2009 at 11:16 PM, The Rasterman Carsten Haitzler wrote:
> On Wed, 28 Jan 2009 18:59:32 +0100 "Marco Trevisan (Treviño)" said:
>
>> Olof Sjobergh wrote:
>> > Unless I missed something big (which I hope I didn't, but I wouldn't
>> > be surprised if I did), this is not fixable with the current
>> > dictionary lookup design. Raster talked about redesigning the
>> > dictionary format, so I guess we have to wait until he gets around to
>> > it (or someone else does it).
>>
>> I think that too. Maybe using something like a "trie" [1] to archive the
>> words could help (both for words matching and for compressing the
>> dictionary). Too hard?
>>
>> [1] http://en.wikipedia.org/wiki/Trie
>
> the problem here comes with having multiple displays for a single match. let
> me take japanese as an example (i hope you have the fonts to see this at
> least - though there is no need to understand beyond knowing that there are
> a lot of matches that are visibly different):
>
> sakana ->
> さかな 茶菓な 肴 魚 サカナ 坂な 差かな 左かな 査かな 鎖かな 鎖かな
>
> unlike simple decimation of é -> e and ë -> e and è -> e etc. you need 1 ascii
> input string matching one of MANY very different matches. the european case of
>
> vogel -> Vogel Vögel
>
> is a simplified version of the above. the reason i wanted "decimation to match
> a simple roman text (ascii) string is - that this is a pretty universal thing.
> thats how japanese, chinese and even some korean input methods work. it also
> works for european languages too. europeans are NOT used to the idea of a
> dictionary guessing/selecting system when they type - but the asians are. they
> are always typing and selecting. the smarts come with the dictionary system
> selecting the right one more often than not by default or the right selection
> you want being only 1 or 2 keystrokes away.
>
> i was hoping to be able to keep a SIMPLE ascii qwerty keyboard for as much as
> possible - so you can just type and it will work and offer the selections as
> it's trying to guess anyway - it can present the multiple accented versions
> too. this limits the need for special keyboards - doesn't obviate it, but
> allows more functionality out of the box. in the event users explicitly select
> an accented char - ie a non-ascii character, it should not "decimate". it
> should try match exactly that char.
>
> so if you add those keys and use them or flip to another key layout to select
> them - you get what you expect. but if i am to redo the dict - the api is very
> generic - just the internals and format need changing to be able to do the
> above. the cool bit is.. if i manage the above... it has almost solved asian
> languages too - and input methods... *IF* the vkbd is also able to talk to a
> complex input method (XIM/SCIM/UIM etc.) as keystroke faking wont let you type
> chinese characters... :) but in principle the dictionary and lookup scheme
> will work - its then just mechanics of sending the data to the app in a way
> it can use it.
>
> so back to the trie... the trie would only be useful for the ascii matching -
> i need something more complex. it just combines the data with the match tree
> (letters are inline). i need a match tree + lookup table to other matches to
> display - and possibly several match entries (all the matches to display also
> need to be in the tree pointing to a smaller match list).
>
> --
> - Codito, ergo sum - "I code, therefore I am" --
> The Rasterman (Carsten Haitzler)    ras...@rasterman.com

I think most problems could be solved by using a dictionary format similar to what you describe above, i.e. something like:

match : candidate1 candidate2; frequency

for example:

vogel : Vogel Vögel; 123

That would mean you can search on the normalised word, where simple strcmp works fine and will be fast enough.

To keep the dictionary from growing too large, the following shorthand syntax could also be accepted:

eat; 512           // No candidates, just show the match as is
har här hår; 1234  // Also show the match itself as a candidate

If you think this would be good enough, I could try to implement it.

Another problem with languages like Swedish, and also Japanese, is the heavy use of conjugation. For example, in Japanese the verbs 食べる and 考える can both be conjugated in the same way, like this:

食べる 食べました 食べた 食べている 食べていた 食べています 食べていました
考える 考えました 考えた 考えている 考えていた 考えています 考えていました

Another example, the Swedish nouns:

bil bilen bilar bilarna bilens bilarnas

But including all these forms in a dictionary makes it very large, which is impractical. So some way to indicate possible conjugations would be good, but it would make the dictionary format a lot more complex.

Best regards,

Olof Sjöbergh
Re: [SHR] illume predictive keyboard is too slow
On Wed, Jan 28, 2009 at 5:50 PM, Helge Hafting wrote:
> I see. This is done to avoid needing a few extra keys for accents and
> umlauts? Won't that create problems for languages where two words differ
> only in accents? In Norwegian, there are many such pairs. Examples:
> for/fôr, tå/ta, dør/dor,...

Yes, that's a problem I ran into with Swedish as well. We have, for example, har/här/hår etc. But with a good dictionary it actually works OK, if not optimally. For these words you have to select the one you want from the matches, which is a little annoying but not a total show-stopper.

To fix it, you would need either a different normalisation table for each language, or a new dictionary format. Raster said in an earlier mail on the list that he'd fix it someday but has a lot of other stuff to look at right now. So I guess we have to be patient for now.

Best regards,

Olof Sjobergh
Re: [SHR] illume predictive keyboard is too slow
On Wed, Jan 28, 2009 at 2:05 PM, Helge Hafting wrote:
> The obvious fix is to store the dictionary in such a format that
> conversions won't be necessary. Not sure why utf16 is being used,
> utf8 is more compact and works so well for everything else in linux.

Yes, the obvious fix is to change the dictionary format. However, it's not as simple as you might think.

The dictionary today is stored in utf8, not utf16. But the dictionary lookup tries to match words that are not exactly the same as the input word; for example, e should also match é, è and ë. To do this, every character in the input string, and every character of each word, has to be "normalised" to ascii. Since in utf8 a single character can take up multiple bytes, to normalise a word it's first converted to utf16, where all characters are the same size, so that a simple lookup table can be used for each character. But converting from multibyte format each time a string is compared to another adds overhead.

With a different dictionary format where all words are stored already normalised, there would be no need for all the conversions. But then you also have to store all possible conversions for each word, so the format would be more complicated.

Best regards,

Olof Sjobergh
Re: [SHR] illume predictive keyboard is too slow
On Wed, Jan 28, 2009 at 11:53 AM, Florian Hackenberger wrote:
> That's my UTF8 fix [1] that's causing the slowness, I'm afraid.
> Unfortunately I'm very very busy ATM and therefore I'm unable to work
> on it. It could either be the latin -> UTF16 code which is slow or
> another bug I introduced (causing excessive lookups for example).

I looked into this issue when my Swedish keyboard didn't work correctly. I found some issues and some parts that could be improved, and sent a patch with these fixes to the enlightenment devel list. However, even after fixing everything I could find, it's still a bit slow. The problem seems to be the conversion to utf16 for each and every strcmp when doing the lookup.

Unless I missed something big (which I hope I didn't, but I wouldn't be surprised if I did), this is not fixable with the current dictionary lookup design. Raster talked about redesigning the dictionary format, so I guess we have to wait until he gets around to it (or someone else does it).

Best regards,

Olof Sjobergh
Re: Illume keyboard dictionary sorting and normalization
On Tue, Jan 6, 2009 at 11:57 AM, The Rasterman Carsten Haitzler wrote:
> sort -f i think does it... i think...

Thanks, that seems to work. I created a package and uploaded it to http://www.opkg.org/package_90.html for anyone who is interested. The source is hosted at http://github.com/olofsj/swedish-illume.

> hmm interesting i was just going of german/french and portuguese on this where
> i thought i could get away with simple normalisation and a basic qwerty layout
> - with selecting the matches (Vogel/Vögel for example). making the table part
> of the dictionary does make a lot of sense of course. the dict format does need
> to change to make it a lot faster and intl-char friendly. i avoided this at the
> time as i'd need to efficiently encode a b-tree in the file and be able to mmap
> () it efficiently and use it.

I understand it would make the dictionary format more complicated. Maybe it could be split into two files: one with general configuration data such as a normalisation table, an icon etc., and then a raw dictionary file like there is now.

Best regards,

Olof Sjöbergh
Illume keyboard dictionary sorting and normalization
Hi,

I'm working on a Swedish dictionary and keyboard for Illume, but I'm having some trouble with the sorting of utf8 chars in the dictionary. I can't seem to get the sorting right. Looking at the code, Illume sorts the dictionary after first normalizing the strings according to the internal normalization table. Is there any way to reproduce this sorting with the sort command? I've tried with a few different locales (C, en_US.utf8), which all make the unix sort command behave differently, but no matter what I try, words don't show up correctly.

Another issue I found is that the built-in normalization table is not very good for typing Swedish text. On a standard Swedish qwerty layout, we have three additional letters (å, ä and ö). These are used very frequently in Swedish, and there are many common words that have different meanings when spelled with a, å or ä (for example, har, här and hår are all very common words). But in Illume these are all normalized to a. Writing Swedish with a US qwerty layout and then having to select between a, å and ä manually after the dictionary lookup is a pain, since many common words have to be selected from the lookup list each time.

Instead, what you want is a Swedish qwerty layout (which is very simple to implement as a .kbd file), and to not normalize åäö for the Swedish dictionary lookup. So the normalization table would really need to be configurable, either as part of the dictionary or of the .kbd file. I suppose this problem exists for other languages as well.

If I were to work on such a change, what would be the best approach?

Best regards,

Olof Sjobergh