Is this really only a Vietnamese problem? Won't all Latin-based scripts with extra signs have exactly the same problem?
Or actually all scripts that are treated as derived scripts (Farsi, Urdu and Malay from Arabic; Tajik, Uzbek and Azeri from Russian, etc.): the code points are initially for the "main" characters, and then there is always a bunch of extra characters that are used only in one language or another. But maybe I am just showing my ignorance here. I need to look at some dictionaries; I have never had any installed.

Daniel Owens wrote:
> Chris,
>
> I imagine that with most languages, sorting according to Unicode codepoint order
> works, but for Vietnamese it doesn't, probably because the majority of letters
> are standard Latin characters, but then some are less usual ("đ" being a good
> example).
>
> This is probably very low on the priority list and I'm not sure how much work
> this would involve, but I would suggest at some point adding an option to the
> command line syntax for imp2ld either to 1. sort the order of keys according to
> Unicode (default) or 2. retain the order of the IMP file (not sort at all). That
> way languages that do not alphabetize well according to the codepoint order in
> Unicode can remain in alphabetical order (assuming the module creator sorted
> correctly).
>
> Daniel
>
> Chris Little wrote:
>> Daniel,
>>
>> The order of keys in an LD module is according to the codepoint order in
>> Unicode. The keys are kept in this order in order to permit binary
>> searching. There is currently no way to perform localized collation.
>>
>> The platform and locale shouldn't play a role in this. If they do, it's
>> a bug.
>>
>> --Chris
>>
>> Daniel Owens wrote:
>>
>>> I am working on creating dictionary modules based on the Free Vietnamese
>>> Dictionary Project. The Vietnamese-English dictionary is working, but
>>> some words are not in alphabetical order, and I am trying to find out
>>> how to maintain the original alphabetization.
>>>
>>> I noticed this when all of the words beginning with a vowel having
>>> diacritics/tones or beginning with a "đ" were sorted to the end of the
>>> dictionary. The DAT file maintains the original order, which is more
>>> accurate. It must be that the IDX file generated by imp2ld creates its
>>> own index and alphabetizes according to its own scheme. The entries of
>>> each word are tagged as ThML. Here is a slightly random entry:
>>>
>>> $$$ác bá
>>> <entry key="ác bá" type="main" id="n20"><b>ác bá</b><br />[noun]<br />-
>>> Cruel landlord, village tyrant<br /></entry>
>>>
>>> Is there a way to keep imp2ld from changing the order of the index? I am
>>> happy to send someone the IMP file if that helps. I pasted the CONF file
>>> at the bottom of this message.
>>>
>>> Daniel
>>>
>>> CONF File:
>>>
>>> [VietAnh]
>>> DataPath=./modules/lexdict/rawld4/vietanh/vietanh
>>> ModDrv=RawLD4
>>> Encoding=UTF-8
>>> SourceType=THML
>>> SwordVersionDate=2007-10-27
>>> Version=1.0
>>> Lang=vi
>>> Description=FVDP Vietnamese-English Dictionary
>>> About=- This is the Vietnamese-English dictionary database of the Free
>>> Vietnamese Dictionary Project. It contains more than 23.400 entries with
>>> definitions and illustrative examples.\par\par- This database was
>>> compiled by Ho Ngoc Duc and other members of the Free Vietnamese
>>> Dictionary Project
>>> (http://www.informatik.uni-leipzig.de/~duc/Dict/)\par\par- Copyright (C)
>>> 1997-2003 The Free Vietnamese Dictionary Project\par\par- This program
>>> is free software; you can redistribute it and/or modify it under the
>>> terms of the GNU General Public License as published by the Free
>>> Software Foundation; either version 2 of the License, or (at your
>>> option) any later version. This program is distributed in the hope that
>>> it will be useful, but WITHOUT ANY WARRANTY; without even the implied
>>> warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>> See the GNU General Public License for more details.
>>> TextSource=http://www.informatik.uni-leipzig.de/~duc/Dict/
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel@crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
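
To make the ordering discussed in this thread concrete, here is a minimal Python sketch. It is not SWORD or imp2ld code. It compares plain codepoint order, which Chris describes for LD module keys, with a hand-written sort key for Vietnamese. The `ALPHABET` and `TONE_MARKS` orders, the function name `vietnamese_key`, and the sample words are all assumptions made for illustration; real dictionaries may order tones differently, and a complete solution would need proper localized collation.

```python
import unicodedata

# Assumed letter order: space first, then the Vietnamese alphabet.
ALPHABET = unicodedata.normalize("NFC", " aăâbcdđeêghiklmnoôơpqrstuưvxy")
LETTER_RANK = {ch: i for i, ch in enumerate(ALPHABET)}

# Assumed tone order after "no tone": huyền, hỏi, ngã, sắc, nặng
# (grave, hook above, tilde, acute, dot below as combining marks).
TONE_MARKS = ["\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]
TONE_RANK = {mark: i + 1 for i, mark in enumerate(TONE_MARKS)}


def vietnamese_key(word):
    """Return a sort key: letter ranks first, tone ranks as a tie-breaker."""
    letters, tones = [], []
    for ch in unicodedata.normalize("NFC", word.lower()):
        tone = 0
        kept = []
        # Split each character into its letter (with breve, circumflex
        # or horn, if any) and its tone mark.
        for part in unicodedata.normalize("NFD", ch):
            if part in TONE_RANK:
                tone = TONE_RANK[part]
            else:
                kept.append(part)
        letter = unicodedata.normalize("NFC", "".join(kept))
        if not letter:
            continue  # stray tone mark with no base letter; ignored here
        # Characters outside ALPHABET sort after it, in codepoint order.
        letters.append(LETTER_RANK.get(letter, len(ALPHABET) + ord(letter[0])))
        tones.append(tone)
    return (letters, tones)


# Hypothetical sample keys, normalized so the comparison is well defined.
words = [unicodedata.normalize("NFC", w)
         for w in ["đá", "dạ", "ác", "an", "ba", "ăn", "ân", "xa"]]

by_codepoint = sorted(words)                      # Python compares codepoints
by_alphabet = sorted(words, key=vietnamese_key)   # assumed Vietnamese order

print("codepoint:", by_codepoint)
print("alphabet: ", by_alphabet)
```

With these sample words, codepoint order places "ác", "ân", "ăn" and "đá" after "xa", which matches what Daniel observed, while the key-based sort keeps them next to their base letters. A binary search over the index only works if lookups use the same comparison the index was sorted with, so an index stored in a different order would need a matching change on the lookup side.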