On 22/5/12 22:57, Steven Dickson wrote:
Hello,
I work for a religious organization that produces publications in
several languages spoken by our members throughout the world. Over the
years, we have developed UTF-8 encoded hyphenated word lists for 91
different languages. We use these word lists to create proprietary
hyphenation software. We would like to use these lists to create
hyphenation pattern files that can be used with more traditional
software such as TeX and OpenOffice applications.
I think that at present the state of pattern-generating tools is pretty
woeful, but that could in principle be changed by some motivated developers.
Are you happy to distribute these hyphenated word lists in some way? If
you were to make them available (under a simple, non-restrictive license
such as BSD), it might be more likely that people in the free software
community would be inspired to tackle the work that's needed to derive
TeX- and OpenOffice-compatible (or other) resources from them.
JK
It appears that hyphenation pattern files are being created by running
patgen on tokenized word lists, then converting the final output to UTF-8.
Unfortunately, we are dealing with some complex languages that will
exceed the 256-character limit of patgen.
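For illustration, the tokenizing workflow mentioned here could look
roughly like the Python sketch below. It is only a sketch under stated
assumptions: the function names are hypothetical, and the set of reserved
bytes is a guess made for the example, not patgen's actual rules. Each
distinct letter gets a single-byte token, the hyphen, dot and digits are
passed through unchanged, and the mapping is reversed afterwards to get
UTF-8 text back. The sketch also shows where an alphabet that is too
large fails.

```python
# Assumption: bytes that must keep their meaning in word lists and
# patterns (hyphen, dot, digits, whitespace). Illustrative only.
RESERVED = set(b"-.0123456789 \n")


def build_token_map(words):
    """Assign each distinct letter in the word list a single-byte token."""
    letters = sorted({ch for word in words for ch in word if ch != "-"})
    # Use printable-range and high bytes that are not reserved.
    free_codes = [b for b in range(0x21, 256) if b not in RESERVED]
    if len(letters) > len(free_codes):
        raise ValueError(
            f"{len(letters)} distinct letters, only {len(free_codes)} tokens"
        )
    return dict(zip(letters, free_codes))


def tokenize(word, token_map):
    """Encode a hyphenated word as single-byte tokens; '-' is kept as is."""
    return bytes(0x2D if ch == "-" else token_map[ch] for ch in word)


def detokenize(data, token_map):
    """Map tokenized bytes (words or patterns) back to Unicode text."""
    reverse = {code: ch for ch, code in token_map.items()}
    return "".join(chr(b) if b in RESERVED else reverse[b] for b in data)


if __name__ == "__main__":
    words = ["hy-phen-a-tion", "sí-la-ba"]
    token_map = build_token_map(words)
    for word in words:
        encoded = tokenize(word, token_map)
        # Round trip: the decoded text should match the original word.
        print(word, "->", detokenize(encoded, token_map))
```

A language whose word list uses more distinct letters than there are
free single-byte tokens makes build_token_map raise ValueError, which is
the limitation described above.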
Like others, I have unsuccessfully tried to build opatgen with the
current version of gcc. Trying to find gcc version 2.96 in hopes that it
will work doesn’t make sense, especially when there are reports that
opatgen has some serious reliability and performance issues. I applaud
David Antos for his research and development of opatgen and find it
fascinating that his work has not been adopted and enhanced by the open
source community.
Is using patgen with tokenized word lists and converting the output to
UTF-8 really the only viable way to create pattern files?
Steve Dickson
The Church of Jesus Christ of Latter-day Saints
Publishing Services Department
50 East North Temple Street
Salt Lake City, Utah 84150
Email: [email protected]