Hello everyone,

My name is Nanthapume (Poom). I'm an undergraduate student working part-time 
as an assistant librarian at a primary school in Thailand. Over the past few 
weeks I have installed Evergreen ILS and have been testing it at 
'library.panyasakbangbon.ac.th'. I have found some technical problems with 
i18n for the Thai language.


1) Thai orthography and collation do not match perfectly. As a result, when 
using UTF-8, sorted word lists come out in the wrong order and do not follow 
dictionary order. For simple programs, the traditional solution is to convert 
the strings into another encoding (usually 'TIS-620', possibly also 
'ISO 8859-11') before sorting, and then convert them back afterwards so that 
the text displays normally on users' screens.
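
To illustrate the ordering problem in 1), here is a minimal Python sketch. It 
is not the TIS-620 conversion described above. Instead it uses a simplified, 
commonly described trick: Thai leading vowels (เ แ โ ใ ไ) are written before 
the consonant but are sorted after it, so the sort key swaps each leading vowel 
with the character that follows it. The function name thai_sort_key and the 
sample words are my own assumptions for illustration, and the sketch ignores 
finer rules such as tone-mark weighting.

```python
# Leading vowels are written before the consonant they belong to,
# but dictionary order compares the consonant first.
LEADING_VOWELS = "เแโใไ"  # U+0E40 .. U+0E44


def thai_sort_key(word):
    """Return a simplified sort key (hypothetical helper, not an Evergreen API).

    Each leading vowel is swapped with the character that follows it, so a
    plain code-point comparison of the keys approximates dictionary order.
    """
    chars = list(word)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in LEADING_VOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


words = ["เกม", "ขาย", "กา"]

# Plain code-point order puts the word starting with a leading vowel last.
print(sorted(words))                     # ['กา', 'ขาย', 'เกม']

# With the key, 'เกม' is filed under ก, before words starting with ข.
print(sorted(words, key=thai_sort_key))  # ['กา', 'เกม', 'ขาย']
```

A full solution would more likely rely on a proper Thai collation (for example 
through ICU) than on a hand-written key like this one.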


2) The Thai writing system does not put spaces between words in the same 
sentence. Instead, a space usually marks the end of a sentence, where English 
uses a full stop ".". It is like writing "The apple is red. The bird is flying." 
as "theappleisred thebirdisflying". This causes a problem: when I tried to 
search for 'พระคริสตธรรมคัมภีร์ภาคพันธสัญญาใหม่' with 'พระคริสตธรรมคัมภีร์', the 
search found no results, even though the full title is in the catalog. 
Normally, developers write a simple program that uses a concise dictionary 
(and a list of common names) to tokenize words. Then, when we search for 
something, the query is split into words before it is matched against the 
text (which is also written without spaces).
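
The dictionary-based tokenizing described in 2) can be sketched roughly as 
below. This is only a greedy longest-match segmenter under my own assumptions: 
the five-word SAMPLE_WORDS set stands in for a real dictionary, and the names 
tokenize and contains_tokens are hypothetical. It is not how Evergreen or 
PyThaiNLP do it internally.

```python
# Tiny stand-in dictionary (assumption); a real one would have many
# thousands of entries plus common names.
SAMPLE_WORDS = {"พระคริสตธรรม", "คัมภีร์", "ภาค", "พันธสัญญา", "ใหม่"}


def tokenize(text, dictionary):
    """Greedy longest-match segmentation of unspaced text."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary:
                tokens.append(piece)
                i += size
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


def contains_tokens(title_tokens, query_tokens):
    """True if query_tokens occur as a contiguous run inside title_tokens."""
    n = len(query_tokens)
    return any(title_tokens[i:i + n] == query_tokens
               for i in range(len(title_tokens) - n + 1))


TITLE = "พระคริสตธรรมคัมภีร์ภาคพันธสัญญาใหม่"
QUERY = "พระคริสตธรรมคัมภีร์"

title_tokens = tokenize(TITLE, SAMPLE_WORDS)
query_tokens = tokenize(QUERY, SAMPLE_WORDS)
print(title_tokens)
print(contains_tokens(title_tokens, query_tokens))  # True
```

Greedy matching can pick the wrong split when words overlap, which is one 
reason a maintained library with a large dictionary is usually preferred.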


3) Thai users normally mix Thai fragments with English when searching, for 
example 'ภาษา C++' or 'Bible พันธสัญญาใหม่', instead of using a single 
writing system. I think a romanization scheme and a program for telling the 
languages apart may sometimes be required.
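
For the language-distinguishing part of 3), a rough sketch is to split a query 
into runs of Thai and non-Thai characters using the Unicode Thai block 
(U+0E00 to U+0E7F), so each run can be handled by the right tokenizer. The 
function name split_by_script and the labels "thai" and "other" are my own 
assumptions, and this does not attempt romanization.

```python
import re

# A run is either one or more Thai-block characters, or one or more
# characters that are neither Thai nor whitespace.
_RUN = re.compile(r"[\u0E00-\u0E7F]+|[^\u0E00-\u0E7F\s]+")


def split_by_script(query):
    """Split a mixed query into (script, text) pairs (hypothetical helper)."""
    runs = []
    for match in _RUN.finditer(query):
        text = match.group()
        script = "thai" if "\u0E00" <= text[0] <= "\u0E7F" else "other"
        runs.append((script, text))
    return runs


print(split_by_script("ภาษา C++"))
# [('thai', 'ภาษา'), ('other', 'C++')]
print(split_by_script("Bible พันธสัญญาใหม่"))
# [('other', 'Bible'), ('thai', 'พันธสัญญาใหม่')]
```

The Thai runs could then go through a word tokenizer, while the other runs are 
left to the normal whitespace-based search.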


I searched Google for solutions and found that issues 2) and 3) can probably 
be fixed with open-source Python libraries available online, e.g. 
"https://github.com/PyThaiNLP/pythainlp" and "https://pypi.org/project/PyICU/". 
This might work, but I haven't tested it yet. (However, I cannot find 
Evergreen docs on search configuration and matching criteria.)


I'm trying to find a way forward. I now have a word list from the 'Royal 
Institute Dictionary' (old free version) and a concise TH/EN dictionary found 
online. On my side, I have translated 70% of the OPAC .po files and 90% of the 
MARC keys, including the list of countries (mostly taken from Wikipedia) and 
some extras (those data fields can be found in the en_GB seed file). If you 
have any suggestions for me, please reply. I look forward to them.


Regards,
Nanthapume Toonkam (Poom)


Ps.#1 I have a project to make a print page for DATE DUE slips, which will be 
printed on small cards. If you have any technical advice, please let me know.


Ps.#2 Before the semester break I gave the students some hints about the 
coming changes in our library. The children were very curious about it. I hope 
this turns out fine. Thank you all, especially those who helped me on the 
Evergreen IRC chat.


Ps.#3 I fixed my previous installation problem by setting a new password for 
PostgreSQL. It seems that when a number is used as the password (starting 
with 0), the password is not recognized properly.
