Hi Mojca,

I'm not a member of the tex-hyphen email list, and therefore did not receive any of the emails below until your last email to Turgut Uyar, on which you also cc'ed me. I'll now do my best to reply to your queries.

On your first comment: the Turkish alphabet has more letters, and that's probably why there are more patterns.

On your second comment, about the missing 2bi., 2bö., 2bü., 2ci. ...: these patterns don't really make a lot of sense in Turkish, since there are words that should be hyphenated at bi or ci, such as ga-ra-bi, ge-mi-ci, bö-rül-ce, etc. Could it be that Yannis' patterns mapped ö, ç, ü, etc. to some other, old Ottoman letters?

On your last comment, about the missing ".i2 .ö2 .ü2": again, there are many counterexamples, such as i-yi-lik, ö-ner-ge, ü-züm, etc.

It's a pity that TeX has only a single hyphenation mechanism, and one that is specialized to English. Hyphenation in Turkish is so easy and mechanical that none of the Turkish dictionaries even bother to show it, and one can write a very short and simple computer program that does it. Below is an example written in Mathematica syntax. It would have been great to extend the hyphenation mechanism in TeX, but that may be too much work...

I hope I was able to answer your queries. If you send me Yannis' patterns, I can probably be of more help. Also, my knowledge of TeX hyphenation is limited to what I read in Knuth's TeXbook; I did not delve into Liang's thesis. Hence, it is very likely that some parts of my reply are nonsensical.

Best,
Ekin

PS: After I finished typing this email, I saw the thread on the list, and specifically this comment:

> 7.) Can someone explain why "s with line below" has no unicode point? (Yes, I know that it won't be added, ...)

In modern-day Turkish there are "s" and "ş", and "ş" corresponds to "S WITH CEDILLA". I don't know how the "s with line below" character is different from "ş".
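Turkish hyphenation as described here (and as implemented by the Mathematica function further down) boils down to two rules: break before a consonant that is followed by a vowel, and break between two adjacent vowels. In TeX these rules could be expressed as Liang-style patterns, where an odd digit between two letters allows a break and the highest digit at each gap wins. The Python sketch below rests on stated assumptions: it generates only two kinds of pattern, "1CV" and "V1V", over lowercase Turkish letters; it has no word-boundary patterns such as "2bi.", no exception list and no \lefthyphenmin/\righthyphenmin handling; and the names turkish_patterns, parse and hyphenate are hypothetical. It is not derived from turk_hyf.c.

```python
# Sketch (hypothetical names): Liang-style patterns for the two Turkish rules,
# plus a stripped-down pattern matcher to check them. Lowercase letters only.
VOWELS = "aeıioöuü"
CONSONANTS = "bcçdfgğhjklmnprsştvyz"


def turkish_patterns() -> list[str]:
    """'1ba': a break is allowed before a consonant followed by a vowel.
    'a1e': a break is allowed between two adjacent vowels."""
    patterns = [f"1{c}{v}" for c in CONSONANTS for v in VOWELS]
    patterns += [f"{a}1{b}" for a in VOWELS for b in VOWELS]
    return patterns


def parse(pattern: str) -> tuple[str, list[int]]:
    """Split '1ba' into letters 'ba' and gap values [1, 0, 0]."""
    letters, values = "", [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters += ch
            values.append(0)
    return letters, values


def hyphenate(word: str, patterns: list[str]) -> str:
    """Apply the patterns as Liang's algorithm does: at every gap keep the
    highest digit of any matching pattern; an odd value allows a break.
    No word-boundary patterns, exceptions or hyphenmin limits."""
    table = dict(parse(p) for p in patterns)
    # Turkish lowercasing: I -> ı and İ -> i, then the default mapping.
    w = word.translate({ord("I"): "ı", ord("İ"): "i"}).lower()
    gaps = [0] * (len(w) + 1)  # gaps[k] is the value of the gap before w[k]
    for start in range(len(w)):
        for end in range(start + 1, len(w) + 1):
            values = table.get(w[start:end])
            if values is not None:
                for k, v in enumerate(values):
                    gaps[start + k] = max(gaps[start + k], v)
    pieces, last = [], 0
    for k in range(1, len(w)):
        if gaps[k] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


if __name__ == "__main__":
    print("\\patterns{")
    print("\n".join(turkish_patterns()))
    print("}")
    print(hyphenate("Tastamamdedirticiolmayacalisanlarinhuzunluuykusuzlugu",
                    turkish_patterns()))
```

With these 232 patterns the matcher reproduces the splits quoted in this email (ga-ra-bi, ge-mi-ci, bö-rül-ce, i-yi-lik), none of which a word-final "2bi." or word-initial ".i2" pattern would allow. Word-edge limits are deliberately left out of the sketch.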
> Ekin - do you by any chance know any programming language to update the source for generating patterns?

I know C. I'll look into http://www.ctan.org/tex-archive/language/turkish/hyphen/turk_hyf.c to make it compatible with present-day Turkish.

I agree. One other problem I noticed is that small caps don't work with Turkish. In English, "i" and "I" form a lowercase/capital pair, whereas in Turkish there are two pairs, "ı/I" and "i/İ": the dotted letters pair with each other, and so do the dotless ones. Some fonts (e.g. Adobe Caslon Pro) don't even have a small-caps version of "ı", which is quite annoying. I have not been able to solve this problem...

Unless someone objects, I would rename the old patterns and clean out the unneeded characters in the Turkish ones. But we need to remove all the \lccode and \catcode commands.

PS2: I subscribed to tex-hyphen.

This is the hyphenation code in Mathematica. 'kelime' means word. The example words are not real ones, but due to the rules of Turkish you can hyphenate pretty much anything you can type, and I've tested the function with such hypothetical words. Oh, and by the way, 'hecele' means hyphenate, and 'sesli harfler' means vowels.

    (* Kelime Heceleyici (Word Hyphenator) ver 0.1 *)
    Remove["Global`*"];
    Off[General::spell]; Off[General::spell1];

    (* Insert "-" before a consonant followed by a vowel, and between two vowels. *)
    Hecele[x_] := Block[
      {SesliHarfler = {"a", "e", "ı", "i", "o", "ö", "u", "ü",
                       "A", "E", "I", "İ", "O", "Ö", "U", "Ü"},
       out = "", i, this, previous, next},
      For[i = 1, i <= StringLength[x], i++,
        this = StringTake[x, {i}];
        previous = If[i != 1, StringTake[x, {i - 1}], " "];
        next = If[i < StringLength[x], StringTake[x, {i + 1}], " "];
        (* a consonant followed by a vowel starts a new syllable *)
        If[! MemberQ[SesliHarfler, this] && MemberQ[SesliHarfler, next],
          out = out <> "-"];
        (* two adjacent vowels belong to different syllables *)
        If[MemberQ[SesliHarfler, this] && MemberQ[SesliHarfler, previous],
          out = out <> "-"];
        out = out <> this];
      (* drop a hyphen inserted before the first letter *)
      If[StringLength[out] > 0 && StringTake[out, {1}] === "-",
        out = StringDrop[out, 1]];
      out];

    kelime = "Şakşukacılaştıramayabileceklerimizdenkilerdensiniz";
    kelime = "Tastamamdedirticiolmayacalisanlarinhuzunluuykusuzlugu";
    Hecele[kelime]
    (* "Tas-ta-mam-de-dir-ti-ci-ol-ma-ya-ca-li-san-la-rin-hu-zun-lu-uy-ku-suz-lu-gu" *)

Mojca Miklavec said the following on 6/25/2008 9:50 AM:

> Hello Turgut,
>
> may I please ask you to comment on this issue concerning the Turkish hyphenation patterns? See
> http://tug.org/pipermail/tex-hyphen/2008-June/000243.html
>
> Thanks a lot,
> Mojca
>
> On Wed, Jun 25, 2008 at 6:44 PM, Mojca Miklavec wrote:
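For readers without Mathematica, here is a rough Python transliteration of the Hecele function above. It is a sketch, not the original: the names hecele and SESLI_HARFLER simply mirror the Mathematica ones, a space stands in for the missing neighbour at either end of the word (as in the original), and, as in the original, anything not in the vowel list, including digits or punctuation, is treated as a consonant.

```python
# Sketch mirroring the Mathematica Hecele: insert "-" before a consonant
# that is followed by a vowel, and between two adjacent vowels.
SESLI_HARFLER = set("aeıioöuüAEIİOÖUÜ")  # vowels, same list as the original


def hecele(kelime: str) -> str:
    out = []
    for i, this in enumerate(kelime):
        # A space marks "no neighbour" at the start and end of the word.
        previous = kelime[i - 1] if i > 0 else " "
        next_ = kelime[i + 1] if i + 1 < len(kelime) else " "
        if this not in SESLI_HARFLER and next_ in SESLI_HARFLER:
            out.append("-")  # consonant + vowel starts a new syllable
        elif this in SESLI_HARFLER and previous in SESLI_HARFLER:
            out.append("-")  # two adjacent vowels belong to different syllables
        out.append(this)
    result = "".join(out)
    # The first letter may have triggered a hyphen; drop it, as Hecele does.
    return result[1:] if result.startswith("-") else result


if __name__ == "__main__":
    print(hecele("Tastamamdedirticiolmayacalisanlarinhuzunluuykusuzlugu"))
```

The two branches use elif because a letter is either a vowel or not, so at most one of the two rules can fire for it; this matches the two separate If blocks in the Mathematica version.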
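The small-caps problem mentioned in this email is at heart a case-pairing problem: Turkish pairs ı with I and i with İ, whereas the default, language-independent Unicode case mapping (which Python's str.upper and str.lower follow) pairs i with I. A minimal sketch, with hypothetical helper names tr_upper and tr_lower, that applies the two Turkish pairs first and leaves every other letter to the default mapping:

```python
# Hypothetical helpers: Turkish-aware case mapping. The default mapping used
# by str.upper()/str.lower() would turn "i" into "I" and "I" into "i".
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def tr_upper(s: str) -> str:
    # Dotted i -> dotted İ, dotless ı -> dotless I, then the default mapping.
    return s.translate(_TR_UPPER).upper()


def tr_lower(s: str) -> str:
    # Dotless I -> dotless ı, dotted İ -> dotted i, then the default mapping.
    return s.translate(_TR_LOWER).lower()


if __name__ == "__main__":
    print(tr_upper("iyilik"), "i".upper())   # Turkish vs. default uppercase
    print(tr_lower("IŞIK"), "IŞIK".lower())  # Turkish vs. default lowercase
```

A small-caps font for Turkish needs a glyph for each side of both pairs, which is exactly where a font without a small-caps "ı" falls short.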
- [tex-hyphen] Unicode Turkish Hyphenation Pattern S. Ekin Kocabas
