There has been some discussion of ligatures previously, in an English context.
As I understand the matter, if Unicode chose to encode the Fraktur ligatures that you request, they would not be encoded as Fraktur ligatures as such, but simply as Alphabetic Presentation Forms. A "long s i" ligature, say, would have a single Unicode code point used for both Fraktur and non-Fraktur text, such as a transcription of a book from 18th century England.

Having had a look at a web page showing a Fraktur fount, I have come to the initial conclusion that you are looking for the following ligatures.

ch, ck, ff, fi, fl, ll, long sch, long si, long sl, long s long s, long st, long s s Eszett, tt, tz.

Please correct me if I have got the list wrong, or if there are any other ligatures that you would like included, so that the list is complete.

The one that I have referred to as "long s s Eszett" I have, with one exception, only ever seen in German texts. The exception is in the title on a reproduction of the title page of a contemporary edition of the sixteenth century English play "The Massacre at Paris". This was an illustration in a book about English drama; I have not seen the typography of the original printed text of the play. The use of that form of double s in England so surprised (and delighted) me that it has stuck in my mind.

The long s on its own is encoded in Unicode as U+017F and the "long s s Eszett" as U+00DF. Unicode currently has the ligatures ff, fi, fl, ffi, ffl, long st and st as U+FB00 through to U+FB06.

So, it would appear that Fraktur would need the following added.

ch, ck, ll, long sch, long si, long sl, long s long s, tt, tz.

In relation to ligatures, it would be helpful for the transcription of English printed books of the 18th century to add the following.

ct, long s b, long s h, long s k.

Also, I suspect that adding the following would be desirable.

long s long s i, long s long s l.
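
The list of Fraktur additions above is a simple set difference, which can be checked with a short Python sketch. The names are the informal ones used in this post (with "long st" written the same way in both lists), not official Unicode character names, and the variable names are chosen only for illustration.

```python
# Ligatures seen in the Fraktur fount, using the informal names from this post.
fraktur = [
    "ch", "ck", "ff", "fi", "fl", "ll", "long sch", "long si", "long sl",
    "long s long s", "long st", "long s s Eszett", "tt", "tz",
]

# Code points that Unicode already provides, as stated in this post.
already_encoded = {
    "ff": 0xFB00, "fi": 0xFB01, "fl": 0xFB02, "ffi": 0xFB03, "ffl": 0xFB04,
    "long st": 0xFB05, "st": 0xFB06,
    "long s s Eszett": 0x00DF,
}

# Fraktur ligatures that have no code point yet, kept in the original order.
missing = [name for name in fraktur if name not in already_encoded]

print(len(missing), missing)
```

Run as written, this should leave the nine ligatures listed above as needing to be added for Fraktur.
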
I am unsure whether long s f and f long s ever existed historically and would seek advice from participants in this forum, please. I would also welcome advice as to any other long s ligatures, or indeed other ligatures generally, that could reasonably be included.

This is a total of possibly seventeen extra ligatures at present: at least thirteen, and maybe more than seventeen.

As I understand it, the Unicode Consortium and possibly the ISO body are reluctant to encode any further ligatures. My suggested solution is that these ligatures be encoded as U+E707 and following, using the Private Use Area, with ct as U+E707, as I have already made that explicit suggestion previously. The idea behind this is that U+E707 is chosen so that ct could possibly be promoted to U+FB07 in time, if the Unicode Consortium and ISO so choose. I feel that keeping open the possibility of a straightforward promotion would be a good idea, so I suggest using U+E707 through to U+E70F for nine of the ligatures, then continuing from U+E750 through to U+E75F, which would provide another 16 code points. That would allow 25 ligatures to be added.

So, which code point should represent which ligature? I suggest that U+E707 be ct, as I have already publicly suggested that and some people may have made a note of it. The rest, I suggest, could be discussed in this forum as an interesting experiment: to observe whether people might like to agree amongst themselves a set of Private Use Area encodings which, once published on various websites, other people may choose to use, so that a workable set is achieved.

I wonder if I may open the discussion by suggesting that, of the approximately seventeen ligatures that are needed, a possibility would be to encode all of those that include a long s in the U+E750 through to U+E75F range and the others in the U+E707 through to U+E70F range.
That would, from my initial list of possible ligatures, be six in the range U+E707 through to U+E70F, leaving three unused code points, and eleven in the range U+E750 through to U+E75F, leaving five unused code points. This would enable code points to exist for all of these ligatures, even though they are only in the Private Use Area and are non-exclusive definitions.

The Unicode Consortium, by its own rules, will not endorse any allocations in the Private Use Area. If the allocations become widely used, that will provide good evidence for the ligatures to be promoted to regular Unicode status. Such promotion, which is in no way automatic, would, if it occurred, mean that new code point values would be assigned to the characters. It would not be a matter of U+E707 and so on being made into regular Unicode code points, for that would be against the laid down rules for the Private Use Area.

Using the Private Use Area is, however, a better choice than it might first appear. Even if the Unicode Consortium immediately liked the idea of including Fraktur ligatures, there would still be quite a time lag before the code points were allocated. Using the Private Use Area at least has the advantage that, if some of us discuss the idea in this Unicode discussion group for a few days, then by perhaps next Saturday a list of code point allocations can be produced, posted in this discussion group and, hopefully, published on a few websites. As time goes on, web search engines will pick up the pages, and so the allocations will be able to be found by anyone who looks up the word ligature on some of the major search engines.

Another aspect is that this discussion list gets sent to people in many of the major organizations concerned with typography and computers.
One never knows whether such a list, produced by a few interested people in this newsgroup, would be disregarded by major organizations or whether librarians would carefully print it out and put it into the organization's internal reference library. I have no direct evidence for it, yet my thoughts are that any such list will in fact, without any public comment, be carefully filed by such librarians, just in case in a year or two someone asks the librarian whether any code points for Fraktur ligatures or other ligatures are known to be in use. Voila, the list is produced!

Now, certainly, the idea behind suggesting U+E707 through to U+E70F is that promotion to U+FB07 through to U+FB0F would be as straightforward as possible. The idea behind suggesting U+E750 through to U+E75F is likewise that promotion to U+FB50 through to U+FB5F would be as straightforward as possible, so that all of the ligatures in the list that I am suggesting could be produced by discussion in this newsgroup could possibly be promoted by adding the same constant to their code points.

Now, as it happens, it is clear from the code charts that U+FB07 through to U+FB0F are presently unused, but I am unsure whether U+FB50 through to U+FB5F are being used or whether there is some possible other use in mind.
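
The arithmetic behind the two suggested ranges can be sketched in Python. This is only an illustration under the assumptions of this post: the seventeen candidate names are the informal ones used earlier, the split is simply "includes a long s or not", the helper `promoted` is hypothetical, and no individual code point other than ct at U+E707 is being assigned here. Promotion itself would, of course, be in no way automatic.

```python
# Candidate ligatures from this post (informal names, not Unicode names).
candidates = [
    "ch", "ck", "ll", "long sch", "long si", "long sl", "long s long s",
    "tt", "tz",                                  # needed for Fraktur
    "ct", "long s b", "long s h", "long s k",    # 18th century English books
    "long s long s i", "long s long s l",        # suspected to be desirable
    "long s f", "f long s",                      # existence still uncertain
]

# Suggested Private Use Area ranges (both ends inclusive).
plain_range = range(0xE707, 0xE70F + 1)
long_s_range = range(0xE750, 0xE75F + 1)

# Split the candidates by whether the ligature includes a long s.
with_long_s = [c for c in candidates if "long s" in c]
without_long_s = [c for c in candidates if "long s" not in c]

# The single constant that would carry either range into the U+FBxx block.
OFFSET = 0xFB07 - 0xE707


def promoted(pua_code_point):
    """Hypothetical promoted code point for a Private Use Area code point."""
    return pua_code_point + OFFSET


print(len(without_long_s), "without long s,",
      len(plain_range) - len(without_long_s), "spare")
print(len(with_long_s), "with long s,",
      len(long_s_range) - len(with_long_s), "spare")
print(f"offset 0x{OFFSET:04X}: U+E707 -> U+{promoted(0xE707):04X}, "
      f"U+E750 -> U+{promoted(0xE750):04X}")
```

The same offset serves both ranges, which is the point of choosing U+E707 and U+E750 as starting points.
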
Now, I am aware that the Unicode Consortium cannot endorse any particular Private Use Area code point, and I am not suggesting that allocations in the Private Use Area effectively reserve "promotion space" in regular Unicode. Yet the possibility of trying to get these characters promoted at some stage clearly exists, and people making a formal proposal can suggest code points in the proposal. Could anyone therefore say what uses, if any, presently exist or are under consideration within the range U+FB50 through to U+FBFF, please? Then, if there would be a clash over a straightforward promotion, the suggested range of U+E750 through to U+E75F could be changed within the next few days, thereby keeping open the possibility of a straightforward promotion by a standard offset rather than immediately having problems due to lack of forward planning.

In fairness, I add that there are issues about encoding ligatures: some people feel that a ligature should be signalled by placing a code between two ordinary letters to indicate that a ligature is desired. That approach may well have its merits, yet I do feel that there is also scope to encode ligature characters separately, as they can be very useful for encoding printed texts from long ago where one wishes to preserve the typography.

Hopefully, people will wish to discuss the issues fully in this thread and everybody will end up having wide knowledge of the topic. However, I do feel that these ligature characters should be encoded. It need only take a little time over a few days, and hopefully a result with long lasting benefits will be achieved.

William Overington

21 May 2002

