Sorry to revive this topic, but I think I've found a solution. The original post described a problem with the rare ligatures (e.g. "fty") in the Junicode font: in the resulting PDF, words containing them could not be found by searching for their individual letters. At the time, the PDF /ActualText feature was suggested as a possible fix, but no implementation was given.
I'll save the details of how I stumbled onto the solution for another time, but here's the result. There are two ways to go about this: font encoding and text mapping.

If you have any Adobe OpenType fonts, you might have noticed that the ffi and ffl ligatures can be copied from a PDF intact, but the fi and fl ligatures show up as ??. On the other hand, if you use Latin Modern, you won't encounter any problem of the sort, because the font tables in LM were done properly.

If your font does not have the proper tables, you can supplement them with a TECkit mapping; these are quite powerful. (I posted in Sept '09 about using them for Inuktitut syllabary-romanization conversion, and I've also used them for Persian script-transliteration conversion.) You've probably used Mapping=tex-text at some point, and the solution I'm proposing only requires adding a few lines to the tex-text.map file and compiling it (you may wish to make a copy and edit that instead).

When you open tex-text.map (in \fonts\misc\xetex\fontmapping for MiKTeX Portable), you'll see rules that map sequences of individual characters to single Unicode characters, for example:

    ; ligatures from Knuth's original CMR fonts
    U+002D U+002D         <>  U+2013  ; -- -> en dash
    U+002D U+002D U+002D  <>  U+2014  ; --- -> em dash

To make the common f/ff ligatures searchable in PDFs, add the following lines and compile the map file with teckit_compile (it should be in the bin folder):

    U+0066 U+0066         <>  U+FB00  ; ff -> ff ligature
    U+0066 U+0069         <>  U+FB01  ; fi -> fi ligature
    U+0066 U+006C         <>  U+FB02  ; fl -> fl ligature
    U+0066 U+0066 U+0069  <>  U+FB03  ; ffi -> ffi ligature
    U+0066 U+0066 U+006C  <>  U+FB04  ; ffl -> ffl ligature

I've attached such a map file and the resulting .tec file for those who aren't interested in the nitty-gritty. Simply drop them into the fonts\misc\xetex\fontmapping folder and run texhash/mktexlsr.
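The five new rules are just the Unicode compatibility decompositions of U+FB00 to U+FB04 read backwards, so they don't have to be typed by hand. The Python sketch that follows (Python is used only for illustration; it is not part of the TECkit workflow) derives them with the standard unicodedata module and prints them in the same format as the map lines above. It also includes a toy apply_forward function that mimics the letters-to-ligature direction on a plain string. That function is a hypothetical model that assumes TECkit prefers the longest matching rule; it is not TECkit's actual matching engine.

```python
import unicodedata

# The five Latin f-ligature presentation forms covered by the map lines above.
LIGATURES = [0xFB00, 0xFB01, 0xFB02, 0xFB03, 0xFB04]


def letters_of(cp: int) -> str:
    """Return the plain letters a ligature decomposes to, e.g. U+FB03 -> 'ffi'."""
    # decomposition() gives e.g. '<compat> 0066 0066 0069' for U+FB03;
    # drop the '<compat>' tag and convert the hex fields back to characters.
    fields = unicodedata.decomposition(chr(cp)).split()
    return "".join(chr(int(f, 16)) for f in fields if not f.startswith("<"))


def teckit_rule(cp: int) -> str:
    """Format one bidirectional TECkit rule: letters <> ligature."""
    name = letters_of(cp)
    lhs = " ".join(f"U+{ord(c):04X}" for c in name)
    return f"{lhs} <> U+{cp:04X} ; {name} -> {name} ligature"


def apply_forward(text: str) -> str:
    """Toy model of the forward (letters -> ligature) direction.

    ASSUMPTION: the longest matching rule wins, so 'ffi' becomes U+FB03
    rather than U+FB00 followed by 'i'. No font is consulted here.
    """
    rules = {letters_of(cp): chr(cp) for cp in LIGATURES}
    longest = max(len(k) for k in rules)
    out, i = [], 0
    while i < len(text):
        # Try the longest possible match first, down to two letters.
        for n in range(longest, 1, -1):
            chunk = text[i:i + n]
            if len(chunk) == n and chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            # No rule matched at this position: copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for cp in LIGATURES:
        print(teckit_rule(cp))
    print(ascii(apply_forward("office")))
```

For example, teckit_rule(0xFB01) returns "U+0066 U+0069 <> U+FB01 ; fi -> fi ligature", and apply_forward("office") returns "o\uFB03ce". The printed rules still have to be pasted into the .map file and compiled with teckit_compile before XeTeX can use them. Note also that apply_forward never looks at a font: the substitution happens on the characters themselves, independent of any OpenType ligature table.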
BTW, when you use this TECkit mapping for ligatures, it bypasses the OpenType ligature setting, i.e. you can't turn ligatures off except by switching to a different mapping. It also won't check whether your font has the required glyphs. However, it does let you easily access ligatures in fonts that don't have an OT ligature table (e.g. Times New Roman and Georgia, which is why I made the map file in the first place). Hope someone will find this useful.
-Andy Lin
Attachments: tex-text-ms.map, tex-text-ms.tec
-------------------------------------------------- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
