Hi Joe,

On 30/12/2013, at 8:12 AM, Joe Corneli wrote:
> This answer talks about how to turn off ligatures:
> http://tex.stackexchange.com/a/5419/4357
>
> Is there a way to turn off *all* special characters (e.g. small caps)
> and just get ASCII characters in the copy-and-paste level of the PDF?

In short, no! That goes against the idea of making more use of Unicode across all computing platforms.

Certainly a ligature can have an /ActualText replacement consisting of the separate characters, but the PDF producer must supply this inside the PDF as it is being generated. I've played a lot with this kind of thing, and think that using it to force plain ASCII is the wrong approach. One should use /ActualText to provide the correct Unicode replacement, when one exists. Text can then be extracted reliably, even when the PDF uses legacy fonts that lack a /ToUnicode resource, or whose /ToUnicode resource is inadequate in special situations.

Besides, do you really mean *all* special characters? What about simple symbols such as ß ∑ ∂ √ ∫ Ω, and the myriad other foreign or accented letters and mathematical symbols? If you want these to Copy/Paste as TeX coding (\ss \sum \partial \sqrt etc.) in documents that you write yourself, my package mmap offers this as an option for the original Computer Modern fonts.

Alternatively, a PDF reader could provide a filtering mode that converts ligatures back into separate characters, leaving the user to choose whether or not to use it. I don't know of any reader that actually does this. (In any case, you would want such a tool to let you specify which characters to replace and which to preserve.)

Your best option is surely to write (or get someone else to write) such a filter to meet your needs, and use it to post-process the text extracted via Copy/Paste or with other text-extraction tools. Of course, this is no use if your aim is to create documents from which others get the desired result via Copy/Paste.
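A filter of the kind suggested above can be quite small. The following Python is a minimal, hypothetical sketch, not an existing tool: the names `LIGATURES`, `expand_ligatures`, `extra` and `preserve` are illustrative assumptions. It expands the Latin ligatures U+FB00 through U+FB06 (from Unicode's Alphabetic Presentation Forms block) into separate letters, lets you add your own replacements, and leaves alone any characters you choose to preserve.

```python
"""Hypothetical post-processing filter for text copied out of a PDF.

Expands typographic ligatures into separate letters, applies any extra
single-character replacements the user supplies, and never touches the
characters listed in `preserve`.
"""

# Latin ligatures from Unicode's Alphabetic Presentation Forms block.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
    "\ufb05": "st",   # LATIN SMALL LIGATURE LONG S T (long s flattened to s)
    "\ufb06": "st",   # LATIN SMALL LIGATURE ST
}


def expand_ligatures(text, extra=None, preserve=frozenset()):
    """Return `text` with ligatures (and any `extra` mappings) expanded.

    extra    -- optional dict of additional single-character replacements,
                e.g. {"\u00df": "ss"}; entries here override LIGATURES.
    preserve -- characters that are always kept as-is, even if mapped.
    """
    table = dict(LIGATURES)
    if extra:
        table.update(extra)
    # Checking `preserve` first means a character the user wants kept
    # always wins over any replacement in the table.
    return "".join(ch if ch in preserve else table.get(ch, ch) for ch in text)
```

For instance, `expand_ligatures("e\ufb03cient")` gives `"efficient"`, while symbols such as ∑ or ∫ pass through unchanged unless you map them via `extra`. A command-line wrapper would simply apply `expand_ligatures` to text read from standard input.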
For documents that others will copy text from, the /ActualText approach is what you need.

Hope this helps,

	Ross

------------------------------------------------------------------------
Ross Moore
Mathematics Department            office: E7A-206
Macquarie University              tel: +61 (0)2 9850 8955
Sydney, Australia 2109            fax: +61 (0)2 9850 8114
------------------------------------------------------------------------
--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex
