On Fri, 18 Nov 2011 13:52:56 +0100, Zdenek Wagner <[email protected]> wrote:
> 2011/11/18 Philip TAYLOR <[email protected]>:
>> Is it safe to assume that these "code listings"
>> are restricted to the ASCII character set ? If
>> so, yes, spaces are likely to be a problem, but
>> if the code listing can also include ligature-
>> digraphs, then these are likely to prove even
>> more problematic.
>>
> If the code listing is typeset in a fixed width font, it is usually no
> problem. I copied a few code samples from books in PDF, most of them
> were typeset by TeX. If I want to copy text in Devanagari, it is
> almost impossible.

Besides TeX, Dr. Knuth also invented Literate Programming. In our own
project, we use LP to extract the code listings from the original source
code, rather than from the PDF. One advantage is that, in addition to the
re-ordering at the character level (mentioned in a part of Zdenek's email
that I didn't copy over), this allows re-ordering at any arbitrary level,
even entire sections of program code. (We happen to be using XML to
contain the source of both our text and our programming language
constructs, but that's a different issue.)

I agree that it would be nice to be able to reliably copy Unicode text
from the PDF, but (a) that issue isn't confined to program listings, and
(b) that would only solve the character-ordering part of the problem.

   Mike Maxwell
