2011/11/17 Ross Moore <[email protected]>:
> Hi Phil,
>
> On 17/11/2011, at 23:53, Philip TAYLOR <[email protected]> wrote:
>
> > Keith J. Schultz wrote:
> >
> > > You mention in a later post that you do consider a space as a printable
> > > character.
> > >
> > > This line should read as:
> > >
> > > You mention in a later post that you consider a space as a
> > > non-printable character.
> >
> > No, I don't think of it as a "character" at all, when we are talking
> > about typeset output (as opposed to ASCII (or Unicode) input).
>
> This is fine, when all that you require of your output is that it be visible
> on a printed page. But modern communication media go much beyond that.
> A machine needs to be able to tell where words and lines end, reflowing
> paragraphs when appropriate and able to produce a flat extraction of all the
> text, perhaps also with some indication of the purpose of that text (e.g. by
> structural tagging).
> In short, what is output for one format should also be able to serve as
> input for another.
> Thus the space certainly does play the role of an output character - though
> the presence of a gap in the positioning of visible letters may serve this
> role in many, but not all, circumstances.
>
> > Clearly
> > it is a character on input, but unless it generates a glyph in the
> > output stream (which TeX does not, for normal spaces) then it is not
> > a character (/qua/ character) on output but rather a formatting
> > instruction not dissimilar to (say) end-of-line.
>
> But a formatting instruction for one program cannot serve as reliable input
> for another.
> A heuristic is then needed, to attempt to infer that a programming
> instruction must have been used, and guess what kind of instruction it might
> have been. This is not 100% reliable, so is deprecated in modern methods of
> data storage and document formats.
> XML based formats use tagging, rather than programming instructions.
> This is the modern way, which is used extensively for communicating data
> between different software systems.

Yes, that's the point. The goal of TeX is nice typographical appearance.
The goal of XML is easy data exchange. If I want to send structured
data, I send XML, not PDF.
> > ** Phil.
>
> TeX's strength is in its superior ability to position characters on the page
> for maximum visual effect. This is done by producing detailed programming
> instructions within the content stream of the PDF output. However, this is
> not enough to meet the needs of formats such as EPUB, non-visual reading
> software, archival formats, searchability, and other needs.
> Tagged PDF can be viewed as Adobe's response to address these requirements
> as an extension of the visual aspects of the PDF format. It is a direction
> in which TeX can (and surely must) move, to stay relevant within the
> publishing industry of the future.
>
> Hope this helps,
> Ross

No, it does not help. Remember that the last (almost) portable version
of PDF is 1.2. If you open a tagged PDF, or even a PDF with a ToUnicode
map or a colorspace other than RGB or CMYK, in Acrobat Reader 3, it
displays a fatal error and dies. I reported it to Adobe in March 2001
and they did nothing. I had already reported another fatal bug in
January 2001. I sent sample files but nothing happened; Adobe simply
stopped development of Acrobat Reader at the buggy version 3 for some
operating systems. Why do you rely so much on Adobe? When exchanging
structured documents I will always do it in XML and never create tagged
PDF, because I know that some users will be unable to read them with
Adobe Acrobat Reader. I do not wish to make them dependent on
Ghostscript and similar tools.

> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
> http://tug.org/mailman/listinfo/xetex

-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz
