On Wed, Aug 18, 2010 at 08:11:06AM +1000, Ross Moore wrote:
> Hi Khaled and Michiel,
>
> On 18/08/2010, at 6:58 AM, Khaled Hosny wrote:
>
> > On Tue, Aug 17, 2010 at 01:16:02PM -0700, Michiel Kamermans wrote:
> >> Khaled,
> >>
> >>> AFAIK, epup is just a subset of xhtml with a subset of css2, so IMO not a
> >>> kind of output format that is very well suited for TeX (well, I hardly
> >>> consider html an output format at all, the output is what the browser
> >>> renders out of it).
>
> >> For print media the epub format is, of course, nonsense. Hence the
> >> desire for parallel format generation.
> >
> > I understand the benefits of EPUB, what I don't understand is the need
> > for TeX at all.
>
> To me the problem is not about using TeX for formatting,
> it is about obtaining different output formats from
> the same (La)TeX sources --- especially when math formulas,
> and other 2-dimensional layouts, are involved.
>
> Since ePub, and similar, are XML- or XHTML-based, you want the
> detailed structure of the tagging to be produced automatically,
> without having to make edits on each output result, to "get it right".
> You want to enter your information in just one place, in a language
> that the author already understands and can use effectively.
> Software should then do the rest, modulo possible minor tweaking
> at the end.

If that is the case, I wouldn't start with TeX as the input format, but
with something else that is easier to parse with third-party tools to get
different output formats. XML is the format preferred by industry, and
there are structural XML-based formats like DocBook with tools to convert
them to many output formats, including HTML, LaTeX, or even EPUB. However,
if I were to do that myself, I'd even try something much simpler like
Markdown.

> This is not just simply a matter of redefining macros, because the
> structure rules for the markup can be quite different for different
> output formats. So some kind of knowledge about what macros are being
> used for, and what kinds of things will follow after, is required
> of any translation software.
> Since LaTeX, processing to PDF as a major form of output, figures
> to be the comfortable input format, this is desirable for encoding
> the author's work --- though some may say it ought to be in XML.
>
> And since TeX already understands the expansion of macros and their
> arguments, it is attractive to want to use it as a starting point
> for generating other formats; but certainly it cannot be the
> whole shebang.

Trying to parse TeX input is not something I'd attempt in my right mind,
but others have done it; PlasTeX seems to work nicely and generates clean
HTML. But since you lose all the visual formatting of TeX, the remaining
structural formatting is not worth the trouble; you can get it with more
parser-friendly formats.

> For instance, in my work for Tagged PDF, an XML version will be able
> to be exported (using Adobe Acrobat Pro) from the complete PDF.
> Mathematics will be fully tagged as MathML, in this view.
> Other PDF readers may only see the rendered pages, but others may
> be able to use the tagging to extract an alternative view suitable
> to their own display screen.
>
> > (X)HTML is dynamic by nature, you should be able to
> > resize or change text size and the layout will re-flow, forcing a rigid,
> > box based layout that is a direct translation of TeX output just does
> > not make much sense to me.
>
> I agree that it is not the TeX *output* that needs to be further
> processed, but the input source --- or something intermediate
> that can be generated and written to a file as a by-product
> of LaTeX processing, with extra packages loaded to achieve this.
>
> TeX4Ht works by putting extra information into the .dvi file,
> to encode the required tagging. An extra post-processor is required
> to extract this information, producing HTML or XML or whatever.
> That is very similar to what I do for Tagged PDF, where the
> extra post-processor is Acrobat Pro. This is even more flexible
> than TeX4HT, since Acrobat can export into a range of formats,
> whereas TeX4ht only produces the format that was specified when
> the .dvi was being created.

As I wrote above, if it is about the structural formatting, then it is
not worth the trouble; that can be achieved with almost every tool and
document format out there (even office suites can build structured
documents). It is the visual side, the precise output, where TeX excels,
and that is totally lost during such conversions. This can be useful,
however, if one has existing TeX material that needs to be processed into
another output format, though one can still argue that converting it once
to some sort of XML is a much better long-term plan.

Don't get me wrong, I like TeX syntax and find it easier to author with
than many other markup languages, but I accept that it does not fit every
need.

Regards,
 Khaled

-- 
 Khaled Hosny
 Arabic localiser and member of Arabeyes.org team
 Free font developer

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex
