Re: [whatwg] Mathematics in HTML5

James Graham Tue, 06 Jun 2006 07:49:40 -0700

[EMAIL PROTECTED] wrote:

James Graham wrote:

I could go on but
at least in academic  fields, LaTeX is either the only format accepted
for publication or the  preferred format.


In mathematics, and theoretical physics sure, in rest of science? I doubt.
In chemistry, LaTeX is not preferred for example.


Not just in theoretical physics, but in all varieties of physics that I have
ever encountered. Nor, as far as I can tell, is the widespread use of LaTeX
limited to the mathematics and physics communities. It is also, for example,
one of four accepted submission formats of the Royal Society of Chemistry
(Word, WordPerfect, RTF, LaTeX), the only format accepted by Electronic Notes
in Theoretical Computer Science and the only acceptable format for IEEE
Transactions on Wireless Communications. In general, Googling for these
examples, I was unable to find a single print journal which accepted
electronic submissions but did not accept LaTeX as a format. Indeed, it
is the _only_ hand-authored format accepted by the journals I encountered on my
brief search, except for one online-only robotics journal which published
in HTML and accepted submissions in HTML. Even in that case, the submissions
page is quick to suggest a LaTeX to HTML workflow, implying that engineers are
another group who often work with LaTeX (a speculation lent credence by
http://www.eng.cam.ac.uk/help/tpl/textprocessing/ which contains an extensive
set of notes for engineers on using LaTeX and begins "TeX is a powerful
text processing language and is the required format for some periodicals
now").

Of course using Google to turn up a few journals hardly makes for a good
sample and you can no doubt provide counter-examples, but it is extremely
disingenuous to suggest that only pure mathematicians and a small subset of
physicists commonly use LaTeX - it is clearly in very widespread use wherever
mathematical communication is required.

The key is that you learn any new tool when it is useful and solves
problems. TeX-LaTeX solves a minimum subset of problems of real life and
reason is not popular except in some academic communities. The only really
good point of TeX-LaTeX systems is on mathematical typesetting; textual,
graphical, diagrams, and others items are best done with different systems
and approaches.


Ah. That would be called "doing one thing and doing it well". I've heard
that it's commonly believed to be a good design principle. In this case,
the problem I would like to solve is "how do we typeset mathematics on
the internet so that people actually use the technology rather than
ignoring it into oblivion"? We've already determined that LaTeX solves
the same problem offline so it seems like a reasonable place to start
when addressing the question for online publishing.

You may think I
am overstating this but I disagree - bear in mind  that a significant
fraction of astronomical (chosen merely because it is the  field I know
best) software is written in Fortran 77. For many of these people
almost 30 years of language design has never happened.


If Fortran 77 fulfills the needs they have no reason for the change but if
it does not fulfill then they will adopt Fortran 90, or C++, or Java, or
Maple, or anything else.

Technically, all the languages you've suggested are clearly better than
Fortran 77. They don't have irritating limitations like fixed column numbers.
They have _very_ useful features like dynamically allocatable data structures.
It would make many people's lives better to migrate away from Fortran 77 to
one of them. But they don't - because they are in the business of doing
research, not learning new technology - so they are always in a metastable
state which perhaps doesn't provide the most long term benefit but does work
well at any given moment. (Of course some people embrace new technology,
particularly if it is relatively easy to use. But don't be fooled into
thinking that people will use new technologies just because they are in some
global sense "better" than the ones they are familiar with, particularly if
there is no easy path from here to there.)

There are old academicians still using ordinary mail for communicating
with colleagues. Is this an argument against e-mail or when designing a
new communication model would we think in a subset of guys loving ordinary
mail?


Well it maps pretty well to ordinary mail. For example an email address
like [EMAIL PROTECTED] corresponds to the addressing format
commonly used in ordinary mail (starts with the name, becomes more
general toward the end). But more importantly, there are a number of
immediately obvious and tangible benefits to email. In particular the
fact that it is near instant. I don't see anything in your proposals
that offers anything like the same level of obvious and tangible benefits.

I always am perplexed of double measurement scale of TeX-people. They
rudely critique mathematical typesetting of programs such as MSWord.


I'm not a "TeX-person", merely a LaTeX user and, in the context of this
discussion, my "pro-LaTeX" stance is merely a practical one; I have come
at it by considering the needs of the audience, not through a desire to
advocate one particular technology. Nor have I mentioned MSWord, except
as an accepted format for submission to some academic journals. (Indeed, if
anything I am quite an anti-LaTeX person - I would never consider using it for
a poster or slide presentation for example. I have, however, used LaTeX to
create the equations for a poster and embedded the resulting postscript into
another package. That is closer to the level of interaction I am advocating.)

However, most of web pages generated from TeX-LaTeX systems are really
unprofessional even at that small subset of static and boring academic
webpages.


Indeed. But there are two main reasons for that:
1) latex2html sucks

2) Academics have no interest in learning any language other than LaTeX
(did I say that already?). They have to use LaTeX to prepare documents
for publication, it is the only language they know for typesetting
mathematics and, in general, the web is not their major target medium.
LaTeX generated websites tend to be html representations of lecture
notes or papers that are primarily designed for consumption in paper or
PDF formats. So the html version only exists at all because it is
relatively little effort to produce it in addition to the main
publication format. When that is not the case, there will simply be no
html version provided.

People abandoned TeX-LaTeX in favor of best approaches in many places.

Where? In no journal I could find. If you mean publishers, for archival, that
is irrelevant because, on the web, most content is created by individuals who
are not publishers by profession. The tools suitable for the two groups are
quite different.

Some weeks ago I received a draft of a manuscript prepared by a
mathematician, which will probably be published in the MSOR journal shortly.
He is not using TeX or LaTeX because of their limitations, and writes:

<blockquote>
Mathematicians have been served well by TeX and LaTeX for their
mathematical typesetting. Too well, perhaps. At least, if an dedicated
TeXnician of the last
ten years has a chance to \relax and look about himself he will see that
the rest
of the world has moved on in several incompatible ways to the cosy world of
TeX.
</blockquote>

So one person contacted you and made a comment which has no substantial
content that I can discern? What's your point? Who are "the rest of the world"
and where are they? Why should I listen to this person? For comparison I did a
straw poll of two people who I work with, asking "will astronomers ever be
prepared to learn languages other than LaTeX for typesetting mathematics?"
They both answered "no". But I don't think it's really meaningful enough to
talk about.

This is why the web is liberally
sprinkled with the ugly gif output of  latex2html. If we want this
situation to change, the _only_ solution is to allow  LaTeX as a
document creation format.


For creation of unprofessional webpages or electronic documents? Okay.
Somewhat as anyone can create low quality webpages using “save as” in
MSWord, but if you want professional webpages then MSWord is not the
correct tool. Similar thoughts apply to TeX-LaTeX.


That doesn't follow at all. For example Google are successful in making
excellent HTML+js applications starting from Java. If I write a program
in C it's likely to be much better than an equivalent one I write in
assembler. Writing a document in Docbook and converting to postscript is
much easier than writing in postscript directly. Computing is full of
examples of people writing in one language and transforming to something
else for consumption. I am merely stating that for any meaningful
adoption of our chosen output format, it must be compatible with the
chosen high-level format of the majority of research scientists - LaTeX.

As an exercise let me comment ITeX output in one of your pages. I will not
review your web page “I'll go and play with words and pictures”, and I
will say nothing on the quality of the rest of web design not in its
typesetting.


(Nice job on the subtle implication that because my webpages won't win
any awards for beauty I have no business in a technical discussion, by
the way. In return I won't mention any of the dubious content on your
webpages :) )

You begin from an IteX source (a dialect of LaTeX) and next present the
MathML output generated. Then you claim

<blockquote>
It's pretty clear which version is easier to enter, read and maintain.
</blockquote>

Well. It is clear that IteX is easier to enter and read than MathML. But
if you use this as an argument in favor of IteX then let me say that ASCIIMath
is still easier to enter and read. Therefore if easiness really matters, one
would discard IteX and other TeX-LaTeX approaches.


But is ASCIIMath so expressive? It certainly isn't so widely known.
Therefore it won't be so widely adopted. I seem to have to keep
repeating this point that compatibility with existing technology is
important. The existing technology in the field of mathematical authoring is 
LaTeX.

However, IteX is not easier to maintain. If you are looking for basic
unprofessional encoding of mathematical formulae, then IteX is okay, but
if you are looking for professional encoding of formulae, IteX is not good
enough and this will obligate to you to learn CSS, XSL-FO, and p-MathML
for fine-tuning and maybe DOM, Javascript, or c-MathML (or even OpenMath)
if you want add interactivity and semantics to your encoding.


No professional I know wants to do that though. They just want to present
mathematical equations in a sane way. GIFs are not a particularly sane way -
they are ugly and do not scale with the text on the page but, despite this,
the evidence of the web suggests that they are the best we currently have.
MathML is not sane - it is too hard to author. ITeX, though far from perfect,
is much, much better.


[lots of irrelevant junk about the itex2mathml output]

Another point of disappointment is in the encoding of the differential.
The differential is encoded as a simple variable d. There exist special
entities defined in the MathML DTD and also special Unicode fonts, and the
truth is that those special characters were designed with accessibility in
mind. Still, if for some reason the author does not want to use the special
differential character, one can easily see that the differential is not a
variable or identifier but an operator. Therefore, <mo>d</mo> is more
accurate. The same error appears in the other integral.


First of all, that's a false argument. I could just as easily have gone
through my character map, found the differential-d and used that in the
formula when writing the ITeX as I could when writing the MathML. The
problem is, in absolute terms it's much harder than just writing "d". This is
a big problem I have with any solution that requires the extensive use of
characters not found on a standard US keyboard. The single best idea I've seen
in this entire discussion is the text-transform:math* properties for CSS.


Now consider: what extra information would we gain by going through our
character map application until we find the right codepoint to express
the d operator? Presumably in each case a visual UA will display almost
exactly the same thing. An aural UA will probably read out "d x" (or
whatever) in either case, in the same way a human would. I guess a
hypothetical computer algebra package that can accept input from the web
might get confused but that seems like such a marginal case that it's
hardly worth optimizing for with the price of damaging language adoption.

The code is, as we can see, very deficient even ignoring accessibility
issues. Note that vectorial quantities are rendered in italic bold font.
Many authors and some journals prefer roman font for vectors. Imagine you
have 5 electronic documents containing 10 equations each. Either you
learn MathML (and then you are obliged to study three or four languages
even for the simplest tasks) and modify the 50 equations by hand, or you
modify the IteX source. Since the IteX source is presentational, you would
change each \mathbf in the 50 equations (even using a macro or an
automated search and replace, the task wastes time).


Of course, as an author, one can improve this in TeX by writing a single
\vec macro that changes the formatting to a vector style. Then it is a
simple matter to change vector formatting everywhere with a single
simple change to the macro definition. So, if one wanted to make life
easy for LaTeX authors who envisioned targeting the web, one could
provide a package that would add some mapping onto the more semantic
constructs of the target language. But the majority of authors will have
legacy content that does not use these features and it must be possible
to convert that legacy content to the new output format if you want it
to gain any traction, even if the content produced is not so suitable
for wholesale style changes at a later date (which is a feature that
authors have lived without for years).
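
A minimal sketch of that single-macro approach follows. It assumes the
amsmath package is loaded (for \boldsymbol) and that the author writes every
vector as \vec{...}; the document body is a made-up example, not taken from
any real paper.

```latex
\documentclass{article}
\usepackage{amsmath}

% One place to decide how vectors look. Every \vec{...} in the
% document follows this definition.
\renewcommand{\vec}[1]{\boldsymbol{#1}}   % bold italic vectors

% To switch the whole document to upright (roman) bold vectors,
% replace the line above with:
% \renewcommand{\vec}[1]{\mathbf{#1}}

\begin{document}
Newton's second law:
\[
  \vec{F} = m \vec{a}
\]
\end{document}
```

Changing the one \renewcommand line restyles every vector in the document,
which is the kind of hook a converter could also map onto a more semantic
construct in the output format.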

How do we encode this example in HTML-Math? Well, that may be debated here
but a working possibility could be (I use MathML entities for convenience,
they could be substituted by Unicode)


[lots of musings about language design]

Note that designing a markup language that can represent maths is
trivial by comparison to the task of making people use that language. My
point throughout is that if you want people to use the language then
backwards compatibility is key.

And what if I send a document? Would I send the source? The
final HTML? Both?


It depends who you're sending it to, and for what purpose, obviously. If
you were sending it to a coworker for editing, you would send the
original source. If you've done something fancy manipulating the DOM of
the final output, you would have to send that. It's no different to
LaTeX-postscript (or any other conversion process) - in 99% of cases where the
postscript can be regenerated from the original source, you edit the LaTeX,
in the 1% of cases where you manually edited the postscript file, you'll
have to work with that from now on.

I see no reason for limiting the capabilities of a web markup by
satisfying a subset of academicians who do not want to waste their time on
learning the best markup languages.


I see no point in wasting time designing a document markup language that
will be roundly ignored by ~100% of the people creating content.

Somewhat as HTML was not designed with
LaTeX as a “document creation format” in mind but was derived from solid
and sophisticated SGML


But HTML succeeded for 2 reasons:
1) It was simple (consider the relative semantics on offer with Docbook,
for example, and the relative popularity of each)
2) It wasn't SGML. At least not for long. Browsers brought an
unprecedented ease of authoring to HTML. Sure, it has come back to bite us
now, but the fact that you could send almost any garbage to a browser
and get something rendered on the screen made HTML accessible to people
who wouldn't otherwise have been authors.

I should say that, as far as I can tell, using LaTeX as the input
language isn't  the accessibility disaster that you make out.


you? Have you noted that LaTeX was ignored by Maple, Mathematica, ISO
12083, EuroMath, MathML, OpenMath...


Yeah and look at how many authors are using those to create content
(note: the primary function of Maple and Mathematica is computer
algebra, not document creation). They may be used by big publishers, I
don't know, but that's utterly irrelevant. The web is primarily a self-published
medium and so things have to be easy for individual authors. Big
publishers also use Docbook but that doesn't mean we should be trying to
use it on the web. Those creating mathematical content largely use
LaTeX. If our publishing solution is not designed so that LaTeX -> foo
converters produce good-looking output then the exercise will be as
futile as XHTML2 is looking to be.
