Re: [Pharo-dev] Improving the documentation model

Dmitri Zagidulin Tue, 21 Apr 2015 09:19:28 -0700

I strongly believe that the WYSIWYG editing route is a fundamentally worse
approach for documentation (and textbooks) than text-based formatting such
as Pillar, Markdown and Asciidoc.

Specifically, it will result in less community contribution, and it will
make distributed version control of documents much harder.

That said, I will support whatever documentation format or tech stack
the community adopts. Any documentation is better than none, regardless of
the underlying details. (Though I will always advocate for text-based
formats).

Instead of going the route that you propose (which essentially attempts to
recreate Word or Google Docs in-image), I think we should:

* Extend Pillar. With a few more features, it can be on par with Markdown
and Asciidoc, and then eventually surpass them (and have nice Pharo-specific
features like detection and unit-testing of Pharo code blocks, etc).

* Invest in instant-preview tech, in-image. This is similar to what
PillarHub does, for example, by using the Ace online editor. Take a look at
the first screenshot in: http://pillarhub.pharocloud.com/hub/pillarhub/about
Side-by-side instant preview makes it possible to have the best of both
worlds, text-based markup and WYSIWYG, without the typical WYSIWYG
drawbacks (of making distributed version control difficult, first and
foremost).

I will attempt to explain my reasoning, both for text-based markup, and
against heavier WYSIWYG approaches.

1) With documentation (and textbooks), the number one goal and number one
virtue is: make it easy for people to contribute. This means two things --
the simplicity of the format (this is where LaTeX fails), and the ease of
distributed version control (merging, pull requests, reviews and commentary
on contributions).

Look at the explosion in community-generated documentation and content
(READMEs on repos, the entirety of Wikipedia) that has resulted from
*making it easy for people to contribute and edit collaboratively*.

2) Version control, especially with more than one or two collaborators, is
a *nightmare* with WYSIWYG tools. Look at what the state of the art is, at
what Microsoft's Word and Google's Docs have been able to accomplish, in
terms of revision control.

Despite unimaginable amounts of person-hours of development put into it,
it's pretty much unusable (I speak from extensive personal experience, both
from collaborating on technical and business documents using Word and Docs,
and from seeing my wife and her friends (who are professional authors)
struggle with Word's revision control systems while working with their
publishers).

We are not Microsoft or Google. We are not going to solve the WYSIWYG
source control / version control problem better than they have. We need to
focus where we spend our efforts. In contrast, text-based markup version
control is a solved problem.
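
A minimal sketch of what "solved" means here, using Python's standard
difflib module. The two revisions and the file name are made up for
illustration, and the '!' heading is assumed Pillar-style syntax. Any
line-based diff tool (git included) gives the same kind of reviewable,
mergeable output for plain-text markup:

```python
import difflib

# Two hypothetical revisions of the same markup source (made-up content).
old = """!Introduction
Pharo is a pure object-oriented language.
It has a small syntax.
""".splitlines(keepends=True)

new = """!Introduction
Pharo is a pure object-oriented language.
It has a small, uniform syntax.
""".splitlines(keepends=True)


def markup_diff(a, b):
    """Return a unified diff between two revisions as a list of lines."""
    return list(
        difflib.unified_diff(
            a, b,
            fromfile="intro.pillar (rev 1)",  # hypothetical file name
            tofile="intro.pillar (rev 2)",
        )
    )


diff = markup_diff(old, new)
print("".join(diff))
```

Only the one changed line shows up as a removal and an addition, which is
exactly what a reviewer needs to see in a pull request.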

3) The convenience of WYSIWYG can be provided with side-by-side instant
preview (again, see PillarHub and the numerous WYSIWYG instant-previews in
Markdown editors).

4) The ability to render text-based markup source into multiple formats
(PDF, HTML, etc) is *essential*. Going from WYSIWYG to HTML is impossible
(all attempts to do so, by Microsoft, Adobe, etc, have utterly failed),
whereas going from text-based markup to print/PDF is very doable (see
LaTeX, the entire HTML ecosystem, Pillar, etc).
This is a serious problem that FrameMaker never had to solve.
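
To make the multi-format point concrete, here is a rough sketch (not
Pillar's actual exporter) that parses a tiny markup subset once and
renders it to both HTML and LaTeX. The syntax it accepts is an
assumption for illustration only: a line starting with '!' is a heading,
any other non-empty line is a paragraph. All function names are
hypothetical.

```python
import html


def parse(source):
    """Parse a tiny markup subset into a list of (kind, text) blocks.

    Assumed syntax: a line starting with '!' is a heading; any other
    non-empty line is a paragraph.
    """
    blocks = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("!"):
            blocks.append(("heading", line[1:].strip()))
        else:
            blocks.append(("paragraph", line))
    return blocks


def to_html(blocks):
    """Render parsed blocks as HTML, escaping special characters."""
    tags = {"heading": "h1", "paragraph": "p"}
    return "\n".join(
        f"<{tags[kind]}>{html.escape(text)}</{tags[kind]}>"
        for kind, text in blocks
    )


def latex_escape(text):
    """Escape a few common LaTeX special characters (not exhaustive)."""
    for ch in "&%$#_":
        text = text.replace(ch, "\\" + ch)
    return text


def to_latex(blocks):
    """Render parsed blocks as LaTeX sections and paragraphs."""
    out = []
    for kind, text in blocks:
        if kind == "heading":
            out.append("\\section{" + latex_escape(text) + "}")
        else:
            out.append(latex_escape(text) + "\n")
    return "\n".join(out)


blocks = parse("!Intro\nCost & value")
print(to_html(blocks))
print(to_latex(blocks))
```

The source is parsed once; each output format is just another small
function over the same list of blocks.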

5) Text-based markup is not primitive. You mention a comparison to HTML
1.0. This is apt, but it points in the opposite direction from what you
intend. HTML 1.0 may have been primitive. But it has evolved into HTML 5,
which not only has many semantics-level content features, but is
expressive enough that pretty much all UIs are moving to it (operating
systems, desktop app suites, mobile devices, etc).

Pillar may be in a primitive state right now, but it already has some
decent semantics-level capability, and can have a lot more with
considerably less effort than it would take to evolve WYSIWYG tools.

And actually, the WYSIWYG approach is much closer to HTML 1.0, in the sense
that users have to indicate *semantic* intentions like emphasis by
selecting different fonts, versus something like HTML 4/5, where the actual
intention is declared (EM tags, BLOCKQUOTE tags, etc).

6) You mention the two standards in document writing (Word and LaTeX), and
the drawbacks to each. I completely agree there. There is a third option,
however, on which the world of open-source community technical writing is
standardizing. And it involves text-based markup languages:

* Markdown (in the form of https://www.gitbook.com/ )

* Asciidoc (more specifically, the Asciidoctor http://asciidoctor.org/ text
processor and publishing toolchain). Take a look at
https://medium.com/@chacon/living-the-future-of-technical-writing-2f368bd0a272
for example.

* Pillar (that's us).

7) Text-based markup formats (aside from LaTeX) actually possess all of the
desired features that you mention as required for writing book-length
technical documentation. Something like Asciidoc already possesses them,
and Pillar has most of them (and can be extended to have the rest).
Let's take a look at some of them:

* High-level publishing-centered semantic abstractions. In other words,
both the concept of book sections, chapters, chapter sections, paragraphs,
figures, etc, as well as the ability to compose a larger document out of
smaller named documents:
 - Pillar and Asciidoc have chapters, sections, paragraphs, figures and
   named scripts/code blocks.
 - Asciidoc has the ability to do file imports (compose a larger document
   out of smaller docs)

* Links. Asciidoc has semantic links both within a document, and across
different Asciidoc documents (references to chapters, sections, etc).
Pillar has within-document links, and inter-doc links are on the roadmap.

* The ability to drop down to a more expressive markup (LaTeX or HTML). For
heavier-duty features like formulas and equations, all of the simple markup
languages allow the author to drop down to LaTeX and lay out formulas to
their heart's content.
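
The file-import point above (composing a larger document out of smaller
named ones) can be sketched in a few lines. This is only an illustration,
not how Asciidoctor implements it: it recognizes the Asciidoc
`include::path[]` directive on a line of its own, ignores any attributes
inside the brackets, and looks documents up in an in-memory dictionary
instead of the file system. The function name and the sample documents
are hypothetical.

```python
import re

# Matches an Asciidoc include directive on its own line, e.g.
# "include::ch1.adoc[]". Attributes inside the brackets are ignored here.
INCLUDE_RE = re.compile(r"^include::(?P<path>[^\[\]]+)\[[^\]]*\]\s*$")


def resolve_includes(name, documents, _seen=()):
    """Return the text of documents[name] with includes expanded.

    `documents` maps document names to their source text. Circular
    includes raise ValueError instead of recursing forever.
    """
    if name in _seen:
        raise ValueError(f"circular include: {name}")
    out = []
    for line in documents[name].splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            out.append(
                resolve_includes(match.group("path"), documents, _seen + (name,))
            )
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical book made of two chapter files.
documents = {
    "book.adoc": "= Book\ninclude::ch1.adoc[]\ninclude::ch2.adoc[]",
    "ch1.adoc": "== One\nText one.",
    "ch2.adoc": "== Two\nText two.",
}
print(resolve_includes("book.adoc", documents))
```

Because each chapter is its own file, contributors can edit and review
chapters independently and the book is assembled at build time.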

8) Text-based markup formats are much easier to both extend (add new
semantic tags, etc) and to machine-process (parse, apply macros, etc) than
WYSIWYG formats.
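
As a small illustration of how easy text-based markup is to
machine-process, the sketch below pulls code blocks out of a document so
that they could be handed to a test runner. The delimiters are an
assumption modeled on Pillar-style script blocks: a block opens with a
line starting with '[[[' (optionally followed by parameters) and closes
with a line containing ']]]'. The function name and sample text are
hypothetical.

```python
import re

# Assumed delimiters: '[[[' plus optional parameters opens a block,
# a line consisting of ']]]' closes it.
BLOCK_RE = re.compile(
    r"^\[\[\[(?P<params>[^\n]*)\n(?P<body>.*?)^\]\]\]\s*$",
    re.MULTILINE | re.DOTALL,
)


def extract_code_blocks(source):
    """Return a list of (params, body) pairs, one per code block."""
    return [
        (m.group("params").strip(), m.group("body"))
        for m in BLOCK_RE.finditer(source)
    ]


# Made-up sample document with two code blocks.
sample = (
    "Some text.\n"
    "[[[language=smalltalk\n"
    "3 + 4\n"
    "]]]\n"
    "More text.\n"
    "[[[\n"
    "Transcript show: 'hi'\n"
    "]]]\n"
)

for params, body in extract_code_blocks(sample):
    print(repr(params), repr(body))
```

Doing the equivalent against a binary or proprietary WYSIWYG file format
would first require a reader for that format.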

So, in summary:
- Text-based markup languages result in a lot more technical docs being
written
- Pillar can match or beat any of the WYSIWYG editor features, without too
great an investment of time and effort.

On Tue, Apr 21, 2015 at 9:11 AM, stephan <[email protected]> wrote:

>  TL;DR: Some roadmap ideas. Looks like a lot of work.
> Comments and improvements welcome:
> We should replace the Pillar document format
> by a better one, suitable for WYSIWYG editing and
> creating long documents.
>
> ---
>
> The current documentation model for Pharo is Pillar.
> Pillar is the document model from the Pier CMS and
> provides exports to (a.o) html and LaTeX. It is a
> simplified form of the LaTeX document model
> without a WYSIWYG UI.
>
> In the research world two documentation systems
> dominate: LaTeX and Word. Word and its clones
> dominate areas where ease of use for small papers
> without maths are important, LaTeX the other fields.
>
> From personal experience I know that the lack of
> abstraction in Word and clones makes it very expensive
> to create large, consistently formatted documents.
> In addition, the typographical quality of the resulting
> documents is much lower than that achievable with
> LaTeX.
>
> On the other hand, repurposing LaTeX to generate
> anything other than PDF/paper documentation is
> difficult because of the underlying language that
> LaTeX is written in, and there is no easy to use
> WYSIWYG UI for LaTeX.
>
> It pains me to see the return of text based formatting
> with primitive formats like markdown. At least in LaTeX
> you can preserve semantics level content, in markdown
> we are back at html 1.0.
>
> The program I liked best for creating longer documents
> was Framemaker. That provided the needed abstractions
> in an efficient WYSIWYG UI. Framemaker was sold
> from 1986, so the performance of current hardware
> should be enough to run something similar in smalltalk.
> I used versions 5.5 and 6, and had to abandon it when
> Adobe stopped development and it was never migrated
> from PowerPC.
>
> Framemaker was fast enough to create books with
> hundreds and even thousands of pages. It had working
> versions of the long document features Word claimed to
> have.
>
> With Athens and TxText we now have low level
> abstractions for dealing with cursor and selection,  fonts,
> rendering glyphs and having both on-screen and
> PDF output.
>
> On top of TxText we could add a model somewhat like
> the attached figure
> [image: UML diagram of document structure]
> A book consists of a number of named documents.
> This is essential for dealing with longer material, as
> in a wysiwyg system we want to avoid having to re-layout
> too much after a key is pressed. Across documents we only
> need to remember the starting page/section numbers.
>
> Each document consists of pages. On a page there can be fixed
> content and content that is dependent on the text flow.
> Most pages of a document have a similar layout, so each page
> refers to a masterpage that defines the default content.
> A document can have separate masterpages for
> first, left and right pages, and rotated or extra large ones.
> A masterpage can define fixed items and calculated ones
> (pagenumbers and current chapter). A textframedefinition
> describes the textframes and the textflow for each textframe.
>
> The text (and other in-line contents) of the document are stored
> in paragraphs, which are stored in textflows.
> The paragraphstyle of a paragraph knows how to layout it in
> a textframe, and how to deal with the end of a textframe.
> The paragraphstyle knows how to paginate, how to number
> or provide other autotext at the beginning of a paragraph and
> if the paragraph text should be part of a table of contents.
> A textframe is  a (rectangular) area on a page.
> The characterstyle of a paragraph is responsible for the font family,
> size and style. The characterstyle can be overridden
> by a specific paragraph or by a textrange.
>
> With a model like this (and adding maths, tables, notes, figures
> and references) we should be able to use Pharo to create both
> high-quality documentation, and write research articles
> (and books) in-image.
>
>  Stephan
>
>
>
>
>
