Re: [twdev] three questions

Jeremy Ruston Thu, 07 Jul 2011 15:12:34 -0700

> I'm writing a TiddlyWiki parser as part of an exercise to try and
> understand how TiddlyWiki works.


Awesome. I like the way that you've been approaching TiddlyWiki from a
formal perspective. Its history was shaped by its early development as
an experiment, and then by its rapid mass adoption, quite early on, as a
practical tool. That history has rather constrained our ability to
change some of the less desirable fundamentals, and it means that the
definitive specification is all too often the source code.

> I have a few questions:

> 1) What is a wikiword? I can find no definition. Where is the
> definitive definition?

The definitive definition is a regular expression in the source code, starting at:

https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/Config.js#L179

The important bit is:

config.textPrimitives.wikiLink = "(?:(?:" +
        config.textPrimitives.upperLetter + "+" +
        config.textPrimitives.lowerLetter + "+" +
        config.textPrimitives.upperLetter +
        config.textPrimitives.anyLetter + "*)|(?:" +
        config.textPrimitives.upperLetter + "{2,}" +
        config.textPrimitives.lowerLetter + "+))";

Confusingly, lowerLetter includes the digits and dashes, as does anyLetter.

> My TiddlyWiki seems to think that UTF-8 is a wikiword (since it's
> written ~UTF-8). My parser
> finds a ~ then expects a wikiword, but by the definitions
> I have found UTF-8 is not a wikiword,
> i.e. it's not CamelCase.

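See above. The second alternative in that expression matches two or
more upper-case letters followed by one or more "lower-case" letters.
In UTF-8, UTF supplies the upper-case letters, and because the dash and
the digit both count as lowerLetter, -8 supplies the rest. So UTF-8
really is a wikiword, which is why the ~ is needed to stop it becoming
a link.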
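To check the UTF-8 case mechanically, here is a small JavaScript sketch that rebuilds the same expression and tests a few words against it. The character classes are an assumption: simplified ASCII-only stand-ins that follow the description above (digits and dashes count as lowerLetter); the real definitions in Config.js may include further characters. Anchoring the pattern with ^ and $ is also an addition, so that whole words are tested rather than words found inside running text.

```javascript
// Assumption: simplified ASCII-only stand-ins for config.textPrimitives,
// following the description above (digits and dashes count as lower-case).
// The real definitions in Config.js may include further characters.
const upperLetter = "[A-Z]";
const lowerLetter = "[a-z0-9\\-]";
const anyLetter = "[A-Za-z0-9\\-]";

// Same structure as config.textPrimitives.wikiLink quoted above.
const wikiLink = "(?:(?:" +
        upperLetter + "+" +
        lowerLetter + "+" +
        upperLetter +
        anyLetter + "*)|(?:" +
        upperLetter + "{2,}" +
        lowerLetter + "+))";

// Anchored here (an addition for this check) so that whole words are tested.
const wikiWordRegExp = new RegExp("^" + wikiLink + "$");

function isWikiWord(word) {
  return wikiWordRegExp.test(word);
}

console.log(isWikiWord("WikiWord")); // true: upper+, lower+, upper, any*
console.log(isWikiWord("UTF-8"));    // true: upper{2,} followed by "-8"
console.log(isWikiWord("Camel"));    // false: no second upper-case letter
console.log(isWikiWord("ABC"));      // false: nothing "lower-case" after the capitals
```

Under these assumptions UTF-8 is caught by the second alternative, while plain CamelCase words such as WikiWord are caught by the first.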

> 2) What is escaped text, as in """ escaped text """?
> Is there any difference between """ ... """ (three consecutive double
> quotes) and {{{ ... }}}?

The purpose of escaped text is to suppress wikification without the
monospaced formatting that {{{...}}} adds.

As you can see from the code, <nowiki>...</nowiki> can also be used to
escape text.

https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/Formatter.js#L504
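To illustrate the difference, here is a hypothetical mini-renderer in JavaScript. It is not the code in Formatter.js; it simply treats """...""" and <nowiki>...</nowiki> as text to emit verbatim, {{{...}}} as verbatim text wrapped in <code>, and //...// as italics whose content is still wikified. HTML escaping is left out for brevity.

```javascript
// Hypothetical mini-renderer (not TiddlyWiki's Formatter.js) contrasting
// """...""" and <nowiki>...</nowiki> (wikification suppressed, no styling)
// with {{{...}}} (wikification suppressed, monospaced) and //...// (italics).
function renderInline(text) {
  // A fresh regexp per call, so the recursive call below has its own state.
  const re = /"""([\s\S]*?)"""|<nowiki>([\s\S]*?)<\/nowiki>|\{\{\{([\s\S]*?)\}\}\}|\/\/([\s\S]*?)\/\//g;
  return text.replace(re, (match, quoted, nowiki, mono, italic) => {
    if (quoted !== undefined) return quoted;                    // emitted verbatim
    if (nowiki !== undefined) return nowiki;                    // emitted verbatim
    if (mono !== undefined) return "<code>" + mono + "</code>"; // verbatim, monospaced
    return "<em>" + renderInline(italic) + "</em>";             // content is wikified
  });
}

console.log(renderInline('"""//not italic//""" {{{//not italic//}}} //italic//'));
// → //not italic// <code>//not italic//</code> <em>italic</em>
```

Because the escaping alternatives consume everything up to their closing delimiter, the // inside them never reaches the italics rule; the only difference between """...""" and {{{...}}} is the <code> wrapper.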

> 3) What is supposed to happen when markup elements are incorrectly nested?

The wikifier generally treats the conflicting formatting as if it were
plain text, but frequently finds a confusingly optimistic way of
interpreting broken markup.

> I tried a tiddly page containing this:
>
> * --abc//def--hij//
> * ghi
> * 123

In this case, the first double dash triggers strikethrough formatting.
Then the double slash triggers italics.

The second double dash isn't interpreted as the end of the
strikethrough formatting because the wikifier is still waiting for the
italics to be terminated first. Instead, it interprets the second
double dash as the start of a new run of strikethrough text, nested
inside the first. Of course, there's no visual difference.

By the time the second double slash is encountered, then, the same
thing happens: the wikifier interprets it as the start of a new,
nested run of italic text.

All of which is why the second and third lines are rendered with
italics and strikethrough.

Finally, the reason that the second and third lines are indented is
that the wikifier is looking for the italics and strikethrough to end
before it will recognise the next sibling list item. It's probably a
bug that it therefore falls back to interpreting the subsequent list
items as child items.

The reason that the wikifier behaves in this unexpected way is that it
is structured very simply: each individual formatter (i.e. each element
of wiki syntax) is modelled as a regexp that triggers it, and a function
that processes it, usually with the help of another regular
expression. The idea was to make it easy to add new formatters without
disturbing the existing ones.
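To make the walk-through above concrete, here is a hypothetical JavaScript sketch of that structure, cut down to two formatters. It is not TiddlyWiki's Formatter.js: the names (formatters, MiniWikifier, renderUntil, wikify) are illustrative, each handler is reduced to an HTML tag plus the terminator regexp it waits for, and list handling is left out. The property it reproduces is that only the innermost formatter's terminator is checked, so a closer that belongs to an outer run starts a new nested run instead.

```javascript
// Hypothetical two-formatter wikifier, not TiddlyWiki's Formatter.js.
// Each formatter has a regexp that triggers it; its "handler" is reduced to
// an HTML tag plus the terminator regexp it waits for.
const formatters = [
  { name: "strike", match: "--", tag: "del", term: /--/g },
  { name: "italic", match: "\\/\\/", tag: "em", term: /\/\//g },
];

// All triggers combined into one regexp, one capture group per formatter.
const triggerRegExp = new RegExp(formatters.map(f => "(" + f.match + ")").join("|"), "g");

// Run a global regexp from a given position.
function execAt(re, source, pos) {
  re.lastIndex = pos;
  return re.exec(source);
}

class MiniWikifier {
  constructor(source) {
    this.source = source;
    this.pos = 0;
  }

  // Render from this.pos until `term` matches or the source runs out.
  // Only the innermost terminator is checked, so a closer belonging to an
  // outer run is seen as the trigger for a new, nested run.
  renderUntil(term) {
    let out = "";
    while (this.pos < this.source.length) {
      const termMatch = term ? execAt(term, this.source, this.pos) : null;
      const trigger = execAt(triggerRegExp, this.source, this.pos);
      // The terminator wins if it comes first (or at the same position).
      if (termMatch && (!trigger || termMatch.index <= trigger.index)) {
        out += this.source.slice(this.pos, termMatch.index);
        this.pos = termMatch.index + termMatch[0].length;
        return out;
      }
      if (!trigger) break;
      out += this.source.slice(this.pos, trigger.index);
      this.pos = trigger.index + trigger[0].length;
      // The capture group that matched identifies the formatter.
      const f = formatters[trigger.slice(1).findIndex(g => g !== undefined)];
      out += "<" + f.tag + ">" + this.renderUntil(f.term) + "</" + f.tag + ">";
    }
    // Unterminated runs simply extend to the end of the source: no error.
    out += this.source.slice(this.pos);
    this.pos = this.source.length;
    return out;
  }
}

function wikify(source) {
  return new MiniWikifier(source).renderUntil(null);
}

console.log(JSON.stringify(wikify("--abc//def--hij//\nghi")));
// → "<del>abc<em>def<del>hij<em>\nghi</em></del></em></del>"
```

Run on the first list item from the example (without the list syntax), the second -- opens a nested strikethrough inside the italics, the second // opens a nested italic run inside that, and the following line ends up inside all four runs. Nothing ever raises an error; badly nested input just produces deeper nesting.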

> Now there is no notion of "incorrect markup" ie parsing cannot fail
> with a syntax error (or can it?)

That's correct: parsing cannot fail. The goal in error conditions is
just to fail visibly, and to emit as much readable text as possible.

> Unfortunately, what is "sensible" in the event of badly nested markup
> is debatable. This bodes
> ill for standardization and wide-scale adoption.

Hopefully we can fix the most annoying problems.

> In most html parsers an incorrect markup does not propagate beyond the
> scope of the
> current block (given some definition of a block)

Yes, that would be much more useful.

> Consider this:
>
> * abc//def
> * ghi//
> * 123
>
> The result is weird: this generates three levels of list indents, all italic.
>
> In my mind, all "open" markup should be closed at the end of each superior
> block.

I think that this could be fixed. As discussed elsewhere, we at Osmosoft
are very interested in evolving a "mark 2" TiddlyWiki markup.

> Comments: 1) this yields well-formed XHTML; 2) a bit more work could make this
> well-formed and retain what I "expected".

I suspect that the broad user community would favour well-formed XHTML
for these cases, but perhaps we can still make the mis-nesting
behaviour less unexpected.
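Joe's block-scoped suggestion can be sketched as well. The following hypothetical JavaScript renderer (not TiddlyWiki code; renderList and its reopen flag are illustrative names) handles only flat "* " lists with // and -- markup. It closes any markup still open at the end of each list item, so the output is always well-formed. With reopen set, it also reopens that markup at the start of the next item, and a mis-nested closer closes the inner runs and then reopens them, which is one way of keeping what the author probably "expected".

```javascript
// Hypothetical sketch (not TiddlyWiki code): a flat "* item" list renderer in
// which inline markup never outlives the list item that opened it.
const MARKUP = new Map([["//", "em"], ["--", "del"]]);
const openTag = m => "<" + MARKUP.get(m) + ">";
const closeTag = m => "</" + MARKUP.get(m) + ">";

// reopen=false: close open markup at the end of each item.
// reopen=true:  also reopen it at the start of the next item.
function renderList(source, reopen) {
  const items = source.split("\n").filter(line => line.startsWith("* "));
  let carried = []; // markup still open when the previous item ended
  const html = items.map(line => {
    const stack = reopen ? carried.slice() : [];
    let out = stack.map(openTag).join("");
    // The capture group keeps the markup tokens in the split result.
    for (const token of line.slice(2).split(/(\/\/|--)/)) {
      if (!MARKUP.has(token)) {
        out += token;
        continue;
      }
      const idx = stack.lastIndexOf(token);
      if (idx === -1) {
        stack.push(token);
        out += openTag(token);
        continue;
      }
      // Mis-nested closer: close the inner runs and the matched run,
      // then reopen the inner runs so the output stays well-formed.
      const above = stack.splice(idx);
      out += above.slice().reverse().map(closeTag).join("");
      const inner = above.slice(1);
      stack.push(...inner);
      out += inner.map(openTag).join("");
    }
    out += stack.slice().reverse().map(closeTag).join("");
    carried = stack;
    return "<li>" + out + "</li>";
  });
  return "<ul>" + html.join("") + "</ul>";
}

console.log(renderList("* abc//def\n* ghi//\n* 123", false));
// → <ul><li>abc<em>def</em></li><li>ghi<em></em></li><li>123</li></ul>
console.log(renderList("* abc//def\n* ghi//\n* 123", true));
// → <ul><li>abc<em>def</em></li><li><em>ghi</em></li><li>123</li></ul>
```

Either way the list stays flat; the choice between the two modes is exactly the trade-off above between strictly well-formed output and less surprising mis-nesting behaviour.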

Best wishes

jeremy

> Cheers
>
> /Joe
>
> --
> You received this message because you are subscribed to the Google Groups 
> "TiddlyWikiDev" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/tiddlywikidev?hl=en.
>
>



-- 
Jeremy Ruston
mailto:[email protected]
http://www.tiddlywiki.com
