Re: [twdev] three questions

Arlen Beiler Fri, 08 Jul 2011 04:59:26 -0700

For your information, TiddlyWiki has no concept of paragraphs. As you may
have noticed, it simply adds a line break each time you hit Enter, which is
why I suggested closing tags at line breaks if they aren't closed. Maybe you
meant the same thing, but I thought I would point it out.


On Fri, Jul 8, 2011 at 5:37 AM, Joe Armstrong <[email protected]> wrote:

> On Fri, Jul 8, 2011 at 12:12 AM, Jeremy Ruston <[email protected]>
> wrote:
> >> I'm writing a TiddlyWiki parser as part of an exercise to try and
> >> understand how TiddlyWiki works.
> >
> > Awesome. I like the way that you've been approaching TiddlyWiki from a
> > formal perspective;
>
> I'm not too worried about the formal bit - I just want to tie down the
> semantics of wikitext so strictly that it can be implemented in multiple
> languages with equivalent results.
>
> When I see a bit of program code (i.e. wikitext) I want to know exactly
> what it expands into, without running a program to see.
>
> The help files as they stand are fine for a user, but not for an implementor.
>
> I have no idea of the scope of the constructs.
>
> What does
>
> | @@a | b@@| mean?
>
> Is this a table with one cell with emphasised content?
> Or two cells with the emphasis flag propagating over the cell boundary?
>
> I have to do small experiments to find out. Answer: "@@" binds tighter than
> "|", so it's one cell.
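To make the precedence question concrete, here is a minimal JavaScript sketch of a row splitter in which "@@" binds tighter than "|", matching the experimental result reported above. splitCells is a hypothetical helper written for illustration, not TiddlyWiki's own table code, and it ignores every other kind of markup.

```javascript
// Hypothetical helper (not TiddlyWiki code): split one table row into
// cells, treating everything between a pair of "@@" markers as one unit.
function splitCells(row) {
  const cells = [];
  let current = "";
  let inEmphasis = false;
  // Drop the leading and trailing "|" that delimit the row.
  const body = row.trim().replace(/^\|/, "").replace(/\|$/, "");
  for (let i = 0; i < body.length; i++) {
    if (body.startsWith("@@", i)) {
      inEmphasis = !inEmphasis;
      current += "@@";
      i++; // skip the second "@"
    } else if (body[i] === "|" && !inEmphasis) {
      // A cell boundary only counts outside an emphasis run.
      cells.push(current);
      current = "";
    } else {
      current += body[i];
    }
  }
  cells.push(current);
  return cells;
}

console.log(splitCells("| @@a | b@@|").length); // 1
console.log(splitCells("| a | b |").length);    // 2
```

If "|" were given the higher precedence instead, the first row would split into two cells and the emphasis would have to be carried across the boundary.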
>
> I have a feeling that the use of regular expressions greatly hinders
> writing a parser - regexps are not good at operator precedence, nor
> at matching context-sensitive grammars, and most wiki parsers seem to
> be massive sets of regexps which I just cannot read - top-down
> recursive descent parsing seems a lot easier.
>
> > its history has been influenced by its early
> > development as an experiment, and then its rapid mass adoption quite
> > early on as a practical tool. It has rather constrained our ability to
> > change some of the less desirable fundamentals, and meant that the
> > definitive specification is all too frequently the source code.
>
> Right - the same thing happened in Erlang. First there was an
> implementation, then (later) documentation. The two were always out of
> phase.
>
> If people read the documentation and tried something and it did something
> different we said "the documentation is wrong, read the source, Luke" and
> we changed the documentation.
>
> After ten years of this we said "this is daft" - from now on we will write
> the documentation and describe how it is "supposed" to work - then if the
> documentation and code differ, the code has a bug. That's what
> we do today.
>
> This is why I like to see "a proper definition" (in English) first - all
> the regexp and stuff that follows is a "detail" - I must be the only
> programmer on the planet who hates reading code to see what something does.
> Whenever I read code I have a pathological desire to rewrite it in a
> clearer manner so I can understand it.
>
> >
> >> I have a few questions:
> >
> >> 1) What is a wikiword?- I can find no definition. Where is the
> >> definitive definition?
> >
> > The definitive definition is a regular expression in the source code,
> > starting at:
> >
> > https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/Config.js#L179
> >
> > The important bit is:
> >
> > config.textPrimitives.wikiLink = "(?:(?:" +
> >        config.textPrimitives.upperLetter + "+" +
> >        config.textPrimitives.lowerLetter + "+" +
> >        config.textPrimitives.upperLetter +
> >        config.textPrimitives.anyLetter + "*)|(?:" +
> >        config.textPrimitives.upperLetter + "{2,}" +
> >        config.textPrimitives.lowerLetter + "+))";
> >
> > Confusingly, lowerLetter includes the digits and dashes, as does
> > anyLetter.
> >
>
> I knew I hated regular expressions.
>
> Wikipedia says (of CamelCase): "The name comes from the uppercase
> "bumps" in the middle of the compound word, suggestive of the humps of
> a camel."
>
> Is the following correct?
>
> A wikiword (in the tiddlywiki)
>
> Is either
>
> - a single uppercase letter followed by one or more lowercase letters
>  followed by an uppercase letter followed by any combination of letters
>
> OR
>
> - two or more uppercase letters followed by at least one lower case letter
>  then any combination of letters
>
> Where uppercase letter = [A-Z]
>            lowercase letter = [a-z0-9] and dash
>
> So   Abcd is not a wikiword
>       ABcd IS a wikiword (but this is NOT CamelCase according to
>       the Wikipedia definition)
>
>       ABC-876 is a wikiword
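The three examples above can be checked mechanically. The sketch below rebuilds the quoted wikiLink expression in JavaScript, using simplified ASCII-only character classes taken from the description in this thread; those classes are an assumption, and the real config.textPrimitives may well cover more characters. isWikiWord is a hypothetical helper that anchors the pattern so the whole string must match. Read literally, the quoted expression allows one or more leading uppercase letters in its first form, and its second form has no trailing "any letters" part.

```javascript
// Simplified ASCII-only classes (assumption: the real definitions in
// config.textPrimitives may include more characters).
const upperLetter = "[A-Z]";
const lowerLetter = "[a-z0-9\\-]";   // digits and dash count as "lower"
const anyLetter = "[A-Za-z0-9\\-]";

// Same shape as the expression quoted from Config.js.
const wikiLink = "(?:(?:" + upperLetter + "+" +
  lowerLetter + "+" +
  upperLetter +
  anyLetter + "*)|(?:" +
  upperLetter + "{2,}" +
  lowerLetter + "+))";

// Hypothetical helper: true when the whole string is a wikiword.
function isWikiWord(text) {
  return new RegExp("^" + wikiLink + "$").test(text);
}

console.log(isWikiWord("Abcd"));     // false
console.log(isWikiWord("ABcd"));     // true
console.log(isWikiWord("ABC-876"));  // true
console.log(isWikiWord("UTF-8"));    // true
```

Under these assumptions UTF-8 matches the second form (two or more uppercase letters, then "-8" as "lowercase" characters), which would explain why it needs the ~ prefix.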
>
>
> >> My tiddly wiki seems to think that UTF-8 is a wikiword (since it's
> >> written ~UTF-8) - my parser
> >> finds a ~ then expects a wikiword, but by the definitions I have found
> >> UTF-8 is not a wikiword, i.e. it's not CamelCase.
> >
> > See above.
> >
> >> 2) What is escaped text? As in """ escaped text """
> >> Is there any difference between """ ... """ (three consecutive double
> >> quotes) and {{{ ... }}}?
> >
> > The purpose of escaped text is to suppress wikification without
> > applying the monospaced formatting applied by {{{....}}}.
> >
> > As you can see from the code, <nowiki>...</nowiki> can also be used to
> > escape text.
> >
> >
> > https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/Formatter.js#L504
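One way to picture the distinction described above is a pre-pass that cuts the source into segments before any wikification happens. This is only a sketch under that assumption: splitEscapes and the segment kinds are hypothetical names, and the linked Formatter.js handles escapes as ordinary formatters rather than in a separate pass.

```javascript
// Hypothetical pre-pass: separate escaped runs from ordinary wikitext.
// "plain" segments skip wikification; "monospaced" segments skip
// wikification and are additionally rendered in a fixed-width font.
function splitEscapes(source) {
  const pattern =
    /"""([\s\S]*?)"""|<nowiki>([\s\S]*?)<\/nowiki>|\{\{\{([\s\S]*?)\}\}\}/g;
  const segments = [];
  let last = 0;
  let m;
  while ((m = pattern.exec(source)) !== null) {
    if (m.index > last) {
      segments.push({ kind: "wikitext", text: source.slice(last, m.index) });
    }
    if (m[3] !== undefined) {
      segments.push({ kind: "monospaced", text: m[3] });
    } else {
      segments.push({ kind: "plain", text: m[1] !== undefined ? m[1] : m[2] });
    }
    last = pattern.lastIndex;
  }
  if (last < source.length) {
    segments.push({ kind: "wikitext", text: source.slice(last) });
  }
  return segments;
}

console.log(splitEscapes('a """//b//""" {{{c}}}'));
```

Only the "wikitext" segments would then be handed to the rest of the parser, so the // inside the triple quotes is never read as an italics marker.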
> >
> >> 3) What is supposed to happen when markup elements are incorrectly
> >> nested?
> >
> > The wikifier generally treats the conflicting formatting as if it were
> > plain text, but frequently finds a confusingly optimistic way of
> > interpreting broken markup.
>
> The problem is that the casual user does not know if the markup
> is badly formed - it would be nice to (say) color the rendered page
> red (or something) and tell them why the markup was badly formed.
>
> There's no concept of a "warning" here for dubious markup.
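A warning pass of the kind suggested above could be as small as a stack check over each line. The sketch below is hypothetical (checkMarkup and the marker list are made up for illustration, and nothing like it is claimed to exist in TiddlyWiki); it also ignores real-world complications such as "//" inside URLs or "----" rules.

```javascript
// Two-character inline markers this sketch knows about (assumption).
const MARKERS = ["//", "''", "--", "@@"];

// Hypothetical lint pass: report mis-nested or unterminated markers
// in a single line of wikitext.
function checkMarkup(line) {
  const stack = [];
  const warnings = [];
  let i = 0;
  while (i < line.length) {
    const marker = MARKERS.find((m) => line.startsWith(m, i));
    if (!marker) {
      i++;
      continue;
    }
    if (stack[stack.length - 1] === marker) {
      stack.pop(); // closes the innermost open marker: well nested
    } else if (stack.includes(marker)) {
      warnings.push(marker + " at offset " + i +
        " closes across an open " + stack[stack.length - 1]);
      // Discard everything opened since the matching marker.
      stack.length = stack.lastIndexOf(marker);
    } else {
      stack.push(marker);
    }
    i += marker.length;
  }
  for (const open of stack) {
    warnings.push(open + " is never closed");
  }
  return warnings;
}

console.log(checkMarkup("--abc//def--hij//"));
// [ '-- at offset 10 closes across an open //', '// is never closed' ]
```

A renderer could use a non-empty result to colour the block red and show the messages to the user.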
>
>
>
> >
> >> I tried a tiddly page containing this:
> >>
> >> * --abc//def--hij//
> >> * ghi
> >> * 123
> >
> > In this case, the first double dash triggers strikethrough formatting.
> > Then the double slash triggers italics.
> >
> > The second double dash isn't interpreted as the end of the
> > strikethrough formatting because the wikifier is still waiting for the
> > italics to be terminated first. Instead, it interprets the second
> > double dash as the start of a new run of strike through text, nested
> > inside the first. Of course, there's no visual difference.
> >
> > By the time the second double slash is encountered, then, the same
> > thing happens: the wikifier interprets it as the start of a new,
> > nested run of italic text.
> >
> > All of which is why the second and third lines are rendered with
> > italics and strikethrough.
> >
> > Finally, the reason that the second and third lines are indented is
> > that the wikifier is looking for the italics and strikethrough to end
> > before it will recognise the next sibling list item. It's probably a
> > bug that it therefore falls back to interpreting the subsequent list
> > items as child items.
>
> The usual technique in HTML parsers is to view <b> <i> as a set
> of toggles: <b> turns the bold flag on, </b> turns it off.
>
> I can imagine the following
>
> At the start of parsing all flags are set to false.
> Then
>
> // toggles the italic flag
> '' toggles the bold flag
> @@ toggles the emphasis flag etc.
>
> When you hit raw text you just apply the styles to achieve whatever
> combination of flags is set - then set all the flags to false at the
> end of the syntactic form.
>
> I tend to think of // etc. as "short range" operators. I'm not sure if this
> is correct, but having // propagate into the next paragraph feels wrong:
> it feels as if I've forgotten to terminate the italics.
>
> My feeling is that long range styling should be enclosed in
>
> {{css{
>
> }}}
>
> tags, and that all character styling should be reset at the end of each
> paragraph, each list element and each cell element.
>
> Most "well formed" tiddlers seem to obey this rule.
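The toggle idea, together with the reset at the end of each block, fits in a few lines. This is a sketch of the proposal in this thread, not of how TiddlyWiki behaves; styleRuns and styleBlocks are hypothetical names, and "block" is simplified here to mean one line.

```javascript
// Marker-to-flag table from the proposal above.
const TOGGLES = { "//": "italic", "''": "bold", "@@": "emphasis" };

// Split one block into runs of text, each tagged with the active styles.
function styleRuns(block) {
  // Fresh flags per block: this is the "reset at the end" rule.
  const flags = { italic: false, bold: false, emphasis: false };
  const runs = [];
  let text = "";
  const flush = () => {
    if (text !== "") {
      const styles = Object.keys(flags).filter((name) => flags[name]);
      runs.push({ text, styles });
      text = "";
    }
  };
  for (let i = 0; i < block.length; i++) {
    const marker = block.slice(i, i + 2);
    if (Object.prototype.hasOwnProperty.call(TOGGLES, marker)) {
      flush();
      flags[TOGGLES[marker]] = !flags[TOGGLES[marker]];
      i++; // skip the second character of the marker
    } else {
      text += block[i];
    }
  }
  flush();
  return runs;
}

// Treat every line as its own block (simplifying assumption).
function styleBlocks(source) {
  return source.split("\n").map(styleRuns);
}

console.log(JSON.stringify(styleBlocks("abc//def\nghi")));
// [[{"text":"abc","styles":[]},{"text":"def","styles":["italic"]}],[{"text":"ghi","styles":[]}]]
```

Because the flags are independent, overlapping runs such as //a''b//c'' need no special handling: "b" simply carries both italic and bold, and an unterminated // cannot leak into the next line.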
>
>
> >
> > The reason that the wikifier behaves in this unexpected way is that it
> > is structured very simply: each individual formatter (ie element of
> > wiki syntax) is modelled as a regexp that triggers it, and a function
> > that processes it, usually with the help of another regular
> > expression. The idea was to make it easy to add new formatters without
> > disturbing the existing ones.
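The structure described above (a trigger regexp plus a handler per formatter) can be sketched as follows. The formatters array, the three example entries and the wikify function are illustrative assumptions, not the contents of Formatter.js; the output is not HTML-escaped and the handlers here do not recurse into their content.

```javascript
// Hypothetical formatter table: each entry has a trigger pattern (a
// regexp source string without capturing groups) and a handler that
// turns the matched text into output.
const formatters = [
  { name: "rule", match: "^----+$", handler: () => "<hr>" },
  { name: "lineBreak", match: "\\n", handler: () => "<br>" },
  {
    name: "monospace",
    match: "\\{\\{\\{.*?\\}\\}\\}",
    handler: (matched) => "<code>" + matched.slice(3, -3) + "</code>"
  }
];

function wikify(source) {
  // One combined regexp: capture group N belongs to formatter N-1.
  const pattern = formatters.map((f) => "(" + f.match + ")").join("|");
  const trigger = new RegExp(pattern, "mg");
  let out = "";
  let last = 0;
  let m;
  while ((m = trigger.exec(source)) !== null) {
    out += source.slice(last, m.index); // plain text before the match
    const which = formatters.findIndex((f, i) => m[i + 1] !== undefined);
    out += formatters[which].handler(m[0]);
    last = trigger.lastIndex;
  }
  return out + source.slice(last);
}

console.log(wikify("a\n----\n{{{x}}}")); // a<br><hr><br><code>x</code>
```

Adding a formatter is just another entry in the array, which is the extensibility described above; the cost is that no single place in the code knows how the constructs nest.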
> >
> >> Now there is no notion of "incorrect markup" ie parsing cannot fail
> >> with a syntax error (or can it?)
> >
> > No, that is correct, parsing cannot fail; the goal in error conditions
> > is just to fail visibly, and emit as much readable text as possible.
> >
> >> Unfortunately, what is "sensible" in the event of badly nested markup
> >> is debatable. This bodes ill for standardization and wide-scale
> >> adoption.
> >
> > Hopefully we can fix the most annoying problems.
> >
> >> In most HTML parsers incorrect markup does not propagate beyond the
> >> scope of the current block (given some definition of a block).
> >
> > Yes, that would be much more useful.
> >
> >> Consider this:
> >>
> >> * abc//def
> >> * ghi//
> >> * 123
> >>
> >> The result is weird: this generates three levels of list indents - all
> >> italic.
> >>
> >> In my mind all "open" markup should be closed at the end of each
> >> superior block.
> >
> > I think that this could be fixed - as discussed elsewhere, we're very
> > interested at Osmosoft in evolving a tiddly markup mark 2.
> >
> >> Comments: 1) this yields well-formed XHTML 2) a bit more work could make
> >> this well-formed and retain what I "expected"
> >
> > I suspect that the broad user community would favour well-formed XHTML
> > for these cases, but perhaps we can still make the mis-nesting
> > behaviour less unexpected.
>
>
> >
> > Best wishes
> >
> > jeremy
> >
> >> Cheers
> >>
> >> /Joe
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> Groups "TiddlyWikiDev" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to
> [email protected].
> >> For more options, visit this group at
> http://groups.google.com/group/tiddlywikidev?hl=en.
> >>
> >>
> >
> >
> >
> > --
> > Jeremy Ruston
> > mailto:[email protected]
> > http://www.tiddlywiki.com
> >
