For your information, TiddlyWiki has no concept of paragraphs. As you may have noticed, it simply adds a line break each time you hit Enter, which is why I suggested closing any unclosed tags at line breaks. Maybe you meant the same thing, but I thought I would point it out.
On Fri, Jul 8, 2011 at 5:37 AM, Joe Armstrong <[email protected]> wrote:

> On Fri, Jul 8, 2011 at 12:12 AM, Jeremy Ruston <[email protected]> wrote:
> >> I'm writing a tiddlyWiki parser as part of an exercise to try and
> >> understand how the tiddlyWiki works.
> >
> > Awesome. I like the way that you've been approaching TiddlyWiki from a
> > formal perspective;
>
> I'm not too worried about the formal bit - I just want to tie down the
> semantics of wikitext so strictly that it can be implemented in multiple
> languages with equivalent results.
>
> When I see a bit of program code (ie wikitext) I want to know exactly
> what it expands into, without running a program to see.
>
> The help files as they stand are fine for a user, but not an implementor.
>
> I have no idea of the scope of the constructs.
>
> What does
>
> | @@a | b@@| mean?
>
> Is this a table with one cell with emphasised content?
> Or two cells with the emphasis flag propagating over the cell boundary?
>
> I have to do small experiments to find out. Answer: "@@" binds tighter than
> "|", so it's one cell.
>
> I have a feeling that the use of regular expressions greatly hinders
> writing a parser - regexps are not good at operator precedence, nor
> for matching context sensitive grammars, and most wiki parsers seem to
> be massive sets of regexps which I just cannot read - top-down
> recursive descent parsing seems a lot easier.
>
> > its history has been influenced by its early
> > development as an experiment, and then its rapid mass adoption quite
> > early on as a practical tool. It has rather constrained our ability to
> > change some of the less desirable fundamentals, and meant that the
> > definitive specification is all too frequently the source code.
>
> Right - the same thing happened in Erlang. First there was an
> implementation, then (later) documentation. The two were always out of
> phase.
>
> If people read the documentation and tried something and it did something
> different we said "the documentation is wrong, read the source, Luke" and
> we changed the documentation.
>
> After ten years of this we said "this is daft" - from now on we will write
> the documentation and describe how it is "supposed" to work - then if the
> documentation and code differ, the code has a bug. That's what we do today.
>
> This is why I like to see "a proper definition" (in English) first - all
> the regexp and stuff that follows is a "detail" - I must be the only
> programmer on the planet who hates reading code to see what something does.
> Whenever I read code I have a pathological desire to rewrite it in a
> clearer manner so I can understand it.
>
> >> I have a few questions:
> >>
> >> 1) What is a wikiword? - I can find no definition. Where is the
> >> definitive definition?
> >
> > The definitive definition is a regular expression in the source code,
> > starting at:
> >
> > https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/Config.js#L179
> >
> > The important bit is:
> >
> > config.textPrimitives.wikiLink = "(?:(?:" +
> >     config.textPrimitives.upperLetter + "+" +
> >     config.textPrimitives.lowerLetter + "+" +
> >     config.textPrimitives.upperLetter +
> >     config.textPrimitives.anyLetter + "*)|(?:" +
> >     config.textPrimitives.upperLetter + "{2,}" +
> >     config.textPrimitives.lowerLetter + "+))";
> >
> > Confusingly, lowerLetter includes the digits and dashes, as does
> > anyLetter.
>
> I knew I hated regular expressions.
>
> The wikipedia says (of camelcase) "The name comes from the uppercase
> "bumps" in the middle of the compound word, suggestive of the humps of
> a camel."
>
> It seems that:
>
> Is the following correct:
>
> A wikiword (in the tiddlywiki)
>
> Is either
>
> - a single uppercase letter followed by one or more lowercase letters
>   followed by an uppercase letter followed by any combination of letters
>
> OR
>
> - two or more uppercase letters followed by at least one lowercase letter
>   then any combination of letters
>
> Where uppercase letter = [A-Z]
>       lowercase letter = [a-z0-9] and dash
>
> So Abcd is not a wikiword
>    ABcd IS a wikiword (but this is NOT camelcase (according to
>    the wikipedia definition))
>
>    ABC-876 is a wikiword
>
> >> My tiddly wiki seems to think that UTF-8 is a wikiword (since it's
> >> written ~UTF-8) - my parser
> >> finds a ~ then expects a wikiword but UTF-8 is not, by the definitions
> >> I have found, a wikiword
> >> ie it's not CamelCase.
> >
> > See above.
> >
> >> 2) What is escaped text? As in """ escaped text """
> >> Is there any difference between """ ... """ (three consecutive double
> >> quotes) and {{{ ... }}}
> >
> > The purpose of escaped text is to suppress wikification without
> > applying the monospaced formatting applied by {{{....}}}.
> >
> > As you can see from the code, <nowiki>...</nowiki> can also be used to
> > escape text.
> >
> > https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/Formatter.js#L504
> >
> >> 3) What is supposed to happen when markup elements are incorrectly
> >> nested?
> >
> > The wikifier generally treats the conflicting formatting as if it were
> > plain text, but frequently finds a confusingly optimistic way of
> > interpreting broken markup.
>
> The problem is that the casual user does not know if the markup
> is badly formed - it would be nice to (say) color the rendered page
> red (or something) and tell them why the markup was badly formed.
>
> There's no concept of a "warning" here for dubious markup.
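
For anyone who wants to check the wikiword examples above without reading the wikifier, here is a minimal sketch. It assumes ASCII-only character classes taken from the description in this thread (uppercase = [A-Z], lowercase = [a-z], digits and dash); the real config.textPrimitives values may cover more characters. The isWikiWord helper is hypothetical and not part of TiddlyWiki. Note that, as quoted, the first alternative allows one or more leading uppercase letters, not just one.

```javascript
// Assumed ASCII-only character classes, following the thread's description.
// The real config.textPrimitives values may cover more characters.
const upperLetter = "[A-Z]";
const lowerLetter = "[a-z0-9\\-]";
const anyLetter = "[A-Za-z0-9\\-]";

// Same concatenation as the wikiLink expression quoted in the thread.
const wikiLink = "(?:(?:" +
  upperLetter + "+" +
  lowerLetter + "+" +
  upperLetter +
  anyLetter + "*)|(?:" +
  upperLetter + "{2,}" +
  lowerLetter + "+))";

// Hypothetical helper: true when the whole string is a single wikiword.
function isWikiWord(text) {
  return new RegExp("^" + wikiLink + "$").test(text);
}

for (const word of ["Abcd", "ABcd", "ABC-876", "UTF-8", "WikiWord"]) {
  console.log(word, isWikiWord(word));
}
// Abcd false
// ABcd true
// ABC-876 true
// UTF-8 true
// WikiWord true
```

Under these assumptions UTF-8 matches the second alternative (three uppercase letters followed by "-8"), which would explain why it needs the ~ prefix.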
>
> >> I tried a tiddly page containing this:
> >>
> >> * --abc//def--hij//
> >> * ghi
> >> * 123
> >
> > In this case, the first double dash triggers strikethrough formatting.
> > Then the double slash triggers italics.
> >
> > The second double dash isn't interpreted as the end of the
> > strikethrough formatting because the wikifier is still waiting for the
> > italics to be terminated first. Instead, it interprets the second
> > double dash as the start of a new run of strikethrough text, nested
> > inside the first. Of course, there's no visual difference.
> >
> > By the time the second double slash is encountered, then, the same
> > thing happens: the wikifier interprets it as the start of a new,
> > nested run of italic text.
> >
> > All of which is why the second and third lines are rendered with
> > italics and strikethrough.
> >
> > Finally, the reason that the second and third lines are indented is
> > that the wikifier is looking for the italics and strikethrough to end
> > before it will recognise the next sibling list item. It's probably a
> > bug that it therefore falls back to interpreting the subsequent list
> > items as child items.
>
> The usual technique in HTML parsers is to view <b> <i> as a set
> of toggles. <b> puts the bold flag on, </b> turns it off.
>
> I can imagine the following:
>
> At the start of parsing all flags are set false.
> Then
>
>    // toggles the italic flag
>    '' toggles the bold flag
>    @@ toggles the emphasis flag etc.
>
> When you hit raw text you just apply the styles to achieve whatever
> combination of flags is set - then set all the flags to false at the
> end of the syntactic form.
>
> I tend to think of // etc as "short range" operators. I'm not sure if this
> is correct but having // propagate into the next paragraph feels wrong.
> I feel like I've forgotten to terminate the italics.
>
> My feeling is that long range styling should be enclosed in
>
> {{css{
>
> }}}
>
> Tags.
> And that all character styling should be reset at the end of each
> paragraph, each list element and each cell element.
>
> Most "well formed" tiddlers seem to obey this rule.
>
> > The reason that the wikifier behaves in this unexpected way is that it
> > is structured very simply: each individual formatter (ie element of
> > wiki syntax) is modelled as a regexp that triggers it, and a function
> > that processes it, usually with the help of another regular
> > expression. The idea was to make it easy to add new formatters without
> > disturbing the existing ones.
> >
> >> Now there is no notion of "incorrect markup" ie parsing cannot fail
> >> with a syntax error (or can it?)
> >
> > No, that is correct, parsing cannot fail; the goal in error conditions
> > is just to fail visibly, and emit as much readable text as possible.
> >
> >> unfortunately what is "sensible" in the event of badly nested markup
> >> is debatable. This bodes
> >> ill for standardization and wide scale adoption.
> >
> > Hopefully we can fix the most annoying problems.
> >
> >> In most html parsers an incorrect markup does not propagate beyond the
> >> scope of the
> >> current block (given some definition of a block)
> >
> > Yes, that would be much more useful.
> >
> >> Consider this:
> >>
> >> * abc//def
> >> * ghi//
> >> * 123
> >>
> >> The result is weird: this generates three levels of list indents - all
> >> italic
> >>
> >> In my mind all "open" markup should be closed at the end of each
> >> superior block.
> >
> > I think that this could be fixed - as discussed elsewhere, we're very
> > interested at Osmosoft in evolving a tiddly markup mark 2.
> >
> >> Comments: 1) this yields well-formed XHTML 2) a bit more work could make
> >> this well-formed and retain what I "expected"
> >
> > I suspect that the broad user community would favour well-formed XHTML
> > for these cases, but perhaps we can still make the mis-nesting
> > behaviour less unexpected.
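
To make the "toggle flags, reset at the end of each block" idea discussed above concrete, here is a minimal sketch. It is not how the TiddlyWiki wikifier works; the renderLine function, the MARKERS table and the choice of HTML tags are all assumptions made for illustration. Each two-character marker toggles a style, a mis-nested close is handled by closing and reopening the inner styles so the output stays well-formed, and anything still open is closed at the end of the line.

```javascript
// Hypothetical marker table: a small subset of inline markup, with
// assumed HTML tag names (not taken from the TiddlyWiki source).
const MARKERS = new Map([
  ["//", "em"],
  ["''", "strong"],
  ["--", "strike"],
  ["@@", "mark"],
]);

const ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;" };

// Render one block (here: one line) of inline markup as a toggle machine.
function renderLine(line) {
  const open = []; // stack of currently open tags, outermost first
  let out = "";
  let i = 0;
  while (i < line.length) {
    const tag = MARKERS.get(line.slice(i, i + 2));
    if (tag === undefined) {
      out += ESCAPES[line[i]] || line[i]; // plain text, escaped for HTML
      i += 1;
      continue;
    }
    const pos = open.lastIndexOf(tag);
    if (pos === -1) {
      // Style is off: switch it on.
      open.push(tag);
      out += "<" + tag + ">";
    } else {
      // Style is on: close any styles nested inside it, close it,
      // then reopen the inner styles so the output stays well-formed.
      const inner = open.splice(pos).slice(1);
      for (const t of [...inner].reverse()) out += "</" + t + ">";
      out += "</" + tag + ">";
      for (const t of inner) {
        out += "<" + t + ">";
        open.push(t);
      }
    }
    i += 2;
  }
  // End of block: reset every style that is still open.
  while (open.length > 0) out += "</" + open.pop() + ">";
  return out;
}

console.log(renderLine("--abc//def--hij//"));
// <strike>abc<em>def</em></strike><em>hij</em>
console.log(renderLine("abc//def"));
// abc<em>def</em>
```

With this approach the unterminated italics in "* abc//def" cannot leak into the next list item, at the cost of treating every "//" (including the one in a URL) as a toggle, which a real parser would have to handle.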
> >
> > Best wishes
> >
> > jeremy
> >
> >> Cheers
> >>
> >> /Joe
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups "TiddlyWikiDev" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to [email protected].
> >> For more options, visit this group at http://groups.google.com/group/tiddlywikidev?hl=en.
> >
> > --
> > Jeremy Ruston
> > mailto:[email protected]
> > http://www.tiddlywiki.com
