This thread is quickly becoming quite tl;dr - but I think the discussions are valid and useful.
Maybe we could break some of these disparate topics up a bit?

- Trevor

On Tue, Feb 7, 2012 at 8:35 AM, Pavel Tkachenko <[email protected]> wrote:
> Platonides,
>
> 2012/2/7 Platonides <[email protected]>:
> > they are not imposed by the wikitext in any way, and could be
> > removed today if wished:
> Then why are they still there?
>
> As I have said in my previous message I am ready to break down any
> piece of markup that you want. Templates are just the craziest part of
> current wikitext.
>
> >> 2. What is "pp-semi" why it "move-indef"?
> > Names given by the users.
> It's funny that users give names that others don't understand. Even
> those who are "technically proficient" but not part of "the elite".
>
> > The pipe in {{About|the planet}} can look odd, but the pipe
> > at the beginning of the line looks natural. It seems like some kind of
> > continuation of the {{.
> Apart from just the "looking natural" argument I would put "crucial need".
> I think quotes look natural after template parameter names - but they
> have no use and will duplicate existing functionality (a parameter
> cannot last after the beginning of the next parameter, for instance).
>
> In other words, why do we need a pipe if we assume that a template can
> have two modes: inline and block? Inline templates contain no line
> feeds, block templates contain a line feed before each of their
> parameters.
>
> But first let's think if a template should have a 'mixed' mode. This will
> make things more complex without any need and it will make the source
> look messy because it'll depend on the user if he wants to put that
> particular parameter on a separate line or not (imagine two guys: one
> on a "VT100" terminal with 80 characters per line and one with the
> latest 30" plasma display). It will also require the markup to provide
> additional means of separating parameters if the line feed can be
> trusted no more.
>
> It's time to sharpen our Occam's Razor or life will do this.
>
> > Space separated arguments are more readable for the casual editor, but
> > normal editors would have a harder time to find out what's the template
> > and what the parameters.
> You're thinking in terms of parameters and my point was to discard all
> of this stuff and think in terms of human writing.
>
> What looks more natural - {{About|Earth|the planet}} or {{About Earth,
> the planet}}? Since when does our handwriting produce pipes instead of
> spaces (and not even all of them)?
>
> > Also, there are colons as parameters. How would you write as the parameter
> > the article [[Gypsy: A Musical Fable]] or [[Batman: Year One]] ?
> > By banning ':' in titles?
> Have I said something about colons and links? Links are fine with
> colons or any other symbols.
>
> But if we're touching on this, pipes in links are not that intuitive
> either. Pipes are actually not present on many keyboard layouts but
> even apart from that it's more natural to use an equality sign. Or
> a double one, for the purpose of text markup.
>
> In fact, doubling a symbol is a great way of both differentiating it
> from misoperations and making it easily recognizable by the human eye.
> Space can be used for the same purpose (as a delimiter).
>
> Let's be concrete:
> 1. Links are wrapped in [[ and ]].
> 2. Links may optionally have titles. A title is separated from the link
> URL (or local page name) with a space.
> 3. Those links which contain spaces in their address can be given a
> title after a double equality sign.
> 4. Finally, in very rare cases when both space and equality symbol are
> necessary a special markup-wise (!) escape symbol can be used.
>
> Note how we tie up the links between the markup as a whole and its
> particular tokens:
> 1. Markup-wise, tokens (links, headings, formatting, etc.) are created
> using double symbols. Say, [[ and ]] for links, == and == for
> headings, **bold**, __underline__, etc. This is the cornerstone.
> 2. Markup-wise, there is a single way of escaping markup.
> Escaping must be particularly well thought out because it will be the
> trickiest part even for experienced editors.
>
> Currently wikitext uses the terrible "<nowiki>stuff</nowiki>" but it
> doesn't always work and HTMLTidy comes in handy with its &lt; and
> &gt;. And some places (such as link titles) cannot be escaped
> at all.
> Of course, given this approach we are forced to use an inconvenient
> symbol - pipe - which is unlikely to occur in normal text and thus
> needs no escaping in 99% of cases.
>
> But let's turn this from bottom-up to top-down. Think about a
> good-looking, rarely-used symbol that we will use for escaping... I'm
> sure that with the amount of text Wikipedia has it's easy to conduct
> research to determine that symbol but in this example I'll pick tilde
> (~); I'm actually using it for that purpose in my home-made markups.
>
> What follows is the complete list of cases:
> 1. [[Link]]
> 2. [[Link Title]] = [[Link|Title]] now
> 3. [[Space link==Title]] == [[Space link|Title]] now (pipe needs a
> layout change while '==' doesn't)
> 4. A URL containing '==' in its query part:
> [[http://weird.url/?k~==v]]. I've put a tilde before '==' to prevent it
> from being treated as an address/title separator. Since there's no
> other separator, the link has no caption. How does wikitext handle
> links with pipes? Are they banned?
> 5. Local page name containing a tilde: [[~ (Tilde)==Title]] - nothing
> breaks here because tilde is only special when it escapes something...
> and space isn't something to be escaped, so the tilde is treated as
> normal text.
> 6. Extreme case: a URL containing both a tilde and a double equality
> sign which originally (in the browser address bar) looks like:
> http://url?k~==v. In our link we will simply triple the tilde - making
> the first tilde escape the second and the last tilde escape the
> separator: [[http://url?k~~~==v]].
> 7. For the sake of completeness, the most extreme case: a URL with ~==
> and we've got to specify a title.
> We don't have to reinvent the wheel - simply
> put the title after a space and properly escape the URL as in case #6:
> [[http://url?k~~~==v Title]].
>
> Now you may think "wow, that's a particular mess, nobody is going to
> understand and use this syntax". But before that, note that only the
> first 3 cases are standard - the others are exceptions which still
> increase their complexity gracefully according to the task.
>
> For example, how will wikitext handle links containing pipes? Hard to
> say, probably [[|]]. <nowiki>? [[<nowiki>l]].
> <nowiki>? [[&lt;nowiki&gtl]].
>
> As demonstrated, the above system scales well and thus there is
> room for improving wikitext, even if not using exactly my scheme.
>
> The best thing about this is that once you've memorized the above 2
> fundamental rules you can apply them anywhere. Markup can be escaped
> using tilde: ~[[no more a link]]. Associative (aka definition) lists
> can be uniformized:
>   = Definition Value
>   = Definition with spaces == Value
>   = Definition of ~==, now its == Value
>
> And so on. Scalable.
>
> > No. We have "Earth, the planet"!
> You mean that a template cannot put "the" in front because some
> planets have their name without "the"? If so, they are not planets
> (nebulas, satellites, etc.) and a different template must be used. And
> if it's used the machine can handle language peculiarities.
>
> > You will need the proper name in the infobox, such as "Felis silvestris
> > catus", even if the article is just called "Cat".
> Yes: {{About Felis silvestris catus, cat}}
>
> >> What kind of false positives are we talking about? Will any sane
> >> individual spend his precious time not editing but preparing to edit
> >> this mess?
> > I think they copy and paste, then fill the fields. Which is a good way
> > of learning as they encounter it.
> And the best way is to create/write things from scratch on our own.
> Where have you learned programming, in the class copy-pasting lines
> from the blackboard to your notebook or in the office actually hitting
> the keys?
>
> Many things can be hidden under the copy-paste approach, there's even a
> notion of a "code monkey". I believe if wikitext continues the path of
> STL there will be "wiki monkeys", provided that WMF gets commercial
> (which is the last thing I want to see).
>
> But only syntax that hides nothing and is crisp to its bones can be
> called fair. Only syntax that doesn't require any "templates" that you
> just "copy" and "feel"... sorry, "fill in".
> And of course, there are cases when templates (I mean wikitext {{ and
> }}) with parameters are of great help and there are cases when
> parameters with their pipes and whistles are redundant. Just like
> life.
>
> > Well, just by looking at it I have no idea what those temperatures are :)
> You mean that "| max_temp_1 = 331 K<ref name=asu_highest_temp/>"
> gives you more ideas?
>
> > What if they were in a different order?
> So the machine cannot sort them out and determine which is "max"?
>
> > 184 and 331 are probably some kind of limits, but what's that 287.2?
> > Some kind of boiling point?
> Hm, you might be right on this one and it's better to have two
> template parameters: "temperature" and "temperature mean". This is
> what discussion is for - finding rough edges, right?
>
> > The goal of wikitext is to make html editing easy.
> HTML editing? I thought wikitext was about text editing. Why not edit
> HTML using HTML?
>
> > HTML only needs a few special characters: <>&;=" but it's bothersome.
> And 4 of them are absent on my native layout.
>
> > We define that * is a bullet and serves to make lists:
> > It's easier to type, and looks good.
> Completely agree, this is a human-readable markup.
>
> > We define # as the equivalent for numbered lists. Note that there's no
> > usage of # for numbers in many cultures, so that's less 'visual' there.
> And this is to be refactored, adding features along the way. Let's
> write ordered lists using digits!
>   1. One
>   2. Two
>   3. Three
>
> The Japanese have their own kanji for numbers but they use Arabic
> digits sometimes as well.
>
> Oh, I hear your thoughts - "items are moved/removed and the order is
> gone". Sure, but the machine can help us out:
>   1. One
>   1. Two
>   1. Three
>
> We use a little trick here: since ordered items with identical marker
> values are useless in human texts the markup can use them to represent
> automatically ordered markers.
> But even if someday we need two identical markers this can be fixed
> using some clean syntax, such as:
>   1. First
>   1#1 First
>
> Just as with links and [[eq==signs]] above our syntax gradually
> increases complexity. The person gets crazy, so does the markup - but
> not before the user.
>
> The added features of this approach are:
> 1. Lists now support, say, 6 marker types: 1. (digits) 01.
> (zero-padded digits) a. (lower-alpha) A. (upper-alpha) i.
> (lower-Roman) I. (upper-Roman)
> 2. Lists now can have markers of any value:
>    3. Third
>    2. Second
>    1. First
> 3. List types can be mixed:
>    1. Digit
>    i. Roman
>    a. Alpha
>
> > But each feature requires new symbols, and when you
> > look at those available on every layout, you get *very* limited...
> This is far from the truth. Over the last years I have developed for my
> projects a markup that generally surpasses MediaWiki's and uses just
> common symbols. I will briefly summarize it at the end of this
> message.
>
> > For example, I could decide to list imagemaps as `Image1´ `Image2´...
> > (grave and acute), but oh, many keyboards don't have both accents.
> Obviously, you cannot get a symbol for each and every particular
> markup case. But you don't have to. Compare how often you will use
> imagemaps and, say, highlighted PHP code. Right now both are pretty
> lengthy and angular in wikitext and the former (less used) is even
> shorter:
>
> <imagemap>
> ...
> </imagemap>
> 10 + 11 symbols.
>
> <source lang="php">
> ...
> </source>
> 19 + 9 symbols.
>
> Do you see my point?
>
> > And obviously, you can't use something that would easily appear in a
> > normal text (or you start defining escape codes which are uglier, too).
> Nah, escape codes come from C, don't forget it's not a very friendly
> language. Instead of escape codes one can use "quoting" (in Pascal
> style) - I have already touched on the tilde symbol above.
>
> But I agree that markup that relies on escapes/quoting of any kind is
> not fair. Escapes by definition are exceptions and cannot overwhelm
> the common rule.
>
> > How do you type the *content* of the references?
> As a title: [[*http:// My reference goes here]]. Or, better: we can
> use the same syntax for footnotes: [[*My footnote]] and references
> will be removed. So two types of footnotes: inline [[*text with
> ''markup'']], and block which start with [[*, then line break, then
> any kind of markup (including line breaks, more footnotes, links, even
> headings and lists), then line break, closing ]] and another line
> break. Uniform and consistent yet powerful and flexible.
>
> > But when you want to "go further", you start being limited.
> I believe we will not if we start thinking in terms of simplicity, not
> features - the former will give features as demonstrated above but if
> we focus on the latter it will force simplicity out.
>
> > Just because that's what the underlying html uses.
> This thinking is the problem - it is attached to a particular use case.
> "We will use HTML 5 DOM just because we won't need to transform it
> when rendering". But what about PDF? XML? FB2? RTF? DOC? ODT?
>
> The machine can handle everything, its time is much less precious than
> a human's.
> Once written, a framework will perform wikitext <-> HTML
> transformations in an instant; the same goes for some intermediate
> (completely detached from the target, notice this) tree serialization
> format - even if it's binary (personally I think binary is the only
> choice here).
>
> Then why care whether we will be rendering Wikipedia for someone's
> browser or Kindle? Because if we do we will need to invent adaptors
> and switches for all but the format we have chosen as primary. And
> things change, even that format may change and the framework will be
> left with a DOM format theoretically based on some old HTML3 with
> patches here and there. It will no longer use the "underlying HTML".
>
> > The reason being that use of underlining is discouraged.
> I agree but this is again target-thinking. Looking to the future, markup
> does not necessarily define presentation, even basic like bold and
> italic. You certainly know that <b> is discouraged in HTML in favor of
> <strong> - why? Because it's semantics, not presentation. Similarly,
> <u> is presentation but __ is semantics. We can define __terms__ like
> this. Does it look good? Can you define <u>terms</u> like that? No,
> you will need a new entity - of course soon there will be no symbols
> left on any keyboard, even Japanese!
>
> Occam's razor.
>
> > *And that also turned out to have issues, ever tried to write wikitext
> > in piedmontese?
> I have already said that research into modern keyboard layouts with
> their approximate user count is necessary if someone is going to
> define an ideal keyboard layout. I am sure that even "=" can be absent
> in some layouts (Japanese again). But there are general symbols thanks
> to IBM.
>
> > #REDIRECT and __TOC__ are a sad effect of
> > separate building of contents.
> Half of current wikitext is a sad effect. Most of the C++ standard is a
> sad effect. Come on, does this prevent us from producing something
> positive?
>
> This is what the next generation of a software is meant to bring - a
> core reworked to the last screw based on previous use experience.
>
> > You can't wrap #REDIRECT in a template, though, because the redirect applies
> > to the template itself (unless you use some odd escaping?)
> Once again target-thinking. Why limit {{...}} to templates? I have
> mentioned this in my message:
>
> 2012/2/6 Pavel Tkachenko <[email protected]>:
> > can be uniformized in a way similar to template
> > insertions: {{redir New page}}, ... Templates can be
> > called as {{tpl template arg arg arg}}
>
> Note that templates are a subset of {{construct}} features. "Redir" and
> "TOC" are actually not templates but "extensions". Or "actions", the
> name doesn't change the meaning. The point is to have a uniform syntax
> for custom constructs, in other words extensions. It's obvious that no
> matter how well thought out a standard is there will always be missing
> features once it hits reality. It must be prepared for this and
> this construct is one of the ways.
>
> Now, to be concrete, what I see as a better syntax for text markup:
> * formatting: **bold** //italic// __underline__ (or other semantic
> meaning) --strikethru-- ++small++ (semantics) ^^superscript^^
> * styling text - replacement for <span class="cls">: !!(cls)text!!
> * code: %%unformatted%%
> * highlighting: %%(php)unformatted%%
> * lists (ordered, unordered, definition) - already covered above
> * blocks in different language (ISO 639): @@ru text@@
> * footnotes: [[*footnote text]]
> * quotes: >inline ( >>older etc. ) and <[block]>
> * terms - replacement for <span title="desc">: (?term desc?) or
> (?space term==desc?)
> * headings - just like current wikitext, ==heading== and so on
> * misc markup: ??comment??
> (invisible in resulting document), HTML
> &&entities; (double ampersand)
>
> The actual markup is almost twice as large as the above but you
> already know it: most tokens have a block form as opposed to the inline
> one (above).
> * styling blocks:
>   !!(cls)
>   content
>   !!
> * code highlighting:
>   %%(php)
>   echo 'Hello, world!';
>   %%
> * language blocks:
>   @@jp
>   some Japanese text
>   @@
> * footnotes:
>   [[*
>   Footnote. [[Link]]
>   More text.
>   ]]
> * comments:
>   ??
>   author's comment.
>   can be shown in "draft output mode".
>   ??
>
> As demonstrated, it uses no HTML, BB-codes or any other tag-driven
> markup. Symbols used are (ordered by rough use count):
> * / % ( ) ! = - ? & > [ ] _ + ^
>
> The first 9 symbols are quite common, so are _ and +.
> I have put [ ] at the end because if the alternative ((link)) syntax is
> allowed then those two are only used by <[blockquote]>.
>
> If wikitext syntax improvement is to be considered by the community I
> am ready to give more details. The above listing misses several
> important points which require more explanation (in particular about
> %%code and {{action}} calls).
>
> Signed,
> P. Tkachenko
>
> _______________________________________________
> Wikitext-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitext-l
>
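
To make the seven link cases quoted above easier to discuss, here is a
minimal sketch in Python of how I read them. It is not an existing parser
and not Pavel's implementation: the names parse_link and Link are
hypothetical, and I am assuming that a tilde only escapes another tilde or
a following '==', that the first unescaped '==' separates address from
title, and that without one the first space does.

```python
from typing import NamedTuple, Optional

ESC = "~"   # escape symbol picked in the quoted proposal
SEP = "=="  # address/title separator for addresses containing spaces


class Link(NamedTuple):  # hypothetical result type, for illustration only
    address: str
    title: Optional[str]


def parse_link(markup: str) -> Link:
    """Split a [[...]] link into address and optional title (a sketch)."""
    if not (markup.startswith("[[") and markup.endswith("]]")):
        raise ValueError("not a [[link]]")
    body = markup[2:-2]

    left = []       # text before the first unescaped '=='
    right = None    # text after it; stays None if there is no separator
    current = left
    i = 0
    while i < len(body):
        if body.startswith(ESC + ESC, i):
            # '~~' is an escaped tilde (assumption drawn from case 6)
            current.append(ESC)
            i += 2
        elif body.startswith(ESC + SEP, i):
            # '~==' is a literal '==' (cases 4, 6 and 7)
            current.append(SEP)
            i += 3
        elif right is None and body.startswith(SEP, i):
            # first unescaped '==' separates address from title (case 3)
            right = []
            current = right
            i += 2
        else:
            # anything else is plain text, including a lone tilde (case 5)
            current.append(body[i])
            i += 1

    address = "".join(left)
    if right is not None:
        return Link(address, "".join(right))
    # no '==' separator: fall back to the first space (cases 2 and 7)
    address, space, title = address.partition(" ")
    return Link(address, title if space else None)
```

Under these assumptions all seven cases come out as described in the
quoted list, which at least suggests the rules are unambiguous enough to
implement in a few lines. Whether a space inside a URL or a '==' inside a
title needs further rules is left open here, as it is in the proposal.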
