The difference comes from deciding what to do once you discover that the first
two asterisks don't constitute a valid span.
In your case, you say: the first asterisk IS an open, so now I MUST find a
close to match; and then you search the rest of the string trying to find one.
Whereas I say: the first asterisk CAN'T BE an open (it's followed immediately
by a close), so treat it as text and move on to the next character.
I don't think there is a rule violation in either case precisely because this
isn't specified; you'll need to specify how to deal with this case - keep or
reject the potentially invalid open directive.
I'm not sure there is anything in the current rules that would require the
extra lookahead; you should be able to parse the string once to identify all of
the potential directives and then construct the spans using that list.
Repeatedly searching for the best match doesn't sound very lazy.
The possible-directive characters that might appear within spans are ones which
would not otherwise be identified as a valid open or close.
Consider the following examples (braces mark span start & end):
1. {*text*}text*
2. {*text*} text*
3. {*text *text*}
Example 1 is the same as in your examples; 2 is basically the same (the space
comes after the close); but in 3 the space comes before the second asterisk,
which invalidates it as a close, leaving it as a possible directive that isn't
one, present between two actual directives. That's easily identified without
searching the whole string, and no rules are broken.
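To make the single-pass idea concrete, here is a minimal Python sketch of how
I read it. It is only an illustration, not normative: it handles just the
asterisk on a single line, and the names find_spans and mark are made up for
this example. The open/close tests are my assumptions: an open must sit at the
start or after whitespace and be followed by something that is neither
whitespace nor another asterisk; a close must not be preceded by whitespace.

```python
def find_spans(text, marker="*"):
    """Return (open_index, close_index) pairs found in one left-to-right pass.

    Assumed rules (a sketch, not the normative text):
      - an open is at the start or after whitespace, and is followed by a
        character that is neither whitespace nor another marker;
      - a close is any later marker not preceded by whitespace;
      - a marker that qualifies as neither is plain text.
    """
    spans = []
    open_at = None
    for i, ch in enumerate(text):
        if ch != marker:
            continue
        # Treat the string boundaries as whitespace for the one-char checks.
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if open_at is None:
            if prev.isspace() and not nxt.isspace() and nxt != marker:
                open_at = i
        elif not prev.isspace():
            # The open rule already guarantees at least one character between.
            spans.append((open_at, i))
            open_at = None
    return spans


def mark(text):
    """Render spans with braces, matching the notation used in the examples."""
    spans = find_spans(text)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    out = []
    for i, ch in enumerate(text):
        if i in starts:
            out.append("{")
        out.append(ch)
        if i in ends:
            out.append("}")
    return "".join(out)


for sample in ["*text*text*", "*text* text*", "*text *text*", "***"]:
    print(mark(sample))
```

Under these assumptions the three examples come out as marked above, and
"***" is left as plain text because no asterisk in it qualifies as an open.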
Anyway, let's not continue to spam the mailing list - sorry everyone! - we can
continue this debate elsewhere if necessary.
The best thing to do is write some code that follows the rules and see what
that leads to - that should also allow you to identify where the rules are
underspecified and generate consistent examples.
________________________________
On Fri, Nov 6, 2020, at 15:59, Tedd Sterr wrote:
> The way you're suggesting requires unbounded lookahead
I've been meaning to send an email about this for ages too, but this
*is* how it works right now. You definitely have to do this with the
current rules (regardless of how this particular situation works). It's
not really a problem in XMPP land though because the server will enforce
a max message length. I would have liked to fix this, but it would have
made it a lot more likely to have false positives and the user
experience isn't as good (which is why I suspect Slack/WhatsApp/etc. do
something similar to what I've done). I probably should have put in an
explicit max-span-length though. Either way, it's not likely to be a
problem even if someone sends you tons of messages with 4k (or whatever
the server allows you to send) spans in them. These are small
(relatively) messages, not documents or serialization formats.
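For illustration only, a bounded version of the search for a close could look
like the Python sketch below. The max_span_len parameter and the find_close
name are hypothetical; nothing like them is in the current rules, and the
default of 256 is an arbitrary placeholder. The sketch assumes the asterisk at
open_at has already been accepted as an open, and that a close is any later
asterisk not preceded by whitespace with at least one character in between.

```python
def find_close(text, open_at, marker="*", max_span_len=256):
    """Search forward from an accepted open for the first valid close.

    max_span_len (hypothetical) caps the number of characters allowed
    between the two directives, which bounds the lookahead.
    Returns the index of the close, or None if there is none in range.
    """
    # Last permitted close index is open_at + 1 + max_span_len.
    limit = min(len(text), open_at + max_span_len + 2)
    # Start two characters on so the span is never empty.
    for i in range(open_at + 2, limit):
        if text[i] == marker and not text[i - 1].isspace():
            return i
    return None


print(find_close("***", 0))
print(find_close("*" + "a" * 10 + "*", 0, max_span_len=5))
```

With this reading, "***" yields a close at index 2 (a span containing a single
asterisk), and the second call finds nothing because the only candidate close
lies beyond the cap.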
We can chat about this in another thread at some point though.
> Don't try to be overly clever with the parsing, a lookahead of one
> character should be sufficient to identify directives. (Whether they
> are active and demark spans depends on matching pairs of directives.)
I understand what you're saying and how your parsing rules
work. What I'm trying to figure out is what the text says right now, and
I'm not sure if it matches what you're describing (which is how I've
written some of my implementations before) or what I described in my
original message. I am not trying to decide what would be best or change
the normative text right now.
I *think* your rules violate "and thus may be present between two
other styling directives" which would mean that "***" is valid, but
I'm not sure.
It also may not matter since this isn't likely to be a real situation,
but if I can clarify the rules I'd love to do so.
—Sam