On 15-04-28 04:33 PM, Benito van der Zander wrote:
Hi Michael,
I don't think there's a problem with saying it's tokenized as two tokens.
Just because a text can be tokenized doesn't mean it's free of syntax
errors. And section A.2.2 gives just one of the many requirements that a
sequence of tokens must satisfy in order to be error-free. (Specifically,
"div" and "3" are adjacent non-delimiting terminal symbols, and so must be
separated by Whitespace and/or Comments.)
What if it parses it in
12!(12 div.)
as two tokens?
"." is a terminal symbol, and "div" is not an NCName there, just part of a
MultiplicativeExpr.
As pointed out by Ghislain yesterday, the last paragraph of A.2.2 applies:
if a QName or NCName is followed by a "." or "-", the two tokens must be
separated by whitespace and/or Comments.
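
To make the two A.2.2 requirements mentioned above concrete, here is a minimal Python sketch of a check that runs after tokenization. It is only an illustration, not the spec's algorithm: the token tuple layout, the kind names, the reduced NON_DELIMITING set and the function name separation_errors are all assumptions made for this example.

```python
# Assumption: a reduced, hypothetical set of non-delimiting terminal kinds.
# The real grammar classifies many more terminals.
NON_DELIMITING = {"IntegerLiteral", "NCName", "Keyword"}


def separation_errors(tokens):
    """Return pairs of adjacent tokens that lack a required separator.

    tokens: list of (kind, text, start, end) tuples in source order,
    where start/end are character offsets (end is exclusive).
    """
    errors = []
    for left, right in zip(tokens, tokens[1:]):
        # Tokens only "touch" if no Whitespace or Comment sits between them.
        if left[3] != right[2]:
            continue
        both_non_delimiting = (
            left[0] in NON_DELIMITING and right[0] in NON_DELIMITING
        )
        # Last paragraph of A.2.2: a name directly followed by "." or "-".
        name_then_dot_or_dash = left[0] == "NCName" and right[1] in (".", "-")
        if both_non_delimiting or name_then_dot_or_dash:
            errors.append((left[1], right[1]))
    return errors


# Hypothetical input "12 div3": tokenizable, but "div" and "3" touch.
tokens_div3 = [
    ("IntegerLiteral", "12", 0, 2),
    ("Keyword", "div", 3, 6),
    ("IntegerLiteral", "3", 6, 7),
]
# The same tokens with a space before "3", as in "12 div 3".
tokens_div_3 = [
    ("IntegerLiteral", "12", 0, 2),
    ("Keyword", "div", 3, 6),
    ("IntegerLiteral", "3", 7, 8),
]
```

The point of the sketch is that tokenizing and being error-free are separate steps: both token lists come out of the tokenizer fine, but only the first one is reported by the check.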
Or in
1<<a>2</a>
as "<" and "<a>2</a>"
"<<" is longer, but not consistent.
"<<" is longer than "<", and there are continuations of "1<<" that conform
to the EBNF, so the LMP rule compels the tokenizer to pick "<<", which leads
to raising an error at ">". Ghislain also said this yesterday.
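
The longest-match behaviour described above can be sketched in Python roughly as follows. This is a toy under stated assumptions: PATTERNS is a hypothetical, drastically reduced terminal set (ASCII-only names, no element constructors, no lexical states), and it always takes the longest match without checking that a continuation conforming to the EBNF exists, which the real rule also requires.

```python
import re

# Assumption: a tiny, hypothetical subset of terminals, for illustration only.
PATTERNS = [
    ("IntegerLiteral", re.compile(r"[0-9]+")),
    ("NCName", re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")),
    ("<<", re.compile(r"<<")),
    ("<", re.compile(r"<")),
    (">", re.compile(r">")),
    ("/", re.compile(r"/")),
]


def longest_match(text, pos):
    """Return (kind, end) for the longest terminal starting at pos, or None."""
    best = None
    for kind, pattern in PATTERNS:
        m = pattern.match(text, pos)
        if m and (best is None or m.end() > best[1]):
            best = (kind, m.end())
    return best


def tokenize(text):
    """Split text into (kind, lexeme) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace separates tokens but is not a token here
            continue
        best = longest_match(text, pos)
        if best is None:
            raise ValueError(f"no terminal matches at offset {pos}")
        kind, end = best
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens


unspaced = tokenize("1<<a>2</a>")   # second token is "<<"
spaced = tokenize("1 < <a>2</a>")   # second and third tokens are both "<"
```

With the unspaced input the tokenizer never gets the chance to see "<" followed by "<a>"; it has already committed to "<<", and the error only shows up later, at ">". The space in the second input is what allows the two "<" tokens.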
It's unclear what you mean by "consistent". If you mean that having the
tokenizer pick "<<" is not consistent with parsing the string as:
1 < <a>2</a>
then, yes, that's quite true.
-Michael