I think it's impossible to put it into <token> as long as <bytevector> contains ')'. (again correct me if I'm wrong.)
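
A minimal Python sketch of the split discussed in this thread, under a deliberately simplified grammar (integers, symbols, lists and bytevectors only). It assumes the lexer hands out '#u8(' and ')' as flat tokens and leaves it to the parser to assemble the bytevector. The names tokenize, parse_datum and read are hypothetical and come neither from R7RS nor from any implementation.

```python
import re

# Assumption: '#u8(' is one token, just like '(' and ')'.
# Anything else that is not whitespace or a parenthesis is an atom.
TOKEN_RE = re.compile(r"#u8\(|\(|\)|[^\s()]+")


def tokenize(text):
    """Flat, non-recursive scan: the lexer never looks inside a datum."""
    return TOKEN_RE.findall(text)


def parse_datum(tokens, i):
    """Parse one datum starting at tokens[i]; return (datum, next_index)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok in ("#u8(", "("):
        items = []
        i += 1
        while i < len(tokens) and tokens[i] != ")":
            if tok == "#u8(":
                # Only integers may appear here; int() raises ValueError otherwise.
                items.append(int(tokens[i]))
                i += 1
            else:
                # The recursion lives in the parser, not in the lexer.
                item, i = parse_datum(tokens, i)
                items.append(item)
        if i >= len(tokens):
            raise SyntaxError("missing ')'")
        # bytes() rejects values outside 0..255 with ValueError.
        return (bytes(items) if tok == "#u8(" else items), i + 1
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(tok), i + 1
    except ValueError:
        return tok, i + 1  # treat everything else as a symbol name


def read(text):
    """Tokenize, then parse the first datum in text."""
    datum, _ = parse_datum(tokenize(text), 0)
    return datum
```

With this arrangement the closing ')' of a bytevector is an ordinary token, so the lexer stays a simple loop; making the whole <bytevector> a single token would move the nesting logic into the lexer.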

> On the other hand the rule <bytevector> is placed under section 7.1.1
> (Lexical structure), not under 7.1.2 (External representations). I
> think this is not related to the complexity of the lexer implementation.

Yes, you're right.

On Tuesday, 29 October 2013, 9:23, Yuichi Nishiwaki <[email protected]> wrote:

Hi,

> As I understand it (correct me if I'm wrong), <token> is meant to be the smallest
> unit that a lexer reads and <bytevector> is a compound datum that a parser
> needs to construct. Suppose your reader gets #u8(1 2 3) as its input, then
> first your lexer needs to return a token to let your parser know what type of
> datum needs to be constructed.

Even in that case, the lexer is still capable of scanning bytevectors, though the implementation will be much more complex than otherwise.

> Now, first the lexer returns #u8( then the parser understands this is a
> bytevector. However if <bytevector> is in <token> then your lexer needs to
> read the input as a bytevector. Then inside of the bytevector it has a
> delimiter and token so the lexer needs to read your input recursively. I'm not
> good with theory but it seems something is wrong.

I'm really wondering which of '#u8(' and <bytevector> is a token. Going by the definition of the <token> rule, '#u8(' is a token. On the other hand, the rule <bytevector> is placed under section 7.1.1 (Lexical structure), not under 7.1.2 (External representations). I think this is not related to the complexity of the lexer implementation.

--
Yuichi Nishiwaki

2013/10/29 Takashi Kato <[email protected]>:
> I think it's on purpose.
>
> As I understand it (correct me if I'm wrong), <token> is meant to be the smallest
> unit that a lexer reads and <bytevector> is a compound datum that a parser
> needs to construct. Suppose your reader gets #u8(1 2 3) as its input, then
> first your lexer needs to return a token to let your parser know what type of
> datum needs to be constructed. Now, first the lexer returns #u8( then the
> parser understands this is a bytevector. However if <bytevector> is in
> <token> then your lexer needs to read the input as a bytevector. Then inside
> of the bytevector it has a delimiter and token so the lexer needs to read your
> input recursively. I'm not good with theory but it seems something is wrong.
>
>
> Hope it can help you.
>
> _/_/
> Takashi Kato
> E-mail: [email protected]
>
>
> On Tuesday, 29 October 2013, 5:48, Yuichi Nishiwaki
> <[email protected]> wrote:
> Hi, all. I'm very excited to see the final R7RS draft published. Thank
> you all for the great work.
> Reading the final draft, I have one question about the formal syntax
> definition (7.1.1). <bytevector> is not listed in the <token> line. Is that
> on purpose, or just an omission?
>
> -- Yuichi Nishiwaki
>
> _______________________________________________
> Scheme-reports mailing list
> [email protected]
> http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
