Sorry, I sent that early by mistake. More below:

On Thu, Jul 24, 2014 at 1:02 AM, Jon Zeppieri <[email protected]> wrote:
> Your example string is "\n; BB#0:\n"
> So, I'd expect the lexer to match:
> - whitespace
> - line-comment
>
> Yes, `block-comment` matches, but `line-comment`
... gives the longer match, because it includes the newline at the end,
whereas `block-comment` will not match that newline.

Since the ending newline will be taken care of by the whitespace rule,
perhaps you could simply remove the final newline from the
`line-comment` definition? It will still match everything up to (but
not including) the newline.

-Jon

>
> On Thu, Jul 24, 2014 at 12:46 AM, Mangpo Phitchaya Phothilimthana
> <[email protected]> wrote:
>> Hi,
>>
>> I am trying to write a lexer and parser, but I cannot figure out how
>> to set priorities for the lexer's tokens. My simplified lexer (shown
>> below) has only 2 tokens, BLOCK and COMMENT. BLOCK is in fact a subset
>> of COMMENT. BLOCK appears first in the lexer, but when I parse
>> something that matches BLOCK, it always matches COMMENT instead. Below
>> is my program. In this particular example, I expect to get a BLOCK
>> token, but I get a COMMENT token instead. If I comment out
>> (line-comment (token-COMMENT lexeme)) in the lexer, I then get the
>> BLOCK token.
>>
>> Can anyone tell me how to work around this issue? I can only find this
>> in the documentation:
>> "When multiple patterns match, a lexer will choose the longest match,
>> breaking ties in favor of the rule appearing first."
>>
>> #lang racket
>>
>> (require parser-tools/lex
>>          (prefix-in re- parser-tools/lex-sre)
>>          parser-tools/yacc)
>>
>> (define-tokens a (BLOCK COMMENT))
>> (define-empty-tokens b (EOF))
>>
>> (define-lex-trans number
>>   (syntax-rules ()
>>     ((_ digit)
>>      (re-: (uinteger digit)
>>            (re-? (re-: "." (re-? (uinteger digit))))))))
>>
>> (define-lex-trans uinteger
>>   (syntax-rules ()
>>     ((_ digit) (re-+ digit))))
>>
>> (define-lex-abbrevs
>>   (block-comment (re-: "; BB#" number10 ":"))
>>   (line-comment (re-: ";" (re-* (char-complement #\newline)) #\newline))
>>   (digit10 (char-range "0" "9"))
>>   (number10 (number digit10)))
>>
>> (define my-lexer
>>   (lexer-src-pos
>>    (block-comment (token-BLOCK lexeme))
>>    (line-comment (token-COMMENT lexeme))
>>    (whitespace (position-token-token (my-lexer input-port)))
>>    ((eof) (token-EOF))))
>>
>> (define my-parser
>>   (parser
>>    (start code)
>>    (end EOF)
>>    (error
>>     (lambda (tok-ok? tok-name tok-value start-pos end-pos)
>>       (raise-syntax-error 'parser
>>                           (format "syntax error at '~a' in src l:~a c:~a"
>>                                   tok-name
>>                                   (position-line start-pos)
>>                                   (position-col start-pos)))))
>>    (tokens a b)
>>    (src-pos)
>>    (grammar
>>     (unit ((BLOCK) $1)
>>           ((COMMENT) $1))
>>     (code ((unit) (list $1))
>>           ((unit code) (cons $1 $2))))))
>>
>> (define (lex-this lexer input)
>>   (lambda ()
>>     (let ([token (lexer input)])
>>       (pretty-display token)
>>       token)))
>>
>> (define (ast-from-string s)
>>   (let ((input (open-input-string s)))
>>     (ast input)))
>>
>> (define (ast input)
>>   (my-parser (lex-this my-lexer input)))
>>
>> (ast-from-string "
>> ; BB#0:
>> ")
>>
>> ____________________
>>   Racket Users list:
>>   http://lists.racket-lang.org/users
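For reference, here is a minimal sketch of the suggestion above: the same two rules, with the trailing #\newline dropped from `line-comment`. It is a cut-down variant rather than the original program. It assumes a plain `lexer` instead of `lexer-src-pos`, leaves out the parser, replaces the `number` transformer with a bare run of digits, and adds a hypothetical helper, `token-names`, that only collects token names for inspection. With the newline removed, both patterns match the same 7 characters of "; BB#0:", so the tie goes to the rule that appears first.

```racket
#lang racket

(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-tokens a (BLOCK COMMENT))
(define-empty-tokens b (EOF))

(define-lex-abbrevs
  (digit10 (char-range "0" "9"))
  ;; Simplified for this sketch: integer block numbers only.
  (block-comment (re-: "; BB#" (re-+ digit10) ":"))
  ;; No trailing #\newline here; the whitespace rule consumes it instead.
  (line-comment (re-: ";" (re-* (char-complement #\newline)))))

(define my-lexer
  (lexer
   (block-comment (token-BLOCK lexeme))
   (line-comment (token-COMMENT lexeme))
   (whitespace (my-lexer input-port)) ; skip whitespace, lex the next token
   ((eof) (token-EOF))))

;; Hypothetical helper (not in the original program): lex a string and
;; return the list of token names seen before EOF.
(define (token-names s)
  (define in (open-input-string s))
  (let loop ()
    (define name (token-name (my-lexer in)))
    (if (eq? name 'EOF)
        '()
        (cons name (loop)))))

(token-names "\n; BB#0:\n")          ; => '(BLOCK)
(token-names "; just a comment\n")   ; => '(COMMENT)
```

Note that a line such as "; BB#0: extra text" would still lex as COMMENT, because `line-comment` is then the strictly longer match.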

