Hi Fredrik

Unicode characters may be represented in different ways in different PEG
grammar implementations.

Here is my suggestion, with my choice of acceptable Unicode code points. It
uses a slightly different syntax from Bryan Ford's PEG syntax, but the
translation should be obvious:

qbar  = term (bar term)+
term  = alnum+ / quote
quote : quo (!(quo (bar/end)) char)* quo
end   : !char
alnum : 'a'..'z' / 'A'..'Z' / '0'..'9'
bar   : '|'
quo   : 39
char  : 0x1..D7FF / 0xE000..FFFD / 0x10000..10FFFF
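
For anyone who wants to try the grammar above without a PEG tool, here is a
minimal hand-written sketch of it in Python. The function names (parse_qbar,
parse_term, closes_quote and so on) are made up for illustration, and
requiring the whole input to be consumed is an assumption on my part; the
grammar itself does not say so.

```python
QUO = "'"   # quo : 39
BAR = "|"   # bar : '|'

def is_char(c):
    # char : 0x1..D7FF / 0xE000..FFFD / 0x10000..10FFFF
    n = ord(c)
    return 0x1 <= n <= 0xD7FF or 0xE000 <= n <= 0xFFFD or 0x10000 <= n <= 0x10FFFF

def is_alnum(c):
    # alnum : 'a'..'z' / 'A'..'Z' / '0'..'9'
    return "a" <= c <= "z" or "A" <= c <= "Z" or "0" <= c <= "9"

def at_end(s, i):
    # end : !char
    return i >= len(s) or not is_char(s[i])

def closes_quote(s, i):
    # quo (bar / end): a quote that is followed by a bar or by the end
    if i >= len(s) or s[i] != QUO:
        return False
    j = i + 1
    return (j < len(s) and s[j] == BAR) or at_end(s, j)

def parse_term(s, i):
    # term = alnum+ / quote; returns (value, next_index) or None
    j = i
    while j < len(s) and is_alnum(s[j]):
        j += 1
    if j > i:
        return s[i:j], j
    if i < len(s) and s[i] == QUO:
        j = i + 1
        # (!(quo (bar/end)) char)*
        while j < len(s) and not closes_quote(s, j) and is_char(s[j]):
            j += 1
        if j < len(s) and s[j] == QUO:
            return s[i + 1:j], j + 1
    return None

def parse_qbar(s):
    # qbar = term (bar term)+; returns the list of terms or None
    first = parse_term(s, 0)
    if first is None:
        return None
    terms, i = [first[0]], first[1]
    while i < len(s) and s[i] == BAR:
        nxt = parse_term(s, i + 1)
        if nxt is None:
            break
        terms.append(nxt[0])
        i = nxt[1]
    # Assumption: the whole input must match. The '+' in qbar means a lone
    # term is rejected.
    if len(terms) < 2 or i != len(s):
        return None
    return terms
```

With this sketch, parse_qbar("a|b|c") gives ['a', 'b', 'c'], and
parse_qbar("'affffɫɱ'|'ɠð'''") gives ['affffɫɱ', "ɠð''"]: the quoted string
only ends at a quote that is followed by a bar or by the end of the input, so
the extra quotes stay inside the second string.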

Peter.




On Sat, Aug 21, 2010 at 8:43 AM, Fredrik Karlsson <dargo...@gmail.com> wrote:

> Dear list,
>
> Sorry for asking this, but I am not getting my grammar to work
> correctly, and I think it is due to me not knowing enough. :-/
>
> What I need is to be able to parse a sequence of n strings, separated
> by a |-char (if n > 1).
>
> * IFF the string is within matching '-chars, the string itself could
> contain any character from the full UTF-8 charset, including the
> '-char.
> * If not within '-chars, the string could be only ascii letters and
> numbers (this is the easy part of course. :-) )
>
> So,
> - a|b|c should be three (ascii) strings
> - 'affffɫɱ'|'ɠð' should be two (unicode) strings
> - as should this be: 'affffɫɱ'|'ɠð''''''''''''''''''''''''''''''''
>
> So, a full unicode (ok, UTF-8) string should be found to be terminated
> when '-char is found just before a |, space or "end of line/string".
>
> How do I do this?
>
> /Fredrik
>
> --
> "Life is like a trumpet - if you don't put anything into it, you don't
> get anything out of it."
>
> _______________________________________________
> PEG mailing list
> PEG@lists.csail.mit.edu
> https://lists.csail.mit.edu/mailman/listinfo/peg
>