Hi Fredrik,

Unicode characters may be represented in different ways in different PEG grammar implementations.
Here is my suggestion, with my choice of acceptable Unicode code points. It uses a slightly different syntax from Bryan Ford's PEG syntax, but the translation should be obvious:

    qbar  = term (bar term)+
    term  = alnum+ / quote
    quote : quo (!(quo (bar/end)) char)* quo
    end   : !char
    alnum : 'a'..'z' / 'A'..'Z' / '0'..'9'
    bar   : '|'
    quo   : 39
    char  : 0x1..D7FF / 0xE000..FFFD / 0x10000..10FFFF

Peter.

On Sat, Aug 21, 2010 at 8:43 AM, Fredrik Karlsson <dargo...@gmail.com> wrote:

> Dear list,
>
> Sorry for asking this, but I am not getting my grammar to work
> correctly in, and I think it is due to me not knowing enough. :-/
>
> What I need is to be able to parse a sequence of n strings, separated
> by a | -char (if n > 1).
>
> * IFF the string is within matching ' - chars, the string itself could
>   contain any character from the full UTF-8 charset. Including the
>   '-char.
> * If not within '-chars, the string could be only ascii letters and
>   numbers (this is the easy part of course. :-) )
>
> So,
> - a|b|c should be three (ascii) strings
> - 'affffɫɱ'|'ɠð' should be two (unicode) strings
> - as well should this be : 'affffɫɱ'|'ɠð''''''''''''''''''''''''''''''''
>
> So, a full unicode (ok, UTF-8) string should be found to be terminated
> when '-char is found just before a |, space or "end of line/string".
>
> How do I do this?
>
> /Fredrik
>
> --
> "Life is like a trumpet - if you don't put anything into it, you don't
> get anything out of it."
>
> _______________________________________________
> PEG mailing list
> PEG@lists.csail.mit.edu
> https://lists.csail.mit.edu/mailman/listinfo/peg
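For anyone who wants to try the grammar above without a PEG tool, here is a rough hand-written recognizer in Python that follows the rules one by one. It is only a sketch: the function names (parse_qbar, parse_term, parse_quote and the helpers) are made up for illustration, and requiring the whole input to be consumed is my own assumption, since the grammar itself does not say so.

```python
QUO = chr(39)  # quo : 39  (the apostrophe)
BAR = "|"      # bar : '|'


def is_char(c):
    # char : 0x1..D7FF / 0xE000..FFFD / 0x10000..10FFFF
    cp = ord(c)
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_alnum(c):
    # alnum : 'a'..'z' / 'A'..'Z' / '0'..'9'
    return "a" <= c <= "z" or "A" <= c <= "Z" or "0" <= c <= "9"


def at_end(s, i):
    # end : !char
    return not (i < len(s) and is_char(s[i]))


def closes_quote(s, i):
    # quo (bar / end): a quote mark followed by a bar or by the end
    if i < len(s) and s[i] == QUO:
        j = i + 1
        return (j < len(s) and s[j] == BAR) or at_end(s, j)
    return False


def parse_quote(s, i):
    # quote : quo (!(quo (bar/end)) char)* quo
    if not (i < len(s) and s[i] == QUO):
        return None
    j = i + 1
    while j < len(s) and is_char(s[j]) and not closes_quote(s, j):
        j += 1
    if j < len(s) and s[j] == QUO:
        return s[i + 1:j], j + 1  # (content without outer quotes, next position)
    return None


def parse_term(s, i):
    # term = alnum+ / quote
    j = i
    while j < len(s) and is_alnum(s[j]):
        j += 1
    if j > i:
        return s[i:j], j
    return parse_quote(s, i)


def parse_qbar(s):
    # qbar = term (bar term)+
    first = parse_term(s, 0)
    if first is None:
        return None
    terms, i = [first[0]], first[1]
    while i < len(s) and s[i] == BAR:
        nxt = parse_term(s, i + 1)
        if nxt is None:
            break
        terms.append(nxt[0])
        i = nxt[1]
    # Assumption (not in the grammar): the whole input must be consumed.
    if len(terms) < 2 or i != len(s):
        return None
    return terms


if __name__ == "__main__":
    print(parse_qbar("a|b|c"))
    print(parse_qbar("'affffɫɱ'|'ɠð'"))
    print(parse_qbar("'affffɫɱ'|'ɠð'''''"))
```

Note that, because qbar is written with (bar term)+, a single term with no bar is rejected by this sketch; changing the + to * in the grammar (and dropping the len(terms) < 2 check here) would accept the n = 1 case as well.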