Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-24 Thread Jordan Rose
Committed in r173368-71. Thanks, Richard! http://llvm-reviews.chandlerc.com/D312 ___ cfe-commits mailing list cfe-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-24 Thread Richard Smith
On Thu, Jan 24, 2013 at 12:54 PM, Jordan Rose jordan_r...@apple.com wrote: Committed in r173368-71. Thanks, Richard! http://llvm-reviews.chandlerc.com/D312 Awesome, thanks. This seems release-note-worthy.

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-23 Thread Jordan Rose
Many more tests. This is actually now four patches in my git repo, which is how I'm planning to commit it:
- Unify diagnostics for \x, \u, and \U without any following hex digits.
- Handle universal character names and Unicode characters outside of literals.
- As an extension, treat

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-23 Thread Richard Smith
This looks great, thanks!
Comment at: lib/Lex/Lexer.cpp:2770
@@ +2769,3 @@
+  // string literal corresponds to a control character (in either of the
+  // ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the
+  // basic source character set, the
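The code comment quoted above paraphrases the C++11 rule for universal character names that appear outside character and string literals. A minimal sketch of that one rule follows; the helper names are hypothetical and are not taken from the patch, and the real lexer also has to enforce the separate identifier-character ranges and the surrogate restriction, which this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>

// True for members of the C++11 basic source character set (96 characters).
// In ASCII terms: space, \t, \v, \f, \n, and every printable character
// except '$', '@', and '`'.
inline bool isBasicSourceChar(std::uint32_t cp) {
  if (cp == '\t' || cp == '\n' || cp == '\v' || cp == '\f')
    return true;
  if (cp < 0x20 || cp > 0x7E)
    return false;
  return cp != '$' && cp != '@' && cp != '`';
}

// True for the two control ranges named in the comment:
// 0x00-0x1F and 0x7F-0x9F, both inclusive.
inline bool isControlChar(std::uint32_t cp) {
  return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
}

// Hypothetical helper: does this rule permit a UCN with value cp
// outside of a character or string literal?
inline bool isUCNAllowedOutsideLiteral(std::uint32_t cp) {
  return !isControlChar(cp) && !isBasicSourceChar(cp);
}
```

Under this sketch, \u0041 ('A') and \u0085 (a control character) are rejected, while \u00FC and \u0024 ('$', which is not in the basic source character set) pass this particular check.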

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-22 Thread Jordan Rose
Addresses most comments from before, and now diagnoses illegal UCNs in #if 0 blocks. This currently uses the presence of a preprocessor as a heuristic to warn even in raw mode. Hi rsmith, http://llvm-reviews.chandlerc.com/D312 CHANGE SINCE LAST DIFF

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-21 Thread Jordan Rose
Comment at: lib/Lex/Lexer.cpp:1598
@@ -1597,3 +1693,3 @@
   char PrevCh = 0;
-  while (isNumberBody(C)) { // FIXME: UCNs in ud-suffix.
     CurPtr = ConsumeChar(CurPtr, Size, Result);
Richard Smith wrote:
  This FIXME still needs to be addressed, right?
I'm not

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-21 Thread Jordan Rose
Comment at: lib/Lex/Lexer.cpp:1598
@@ -1597,3 +1693,3 @@
   char PrevCh = 0;
-  while (isNumberBody(C)) { // FIXME: UCNs in ud-suffix.
     CurPtr = ConsumeChar(CurPtr, Size, Result);
Jordan Rose wrote:
  Richard Smith wrote:
    This FIXME still needs to be

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-18 Thread Jordan Rose
On Jan 17, 2013, at 17:43 , Richard Smith rich...@metafoo.co.uk wrote: On Thu, Jan 17, 2013 at 11:31 AM, Jordan Rose jordan_r...@apple.com wrote: How about this approach? - LexUnicode mirrors LexTokenInternal, dispatching to the proper lex method based on the first Unicode character in a

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-18 Thread Richard Smith
On Fri, Jan 18, 2013 at 11:20 AM, Jordan Rose jordan_r...@apple.com wrote: On Jan 17, 2013, at 17:43 , Richard Smith rich...@metafoo.co.uk wrote: On Thu, Jan 17, 2013 at 11:31 AM, Jordan Rose jordan_r...@apple.com wrote: How about this approach? - LexUnicode mirrors LexTokenInternal,

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-18 Thread Jordan Rose
On Jan 18, 2013, at 11:36 , Richard Smith rich...@metafoo.co.uk wrote: On Fri, Jan 18, 2013 at 11:20 AM, Jordan Rose jordan_r...@apple.com wrote: On Jan 17, 2013, at 17:43 , Richard Smith rich...@metafoo.co.uk wrote: On Thu, Jan 17, 2013 at 11:31 AM, Jordan Rose jordan_r...@apple.com

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-18 Thread Richard Smith
On Fri, Jan 18, 2013 at 2:56 PM, Jordan Rose jordan_r...@apple.com wrote: This is converging, so I'm putting it up on Phabricator for better spot-comments. E-mail review still welcome as well, of course. http://llvm-reviews.chandlerc.com/D312 One thing I missed before: please don't use

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-18 Thread Jordan Rose
On Jan 18, 2013, at 16:52 , Richard Smith rich...@metafoo.co.uk wrote: On Fri, Jan 18, 2013 at 2:56 PM, Jordan Rose jordan_r...@apple.com wrote: This is converging, so I'm putting it up on Phabricator for better spot-comments. E-mail review still welcome as well, of course.

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-18 Thread Richard Smith
Comment at: lib/Lex/Lexer.cpp:1598
@@ -1597,3 +1693,3 @@
   char PrevCh = 0;
-  while (isNumberBody(C)) { // FIXME: UCNs in ud-suffix.
     CurPtr = ConsumeChar(CurPtr, Size, Result);
This FIXME still needs to be addressed, right?
Comment at:

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-17 Thread Jordan Rose
How about this approach?
- LexUnicode mirrors LexTokenInternal, dispatching to the proper lex method based on the first Unicode character in a token.
- UCNs are validated in readUCN (called by LexTokenInternal and LexIdentifier). The specific identifier restrictions are checked in LexUnicode and

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-17 Thread Jordan Rose
Another flaw here is that if a UCN is not a valid identifier character, it gets read in a second time by LexTokenInternal, which means we get the warnings twice. I was trying not to have a NoWarn variant but maybe it's necessary. Jordan On Jan 17, 2013, at 11:31 , Jordan Rose

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-17 Thread Richard Smith
On Thu, Jan 17, 2013 at 11:31 AM, Jordan Rose jordan_r...@apple.com wrote: How about this approach? - LexUnicode mirrors LexTokenInternal, dispatching to the proper lex method based on the first Unicode character in a token. - UCNs are validated in readUCN (called by LexTokenInternal and

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-14 Thread Jordan Rose
This got load-balanced to me, so I've been reworking Eli's patch to handle the recursive-getCharAndSize problem:
// Parsing this UCN requires line-splicing. This is valid C99.
#define newline_1_\u00F\
C 1
The basic idea is the same (the spelling of the token contains the UCN, but the
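To illustrate why the macro name in that test case needs line splicing before the UCN can be read, here is a small sketch. It splices eagerly over the whole buffer, which is an assumption made for brevity; the thread describes the lexer handling this inside getCharAndSize instead. The function names spliceLines and parseUCN are hypothetical, and the parser accepts any hex value without further validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Remove every backslash-newline pair (translation phase 2 line splicing).
inline std::string spliceLines(const std::string &src) {
  std::string out;
  for (std::size_t i = 0; i < src.size(); ++i) {
    if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
      ++i; // skip the newline as well
      continue;
    }
    out += src[i];
  }
  return out;
}

// Parse "\uXXXX" or "\UXXXXXXXX" starting at text[pos]. On success, store the
// value in codePoint and return the number of characters consumed.
// Return 0 if no complete UCN starts at pos.
inline std::size_t parseUCN(const std::string &text, std::size_t pos,
                            std::uint32_t &codePoint) {
  if (pos + 1 >= text.size() || text[pos] != '\\')
    return 0;
  std::size_t digits;
  if (text[pos + 1] == 'u')
    digits = 4;
  else if (text[pos + 1] == 'U')
    digits = 8;
  else
    return 0;
  if (pos + 2 + digits > text.size())
    return 0;
  std::uint32_t value = 0;
  for (std::size_t i = 0; i < digits; ++i) {
    char c = text[pos + 2 + i];
    std::uint32_t d;
    if (c >= '0' && c <= '9')
      d = static_cast<std::uint32_t>(c - '0');
    else if (c >= 'a' && c <= 'f')
      d = static_cast<std::uint32_t>(c - 'a' + 10);
    else if (c >= 'A' && c <= 'F')
      d = static_cast<std::uint32_t>(c - 'A' + 10);
    else
      return 0; // not a hex digit, so this is not a complete UCN
    value = value * 16 + d;
  }
  codePoint = value;
  return 2 + digits;
}
```

Before splicing, the fourth hex digit position holds a backslash, so parseUCN fails on the raw text; after spliceLines joins the two physical lines, the same position yields \u00FC.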

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-14 Thread Richard Smith
On Mon, Jan 14, 2013 at 11:53 AM, Jordan Rose jordan_r...@apple.com wrote: This got load-balanced to me, so I've been reworking Eli's patch to handle the recursive-getCharAndSize problem:
// Parsing this UCN requires line-splicing. This is valid C99.
#define newline_1_\u00F\
C 1
The

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-14 Thread Jordan Rose
On Jan 14, 2013, at 13:19 , Richard Smith rich...@metafoo.co.uk wrote: As a general point, please keep in mind how we might support UTF-8 in source code when working on this. The C++ standard requires that our observable behavior is that we treat extended characters in the source code and

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-14 Thread Richard Smith
On Mon, Jan 14, 2013 at 4:54 PM, Jordan Rose jordan_r...@apple.com wrote: On Jan 14, 2013, at 13:19 , Richard Smith rich...@metafoo.co.uk wrote: As a general point, please keep in mind how we might support UTF-8 in source code when working on this. The C++ standard requires that our

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2013-01-14 Thread Jordan Rose
On Jan 14, 2013, at 17:33 , Richard Smith rich...@metafoo.co.uk wrote: On Mon, Jan 14, 2013 at 4:54 PM, Jordan Rose jordan_r...@apple.com wrote: On Jan 14, 2013, at 13:19 , Richard Smith rich...@metafoo.co.uk wrote: As a general point, please keep in mind how we might support UTF-8 in

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-12-19 Thread Eli Friedman
On Tue, Dec 18, 2012 at 11:01 PM, Chris Lattner clatt...@apple.com wrote: On Dec 18, 2012, at 8:40 PM, Eli Friedman eli.fried...@gmail.com wrote: Oh, I see... so the idea is to hack up getCharAndSize instead of calling isUCNAfterSlash/ConsumeUCNAfterSlash where we expect a UCN, use a marker

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-12-19 Thread Richard Smith
On Wed, Dec 19, 2012 at 1:18 PM, Eli Friedman eli.fried...@gmail.com wrote: On Tue, Dec 18, 2012 at 11:01 PM, Chris Lattner clatt...@apple.com wrote: On Dec 18, 2012, at 8:40 PM, Eli Friedman eli.fried...@gmail.com wrote: Oh, I see... so the idea is to hack up getCharAndSize instead of

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-12-19 Thread Richard Smith
On Wed, Dec 19, 2012 at 4:24 PM, Richard Smith rich...@metafoo.co.uk wrote: On Wed, Dec 19, 2012 at 1:18 PM, Eli Friedman eli.fried...@gmail.com wrote: On Tue, Dec 18, 2012 at 11:01 PM, Chris Lattner clatt...@apple.com wrote: On Dec 18, 2012, at 8:40 PM, Eli Friedman eli.fried...@gmail.com

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-12-18 Thread Eli Friedman
On Tue, Nov 27, 2012 at 5:04 PM, Eli Friedman eli.fried...@gmail.com wrote: On Tue, Nov 27, 2012 at 3:33 PM, Eli Friedman eli.fried...@gmail.com wrote: On Tue, Nov 27, 2012 at 3:01 PM, Richard Smith rich...@metafoo.co.uk wrote: On Tue, Nov 27, 2012 at 2:37 PM, Eli Friedman

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-12-18 Thread Chris Lattner
On Dec 18, 2012, at 8:40 PM, Eli Friedman eli.fried...@gmail.com wrote: Oh, I see... so the idea is to hack up getCharAndSize instead of calling isUCNAfterSlash/ConsumeUCNAfterSlash where we expect a UCN, use a marker which essentially means saw a UCN. Seems like a workable approach; I

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-12-18 Thread James Dennett
On Tue, Nov 27, 2012 at 2:37 PM, Eli Friedman eli.fried...@gmail.com wrote: On Tue, Nov 27, 2012 at 2:25 PM, Richard Smith rich...@metafoo.co.uk wrote: I had a look at supporting UTF-8 in source files, and came up with the attached approach. getCharAndSize maps UTF-8 characters down to a char

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-27 Thread Richard Smith
I had a look at supporting UTF-8 in source files, and came up with the attached approach. getCharAndSize maps UTF-8 characters down to a char with the high bit set, representing the class of the character rather than the character itself. (I've not done any performance measurements yet, and the
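The idea described above, returning a high-bit "class" byte for a multi-byte UTF-8 character instead of the character itself, could look roughly like the sketch below. Everything here is an assumption made for illustration: the CharClass values, the function names, and the tiny classification table (U+00C0 to U+00FF minus U+00D7 and U+00F7 as identifier characters, U+00A0 and U+3000 as whitespace) are not taken from the attached patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical class markers; all have the high bit set so they cannot
// collide with an ASCII character.
enum CharClass : unsigned char {
  CC_Invalid = 0x80,    // malformed UTF-8
  CC_IdentChar = 0x81,  // may appear in an identifier
  CC_Whitespace = 0x82, // Unicode whitespace
  CC_Other = 0x83       // any other non-ASCII character
};

// Decode one UTF-8 sequence. Returns its length in bytes, or 0 if malformed
// (truncated, bad continuation byte, overlong form, surrogate, > U+10FFFF).
inline std::size_t decodeUTF8(const unsigned char *p, std::size_t len,
                              std::uint32_t &cp) {
  if (len == 0)
    return 0;
  unsigned char b0 = p[0];
  if (b0 < 0x80) {
    cp = b0;
    return 1;
  }
  std::size_t n;
  std::uint32_t min; // smallest code point that needs n bytes
  if ((b0 & 0xE0) == 0xC0) { n = 2; cp = b0 & 0x1F; min = 0x80; }
  else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; min = 0x800; }
  else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; min = 0x10000; }
  else return 0;
  if (len < n)
    return 0;
  for (std::size_t i = 1; i < n; ++i) {
    if ((p[i] & 0xC0) != 0x80)
      return 0;
    cp = (cp << 6) | (p[i] & 0x3Fu);
  }
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  return n;
}

// Return the ASCII character itself, or a class marker for anything else.
// size receives the number of bytes the character occupies in the buffer.
inline unsigned char classifyChar(const unsigned char *p, std::size_t len,
                                  std::size_t &size) {
  std::uint32_t cp = 0;
  size = decodeUTF8(p, len, cp);
  if (size == 0) {
    size = 1; // consume one byte so the caller can make progress
    return CC_Invalid;
  }
  if (cp < 0x80)
    return static_cast<unsigned char>(cp);
  if (cp == 0xA0 || cp == 0x3000)
    return CC_Whitespace;
  if ((cp >= 0xC0 && cp <= 0xD6) || (cp >= 0xD8 && cp <= 0xF6) ||
      (cp >= 0xF8 && cp <= 0xFF))
    return CC_IdentChar;
  return CC_Other;
}
```

A caller that switches on the returned byte can then treat CC_IdentChar like a letter without decoding the character again, at the cost of re-reading the bytes later when the actual spelling is needed.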

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-27 Thread Eli Friedman
On Tue, Nov 27, 2012 at 2:25 PM, Richard Smith rich...@metafoo.co.uk wrote: I had a look at supporting UTF-8 in source files, and came up with the attached approach. getCharAndSize maps UTF-8 characters down to a char with the high bit set, representing the class of the character rather than

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-27 Thread Richard Smith
On Tue, Nov 27, 2012 at 2:37 PM, Eli Friedman eli.fried...@gmail.com wrote: On Tue, Nov 27, 2012 at 2:25 PM, Richard Smith rich...@metafoo.co.uk wrote: I had a look at supporting UTF-8 in source files, and came up with the attached approach. getCharAndSize maps UTF-8 characters down to a

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-27 Thread Eli Friedman
On Tue, Nov 27, 2012 at 3:01 PM, Richard Smith rich...@metafoo.co.uk wrote: On Tue, Nov 27, 2012 at 2:37 PM, Eli Friedman eli.fried...@gmail.com wrote: On Tue, Nov 27, 2012 at 2:25 PM, Richard Smith rich...@metafoo.co.uk wrote: I had a look at supporting UTF-8 in source files, and came up

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-27 Thread Eli Friedman
On Tue, Nov 27, 2012 at 3:33 PM, Eli Friedman eli.fried...@gmail.com wrote: On Tue, Nov 27, 2012 at 3:01 PM, Richard Smith rich...@metafoo.co.uk wrote: On Tue, Nov 27, 2012 at 2:37 PM, Eli Friedman eli.fried...@gmail.com wrote: On Tue, Nov 27, 2012 at 2:25 PM, Richard Smith

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-16 Thread Nico Weber
On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith rich...@metafoo.co.uk wrote: On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote: Patch attached. Adds support for universal character names in identifiers, e.g.: char * \u00FC = u-umlaut; Not that it's particularly useful,

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-16 Thread Eli Friedman
On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith rich...@metafoo.co.uk wrote: On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote: Patch attached. Adds support for universal character names in identifiers, e.g.: char * \u00FC = u-umlaut; Not that it's particularly useful,

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-16 Thread Eli Friedman
On Fri, Nov 16, 2012 at 9:54 AM, Nico Weber tha...@chromium.org wrote: On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith rich...@metafoo.co.uk wrote: On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote: Patch attached. Adds support for universal character names in identifiers,

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-16 Thread Eli Friedman
On Fri, Nov 16, 2012 at 6:53 PM, Eli Friedman eli.fried...@gmail.com wrote: On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith rich...@metafoo.co.uk wrote: On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote: Patch attached. Adds support for universal character names in

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-16 Thread Richard Smith
On Fri, Nov 16, 2012 at 6:53 PM, Eli Friedman eli.fried...@gmail.com wrote: On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith rich...@metafoo.co.uk wrote: On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote: Patch attached. Adds support for universal character names in

Re: [cfe-commits] [PATCH] Support for universal character names in identifiers

2012-11-15 Thread Richard Smith
On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote: Patch attached. Adds support for universal character names in identifiers, e.g.: char * \u00FC = u-umlaut; Not that it's particularly useful, but it's a longstanding hole in our C99 support. The general outline of the