Committed in r173368-71. Thanks, Richard!
http://llvm-reviews.chandlerc.com/D312
___
cfe-commits mailing list
cfe-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
On Thu, Jan 24, 2013 at 12:54 PM, Jordan Rose jordan_r...@apple.com wrote:
Committed in r173368-71. Thanks, Richard!
http://llvm-reviews.chandlerc.com/D312
Awesome, thanks. This seems release-note-worthy.
Many more tests.
This is actually now four patches in my git repo, which is how I'm planning
to commit it:
- Unify diagnostics for \x, \u, and \U without any following hex digits.
- Handle universal character names and Unicode characters outside of literals.
- As an extension, treat
This looks great, thanks!
Comment at: lib/Lex/Lexer.cpp:2770
@@ +2769,3 @@
+ // string literal corresponds to a control character (in either of the
+ // ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the
+ // basic source character set, the
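For anyone following the comment quoted above: the rule it cites (a UCN outside a character or string literal may not name a control character or a member of the basic source character set) can be sketched as a small predicate. This is only an illustration, not the code under review; the function names are hypothetical and the basic source character set is spelled out by hand in terms of ASCII code points.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not taken from lib/Lex/Lexer.cpp.

// Control characters as given in the quoted comment:
// 0x00-0x1F and 0x7F-0x9F, both inclusive.
constexpr bool isControlCharacter(std::uint32_t cp) {
  return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
}

// C++11 basic source character set: space, HT, LF, VT, FF, and every
// printable ASCII character except '$' (0x24), '@' (0x40) and '`' (0x60).
constexpr bool isBasicSourceCharacter(std::uint32_t cp) {
  return cp == 0x09 || cp == 0x0A || cp == 0x0B || cp == 0x0C ||
         (cp >= 0x20 && cp <= 0x7E && cp != 0x24 && cp != 0x40 && cp != 0x60);
}

// True if a UCN naming `cp` passes the rule quoted above when it appears
// outside a character or string literal.
constexpr bool ucnAllowedOutsideLiteral(std::uint32_t cp) {
  return !isControlCharacter(cp) && !isBasicSourceCharacter(cp);
}
```

Under this sketch \u00FC passes, while \u0041 ('A') and \u0085 (a control character) are rejected. Surrogate code points are a separate restriction and are not modeled here.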
Addresses most comments from before, and now diagnoses illegal UCNs in #if 0
blocks. This currently uses the presence of a preprocessor as a heuristic to
warn even in raw mode.
Hi rsmith,
http://llvm-reviews.chandlerc.com/D312
CHANGE SINCE LAST DIFF
Comment at: lib/Lex/Lexer.cpp:1598
@@ -1597,3 +1693,3 @@
char PrevCh = 0;
- while (isNumberBody(C)) { // FIXME: UCNs in ud-suffix.
CurPtr = ConsumeChar(CurPtr, Size, Result);
Richard Smith wrote:
This FIXME still needs to be addressed, right?
I'm not
Comment at: lib/Lex/Lexer.cpp:1598
@@ -1597,3 +1693,3 @@
char PrevCh = 0;
- while (isNumberBody(C)) { // FIXME: UCNs in ud-suffix.
CurPtr = ConsumeChar(CurPtr, Size, Result);
Jordan Rose wrote:
Richard Smith wrote:
This FIXME still needs to be
On Jan 17, 2013, at 17:43 , Richard Smith rich...@metafoo.co.uk wrote:
On Thu, Jan 17, 2013 at 11:31 AM, Jordan Rose jordan_r...@apple.com wrote:
How about this approach?
- LexUnicode mirrors LexTokenInternal, dispatching to the proper lex method
based on the first Unicode character in a
On Fri, Jan 18, 2013 at 11:20 AM, Jordan Rose jordan_r...@apple.com wrote:
On Jan 17, 2013, at 17:43 , Richard Smith rich...@metafoo.co.uk wrote:
On Thu, Jan 17, 2013 at 11:31 AM, Jordan Rose jordan_r...@apple.com wrote:
How about this approach?
- LexUnicode mirrors LexTokenInternal,
On Jan 18, 2013, at 11:36 , Richard Smith rich...@metafoo.co.uk wrote:
On Fri, Jan 18, 2013 at 11:20 AM, Jordan Rose jordan_r...@apple.com wrote:
On Jan 17, 2013, at 17:43 , Richard Smith rich...@metafoo.co.uk wrote:
On Thu, Jan 17, 2013 at 11:31 AM, Jordan Rose jordan_r...@apple.com
On Fri, Jan 18, 2013 at 2:56 PM, Jordan Rose jordan_r...@apple.com wrote:
This is converging, so I'm putting it up on Phabricator for better
spot-comments. E-mail review still welcome as well, of course.
http://llvm-reviews.chandlerc.com/D312
One thing I missed before: please don't use
On Jan 18, 2013, at 16:52 , Richard Smith rich...@metafoo.co.uk wrote:
On Fri, Jan 18, 2013 at 2:56 PM, Jordan Rose jordan_r...@apple.com wrote:
This is converging, so I'm putting it up on Phabricator for better
spot-comments. E-mail review still welcome as well, of course.
Comment at: lib/Lex/Lexer.cpp:1598
@@ -1597,3 +1693,3 @@
char PrevCh = 0;
- while (isNumberBody(C)) { // FIXME: UCNs in ud-suffix.
CurPtr = ConsumeChar(CurPtr, Size, Result);
This FIXME still needs to be addressed, right?
Comment at:
How about this approach?
- LexUnicode mirrors LexTokenInternal, dispatching to the proper lex method
based on the first Unicode character in a token.
- UCNs are validated in readUCN (called by LexTokenInternal and LexIdentifier).
The specific identifier restrictions are checked in LexUnicode and
Another flaw here is that if a UCN is not a valid identifier character, it gets
read in a second time by LexTokenInternal, which means we get the warnings
twice. I was trying not to have a NoWarn variant but maybe it's necessary.
Jordan
On Jan 17, 2013, at 11:31 , Jordan Rose
On Thu, Jan 17, 2013 at 11:31 AM, Jordan Rose jordan_r...@apple.com wrote:
How about this approach?
- LexUnicode mirrors LexTokenInternal, dispatching to the proper lex method
based on the first Unicode character in a token.
- UCNs are validated in readUCN (called by LexTokenInternal and
This got load-balanced to me, so I've been reworking Eli's patch to handle the
recursive-getCharAndSize problem:
// Parsing this UCN requires line-splicing. This is valid C99.
#define newline_1_\u00F\
C 1
The basic idea is the same (the spelling of the token contains the UCN, but
the
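To make the line-splicing case above concrete, here is a rough sketch of decoding a UCN whose digits are interrupted by a backslash-newline. It is not how the patch does it (the thread routes this through getCharAndSize); every name below is hypothetical, trigraphs are ignored, and it needs C++14 or later for the constexpr loops.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not Clang's implementation.

// Skip any backslash-newline pairs (line splices) starting at p.
constexpr const char *skipSplices(const char *p) {
  while (p[0] == '\\' && p[1] == '\n')
    p += 2;
  return p;
}

// Value of a hexadecimal digit, or -1 if c is not one.
constexpr int hexDigitValue(char c) {
  return (c >= '0' && c <= '9')   ? c - '0'
         : (c >= 'a' && c <= 'f') ? c - 'a' + 10
         : (c >= 'A' && c <= 'F') ? c - 'A' + 10
                                  : -1;
}

// Decode \uXXXX or \UXXXXXXXX starting at p, allowing a line splice between
// any two characters of the UCN. Returns -1 if the input is not a
// well-formed UCN.
constexpr std::int64_t decodeSplicedUCN(const char *p) {
  p = skipSplices(p);
  if (*p != '\\')
    return -1;
  p = skipSplices(p + 1);
  const int digits = (*p == 'u') ? 4 : (*p == 'U') ? 8 : 0;
  if (digits == 0)
    return -1;
  ++p;
  std::int64_t value = 0;
  for (int i = 0; i < digits; ++i) {
    p = skipSplices(p);
    const int d = hexDigitValue(*p);
    if (d < 0)
      return -1;
    value = value * 16 + d;
    ++p;
  }
  return value;
}
```

With this sketch, the spliced spelling from the example (\u00F, backslash-newline, C) decodes to the same value as \u00FC, which is the behavior the test case is meant to exercise.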
On Mon, Jan 14, 2013 at 11:53 AM, Jordan Rose jordan_r...@apple.com wrote:
This got load-balanced to me, so I've been reworking Eli's patch to handle
the recursive-getCharAndSize problem:
// Parsing this UCN requires line-splicing. This is valid C99.
#define newline_1_\u00F\
C 1
The
On Jan 14, 2013, at 13:19 , Richard Smith rich...@metafoo.co.uk wrote:
As a general point, please keep in mind how we might support UTF-8 in source
code when working on this. The C++ standard requires that our observable
behavior is that we treat extended characters in the source code and
On Mon, Jan 14, 2013 at 4:54 PM, Jordan Rose jordan_r...@apple.com wrote:
On Jan 14, 2013, at 13:19 , Richard Smith rich...@metafoo.co.uk wrote:
As a general point, please keep in mind how we might support UTF-8 in
source code when working on this. The C++ standard requires that our
On Jan 14, 2013, at 17:33 , Richard Smith rich...@metafoo.co.uk wrote:
On Mon, Jan 14, 2013 at 4:54 PM, Jordan Rose jordan_r...@apple.com wrote:
On Jan 14, 2013, at 13:19 , Richard Smith rich...@metafoo.co.uk wrote:
As a general point, please keep in mind how we might support UTF-8 in
On Tue, Dec 18, 2012 at 11:01 PM, Chris Lattner clatt...@apple.com wrote:
On Dec 18, 2012, at 8:40 PM, Eli Friedman eli.fried...@gmail.com wrote:
Oh, I see... so the idea is to hack up getCharAndSize instead of
calling isUCNAfterSlash/ConsumeUCNAfterSlash where we expect a UCN,
use a marker
On Wed, Dec 19, 2012 at 1:18 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Tue, Dec 18, 2012 at 11:01 PM, Chris Lattner clatt...@apple.com wrote:
On Dec 18, 2012, at 8:40 PM, Eli Friedman eli.fried...@gmail.com wrote:
Oh, I see... so the idea is to hack up getCharAndSize instead of
On Wed, Dec 19, 2012 at 4:24 PM, Richard Smith rich...@metafoo.co.uk wrote:
On Wed, Dec 19, 2012 at 1:18 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Tue, Dec 18, 2012 at 11:01 PM, Chris Lattner clatt...@apple.com wrote:
On Dec 18, 2012, at 8:40 PM, Eli Friedman eli.fried...@gmail.com
On Tue, Nov 27, 2012 at 5:04 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Tue, Nov 27, 2012 at 3:33 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Tue, Nov 27, 2012 at 3:01 PM, Richard Smith rich...@metafoo.co.uk wrote:
On Tue, Nov 27, 2012 at 2:37 PM, Eli Friedman
On Dec 18, 2012, at 8:40 PM, Eli Friedman eli.fried...@gmail.com wrote:
Oh, I see... so the idea is to hack up getCharAndSize instead of
calling isUCNAfterSlash/ConsumeUCNAfterSlash where we expect a UCN,
use a marker which essentially means "saw a UCN".
Seems like a workable approach; I
On Tue, Nov 27, 2012 at 2:37 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Tue, Nov 27, 2012 at 2:25 PM, Richard Smith rich...@metafoo.co.uk wrote:
I had a look at supporting UTF-8 in source files, and came up with the
attached approach. getCharAndSize maps UTF-8 characters down to a char
I had a look at supporting UTF-8 in source files, and came up with the
attached approach. getCharAndSize maps UTF-8 characters down to a char with
the high bit set, representing the class of the character rather than the
character itself. (I've not done any performance measurements yet, and the
On Tue, Nov 27, 2012 at 2:25 PM, Richard Smith rich...@metafoo.co.uk wrote:
I had a look at supporting UTF-8 in source files, and came up with the
attached approach. getCharAndSize maps UTF-8 characters down to a char with
the high bit set, representing the class of the character rather than
On Tue, Nov 27, 2012 at 2:37 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Tue, Nov 27, 2012 at 2:25 PM, Richard Smith rich...@metafoo.co.uk wrote:
I had a look at supporting UTF-8 in source files, and came up with the
attached approach. getCharAndSize maps UTF-8 characters down to a
On Tue, Nov 27, 2012 at 3:01 PM, Richard Smith rich...@metafoo.co.uk wrote:
On Tue, Nov 27, 2012 at 2:37 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Tue, Nov 27, 2012 at 2:25 PM, Richard Smith rich...@metafoo.co.uk wrote:
I had a look at supporting UTF-8 in source files, and came up
On Tue, Nov 27, 2012 at 3:33 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Tue, Nov 27, 2012 at 3:01 PM, Richard Smith rich...@metafoo.co.uk wrote:
On Tue, Nov 27, 2012 at 2:37 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Tue, Nov 27, 2012 at 2:25 PM, Richard Smith
On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith rich...@metafoo.co.uk wrote:
On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote:
Patch attached. Adds support for universal character names in identifiers, e.g.:
char * \u00FC = "u-umlaut";
Not that it's particularly useful,
On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith rich...@metafoo.co.uk wrote:
On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote:
Patch attached. Adds support for universal character names in identifiers, e.g.:
char * \u00FC = "u-umlaut";
Not that it's particularly useful,
On Fri, Nov 16, 2012 at 9:54 AM, Nico Weber tha...@chromium.org wrote:
On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith rich...@metafoo.co.uk wrote:
On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote:
Patch attached. Adds support for universal character names in identifiers,
On Fri, Nov 16, 2012 at 6:53 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith rich...@metafoo.co.uk wrote:
On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote:
Patch attached. Adds support for universal character names in
On Fri, Nov 16, 2012 at 6:53 PM, Eli Friedman eli.fried...@gmail.com wrote:
On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith rich...@metafoo.co.uk wrote:
On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote:
Patch attached. Adds support for universal character names in
On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman eli.fried...@gmail.com wrote:
Patch attached. Adds support for universal character names in identifiers, e.g.:
char * \u00FC = "u-umlaut";
Not that it's particularly useful, but it's a longstanding hole in our
C99 support.
The general outline of the