Re: [GSoC] gccrs Unicode support
On Sat, Mar 18, 2023 at 18:28, Jakub Jelinek wrote:
> That is a pretty simple thing, so no need to use an extra library for that.
> As is documented in contrib/unicode/README, the Unicode *.txt files are
> already checked in and there are several generators of tables.
> libcpp/makeucnid.cc already creates tables based on the
> UnicodeData.txt DerivedNormalizationProps.txt DerivedCoreProperties.txt
> files, including NFC/NFKC, it is true it doesn't currently compute
> whether a character is alphanumeric. That is either Alphabetic
> DerivedCoreProperties.txt property, or for numeric Nd, Nl or No category
> (3rd column) in UnicodeData.txt. Should be a few lines to add that support
> to libcpp/makeucnid.cc, the only question is if it won't make the ucnranges
> array much larger if it differentiates based on another ALPHANUM flag.
> If it doesn't grow too much, let's put it there, if it would grow too much,
> perhaps we should emit it in a separate table.

Sounds good. I now have a concrete idea of how to implement this.
Thank you, everyone, for your advice.

Sincerely yours,
Raiki Tamura
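The rule Jakub describes (Alphabetic from DerivedCoreProperties.txt, or category Nd, Nl or No in the third column of UnicodeData.txt) can be sketched roughly as follows. This is a Python illustration only, not how libcpp/makeucnid.cc is written: the helper names are hypothetical, the inline samples are a handful of abridged lines in the format of the two data files, and First/Last range pairs in UnicodeData.txt are not handled.

```python
# Illustrative sketch only; these helper names are hypothetical and are not
# taken from libcpp/makeucnid.cc.

NUMERIC_CATEGORIES = {"Nd", "Nl", "No"}


def numeric_code_points(unicode_data_lines):
    """Code points whose General_Category (3rd column) is Nd, Nl or No."""
    found = set()
    for line in unicode_data_lines:
        fields = line.strip().split(";")
        if len(fields) >= 3 and fields[2] in NUMERIC_CATEGORIES:
            found.add(int(fields[0], 16))
    return found


def alphabetic_code_points(derived_core_lines):
    """Code points carrying the Alphabetic derived core property."""
    found = set()
    for line in derived_core_lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "Alphabetic":
            continue
        first, _, last = fields[0].partition("..")
        low = int(first, 16)
        high = int(last, 16) if last else low
        found.update(range(low, high + 1))
    return found


def to_ranges(code_points):
    """Merge code points into sorted, inclusive (first, last) ranges."""
    ranges = []
    for cp in sorted(code_points):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current range
        else:
            ranges.append((cp, cp))           # start a new range
    return ranges


# Abridged sample lines in the format of the two data files.
UNICODE_DATA_SAMPLE = [
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;",
    "2160;ROMAN NUMERAL ONE;Nl;0;L;<compat> 0049;;;1;N;;;;2170;",
]
DERIVED_CORE_SAMPLE = [
    "002B          ; Math # Sm       PLUS SIGN",
    "0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR",
]

ALNUM = numeric_code_points(UNICODE_DATA_SAMPLE) | alphabetic_code_points(
    DERIVED_CORE_SAMPLE
)
ALNUM_RANGES = to_ranges(ALNUM)

for first, last in ALNUM_RANGES:
    print(f"U+{first:04X}..U+{last:04X}")
```

Running the same kind of merge over the full data files and counting the resulting ranges would be one way to get a first estimate of how much an extra flag might enlarge a range table, which is the open question raised above.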
Re: [GSoC] gccrs Unicode support
On Sat, Mar 18, 2023 at 17:47, Jonathan Wakely wrote:
> On Sat, 18 Mar 2023, 08:32 Raiki Tamura via Gcc, wrote:
>
>> Thank you everyone for your advice.
>> Some kinds of names are restricted to unicode alphabetic/numeric in Rust.
>
> Doesn't it use the same rules as C++, based on XID_Start and XID_Continue?
> That should already be supported.

Yes, C++ and Rust use the same rules for identifiers (described in UAX #31), and we can reuse that support in the gccrs lexer. I was talking about the values of Rust's crate_name attribute, which allow only Unicode alphabetic/numeric characters.
(Ref: https://doc.rust-lang.org/reference/crates-and-source-files.html#the-crate_name-attribute )

Raiki Tamura
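The crate_name restriction mentioned above can be illustrated with a small Python sketch. It is an approximation under stated assumptions, not the check gccrs or rustc performs: Python's standard library does not expose the Alphabetic property, so general categories L* stand in for it, and the function names are hypothetical. Rejecting the empty string and ignoring any additional characters the Rust reference may permit are also assumptions of this sketch.

```python
import unicodedata


def is_alphanumeric_approx(ch):
    """Approximate 'Unicode alphabetic or numeric' for a single character.

    ASSUMPTION: general categories L* stand in for the Alphabetic property,
    so characters that are Alphabetic only via Other_Alphabetic are rejected.
    Numeric means General_Category Nd, Nl or No.
    """
    category = unicodedata.category(ch)
    return category.startswith("L") or category in {"Nd", "Nl", "No"}


def is_valid_crate_name_approx(name):
    """True if name is non-empty and every character passes the check above."""
    return bool(name) and all(is_alphanumeric_approx(ch) for ch in name)


for candidate in ["foo1", "日本語", "my crate", ""]:
    print(repr(candidate), is_valid_crate_name_approx(candidate))
```

A table generated from the Unicode data files, as discussed elsewhere in this thread, would replace the category-based approximation with the exact property.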
Re: [GSoC] gccrs Unicode support
Thank you everyone for your advice.
Some kinds of names in Rust are restricted to Unicode alphabetic/numeric characters. The table currently defined in libcpp/ucnid.h lacks entries indicating which characters are alphabetic/numeric, but that is not a problem: it seems easy to add the missing entries and use the table in the Rust frontend.

On Thu, Mar 16, 2023 at 21:59, Mark Wielaard wrote:
> You might want to research whether NFC normalization of identifiers is
> required to be done by the lexer or parser in Rust and how it interacts
> with proc macros.

Yes, NFC normalization must be done by the lexer, which may be complex and hard to implement. libunistring can also be used for normalization, so would it be acceptable to use libunistring only for the normalization step?

Raiki Tamura
-- 
Gcc-rust mailing list
Gcc-rust@gcc.gnu.org
https://gcc.gnu.org/mailman/listinfo/gcc-rust
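As a small illustration of why identifiers need NFC normalization in the lexer, the Python sketch below shows two spellings of the same name that differ in code points but compare equal after normalization. It uses Python's standard unicodedata module purely for demonstration; the function names are hypothetical, and nothing here reflects how gccrs, libcpp or libunistring implement normalization.

```python
import unicodedata


def normalize_identifier(text):
    """Return the NFC form of an identifier's text."""
    return unicodedata.normalize("NFC", text)


def same_identifier(a, b):
    """Compare two identifier spellings after NFC normalization."""
    return normalize_identifier(a) == normalize_identifier(b)


decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "caf\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE

print(decomposed == precomposed)                 # raw code points differ
print(same_identifier(decomposed, precomposed))  # equal after NFC
```

If the lexer did not normalize, these two spellings would be treated as different identifiers even though they render identically.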
Re: [GSoC] gccrs Unicode support
Sorry for resending this email; I forgot to use “Reply All”.

Thank you for your responses, Arsen and Jakub.
I did not know that C++ also supports Unicode identifiers. I looked into C++ a little and found that it accepts the same form of identifiers as Rust, so I will investigate libcpp further in the hope that it can also be used in the Rust frontend.

Raiki Tamura

On Thu, Mar 16, 2023 at 0:18 Jakub Jelinek wrote:
> On Wed, Mar 15, 2023 at 11:00:19AM +, Philip Herron via Gcc wrote:
> > Excellent work on getting up to speed on the rust front-end. From my
> > perspective I am interested to see what the wider GCC community thinks
> > about using https://www.gnu.org/software/libunistring/ library within GCC
> > instead of rolling our own, this means it will be another dependency on GCC.
> >
> > The other option is there is already code in the other front-ends to do
> > this so in the worst case it should be possible to extract something out of
> > them and possibly make this a shared piece of functionality which we can
> > mentor you through.
>
> I don't know what exactly Rust FE needs in this area, but e.g. libcpp
> already handles whatever C/C++ need from Unicode support POV and can handle
> it without any extra libraries.
> So, if we could avoid the extra dependency, it would be certainly better,
> unless you really need massive amounts of code from those libraries.
> libcpp already e.g. provides mapping of unicode character names to code
> points, determining which unicode characters can appear at the start or
> in the middle of identifiers, etc.
>
> Jakub
[GSoC] gccrs Unicode support
Hello,

My name is Raiki Tamura. I am an undergraduate student at Kyoto University in Japan, and I want to work on Unicode support in gccrs this year.
I have already written my proposal (linked below) and shared it with the gccrs team on Zulip.
In the project, I am planning to use the GNU unistring library to handle Unicode characters and the GNU IDN library to normalize identifiers.
According to my potential mentor, this would make Unicode libraries available to all frontends in GCC.
If you have any concerns or feedback about this, please let me know.
Thank you.

Link to my proposal:
https://docs.google.com/document/d/1MgsbJMF-p-ndgrX2iKeWDR5KPSWw9Z7onsHIiZ2pPKs/edit?usp=sharing

Raiki Tamura