Re: [GSoC] gccrs Unicode support

2023-03-20 Thread Raiki Tamura via Gcc
On Sat, 18 Mar 2023 at 18:28, Jakub Jelinek wrote:

> That is a pretty simple thing, so no need to use an extra library for that.
> As is documented in contrib/unicode/README, the Unicode *.txt files are
> already checked in and there are several generators of tables.
> libcpp/makeucnid.cc already creates tables based on the
> UnicodeData.txt DerivedNormalizationProps.txt DerivedCoreProperties.txt
> files, including NFC/NFKC; it is true it doesn't currently compute
> whether a character is alphanumeric.  That is either Alphabetic
> DerivedCoreProperties.txt property, or for numeric Nd, Nl or No category
> (3rd column) in UnicodeData.txt.  Should be a few lines to add that support
> to libcpp/makeucnid.cc; the only question is whether it would make the ucnranges
> array much larger if it differentiates based on another ALPHANUM flag.
> If it doesn't grow too much, let's put it there, if it would grow too much,
> perhaps we should emit it in a separate table.
>
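The classification described in the quoted message can be sketched as follows. This is a minimal illustration in Rust, not the C++ code of libcpp/makeucnid.cc: a code point is treated as alphanumeric if DerivedCoreProperties.txt lists it as Alphabetic, or if its General_Category (the 3rd field of UnicodeData.txt) is Nd, Nl or No. The function names and the `BTreeSet` representation are assumptions made for illustration only.

```rust
use std::collections::BTreeSet;

/// Parse a hexadecimal code point such as "0041".
fn parse_cp(s: &str) -> Option<u32> {
    u32::from_str_radix(s.trim(), 16).ok()
}

/// Code points whose General_Category (3rd field of UnicodeData.txt) is
/// Nd, Nl or No. Also expands the "<..., First>" / "<..., Last>" line pairs
/// that UnicodeData.txt uses to abbreviate large ranges.
fn numeric_from_unicode_data(text: &str) -> BTreeSet<u32> {
    let mut set = BTreeSet::new();
    let mut range_start: Option<u32> = None;
    for line in text.lines() {
        let fields: Vec<&str> = line.split(';').collect();
        if fields.len() < 3 {
            continue;
        }
        let Some(cp) = parse_cp(fields[0]) else { continue };
        if fields[1].ends_with(", First>") {
            range_start = Some(cp);
            continue;
        }
        let start = if fields[1].ends_with(", Last>") {
            range_start.take().unwrap_or(cp)
        } else {
            cp
        };
        if matches!(fields[2], "Nd" | "Nl" | "No") {
            set.extend(start..=cp);
        }
    }
    set
}

/// Code points with the Alphabetic property in DerivedCoreProperties.txt,
/// whose data lines look like "0041..005A    ; Alphabetic # ...".
fn alphabetic_from_derived_core(text: &str) -> BTreeSet<u32> {
    let mut set = BTreeSet::new();
    for line in text.lines() {
        // Drop the trailing comment; comment-only and blank lines become empty.
        let data = line.split('#').next().unwrap_or("").trim();
        let mut parts = data.split(';');
        let (Some(range), Some(prop)) = (parts.next(), parts.next()) else {
            continue;
        };
        if prop.trim() != "Alphabetic" {
            continue;
        }
        let range = range.trim();
        let (lo, hi) = match range.split_once("..") {
            Some((a, b)) => (parse_cp(a), parse_cp(b)),
            None => (parse_cp(range), parse_cp(range)),
        };
        if let (Some(lo), Some(hi)) = (lo, hi) {
            set.extend(lo..=hi);
        }
    }
    set
}

/// ALPHANUM = Alphabetic OR General_Category in {Nd, Nl, No}.
fn alphanumeric(unicode_data: &str, derived_core: &str) -> BTreeSet<u32> {
    let mut set = alphabetic_from_derived_core(derived_core);
    set.extend(numeric_from_unicode_data(unicode_data));
    set
}
```

Rust's own `char::is_alphanumeric` uses the same definition (`is_alphabetic() || is_numeric()`), which makes it a convenient cross-check for whatever table the generator emits.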

Sounds good. I now have a concrete idea of how to implement it.
Thank you, everyone, for your advice.

Sincerely yours,
Raiki Tamura


Re: [GSoC] gccrs Unicode support

2023-03-18 Thread Raiki Tamura via Gcc
On Sat, 18 Mar 2023 at 17:47, Jonathan Wakely wrote:

> On Sat, 18 Mar 2023, 08:32 Raiki Tamura via Gcc,  wrote:
>
>> Thank you everyone for your advice.
>> Some kinds of names are restricted to Unicode alphabetic/numeric in Rust.
>>
>
> Doesn't it use the same rules as C++, based on XID_Start and XID_Continue?
> That should already be supported.
>
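The identifier rule mentioned above (XID_Start, or `_`, followed by XID_Continue characters) can be sketched as a lexer helper. This is a hypothetical illustration, not gccrs code: the XID predicates are passed in as parameters because Rust's standard library has no public XID_Start/XID_Continue functions; in gccrs they would presumably be backed by the existing libcpp tables. Following Rust's grammar, a lone `_` is not treated as an identifier.

```rust
/// Hypothetical helper: return the identifier at the start of `src`, using
/// (XID_Start | '_') XID_Continue*, and rejecting a lone "_".
fn scan_identifier(
    src: &str,
    is_xid_start: impl Fn(char) -> bool,
    is_xid_continue: impl Fn(char) -> bool,
) -> Option<&str> {
    let mut chars = src.char_indices();
    let (_, first) = chars.next()?;
    if !(is_xid_start(first) || first == '_') {
        return None;
    }
    // Byte offset of the first character that cannot continue the identifier.
    let end = chars
        .find(|&(_, c)| !is_xid_continue(c))
        .map_or(src.len(), |(i, _)| i);
    let ident = &src[..end];
    if ident == "_" {
        None
    } else {
        Some(ident)
    }
}
```

For a quick test, ASCII-only stand-ins such as `c.is_ascii_alphabetic()` and `c.is_ascii_alphanumeric() || c == '_'` can play the role of the real XID predicates.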

Yes, C++ and Rust use the same rules for identifiers (described in UAX #31),
so we can reuse that support in the gccrs lexer.
I was talking about the values of Rust's crate_name attribute, which may only
contain Unicode alphabetic/numeric characters.
(Ref: https://doc.rust-lang.org/reference/crates-and-source-files.html#the-crate_name-attribute)
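A crate_name check of this kind could look roughly like the sketch below. It assumes the rule is "non-empty, and every character is Unicode alphanumeric or `_`" (rustc also accepts `_`); the function names are hypothetical. Rust's `char::is_alphanumeric` is Alphabetic OR General_Category Nd/Nl/No, i.e. the same ALPHANUM notion discussed for the libcpp tables.

```rust
/// Hypothetical helper: byte offset and character of the first character
/// that is neither Unicode alphanumeric nor '_', for use in diagnostics.
fn first_invalid_char(name: &str) -> Option<(usize, char)> {
    name.char_indices()
        .find(|&(_, c)| !(c.is_alphanumeric() || c == '_'))
}

/// Hypothetical validity check for a crate_name attribute value.
fn is_valid_crate_name(name: &str) -> bool {
    !name.is_empty() && first_invalid_char(name).is_none()
}
```

Returning the offending character's byte offset, rather than only a boolean, lets a front end point its error message at the exact position in the attribute string.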

Raiki Tamura


Re: [GSoC] gccrs Unicode support

2023-03-18 Thread Raiki Tamura via Gcc
Thank you, everyone, for your advice.
In Rust, some kinds of names are restricted to Unicode alphabetic/numeric
characters, and the table currently defined in libcpp/ucnid.h does not record
which characters are alphabetic/numeric.
That should not be a problem, though, because it seems easy to add the missing
information to the table and use it in the Rust frontend.
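One way to picture adding such a flag is a range table in the spirit of libcpp's ucnranges: each entry covers a run of code points that share the same flag bits, and lookup is a binary search. The sketch below is hypothetical and simplified; the flag names, the `FlagRange` layout and both functions are assumptions, not libcpp's actual format.

```rust
/// Hypothetical flag bits (libcpp's real table has its own layout).
const XID_CONTINUE: u8 = 1 << 0;
const ALPHANUM: u8 = 1 << 1;

/// One entry covers the code points from the previous entry's `last + 1`
/// up to and including `last`; all of them share `flags`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct FlagRange {
    last: u32,
    flags: u8,
}

/// Run-length compress per-code-point flags (slice index = code point).
fn compress(flags: &[u8]) -> Vec<FlagRange> {
    let mut ranges: Vec<FlagRange> = Vec::new();
    for (cp, &f) in flags.iter().enumerate() {
        let cp = cp as u32;
        if let Some(r) = ranges.last_mut() {
            if r.flags == f {
                r.last = cp; // extend the current run
                continue;
            }
        }
        ranges.push(FlagRange { last: cp, flags: f });
    }
    ranges
}

/// Binary search for the entry covering `cp`; code points past the end of
/// the table get no flags.
fn lookup(ranges: &[FlagRange], cp: u32) -> u8 {
    let i = ranges.partition_point(|r| r.last < cp);
    ranges.get(i).map_or(0, |r| r.flags)
}
```

Comparing `compress(...).len()` with and without the ALPHANUM bit over the full code space is one way to measure how much the table would grow. Over ASCII alone both come out at nine entries, because ALPHANUM changes exactly where XID_Continue does, except at `_`, which is already its own entry.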

On Thu, 16 Mar 2023 at 21:59, Mark Wielaard wrote:

> You might want to research whether NFC normalization of identifiers is
> required to be done by the lexer or parser in Rust and how it interacts
> with proc macros.


Yes, NFC normalization must be done by the lexer, which may be complex and
hard to implement.
Since libunistring can also perform normalization, would it be reasonable to
use libunistring only for the normalization step?
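Since two Rust identifiers are equal when their NFC forms are equal, one way for the lexer to use a normalizer is to intern identifiers by their NFC spelling. In the hypothetical sketch below the normalizer is a parameter, because Rust's standard library has no NFC routine; in gccrs it could be backed by, for example, libunistring's `u8_normalize` with `UNINORM_NFC`. `IdentTable` and its methods are illustrative names only.

```rust
use std::collections::HashMap;

/// Hypothetical identifier table: identifiers are keyed by their normalized
/// (NFC) spelling, so canonically equivalent spellings share one id.
struct IdentTable<F: Fn(&str) -> String> {
    normalize: F,
    ids: HashMap<String, u32>,
}

impl<F: Fn(&str) -> String> IdentTable<F> {
    fn new(normalize: F) -> Self {
        IdentTable { normalize, ids: HashMap::new() }
    }

    /// Return a stable id for `raw`, interning its normalized form on first use.
    fn intern(&mut self, raw: &str) -> u32 {
        let key = (self.normalize)(raw);
        let next = self.ids.len() as u32;
        *self.ids.entry(key).or_insert(next)
    }
}
```

In a quick test, a toy stand-in that only composes `e` + U+0301 into U+00E9 is enough to show that `caf\u{e9}` and `cafe\u{301}` get the same id; a real implementation must of course apply full NFC.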

Raiki Tamura


Re: [GSoC] gccrs Unicode support

2023-03-16 Thread Raiki Tamura via Gcc
Sorry for resending this email. I forgot to use “Reply All”.

Thank you for your responses, Arsen and Jakub.
I did not know that C++ also supports Unicode identifiers.
After looking into C++ a little, I found that it accepts the same form of
identifiers as Rust.
So I will investigate libcpp further, in the hope that it can also be used in
the Rust frontend.

Raiki Tamura

On Thu, Mar 16, 2023 at 0:18 Jakub Jelinek  wrote:

> On Wed, Mar 15, 2023 at 11:00:19AM +, Philip Herron via Gcc wrote:
> > Excellent work on getting up to speed on the Rust front-end. From my
> > perspective, I am interested to see what the wider GCC community thinks
> > about using the https://www.gnu.org/software/libunistring/ library within
> > GCC instead of rolling our own; this would mean another dependency for
> > GCC.
> >
> > The other option is that there is already code in the other front-ends to
> > do this, so in the worst case it should be possible to extract something
> > out of them and possibly make it a shared piece of functionality, which we
> > can mentor you through.
>
> I don't know what exactly Rust FE needs in this area, but e.g. libcpp
> already handles whatever C/C++ need from Unicode support POV and can handle
> it without any extra libraries.
> So, if we could avoid the extra dependency, it would be certainly better,
> unless you really need massive amounts of code from those libraries.
> libcpp already provides, e.g., a mapping of Unicode character names to code
> points and determines which Unicode characters can appear at the start or
> in the middle of identifiers, etc.
>
> Jakub
>
>


[GSoC] gccrs Unicode support

2023-03-13 Thread Raiki Tamura via Gcc
Hello,

My name is Raiki Tamura. I am an undergraduate student at Kyoto University in
Japan, and I would like to work on Unicode support in gccrs this year.
I have already written my proposal (linked below) and shared it with the
gccrs team on Zulip.
In the project, I am planning to use the GNU libunistring library to handle
Unicode characters and the GNU IDN library to normalize identifiers.
According to my potential mentor, this would provide Unicode libraries for
all frontends in GCC. If you have any concerns or feedback about this, please
let me know.
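As a rough illustration of the most basic part of "handling Unicode characters", a lexer first has to check that the source is valid UTF-8 and report where it is not. The sketch below uses Rust's standard `std::str::from_utf8` and `Utf8Error::valid_up_to` instead of libunistring (whose `u8_check` serves a similar purpose); `decode_source` is a hypothetical name.

```rust
/// Hypothetical first lexer step: decode the source file as UTF-8, or report
/// the byte offset of the first invalid sequence.
fn decode_source(bytes: &[u8]) -> Result<Vec<char>, usize> {
    match std::str::from_utf8(bytes) {
        Ok(text) => Ok(text.chars().collect()),
        // valid_up_to() is the length of the longest valid UTF-8 prefix.
        Err(e) => Err(e.valid_up_to()),
    }
}
```

Working on decoded `char`s keeps the later steps (XID classification, NFC normalization) independent of the byte encoding.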
Thank you.

Link to my proposal:
https://docs.google.com/document/d/1MgsbJMF-p-ndgrX2iKeWDR5KPSWw9Z7onsHIiZ2pPKs/edit?usp=sharing

Raiki Tamura