Re: "ICU - International Components for Unicode"

2020-09-29 Thread Matthew Stuckwisch
In #raku it was mentioned that it would be nice to have a $*UNICODE variable of 
sorts that reports back the version, but I'm not sure how that would work from 
an implementation POV.

I'm also late to the discussion, so pardon me jumping back a bit.  Basically, 
ICU is something that lets you quickly add in robust Unicode support.  But it's 
also a Swiss Army knife and overkill for what Raku generally needs (at 
whichever level it's implemented in), and also limiting in some ways because 
you become beholden to their structures which, as Samantha pointed out, doesn't 
work for MoarVM's approach.  Rolling your own has a lot of advantages.

Beyond UCD and UCA (sorting), everything else really should go into module land 
since they're heavily based on an ever-changing and growing CLDR, and even 
then, there can be good arguments made for putting sorting in module space too. 
For reasons like performance, code clarity, data size, etc., companies have 
rolled their own ICU-like libraries (Google's Closure for JS, TwitterCLDR in 
Ruby, etc.) running on the same CLDR data.  In Raku (shameless self-plug), a lot 
is already available in the Intl namespace.  There are actually some very cool 
things that can be done mixing CLDR and Raku, like creating new 
character-class-like tokens, or even extending built-ins; they just don't have 
any business being near core, just... core-like :-)
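
To give a rough idea of what a character-class-like token can look like in plain Raku, here is a minimal sketch. The @es-letters list is a hypothetical, hard-coded stand-in for data a module would load from CLDR, and the token name is made up; this is not the API of any Intl module.

```raku
# Hypothetical stand-in for locale data that a module might load from CLDR.
my @es-letters = <ch ll ñ>;

# A lexical token that can be used much like a character class.
# An array inside a regex is interpolated as an alternation of its elements.
my token es-special { @es-letters }

for <mañana llama chico> -> $word {
    # <&es-special> calls the lexical token without a named capture.
    say $word, ': ', ($word ~~ / <&es-special> /).Str;
}
```

Because the token is an ordinary lexical, a module can export it and callers can drop it into their own regexes and grammars.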

Matéu


PS: For understanding some of Samantha's incredible work, her talks at the 
Amsterdam convention are really great, and Perl Weekly has an archive of her 
grant write-ups:
  Articles: https://perlweekly.com/a/samantha-mcvey.html
  High End Unicode in Perl 6: https://www.youtube.com/watch?v=Oj_lgf7A2LM
  Unicode Internals of Perl 6: https://www.youtube.com/watch?v=9Vv7nUUDdeA
  

> On Sep 29, 2020, at 3:14 PM, William Michels via perl6-users 
>  wrote:
> 
> Thank you, Samantha!
> 
> An outstanding question is one posed by Joseph Brenner--that
> is--knowing which version of the Unicode standard is supported by
> Raku. I grepped through two files, one called "unicode.c" and the
> other called "unicode_db.c". They're both located in rakudo at:
> /rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/ .
> 
> Below are the first 4 lines of my grep results. As you can see
> (above/below), rakudo-2020.06 supports Unicode12.1.0:
> 
> ~$ raku -ne '.say if .grep(/unicode/)'
> ~/rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/unicode_db.c
> # For terms of use, see http://www.unicode.org/terms_of_use.html
> # The UAXes can be accessed at http://www.unicode.org/versions/Unicode12.1.0/
> From http://unicode.org/copyright.html#Exhibit1 on 2017-11-28:
> Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
> 
> 
> It would be really interesting to follow your Unicode work, Samantha.
> The ideas you propose are interesting and everyone hopes for speed
> improvements. Is there any place Raku-uns can go to read
> updates--maybe a grant report, blog, or Github issue? Or maybe right
> here, on the Perl6-Users mailing list? Thanks in advance.
> 
> Best, Bill.
> 
> W. Michels, Ph.D.
> 
> 
> 
> On Sun, Sep 27, 2020 at 4:03 AM Samantha McVey  wrote:
>> 
>> So MoarVM uses its own database of the UCD. One nice thing is this can
>> probably be faster than calling to the ICU to look up information of each
>> codepoint in a long string. Secondly it implements its own text data
>> structures, so the nice features of the UCD to do that would be difficult to
>> use.
>> 
>> In my opinion, it could make sense to use ICU for things like localized
>> collation (sorting). It also could make sense to use ICU for unicode
>> properties lookup for properties that don't have to do with grapheme
>> segmentation or casing. This would be a lot of work but if something like 
>> this
>> were implemented it would probably happen in the context of a larger
>> rethinking of how we use unicode. Though everything is complicated by that we
>> support lots of complicated regular expressions on different unicode
>> properties. I guess first I'd start by benchmarking the speed of ICU and
>> comparing to the current implementation.
>> 
>> 


Re: "ICU - International Components for Unicode"

2020-09-29 Thread William Michels via perl6-users
Thank you, Samantha!

An outstanding question is one posed by Joseph Brenner--that
is--knowing which version of the Unicode standard is supported by
Raku. I grepped through two files, one called "unicode.c" and the
other called "unicode_db.c". They're both located in rakudo at:
/rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/ .

Below are the first 4 lines of my grep results. As you can see
(above/below), rakudo-2020.06 supports Unicode12.1.0:

~$ raku -ne '.say if .grep(/unicode/)' \
     ~/rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/unicode_db.c
# For terms of use, see http://www.unicode.org/terms_of_use.html
# The UAXes can be accessed at http://www.unicode.org/versions/Unicode12.1.0/
From http://unicode.org/copyright.html#Exhibit1 on 2017-11-28:
Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
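
The same check can be scripted. This is a minimal sketch, assuming the header keeps the `versions/UnicodeX.Y.Z` URL form seen in the grep output; the sample text is copied from that output rather than read from the file, and the variable names are only illustrative.

```raku
# Two header lines copied from the grep output (not read from disk here).
my $header = q:to/END/;
    # For terms of use, see http://www.unicode.org/terms_of_use.html
    # The UAXes can be accessed at http://www.unicode.org/versions/Unicode12.1.0/
    END

# Capture the dot-separated digit groups that follow "versions/Unicode".
my $version;
if $header ~~ / 'versions/Unicode' ( [\d+]+ % '.' ) / {
    $version = ~$0;
}

say "Unicode version: ", $version // 'not found';
```

To run it against a real checkout, `$header` could be replaced with the slurped contents of unicode_db.c.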


It would be really interesting to follow your Unicode work, Samantha.
The ideas you propose are interesting and everyone hopes for speed
improvements. Is there any place Raku-uns can go to read
updates--maybe a grant report, blog, or Github issue? Or maybe right
here, on the Perl6-Users mailing list? Thanks in advance.

Best, Bill.

W. Michels, Ph.D.



On Sun, Sep 27, 2020 at 4:03 AM Samantha McVey  wrote:
>
> So MoarVM uses its own database of the UCD. One nice thing is this can
> probably be faster than calling to the ICU to look up information of each
> codepoint in a long string. Secondly it implements its own text data
> structures, so the nice features of the UCD to do that would be difficult to
> use.
>
> In my opinion, it could make sense to use ICU for things like localized
> collation (sorting). It also could make sense to use ICU for unicode
> properties lookup for properties that don't have to do with grapheme
> segmentation or casing. This would be a lot of work but if something like this
> were implemented it would probably happen in the context of a larger
> rethinking of how we use unicode. Though everything is complicated by that we
> support lots of complicated regular expressions on different unicode
> properties. I guess first I'd start by benchmarking the speed of ICU and
> comparing to the current implementation.
>
>


Re: "ICU - International Components for Unicode"

2020-09-27 Thread Samantha McVey
So MoarVM uses its own database of the UCD. One nice thing is this can 
probably be faster than calling into ICU to look up information on each 
codepoint in a long string. Secondly, it implements its own text data 
structures, so the nice features of ICU for doing that would be difficult to 
use.

In my opinion, it could make sense to use ICU for things like localized 
collation (sorting). It also could make sense to use ICU for unicode 
properties lookup for properties that don't have to do with grapheme 
segmentation or casing. This would be a lot of work but if something like this 
were implemented it would probably happen in the context of a larger 
rethinking of how we use Unicode. Everything is complicated, though, by the 
fact that we support lots of complicated regular expressions on different 
Unicode properties. I guess first I'd start by benchmarking the speed of ICU and 
comparing to the current implementation.
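
For reference, here is a short sketch of the built-in pieces this touches on: property lookup, property-based regexes, and collation. It assumes Rakudo on the MoarVM backend. As far as I know, `collate` applies the default Unicode Collation Algorithm ordering rather than locale-specific rules, which is the part where ICU or a module would come in.

```raku
# Property lookup on a single character.
say "A".uniprop;             # general category: Lu
say "a".uniprop('Script');   # Latin

# Unicode properties used directly inside regexes.
say so "Ω" ~~ / <:Lu> /;              # True
say so "Ω" ~~ / <:Script<Greek>> /;   # True

# Codepoint order versus Unicode Collation Algorithm order.
my @words = <b Z a>;
say @words.sort;      # (Z a b)
say @words.collate;   # (a b Z)
```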


Re: "ICU - International Components for Unicode"

2020-09-25 Thread Patrick R. Michaud
On Fri, Sep 25, 2020 at 12:37:49PM +0200, Elizabeth Mattijsen wrote:
> > On 25 Sep 2020, at 04:25, Brad Gilbert  wrote:
> > Rakudo does not use ICU
> > 
> > It used to though.
> > 
> > Rakudo used to run on Parrot.
> > Parrot used ICU for its Unicode features.
> 
> I do remember that in the Parrot days, any non-ASCII character in 
> any string, would have a significant negative effect on grammar parsing.  
> This was usually not that visible when trying to run a script, but the 
> time needed to compile the core setting (which already took a few minutes 
> then) rose (probably exponentially) to: well, I don't know.  

Part of this is because Parrot/ICU was using UTF-8 and/or UTF-16 to
encode non-ASCII strings.  As a result, indexing into a string often 
became an O(n) operation instead of O(1).  For short strings, no problem;
for long strings (such as the core setting) it was really painful.

We did work on some ways in Parrot/NQP to reduce the amount of string
scanning involved, such as caching certain index-points in the string, 
but it was always a bit of a hack.  Switching to a fixed-width encoding
(NFG, which MoarVM implements) was definitely the correct path to take
there.

Pm
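
A small sketch of what this looks like from the user side in today's Raku, where strings are counted and indexed by grapheme rather than by byte or codepoint. The sample string is made up for illustration.

```raku
# "x" followed by COMBINING ACUTE ACCENT (U+0301) has no precomposed form,
# so it stays two codepoints but counts as a single grapheme.
my $s = "ax\x[301]b";

say $s.chars;                  # 3 graphemes
say $s.codes;                  # 4 codepoints
say $s.encode('utf8').bytes;   # 5 bytes in UTF-8

# Indexing works on graphemes, so the accent stays attached to its base.
say $s.substr(1, 1).codes;     # 2
```

With a variable-width encoding such as UTF-8, finding the second character means scanning from the start of the string, which is the O(n) cost described above.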


Re: "ICU - International Components for Unicode"

2020-09-25 Thread Elizabeth Mattijsen
> On 25 Sep 2020, at 04:25, Brad Gilbert  wrote:
> Rakudo does not use ICU
> 
> It used to though.
> 
> Rakudo used to run on Parrot.
> Parrot used ICU for its Unicode features.

Ah, the days.

I do remember that in the Parrot days, any non-ASCII character in any string 
would have a significant negative effect on grammar parsing.  This was usually 
not that visible when trying to run a script, but the time needed to compile 
the core setting (which already took a few minutes then) rose (probably 
exponentially) to: well, I don't know.  The last time I tried to see if it 
would actually complete, I killed the compilation process after an hour.

People complain about compilation taking long these days, and they're right: it 
should be better.  But still, compared to the Parrot days...  it's orders of 
magnitude better now.


Liz

Re: "ICU - International Components for Unicode"

2020-09-24 Thread Brad Gilbert
Rakudo does not use ICU

It used to though.

Rakudo used to run on Parrot.
Parrot used ICU for its Unicode features.

(Well maybe the JVM backend does currently, I don't actually know.)

MoarVM just has Unicode as one of its features.
Basically it has something similar to ICU already.

---

The purpose of ICU is to be able to add Unicode abilities to systems that
don't already have them.

As such, it does not really make sense to add support for the ICU library
in Raku as I don't think it adds anything that isn't already present.

If there is some feature that ICU has that Raku doesn't then it would make
more sense to add that feature directly to Raku itself.

On Thu, Sep 24, 2020 at 2:15 PM William Michels via perl6-users <
perl6-us...@perl.org> wrote:

> Thanks everyone for the replies. I guess the two questions I have
> pertain mainly to 1) lineage and 2) versioning:
>
> Regarding lineage, I'm interested in knowing if
> Pugs/Parrot/Niecza/STD/Perlito/viv/JVM/Rakudo ever used the ICU
> Libraries--even if now that data has been extracted into a Raku-native
> data structure. I'm fairly certain one principal Rakudo developer is a
> C++ expert, so this idea isn't too far fetched.
>
> Regarding versioning, it would be great to tell people that Raku
> conforms to the latest-and-greatest ICU Library version, currently
> sitting at version# ICU_67. That way when people are weighing Raku vs
> Ruby or Python or Haskell or Go, we can tell them "Raku v6.d extracts
> ICU_67 thus it conforms to the most current (and most widely accepted)
> Unicode Library release (ICU 67 / CLDR 37 locale data / Unicode 13)."
> I've read over Daniel's blog post but I don't recall explicit mention
> of Unicode version 12, or 13, etc., although it does seem that
> following his links takes you to references for Unicode 13.0.0 (see
> https://www.unicode.org/reports/tr44/). Does Rakudo roll its own UCD?
> Is there no reliance on ICU?
>
> Anyway, If Daniel or Samantha or Joseph or Liz can confirm/refute
> Raku's use of the (widely-adopted) ICU C-Library and/or Java-Library,
> I will have learned something.
>
> Thanks, Bill.
>
> http://site.icu-project.org/download/67
>
> "ICU 67 updates to CLDR 37 locale data with many additions and
> corrections. This release also includes the updates to Unicode 13,
> subsuming the special CLDR 36.1 and ICU 66 releases. ICU 67 includes
> many bug fixes for date and number formatting, including enhanced
> support for user preferences in the locale identifier. The
> LocaleMatcher code and data are improved, and number skeletons have a
> new “concise” form that can be used in MessageFormat strings. This is
> the first regular release after ICU 65. ICU 66 was a low-impact
> release with just Unicode 13 and a few bug fixes."
>
>
> Library/Language support for ICU:
>
> Objective C CocoaICU A set of Objective-C classes that encapsulate parts
> of ICU.
> C# GenICUWrapper A tool that generates a rudimentary C# wrapper around
> the C API of ICU4C. This could be used to generate headers for other
> ICU wrappers.
> C# ICU Dotnet - .NET bindings for ICU
> D Mango.icu is a set of wrappers for the D programming language
> Erlang icu4e is a set of bindings for Erlang to ICU4C
> Cobol COBOL A page on how ICU could be used from a COBOL application.
> Go icu4go provides a Go binding for the icu4c library
> Haskell Data.Text.ICU Haskell bindings for ICU4C.
> Lua ICU-Lua ICU for the Lua language
> Pascal ICU4PAS An Object Pascal wrapper around ICU4C.
> Perl PICU Perl wrapper for ICU
> PHP PHP intl A PHP wrapper around core ICU4C APIs.
> Python PyICU A Python extension wrapper around ICU4C.
> R stringi An R language wrapper of for ICU4C.
> Ruby icu4r ICU4C binding for MRI ruby.
> Smalltalk VA Smalltalk Wrappers
> Parrot Virtual Machine This is a virtual machine for Perl 6 and other
> various programming languages. ICU4C is used to improve the Unicode
> support.
> PHP The upcoming PHP 6 language is expected to support Unicode through
> ICU4C.
>
> Companies and Organizations using ICU:
>
> ABAS Software, Adobe, Amazon (Kindle), Amdocs, Apache, Appian, Apple,
> Argonne National Laboratory, Avaya, BAE Systems Geospatial
> eXploitation Products, BEA, BluePhoenix Solutions, BMC Software,
> Boost, BroadJump, Business Objects, caris, CERN, CouchDB, Debian
> Linux, Dell, Eclipse, eBay, EMC Corporation, ESRI, Facebook (HHVM),
> Firebird RDBMS, FreeBSD, Gentoo Linux, Google, GroundWork Open Source,
> GTK+, Harman/Becker Automotive Systems GmbH, HP, Hyperion, IBM,
> Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS,
> Jikes, Library of Congress, LibreOffice, Mathworks, Microsoft,
> Mozilla, Netezza, Node.js, Oracle (Solaris, Java), Lawson Software,
>

Re: "ICU - International Components for Unicode"

2020-09-24 Thread Joseph Brenner
I think more to the point is which version of Unicode is supported,
rather than the ICU libraries.   It might be worth writing some tests
that check that Raku's unicode handling matches the ICU libraries.

On 9/24/20, William Michels  wrote:
> Thanks everyone for the replies. I guess the two questions I have
> pertain mainly to 1) lineage and 2) versioning:
>
> Regarding lineage, I'm interested in knowing if
> Pugs/Parrot/Niecza/STD/Perlito/viv/JVM/Rakudo ever used the ICU
> Libraries--even if now that data has been extracted into a Raku-native
> data structure. I'm fairly certain one principal Rakudo developer is a
> C++ expert, so this idea isn't too far fetched.
>
> Regarding versioning, it would be great to tell people that Raku
> conforms to the latest-and-greatest ICU Library version, currently
> sitting at version# ICU_67. That way when people are weighing Raku vs
> Ruby or Python or Haskell or Go, we can tell them "Raku v6.d extracts
> ICU_67 thus it conforms to the most current (and most widely accepted)
> Unicode Library release (ICU 67 / CLDR 37 locale data / Unicode 13)."
> I've read over Daniel's blog post but I don't recall explicit mention
> of Unicode version 12, or 13, etc., although it does seem that
> following his links takes you to references for Unicode 13.0.0 (see
> https://www.unicode.org/reports/tr44/). Does Rakudo roll its own UCD?
> Is there no reliance on ICU?
>
> Anyway, If Daniel or Samantha or Joseph or Liz can confirm/refute
> Raku's use of the (widely-adopted) ICU C-Library and/or Java-Library,
> I will have learned something.
>
> Thanks, Bill.
>
> http://site.icu-project.org/download/67
>
> "ICU 67 updates to CLDR 37 locale data with many additions and
> corrections. This release also includes the updates to Unicode 13,
> subsuming the special CLDR 36.1 and ICU 66 releases. ICU 67 includes
> many bug fixes for date and number formatting, including enhanced
> support for user preferences in the locale identifier. The
> LocaleMatcher code and data are improved, and number skeletons have a
> new “concise” form that can be used in MessageFormat strings. This is
> the first regular release after ICU 65. ICU 66 was a low-impact
> release with just Unicode 13 and a few bug fixes."
>
>
> Library/Language support for ICU:
>
> Objective C CocoaICU A set of Objective-C classes that encapsulate parts of
> ICU.
> C# GenICUWrapper A tool that generates a rudimentary C# wrapper around
> the C API of ICU4C. This could be used to generate headers for other
> ICU wrappers.
> C# ICU Dotnet - .NET bindings for ICU
> D Mango.icu is a set of wrappers for the D programming language
> Erlang icu4e is a set of bindings for Erlang to ICU4C
> Cobol COBOL A page on how ICU could be used from a COBOL application.
> Go icu4go provides a Go binding for the icu4c library
> Haskell Data.Text.ICU Haskell bindings for ICU4C.
> Lua ICU-Lua ICU for the Lua language
> Pascal ICU4PAS An Object Pascal wrapper around ICU4C.
> Perl PICU Perl wrapper for ICU
> PHP PHP intl A PHP wrapper around core ICU4C APIs.
> Python PyICU A Python extension wrapper around ICU4C.
> R stringi An R language wrapper of for ICU4C.
> Ruby icu4r ICU4C binding for MRI ruby.
> Smalltalk VA Smalltalk Wrappers
> Parrot Virtual Machine This is a virtual machine for Perl 6 and other
> various programming languages. ICU4C is used to improve the Unicode
> support.
> PHP The upcoming PHP 6 language is expected to support Unicode through
> ICU4C.
>
> Companies and Organizations using ICU:
>
> ABAS Software, Adobe, Amazon (Kindle), Amdocs, Apache, Appian, Apple,
> Argonne National Laboratory, Avaya, BAE Systems Geospatial
> eXploitation Products, BEA, BluePhoenix Solutions, BMC Software,
> Boost, BroadJump, Business Objects, caris, CERN, CouchDB, Debian
> Linux, Dell, Eclipse, eBay, EMC Corporation, ESRI, Facebook (HHVM),
> Firebird RDBMS, FreeBSD, Gentoo Linux, Google, GroundWork Open Source,
> GTK+, Harman/Becker Automotive Systems GmbH, HP, Hyperion, IBM,
> Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS,
> Jikes, Library of Congress, LibreOffice, Mathworks, Microsoft,
> Mozilla, Netezza, Node.js, Oracle (Solaris, Java), Lawson Software,
> Leica Geosystems GIS & Mapping LLC, Mandrake Linux, OCLC, Progress
> Software, Python, QNX, Rogue Wave, SAP, SIL, SPSS, Software AG, SuSE,
> Sybase, Symantec, Teradata (NCR), ToolAware, Trend Micro, Virage,
> webMethods, Wine, WMS Gaming, XyEnterprise, Yahoo!, Vuo, and many
> others.
>
>
> On Thu, Sep 24, 2020 at 11:14 AM Joseph Brenner  wrote:
>>
>> Elizabeth Mattijsen  wrote:
>> > https://www.codesections.com/blog/raku-unicode/
>>
>> Thanks, yes I w

Re: "ICU - International Components for Unicode"

2020-09-24 Thread William Michels via perl6-users
Thanks everyone for the replies. I guess the two questions I have
pertain mainly to 1) lineage and 2) versioning:

Regarding lineage, I'm interested in knowing if
Pugs/Parrot/Niecza/STD/Perlito/viv/JVM/Rakudo ever used the ICU
Libraries--even if now that data has been extracted into a Raku-native
data structure. I'm fairly certain one principal Rakudo developer is a
C++ expert, so this idea isn't too far fetched.

Regarding versioning, it would be great to tell people that Raku
conforms to the latest-and-greatest ICU Library version, currently
sitting at version# ICU_67. That way when people are weighing Raku vs
Ruby or Python or Haskell or Go, we can tell them "Raku v6.d extracts
ICU_67 thus it conforms to the most current (and most widely accepted)
Unicode Library release (ICU 67 / CLDR 37 locale data / Unicode 13)."
I've read over Daniel's blog post but I don't recall explicit mention
of Unicode version 12, or 13, etc., although it does seem that
following his links takes you to references for Unicode 13.0.0 (see
https://www.unicode.org/reports/tr44/). Does Rakudo roll its own UCD?
Is there no reliance on ICU?

Anyway, If Daniel or Samantha or Joseph or Liz can confirm/refute
Raku's use of the (widely-adopted) ICU C-Library and/or Java-Library,
I will have learned something.

Thanks, Bill.

http://site.icu-project.org/download/67

"ICU 67 updates to CLDR 37 locale data with many additions and
corrections. This release also includes the updates to Unicode 13,
subsuming the special CLDR 36.1 and ICU 66 releases. ICU 67 includes
many bug fixes for date and number formatting, including enhanced
support for user preferences in the locale identifier. The
LocaleMatcher code and data are improved, and number skeletons have a
new “concise” form that can be used in MessageFormat strings. This is
the first regular release after ICU 65. ICU 66 was a low-impact
release with just Unicode 13 and a few bug fixes."


Library/Language support for ICU:

Objective C - CocoaICU: A set of Objective-C classes that encapsulate parts
  of ICU.
C# - GenICUWrapper: A tool that generates a rudimentary C# wrapper around
  the C API of ICU4C. This could be used to generate headers for other
  ICU wrappers.
C# - ICU Dotnet: .NET bindings for ICU.
D - Mango.icu: A set of wrappers for the D programming language.
Erlang - icu4e: A set of bindings for Erlang to ICU4C.
Cobol - COBOL: A page on how ICU could be used from a COBOL application.
Go - icu4go: A Go binding for the icu4c library.
Haskell - Data.Text.ICU: Haskell bindings for ICU4C.
Lua - ICU-Lua: ICU for the Lua language.
Pascal - ICU4PAS: An Object Pascal wrapper around ICU4C.
Perl - PICU: Perl wrapper for ICU.
PHP - PHP intl: A PHP wrapper around core ICU4C APIs.
Python - PyICU: A Python extension wrapper around ICU4C.
R - stringi: An R language wrapper for ICU4C.
Ruby - icu4r: ICU4C binding for MRI ruby.
Smalltalk - VA Smalltalk Wrappers.
Parrot Virtual Machine: This is a virtual machine for Perl 6 and other
  various programming languages. ICU4C is used to improve the Unicode
  support.
PHP: The upcoming PHP 6 language is expected to support Unicode through
  ICU4C.

Companies and Organizations using ICU:

ABAS Software, Adobe, Amazon (Kindle), Amdocs, Apache, Appian, Apple,
Argonne National Laboratory, Avaya, BAE Systems Geospatial
eXploitation Products, BEA, BluePhoenix Solutions, BMC Software,
Boost, BroadJump, Business Objects, caris, CERN, CouchDB, Debian
Linux, Dell, Eclipse, eBay, EMC Corporation, ESRI, Facebook (HHVM),
Firebird RDBMS, FreeBSD, Gentoo Linux, Google, GroundWork Open Source,
GTK+, Harman/Becker Automotive Systems GmbH, HP, Hyperion, IBM,
Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS,
Jikes, Library of Congress, LibreOffice, Mathworks, Microsoft,
Mozilla, Netezza, Node.js, Oracle (Solaris, Java), Lawson Software,
Leica Geosystems GIS & Mapping LLC, Mandrake Linux, OCLC, Progress
Software, Python, QNX, Rogue Wave, SAP, SIL, SPSS, Software AG, SuSE,
Sybase, Symantec, Teradata (NCR), ToolAware, Trend Micro, Virage,
webMethods, Wine, WMS Gaming, XyEnterprise, Yahoo!, Vuo, and many
others.


On Thu, Sep 24, 2020 at 11:14 AM Joseph Brenner  wrote:
>
> Elizabeth Mattijsen  wrote:
> > https://www.codesections.com/blog/raku-unicode/
>
> Thanks, yes I was just reading through that.  It makes it clear that
> the "Unicode Character Database" is built-in to the MoarVM, but I'm
> not that clear what the ICU libraries do for you, and I thought there
> might be some point in using them for something or other.
>
>
> On 9/24/20, Elizabeth Mattijsen  wrote:
> > https://www.codesections.com/blog/raku-unicode/
> >
> >> On 24 Sep 2020, at 20:00, Joseph Brenner  wrote:
> >>
> >> I'm not sure myself, but my first guess would be probably not...I
> >> *think*  Raku is doing its own Unicode thing, and isn't using any
> >> system ICU libraries (but I'm willing to stand correc

Re: "ICU - International Components for Unicode"

2020-09-24 Thread Joseph Brenner
Elizabeth Mattijsen  wrote:
> https://www.codesections.com/blog/raku-unicode/

Thanks, yes I was just reading through that.  It makes it clear that
the "Unicode Character Database" is built-in to the MoarVM, but I'm
not that clear what the ICU libraries do for you, and I thought there
might be some point in using them for something or other.


On 9/24/20, Elizabeth Mattijsen  wrote:
> https://www.codesections.com/blog/raku-unicode/
>
>> On 24 Sep 2020, at 20:00, Joseph Brenner  wrote:
>>
>> I'm not sure myself, but my first guess would be probably not...I
>> *think*  Raku is doing its own Unicode thing, and isn't using any
>> system ICU libraries (but I'm willing to stand corrected on that).
>>
>> As far as perl (the-language-formerly-known-as-perl5) is concerned:
>>
>> That page http://site.icu-project.org/related is a little strange in
>> any case.  If you follow the links for "perl" it goes to J. Briggs
>> personal web page, and if you comb through that there's a link to his
>> PICU just in tarball form.  He has a CPAN account, but doesn't seem to
>> have put this code there.
>>
>> (On the other hand there's this cpan module that uses the system icu
>> libraries:   https://metacpan.org/pod/Unicode::Transliterate)
>>
>> Anyway, I don't think perl has an ICU dependency either, it does its
>> own unicode thing as well (i.e. the Unicode "database" ships with it).
>>
>>
>> On 9/24/20, William Michels  wrote:
>>> Hi,
>>>
>>> I stumbled across the "ICU - International Components for Unicode"
>>> website:
>>>
>>> http://site.icu-project.org/
>>> https://github.com/unicode-org/icu
>>>
>>> There's a list of programming languages using the ICU libraries here:
>>>
>>> http://site.icu-project.org/related
>>>
>>> Should Raku be added to the list above?
>>> I see Perl and Parrot listed, but not Raku.
>>>
>>> Best, Bill.
>>>
>


Re: "ICU - International Components for Unicode"

2020-09-24 Thread Elizabeth Mattijsen
https://www.codesections.com/blog/raku-unicode/

> On 24 Sep 2020, at 20:00, Joseph Brenner  wrote:
> 
> I'm not sure myself, but my first guess would be probably not...I
> *think*  Raku is doing its own Unicode thing, and isn't using any
> system ICU libraries (but I'm willing to stand corrected on that).
> 
> As far as perl (the-language-formerly-known-as-perl5) is concerned:
> 
> That page http://site.icu-project.org/related is a little strange in
> any case.  If you follow the links for "perl" it goes to J. Briggs
> personal web page, and if you comb through that there's a link to his
> PICU just in tarball form.  He has a CPAN account, but doesn't seem to
> have put this code there.
> 
> (On the other hand there's this cpan module that uses the system icu
> libraries:   https://metacpan.org/pod/Unicode::Transliterate)
> 
> Anyway, I don't think perl has an ICU dependency either, it does its
> own unicode thing as well (i.e. the Unicode "database" ships with it).
> 
> 
> On 9/24/20, William Michels  wrote:
>> Hi,
>> 
>> I stumbled across the "ICU - International Components for Unicode" website:
>> 
>> http://site.icu-project.org/
>> https://github.com/unicode-org/icu
>> 
>> There's a list of programming languages using the ICU libraries here:
>> 
>> http://site.icu-project.org/related
>> 
>> Should Raku be added to the list above?
>> I see Perl and Parrot listed, but not Raku.
>> 
>> Best, Bill.
>> 


Re: "ICU - International Components for Unicode"

2020-09-24 Thread Joseph Brenner
I'm not sure myself, but my first guess would be probably not...I
*think*  Raku is doing its own Unicode thing, and isn't using any
system ICU libraries (but I'm willing to stand corrected on that).

As far as perl (the-language-formerly-known-as-perl5) is concerned:

That page http://site.icu-project.org/related is a little strange in
any case.  If you follow the links for "perl" it goes to J. Briggs
personal web page, and if you comb through that there's a link to his
PICU just in tarball form.  He has a CPAN account, but doesn't seem to
have put this code there.

(On the other hand there's this cpan module that uses the system icu
libraries:   https://metacpan.org/pod/Unicode::Transliterate)

Anyway, I don't think perl has an ICU dependency either, it does its
own unicode thing as well (i.e. the Unicode "database" ships with it).


On 9/24/20, William Michels  wrote:
> Hi,
>
> I stumbled across the "ICU - International Components for Unicode" website:
>
> http://site.icu-project.org/
> https://github.com/unicode-org/icu
>
> There's a list of programming languages using the ICU libraries here:
>
> http://site.icu-project.org/related
>
> Should Raku be added to the list above?
> I see Perl and Parrot listed, but not Raku.
>
> Best, Bill.
>


"ICU - International Components for Unicode"

2020-09-24 Thread William Michels via perl6-users
Hi,

I stumbled across the "ICU - International Components for Unicode" website:

http://site.icu-project.org/
https://github.com/unicode-org/icu

There's a list of programming languages using the ICU libraries here:

http://site.icu-project.org/related

Should Raku be added to the list above?
I see Perl and Parrot listed, but not Raku.

Best, Bill.


Re: My unicode keeper

2019-12-10 Thread ToddAndMargo via perl6-users

I am having a ton of fun with this!

E = ½MV²

my $e; my $m=5; my $v=22; $e=½*$m*$v²; say $e
1210


Re: My unicode keeper

2019-12-10 Thread ToddAndMargo via perl6-users

On 2019-12-10 13:50, ToddAndMargo via perl6-users wrote:

Hi All,

My unicode keeper.  It is a work in progress.  Please
comment, if you be of a mind to.

Do we use `U2248 ≈` at all?  Maybe I'll just use that one in writing, 
instead of ~


-T


Today's revised revision with new additions and amendments.
And no kittens.



Perl6: Unicode characters:

References:
https://en.wikipedia.org/wiki/Quotation_mark#Curved_quotes_and_Unicode
https://docs.raku.org/language/quoting

https://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Operators_block

https://www.techsupportalert.com/content/how-easily-insert-special-symbols-and-characters-windows-part-ii.htm


Unicode characters are convenient to use to avoid having
to escape things.



To enable Unicode keyboard input in Windows, install the following
registry key and reboot:

REGEDIT4
[HKEY_CURRENT_USER\Control Panel\Input Method]
"EnableHexNumpad"="1"


From a standard keyboard:

   Some useful Unicode characters:

   Notes:
   Windows: you must use the `+` from the keypad, not the regular keyboard
   Linux:   does not work in xterms or terminals without special modifications


   UFF62  「   Linux: Ctrl+Shift+u ff62   Windows: Alt+<+>ff62
   UFF63  」   Linux: Ctrl+Shift+u ff63   Windows: Alt+<+>ff63
   U201E  „   Linux: Ctrl+Shift+u 201E   Windows: Alt+<+>201E
   U00AB  «   Linux: Ctrl+Shift+u 00AB   Windows: Alt+<+>00AB
   U00BB  »   Linux: Ctrl+Shift+u 00BB   Windows: Alt+<+>00BB
   U2260  ≠   Linux: Ctrl+Shift+u 2260   Windows: Alt+<+>2260
   U2248  ≈   Linux: Ctrl+Shift+u 2248   Windows: Alt+<+>2248


Some uses:

   For use as a literal quote in a regex (`Q[]` does not work inside regexes)


  say so Q[A:\] ~~ / 「:\」 /;
  True

  say so Q[A:\] ~~ / 「:/」 /;
  False

   For accessing keys inside a hash with a variable:

 my %h= a=>"A", b=>"B";
 my $i= "b";
 say %h<$i>;
 (Any)
 say %h<<$i>>;
 B
 say %h«$i»;
 B
 say %h{$i};
 B

  Math:
 say so 5 ≠ 6
 True

 say so 5 ≠ 5
 False


Re: My unicode keeper

2019-12-10 Thread ToddAndMargo via perl6-users

On 2019-12-10 13:55, Veesh Goldman wrote:

literal, not litter.


Hi Veesh,

Chuckle.  Thank you!

So, how many kittens did the typo produce?

:-)

-T


Re: My unicode keeper

2019-12-10 Thread Veesh Goldman
>
> For use as a litter quote in a regex (Q[] does not work inside regex's)
>
>say so Q[A:\] ~~ / 「:\」 /;
>True
>
literal, not litter.

On Tue, Dec 10, 2019 at 11:54 PM ToddAndMargo via perl6-users <
perl6-us...@perl.org> wrote:

> Hi All,
>
> My unicode keeper.  It is a work in progress.  Please
> comment, if you be of a mind to.
>
> Do we use `U2248 ≈` at all?  Maybe I just use that one in writing,
> instead of ~
>
> -T
>
>
> Perl6: Unicode characters:
>
> References:
>
> https://en.wikipedia.org/wiki/Quotation_mark#Curved_quotes_and_Unicode
>  https://docs.raku.org/language/quoting
>
>
> https://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Operators_block
>
> Unicode characters are convenient to use to avoid having
> to escape things.
>
>
>  From a standard keyboard, Ctrl+Shift+unicode
>
> Some useful unicode characters:
>
> UFF62 「   Ctrl+Shift+u f f 6 2
> UFF63 」   Ctrl+Shift+u f f 6 3
> U201E „   Ctrl+Shift+u 2 0 1 E
> U00AB «   Ctrl+Shift+u 0 0 A B
> U00BB »   Ctrl+Shift+u 0 0 B B
> U2260 ≠   Ctrl+Shift+u 2 2 6 0
> U2248 ≈   Ctrl+Shift+u 2 2 4 8
>
> Some uses:
>
> For use as a litter quote in a regex (Q[] does not work inside regex's)
>
>say so Q[A:\] ~~ / 「:\」 /;
>True
>
>say so Q[A:\] ~~ / 「:/」 /;
>False
>
> For accessing keys inside a hash with a variable:
>
>   my %h= a=>"A", b=>"B";
>   my $i= "b";
>   say %h<$i>;
>   (Any)
>   say %h<<$i>>;
>   B
>   say %h«$i»;
>   B
>   say %h{$i};
>   B
>
>Math:
>   say so 5 ≠ 6
>   True
>
>   say so 5 ≠ 5
>   False
>


My unicode keeper

2019-12-10 Thread ToddAndMargo via perl6-users

Hi All,

My unicode keeper.  It is a work in progress.  Please
comment, if you be of a mind to.

Do we use `U2248 ≈` at all?  Maybe I just use that one in writing,
instead of ~


-T


Perl6: Unicode characters:

References:
https://en.wikipedia.org/wiki/Quotation_mark#Curved_quotes_and_Unicode
https://docs.raku.org/language/quoting

https://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Operators_block

Unicode characters are convenient to use to avoid having
to escape things.


From a standard keyboard, Ctrl+Shift+unicode

   Some useful unicode characters:

   UFF62 「   Ctrl+Shift+u f f 6 2
   UFF63 」   Ctrl+Shift+u f f 6 3
   U201E „   Ctrl+Shift+u 2 0 1 E
   U00AB «   Ctrl+Shift+u 0 0 A B
   U00BB »   Ctrl+Shift+u 0 0 B B
   U2260 ≠   Ctrl+Shift+u 2 2 6 0
   U2248 ≈   Ctrl+Shift+u 2 2 4 8

Some uses:

   For use as a litter quote in a regex (Q[] does not work inside regex's)

  say so Q[A:\] ~~ / 「:\」 /;
  True

  say so Q[A:\] ~~ / 「:/」 /;
  False

   For accessing keys inside a hash with a variable:

 my %h= a=>"A", b=>"B";
 my $i= "b";
 say %h<$i>;
 (Any)
 say %h<<$i>>;
 B
 say %h«$i»;
 B
 say %h{$i};
 B

  Math:
 say so 5 ≠ 6
 True

 say so 5 ≠ 5
 False


Re: can't match unicode chars?

2018-07-31 Thread Patrick R. Michaud
On Tue, Jul 31, 2018 at 09:28:08PM +0200, Marc Chantreux wrote:
> @*ARGS.map: {
> gather {
> my @lines;
> for .IO.lines -> $l {
>if /'›'/ {
>@lines and take @lines;
>@lines = $l;
>}
>else {
>@lines.push($l);
>take @lines if /''/;
>}
> }
> }
> }
> 
> this doesn't work as it seems that '›' and '' aren't matched.

Is it as simple as the  /'›'/  regex being matched against $_ instead of
$l ?

If I'm reading the above code correctly, $_ is being set to each of the values
of @*ARGS in turn.  The lines iterated by the for loop are all being bound to
$l, meaning $_ is unchanged from its outer (map) meaning.
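
A minimal sketch of that diagnosis, assuming the intent was to test each line. A bare regex in boolean context is matched against $_, which inside the map block holds the file name rather than the line. The file name and sample line below are made up for illustration.

```raku
# $_ plays the role of the outer topic set by map (a file name)
given 'slides.txt' {
    my $l = '› Renater et le libre';   # a line as bound by: for ... -> $l

    say so /'›'/;        # False: the bare regex is matched against $_
    say so $l ~~ /'›'/;  # True:  the regex is matched against the line
}
```

Either smartmatching against $l explicitly, or dropping the `-> $l` so that each line is bound to $_, should make the original test see the line.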

Pm


Re: can't match unicode chars?

2018-07-31 Thread Todd Chester




On 07/31/2018 12:28 PM, Marc Chantreux wrote:

hello people,

given the slides of my talk in the slides.vim format
(https://github.com/eiro/slides.vim), i want some of
them to be shown one bullet a slide. so when i have
this input:

 › Renater et le libre

  Sympa
  FileSender

the desired output is:

 › Renater et le libre

  Sympa
 › Renater et le libre

  FileSender

and it seems gather is the perfect solution for that so i started to
write it (any golfer magic or other feedback warmly welcome):

@*ARGS.map: {
 gather {
 my @lines;
 for .IO.lines -> $l {
if /'›'/ {
@lines and take @lines;
@lines = $l;
}
else {
@lines.push($l);
take @lines if /''/;
}
 }
 }
}

this doesn't work as it seems that '›' and '' aren't matched.
i tried both

 for .IO.lines -> $l {

 # following https://docs.perl6.org/routine/lines#class_IO::Path
 for .IO.lines(enc => 'utf8') -> $l {

but neither of them worked and i have run out of ideas about what's going on.

any idea or documentation point for me ?
regards

marc




Hi Marc,

This probably will not help, but what the heck!

I do a lot of string manipulations on html files I download
from the Internet.  They often end in what I call "weird
characters".

This table is helpful:
   https://www.ascii-code.com/

Here are some of the things I do:

   if / \> /          Note that I am escaping the ">" with "\"
   if / \x[3E] /      hex 3E (decimal 62) is ">"

If there are weird characters at the end of the string and
I know what the string is supposed to end in, I will do
a substitution with "greedy" (".*") to suck up all the weird stuff
at the end and replace it with what I want.

my $x = "abc-1234.exe";
$x ~~ s/ \.exe .* /.exe/;

Note that in the first part of the substitution, I can space
things out, but in the second part, it is literal.
If you use a space, it becomes part of the replacement.
I use spaces in the first part as it helps me avoid
run-together confusion.

You can also use greedy to whack the weird stuff off the
beginning too:

my $x = "abc-1234.exe";
$x ~~ s/ .* "abc" /abc/;

I hope this helps, if only somewhat.

-T


can't match unicode chars?

2018-07-31 Thread Marc Chantreux
hello people,

given the slides of my talk in the slides.vim format
(https://github.com/eiro/slides.vim), i want some of
them to be shown one bullet a slide. so when i have
this input:

› Renater et le libre

 Sympa
 FileSender

the desired output is:

› Renater et le libre

 Sympa
› Renater et le libre

 FileSender

and it seems gather is the perfect solution for that so i started to
write it (any golfer magic or other feedback warmly welcome):

@*ARGS.map: {
gather {
my @lines;
for .IO.lines -> $l {
   if /'›'/ {
   @lines and take @lines;
   @lines = $l;
   }
   else {
   @lines.push($l);
   take @lines if /''/;
   }
}
}
}

this doesn't work as it seems that '›' and '' aren't matched.
i tried both

for .IO.lines -> $l {

# following https://docs.perl6.org/routine/lines#class_IO::Path
for .IO.lines(enc => 'utf8') -> $l {

but neither of them worked and i have run out of ideas about what's going on.

any idea or documentation point for me ?
regards

marc


[perl #130483] [UNI] Regex Unicode properties check string values before checking bool properties

2018-03-16 Thread Jan-Olof Hendig via RT
On Wed, 04 Jan 2017 21:27:05 -0800, samant...@posteo.net wrote:
> Also see these bisectable results:
> https://gist.github.com/Whateverable/50acf5fe072680085746459f144a106f
> 
> You can see how with the new commit, 'space' and 'White_Space' now
> resolve to the same property. Before 'space' resolved to the LF
> property, and magically worked. When this was fixed and 'space' ==
> 'White_Space', it broke. This bug will be considered closed when:
> 
> use nqp; say nqp::unipropcode('space') ==
> nqp::unipropcode('White_Space') #> True
> and also must work doing: `'  ' ~~ /<:space>/; #> 「  」`

Fixed with commit 
https://github.com/rakudo/rakudo/commit/49dce163e8182ee726cd1e512a03c29551cc16da


[perl #127671] [EXOTICTEST] 「dir」 dies if weird unicode sequences are encountered (dir;)

2018-02-02 Thread Aleks-Daniel Jakimenko-Aleksejev via RT
Test added in
https://github.com/perl6/roast/commit/a7590d6543e1d29bc935377c727e4d15e38ee713

Note that the test *does create* files with weird names, but that's totally OK
in /tmp I think.

On 2017-03-02 07:06:54, c...@zoffix.com wrote:
> Explanation and ideas on IRC:
> https://irclog.perlgeek.de/moarvm/2017-03-02#i_14193748
>
> - It's risky to be part of normal tests, lest the user won't be able
> to delete these weird dirs
> - Setup `make risky-test` target that would run these risky tests only
> when user asks (knowing they're on a system that can be trashed
> freely)


Re: [perl #132441] [SEVERE][WINDOWS][IO] IO::Handle.read-internal cannot handle fancy Unicode chars on TTY handles

2017-12-25 Thread Brandon Allbery via RT
On Tue, Dec 26, 2017 at 12:15 AM, Brandon Allbery via RT <
perl6-bugs-follo...@perl.org> wrote:

> IIRC this is known, and not really fixable. It's not even cmd.exe but a
> Windows console mode limitation.
>

Come to think of it, there should be existing mention of this on the moarvm
bug tracker (ticket may have been closed as unfixable).

-- 
brandon s allbery kf8nh   sine nomine associates
allber...@gmail.com  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net


Re: [perl #132441] [SEVERE][WINDOWS][IO] IO::Handle.read-internal cannot handle fancy Unicode chars on TTY handles

2017-12-25 Thread Brandon Allbery
On Mon, Dec 25, 2017 at 1:07 AM, Zoffix Znet via RT <
perl6-bugs-follo...@perl.org> wrote:

> On Thu, 16 Nov 2017 09:53:46 -0800, c...@zoffix.com wrote:
> > On 2017.07 on Win7 with 65001 code page enabled, the » char doesn't
> > show up at all. Just seems to get removed from the content if I paste
> > it into the terminal.
>
> Starting to think this might be a limitation of cmd.exe. Though strangely,
> I'm failing to find anyone mentioning this problem on Google...
>

IIRC this is known, and not really fixable. It's not even cmd.exe but a
Windows console mode limitation.

-- 
brandon s allbery kf8nh   sine nomine associates
allber...@gmail.com  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net


[perl #132441] [SEVERE][WINDOWS][IO] IO::Handle.read-internal cannot handle fancy Unicode chars on TTY handles

2017-12-24 Thread Zoffix Znet via RT
On Thu, 16 Nov 2017 09:53:46 -0800, c...@zoffix.com wrote:

> On 2017.07 on Win7 with 65001 code page enabled, the » char doesn't
> show up at all. Just seems to get removed from the content if I paste
> it into the terminal.

Starting to think this might be a limitation of cmd.exe. Though strangely,
I'm failing to find anyone mentioning this problem on Google...

For example, Perl 5 has the exact same problem that the fancy chars get 
stripped:

C:\rakudo>perl -wlE "say '[' . scalar(readline) . ']'"
e »♥ b
[eb
]

I also cobbled together this C program from MSDN's code examples and the same
problem is present in it as well:

#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _read, _setmode */
#include <stdio.h>   /* printf, perror, _fileno */

int main( void ) {
   int i;
   unsigned char buffer[60000];
   unsigned int nbytes = 60000;
   int bytesread;   /* signed: _read returns -1 on error */
   int result;

   /* Switch stdin to binary mode so no newline translation is done */
   result = _setmode(_fileno(stdin), _O_BINARY);
   if( result == -1 )
      perror( "Cannot set mode" );
   else
      printf( "'stdin' successfully changed to binary mode\n" );

   if( ( bytesread = _read( _fileno( stdin ), buffer, nbytes ) ) <= 0 )
      perror( "Problem reading file" );
   else
      printf( "Read %d bytes from file\n", bytesread );

   /* Dump every byte that was read as a decimal value */
   printf("Read this: `");
   for (i = 0; i < bytesread; i++)
      printf("%u ", buffer[i]);
   printf("`\n\n");

   return 0;
}


Anything fancy gets read as a nul byte instead of the proper bytes for that 
char:

C:\rakudo>gcc test.c && a.exe
'stdin' successfully changed to binary mode
e »♥ b
Read 8 bytes from file
Read this: `101 32 0 0 32 98 13 10 `


C:\rakudo>


[perl #132452] Unicode: Windows shells print garbage instead of "「」"

2017-11-16 Thread Zoffix Znet via RT
On Thu, 16 Nov 2017 02:28:11 -0800, d...@zwell.net wrote:
> The Windows command shells I've used (CMD and Cmder) fail to print the "「」"
> that are part of stringified matches. This is a significant issue, since
> new users use regular expressions when working through tutorials.
> 
> Since pasting these characters into the console will also yield garbage, I
> suggest we pick different symbols, like "┘┌", instead. (IMO, it's important
> to choose two symbols that will not blend together when showing an empty
> match.)
> 
> The current output:
> > say 'a' ~~ /./
> 「a」
> 
> Note: I'm using the October 2017 Rakudo release.

That's because Windows cmd.exe isn't using UTF-8 by default. You need to run 
`chcp 65001` to switch to the proper code page.

-1 on trying to cater to cmd.exe's featureset or using mismatched brackets.


[perl #132452] Unicode: Windows shells print garbage instead of "「」"

2017-11-16 Thread via RT
# New Ticket Created by  Dan Zwell 
# Please include the string:  [perl #132452]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=132452


The Windows command shells I've used (CMD and Cmder) fail to print the "「」"
that are part of stringified matches. This is a significant issue, since
new users use regular expressions when working through tutorials.

Since pasting these characters into the console will also yield garbage, I
suggest we pick different symbols, like "┘┌", instead. (IMO, it's important
to choose two symbols that will not blend together when showing an empty
match.)

The current output:
> say 'a' ~~ /./
「a」

Note: I'm using the October 2017 Rakudo release.


[perl #132176] [RFC] Aliasing of Unicode ops to Texas Versions

2017-09-28 Thread Zoffix Znet via RT
s:g/Mexico/Fancy Unicode/;

per RT#132179: https://rt.perl.org/Ticket/Display.html?id=132179#ticket-history


[perl #130384] [UNI] degenerates: Mo or Mn Unicode characters combine with punctuation

2017-07-16 Thread Samantha McVey via RT
Bug has been open a while, and I have not forgotten it; I had just not reached
a final decision. After further thought I'm closing this WONTFIX. It would
needlessly complicate our grapheme concatenation and in addition I believe it 
may break some of the grapheme concatenation tests.


[perl #131384] .open(:enc) on file with unicode leads to MoarVM panic: MVM_nfg_get_synthetic_info called with out-of-range synthetic

2017-07-07 Thread jn...@jnthn.net via RT
On Sat, 27 May 2017 08:01:00 -0700, c...@zoffix.com wrote:
>  c: HEAD with "/tmp/foo2121".IO { .spurt: "fo♥o"; with
> .open(:enc) { say .slurp } }
>  Zoffix___, ¦HEAD(0c5fe56): «MoarVM panic:
> MVM_nfg_get_synthetic_info called with out-of-range synthetic «exit
> code = 1»»
> 
> This is not the result of encoding refactor; still exists in 2017.05:
>  c: 2017.05 with "/tmp/foo2121".IO { .spurt: "fo♥o"; with
> .open(:enc) { say .slurp } }
>  Zoffix___, ¦2017.05: «MoarVM panic:
> MVM_nfg_get_synthetic_info called with out-of-range synthetic «exit
> code = 1»»

Fixed (now it gives a normal decoding exception) and tested in 
S32-io/io-handle.t.


[perl #131384] .open(:enc) on file with unicode leads to MoarVM panic: MVM_nfg_get_synthetic_info called with out-of-range synthetic

2017-05-27 Thread via RT
# New Ticket Created by  Zoffix Znet 
# Please include the string:  [perl #131384]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=131384


 c: HEAD with "/tmp/foo2121".IO { .spurt: "fo♥o"; with 
.open(:enc) { say .slurp } }
 Zoffix___, ¦HEAD(0c5fe56): «MoarVM panic: 
MVM_nfg_get_synthetic_info called with out-of-range synthetic «exit code = 1»»

This is not the result of encoding refactor; still exists in 2017.05:
 c: 2017.05 with "/tmp/foo2121".IO { .spurt: "fo♥o"; with 
.open(:enc) { say .slurp } }
 Zoffix___, ¦2017.05: «MoarVM panic: MVM_nfg_get_synthetic_info 
called with out-of-range synthetic «exit code = 1»»


[perl #131048] [STAR][BUG] Cursor behavior with Unicode in command line interactive input

2017-03-23 Thread via RT
# New Ticket Created by  Matt Rosin 
# Please include the string:  [perl #131048]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=131048


When Unicode is entered in the perl6 interactive shell on the command line in
Mac OS X and you backspace over a previously entered line, the cursor
position is displayed incorrectly and backspacing only partially destroys the
Unicode character.

$ perl6
>  ∩ 
set(c, b)
> (press the up arrow key to copy the line)
—> the cursor appears some spaces to the right of the end of the line.
(Press the delete key 9 times)
—> the second byte of the unicode intersect character only is deleted, leaving 
a question mark in a diamond character (unicode name: REPLACEMENT CHARACTER): �
(Press enter key)
Malformed termination of UTF-8 string
  in sub nativecast at 
/Applications/Rakudo/share/perl6/sources/51E302443A2C8FF185ABC10CA1E5520EFEE885A1
 (NativeCall::Types) line 5
  in method deref at 
/Applications/Rakudo/share/perl6/sources/51E302443A2C8FF185ABC10CA1E5520EFEE885A1
 (NativeCall::Types) line 58
  in sub linenoise at 
/Applications/Rakudo/share/perl6/site/sources/0BDF8C54D33921FEA066491D8D13C96A7CB144B9
 (Linenoise) line 86
  in any interactive at src/Perl6/Compiler.nqp line 62

I enter this Unicode character by using the Japanese input method, but you can 
use the Mac OS X unicode/emoji viewer. I used the N-ARY INTERSECT character 
above but the fat INTERSECT character does the same thing.

This problem does not occur when entering a program in this shell using perl6 
-e ‘’, nor when I run vi in this shell and edit in vi.
Environment: Mac OS X 10.12.3 Sierra, Rakudo 2017.1 dmg, iTerm Build 
3.1.beta.1, bash-3.2 with export LC_ALL=en_US.UTF-8

Regards,

Matt


[perl #131002] [RFC] Add support for Unicode versions of ?? and !!

2017-03-15 Thread Zoffix Znet via RT
Per discussion[^1], closing this RFC due to current lack of interest in the 
feature.

[1] https://irclog.perlgeek.de/perl6/2017-03-15#i_14269321


[perl #131002] [RFC] Add support for Unicode versions of ?? and !!

2017-03-15 Thread Zoffix Znet via RT
Rakudo:
PR that added support: https://github.com/rakudo/rakudo/pull/1029/
Revert commit: https://github.com/rakudo/rakudo/commit/9644fc360f

Roast:
PR that added tests: https://github.com/perl6/roast/pull/246
Revert commit: https://github.com/perl6/roast/commit/b4d4df1e09

Docs:
Issues on new ternary: https://github.com/perl6/doc/issues/1228
Commit that added to docs: https://github.com/perl6/doc/commit/e4d341bbc5
Revert commit: https://github.com/perl6/doc/commit/3bdf5329ed


[perl #131002] [RFC] Add support for Unicode versions of ?? and !!

2017-03-15 Thread via RT
# New Ticket Created by  Zoffix Znet 
# Please include the string:  [perl #131002]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=131002


Some people voiced interest in adding Unicode versions for the ternary operator,
and the two characters below were briefly implemented:

U+2047 DOUBLE QUESTION MARK [Po] (⁇)
U+203C DOUBLE EXCLAMATION MARK [Po] (‼)

Their introduction created an LTA error message[1] and they had some
rendering issues (such as ‼ rendering as an emoji[^2]; or being
really ugly in some fonts[^3]). It's also not entirely clear what the
proposed characters' actual intended use is and whether they're
an appropriate choice for the job and will be widely used by users.

In light of these concerns, it was decided we revert the addition of
these characters as an alternative ternary. I will include the
links to all the revert commits in a reply to this ticket shortly.

[1] https://irclog.perlgeek.de/perl6/2017-03-14#i_14261780
[2] https://twitter.com/zoffix/status/841811442588385281
[3] https://irclog.perlgeek.de/perl6/2017-03-15#i_14265206
[4] https://irclog.perlgeek.de/perl6/2017-03-15#i_14265177


[perl #130912] [BUG] Str.perl/repl fail on outside-Unicode codepoints

2017-03-04 Thread via RT
# New Ticket Created by  Zefram 
# Please include the string:  [perl #130912]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=130912


> "\x[110000]".ords
(1114112)
> "\x[110000]".gist.ords
(1114112)
> "\x[110000]".perl.ords
(34 1114112 34)
> "\x[110000]"
Error encoding UTF-8 string: could not encode codepoint 1114112
> "\x[110000]".perl
Error encoding UTF-8 string: could not encode codepoint 1114112

This string contains the first out-of-Unicode-range codepoint.  There is
a bug somewhere in the above, leading to the error messages, but it's
a matter of opinion which part contains the bug.

Since the Str type is normally described as representing a Unicode string,
it would be reasonable to say that it cannot contain an out-of-Unicode
codepoint.  In that view, the bug is that the string literal "\x[110000]"
is accepted.  It's also then a bug that chr(0x110000) evaluates without
error, and so on for other ways of constructing a string.

If it is accepted that a Str can contain an out-of-Unicode codepoint,
then methods such as .perl and .gist need to handle that appropriately.
The range of characters that may be used in .perl output isn't explicitly
stated, but it would certainly be reasonable to say that it should be
a subset of Unicode.  In that view, it is a bug that .perl uses this
codepoint in its output: it should represent that grapheme non-literally,
in the same way that it does for "\x[1]".  Similar arguments apply to
.gist, though not as strongly.

If it's accepted that text from .perl or .gist, intended for the user
to see, may contain out-of-Unicode-range codepoints, then it is a bug
that the repl fails to display such text.  The UTF-8 codepoint-to-octets
encoding extends up to codepoint 0x7fffffff, so there's an obvious way to
output it if it were willing.  Whether the user's terminal could render
it is another matter, but if you're concerned about that then that would
be a good reason to say that .perl and .gist shouldn't be including this
sort of thing in their output.

-zefram


Re: [perl #127925] [BUG] Unicode handling on Windows command line

2017-02-11 Thread A. Sinan Unur
https://github.com/MoarVM/MoarVM/pull/528/files?diff=split

was merged so this ticket can be closed. Thank you.

-- Sinan


Re: [perl #127925] [BUG] Unicode handling on Windows command line

2017-02-10 Thread A. Sinan Unur
I think my pull request has reached the point where it should work on
others' machines, too ;-)

Please try it out:

https://github.com/MoarVM/MoarVM/pull/528/files?diff=split


Re: [perl #127925] [BUG] Unicode handling on Windows command line

2017-02-08 Thread A. Sinan Unur
See also https://github.com/MoarVM/MoarVM/issues/527


[perl #127925] [BUG] Unicode handling on Windows command line

2017-02-07 Thread A. Sinan Unur
@Parrot Raiser, please see

https://github.com/perl6/nqp/issues/346#issuecomment-278090170

https://github.com/perl6/nqp/issues/346#issuecomment-278102220

https://github.com/perl6/nqp/issues/346#issuecomment-278104580

-- Sinan


Re: [perl #130736] AutoReply: Bug #127925 for perl6: [BUG] Unicode handling on Windows command line

2017-02-07 Thread A. Sinan Unur
I created this report by mistake when I was hastily trying to follow-up on

https://rt.perl.org/Public/Bug/Display.html?id=127925

The reply belongs there. I would appreciate it if you could merge this
ticket with the correct one.

Apologies and thank you.


[perl #130736] Bug #127925 for perl6: [BUG] Unicode handling on Windows command line

2017-02-07 Thread via RT
# New Ticket Created by  A. Sinan Unur 
# Please include the string:  [perl #130736]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=130736


The problem is caused by the fact that moar.exe uses main rather than
wmain, so it has no hope of getting the right characters if anything
outside of the code page is used.

However, I wrote a wrapper around moar.exe which basically does what
perl6.bat does but UTF-8 encodes the command line arguments which
moar.exe relays to whatever executes Perl 6 code.

In that case, I get incorrectly encoded output as in:

C:\> p6run -e "say 'yağmur'"
yaÄŸmur

and

C:\> p6run yağmur.pl6
Could not open yaÄŸmur.pl6. Failed to stat file: no such file or directory

perl and a bunch of other programs also use main instead of wmain, so
they also end up echoing "yagmur" instead of  "yağmur" and attempt to
open "yagmur.pl" instead of "yağmur.pl6", but that is beside the
point.

I would like to know what encoding is assumed for command line
arguments to perl6 and where that is determined/implemented/handled.

Thank you.

-- Sinan


Re: [perl #127925] [BUG] Unicode handling on Windows command line

2017-02-07 Thread Parrot Raiser
A quick look at Stackoverflow suggests that Windows isn't being
terribly helpful.

On 2/7/17, Zoffix Znet via RT  wrote:
> Another report in NQP repo: https://github.com/perl6/nqp/issues/346
>
> ->8--
>
> It is entirely possible that I am missing something obvious, but while
> trying to figure out what happens between typing
>
> C:\> perl6 -e "say 'yağmur'"
>
> and getting the output
>
> yagmur
>


[perl #127925] [BUG] Unicode handling on Windows command line

2017-02-07 Thread Zoffix Znet via RT
Another report in NQP repo: https://github.com/perl6/nqp/issues/346

->8--

It is entirely possible that I am missing something obvious, but while trying 
to figure out what happens between typing

C:\> perl6 -e "say 'yağmur'"

and getting the output

yagmur


[perl #130564] [UNI] East_Asian_Width unicode property is not supported ( .uniprop(‘East_Asian_Width’) )

2017-01-15 Thread via RT
# New Ticket Created by  Aleks-Daniel Jakimenko-Aleksejev 
# Please include the string:  [perl #130564]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=130564


Code:
dd ‘あ’.uniprop(‘East_Asian_Width’)

Result:
""


Not sure what the right output should be, but definitely not an empty string :)


[perl #117683] [UNI] Several unicode char (nick)names unrecognized

2017-01-13 Thread Samantha McVey
I have fixed it on the JVM as of NQP commit:
# Fix RT #117683 on JVM \c[LINE FEED] \c[CARRIAGE RETURN]
#Also fixes \c[NEXT LINE] as well.
https://github.com/perl6/nqp/commit/0c249e7236a63325e6440df55a762a4378e6e63a

Fixed on MoarVM as of MoarVM commit:
# Fix RT #117683 \c[LINE FEED] \c[CARRIAGE RETURN]
# Also fixes \c[NEXT LINE] and \c[FORM FEED] as well.
https://github.com/MoarVM/MoarVM/commit/ef734fc8d7abaf687c5108f06c181f1bd6333634

In addition, as soon as this pull request is accepted, we will get access
to all of the Unicode alias names:
https://github.com/MoarVM/MoarVM/pull/497


[perl6/specs] 660070: S15-unicode, change .chars to .codes where this wa...

2017-01-09 Thread GitHub
  Branch: refs/heads/master
  Home:   https://github.com/perl6/specs
  Commit: 660070af5a819dbbca51fc78e7707dee5e69e44c
  
https://github.com/perl6/specs/commit/660070af5a819dbbca51fc78e7707dee5e69e44c
  Author: Samantha McVey <samant...@posteo.net>
  Date:   2017-01-08 (Sun, 08 Jan 2017)

  Changed paths:
M S15-unicode.pod

  Log Message:
  ---
  S15-unicode, change .chars to .codes where this was actually intended

`.chars` was used, but all of the examples' output specified the number of
codes. Use `.codes` in the examples because of this.
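
The distinction behind this commit is between counting code points and counting user-perceived characters. A rough Python sketch follows; Python's `len()` counts code points, and NFC composition is used here only as a stand-in for a grapheme count, which is an assumption that holds for this one example and not in general.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

code_point_count = len(decomposed)  # 2 code points
composed_count = len(composed)      # 1 code point (U+00E9) after NFC
print(code_point_count, composed_count)
```
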




[perl #130483] [UNI] Regex Unicode properties check values before checking property names

2017-01-02 Thread via RT
# New Ticket Created by  Samantha McVey 
# Please include the string:  [perl #130483]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=130483


See test in properties-general.t

The test used to pass, but only because 'space' resolved to Unicode
Property 'LF'='space'.

Since https://github.com/MoarVM/MoarVM/commit/5f1e081bad4a4846f2e7a0681af60450e82155c8
or one of the commits right before it, this is broken because 'space' no longer
means LF=space.

 #?rakudo.moar TODO "Possible bug in NQP where <:space> does not match, 
because it checks property VALUES before checking Bool property names"



[perl #130414] [BUG] associativity not right for ⁿ unicode superscript exponents

2016-12-27 Thread via RT
# New Ticket Created by  Ron Schmidt 
# Please include the string:  [perl #130414]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=130414


Roast issue 200 [1] already raises the problem that ++$i² and 2² ** 3 do not
behave as expected, i.e. they do not agree with ++$i**2 and 2 ** 2 ** 3. The
discrepancy seems to be cause for a ticket here anyway. The RT number from
creating this ticket is also wanted for inclusion in the skip line of the new
roast tests.
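
The expected grouping is the usual right-associative reading of exponentiation. The sketch below shows the two readings in Python, whose `**` operator is also right-associative; it illustrates the arithmetic only and says nothing about how Rakudo parses the superscript forms.

```python
right_assoc = 2 ** 2 ** 3    # parsed as 2 ** (2 ** 3)
left_assoc = (2 ** 2) ** 3   # what a left-associative reading would give

print(right_assoc, left_assoc)  # 256 64
```
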

  

Links:
--
[1] https://github.com/perl6/roast/issues/200


Re: [perl #130384] AutoReply: Mo or Mn Unicode characters incorrectly combine with any other character

2016-12-23 Thread Samantha McVey
According to the Unicode grapheme cluster specification, it looks like
‘degenerates’ do not have to be accounted for in order to support the spec.

> Ignore degenerates. No special provisions are made to get marginally better
> behavior for degenerate cases that never occur in practice, such as an A
> followed by an Indic combining mark.

So we don't *have* to support this case, but the spec makes it very clear that
the grapheme separation rules are allowed to cover more cases which may not be
covered by the rules laid out in
http://unicode.org/reports/tr29/#Default_Grapheme_Cluster_Table

These degenerate cases are also not covered by any of the grapheme tests that
Unicode provides, so we are free to be smarter here if we wish.


[perl #130384] Mo or Mn Unicode characters incorrectly combine with any other character

2016-12-21 Thread via RT
# New Ticket Created by  Samantha McVey 
# Please include the string:  [perl #130384]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=130384


say "ୈ"; # U+0B48 ORIYA VOWEL SIGN AI
Bogus statement
at /home/samantha/git/roast/EVAL_0:1
--> ⏏'ୈ'
expecting any of:
prefix
term

Discovered this while trying to add a test to roast to cover the 
Indic_Positional_Category Unicode property.


The most telling part of the bug is:
say Q<ୈtest<ୈ
OUTPUT: test

It seems these combining characters are combining with characters they should 
not combine with.


If I try Q style quoting normally:

Q<ୈ>
===SORRY!=== Error while compiling:
Couldn't find terminator <ୈ (corresponding <ୈ was at line 1)
at line 2


It seems this is also true for other Mn or Mo characters
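
To check whether a given character is a mark at all, one can inspect its General_Category. The sketch below does that with Python's `unicodedata`; `is_mark` is a hypothetical helper, and the exact mark subcategory of U+0B48 is deliberately printed rather than assumed.

```python
import unicodedata

def is_mark(ch: str) -> bool:
    """True if the General_Category of ch is one of the mark categories (M*)."""
    return unicodedata.category(ch).startswith("M")

vowel_sign = "\u0b48"  # ORIYA VOWEL SIGN AI, the character from the ticket
print(unicodedata.name(vowel_sign), unicodedata.category(vowel_sign))
print(is_mark(vowel_sign), is_mark("<"))
```
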


[perl #129878] [bug][unicode] Grapheme boundaries not recalculated for string repetition

2016-10-14 Thread via RT
# New Ticket Created by  cygx 
# Please include the string:  [perl #129878]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=129878


Cf

say ("\c[REGIONAL INDICATOR SYMBOL LETTER G]" x 2).chars #=> 2

vs

say ([~] "\c[REGIONAL INDICATOR SYMBOL LETTER G]" xx 2).chars #=> 1
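
Why the two results should agree can be sketched with a toy cluster counter. The function below handles only strings made entirely of regional indicators and pairs them left to right, which is how the UAX #29 rules from Unicode 9 onward treat them; `count_ri_clusters` is a hypothetical helper, not a general grapheme segmenter and not MoarVM's algorithm.

```python
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z

def count_ri_clusters(text: str) -> int:
    """Count clusters in a string of regional indicators, pairing them left to right."""
    if not all(RI_FIRST <= ord(ch) <= RI_LAST for ch in text):
        raise ValueError("expected only regional indicator symbols")
    # Each pair forms one cluster; a trailing unpaired symbol forms its own.
    return (len(text) + 1) // 2

letter_g = "\U0001F1EC"  # REGIONAL INDICATOR SYMBOL LETTER G
print(count_ri_clusters(letter_g * 2))  # 1, regardless of how the string was built
```
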



[perl #129259] [UNI] Unicode 9.0 (say ‘

2016-09-28 Thread jn...@jnthn.net via RT
On Mon Sep 12 14:01:41 2016, alex.jakime...@gmail.com wrote:
> It is a known issue, but I figured a ticket is not going to hurt.
> 
> Code:
> say ‘🦋’.uniname
> 
> Result:
> 
> 
> Expected Result:
> BUTTERFLY

Works now, and added a test in S15-unicode-information/uniname.t to verify it 
works.

There are some further issues that need resolving as a result of bumping to 
Unicode 9, however. The NFG algorithm will need updates due to changes in the 
grapheme boundary specification to add support for emoji. There are already two 
RT issues tracking that issue, so I'll resolve this one given we've done what 
it says. :-)

/jnthn


[perl #129319] Base 16 radix using unicode 16-full-stop (U+2497) fails

2016-09-20 Thread via RT
# New Ticket Created by  Aaron Sherman 
# Please include the string:  [perl #129319]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=129319


Via IRC:

[15:39]  Saw the note about unicode radixes and so immediately
tried:
[15:39]  m: say :⒗
[15:39] <+camelia> rakudo-moar 77a7a4: OUTPUT«===SORRY!===␤Argument to
"say" seems to be malformed␤at :1␤--> say⏏ :⒗␤Confused␤at
:1␤--> say :⏏⒗␤expecting any of:␤colon pair␤Other
potential difficulties:␤Unsupported…»
[15:46]  harmil_wk: I see no reason why that shouldn't work.
Would you please rakudobug it and maybe ping MasterDuke about it, since
they might be more familiar with that code path

Also, an LTA error, even if that was considered invalid. Should probably
mention that a radix is a valid option there.
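
For reference, the character in question carries a numeric value of 16 but is not a decimal digit, which may be relevant to how a radix parser classifies it. A small Python check using `unicodedata` (an illustration only, not Rakudo's radix parsing):

```python
import unicodedata

sixteen_full_stop = "\u2497"

char_name = unicodedata.name(sixteen_full_stop)
numeric_value = unicodedata.numeric(sixteen_full_stop)
is_decimal_digit = sixteen_full_stop.isdecimal()

print(char_name)         # NUMBER SIXTEEN FULL STOP
print(numeric_value)     # 16.0
print(is_decimal_digit)  # False
```
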


--
Aaron Sherman, M.:
P: 617-440-4332 Google Talk, Email and Google Plus: a...@ajs.com
Toolsmith, developer, gamer and life-long student.


Re: unicode

2016-09-17 Thread Timo Paulssen
On 17/09/16 13:34, Moritz Lenz wrote:
>> Searching further I found the ucd2c.pl program in the Moarvm tools
>> directory. This generates the unicode_db.c somewhere else in the
>> rakudo tree. I run this program myself on the Unicode 9.0.0
>> database and comparing the generated files shows many differences
>> between the one in the rakudo tree and the generated one.
>
> Please make a rakudo spectest with those changes, and if it passes,
> submit your patch as a pull request.

Unicode support is more than just having the data from the text files in
our own unicode database. In Unicode 9, the Zero Width Joiner is now
explicitly supported for emoji. If we don't change the algorithm to
create individual graphemes from streams of codepoints to consider this,
we'll end up with improper support for 8 (because new stuff is in there)
and improper support for 9 (because some stuff is missing) at the same
time; I suspect that'll help nobody.

I expect Jnthn will do the full & proper update during the coming month,
and running ucd2c.pl is the least time-consuming step of that, but
perhaps a pull request for this is still welcome.
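
To make the Zero Width Joiner point concrete: an emoji ZWJ sequence is several code points that a Unicode 9 aware segmenter should treat as one grapheme. The Python sketch below only shows the code point structure; it does not perform grapheme segmentation, and the family sequence is just an example chosen for illustration.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# MAN + ZWJ + WOMAN + ZWJ + GIRL
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"

code_points = len(family)  # 5 code points in the stream
parts = family.split(ZWJ)  # 3 emoji joined by 2 ZWJs
print(code_points, len(parts))
```
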


Re: unicode

2016-09-17 Thread MT

Hi,
I am looking forward to it
Thanks,
Marcel

> On Sat, Sep 17, 2016 at 01:34:45PM +0200, Moritz Lenz wrote:
>> Hi,
>>
>> On 17.09.2016 13:12, MT wrote:
>>> The date found in the file  unicode_db.c file is 2012-07-20 which is
>>> about Unicode version 6.1.0
>
> So the content in that file is not getting updated when the shipped Unicode
> version is updated? If so, is there a tool that needs fixing to automate that?
>
>> docs/ChangeLog in MoarVM says
>>
>> + Updated to Unicode 8
>>
>> in the section of the 2015.07 release, so it's not that bad :-)
>
> I believe that the plan is to update to Unicode 9 just after this month's
> release (to give a whole month to iron out any instabilities or bugs).
>
> So it might be a little bit bad this month, but next month will be awesome.
> Allegedly :-)
>
> Nicholas Clark





Re: unicode

2016-09-17 Thread Nicholas Clark
On Sat, Sep 17, 2016 at 01:34:45PM +0200, Moritz Lenz wrote:
> Hi,
> 
> On 17.09.2016 13:12, MT wrote:

> > The date found in the file  unicode_db.c file is 2012-07-20 which is 
> > about Unicode version 6.1.0

So the content in that file is not getting updated when the shipped Unicode
version is updated? If so, is there a tool that needs fixing to automate that?
 
> docs/ChangeLog in MoarVM says
> 
> + Updated to Unicode 8
>
> in the section of the 2015.07 release, so it's not that bad :-)

I believe that the plan is to update to Unicode 9 just after this month's
release (to give a whole month to iron out any instabilities or bugs).

So it might be a little bit bad this month, but next month will be awesome.
Allegedly :-)

Nicholas Clark


Re: unicode

2016-09-17 Thread MT





>> Searching further I found the ucd2c.pl program in the Moarvm tools
>> directory. This generates the unicode_db.c somewhere else in the rakudo
>> tree. I run this program myself on the Unicode 9.0.0 database and
>> comparing the generated files shows many differences between the one in
>> the rakudo tree and the generated one.
>
> Please make a rakudo spectest with those changes, and if it passes,
> submit your patch as a pull request.
>
>> The date found in the file  unicode_db.c file is 2012-07-20 which is
>> about Unicode version 6.1.0

How do I proceed from here? Do I pull in the newest rakudo version, make
another git branch, then change it and then push the branch after the
tests have run successfully? This way I am not able to cripple the
rakudo code. Other people can check the changes too before merging.

> docs/ChangeLog in MoarVM says
>
> + Updated to Unicode 8
>
> in the section of the 2015.07 release, so it's not that bad :-)

I have seen it now; indeed it is not that old, but it also means that
Unicode changes a lot between versions.


Greets,
Marcel


Re: unicode

2016-09-17 Thread Moritz Lenz
Hi,

On 17.09.2016 13:12, MT wrote:

> Searching further I found the ucd2c.pl program in the Moarvm tools 
> directory. This generates the unicode_db.c somewhere else in the rakudo 
> tree. I run this program myself on the Unicode 9.0.0 database and 
> comparing the generated files shows many differences between the one in 
> the rakudo tree and the generated one.

Please make a rakudo spectest with those changes, and if it passes,
submit your patch as a pull request.

> The date found in the file  unicode_db.c file is 2012-07-20 which is 
> about Unicode version 6.1.0

docs/ChangeLog in MoarVM says

+ Updated to Unicode 8

in the section of the 2015.07 release, so it's not that bad :-)

Cheers,
Moritz

-- 
Moritz Lenz
https://deploybook.com/ -- https://perlgeek.de/ -- https://perl6.org/


unicode

2016-09-17 Thread MT

Hi,

I am wondering whether Perl 6 is keeping pace with Unicode versions. I did
the following after I saw that the method/sub uniprop() did not always give
proper results when compared against the PropList.txt data taken from
Unicode version 9.0.0.


E.g. 0xFFD0.uniprop('Noncharacter_Code_Point') returns 0 instead of 1.

To check out all data from the PropList.txt I've made a program which 
revealed more of those.
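
For reference, the Noncharacter_Code_Point set is fixed by the Unicode Standard: U+FDD0..U+FDEF plus the last two code points of every plane, 66 in total. The sketch below encodes that definition in Python; `is_noncharacter` is a hypothetical helper. Under this definition U+FDD0 is a noncharacter while U+FFD0 is not, so the code point in the example above may be a transposition of 0xFDD0; that reading is only a guess.

```python
def is_noncharacter(cp: int) -> bool:
    """True if cp has the Noncharacter_Code_Point property."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each of the 17 planes
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

total = sum(is_noncharacter(cp) for cp in range(0x110000))
print(is_noncharacter(0xFDD0), is_noncharacter(0xFFD0), total)
```
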


Searching further I found the ucd2c.pl program in the MoarVM tools
directory. This generates the unicode_db.c somewhere else in the rakudo
tree. I ran this program myself on the Unicode 9.0.0 database, and
comparing the generated files shows many differences between the one in
the rakudo tree and the generated one.


The date found in the unicode_db.c file is 2012-07-20, which corresponds
to about Unicode version 6.1.0


Greetings,
Marcel



[perl #129270] [BUG] Unicode ellipsis works as a Stub, but still triggers redeclaration errors

2016-09-16 Thread Zoffix Znet via RT
Fixed in https://github.com/rakudo/rakudo/commit/d63f983290
Tests added in https://github.com/perl6/roast/commit/0ade2a58c9


[perl #129279] [BUG] [UNI] Unicode digits cause an error in radix bases, match variables, and type constraints

2016-09-15 Thread via RT
# New Ticket Created by  Daniel Green 
# Please include the string:  [perl #129279]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=129279


 m: say :3<12>
 rakudo-moar 466770: OUTPUT«5␤»
 m: say :۳<12>
 rakudo-moar 466770: OUTPUT«===SORRY!===␤Error encoding ASCII string: 
could not encode codepoint 1779␤»

 m: "a b" ~~ /(\w) \s (\w)/; say $1
 rakudo-moar 466770: OUTPUT«「b」␤»
 m: "a b" ~~ /(\w) \s (\w)/; say $١
 rakudo-moar 466770: OUTPUT«===SORRY!===␤Error encoding ASCII string: 
could not encode codepoint 1633␤»

 m: sub f(-1) { say "hi" }; say f(-1)
 rakudo-moar 466770: OUTPUT«hi␤True␤»
 m: sub f(-١) { say "hi" }; say f(-1)
 rakudo-moar 466770: OUTPUT«===SORRY!===␤Error encoding ASCII string: 
could not encode codepoint 1633␤»
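
The two code points named in the error messages are ordinary Unicode decimal digits, which is why one might expect them to be accepted wherever ASCII digits are. A Python sketch of their digit values follows; it illustrates the character data only, not what Rakudo's grammar should accept.

```python
import unicodedata

extended_arabic_indic_three = chr(1779)  # U+06F3
arabic_indic_one = chr(1633)             # U+0661

three = unicodedata.decimal(extended_arabic_indic_three)
one = unicodedata.decimal(arabic_indic_one)
print(three, one)  # 3 1

# Python's int() accepts any Unicode decimal digits.
twelve = int("\u0661\u0662")   # ARABIC-INDIC DIGIT ONE, DIGIT TWO
base3 = int("12", 3)           # same value as :3<12> in the ticket
print(twelve, base3)           # 12 5
```
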


[perl #129270] [BUG] Unicode ellipsis works as a Stub, but still triggers redeclaration errors

2016-09-14 Thread via RT
# New Ticket Created by  Zoffix Znet 
# Please include the string:  [perl #129270]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=129270


Works:
zoffix@VirtualBox:~$ perl6 -e 'class Foo { ... }; class Foo { }'

Redeclaration error:
zoffix@VirtualBox:~$ perl6 -e 'class Foo { … }; class Foo { }'
===SORRY!=== Error while compiling -e
Redeclaration of symbol 'Foo'
at -e:1
--> class Foo { … }; class Foo⏏ { }
expecting any of:
generic role

Even though it can be used as a stub:
zoffix@VirtualBox:~$ perl6 -e '…'
Stub code executed
  in any  at /home/zoffix/.rakudobrew/moar-nom/install/share/perl6/runtime/CORE.setting.moarvm line 1
  in block  at -e line 1

zoffix@VirtualBox:~$ perl6 -v
This is Rakudo version 2016.08.1-163-g04af57c built on MoarVM version 
2016.08-43-g3d04391
implementing Perl 6.c.


[perl #129259] [UNI] Unicode 9.0 (say ‘

2016-09-12 Thread via RT
# New Ticket Created by  Aleks-Daniel Jakimenko-Aleksejev 
# Please include the string:  [perl #129259]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=129259


It is a known issue, but I figured a ticket is not going to hurt.

Code:
say ‘🦋’.uniname

Result:


Expected Result:
BUTTERFLY
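
Whether a name lookup succeeds depends on the Unicode version of the database behind it; BUTTERFLY (U+1F98B) was added in Unicode 9.0. The Python sketch below reports the database version alongside the name, using an empty-string default to mimic the empty result in the report. It describes Python's bundled database only, not MoarVM's.

```python
import unicodedata

butterfly = "\U0001F98B"

# The second argument is returned when the database has no name for the character.
name = unicodedata.name(butterfly, "")
print(unicodedata.unidata_version, repr(name))
```
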


[perl #125071] Roast rakudo skip/todo test:./S15-unicode-information/uniname.t line:44 reason: :either and :one NYI

2016-07-27 Thread Will Coleda via RT
This ticket duplicates the individual :one & :either tickets, rejecting.

-- 
Will "Coke" Coleda


[perl #128706] Unicode character from name

2016-07-23 Thread via RT
# New Ticket Created by  Aaron Sherman 
# Please include the string:  [perl #128706]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=128706


I have a name in a scalar of a unicode codepoint. I want to get the
codepoint number or character. It looks like there's no way to do that
other than eval:

$ perl6 -e 'my $name = "COMMA"; say "qq\{\\c[$name]}".EVAL'
,
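
The operation being asked for is a name-to-character lookup at run time. As a point of comparison only, Python exposes it as `unicodedata.lookup`; this says nothing about what the Perl 6 function should be called.

```python
import unicodedata

name = "COMMA"                   # a name held in a variable, as in the ticket
char = unicodedata.lookup(name)  # raises KeyError for an unknown name
codepoint = ord(char)

print(char, codepoint)  # , 44
```
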

From IRC:

[14:51]  what a good question
[14:51]  i exposed the nqp op for the lookup i think
[14:52]  but i didnt want to decide on the p6 function name
[14:52]  harmil: rakudobug it
[14:53]  okay
[14:53]  m: say “\c[PILE OF POO]”.uniname
[14:53]  AlexDaniel: |«HEAD»: PILE OF POO
[14:55]  codepointffromname
[14:55]  is the name of tge nqp op
[14:55]  one less f

Version info:

$ perl6 -v
This is Rakudo version 2016.07.1-34-ge5c909c built on MoarVM version
2016.07-3-gc01472d
implementing Perl 6.c.


--
Aaron Sherman, M.:
P: 617-440-4332 Google Talk, Email and Google Plus: a...@ajs.com
Toolsmith, developer, gamer and life-long student.


Re: [perl #128705] Segfault on unicode string handling

2016-07-23 Thread Nicholas Clark
On Sat, Jul 23, 2016 at 10:48:40AM -0700, Aaron Sherman wrote:

> $ perl6 -e 'for 0..0x -> $i { say $i if $i %% 100; my $c = try {
> :16(uniprop($i.chr, "Bidi_Mirroring_Glyph")).chr }; say "{$i.fmt("%04x")}:
> {$i.chr} ~ $c" if $c.defined}'
> 0
> 100
> 200
> 300
> 400
> 500
> 600
> 700
> 800
> 900
> 1000
> 1100
> Segmentation fault (core dumped)

I get the SEGV earlier:

$ ./perl6-m -Ilib -e 'for 0..0x -> $i { say $i if $i %% 100; my $c = try { 
:16(uniprop($i.chr, "Bidi_Mirroring_Glyph")).chr }; say "{$i.fmt("%04x")}: 
{$i.chr} ~ $c" if $c.defined}'
0
100
200
ASAN:SIGSEGV
=
==20272==ERROR: AddressSanitizer: SEGV on unknown address 0x (pc 
0x7fa2e639f022 sp 0x7fff92696ac0 bp 0x7fff92698660 T0)
#0 0x7fa2e639f021 in MVM_interp_run src/core/interp.c:2859
#1 0x7fa2e666f7c4 in MVM_vm_run_file src/moar.c:304
#2 0x401a4f in main src/main.c:191
#3 0x7fa2e5bb0d5c in __libc_start_main (/lib64/libc.so.6+0x1ed5c)
#4 0x401058 (/home/nicholas/Sandpit/moar-san/bin/moar+0x401058)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV src/core/interp.c:2859 MVM_interp_run
==20272==ABORTING


Looks like "just" a direct NULL pointer dereference, not the fallout from
earlier more subtle undefined behaviour.

Nicholas Clark


[perl #128705] Segfault on unicode string handling

2016-07-23 Thread via RT
# New Ticket Created by  Aaron Sherman 
# Please include the string:  [perl #128705]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=128705


On rakudo version 2016-04 and also the current m: bot version on IRC, this
code segfaults:

$ perl6 -e 'for 0..0x -> $i { say $i if $i %% 100; my $c = try {
:16(uniprop($i.chr, "Bidi_Mirroring_Glyph")).chr }; say "{$i.fmt("%04x")}:
{$i.chr} ~ $c" if $c.defined}'
0
100
200
300
400
500
600
700
800
900
1000
1100
Segmentation fault (core dumped)

I think that it's defaulting to uniprop-int, but the odd thing is that
uniprop seems to be returning consecutive integers that aren't related to
the value being tested.

If I call uniprop-str, this works as expected.
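
For orientation, the related boolean property Bidi_Mirrored can be queried from Python's standard library, which does not expose Bidi_Mirroring_Glyph itself. The sketch below lists the ASCII characters that have the property; it is an illustration of the data being queried and is unrelated to the cause of the segfault.

```python
import unicodedata

# ASCII characters with Bidi_Mirrored=Yes; each has a mirroring glyph partner.
mirrored_ascii = [chr(cp) for cp in range(0x80) if unicodedata.mirrored(chr(cp))]
print("".join(mirrored_ascii))  # ()<>[]{}
```
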


--
Aaron Sherman, M.:
P: 617-440-4332 Google Talk, Email and Google Plus: a...@ajs.com
Toolsmith, developer, gamer and life-long student.


[perl #127925] [BUG] Unicode handling on Windows command line

2016-04-19 Thread via RT
# New Ticket Created by  zebster 
# Please include the string:  [perl #127925]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=127925


I hope Unicode passes through via e-mail, if not please refer to the
identical description at https://stackoverflow.com/q/36648940/1529709

Unicode handling on the Windows command line fails:

C:\Windows\System32>perl6 -e "'Я'.say"
?

Interestingly, this works:

C:\Windows\System32>perl6 -e "Buf.new(0xD0, 0xAF).decode('UTF-8').say"
Я

Seen on Rakudo version 2016.01.1 built on MoarVM version 2016.01
Windows 8.1 64-bit
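
The working example decodes the two UTF-8 bytes of the letter directly, which bypasses whatever encoding the console applies to the command line. The same round trip in Python, as an illustration of the byte values only:

```python
raw = bytes([0xD0, 0xAF])   # the two bytes from the working example
text = raw.decode("utf-8")  # U+042F CYRILLIC CAPITAL LETTER YA

print(text, f"U+{ord(text):04X}")
```
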


[perl #127671] 「dir」 dies if weird unicode sequences are encountered (dir;)

2016-03-13 Thread Tobias Leich via RT
Patch: https://github.com/MoarVM/MoarVM/commit/79dce1101b

I hesitate to put a test for this in...
Closing as resolved anyway.


Re: [perl #127671] 「dir」 dies if weird unicode sequences are encountered (dir;)

2016-03-07 Thread Parrot Raiser
On Mageia 5, that creates a subdirectory and generates the error.

On 3/7/16, Alex Jakimenko  wrote:
> # New Ticket Created by  Alex Jakimenko
> # Please include the string:  [perl #127671]
> # in the subject line of all future correspondence about this issue.
> # https://rt.perl.org/Ticket/Display.html?id=127671
>
>
> One-liner to reproduce the bug (preferably run it in an empty directory):
>
> perl -E 'mkdir pack "h*", "60ba"'; perl6 -e 'dir; say ‘hello’'
>
> Result:
> Malformed UTF-8 at line 1 col 2
>   in block  at -e line 1
>
>
> It should not die. Such file (or directory) exists and all operations with
> it should work.
>
> I've reproduced that on linux with ext4 filesystem. ugexe confirmed it.
>
> It was reported that this bug is not reproducible with that one-liner on OS
> X:
>  AlexDaniel I get no errors. it just makes a directory called
> ^F%AB/..
>
> Windows gives no error, and doesn't appear to make the directory
>
> See also this: http://irclog.perlgeek.de/perl6/2016-03-07#i_12150444
>


[perl #127671] 「dir」 dies if weird unicode sequences are encountered (dir;)

2016-03-07 Thread via RT
# New Ticket Created by  Alex Jakimenko 
# Please include the string:  [perl #127671]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=127671


One-liner to reproduce the bug (preferably run it in an empty directory):

perl -E 'mkdir pack "h*", "60ba"'; perl6 -e 'dir; say ‘hello’'

Result:
Malformed UTF-8 at line 1 col 2
  in block  at -e line 1


It should not die. Such file (or directory) exists and all operations with it 
should work.

I've reproduced that on linux with ext4 filesystem. ugexe confirmed it.

It was reported that this bug is not reproducible with that one-liner on OS X:
 AlexDaniel I get no errors. it just makes a directory called 
^F%AB/..

Windows gives no error, and doesn't appear to make the directory

See also this: http://irclog.perlgeek.de/perl6/2016-03-07#i_12150444
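
The Perl 5 `pack "h*", "60ba"` call produces the two bytes 0x06 0xAB, which are not valid UTF-8. The sketch below shows, in Python, that a strict decode fails on such a name while a lossless scheme can still round-trip it. Python's `surrogateescape` error handler is used purely as an example of such a scheme; it is not what the MoarVM patch does.

```python
raw_name = bytes([0x06, 0xAB])  # what pack "h*", "60ba" yields

try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Map undecodable bytes to lone surrogates so the original bytes can be recovered.
lenient = raw_name.decode("utf-8", errors="surrogateescape")
round_tripped = lenient.encode("utf-8", errors="surrogateescape")

print(strict_ok, round_tripped == raw_name)  # False True
```
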


Re: [perl #126678] [JVM] Failings tests in S15-unicode-information/uniname.t: Method 'NFC' not found for invocant of class 'Str'

2015-11-29 Thread Elizabeth Mattijsen
Added NYI stubs for Str.NFC/NFD/NFKC/NFKD for JVM in 
289723972efae46b6a5236ee6049

> On 18 Nov 2015, at 20:46, Christian Bartolomaeus (via RT) 
> <perl6-bugs-follo...@perl.org> wrote:
> 
> # New Ticket Created by  Christian Bartolomaeus 
> # Please include the string:  [perl #126678]
> # in the subject line of all future correspondence about this issue. 
> # https://rt.perl.org/Ticket/Display.html?id=126678
> 
> 
> There are skipped tests in S15-unicode-information/uniname.t which die with 
> the following error mode:
> 
> $ perl6-j -e 'say uninames("AB")'
> Method 'NFC' not found for invocant of class 'Str'
>  in block  at -e:1
> 
> The expected output would be '(LATIN CAPITAL LETTER A LATIN CAPITAL LETTER 
> B)'.



[perl #126678] [JVM] Failings tests in S15-unicode-information/uniname.t: Method 'NFC' not found for invocant of class 'Str'

2015-11-18 Thread via RT
# New Ticket Created by  Christian Bartolomaeus 
# Please include the string:  [perl #126678]
# in the subject line of all future correspondence about this issue. 
# https://rt.perl.org/Ticket/Display.html?id=126678


There are skipped tests in S15-unicode-information/uniname.t which die with the 
following error mode:

$ perl6-j -e 'say uninames("AB")'
Method 'NFC' not found for invocant of class 'Str'
  in block  at -e:1

The expected output would be '(LATIN CAPITAL LETTER A LATIN CAPITAL LETTER B)'.
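
The expected behaviour is a per-character name lookup. A Python sketch of the same idea follows; the function name `uninames` is borrowed from the ticket, and the `<unnamed>` placeholder for characters without a name is an assumption of this sketch.

```python
import unicodedata

def uninames(text: str) -> list[str]:
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch, "<unnamed>") for ch in text]

print(uninames("AB"))
```
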


[perl #124872] Roast rakudo skip/todo test:./S05-mass/properties-general.t line:816 reason: Unicode spec change in v6.1

2015-10-31 Thread Will Coleda via RT
These todo tests are fudged for the JVM only and are now passing.

Removed the fudge, closing ticket.


-- 
Will "Coke" Coleda


[perl #117683] Several unicode char (nick)names unrecognized

2015-10-27 Thread Will Coleda via RT
On Mon Mar 02 21:56:43 2015, coke wrote:
> On Sun Jul 20 10:16:40 2014, coke wrote:
> > On Sat Apr 20 19:12:48 2013, pmichaud wrote:
> > > On Sat, Apr 20, 2013 at 07:05:25PM -0700, Will Coleda wrote:
> > > > This seems to be an OS X only error:
> > > >
> > > > ./perl6 -e 'say "\c[LINE FEED (LF)]"'
> > > > ===SORRY!===
> > > > Unrecognized character name LINE FEED (LF)
> > > > at -e:1
> > > >
> > > > but:
> > > >
> > > > ./perl6 -e 'say "\c[COLON]"'
> > > > :
> > >
> > > What version of icu?
> > >
> > > Pm
> > >
> >
> > Note that these failures are also happening on rakudo-moar
> 
> The test was being skipped on all platforms. It's now unfudged, but
> will probably require a platform-specific todo for OS X.

\c[LINE FEED] is still unrecognized, the test is currently skipped.

-- 
Will "Coke" Coleda


[perl #125556] Rakudo doesn't do Unicode Special Casing (uc, tc, uclc, tclc for ffl ligature,Turkish i, etc.)

2015-10-09 Thread jn...@jnthn.net via RT
On Sun Jul 05 17:56:45 2015, raiph wrote:
> What I did:
> 
> > say 'ﬄ'.uc; # say the uppercased version of an ffl ligature
> 
> What I got with camelia (rakudo-moar 01edd3):
> 
> ﬄ
> 
> "What I expected":
> 
> FFL
> 
> 
> 
> "What I expected" is based on
> http://unicode.org/Public/UNIDATA/SpecialCasing.txt which defines a
> bunch of special casing rules:
> 
> "The data in this file, combined with the simple case mappings in
> UnicodeData.txt, defines the full case mappings Lowercase_Mapping
> (lc), Titlecase_Mapping (tc), and Uppercase_Mapping (uc)."
> 
> The entry for ffl approximates to:
> 
> code;  lower;  title; upper; # comment
> FB04;FB04; 0046 0066 006C;  0046 0046 004C;  # LATIN SMALL
> LIGATURE FFL
> 
> (Note difference between title case and upper case.)
> 
> 
> 
> A quick search of MoarVM's source code for SpecialCasing reveals this
> comment:
> 
> # XXX SpecialCasing.txt # haven't decided how to do it
> 
> (in the ucd2c.pl tool)
> 
> I'm surmising that Rakudo (MoarVM) does none of this special casing
> yet.
> 
> 
> 
We handle SpecialCasing in MoarVM now. I've added and unfudged various 
spectests covering that. The Greek final sigma is also properly handled, the 
various cases well tested.

The Turkish i is not something a generic Unicode implementation should do; it's 
marked with a regional condition in SpecialCasing.txt. Handling of those will 
be left to module space for the time being.
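
The three behaviours described here (full case mapping for the ligature, contextual final sigma, and no locale tailoring for the Turkish i) can be observed in any implementation of the default Unicode case algorithms. The sketch below uses Python's built-in string methods as such an implementation; it is not a test of MoarVM.

```python
ligature = "\ufb04"  # LATIN SMALL LIGATURE FFL

upper = ligature.upper()  # full mapping from SpecialCasing.txt
print(upper)              # FFL

# Capital sigma at the end of a word lowercases to final sigma (U+03C2).
lowered = "\u039f\u0394\u039f\u03a3".lower()
print(lowered)

# No Turkish tailoring: plain 'i' uppercases to dotted-less ASCII 'I'.
dotted = "i".upper()
print(dotted)  # I
```
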



[perl #125556] Rakudo doesn't do Unicode Special Casing (uc, tc, uclc, tclc for ffl ligature,Turkish i, etc.)

2015-07-05 Thread via RT
# New Ticket Created by  raiph 
# Please include the string:  [perl #125556]
# in the subject line of all future correspondence about this issue. 
# URL: https://rt.perl.org/Ticket/Display.html?id=125556 


What I did:

 say 'ﬄ'.uc; # say the uppercased version of an ffl ligature

What I got with camelia (rakudo-moar 01edd3):

  ﬄ

What I expected:

  FFL



What I expected is based on 
http://unicode.org/Public/UNIDATA/SpecialCasing.txt which defines a bunch of 
special casing rules:

The data in this file, combined with the simple case mappings in 
UnicodeData.txt, defines the full case mappings Lowercase_Mapping (lc), 
Titlecase_Mapping (tc), and Uppercase_Mapping (uc).

The entry for ffl approximates to:

code;  lower;  title; upper; # comment
FB04;FB04; 0046 0066 006C;  0046 0046 004C;  # LATIN SMALL LIGATURE FFL

(Note difference between title case and upper case.)



A quick search of MoarVM's source code for SpecialCasing reveals this comment:

  # XXX SpecialCasing.txt # haven't decided how to do it

(in the ucd2c.pl tool)

I'm surmising that Rakudo (MoarVM) does none of this special casing yet.



Digging a little bit more into What I did and What I got:

 say 'ﬄ'.uniname

  LATIN SMALL LIGATURE FFL

 say 'ﬄ'.NFD

  NFD:0xfb04

The compatibility decomposition of this precomposed codepoint is to the
individual 'f' and 'l' characters of which the ligature is composed, i.e.
three codepoints:

 say 'ﬄ'.NFKD, 'ﬄ'.NFKD.Str

  NFKD:0x0066 0066 006c, ffl
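
The difference between the two outputs above is that the ligature has only a compatibility decomposition, so NFD leaves it alone while NFKD expands it. The same can be checked with Python's `unicodedata`, shown here as an independent illustration:

```python
import unicodedata

ligature = "\ufb04"  # LATIN SMALL LIGATURE FFL

nfd = unicodedata.normalize("NFD", ligature)    # unchanged: no canonical decomposition
nfkd = unicodedata.normalize("NFKD", ligature)  # 'f', 'f', 'l'
mapping = unicodedata.decomposition(ligature)   # tagged as a compatibility mapping

print([hex(ord(c)) for c in nfd], [hex(ord(c)) for c in nfkd], mapping)
```
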



[perl #125057] Roast rakudo skip/todo test:./S04-declarations/constant.t line:151 reason: 'unicode constant name'

2015-05-08 Thread Christian Bartolomaeus via RT
This test actually works:

$ perl6 -e 'my $ok; constant λ = 42; $ok = λ == 42; say $ok'
True

I'm closing this ticket as 'resolved'.


[perl6/specs] 5c8213: Use unicode notation; \x escape is defined elsewhe...

2015-04-05 Thread GitHub
  Branch: refs/heads/master
  Home:   https://github.com/perl6/specs
  Commit: 5c8213fc5cc0ffc3e607f88c25dd7bed0976443d
  
https://github.com/perl6/specs/commit/5c8213fc5cc0ffc3e607f88c25dd7bed0976443d
  Author: Lucas Buchala lucasbuch...@gmail.com
  Date:   2015-04-04 (Sat, 04 Apr 2015)

  Changed paths:
M S03-operators.pod

  Log Message:
  ---
  Use unicode notation; \x escape is defined elsewhere


  Commit: 8e1c6ee1291e8ae4117889ed9583a317ee55c427
  
https://github.com/perl6/specs/commit/8e1c6ee1291e8ae4117889ed9583a317ee55c427
  Author: Lucas Buchala lucasbuch...@gmail.com
  Date:   2015-04-04 (Sat, 04 Apr 2015)

  Changed paths:
M S02-bits.pod

  Log Message:
  ---
  Wrap unicode chars in C


  Commit: 5da63dc8da6357bd61cc4ea2ca515cf83cffc86e
  
https://github.com/perl6/specs/commit/5da63dc8da6357bd61cc4ea2ca515cf83cffc86e
  Author: Lucas Buchala lucasbuch...@gmail.com
  Date:   2015-04-04 (Sat, 04 Apr 2015)

  Changed paths:
M S02-bits.pod
M S03-operators.pod
M S05-regex.pod
M S19-commandline.pod
M S28-special-names.pod
M S29-functions.pod

  Log Message:
  ---
  Small POD and typographical fixes


  Commit: df6c1b440a1a64a9c60ca2467b6d7298badeb4ae
  
https://github.com/perl6/specs/commit/df6c1b440a1a64a9c60ca2467b6d7298badeb4ae
  Author: Lucas Buchala lucasbuch...@gmail.com
  Date:   2015-04-04 (Sat, 04 Apr 2015)

  Changed paths:
M S19-commandline.pod

  Log Message:
  ---
  Update URLs mentioned in S19


  Commit: 43579ecc3e8ae1a4e7ae952636f028d935f86468
  
https://github.com/perl6/specs/commit/43579ecc3e8ae1a4e7ae952636f028d935f86468
  Author: Lucas Buchala lucasbuch...@gmail.com
  Date:   2015-04-04 (Sat, 04 Apr 2015)

  Changed paths:
M S03-operators.pod

  Log Message:
  ---
  Update comment about that Fido dog


  Commit: c040f335ccfd28a9bf1c461c7ce1f1d16e4f1d23
  
https://github.com/perl6/specs/commit/c040f335ccfd28a9bf1c461c7ce1f1d16e4f1d23
  Author: Zoffix Znet zoffixz...@users.noreply.github.com
  Date:   2015-04-04 (Sat, 04 Apr 2015)

  Changed paths:
M S02-bits.pod
M S03-operators.pod
M S05-regex.pod
M S19-commandline.pod
M S28-special-names.pod
M S29-functions.pod

  Log Message:
  ---
  Merge pull request #92 from lucasbuchala/random-changes1

Miscellaneous changes


Compare: https://github.com/perl6/specs/compare/dcd6f6c7c629...c040f335ccfd

[perl6/specs] c34d28: Add literal unicode chars in S02/Bracketing Charac...

2015-04-02 Thread GitHub
  Branch: refs/heads/master
  Home:   https://github.com/perl6/specs
  Commit: c34d2814ff20fac6530e35c32b3fbf18be534f87
  
https://github.com/perl6/specs/commit/c34d2814ff20fac6530e35c32b3fbf18be534f87
  Author: Lucas Buchala lucasbuch...@gmail.com
  Date:   2015-04-01 (Wed, 01 Apr 2015)

  Changed paths:
M S02-bits.pod

  Log Message:
  ---
  Add literal unicode chars in S02/Bracketing Characters


  Commit: 008870f2408915ae1d7008f629db2c6f66f3aa92
  
https://github.com/perl6/specs/commit/008870f2408915ae1d7008f629db2c6f66f3aa92
  Author: Zoffix Znet zoffixz...@users.noreply.github.com
  Date:   2015-04-01 (Wed, 01 Apr 2015)

  Changed paths:
M S02-bits.pod

  Log Message:
  ---
  Merge pull request #91 from lucasbuchala/unicode1

Add literal unicode chars in S02/Bracketing Characters


Compare: https://github.com/perl6/specs/compare/40163b8cab71...008870f24089

[perl #124185] Printing a Unicode surrogate code point fails with LTA error or segfault in Rakudo

2015-03-26 Thread via RT
# New Ticket Created by  Sam S. 
# Please include the string:  [perl #124185]
# in the subject line of all future correspondence about this issue. 
# URL: https://rt.perl.org/Ticket/Display.html?id=124185 


Trying to print a Unicode codepoint like 55296 can fail with multiple different 
failure modes, none of which are very helpful, and at least one of which 
(segfault) is a bug:

 $ perl6 -e 'say 55296.chr'
 Iteration past end of grapheme iterator
   in method print at src/gen/m-CORE.setting:17885
   in sub say at src/gen/m-CORE.setting:18644
   in block unit at -e:1

 $ perl6 -e 'say "A" ~ 55296.chr'
 Error encoding UTF-8 string near grapheme position 0 with codepoint 65
   in method print at src/gen/m-CORE.setting:17885
   in sub say at src/gen/m-CORE.setting:18644
   in block unit at -e:1

 $ perl6 -e 'say "A", 55296.chr'
 A/home/smls/.rakudobrew/bin/perl6: line 2: 20876 Segmentation fault  (core 
dumped) ...


Discussion:

 moritz:well, it could die with Illegal codepoint
 moritz:or something like that
 moritz:but everything else is either a bug, or an LTA error (which is also 
a bug, IMHO)
 
 TimToady:  m: say uniname(55296)
 camelia:   rakudo-moar 9210cc: OUTPUT«Non Private Use High Surrogate␤»
 
 smls:  Does that mean it should print Invalid code point? for that
 TimToady:  maybe more No true codepoint would ever be a surrogate!
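
The underlying constraint is that 55296 (U+D800) is a surrogate code point, which has no valid UTF-8 encoding on its own. The Python sketch below shows one way a runtime can report this cleanly, by raising an encoding error instead of crashing; the exact error wording Rakudo should use is the open question in the discussion above.

```python
import unicodedata

high_surrogate = chr(55296)  # U+D800
category = unicodedata.category(high_surrogate)

try:
    high_surrogate.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(category, encodable)  # Cs False
```
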


[perl #117683] Several unicode char (nick)names unrecognized

2015-03-02 Thread Will Coleda via RT
On Sun Jul 20 10:16:40 2014, coke wrote:
> On Sat Apr 20 19:12:48 2013, pmichaud wrote:
> > On Sat, Apr 20, 2013 at 07:05:25PM -0700, Will Coleda wrote:
> > > This seems to be an OS X only error:
> > >
> > > ./perl6 -e 'say "\c[LINE FEED (LF)]"'
> > > ===SORRY!===
> > > Unrecognized character name LINE FEED (LF)
> > > at -e:1
> > >
> > > but:
> > >
> > > ./perl6 -e 'say "\c[COLON]"'
> > > :
> >
> > What version of icu?
> >
> > Pm
> >
>
> Note that these failures are also happening on rakudo-moar

The test was being skipped on all platforms. It's now unfudged, but will 
probably require a platform-specific todo for OS X.

-- 
Will Coke Coleda


[perl #122654] [BUG] Operator starting with a bang (!) and having a Unicode character in it not recognized by Rakudo

2015-02-17 Thread Christian Bartolomaeus via RT
I added a test to S03-operators/misc.t with commit 
https://github.com/perl6/roast/commit/deaf607dfb

I'm closing this ticket as resolved.


[perl #111572] [BUG] Unicode problem with 2012.0 on Mac OSX 10.7.3

2014-11-10 Thread Christian Bartolomaeus via RT
With the latest parrot the tests in S19-command-line/dash-e.t pass on Mac OS X. 
I unfudged the tests with commit 
https://github.com/perl6/roast/commit/5d89b2877e and I'm closing this ticket 
now.


[perl #122341] Unicode Character 'PARAGRAPH SEPARATOR' (U+2029) cannot be used to separate lines

2014-11-09 Thread Christian Bartolomaeus via RT
With the latest Parrot the tests pass on all backends again.

I unfudged the tests with commit 
https://github.com/perl6/roast/commit/d6e7e47647 and I'm closing this ticket 
now.


[perl6/specs] e940cd: added reference to Unicode standard

2014-10-11 Thread GitHub
  Branch: refs/heads/master
  Home:   https://github.com/perl6/specs
  Commit: e940cd68c9ef04fff3a300f178ff8972059efdea
  
https://github.com/perl6/specs/commit/e940cd68c9ef04fff3a300f178ff8972059efdea
  Author: Helmut Wollmersdorfer hel...@wollmersdorfer.at
  Date:   2014-10-10 (Fri, 10 Oct 2014)

  Changed paths:
M S15-unicode.pod

  Log Message:
  ---
  added reference to Unicode standard


  Commit: 58611c38087e8e4f86d974d4bf8838f10b1f509b
  
https://github.com/perl6/specs/commit/58611c38087e8e4f86d974d4bf8838f10b1f509b
  Author: Helmut Wollmersdorfer hel...@wollmersdorfer.at
  Date:   2014-10-10 (Fri, 10 Oct 2014)

  Changed paths:
M S15-unicode.pod

  Log Message:
  ---
  Merge branch 'master' of github.com:perl6/specs


  Commit: 7a8127bc8353bde68b0537a01ccd8b091ccf34d0
  
https://github.com/perl6/specs/commit/7a8127bc8353bde68b0537a01ccd8b091ccf34d0
  Author: Helmut Wollmersdorfer hel...@wollmersdorfer.at
  Date:   2014-10-10 (Fri, 10 Oct 2014)

  Changed paths:
M S15-unicode.pod

  Log Message:
  ---
  added reference to Unicode document


Compare: https://github.com/perl6/specs/compare/05b6dfddc7f0...7a8127bc8353

[perl6/specs] 8702be: graphemes confirm to Unicode Grapheme Cluser Bound...

2014-10-11 Thread GitHub
  Branch: refs/heads/master
  Home:   https://github.com/perl6/specs
  Commit: 8702be6188b4f85bf031d39ec398bc8e09a80195
  
https://github.com/perl6/specs/commit/8702be6188b4f85bf031d39ec398bc8e09a80195
  Author: Helmut Wollmersdorfer hel...@wollmersdorfer.at
  Date:   2014-10-10 (Fri, 10 Oct 2014)

  Changed paths:
M S15-unicode.pod

  Log Message:
  ---
  graphemes confirm to Unicode Grapheme Cluser Boundaries extended


  Commit: 625f5d5f7180dec5c8e4df8b34bdfe2de6b3efd6
  
https://github.com/perl6/specs/commit/625f5d5f7180dec5c8e4df8b34bdfe2de6b3efd6
  Author: Helmut Wollmersdorfer hel...@wollmersdorfer.at
  Date:   2014-10-10 (Fri, 10 Oct 2014)

  Changed paths:
M S02-bits.pod
M S03-operators.pod
M S04-control.pod
M S06-routines.pod
A S16-io-OLD.pod
M S16-io.pod
M S19-commandline.pod
A S32-setting-library/IO-OLD.pod
M S32-setting-library/IO.pod
M S32-setting-library/Numeric.pod
M S32-setting-library/Str.pod
M S99-glossary.pod
M create_contents.p6

  Log Message:
  ---
  Merge branch 'master' of github.com:perl6/specs


Compare: https://github.com/perl6/specs/compare/e7593f051006...625f5d5f7180

[perl #122654] [BUG] Operator starting with a bang (!) and having a Unicode character in it not recognized by Rakudo

2014-08-30 Thread Carl Mäsak
# New Ticket Created by  Carl Mäsak 
# Please include the string:  [perl #122654]
# in the subject line of all future correspondence about this issue. 
# URL: https://rt.perl.org/Ticket/Display.html?id=122654 


masak ok, second iffy-related weirdness:
masak m: multi infix:«:» { $^l lt $^r }; multi infix:«!:» {
not($^l : $^r) }; say foo !: bar
camelia rakudo-moar d8c834: OUTPUT«True␤»
masak now watch as I perform my magic trick:
masak m: multi infix:«≃» { $^l lt $^r }; multi infix:«!≃» { not($^l
≃ $^r) }; say foo !≃ bar
camelia rakudo-moar d8c834: OUTPUT«===SORRY!=== Error while
compiling /tmp/q8BePJqzE4␤Cannot negate ≃ because it is not iffy
enough␤at /tmp/q8BePJqzE4:1␤-- x:«!:» { not($^l ≃ $^r) }; say
foo !≃⏏ bar␤»
masak *exactly* the same code -- just with a Unicode operator instead!
* masak submits rakudobug

Surely if the first bit of code works, the second should, too.


[perl #122654] [BUG] Operator starting with a bang (!) and having a Unicode character in it not recognized by Rakudo

2014-08-30 Thread Carl Mäsak via RT
Heh; just in case someone takes a close look at that error message and is 
terribly confused... here's the correctly pasted one:

masak m: multi infix:«≃» { $^l lt $^r }; multi infix:«!≃» { not($^l ≃ $^r) }; 
say foo !≃ bar
camelia rakudo-moar d8c834: OUTPUT«===SORRY!=== Error while compiling 
/tmp/weCLGmom3M␤Cannot negate ≃ because it is not iffy enough␤at 
/tmp/weCLGmom3M:1␤-- ix:«!≃» { not($^l ≃ $^r) }; say foo !≃⏏ bar␤»

That's better. Nothing to see here, move along.


[perl #122341] Unicode Character 'PARAGRAPH SEPARATOR' (U+2029) cannot be used to separate lines

2014-07-20 Thread via RT
# New Ticket Created by  Will Coleda 
# Please include the string:  [perl #122341]
# in the subject line of all future correspondence about this issue. 
# URL: https://rt.perl.org/Ticket/Display.html?id=122341 


S02-lexical-conventions/unicode.t has a test:

eval_lives_ok "\{ 1 \} \x2029 \{ 1 \}"

which currently fails.

-- 
Will Coke Coleda
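
For comparison only: Python's str.splitlines() treats U+2029 as a line boundary, which is the behaviour the test in this ticket expects from the Perl 6 parser. This is a Python sketch, not Rakudo code, and split_statements is a hypothetical helper name.

```python
import unicodedata

PARAGRAPH_SEPARATOR = "\u2029"


def split_statements(source: str) -> list:
    """Split source text on Unicode line boundaries and drop empty pieces."""
    # str.splitlines() recognises U+2028 and U+2029 in addition to \n and \r\n.
    return [piece.strip() for piece in source.splitlines() if piece.strip()]


# General category Zp marks U+2029 as a paragraph separator.
separator_category = unicodedata.category(PARAGRAPH_SEPARATOR)
blocks = split_statements("{ 1 } " + PARAGRAPH_SEPARATOR + " { 1 }")
```

Here blocks holds the two "{ 1 }" pieces separately, mirroring the two blocks in the failing test.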


[perl #117683] Several unicode char (nick)names unrecognized

2014-07-20 Thread Will Coleda via RT
On Sat Apr 20 19:12:48 2013, pmichaud wrote:
 On Sat, Apr 20, 2013 at 07:05:25PM -0700, Will Coleda wrote:
  This seems to be an OS X only error:
  
  ./perl6 -e 'say \c[LINE FEED (LF)]'
  ===SORRY!===
  Unrecognized character name LINE FEED (LF)
  at -e:1
  
  but:
  
  ./perl6 -e 'say \c[COLON]'
  :
 
 What version of icu?
 
 Pm
 

Note that these failures are also happening on rakudo-moar

-- 
Will Coke Coleda


[perl #121365] Quantifying a negated Unicode char class hangs on the JVM

2014-03-03 Thread via RT
# New Ticket Created by  Moritz Lenz 
# Please include the string:  [perl #121365]
# in the subject line of all future correspondence about this issue. 
# URL: https://rt.perl.org/Ticket/Display.html?id=121365 


The code

say 'o' ~~ /<:!Upper>*/;

hangs on rakudo-j (JVM), but works on the other backends.

Matching <:!Upper> without quantifier also works on rakudo-j

Originally reported by [Coke]++ on #perl6


[perl6/specs] b30fda: [S15] Add $*UNICODE and Unicode version pragma.

2014-02-27 Thread GitHub
  Branch: refs/heads/master
  Home:   https://github.com/perl6/specs
  Commit: b30fdaf8a76b5e662eb48604c58f14266d257cdb
  
https://github.com/perl6/specs/commit/b30fdaf8a76b5e662eb48604c58f14266d257cdb
  Author: lue rnd...@gmail.com
  Date:   2014-02-25 (Tue, 25 Feb 2014)

  Changed paths:
M S15-unicode.pod

  Log Message:
  ---
  [S15] Add $*UNICODE and Unicode version pragma.

This allows the user to gain some stability in the cases where Unicode's
stability policy isn't sufficient.
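
A rough analogue in Python rather than Perl 6: the standard unicodedata module reports the version of the Unicode database it was built with and also ships a frozen 3.2.0 database. That shows the kind of stability a version pragma could give. How $*UNICODE itself would be implemented is not covered here, and parse_version is a hypothetical helper.

```python
import unicodedata


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '3.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Version of the database bundled with the running interpreter.
current = parse_version(unicodedata.unidata_version)
# Version of the frozen database kept for backwards compatibility.
pinned = parse_version(unicodedata.ucd_3_2_0.unidata_version)

# U+1F600 was added after Unicode 3.2, so the pinned database
# reports it as unassigned while the current one knows it.
ch = "\U0001F600"
category_now = unicodedata.category(ch)
category_pinned = unicodedata.ucd_3_2_0.category(ch)
```

Code that queries the pinned database keeps getting the same answers when the interpreter's bundled Unicode data is upgraded.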



