Re: Breaking news: std.uni changes!

2023-01-03 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-announce

On 04/01/2023 2:58 AM, Adam D Ruppe wrote:
> On Tuesday, 3 January 2023 at 05:23:55 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> The main concern would be shared libraries, which Phobos should be
>> able to be distributed as on all platforms by all compilers.
>
> I said this on the discord chat but you should really just dynamic load
> the system icu if it is available.

Ideally. We still need an implementation for CTFE though. It's just a lot
of work to shoehorn it in now.
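
To make the trade-off concrete, here is a minimal sketch of the "load the system ICU if available" idea with a CTFE fallback. It is not how Phobos works. The names `loadIcuToUpper` and `upper`, the library name `libicuuc.so`, and the range of version suffixes probed are all assumptions made for illustration; only `u_toupper` (ICU) and `std.uni.toUpper` are real APIs.

```d
import std.stdio : writeln;
import std.uni : stdToUpper = toUpper;

extern (C) nothrow @nogc
{
    // ICU declares: UChar32 u_toupper(UChar32 c), with UChar32 being int32_t.
    alias UToUpper = int function(int);
}

/// Hypothetical helper: looks up u_toupper in a system ICU, or returns null.
UToUpper loadIcuToUpper()
{
    version (Posix)
    {
        import core.sys.posix.dlfcn : dlopen, dlsym, RTLD_LAZY;
        import std.conv : to;
        import std.string : toStringz;

        // Assumption: an unversioned libicuuc.so is installed. Many systems
        // only ship a versioned file name, in which case this returns null.
        void* lib = dlopen("libicuuc.so", RTLD_LAZY);
        if (lib is null)
            return null;

        // ICU usually exports version-suffixed symbols such as u_toupper_72,
        // so probe the plain name first and then a range of suffixes.
        // The handle is deliberately never closed: the symbol must stay valid.
        if (auto sym = dlsym(lib, "u_toupper"))
            return cast(UToUpper) sym;
        foreach (ver; 50 .. 100) // arbitrary range, an assumption
        {
            if (auto sym = dlsym(lib, ("u_toupper_" ~ ver.to!string).toStringz))
                return cast(UToUpper) sym;
        }
        return null;
    }
    else
    {
        return null; // other platforms are left out of this sketch
    }
}

/// Hypothetical wrapper: ICU at run time when present, std.uni otherwise.
dchar upper(dchar c)
{
    // dlopen cannot run during compile-time evaluation, so CTFE always
    // needs a pure D implementation to fall back on.
    if (__ctfe)
        return stdToUpper(c);

    static bool tried;   // function-local statics are thread-local in D
    static UToUpper icu;
    if (!tried)
    {
        icu = loadIcuToUpper();
        tried = true;
    }
    return icu is null ? stdToUpper(c) : cast(dchar) icu(cast(int) c);
}

// Evaluated at compile time, so this takes the std.uni path.
static assert(upper('a') == 'A');

void main()
{
    writeln(upper('a')); // ICU if it was found, std.uni otherwise
}
```

Either way the D implementation and its tables stay in the binary, which is why dynamic loading alone does not remove the need for them.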


Re: Breaking news: std.uni changes!

2023-01-03 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-announce

On 04/01/2023 2:51 AM, Dukc wrote:
> On Tuesday, 3 January 2023 at 04:13:53 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> On 03/01/2023 10:24 AM, Dukc wrote:
>>> Other things coming to mind: Bidirectional grapheme iteration, Word
>>> break and line break algorithms, lazy normalisation. Indeed, lots of
>>> improvement potential.
>>
>> I've done word break, "lazy" normalization (so can stop at any point),
>> and lazy case insensitive comparison with normalization.
>
> Can't wait to see them in master!
>
>> But: Bidirectional grapheme iteration makes my eye twitch lol.
>
> I did write a reverse grapheme iterator for Symmetry. It isn't fit for
> Phobos as-is since it only accepts UTF-8 strings (not other ranges) and
> is modeled after the Phobos grapheme walker, not the 15.0 standard. But
> I could ask for permission to give it to you if it'd help.

I probably won't be adding any new features to std.uni. Only finishing
off the things that annoy me and reviewing other people's work.

I've got enough on my plate just building my own "standard library"
https://github.com/Project-Sidero/basic_memory :)


Re: Breaking news: std.uni changes!

2023-01-03 Thread Adam D Ruppe via Digitalmars-d-announce
On Tuesday, 3 January 2023 at 05:23:55 UTC, Richard (Rikki) Andrew Cattermole wrote:
> The main concern would be shared libraries, which Phobos should
> be able to be distributed as on all platforms by all compilers.

I said this on the discord chat but you should really just
dynamic load the system icu if it is available.


Re: Breaking news: std.uni changes!

2023-01-03 Thread Dukc via Digitalmars-d-announce
On Tuesday, 3 January 2023 at 04:13:53 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 03/01/2023 10:24 AM, Dukc wrote:
>> Other things coming to mind: Bidirectional grapheme iteration,
>> Word break and line break algorithms, lazy normalisation.
>> Indeed, lots of improvement potential.
>
> I've done word break, "lazy" normalization (so can stop at any
> point), and lazy case insensitive comparison with normalization.

Can't wait to see them in master!

> But: Bidirectional grapheme iteration makes my eye twitch lol.

I did write a reverse grapheme iterator for Symmetry. It isn't
fit for Phobos as-is since it only accepts UTF-8 strings (not
other ranges) and is modeled after the Phobos grapheme walker,
not the 15.0 standard. But I could ask for permission to give it
to you if it'd help.




Re: Breaking news: std.uni changes!

2023-01-02 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-announce

On 03/01/2023 6:13 PM, H. S. Teoh wrote:
> Is there a way to make these tables pay-as-you-go? As in, if you never
> call a function that depends on a table, it would not be pulled into the
> binary?

This should already be the case. I saw some stuff involving Rainer 10
years ago who helped improve it along these lines.

The main concern would be shared libraries, which Phobos should be able
to be distributed as on all platforms by all compilers.
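
As a rough illustration of what pay-as-you-go can look like at the language level (not a description of how std.uni lays out its tables), a table can live inside a function template so that it is only instantiated by code that calls it. `squareOf` and its table are hypothetical stand-ins for a table-backed Unicode function.

```d
import std.stdio : writeln;

// Hypothetical table-backed function. The empty template parameter list
// makes it a template, so neither the function nor its table is
// instantiated unless some code actually calls it.
int squareOf()(size_t i)
{
    // Built at compile time via CTFE; stands in for a Unicode property table.
    static immutable int[16] table = () {
        int[16] t;
        foreach (idx, ref v; t)
            v = cast(int)(idx * idx);
        return t;
    }();
    return table[i];
}

void main()
{
    // Remove this call and squareOf is never instantiated in this program.
    writeln(squareOf(3));
}
```

This only helps executables and static linking. A shared library has to export everything a client might call, which is the concern raised above.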


Re: Breaking news: std.uni changes!

2023-01-02 Thread H. S. Teoh via Digitalmars-d-announce
On Tue, Jan 03, 2023 at 05:13:53PM +1300, Richard (Rikki) Andrew Cattermole via 
Digitalmars-d-announce wrote:
> On 03/01/2023 10:24 AM, Dukc wrote:
> > Other things coming to mind: Bidirectional grapheme iteration,
> > Word break and line break algorithms, lazy normalisation. Indeed,
> > lots of improvement potential.
> 
> I've done word break, "lazy" normalization (so can stop at any point),
> and lazy case insensitive comparison with normalization.
> 
> But: Bidirectional grapheme iteration makes my eye twitch lol.
> 
> My main concern for adding new features is increasing the size of
> Phobos binary for the tables. Most people don't need a lot of these
> optional algorithms, but they do need things like casing to work
> correctly (which makes increased size worth it).

Is there a way to make these tables pay-as-you-go? As in, if you never
call a function that depends on a table, it would not be pulled into the
binary?


T



Re: Breaking news: std.uni changes!

2023-01-02 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-announce

On 03/01/2023 10:24 AM, Dukc wrote:
> Other things coming to mind: Bidirectional grapheme iteration, Word
> break and line break algorithms, lazy normalisation. Indeed, lots of
> improvement potential.

I've done word break, "lazy" normalization (so can stop at any point),
and lazy case insensitive comparison with normalization.

But: Bidirectional grapheme iteration makes my eye twitch lol.

My main concern for adding new features is increasing the size of Phobos
binary for the tables. Most people don't need a lot of these optional
algorithms, but they do need things like casing to work correctly (which
makes increased size worth it).


Re: Breaking news: std.uni changes!

2023-01-02 Thread Dukc via Digitalmars-d-announce

(Sorry for the late answer)

On Wednesday, 28 December 2022 at 00:10:36 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 28/12/2022 12:13 AM, Dukc wrote:
>> This is a big service for us at Symmetry. Getting Unicode
>> support up to date was needed, we would have had to switch
>> libraries at some point or update it ourselves. But now,
>> nothing to do except perhaps dealing with a bit of breakage.
>> Thank you!
>
> I had no idea that this was becoming an issue for you guys. It
> wasn't in any of the meeting notes and I haven't seen it
> brought up anywhere. So if there is anything more like this,
> please talk about it!

Yes, I should have done that.

>> I see it's not quite Unicode 15 though. `graphemeStride` does
>> not take Emoji sequences and prepend characters into account.
>> I'm going to contribute a bit now since it's holiday, and this
>> is a good task for me. PR coming soon unless I run into issues!
>
> Yeah, there will be tons of small stuff currently missed out
> due to such a big jump and of course ping me @rikkimax, when
> you have something to review.
>
> Loads of other work available such as culling all the version
> specific information out of the docs :)

Other things coming to mind: Bidirectional grapheme iteration,
Word break and line break algorithms, lazy normalisation. Indeed,
lots of improvement potential.





Re: Breaking news: std.uni changes!

2022-12-27 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-announce

On 28/12/2022 12:13 AM, Dukc wrote:
> This is a big service for us at Symmetry. Getting Unicode support up to
> date was needed, we would have had to switch libraries at some point or
> update it ourselves. But now, nothing to do except perhaps dealing with
> a bit of breakage. Thank you!

I had no idea that this was becoming an issue for you guys. It wasn't in
any of the meeting notes and I haven't seen it brought up anywhere. So
if there is anything more like this, please talk about it!

> I see it's not quite Unicode 15 though. `graphemeStride` does not take
> Emoji sequences and prepend characters into account. I'm going to
> contribute a bit now since it's holiday, and this is a good task for me.
> PR coming soon unless I run into issues!

Yeah, there will be tons of small stuff currently missed out due to such
a big jump and of course ping me @rikkimax, when you have something to
review.

Loads of other work available such as culling all the version specific
information out of the docs :)


Re: Breaking news: std.uni changes!

2022-12-27 Thread Dukc via Digitalmars-d-announce
On Saturday, 24 December 2022 at 21:26:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Hello one and all on this merry of all days!
>
> Today unfortunately I bring all but joy. For std.uni has had a
> bout of work!
>
> - Unicode tables have been updated to 15 from 6.2 (and with
> that the generator is now in Phobos!).
> - Unicode categories C aka Other have been brought in line with
> TR44 specification. E.g. ``unicode.C``.

This is a big service for us at Symmetry. Getting Unicode support
up to date was needed, we would have had to switch libraries at
some point or update it ourselves. But now, nothing to do except
perhaps dealing with a bit of breakage. Thank you!

I see it's not quite Unicode 15 though. `graphemeStride` does not
take Emoji sequences and prepend characters into account. I'm
going to contribute a bit now since it's holiday, and this is a
good task for me. PR coming soon unless I run into issues!
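
For readers unfamiliar with `graphemeStride`, a small sketch of what it reports and where the gap shows up. The sample strings are my own choice. The emoji result is printed rather than asserted because it depends on which Phobos version, and therefore which segmentation rules, you build against.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, graphemeStride;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 'x'.
    // The first grapheme covers 1 + 2 = 3 UTF-8 code units.
    string accented = "e\u0301x";
    assert(graphemeStride(accented, 0) == 3);
    assert(accented.byGrapheme.walkLength == 2);

    // WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER: 11 UTF-8 code units.
    // A stride of 11 means the whole emoji sequence was kept together;
    // a smaller value means it was split.
    string emoji = "\U0001F469\u200D\U0001F4BB";
    writeln(graphemeStride(emoji, 0), " of ", emoji.length);
}
```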


Re: Breaking news: std.uni changes!

2022-12-26 Thread Walter Bright via Digitalmars-d-announce

A big thank you!


Re: Breaking news: std.uni changes!

2022-12-26 Thread Robert Schadek via Digitalmars-d-announce

Awesome work, thank you




Re: Breaking news: std.uni changes!

2022-12-25 Thread Dom Disc via Digitalmars-d-announce
On Saturday, 24 December 2022 at 21:26:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> - Unicode tables have been updated to 15 from 6.2 (and with
> that the generator is now in Phobos!).

Hurray!
Whatever problems this may cause, they are problems in very, very
outdated code that would already need an overhaul, so what.
But it's super to finally have tables that are (at least now) up
to date!


Breaking news: std.uni changes!

2022-12-24 Thread Richard Andrew Cattermole (Rikki) via Digitalmars-d-announce

Hello one and all on this merry of all days!

Today unfortunately I bring all but joy. For std.uni has had a 
bout of work!


- Unicode tables have been updated to 15 from 6.2 (and with that 
the generator is now in Phobos!).
- Unicode categories C aka Other have been brought in line with 
TR44 specification. E.g. ``unicode.C``.


In both cases, if you use std.uni directly or indirectly (say, via
std.regex), you may find yourself with code breakage in the next
release.


If you do run into problems, first check whether you are
referencing the C category. If you are, here is some code to
mitigate the breakage, though it would be better to remove the
need for it.


```d
@property auto loadPropertyOriginal(string name)() pure
{
    import std.uni : unicode;

    static if (name == "C" || name == "c" || name == "other" ||
               name == "Other")
    {
        // Build the union of sets used here as the mitigation for C/Other.
        auto target = unicode.Co;
        target |= unicode.Lo;
        target |= unicode.No;
        target |= unicode.So;
        target |= unicode.Po;
        return target;
    }
    else
    {
        // Every other property name is forwarded to std.uni unchanged.
        return unicode.opDispatch!name;
    }
}
```
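
A quick way to see what the mitigation set does and does not contain, as a sketch: it rebuilds the same union inline and probes a few code points whose general categories are fixed (U+E000 is private use, `!` is punctuation, NUL is a control character). The last line is printed rather than asserted, since the answer from `unicode.C` depends on the Phobos version in use.

```d
import std.stdio : writeln;
import std.uni : unicode;

void main()
{
    // The same union as in the mitigation snippet above.
    auto legacyC = unicode.Co | unicode.Lo | unicode.No | unicode.So | unicode.Po;

    assert(legacyC['\uE000']); // private use (Co)
    assert(legacyC['!']);      // other punctuation (Po)
    assert(!legacyC['\0']);    // control (Cc) is not part of the union

    // UAX #44 defines C as Cc | Cf | Cs | Co | Cn, so a spec-conforming
    // unicode.C answers true here.
    writeln(unicode.C['\0']);
}
```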

Lastly, the table update has already brought much joy to MIR,
in the form of a broken test. A character that was being tested
wasn't allocated in 6.2 but was in 7, so the results were different.
If your test suite is not part of the Phobos runners, please be
aware that once you update you may experience failed tests. These
are unavoidable, since they follow from the external specification
the tables are based upon.
In even worse news, the table generator was not kept in working
condition over the last 10 years, so there is a chance that
something may have been missed.
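
To show the kind of test that flips, here is a sketch using U+20BD RUBLE SIGN, which was added in Unicode 7.0. This is my own example, not the character from the MIR test. The first check is stable across table versions; the second is printed rather than asserted because its result depends on the tables you build against.

```d
import std.stdio : writeln;
import std.uni : unicode;

void main()
{
    // '$' has been a currency symbol (Sc) in every Unicode version.
    assert(unicode.Sc['$']);

    // U+20BD was unassigned in 6.2 and is a currency symbol from 7.0 on,
    // so a test pinning either answer breaks when the tables change.
    writeln(unicode.Sc['\u20BD']);
}
```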


In all cases, please do contact me if you need assistance. I'm 
available on Discord, OFTC #d and of course N.G. or even email if 
you really need it (firstn...@lastname.co.nz).


--- Happy holidays to those that are currently enjoying them or 
about to!