Re: [R] Weird and changed as.roman() behavior
"Roman numerals" is actually a tricky subject, since there were
different versions at different times.
It is worth noting that the Unicode character set (and R does support
Unicode, does it not?) includes
the Roman numeral characters for 5,000, 10,000, 50,000 and 100,000, so
the idea that 3999 is an acceptable limit
doesn't make much sense any more.
It is also worth noting that Unicode includes single-character
versions of 1-12.
The characters are U+2160 to U+2188.
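For anyone who wants to look at these characters from R, here is a small
sketch. The variable names are only illustrative, and the printed glyphs
depend on running in a UTF-8 session with a font that has them.

```r
# Code points U+2160..U+216B are the single-character forms of 1 to 12.
one_to_twelve <- intToUtf8(0x2160:0x216B, multiple = TRUE)

# U+2181, U+2182, U+2187 and U+2188 stand for 5,000, 10,000, 50,000 and 100,000.
large <- intToUtf8(c(0x2181, 0x2182, 0x2187, 0x2188), multiple = TRUE)

print(one_to_twelve)
print(large)
```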
For what it's worth, the Romans could express fractions that were
multiples of 1/12.
Converting between numbers and their Roman forms is not something that
regular expressions are a good tool for.
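As a rough illustration of converting without regular expressions, here is
a minimal sketch. The name roman_to_int is made up for this example and is
not part of utils. It accepts additive forms such as IIII, and it makes no
attempt to reject oddly ordered input such as "IM".

```r
# Hypothetical helper (not from utils): convert one Roman numeral string
# to an integer by scanning its symbols, with no regular expressions.
roman_to_int <- function(s) {
    values <- c(I = 1L, V = 5L, X = 10L, L = 50L, C = 100L, D = 500L, M = 1000L)
    if (is.na(s) || !nzchar(s)) return(NA_integer_)
    v <- unname(values[strsplit(toupper(s), "", fixed = TRUE)[[1]]])
    if (anyNA(v)) return(NA_integer_)   # unknown symbol
    # A symbol smaller than its right neighbour is subtracted (IV, XC, CM, ...).
    nxt <- c(v[-1L], 0L)
    as.integer(sum(ifelse(v < nxt, -v, v)))
}

vapply(c("IV", "IIII", "IIIII", "MMMMCMXCIX"), roman_to_int, integer(1))
```

Both "IV" and "IIII" come out as 4 here, which is the lenient behaviour
discussed in this thread. A stricter variant would need an extra check on
the order and repetition of symbols.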
On Fri, 17 Jan 2025 at 00:05, Martin Maechler
wrote:
>
> > Stephanie Evert
> > on Wed, 15 Jan 2025 13:18:03 +0100 writes:
>
> > Well, the real issue then seems to be that .roman2numeric uses an
> invalid regular expression:
> >>> grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
> >> [1] TRUE TRUE TRUE TRUE TRUE
>
> > or
>
> >>> grepl("^I{,2}$", c("II", "III", ""))
> >> [1] TRUE TRUE FALSE
>
>
> > Both the TRE and the PCRE specification only allow repetition
> quantifiers of the form
>
> > {a}
> > {a,b}
> > {a,}
>
> > https://laurikari.net/tre/documentation/regex-syntax/
> > https://www.pcre.org/original/doc/html/pcrepattern.html#SEC17
>
> > {,2} and {,4} are thus invalid and seem to result in undefined
> behaviour (which PCRE and TRE fill in different ways, but consistently not
> what was intended).
>
> >> > grepl("^I{,2}$", c("II", "III", ""))
> >> [1] TRUE TRUE FALSE
>
> >> > grepl("^I{,2}$", c("II", "III", ""), perl=TRUE)
> >> [1] FALSE FALSE FALSE
>
> > Fix thus is easy: {,4} => {0,4}
>
> > Best,
> > Stephanie
>
> Thanks a lot, Stephanie -- indeed, I think I would not have searched in
> this direction at all
> ( To me it seemed "obvious" that if {3,} is well defined, {,3}
> would be so, too... But I was *wrong*, and actually I also
> understand that {,3} is not needed, and {0,3} is clearer,
> whereas {3,} is not easy to re-express ( '{0,inf}' or similar
> would make the code considerably more complicated and probably slower..) )
>
> Actually, to remain back compatible (see Jani's original report:
> he'd like "IIIII" to work, as it did for many/most of us),
> we should replace {,4} by {0,5}.
>
> But there's more: our current help page
> https://search.r-project.org/R/refmans/utils/html/roman.html
> says
>
> > Only numbers between 1 and 3999 have a unique representation
> > as roman numbers, and hence others result in as.roman(NA).
>
> which is really not quite true, in more than one sense:
>
> 1. as.roman(3899:3999) # works fine
>
> not producing any NA
>
> 2. I think, e.g., "MMMM"
> is a pretty unique representation of 4000.
>
> Also, one piece of other software (online)
> https://www.rapidtables.com/convert/number/date-to-roman-numerals.html
>
> does convert _dates_ up to the year 4999, see,
>
> https://www.rapidtables.com/convert/number/date-to-roman-numerals.html?msel=January&dsel=1&year=4999&fmtsel=MM.DD.
>
> giving MMMMCMXCIX for 4999.
>
> Hence, I also think we should enlarge the valid range from current
> {1 .. 3999} to
> {1 .. 4999}
>
> Martin
>
> __
> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weird and changed as.roman() behavior
On Wed, 15 Jan 2025 11:41:34 +0100
Martin Maechler wrote:
> > Jani Välimaa
> > on Tue, 14 Jan 2025 20:39:19 +0200 writes:
>
> > Hello,
> > I don't know what's changed or how to figure out why as.roman() started
> > to behave differently lately on Mageia Cauldron. Cauldron is the
> > latest development version of Mageia Linux.
>
> > Expected behavior:
> >> as.roman(strrep("I", 1:5))
> > [1] I    II   III  IV   V
>
> > Current behavior:
> >> as.roman(strrep("I", 1:5))
> > [1] I    II   III  IV   <NA>
> > Warning message:
> > In .roman2numeric(x) : invalid roman numeral: IIIII
>
> > as.roman() doesn't handle "IIIII" -> "V" anymore and thus 'make check'
> > fails when building any 4.3.x or 4.4.x versions from the sources.
>
> Not yet.
> For me, (on Linux Fedora 40),
> on current R-4.4.2, R-patched and R-devel I get the same good
> results from
>
> (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
>
> > (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
> [1] "I"     "II"    "III"   "IIII"  "IIIII"
> [1] I II III IV V
> structure(1:5, class = "roman")
> >
>
> The code behind this uses grep() and grepl()
> and I assume this somehow does not work correctly on your
> platform?
>
> Digging a bit further, the crucial part in this case happens in
> the (namespace hidden) function utils ::: .roman2numeric
> which you probably already know from the above warning.
> For me,
>
> (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc)); dput(r2)
>
> gives
>
> > (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc))
> [1] "I"     "II"    "III"   "IIII"  "IIIII"
> [1] 1 2 3 4 5
> >
>
> this must be different in your case.
>
> You can use
> debug(utils:::.roman2numeric)
> and
> utils:::.roman2numeric(cc)
>
> to find out where the problem happens.
> This will show almost surely that the problem is indeed in a
> grepl() call.
>
> I'm close to sure it is this:
>
> > grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
> [1] TRUE TRUE TRUE TRUE TRUE
>
> where you don't get the same, but probably
>
> [1] TRUE TRUE TRUE TRUE FALSE
>
> which I *do* get, too if I use grepl(., perl=TRUE)
> .. see also below.
>
>
> The code we use is our own tweaked version of 'TRE' (in /extra/tre/ ),
> and I do think we've occasionally seen platform dependencies.
>
> Also, yes, in 2022 there have been several changes, related to
> fixing bugs, though several ones *before* releasing R 4.3.0.
>
> Last, but not (at all!) least:
>
> Actually, I *am* confused a bit why this ever worked (and still
> works for most of us):
>
> I'm using {,2} instead of {,4} to make things faster to grasp;
> I see
>
> > grepl("^I{,2}$", c("II", "III", ""))
> [1] TRUE TRUE FALSE
> >
>
> and I wonder why 'I{,2}' matches 3 "I"s. ... I'd thought {,2} to
> mean "up to 2 occurrences of the preceding item"
> (where the item here is a single character).
>
> In our real example, I{,4} matched 5 "I"s
>
> and as I mentioned above, the somewhat more maintained
> perl=TRUE option does *not*.
>
> We could change the code to use I{,5} to make 5x"I", i.e. "IIIII"
> work for you .. but then that would also match
> "IIIIII" (6 x "I") for "everybody" else with our current TRE engine..
>
Thanks for your insights.
Mageia uses system TRE with R via --with-system-tre configure option.
TRE was updated some time ago to version 0.9.0, and looks like the
'issue' started at the same time.
And indeed as.roman() works as before after I rebuilt R with bundled
TRE 0.8.0 using --with-system-tre=no.
So, something changed in TRE 0.9.0 and grepl().
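For others who want to check their own build, a small sketch of the
comparison follows. The variable names are only illustrative, and the
result of the first pattern depends on which TRE the R build uses, so no
output is claimed for it.

```r
# Versions of the regex libraries this R build is linked against.
print(extSoftVersion()[c("TRE", "PCRE")])

cc <- strrep("I", 1:5)

# Pattern as discussed in this thread: lower bounds omitted, so the
# result depends on the TRE version in use.
loose <- grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)

# Same pattern with explicit lower bounds: accepts 1 to 4 "I"s, rejects 5.
strict <- grepl("^M{0,3}D?C{0,4}L?X{0,4}V?I{0,4}$", cc)

print(rbind(loose, strict))
```

If the two rows differ, the build handles the {,n} form in the lenient way
that as.roman() has been relying on.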
Re: [R] Weird and changed as.roman() behavior
> Martin Maechler
> on Thu, 16 Jan 2025 12:04:44 +0100 writes:

[..]

> But there's more: our current help page
> https://search.r-project.org/R/refmans/utils/html/roman.html
> says
>> Only numbers between 1 and 3999 have a unique representation
>> as roman numbers, and hence others result in as.roman(NA).
> which is really not quite true, in more than one sense:
> 1. as.roman(3899:3999) # works fine
> not producing any NA

The above (from "in more than one sense" on) must be somewhat confusing.
I thought I read '3899' (instead of '3999') on one version of the help page ...

I hope that everything else I wrote *does* make sense and is hopefully correct...

Martin
Re: [R] Weird and changed as.roman() behavior
> Stephanie Evert
> on Wed, 15 Jan 2025 13:18:03 +0100 writes:
> Well, the real issue then seems to be that .roman2numeric uses an invalid
regular expression:
>>> grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
>> [1] TRUE TRUE TRUE TRUE TRUE
> or
>>> grepl("^I{,2}$", c("II", "III", ""))
>> [1] TRUE TRUE FALSE
> Both the TRE and the PCRE specification only allow repetition quantifiers
of the form
> {a}
> {a,b}
> {a,}
> https://laurikari.net/tre/documentation/regex-syntax/
> https://www.pcre.org/original/doc/html/pcrepattern.html#SEC17
> {,2} and {,4} are thus invalid and seem to result in undefined behaviour
(which PCRE and TRE fill in different ways, but consistently not what was
intended).
>> > grepl("^I{,2}$", c("II", "III", ""))
>> [1] TRUE TRUE FALSE
>> > grepl("^I{,2}$", c("II", "III", ""), perl=TRUE)
>> [1] FALSE FALSE FALSE
> Fix thus is easy: {,4} => {0,4}
> Best,
> Stephanie
Thanks a lot, Stephanie -- indeed, I think I would not have searched in
this direction at all
( To me it seemed "obvious" that if {3,} is well defined, {,3}
would be so, too... But I was *wrong*, and actually I also
understand that {,3} is not needed, and {0,3} is clearer,
whereas {3,} is not easy to re-express ( '{0,inf}' or similar
would make the code considerably more complicated and probably slower..) )
Actually, to remain back compatible (see Jani's original report:
he'd like "IIIII" to work, as it did for many/most of us),
we should replace {,4} by {0,5}.
But there's more: our current help page
https://search.r-project.org/R/refmans/utils/html/roman.html
says
> Only numbers between 1 and 3999 have a unique representation
> as roman numbers, and hence others result in as.roman(NA).
which is really not quite true, in more than one sense:
1. as.roman(3899:3999) # works fine
not producing any NA
2. I think, e.g., "MMMM"
is a pretty unique representation of 4000.
Also, one piece of other software (online)
https://www.rapidtables.com/convert/number/date-to-roman-numerals.html
does convert _dates_ up to the year 4999, see,
https://www.rapidtables.com/convert/number/date-to-roman-numerals.html?msel=January&dsel=1&year=4999&fmtsel=MM.DD.
giving MMMMCMXCIX for 4999.
Hence, I also think we should enlarge the valid range from current
{1 .. 3999} to
{1 .. 4999}
Martin
Re: [R] Weird and changed as.roman() behavior
Well, the real issue then seems to be that .roman2numeric uses an invalid
regular expression:
>> grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
> [1] TRUE TRUE TRUE TRUE TRUE
or
>> grepl("^I{,2}$", c("II", "III", ""))
> [1] TRUE TRUE FALSE
Both the TRE and the PCRE specification only allow repetition quantifiers of
the form
{a}
{a,b}
{a,}
https://laurikari.net/tre/documentation/regex-syntax/
https://www.pcre.org/original/doc/html/pcrepattern.html#SEC17
{,2} and {,4} are thus invalid and seem to result in undefined behaviour (which
PCRE and TRE fill in different ways, but consistently not what was intended).
> > grepl("^I{,2}$", c("II", "III", ""))
> [1] TRUE TRUE FALSE
> > grepl("^I{,2}$", c("II", "III", ""), perl=TRUE)
> [1] FALSE FALSE FALSE
Fix thus is easy: {,4} => {0,4}
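A quick check of the explicit form, as a sketch (the variable names are
only for illustration): with the lower bound written out, the default TRE
engine and PCRE give the same answer.

```r
x <- c("II", "III", "")

# {0,2} states the lower bound explicitly, which both specifications allow.
tre  <- grepl("^I{0,2}$", x)                # default engine (TRE)
pcre <- grepl("^I{0,2}$", x, perl = TRUE)   # PCRE

print(tre)    # TRUE FALSE TRUE
print(pcre)   # TRUE FALSE TRUE
```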
Best,
Stephanie
Re: [R] Weird and changed as.roman() behavior
> Jani Välimaa
> on Tue, 14 Jan 2025 20:39:19 +0200 writes:
> Hello,
> I don't know what's changed or how to figure out why as.roman() started
> to behave differently lately on Mageia Cauldron. Cauldron is the
> latest development version of Mageia Linux.
> Expected behavior:
>> as.roman(strrep("I", 1:5))
> [1] I    II   III  IV   V
> Current behavior:
>> as.roman(strrep("I", 1:5))
> [1] I    II   III  IV   <NA>
> Warning message:
> In .roman2numeric(x) : invalid roman numeral: IIIII
> as.roman() doesn't handle "IIIII" -> "V" anymore and thus 'make check'
> fails when building any 4.3.x or 4.4.x versions from the sources.
Not yet.
For me, (on Linux Fedora 40),
on current R-4.4.2, R-patched and R-devel I get the same good
results from
(cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
> (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
[1] "I"     "II"    "III"   "IIII"  "IIIII"
[1] I II III IV V
structure(1:5, class = "roman")
>
The code behind this uses grep() and grepl()
and I assume this somehow does not work correctly on your
platform?
Digging a bit further, the crucial part in this case happens in
the (namespace hidden) function utils ::: .roman2numeric
which you probably already know from the above warning.
For me,
(cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc)); dput(r2)
gives
> (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc))
[1] "I"     "II"    "III"   "IIII"  "IIIII"
[1] 1 2 3 4 5
>
this must be different in your case.
You can use
debug(utils:::.roman2numeric)
and
utils:::.roman2numeric(cc)
to find out where the problem happens.
This will show almost surely that the problem is indeed in a
grepl() call.
I'm close to sure it is this:
> grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
[1] TRUE TRUE TRUE TRUE TRUE
where you don't get the same, but probably
[1] TRUE TRUE TRUE TRUE FALSE
which I *do* get, too if I use grepl(., perl=TRUE)
.. see also below.
The code we use is our own tweaked version of 'TRE' (in /extra/tre/ ),
and I do think we've occasionally seen platform dependencies.
Also, yes, in 2022 there have been several changes, related to
fixing bugs, though several ones *before* releasing R 4.3.0.
Last, but not (at all!) least:
Actually, I *am* confused a bit why this ever worked (and still
works for most of us):
I'm using {,2} instead of {,4} to make things faster to grasp;
I see
> grepl("^I{,2}$", c("II", "III", ""))
[1] TRUE TRUE FALSE
>
and I wonder why 'I{,2}' matches 3 "I"s. ... I'd thought {,2} to
mean "up to 2 occurrences of the preceding item"
(where the item here is a single character).
In our real example, I{,4} matched 5 "I"s
and as I mentioned above, the somewhat more maintained
perl=TRUE option does *not*.
We could change the code to use I{,5} to make 5x"I", i.e. "IIIII"
work for you .. but then that would also match
"IIIIII" (6 x "I") for "everybody" else with our current TRE engine..

