In perl.git, the branch smoke-me/khw-locale has been created
<https://perl5.git.perl.org/perl.git/commitdiff/d45070f2014949b45942625e8b3575aa53c9df11?hp=0000000000000000000000000000000000000000>
at d45070f2014949b45942625e8b3575aa53c9df11 (commit)
- Log -----------------------------------------------------------------
commit d45070f2014949b45942625e8b3575aa53c9df11
Author: Karl Williamson <[email protected]>
Date: Sun Jan 28 14:55:31 2018 -0700
Forbid 'pig' locale
This is a toy locale found on some systems, which isn't fully
implemented, and if one tries to switch to it can cause failures.
commit 9587bf6005b05a20833e13fee4ffa63bdae7fadf
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 16:20:02 2018 -0700
Avoid changing locale when finding radix char
On systems that have the POSIX 2008 operations, including
nl_langinfo_l(), this commit causes them to not have to actually change
the locale when determining what the decimal point character is.
The locale may have to change during the printing/reading of numbers,
but eventually we can use sprintf_l(), if available, to avoid that too.
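As a rough illustration of the POSIX 2008 approach (a sketch assuming a glibc-like system, not perl's actual code), newlocale() builds a private locale object and nl_langinfo_l() queries it, so setlocale() is never called and the process-wide locale is never switched. The helper name radix_for_locale() is hypothetical.

```c
/* Query a locale's decimal point without calling setlocale(). */
#define _XOPEN_SOURCE 700           /* expose POSIX 2008 locale_t APIs */
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: copy the radix of locale `name` into `buf`.
   Returns 0 on success, -1 if the locale is not available. */
static int radix_for_locale(const char *name, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    locale_t loc = newlocale(LC_NUMERIC_MASK, name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return -1;                  /* locale not installed */

    /* Reads only from `loc`; the global locale is left untouched. */
    snprintf(buf, len, "%s", nl_langinfo_l(RADIXCHAR, loc));

    freelocale(loc);                /* safe: the string was copied above */
    return 0;
}
```

Called with "C" this should yield "."; a locale such as de_DE, if installed, would yield ",".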
commit 7872ca9a0037c8e31de37134d4f783afcbd22c28
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 15:56:33 2018 -0700
Perl_sv_2pv_flags: Potentially avoid work
By using a macro that is private to the core, this code can avoid
thinking it has to deal with a non-dot radix character, as even if we
are using the locale radix, that is often a dot.
commit 91bec7e459e59db1ddb95eaf0951b3b0ff36a780
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 15:53:42 2018 -0700
numeric.c: Remove duplicate PERL_ARGS_ASSERT
By moving the call to one instance of this macro, the other can be
removed.
commit d881d920ed9cc5780b44c3e10b4886b30cd94981
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 15:52:50 2018 -0700
locale.c: White-space only
Outdent to compensate for previous patch removing several blocks
commit e8151d4d6eb9923ac5bd72cda0a9e78ad5b3f582
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 15:45:19 2018 -0700
Keep PL_numeric_radix_sv always set
Previously this was removed if the radix was a dot. By keeping it set to
a dot, we simplify some code, removing some branches.
commit 5567285f9b2fa2337e63115602a4605b10880e7d
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 15:32:45 2018 -0700
locale.c: Replace by function that does the same thing
This logic occurs often enough that a function has been created to do
it. So use that.
commit dc9d5116a2199d0693317c600b114902ce0b7a1e
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 15:20:44 2018 -0700
Latch LC_NUMERIC during critical sections
It is possible for operations on threaded perls which don't 'use locale'
to still change the locale. This happens when calling
POSIX::localeconv() and I18N::Langinfo(), and in earlier perls, it can
happen for other operations when perl has been initialized with the
environment causing the various locale categories to not have a uniform
locale.
This commit causes the areas where the locale for this category should
predictably be in one or the other state to be a critical section where
another thread can't interrupt and change it. This is a separate
mutex, so that only these particular operations will be held up.
commit 9778d043fae0c588a3dd06ed68628084a536c4ae
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 13:54:27 2018 -0700
locale.c: Do savepv() ASAP
When this code is called on a threaded perl, it's possible that another
thread could zap the setlocale return buffer, if it's not reentrant. I
suspect we would have seen this more often if that was the case, but
this commit improves things by doing the save immediately, reducing the
unsafe interval.
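A minimal sketch of the save-immediately pattern, using plain C in place of perl's savepv(); the helper name copy_locale_name() is hypothetical. As the commit says, this narrows the unsafe interval rather than closing it, since another thread can still call setlocale() between the query and the copy.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Copy the current locale name for `category` into `buf`.
   Returns 0 on success, -1 if the query fails or `buf` is too small. */
static int copy_locale_name(int category, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    const char *name = setlocale(category, NULL);   /* query, no change */
    if (name == NULL)
        return -1;

    /* Copy at once: a later setlocale() call may reuse the buffer
       that `name` points into. */
    size_t len = strlen(name);
    if (len >= size)
        return -1;
    memcpy(buf, name, len + 1);
    return 0;
}
```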
commit bd164d99f10e25d0b0f65cde5e2f9b48260a1578
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 13:47:17 2018 -0700
locale.c: #ifdef'd out code for making thread-safe on platforms not equipped for it
commit 656ba7e1cd94284e8cd8eeed705695b0a7c2dfad
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 12:55:13 2018 -0700
Add mutex for changing LC_NUMERIC
But don't use it yet.
Changing of LC_NUMERIC is done by the perl core, and is a potential race
condition on threaded perls. This adds a mutex that later commits will
use to create critical sections where the value of LC_NUMERIC matters.
commit 55020df7e10087be6ec1f2adaee73d7d74b7c669
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 13:32:32 2018 -0700
POSIX::localeconv(): Prefer localeconv_l()
This is a thread-safe version of localeconv(), so use it under threads.
commit 3abd09a00275489826f33ce2cbabb7f4ef962f83
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 12:40:40 2018 -0700
POSIX::localeconv() Use new fcn; avoid recalcs
This calls strlen() once, instead of passing 0 to the subsidiary
functions, which causes them to call it each time. It also uses the new
function is_utf8_non_invariant_string() instead of doing here what that
function does.
commit ce44c81090ef46125c956ca261cd5f4f8ea8c862
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 11:22:02 2018 -0700
XXX pod change so XS code doesn't call localeconv directly POSIX.xs: Add
mutex around localeconv()
If another thread calls localeconv(), it can destroy the returned
buffer. This adds a mutex around this call; the only other place in the
core that calls it already has this mutex, so they now are thread-safe.
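A sketch of the mutex pattern described above, assuming POSIX threads; the mutex and helper names are hypothetical. The important detail is that the caller copies what it needs out of the struct lconv before releasing the lock, since that struct is what another thread's localeconv() call would overwrite.

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical process-wide lock; every caller of localeconv() must take
   it so no thread overwrites the shared struct while another reads it. */
static pthread_mutex_t localeconv_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy the current decimal point string into `buf` under the lock. */
static void decimal_point_copy(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    pthread_mutex_lock(&localeconv_mutex);
    const struct lconv *lc = localeconv();
    snprintf(buf, size, "%s", lc->decimal_point);   /* copy while locked */
    pthread_mutex_unlock(&localeconv_mutex);
}
```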
commit d5aae2c284283ae20133c4be8a9bde526f5d69be
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 11:16:15 2018 -0700
POSIX.xs: White space only
Vertically align for readability
commit a8af0615b047e36b9503cc1959a455c56c11cb67
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 11:07:42 2018 -0700
locale.c: Move some mutex ops
A future commit will add a mutex, and create the convention that this
mutex, if used in combination with the new one, always be acquired only
after the new one is in effect, in order to prevent the possibility of deadlock.
Do it now, before the new one gets added.
This also adds some comments about the reason for this mutex.
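The deadlock-avoidance convention amounts to a fixed lock order. A sketch assuming POSIX threads, with hypothetical names numeric_mutex (standing for the new mutex) and locale_mutex (the existing one):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t numeric_mutex = PTHREAD_MUTEX_INITIALIZER; /* new */
static pthread_mutex_t locale_mutex  = PTHREAD_MUTEX_INITIALIZER; /* old */

/* Every path needing both locks takes them in the same order, so no two
   threads can each hold one while waiting for the other. */
static int lock_both(void)
{
    if (pthread_mutex_lock(&numeric_mutex) != 0)
        return -1;
    if (pthread_mutex_lock(&locale_mutex) != 0) {
        pthread_mutex_unlock(&numeric_mutex);
        return -1;
    }
    return 0;
}

/* Release in reverse order of acquisition. */
static void unlock_both(void)
{
    int rc = pthread_mutex_unlock(&locale_mutex);
    assert(rc == 0);
    rc = pthread_mutex_unlock(&numeric_mutex);
    assert(rc == 0);
    (void) rc;                      /* silence unused warning under NDEBUG */
}
```

If one thread took locale_mutex first while another took numeric_mutex first, each could end up waiting on the other forever; funneling both through one ordering rules that out.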
commit acb522a0118934ec211de5827aba2e226072b272
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 10:12:30 2018 -0700
locale.c: White-space only
Indent code to account for previous commits adding some blocks
commit a4e4dccd78fb260eab194dec8450ae2544711b51
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 08:27:04 2018 -0700
locale.c: Use macro instead of its expansion
In a future commit, this macro will become more complex.
commit 5be02fb547af6506df3f4b1a304561b7e580624b
Author: Karl Williamson <[email protected]>
Date: Tue Jan 16 22:09:27 2018 -0700
locale.c: Do common task in one place
In some cases this function may need to temporarily switch the
LC_NUMERIC locale. Instead of repeating the logic to determine if this is
needed, do it once.
commit 2b4c5f171d1804fd7b3ca25e6a437dc4ee87b0a1
Author: Karl Williamson <[email protected]>
Date: Tue Jan 16 18:47:16 2018 -0700
More debug
commit 5e22b386010087d9877c196229d2b2ec25fbd0e2
Author: Karl Williamson <[email protected]>
Date: Tue Jan 16 17:38:45 2018 -0700
POSIX.xs: Keep locale change to minimum span
Move the restore to as close to the save as possible so that the locale
is in an unstable state for as short a time as possible.
commit 4cd802fedc8670ada1ae5fe8586f15e340629217
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 13:24:46 2018 -0700
POSIX::strftime: Add better fallback about UTF-8
If the function returns a valid string that isn't completely UTF-8
invariant, the function assumes it is UTF-8 if we are in a UTF-8 locale.
This works, but in the unlikely event that the system has no LC_TIME, we
can't tell if it is in a UTF-8 locale. As a better fallback position,
this commit adds a check that the time string is entirely in a single
script, adding a measure of reassurance that our call that it is UTF-8
is correct.
This is unlikely to be used, but now that there is a function to call
that determines if this is a script run, it's easy to add, and unlikely
to actually get compiled.
commit 4a31b94114116875bbab7436b48f5de0d04169d9
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 13:18:50 2018 -0700
grok_numeric_radix(): Avoid recalculating
This function just determined that we are in the scope of 'use locale',
hence the underlying radix character should be used. This commit
changes to use the macro that directly does that; previously it used the
macro that redundantly checks again whether we are in that scope.
commit 9a6743210fd336ab10803b94cffdfa6d06fe8154
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 13:00:44 2018 -0700
sv_vcatpvfn_flags() Balance LC_NUMERIC changes/restores
Prior to this commit, the restore for LC_NUMERIC was getting called even
if there were no corresponding store. Change so they are balanced; a
future commit will require this.
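The balancing requirement can be sketched as recording whether the switch actually happened and restoring only in that case. The flag, counter, and stub functions below are hypothetical stand-ins for perl's LC_NUMERIC macros; the counter just makes an unbalanced restore detectable.

```c
#include <assert.h>
#include <stdbool.h>

static int toggles = 0;   /* switches outstanding; must return to 0 */

static void switch_to_underlying(void) { toggles++; }
static void restore_standard(void)     { assert(toggles > 0); toggles--; }

/* Format-like operation: switches LC_NUMERIC only when required, and
   restores only if it actually switched, keeping the pair balanced. */
static void format_number(bool need_underlying_radix)
{
    bool switched = false;
    if (need_underlying_radix) {
        switch_to_underlying();
        switched = true;
    }

    /* ... output the number here ... */

    if (switched)
        restore_standard();
}
```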
commit a16ff65d395ac289565add39a549064bd086296c
Author: Karl Williamson <[email protected]>
Date: Mon Jan 15 16:39:44 2018 -0700
perl.h: Remove some obsolete macros
These no longer make sense, and were for core internal use only.
commit 472872db3d49ec65fba79f1d2529e7e7c8e80822
Author: Karl Williamson <[email protected]>
Date: Mon Jan 15 15:56:43 2018 -0700
vutil.c: White-space only
Properly indent a block, and add spaces where C++11 deprecates not
having them
commit 4c7b9c1281893387df50c41f9822aa2de29619be
Author: Karl Williamson <[email protected]>
Date: Mon Jan 15 15:48:57 2018 -0700
Simplify some LC_NUMERIC macros
These macros are marked as subject to change and are not documented
externally. I don't know what I was thinking when I named some of them,
but whatever it was no longer makes sense to me. Simplify them, and
change them so there is only one restore macro to remember.
commit 6f5579db16ed0b2b1f8ef698b9272d48ff9ef1bb
Author: Karl Williamson <[email protected]>
Date: Sun Jan 14 22:21:31 2018 -0700
for debug Carlos
commit a75ed93e424fef1e022b13f463db63c7fb672b22
Author: Karl Williamson <[email protected]>
Date: Sun Jan 14 21:43:43 2018 -0700
toke.c: Remove unnecessary macro calls
These macros were meant to shift the LC_NUMERIC state into using a dot for the
radix character. When I wrote this code, I assumed that parsing should
be using just the dot. Since then, I have discovered that this wraps
other uses where the dot is not correct, so remove it.
commit 5814d72fe22a8d156321dad1a5d69dbece1b1152
Author: Karl Williamson <[email protected]>
Date: Sun Jan 14 21:37:16 2018 -0700
perl.h: Remove unused locale core macro
This undocumented macro is unused in the core, and all of these are
commented as being subject to change. Since it confuses things, just
remove it.
commit 0927d2852e8e68e94ece279e22d77bd45f72f5ab
Author: Karl Williamson <[email protected]>
Date: Wed Jan 10 22:35:12 2018 -0700
POSIX.xs: Prefer mbrtowc() over mbtowc()
mbrtowc is reentrant, so use it on threaded perls if available when
POSIX::mbtowc() is called.
commit 8adf9208afe78935a9bacd0aeeb16aac6923dd9d
Author: Karl Williamson <[email protected]>
Date: Wed Jan 10 22:28:34 2018 -0700
POSIX.xs: Prefer mbrlen() over mblen()
mbrlen is reentrant, so use it on threaded perls if available when
POSIX::mblen() is called.
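The difference between the two interfaces is where the conversion state lives: mblen() keeps hidden internal state that is not required to be thread-safe, while mbrlen() takes a caller-owned mbstate_t. A sketch counting characters that way (count_chars() is a hypothetical name):

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Count the characters in `s` under the current LC_CTYPE locale.
   The shift state lives in a local mbstate_t, so concurrent calls in
   other threads don't disturb it.  Returns -1 on invalid input. */
static long count_chars(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);        /* initial conversion state */

    size_t remaining = strlen(s);
    long count = 0;
    while (remaining > 0) {
        size_t n = mbrlen(s, remaining, &state);
        if (n == (size_t) -1 || n == (size_t) -2)
            return -1;                      /* invalid or truncated */
        if (n == 0)
            break;                          /* defensive: NUL reached */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```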
commit 205211c1fcb1262af42ebbaa6f8f60b35340f163
Author: Karl Williamson <[email protected]>
Date: Mon Jan 8 18:21:12 2018 -0700
locale.c: Revamp fallback detection of UTF-8 locales
This commit continues the process started in the previous few commits to
improve the detection of whether a locale is UTF-8 or not when the
platform doesn't have the more modern tools available.
What was done before was to examine various texts, like the days of the
week, in a locale, and see if they are legal UTF-8 or not. If any
contained non-ASCII bytes, and all were legal, it assumed that UTF-8 was
needed. If none did (as in American English), it looked at the locale's
name. This can yield both false negatives and false positives.
Basically, this commit adds the constraint that all the texts need to be
in the same script when interpreted as UTF-8, which effectively rules out
any false positives when the script isn't Latin. With Latin, it isn't so
clear cut, as the text can be intermixed with ASCII Latin letters and
UTF-8 variant sequences that could be some Latin locale, or UTF-8, and
they just coincidentally happen to be syntactically UTF-8. Because of
the structuredness of UTF-8, the odds of a coincidence go down with
increasing numbers of variants in a row. This also isn't likely to
happen with ISO 8859-1, as the bytes that could be legal continuations
in UTF-8 are almost entirely controls or punctuation. But in other
locales in the 8859 series, there are some legal continuations that
could be part of a month name, say.
As an example of the issues, in 8859-2, one could have \xC6 (C with
acute) followed by \xB1 (a with ogonek), which in UTF-8 would be
U+01B1: LATIN CAPITAL LETTER UPSILON. However, something like \xCD
(I with acute) followed by \xB3 (l with stroke) yields U+0373: GREEK
SMALL LETTER ARCHAIC SAMPI, and the script check added by this commit
would catch that. In non-Latin texts, the only permissible ASCII
characters would be punctuation, and you aren't going to have many of
those in the LC_TIME strings, and certainly not in a row. Instead those
will consist of at least several variant characters in a row, and the
odds of those coincidentally being syntactically valid UTF-8 and
semantically in the same script are exceedingly low.
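The byte arithmetic in these examples can be checked mechanically: a two-byte UTF-8 sequence 110xxxxx 10yyyyyy carries five payload bits in the lead byte and six in the continuation byte. A small sketch (decode_utf8_2() is a hypothetical name):

```c
#include <assert.h>

/* Decode a two-byte UTF-8 sequence into its code point, or return -1
   if the bytes don't have that shape or form an overlong encoding. */
static long decode_utf8_2(unsigned char lead, unsigned char cont)
{
    if ((lead & 0xE0) != 0xC0 || (cont & 0xC0) != 0x80)
        return -1;
    long cp = ((long) (lead & 0x1F) << 6) | (cont & 0x3F);
    assert(cp <= 0x7FF);              /* 11 payload bits at most */
    return cp < 0x80 ? -1 : cp;       /* C0/C1 leads are overlong */
}
```

decode_utf8_2(0xC6, 0xB1) gives 0x1B1 and decode_utf8_2(0xCD, 0xB3) gives 0x373, matching the two code points named above.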
To catch Latin UTF-8 locales, this commit adds a list of the distinct
variants found so far. If there are even just several of these, the
odds of the syntax being coincidentally UTF-8 greatly diminish. The
number needed for this to conclude that the locale is UTF-8 is easily
tweakable at compile time.
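A sketch of the distinct-variants idea, under two assumptions: the input has already been validated as UTF-8, and the threshold of 3 is invented here, standing in for whatever compile-time value the real code uses. enough_distinct_variants() and DISTINCT_VARIANTS_NEEDED are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Compile-time tunable; the default value is an assumption. */
#ifndef DISTINCT_VARIANTS_NEEDED
#define DISTINCT_VARIANTS_NEEDED 3
#endif

/* True if the already-validated UTF-8 text `s` contains at least
   DISTINCT_VARIANTS_NEEDED distinct non-ASCII code points. */
static bool enough_distinct_variants(const unsigned char *s, size_t len)
{
    unsigned long seen[DISTINCT_VARIANTS_NEEDED];
    size_t nseen = 0;

    for (size_t i = 0; i < len; ) {
        unsigned char c = s[i];
        size_t n = c < 0x80 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
        assert(i + n <= len);            /* guaranteed by validation */
        if (n > 1) {
            /* Lead byte keeps 5, 4 or 3 payload bits for n = 2, 3, 4. */
            unsigned long cp = c & (0x7F >> n);
            for (size_t k = 1; k < n; k++)
                cp = (cp << 6) | (s[i + k] & 0x3F);

            bool dup = false;
            for (size_t k = 0; k < nseen; k++)
                if (seen[k] == cp) { dup = true; break; }
            if (!dup) {
                seen[nseen++] = cp;
                if (nseen == DISTINCT_VARIANTS_NEEDED)
                    return true;
            }
        }
        i += n;
    }
    return false;
}
```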
The problem remains for English and other Latin-script languages that
have rare accented characters. For these, the locale name is still
examined to see if it contains "UTF-8". Note that previous commits have
guaranteed that if the locale has a non-ASCII currency symbol that is
recognized by Unicode, such as the Euro or Pound Sterling, the locale
will correctly be recognized as UTF-8.
commit 5e9eafd988f94a5889606690fa4e1f6af3873842
Author: Karl Williamson <[email protected]>
Date: Sun Jan 7 16:22:27 2018 -0700
locale.c: Improved fallback UTF-8 locale detection
This adds some more checks for when the platform lacks mbtowc(). We can
check whether functions like isprint() and toupper() behave as they would
in a UTF-8 locale. If not, we can rule out UTF-8.
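A hypothetical sketch of one such check, not perl's code: in a UTF-8 locale no single byte at or above 0x80 is a complete character, so isprint() and isalpha() would be expected to reject all of them, and toupper()/tolower() to leave them unchanged. Any byte that violates this rules UTF-8 out; passing the check proves nothing.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/* Returns false if the current LC_CTYPE locale definitely isn't UTF-8,
   judging by how it classifies and case-maps the high bytes; true only
   means "not ruled out". */
static bool ctype_could_be_utf8(void)
{
    assert(CHAR_BIT == 8);          /* the loop below assumes 8-bit bytes */

    for (int c = 0x80; c <= UCHAR_MAX; c++) {
        /* A lone byte >= 0x80 is never a whole character in UTF-8 ... */
        if (isprint(c) || isalpha(c))
            return false;
        /* ... so it shouldn't have a case mapping either. */
        if (toupper(c) != c || tolower(c) != c)
            return false;
    }
    return true;
}
```

In the default "C" locale, which classifies no high bytes, this returns true, which is why such a check can only rule UTF-8 out and never confirm it.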
commit ba4a232847bb50b1a391e8ca432249ba59f39bb9
Author: Karl Williamson <[email protected]>
Date: Sat Jan 6 16:00:02 2018 -0700
Improve fallback UTF-8 locale detection
If the libc doesn't have modern enough routines, we use a fallback
mechanism to see if a locale is UTF-8 or not. One component of this is
to look at the byte sequence for the currency symbol. Obviously, if the
sequence isn't valid UTF-8, the locale isn't either. But if it is valid
UTF-8, and hence might be a UTF-8 locale, this commit changes the
detection mechanism to see if the sequence, when interpreted as UTF-8,
evaluates to a known Unicode currency symbol. If so, the locale must
be UTF-8, as the odds of some other locale having a sequence that does
this are vanishingly small.
If the sequence doesn't evaluate to a currency symbol, that doesn't tell
us anything, as plenty of places use a string of letters as their
currency symbol. Nor does a '$' tell us anything, as it is encoded the
same whether or not the locale is UTF-8.
This pretty much guarantees that a UTF-8 locale for the European Union
or the UK that otherwise looks like plain English (Latin script) will be
properly determined to be UTF-8, as the symbols for their currencies
will pass this test.
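A sketch of the currency test, assuming the symbol has already been fetched from the locale. The table holds only a handful of code points from Unicode's Currency_Symbol (Sc) category as an illustration; a real check would consult the full property. currency_implies_utf8() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of Unicode currency symbols. */
static const unsigned long known_currency[] = {
    0x00A2, /* CENT SIGN */   0x00A3, /* POUND SIGN */
    0x00A5, /* YEN SIGN */    0x20A9, /* WON SIGN */
    0x20AC, /* EURO SIGN */   0x20B9, /* INDIAN RUPEE SIGN */
};

/* True if `sym` is one well-formed 2- or 3-byte UTF-8 sequence encoding a
   known currency symbol.  False says nothing either way ("$", "Kr"). */
static bool currency_implies_utf8(const char *sym)
{
    assert(sym != NULL);
    const unsigned char *s = (const unsigned char *) sym;
    size_t len = strlen(sym);
    unsigned long cp;

    if (len == 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80)
        cp = ((unsigned long) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    else if (len == 3 && (s[0] & 0xF0) == 0xE0
             && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80)
        cp = ((unsigned long) (s[0] & 0x0F) << 12)
             | ((unsigned long) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    else
        return false;       /* ASCII, invalid, or a form skipped here */

    /* Overlong encodings are not valid UTF-8. */
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800))
        return false;

    for (size_t i = 0; i < sizeof known_currency / sizeof known_currency[0]; i++)
        if (known_currency[i] == cp)
            return true;
    return false;
}
```

In an ISO 8859-1 locale the pound sign is the single byte 0xA3, which this rejects, while a UTF-8 locale's "\xC2\xA3" is accepted.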
commit 29e11f0e0248413798e63d555c362298b0f44419
Author: Karl Williamson <[email protected]>
Date: Sat Jan 6 14:24:30 2018 -0700
locale.c: Avoid localeconv()
my_langinfo() is a recently added function that presents a better API
than localeconv(), returns the needed information here, and is easier
to make thread-safe.
commit aeaae9e3bc54e35480dcc5e852f796781066e64f
Author: Karl Williamson <[email protected]>
Date: Mon Jan 8 17:37:15 2018 -0700
locale.c: White-space only
This indents all this code, with no other changes, in preparation for a
future commit which will add a block around it.
commit 440e4d1aaafd74a6cdf1d8878fe825d248ac2237
Author: Karl Williamson <[email protected]>
Date: Sun Jan 7 15:58:52 2018 -0700
locale.c: Remove branch to label
Code branched to this label because it contained common cleanup code.
But now that code is in a function, the cleanup call is trivial, so just
make it directly and skip this intermediate label.
commit 0b31d2318f4d39d09cb03656a2ae23a0dbb3b668
Author: Karl Williamson <[email protected]>
Date: Sat Jan 6 12:42:35 2018 -0700
locale.c: Extract duplicated code into subroutines
These two paradigms are each repeated in 4 places. Make them into two
subroutines.
commit cff9e8ed68d462106528f99e33df0ee2b4ed0bae
Author: Karl Williamson <[email protected]>
Date: Fri Jan 5 21:41:27 2018 -0700
locale.c: Prefer mbrtowc(), as it's reentrant
If it's available and this is a threaded build, it's preferred.
commit dd856d229a7307b116a9d09e13c71ec6177eae1f
Author: Karl Williamson <[email protected]>
Date: Sun Jan 7 15:43:01 2018 -0700
locale.c: White-space only
Indent to correspond with new block from previous commit
commit 543161a21069818a20e6a406dd042c7091232e63
Author: Karl Williamson <[email protected]>
Date: Fri Jan 5 14:09:40 2018 -0700
locale.c: Revamp finding if locale is UTF-8
This changes how this functionality works for the LC_CTYPE locale. On
systems that have nl_langinfo() one can get a definitive answer from
just that. Otherwise (or if that doesn't return properly) one can use
mbtowc() to check if the UTF-8 byte sequence for the Unicode REPLACEMENT
CHARACTER actually is considered to be that code point. This is also
definitive. If the maximum byte string length for a character is too
short to handle all Unicode UTF-8, we know without further checking that
this isn't a UTF-8 locale, so can avoid the mbtowc check.
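A sketch of the fallback half of this logic (the nl_langinfo() path is omitted), assuming wchar_t values are Unicode code points, which holds on glibc but is an assumption elsewhere. The function name is hypothetical, and it uses mbrtowc(), which later commits in this branch prefer, rather than mbtowc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Decide whether the current LC_CTYPE locale is UTF-8 without
   nl_langinfo(): a UTF-8 locale must allow 4-byte characters, and must
   decode EF BF BD to U+FFFD REPLACEMENT CHARACTER. */
static bool ctype_is_utf8(void)
{
    if (MB_CUR_MAX < 4)
        return false;               /* too short for all of Unicode */

    static const char replacement[] = "\xEF\xBF\xBD";
    mbstate_t state;
    memset(&state, 0, sizeof state);

    wchar_t wc = 0;
    size_t n = mbrtowc(&wc, replacement, sizeof replacement - 1, &state);
    assert(n != 0);                 /* the input contains no NUL */
    return n == 3 && wc == 0xFFFD;
}
```

In the default "C" locale MB_CUR_MAX is 1, so the length test alone answers without any decoding.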
commit e9ae0e938155ccc7a4316c27cbfcbaeaddcb268c
Author: Karl Williamson <[email protected]>
Date: Sun Jan 7 15:30:06 2018 -0700
locale.c: Windows will never be EBCDIC
This adjusts the conditional compilation so that win32 is a subset of
non-EBCDIC. This will be useful in the next commit.
commit 4fdc10a540b0c521a978be601d8de61ba43f0680
Author: Karl Williamson <[email protected]>
Date: Fri Jan 5 12:57:37 2018 -0700
locale.c: Simplify expression
Since this is operating on C strings, we don't have to check the
lengths, but can rely on the underlying functions to work.
commit ea47e3c85b7e8111a5a3b30102fef7ef6dd3c606
Author: Karl Williamson <[email protected]>
Date: Fri Jan 5 11:35:00 2018 -0700
Change some "shouldn't happen" failures into panics
If the system is so broken that these libc calls are failing, soldiering
on won't lead to sane results.
This rewords some existing panics, and adds the errno to the output for
all of them.
commit 1a0eb24ab398a6ecfbba8773bfcb85f0773554f6
Author: Karl Williamson <[email protected]>
Date: Tue Jan 2 16:54:28 2018 -0700
Cache locale UTF8-ness lookups
Some locales are UTF-8, some are not. Knowledge of this is needed in
various circumstances. This commit saves the results of the last
several lookups so they don't have to be recalculated each time.
The full generality of POSIX locales is such that you can have error
messages be displayed in one locale, say Spanish, while other things are
in French. To accommodate this generality, the program can loop through
all the locale categories finding the UTF8ness of the locale it points
to. However, in almost all instances, people are going to be in either
French or in Spanish, and not in some combination. Suppose it is a
French UTF-8 locale for all categories. This new cache will know that
the French locale is UTF-8, and the queries for all but the first
category can return that immediately.
This simple cache avoids the overhead of hashes.
This also fixes a bug I realized exists in threaded perls, but haven't
reproduced. We do not support locales in such perls, and the user must
not change the locale or 'use locale'. But perl itself could change the
locale behind the scenes, leading to segfaults or incorrect results.
One such instance is the determination of UTF8ness. But this only could
happen if the full generality of locales is used so that the categories
are not all in the same locale. This could only happen (if the user
doesn't change locales) if the environment is such that the perl program
is started up so that the categories are in such a state. This commit
fixes this potential bug by caching the UTF8ness of each category at
startup, before any threads are instantiated, and so checking for it
later just looks it up in the cache, without perl changing the locale.
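A sketch of a small most-recent-lookups cache of the kind described, with a linear scan instead of a hash. The slot count, the name-length limit, and compute_is_utf8() (which here just inspects the name, unlike the real check) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CACHE_SLOTS  4              /* "last several lookups"; size assumed */
#define NAME_MAX_LEN 64

struct utf8ness_entry {
    char name[NAME_MAX_LEN];
    bool is_utf8;
};

static struct utf8ness_entry cache[CACHE_SLOTS];
static size_t cache_used = 0, cache_next = 0;
static int slow_lookups = 0;        /* counts uncached computations */

/* Stand-in for the expensive real test. */
static bool compute_is_utf8(const char *name)
{
    slow_lookups++;
    return strstr(name, "UTF-8") != NULL || strstr(name, "utf8") != NULL;
}

/* Scan the few cached entries first; compute and remember on a miss. */
static bool cached_is_utf8(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < cache_used; i++)
        if (strcmp(cache[i].name, name) == 0)
            return cache[i].is_utf8;

    bool result = compute_is_utf8(name);
    if (strlen(name) < NAME_MAX_LEN) {          /* only cache names that fit */
        struct utf8ness_entry *e = &cache[cache_next];
        strcpy(e->name, name);
        e->is_utf8 = result;
        cache_next = (cache_next + 1) % CACHE_SLOTS;   /* overwrite oldest */
        if (cache_used < CACHE_SLOTS)
            cache_used++;
    }
    return result;
}
```

With every category in the same French UTF-8 locale, only the first query pays for compute_is_utf8(); the rest hit the first slot.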
commit 4544ae8577f6522921a90f054075ff43813831b7
Author: Karl Williamson <[email protected]>
Date: Tue Jan 2 14:23:24 2018 -0700
locale.c: Avoid duplicate work
As the comments say, the needed value is already readily available.
commit 254b5ccf17b0c9288d66beec17fd1ecce41d4470
Author: Karl Williamson <[email protected]>
Date: Tue Jan 2 13:38:16 2018 -0700
locale.c: Avoid some work
We've already worked out whether the decimal point is a dot or not. We
can pass that information to the called routine so it doesn't have to
figure it out again.
commit b0d043baf597c6919911dbc505bfdc4d57b0baaf
Author: Karl Williamson <[email protected]>
Date: Tue Jan 2 13:19:03 2018 -0700
locale.c: Use non-control for a format dummy
We need a plain character here. I used a '\e' before, but it would be
better to have something that isn't a control, so just change it to a
blank.
commit 5ae6f895eb8134398d6e7fda2384d37df41eef1f
Author: Karl Williamson <[email protected]>
Date: Thu Jan 25 11:28:54 2018 -0700
locale.c: Create a block around some code; indent
Under some configurations depending on platform and Configure options,
these declarations are not at the beginning of a block, violating C
language rules.
commit 3e7450ad5e3aab37ed47c8bc8492e55a2515f575
Author: Karl Williamson <[email protected]>
Date: Tue Jan 2 12:25:35 2018 -0700
locale.c: Avoid some more locale changes
In a few places here we can test if we are already in the locale we want
to be in, and not switch unnecessarily if so.
commit 02d0432e5ef8a6a90eb57555ca48ebed7204494a
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 23:03:34 2018 -0700
Avoid some unnecessary changing of locales
The LC_NUMERIC locale category is kept so that generally the decimal
point (radix) is a dot. For some (mostly) output purposes, it needs to
be swapped into the program's current underlying locale so that a
non-dot can be printed.
This commit changes things so that if the current underlying locale uses
a dot as its decimal point, the swap doesn't happen, as it's not needed.
commit 43691a857f78376948ecfa91a47d13dfafd172a4
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 22:20:25 2018 -0700
perl.h: White-space only
commit a50e930acd7378866aa6d1f09a7f51611f20e82b
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 20:41:21 2018 -0700
locale.c: Add compile check for unimplemented behavior
Instead of silently not working.
commit f2faa380a61674c20202b577f02b45c4ab45319b
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 20:30:39 2018 -0700
locale.c: White-space only
Indent because the previous commit created an enclosing block, and
add a blank line elsewhere
commit 065109cdc7750b2ae0eb175e09b16043696ce057
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 20:00:03 2018 -0700
locale.c: Refactor Ultrix code
Examination shows that this code does nothing unless LC_ALL is defined.
So explicitly test at compile time for that.
Also, two variables don't have to be declared so globally; by reducing
their scope with a new block, we no longer need PERL_UNUSED_ARG()s for
them.
commit 9b2b7e34ef9ed1137c78ddef5e79db9d5dccde67
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 19:07:19 2018 -0700
locale.c: Avoid rescanning a string
We can use a parameter to find out where in the string the portion of
interest starts. Do that to avoid starting again from scratch.
commit e9a74e3f305fefc6d71e241c546a54059a524eca
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 18:33:59 2018 -0700
locale.c: Use fcns instead of macros
Here the macros being used expand into the functions being called,
without adding any value to using the macros, and making things slightly
less clear.
commit 53320fc362794bebe950ff78570abe906e2172ca
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 18:17:41 2018 -0700
locale.c: Add const to several variables
commit e5212d6dd62cca412f7a9e35b53eb72d724ec235
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 18:15:27 2018 -0700
locale.c: Improve, add comments
commit f517154f2da22278aad76877b3390d1cfeefe7a2
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 18:01:45 2018 -0700
perl.h: Add comment, rephrase another
commit 625e80aa9f5a6044ce459efc743c03fbb34c0b79
Author: Karl Williamson <[email protected]>
Date: Sat Nov 18 17:34:25 2017 -0700
Perl_langinfo: Teach about YESSTR and NOSTR
These are items that nl_langinfo() used to be required to return, but
are considered obsolete. Nonetheless, this drop-in replacement for that
function should know about them for backward compatibility.
commit 052ff9bcc8e064cec980b3069a7c8ce948bfd51d
Author: Karl Williamson <[email protected]>
Date: Mon Jan 1 15:07:45 2018 -0700
APItest/t/locale.t: Add some tests
This makes sure that the entries for which the expected return value may
legitimately vary from platform to platform get tested as returning
something, skipping the test if the item isn't known on the platform.
A couple of comments are also added.
commit e255b40569426f2c2445002f68124696e0b522a6
Author: Karl Williamson <[email protected]>
Date: Mon Aug 28 18:01:43 2017 -0600
XXX may include other things after final edits:
ExtUtils::ParseXS/lib/perlxs.pod: Nits
This removes extra blanks following colons that aren't being used in
the traditional sense that calls for two spaces after them, and
capitalizes Perl.
commit 9855e0c932283ee638ffee4c32d4d67fa2ac4aca
Author: Karl Williamson <[email protected]>
Date: Wed Jul 26 08:59:33 2017 -0600
Teach perl about more locale categories
glibc has various categories beyond the ones perl handles, for
example LC_PAPER. This commit adds knowledge of these to perl, so that
one can set them, interrogate them, and have libraries work on them,
even though perl itself does not.
This is in preparation for future commits, where it becomes more
important than currently for perl to know about all the locale
categories on the system.
I looked through various other systems to try to find other categories,
but did not see any. If a system does have such a category, it is
pretty easy to tell perl about it, and recompile. Use the changes in
this commit as a template, and send an email to [email protected], so
that the next Perl release will have it.
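A sketch of what telling a C program about extra categories can look like: a table whose non-standard entries are guarded by #ifdef, so each build picks up whatever its <locale.h> defines. The table and lookup function are hypothetical and are not perl's actual data structures; LC_PAPER, LC_NAME, LC_ADDRESS, LC_TELEPHONE, LC_MEASUREMENT, and LC_IDENTIFICATION are the glibc extensions.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Adding a platform's extra category means adding one guarded entry. */
static const struct { const char *name; int value; } categories[] = {
    { "LC_CTYPE",    LC_CTYPE    },
    { "LC_NUMERIC",  LC_NUMERIC  },
    { "LC_COLLATE",  LC_COLLATE  },
    { "LC_TIME",     LC_TIME     },
    { "LC_MONETARY", LC_MONETARY },
#ifdef LC_MESSAGES
    { "LC_MESSAGES", LC_MESSAGES },
#endif
#ifdef LC_PAPER                     /* glibc extensions from here down */
    { "LC_PAPER",    LC_PAPER    },
#endif
#ifdef LC_NAME
    { "LC_NAME",     LC_NAME     },
#endif
#ifdef LC_ADDRESS
    { "LC_ADDRESS",  LC_ADDRESS  },
#endif
#ifdef LC_TELEPHONE
    { "LC_TELEPHONE", LC_TELEPHONE },
#endif
#ifdef LC_MEASUREMENT
    { "LC_MEASUREMENT", LC_MEASUREMENT },
#endif
#ifdef LC_IDENTIFICATION
    { "LC_IDENTIFICATION", LC_IDENTIFICATION },
#endif
};

/* Look up a category by name; returns -1 if this build doesn't know it. */
static int category_by_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof categories / sizeof categories[0]; i++)
        if (strcmp(categories[i].name, name) == 0)
            return categories[i].value;
    return -1;
}
```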
commit 9e98dccf020efc7705ba1a4edbd8665708dd4114
Author: Karl Williamson <[email protected]>
Date: Wed Jan 3 20:41:29 2018 -0700
Add check that "$!" is correctly interpreted as UTF-8
We sometimes need to know if an error message is UTF-8 or not.
Previously we checked that it is syntactically valid UTF-8, and that the
LC_MESSAGES locale is UTF-8. But some systems, notably Windows, do not
have LC_MESSAGES. For those, this commit adds a different, semantic,
check that the text of the message when interpreted as UTF-8 is all in
the same Unicode script. This is not foolproof, unlike the LC_MESSAGES
check, but it's better than what we have now for such systems. It
likely is foolproof for non-Latin locales, as any message will have a
bunch of characters in that locale, and no ASCII Latin ones. For a
Latin locale, these ASCII letters could be intermixed with the UTF-8
ones, causing potential ambiguity.
commit db3406e2c5de5f1604faeeb9e8930a40e187c04c
Author: Karl Williamson <[email protected]>
Date: Tue Nov 14 22:27:06 2017 -0700
Remove uncompilable code
This code was never compiled because of a misspelling in the #ifdef.
No problem surfaced, so just remove it. The next commit adds a different
check.
commit 60a0db688604a887534ff342ac50490fdfece303
Author: Karl Williamson <[email protected]>
Date: Mon Jan 8 19:11:52 2018 -0700
XXX rethink empty script_run
commit 66ae9a9042eab08a3a2b9d7cc4637721a807229a
Author: Karl Williamson <[email protected]>
Date: Mon Jan 8 19:08:54 2018 -0700
perl.c: Move initialization of inversion lists
This is now done very early in the file, as it may be needed for
initializing the locale handling.
commit 5a4111c0dbdbb967666bbbcbd2953aa227b5a866
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 14:09:24 2018 -0700
isSCRIPT_RUN: Document in perlintern
commit affc869a0db9555708cab1e76a9a6d19be034cb3
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 14:08:47 2018 -0700
isSCRIPT_RUN: A sequence of entirely Inherited chars is Inherited
commit a2c0c14d391c1963b811805ef5d315779df3fd85
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 14:07:43 2018 -0700
regexec.c: Add comment
commit c8f92a9990db2589edddefd3d0bb98b4471f908b
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 14:05:23 2018 -0700
Fix bug in isSCRIPT_RUN with digit following unassigned
This was being treated as a run, but shouldn't be one.
commit fbc2550e1b6a6147c4f1463926e75301e48009f6
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 13:00:06 2018 -0700
isSCRIPT_RUN: Can short cut if not in UTF-8
All characters representable by single bytes are either Common or Latin,
so the string must be a script run. If we aren't asking what the script
is, we can return immediately. If we are, the run is Latin if any
character in it is Latin, and otherwise is Common.
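A sketch of that short cut, assuming the single-byte string is interpreted as Latin-1 (as perl does for non-UTF-8 strings on ASCII platforms). Within U+0000..U+00FF the only Latin-script characters are the ASCII letters, U+00AA, U+00BA, and U+00C0..U+00FF other than U+00D7 and U+00F7; everything else there is Common. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum script { SCRIPT_COMMON, SCRIPT_LATIN };   /* reduced, hypothetical */

/* True for the Latin-1 code points whose Unicode Script is Latin. */
static int is_latin1_latin(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || c == 0xAA || c == 0xBA                /* ordinal indicators */
        || (c >= 0xC0 && c != 0xD7 && c != 0xF7);
}

/* A single-byte string is always a script run; its script is Latin if
   any character is Latin, otherwise Common. */
static enum script latin1_run_script(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (is_latin1_latin(s[i]))
            return SCRIPT_LATIN;
    return SCRIPT_COMMON;
}
```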
commit 1c907d2f0e048f8be2da67984eae49ad56d40e2b
Author: Karl Williamson <[email protected]>
Date: Sat Jan 6 21:16:15 2018 -0700
Give isSCRIPT_RUN() an extra parameter
This allows it to return the script of the run.
commit a3b7ab01447ce12c49ff1bf794046a9eb4f2ca29
Author: Karl Williamson <[email protected]>
Date: Sat Jan 6 16:15:12 2018 -0700
charclasslists.h: script enums visible to CORE,EXT
This exposes the enum definitions for the script extensions property to
the perl core and extensions, for use in future commits.
commit bcacb14d1bcb3ef778ecbeda63ca7c78716cb3aa
Author: Karl Williamson <[email protected]>
Date: Sat Jan 6 16:13:06 2018 -0700
regen/mk_invlists.pl: Allow override of where enums get defined
This adds code so that the enums defined by this program, which are
ordinarily only used by regexec.c, can be specified to be defined
somewhere else instead.
commit 79f463ffe124f9c84120b1f426e5cb3e41959151
Author: Karl Williamson <[email protected]>
Date: Sat Jan 6 16:09:57 2018 -0700
regen/mk_invlists.pl: Allow multiple files to access
This changes the code so that the symbols defined by this program
can be #define'd in more than one file.
commit 48f37bd5e94bce709d8c85b658e7a9ecf125203e
Author: Karl Williamson <[email protected]>
Date: Thu Jan 18 14:02:33 2018 -0700
regexec.c: Fix typo in comment
commit 1970d025fc9e699dc1188f26a13c417870510ea9
Author: Karl Williamson <[email protected]>
Date: Sat Jan 6 16:18:45 2018 -0700
Fix bug in script runs that start with Common
This is a follow-on to 8535a06fea02528fe726855a139fcbd360d1fc6e. That
fixed one case where, when the first character was in the Common script,
things did not work properly. It did not catch the case where a later
character in the string was non-Common, from a script that has its own
set of digits, and this commit fixes that.
This just entails moving a block of code slightly earlier.
commit a4826d953551331024cc11802a885b90d73d988b
Author: Karl Williamson <[email protected]>
Date: Wed Jan 10 17:10:09 2018 -0700
locale.c: Make sure variable is always defined
A future commit assumes this variable is there even on non-DEBUGGING
builds. #define it to 0 for those.
commit 67a490457e3c1485bd20a99e061ee745cf8f0276
Author: Karl Williamson <[email protected]>
Date: Wed Jan 17 17:01:00 2018 -0700
my_atof(): Lock dot radix
This commit shows some redundant checks. It examines the text and if it
finds a dot in the middle of the number, and the locale is expecting
something else, it toggles LC_NUMERIC to be the C locale so that the dot
is understood. However, during further parsing, grok_numeric_radix()
gets called and sees that the locale shouldn't be C, and toggles it
back. That ordinarily would cause the dot to not be recognized, but
this function always recognizes a dot no matter what the locale. So
none of our tests fail. I'm not sure if this is always the case, and I
don't understand this area of the code all that well, but there is a
simple way to cause grok_numeric_radix to not change the locale back,
and that is to call the macro LOCK_LC_NUMERIC_STANDARD() when changing
it the first time in my_atof(). The purpose of this macro is precisely
this situation, so that recursed calls don't try to override the
decisions of the outer calls.
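The latch idea, reduced to a sketch: while an outer caller holds the lock, inner toggles become no-ops, so a nested call cannot undo the outer decision. All names here are hypothetical stand-ins for LOCK_LC_NUMERIC_STANDARD() and the toggle macros, and a boolean flag stands in for the real locale state instead of setlocale() calls.

```c
#include <assert.h>
#include <stdbool.h>

static bool numeric_is_standard = true;   /* true: radix is a dot ("C") */
static int  numeric_lock_level  = 0;      /* > 0: state is latched */

/* Switch to the standard (dot-radix) locale and latch it. */
static void lock_numeric_standard(void)
{
    numeric_is_standard = true;
    numeric_lock_level++;
}

static void unlock_numeric(void)
{
    assert(numeric_lock_level > 0);
    numeric_lock_level--;
}

/* What a nested routine (such as grok_numeric_radix()) would call; it
   has no effect while an outer caller holds the latch. */
static void toggle_numeric_underlying(void)
{
    if (numeric_lock_level == 0)
        numeric_is_standard = false;
}
```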
commit d29bf001c37ea6a5111a0d169309cca34a48fd57
Author: Karl Williamson <[email protected]>
Date: Wed Jan 24 15:57:30 2018 -0700
hints/hpux.sh: HP-UX mbrlen() and mbrtowc() don't work
In spite of there being man pages for these, the #include file doesn't
define the mbstate_t type which is required for a parameter to these
functions.
Perhaps the Configure probe could be enhanced so it doesn't return
defined unless these can be successfully compiled, but for now use the
hints file.
commit 8730dc628065a35ddd06f140c7253a6bfd1625fa
Author: Karl Williamson <[email protected]>
Date: Sun Jan 21 10:08:33 2018 -0700
perlembed: Fix typos
Perl is capitalized when referring to the language; lowercased when
referring to a particular executable.
-----------------------------------------------------------------------
--
Perl5 Master Repository