In perl.git, the branch smoke-me/khw-locale has been created

<https://perl5.git.perl.org/perl.git/commitdiff/67cae7ebe247f9fc93cb67355bc5ea2a03886626?hp=0000000000000000000000000000000000000000>

        at  67cae7ebe247f9fc93cb67355bc5ea2a03886626 (commit)

- Log -----------------------------------------------------------------
commit 67cae7ebe247f9fc93cb67355bc5ea2a03886626
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 16:20:02 2018 -0700

    Avoid changing locale when finding radix char
    
    On systems that have the POSIX 2008 operations, including
    nl_langinfo_l(), this commit causes them to not have to actually change
    the locale when determining what the decimal point character is.
    
    The locale may have to change during the printing/reading of numbers,
    but eventually we can use sprintf_l(), if available, to avoid that too.
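
As a rough illustration of the idea (not the code in this commit), the radix string can be read from a locale object rather than from the global locale.  This sketch assumes a POSIX 2008 libc such as glibc; radix_of_locale() is a hypothetical helper name, not a perl API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX 2008 locale API */
#endif
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper (not a perl API): copy the radix string of the locale
 * named `name` into `buf` without ever calling setlocale().
 * Returns 0 on success, -1 on failure. */
int radix_of_locale(const char *name, char *buf, size_t buflen)
{
    locale_t loc;
    const char *radix;

    assert(name != NULL && buf != NULL);

    /* Build a separate locale object; the global locale is left untouched. */
    loc = newlocale(LC_NUMERIC_MASK, name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return -1;

    radix = nl_langinfo_l(RADIXCHAR, loc);
    if (strlen(radix) >= buflen) {
        freelocale(loc);
        return -1;
    }
    strcpy(buf, radix);   /* copy before the locale object is freed */
    freelocale(loc);
    return 0;
}
```

Calling radix_of_locale("C", buf, sizeof buf) should leave "." in buf; other locale names only work if that locale is installed on the system.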

commit bdb9e7d92c80fc7990149b92dc99b11a08cfd878
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 15:56:33 2018 -0700

    Perl_sv_2pv_flags: Potentially avoid work
    
    By using a macro that is private to the core, this code can avoid
    thinking it has to deal with a non-dot radix character, as even if we
    are using the locale radix, that is often a dot.

commit eeb1810f769ad18cb7304e37766e4f276f363192
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 15:53:42 2018 -0700

    numeric.c: Remove duplicate PERL_ARGS_ASSERT
    
    By moving the call to one instance of this macro, the other can be
    removed.

commit 8e523ce6bc545b99dacf2766a59778a3d824912b
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 15:52:50 2018 -0700

    locale.c: White-space only
    
    Outdent to compensate for previous patch removing several blocks

commit 1b771d00228c80c3f2c7e3e7381c072bd3a35aca
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 15:45:19 2018 -0700

    Keep PL_numeric_radix_sv always set
    
    Previously this was removed if the radix was dot.  By keeping it set to
    a dot, we simplify some code, removing some branches.

commit 631604ad6afd209da8611aacc4b2ed6121a93b65
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 15:32:45 2018 -0700

    locale.c: Replace by function that does the same thing
    
    This logic occurs often enough that a function has been created to do
    it.  So use that.

commit 46fba7afa5a69f688379c37ea234c42e202c9930
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 15:20:44 2018 -0700

    Latch LC_NUMERIC during critical sections
    
    It is possible for operations on threaded perls which don't 'use locale'
    to still change the locale.  This happens when calling
    POSIX::localeconv() and I18N::Langinfo(), and in earlier perls, it can
    happen for other operations when perl has been initialized with the
    environment causing the various locale categories to not have a uniform
    locale.
    
    This commit causes the areas where the locale for this category should
    predictably be in one or the other state to be a critical section where
    another thread can't interrupt and change it.  This is a separate
    mutex, so that only these particular operations will be held up.
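
A minimal sketch of such a critical section, assuming POSIX threads.  The mutex and function names are made up for illustration and are not the ones used in locale.c.

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mutex standing in for the LC_NUMERIC-specific one. */
static pthread_mutex_t lc_numeric_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Format `value` using the radix of locale `name`, holding the mutex for as
 * long as LC_NUMERIC differs from its usual state.  Returns 0 on success. */
int format_in_locale(const char *name, double value, char *out, size_t outlen)
{
    char saved[256];
    const char *cur;
    int rc = -1;

    assert(name != NULL && out != NULL && outlen > 0);

    pthread_mutex_lock(&lc_numeric_mutex);

    /* Copy setlocale()'s return value at once; its buffer may be reused. */
    cur = setlocale(LC_NUMERIC, NULL);
    if (cur != NULL && strlen(cur) < sizeof saved) {
        strcpy(saved, cur);
        if (setlocale(LC_NUMERIC, name) != NULL) {
            int n = snprintf(out, outlen, "%.2f", value);
            rc = (n >= 0 && (size_t) n < outlen) ? 0 : -1;
            setlocale(LC_NUMERIC, saved);   /* restore before unlocking */
        }
    }

    pthread_mutex_unlock(&lc_numeric_mutex);
    return rc;
}
```

This only helps if every piece of code that changes LC_NUMERIC takes the same mutex; a thread that calls setlocale() directly is not held up by it.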

commit 39332987e4d1ae54d65c4283e20633a9d75c9f59
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 13:54:27 2018 -0700

    locale.c: Do savepv() ASAP
    
    When this code is called on a threaded perl, it's possible that another
    thread could zap the setlocale return buffer, if it's not reentrant.  I
    suspect we would have seen this more often if that was the case, but
    this commit improves things by doing the save immediately, reducing the
    unsafe interval.

commit 8cb4bbae8d705de30c8743635552879f46a3830b
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 13:47:17 2018 -0700

    locale.c: #ifdef'd out code for making thread safe on not equipped platforms

commit ecb09f71868e146980ec8f91324423021422c1e8
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 12:55:13 2018 -0700

    Add mutex for changing LC_NUMERIC
    
    But don't use it yet.
    
    Changing of LC_NUMERIC is done by the perl core, and is a potential race
    condition on threaded perls.  This adds a mutex that later commits will
    use to create critical sections where the value of LC_NUMERIC matters.

commit cdb91ddd4a889770ac9d3aed9622c4bd3b909f38
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 13:32:32 2018 -0700

    POSIX::localeconv(): Prefer localeconv_l()
    
    This is a thread-safe version of localeconv(), so use it under threads.

commit 7ba29558f5d1e27380676d3d49068e255eb46db6
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 12:40:40 2018 -0700

    POSIX::localeconv() Use new fcn; avoid recalcs
    
    This calls strlen() once, instead of passing 0 to the subsidiary
    functions which causes them to call it each time.  It also uses the new
    function is_utf8_non_invariant_string() instead of doing here what that
    function does.

commit 42f8c0d2a399a6025ab3007be09a757951254a13
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 11:22:02 2018 -0700

    XXX pod change so XS code doesn't call localeconv directly POSIX.xs: Add mutex around localeconv()
    
    If another thread calls localeconv(), it can destroy the returned
    buffer.  This adds a mutex around this call; the only other place in the
    core that calls it already has this mutex, so they now are thread-safe.
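
The pattern can be sketched as follows, assuming POSIX threads.  The names are hypothetical; the point is that the data is copied out of localeconv()'s static buffer while the mutex is still held.

```c
#include <assert.h>
#include <locale.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical mutex; every caller of localeconv() must take it. */
static pthread_mutex_t localeconv_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy the current locale's decimal point into `buf`.
 * Returns 0 on success, -1 if `buf` is too small. */
int copy_decimal_point(char *buf, size_t buflen)
{
    const struct lconv *lc;
    int rc = -1;

    assert(buf != NULL);

    pthread_mutex_lock(&localeconv_mutex);
    lc = localeconv();                    /* points into a shared buffer */
    if (strlen(lc->decimal_point) < buflen) {
        strcpy(buf, lc->decimal_point);   /* copy while still locked */
        rc = 0;
    }
    pthread_mutex_unlock(&localeconv_mutex);
    return rc;
}
```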

commit 404f91c46f4b123558768fa8d05dada74f3044ab
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 11:16:15 2018 -0700

    POSIX.xs: White space only
    
    Vertically align for readability

commit aa449efb3a9b533c103e30964be13e71d8b1774b
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 11:07:42 2018 -0700

    locale.c: Move some mutex ops
    
    A future commit will add a mutex, and create the convention that when
    this mutex is used in combination with the new one, it is always
    acquired after the new one is held, in order to prevent the possibility
    of deadlock.  Do it now, before the new one gets added.
    
    This also adds some comments about the reason for this mutex.

commit a8300463740e93c0301718f473fcf1ce16a680b5
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 10:12:30 2018 -0700

    locale.c: White-space only
    
    Indent code to account for previous commits adding some blocks

commit 006ada1799de7524ff49035a8027826dc4010291
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 08:27:04 2018 -0700

    locale.c: Use macro instead of its expansion
    
    This macro in a future commit will become more complex.

commit 01efb89e04118e7a748f24bf46c0b1ec209940a4
Author: Karl Williamson <[email protected]>
Date:   Tue Jan 16 22:09:27 2018 -0700

    locale.c: Do common task in one place
    
    This function in some cases may need to temporarily switch the
    LC_NUMERIC code.  Instead of repeating the logic to determine if this is
    needed, do it once.

commit 779e7d6252e2a20f3d1565c160f4562ab79d8db6
Author: Karl Williamson <[email protected]>
Date:   Tue Jan 16 18:47:16 2018 -0700

    More debug

commit fdf6e5001bc55334eac4b643602dd61228d5d9e1
Author: Karl Williamson <[email protected]>
Date:   Tue Jan 16 17:38:45 2018 -0700

    POSIX.xs: Keep locale change to minimum span
    
    Move the restore to as close to the save as possible so that the locale
    is in an unstable state for as short a time as possible.

commit 2d1821590c20e1e1dbff66a8cca4c051fab8fc69
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 13:24:46 2018 -0700

    POSIX::strftime: Add better fallback about UTF-8
    
    If the function returns a valid string that isn't completely UTF-8
    invariant, the function assumes it is UTF-8 if we are in a UTF-8 locale.
    This works, but in the unlikely event that the system has no LC_TIME, we
    can't tell if it is in a UTF-8 locale.  As a better fallback position,
    this commit adds the check that there is just a single script of the
    time string, adding a measure of reassurance that our call that it is
    UTF-8 is correct.
    
    This is unlikely to be used, but now that there is a function to call
    that determines if this is a script run, it's easy to add, and unlikely
    to actually get compiled.

commit 37f0248661948cad760b8565803c7442f7c667d3
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 13:18:50 2018 -0700

    grok_numeric_radix(): Avoid recalculating
    
    This function just determined that we are in the scope of 'use locale',
    hence the underlying radix character should be used.  This commit
    changes to use the macro that directly does that; previously the macro
    that redundantly checks whether we are in that scope was used.

commit 2067f0f73661699d941e33f397450e6fbab8e500
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 13:00:44 2018 -0700

    sv_vcatpvfn_flags() Balance LC_NUMERIC changes/restores
    
    Prior to this commit, the restore for LC_NUMERIC was getting called even
    if there was no corresponding store.  Change so they are balanced; a
    future commit will require this.

commit 65e17bb293c5f84a655dc079269dbe2ce389ad50
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 15 16:39:44 2018 -0700

    perl.h: Remove some obsolete macros
    
    These no longer make sense; were for core internal use only

commit a06be3c15c1a50b4b45202d98995ebb551af9b51
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 15 15:56:43 2018 -0700

    vutil.c: White-space only
    
    Properly indent a block, and add spaces where C++11 deprecates not
    having them

commit b1d610c3c31e12401b8c9e8f8349cb316ba8ff5a
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 15 15:48:57 2018 -0700

    Simplify some LC_NUMERIC macros
    
    These macros are marked as subject to change and are not documented
    externally.  I don't know what I was thinking when I named some of them,
    but whatever no longer makes sense to me.  Simplify them, and change so
    there is only one restore macro to remember.

commit 7d090bbcf93922d665eb06e26b05be96cd8c99d0
Author: Karl Williamson <[email protected]>
Date:   Sun Jan 14 22:21:31 2018 -0700

    for debug Carlos

commit 8b5650f510175f67ea4dca34562e6e3d995fb386
Author: Karl Williamson <[email protected]>
Date:   Sun Jan 14 21:43:43 2018 -0700

    toke.c: Remove unnecessary macro calls
    
    These macros were to shift the LC_NUMERIC state into using a dot for the
    radix character.  When I wrote this code, I assumed that parsing should
    be using just the dot.  Since then, I have discovered that this wraps
    other uses where the dot is not correct, so remove it.

commit a9aefe9c22490669390dcdacd806c33622c8da77
Author: Karl Williamson <[email protected]>
Date:   Sun Jan 14 21:37:16 2018 -0700

    perl.h: Remove unused locale core macro
    
    This undocumented macro is unused in the core, and all these are
    commented that they are subject to change.  And it confuses things, so
    just remove it.

commit a78e42b974162356a1a546abd1ce82d9c23b0cbb
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 10 22:35:12 2018 -0700

    POSIX.xs: Prefer mbrtowc() over mbtowc()
    
    mbrtowc is reentrant, so use it on threaded perls if available when
    POSIX::mbtowc() is called.

commit 183c400de81e20379ebc6572020121a3b0633024
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 10 22:28:34 2018 -0700

    POSIX.xs: Prefer mbrlen() over mblen()
    
    mbrlen is reentrant, so use it on threaded perls if available when
    POSIX::mblen() is called.

commit dfdb9ca94411438768cb2b80fc19d042a51189c2
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 8 18:21:12 2018 -0700

    locale.c: Revamp fallback detection of UTF-8 locales
    
    This commit continues the process started in the previous few commits to
    improve the detection of whether a locale is UTF-8 or not when the
    platform doesn't have the more modern tools available.
    
    What was done before was examine various texts, like the days of the
    week, in a locale, and see if they are legal UTF-8 or not.  If there
    were any, and all were legal, it assumed that UTF-8 was needed.  If
    there weren't any (as in American English), it looked at the locale's
    name.  This presents false negatives and false positives.
    
    Basically, it adds the constraint that all the texts need to be in the
    same script when interpreted as UTF-8, which basically rules out any
    false positives when the script isn't Latin.  With Latin, it isn't so
    clear cut, as the text can be intermixed with ASCII Latin letters and
    UTF-8 variant sequences that could be some Latin locale, or UTF-8, and
    they just coincidentally happen to be syntactically UTF-8.  Because of
    the structuredness of UTF-8, the odds of a coincidence go down with
    increasing numbers of variants in a row.  This also isn't likely to
    happen with ISO 8859-1, as the bytes that could be legal continuations
    in UTF-8 are almost entirely controls or punctuation.  But in other
    locales in the 8859 series, there are some legal continuations that
    could be part of a month name, say.
    
    As an example of the issues, in 8859-2, one could have \xC6 (C with
    acute) followed by \xB1 (a with ogonek), which in UTF-8 would be
    U+01B1: LATIN CAPITAL LETTER UPSILON.  However, something like \xCD
    (i acute) followed by \xB3 (l with stroke) yields U+0373: GREEK
    SMALL LETTER ARCHAIC SAMPI, and the script check added by this commit
    would catch that.  In non-Latin texts, the only permissible ASCII
    characters would be punctuation, and you aren't going to have many of
    those in the LC_TIME strings, and certainly not in a row.  Instead those
    will consist of at least several variant characters in a row, and the
    odds of those coincidentally being syntactically valid UTF-8 and
    semantically in the same script are exceedingly low.
    
    To catch Latin UTF-8 locales, this commit adds a list of the distinct
    variants found so far.  If there are even just several of these, the
    odds of the syntax being coincidentally UTF-8 greatly diminish.  The
    number needed for this to conclude that the locale is UTF-8 is easily
    tweakable at compile time.
    
    The problem remains for English and other Latin script languages that
    have rare accented characters.  The name is still then examined for
    containing "UTF-8".  Note that previous commits have guaranteed that if
    the locale has a non-ASCII currency symbol that is recognized by
    Unicode, such as the Euro or Pound Sterling, that will correctly be
    recognized.
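
The arithmetic behind the two ISO 8859-2 examples above can be checked with a small two-byte UTF-8 decoder.  This is an illustrative sketch with hypothetical function names, not code from locale.c.

```c
#include <assert.h>

/* Decode a two-byte UTF-8 sequence.  Returns the code point, or -1 if the
 * bytes are not a well-formed two-byte sequence. */
long decode_utf8_pair(unsigned char lead, unsigned char cont)
{
    long cp;

    /* The lead byte must be 110xxxxx and the continuation 10xxxxxx. */
    if ((lead & 0xE0) != 0xC0 || (cont & 0xC0) != 0x80)
        return -1;
    cp = ((long) (lead & 0x1F) << 6) | (cont & 0x3F);
    return cp >= 0x80 ? cp : -1;   /* reject overlong encodings */
}

/* The two byte pairs discussed in the commit message. */
void check_commit_examples(void)
{
    assert(decode_utf8_pair(0xC6, 0xB1) == 0x01B1);  /* Latin script */
    assert(decode_utf8_pair(0xCD, 0xB3) == 0x0373);  /* Greek script */
}
```

Both pairs are syntactically valid UTF-8, which is why a syntax check alone cannot rule them out.  Only the second decodes to a non-Latin letter, so it is the one a same-script check would flag in otherwise Latin text.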

commit eb9b653a57d867209f91c9825417bf87cc5b9a6d
Author: Karl Williamson <[email protected]>
Date:   Sun Jan 7 16:22:27 2018 -0700

    locale.c: Improved fallback UTF-8 locale detection
    
    This adds some more checks for when the platform lacks mbtowc().  We can
    check if things like isprint(), toupper() match what a UTF-8 locale
    would do.  If not, we can rule out UTF-8.

commit d9d74371da9a8574934b26e8545cac5b6e1dee9d
Author: Karl Williamson <[email protected]>
Date:   Sat Jan 6 16:00:02 2018 -0700

    Improve fallback UTF-8 locale detection
    
    If the libc doesn't have modern enough routines, we use a fallback
    mechanism to see if a locale is UTF-8 or not.  One component of this is
    to look at the byte sequence for the currency symbol.  Obviously, if the
    sequence isn't valid UTF-8, the locale isn't either.  But if it is valid
    UTF-8, and hence might be a UTF-8 locale, this commit changes the
    detection mechanism to see if the sequence evaluates, when interpreted
    as UTF-8 to be a known Unicode currency symbol.  If so, the locale must
    be UTF-8, as the odds of some other locale having a sequence that does
    this are vanishingly small.
    
    If the sequence doesn't evaluate to a currency symbol, that doesn't tell
    us anything, as plenty of places have a string of letters be their
    currency symbol.  Nor if the symbol is a '$', as that is invariant under
    UTF-8 vs not, so doesn't help us.
    
    This pretty much guarantees that a UTF-8 locale for the European Union
    or the UK that otherwise looks like plain English (Latin script) will be
    properly determined to be UTF-8, as the symbols for their currencies
    will pass this test.

commit 31439c2eafcf41a56f85119e36d8f4a56b2df8a4
Author: Karl Williamson <[email protected]>
Date:   Sat Jan 6 14:24:30 2018 -0700

    locale.c: Avoid localeconv()
    
    my_langinfo() is a recently added function which presents a better API
    than localeconv, and returns the needed information here, and is easier
    to make thread-safe.

commit 7a1a577f9259723a59899ec1896ebf55ecfa93a7
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 8 17:37:15 2018 -0700

    locale.c: White-space only
    
    This indents all this code, with no other changes, in preparation for a
    future commit which will add a block around it.

commit 79a81eaee4bce5af05ddc9e68e63a32b2fd3f142
Author: Karl Williamson <[email protected]>
Date:   Sun Jan 7 15:58:52 2018 -0700

    locale.c: Remove branch to label
    
    The code at this label was branched to because it contained common
    cleanup code.  But now that code is in a function, so the cleanup call
    is trivial, so just skip this intermediate label.

commit 4700eed832c78ef8925a6921682162f16f17049e
Author: Karl Williamson <[email protected]>
Date:   Sat Jan 6 12:42:35 2018 -0700

    locale.c: Extract duplicated code into subroutines
    
    These two paradigms are each repeated in 4 places.  Make into two
    subroutines

commit b6a3ba1b72b88a3989616c3992d02d88e20d9cdb
Author: Karl Williamson <[email protected]>
Date:   Fri Jan 5 21:41:27 2018 -0700

    locale.c: Prefer mbrtowc(), as it's reentrant
    
    If it's available and this is a threaded build, it's preferred.

commit 213b93b1aef1c1a65b239a689e6001e7dca00ca5
Author: Karl Williamson <[email protected]>
Date:   Sun Jan 7 15:43:01 2018 -0700

    locale.c: White-space only
    
    Indent to correspond with new block from previous commit

commit dbbaff3ffb49374aa4f7366c9b6db01d7a3f8e75
Author: Karl Williamson <[email protected]>
Date:   Fri Jan 5 14:09:40 2018 -0700

    locale.c: Revamp finding if locale is UTF-8
    
    This changes how this functionality works for the LC_CTYPE locale.  On
    systems that have nl_langinfo() one can get a definitive answer from
    just that.  Otherwise (or if that doesn't return properly) one can use
    mbtowc() to check if the UTF-8 byte sequence for the Unicode REPLACEMENT
    CHARACTER actually is considered to be that code point.  This is also
    definitive.  If the maximum byte string length for a character is too
    short to handle all Unicode UTF-8, we know without further checking that
    this isn't a UTF-8 locale, so can avoid the mbtowc check.

commit a4915efc6364bb7a8fe246f666a9184ba22b9474
Author: Karl Williamson <[email protected]>
Date:   Sun Jan 7 15:30:06 2018 -0700

    locale.c: Windows will never be EBCDIC
    
    This adjusts the conditional compilation so that win32 is a subset of
    non-EBCDIC.  This will be useful in the next commit.

commit 2ca0f39c1119c828f2d3a93579c47db4ecd6de25
Author: Karl Williamson <[email protected]>
Date:   Fri Jan 5 12:57:37 2018 -0700

    locale.c: Simplify expression
    
    Since this is operating on C strings, we don't have to check the
    lengths, but can rely on the underlying functions to work.

commit cad220a931a790c75b974779db652048fe5231f2
Author: Karl Williamson <[email protected]>
Date:   Fri Jan 5 11:35:00 2018 -0700

    Change some "shouldn't happen" failures into panics
    
    If the system is so broken that these libc calls are failing, soldiering
    on won't lead to sane results.
    
    This rewords some existing panics, and adds the errno to the output for
    all of them.

commit 447d3c0b8024085382851b6f8dc0e345ecb8e78c
Author: Karl Williamson <[email protected]>
Date:   Tue Jan 2 16:54:28 2018 -0700

    Cache locale UTF8-ness lookups
    
    Some locales are UTF-8, some are not.  Knowledge of this is needed in
    various circumstances.  This commit saves the results of the last
    several lookups so they don't have to be recalculated each time.
    
    The full generality of POSIX locales is such that you can have error
    messages be displayed in one locale, say Spanish, while other things are
    in French.  To accommodate this generality, the program can loop through
    all the locale categories finding the UTF8ness of the locale it points
    to.  However, in almost all instances, people are going to be in either
    French or in Spanish, and not in some combination.  Suppose it is a
    French UTF-8 locale for all categories.  This new cache will know that
    the French locale is UTF-8, and the queries for all but the first
    category can return that immediately.
    
    This simple cache avoids the overhead of hashes.
    
    This also fixes a bug I realized exists in threaded perls, but haven't
    reproduced.  We do not support locales in such perls, and the user must
    not change the locale or 'use locale'.  But perl itself could change the
    locale behind the scenes, leading to segfaults or incorrect results.
    One such instance is the determination of UTF8ness.  But this only could
    happen if the full generality of locales is used so that the categories
    are not all in the same locale.  This could only happen (if the user
    doesn't change locales) if the environment is such that the perl program
    is started up so that the categories are in such a state.  This commit
    fixes this potential bug by caching the UTF8ness of each category at
    startup, before any threads are instantiated, and so checking for it
    later just looks it up in the cache, without perl changing the locale.
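
One way such a small cache could look, as a sketch only: a fixed array searched linearly, with the most recently used entry kept at the front.  The slot count, name limit, and function names are assumptions for illustration, not the layout used in locale.c.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SLOTS     4    /* hypothetical size for "the last several" */
#define LOCALE_NAME_MAX 64   /* hypothetical limit on a cached name */

struct utf8ness_entry {
    char name[LOCALE_NAME_MAX];
    int  is_utf8;
    int  in_use;
};

/* Not thread-safe by itself; meant to be filled before threads start. */
static struct utf8ness_entry cache[CACHE_SLOTS];

/* Returns 1 or 0 if `name` is cached, or -1 on a miss.  A hit is moved to
 * the front so recently used locales are found first. */
int utf8ness_cache_get(const char *name)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < CACHE_SLOTS && cache[i].in_use; i++) {
        if (strcmp(cache[i].name, name) == 0) {
            struct utf8ness_entry hit = cache[i];
            memmove(&cache[1], &cache[0], (size_t) i * sizeof cache[0]);
            cache[0] = hit;
            return hit.is_utf8;
        }
    }
    return -1;
}

/* Call after a miss: records the result at the front, dropping the least
 * recently used entry once full.  Over-long names are not cached. */
void utf8ness_cache_put(const char *name, int is_utf8)
{
    assert(name != NULL);
    if (strlen(name) >= LOCALE_NAME_MAX)
        return;
    memmove(&cache[1], &cache[0], (CACHE_SLOTS - 1) * sizeof cache[0]);
    strcpy(cache[0].name, name);
    cache[0].is_utf8 = (is_utf8 != 0);
    cache[0].in_use = 1;
}
```

With only a handful of entries, a linear scan with strcmp() is cheap, which is the sense in which a simple cache avoids the overhead of hashes.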

commit c4ae4d06324c461b2f637e678da2cad6ca20f0d6
Author: Karl Williamson <[email protected]>
Date:   Tue Jan 2 14:23:24 2018 -0700

    locale.c: Avoid duplicate work
    
    As the comments say, the needed value is already readily available

commit 69abaf328a0da5f95b476ddeb70fac5895c35fd6
Author: Karl Williamson <[email protected]>
Date:   Tue Jan 2 13:38:16 2018 -0700

    locale.c: Avoid some work
    
    We've already worked out whether the decimal point is a dot or not.  We
    can pass that information to the called routine so it doesn't have to
    figure it out again.

commit 7d3e12d0ca1eb18bba2f7a1eb5100ae9d010c790
Author: Karl Williamson <[email protected]>
Date:   Tue Jan 2 13:19:03 2018 -0700

    locale.c: Use non-control for a format dummy
    
    We need a plain character here.  I used a '\e' before, but it would be
    better to have something that isn't a control, so just change it to a
    blank

commit c81fdf19499368db64e33ca6caa0c7338099b007
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 25 11:28:54 2018 -0700

    locale.c: Create a block around some code; indent
    
    Under some configurations depending on platform and Configure options,
    these declarations are not at the beginning of a block, violating C
    language rules.

commit d85a09f58088b6a31f6c0f8f46fd023354d78e45
Author: Karl Williamson <[email protected]>
Date:   Tue Jan 2 12:25:35 2018 -0700

    locale.c: Avoid some more locale changes
    
    In a few places here we can test if we are already in the locale we want
    to be in, and not switch unnecessarily if so.

commit 765d2c882a4935e42ed3b53db2948111391cb3a1
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 23:03:34 2018 -0700

    Avoid some unnecessary changing of locales
    
    The LC_NUMERIC locale category is kept so that generally the decimal
    point (radix) is a dot.  For some (mostly) output purposes, it needs to
    be swapped into the program's current underlying locale so that a
    non-dot can be printed.
    
    This commit changes things so that if the current underlying locale
    already uses a dot as its decimal point, the swap doesn't happen, as
    it's not needed.

commit d047812f6140ef08e5d20ff372620ca9cdad0c2a
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 22:20:25 2018 -0700

    perl.h: White-space only

commit a74b0763f84165906f13352fed4e36332e4b90c2
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 20:41:21 2018 -0700

    locale.c: Add compile check for unimplemented behavior
    
    Instead of silently not working.

commit 09996a85df1951c9313c51854e0c1a0ccd5e6378
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 20:30:39 2018 -0700

    locale.c: White-space only
    
    Indent because the previous commit created an enclosing block, and
    add a blank line elsewhere

commit fc718d2668efc5b13528c3ceb6c235489c9d503c
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 20:00:03 2018 -0700

    locale.c: Refactor Ultrix code
    
    Examination shows that this code does nothing unless LC_ALL is defined.
    So explicitly test at compile time for that.
    
    Also, two variables don't have to be declared so globally, and by
    reducing their scope, by creating a new block we don't have to have
    PERL_UNUSED_ARG()s for them

commit 7a067f830d63f12f9cf1550da08ddafaea8ae562
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 19:07:19 2018 -0700

    locale.c: Avoid rescanning a string
    
    We can use a parameter to find out where in the string the portion of
    interest starts.  Do that to avoid starting again from scratch.

commit 03f4aa34bacafdacb0a1a26ce41c53c289d10256
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 18:33:59 2018 -0700

    locale.c: Use fcns instead of macros
    
    Here the macros being used expand into the functions being called,
    without adding any value to using the macros, and making things slightly
    less clear.

commit e966a0dc50350fd23e1cd31523686d30212952d8
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 18:17:41 2018 -0700

    locale.c: Add const to several variables

commit 450a936c28d9a685b388afbad1b9d8c20443af37
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 18:15:27 2018 -0700

    locale.c: Improve, add comments

commit 8c0653acfefaf9f7954ad5a7f9d9f291520a03c2
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 18:01:45 2018 -0700

    perl.h: Add comment, rephrase another

commit 8fbf304aff5b326a1ff4074543ae3987aa6b0954
Author: Karl Williamson <[email protected]>
Date:   Sat Nov 18 17:34:25 2017 -0700

    Perl_langinfo: Teach about YESSTR and NOSTR
    
    These are items that nl_langinfo() used to be required to return, but
    are considered obsolete.  Nonetheless, this drop-in replacement for that
    function should know about them for backward compatibility.

commit 074d6c2b60076f7fa6493deddcdcdb9c2c346c8c
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 1 15:07:45 2018 -0700

    APItest/t/locale.t: Add some tests
    
    This makes sure that the entries for which the expected return value may
    legitimately vary from platform to platform get tested as returning
    something, skipping the test if the item isn't known on the platform.
    
    A couple of comments are also added.

commit 3cf284bff3b7a4d22d462b8894acc1678b8c14c9
Author: Karl Williamson <[email protected]>
Date:   Mon Aug 28 18:01:43 2017 -0600

    XXX may include other things after final edits: ExtUtils::ParseXS/lib/perlxs.pod: Nits
    
    This removes extra blanks following colons that don't mean the normal
    thing for colons that traditionally have two spaces after them, and
    capitalizes Perl.

commit 8e34594ce7c018630b402b425621cdc74c777d94
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 26 08:59:33 2017 -0600

    Teach perl about more locale categories
    
    glibc has various other categories than the ones perl handles, for
    example LC_PAPER.  This commit adds knowledge of these to perl, so that
    one can set them, interrogate them, and have libraries work on them,
    even though perl itself does not.
    
    This is in preparation for future commits, where it becomes more
    important than currently for perl to know about all the locale
    categories on the system.
    
    I looked through various other systems to try to find other categories,
    but did not see any.  If a system does have such a category, it is
    pretty easy to tell perl about it, and recompile.  Use the changes in
    this commit as a template, and send an email to [email protected], so
    that the next Perl release will have it.

commit 0ccaffe7f86349c4611d21d435d816dcc248a542
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 3 20:41:29 2018 -0700

    Add check that "$!" is correctly interpreted as UTF-8
    
    We sometimes need to know if an error message is UTF-8 or not.
    Previously we checked that it is syntactically valid UTF-8, and that the
    LC_MESSAGES locale is UTF-8.  But some systems, notably Windows, do not
    have LC_MESSAGES.  For those, this commit adds a different, semantic,
    check that the text of the message when interpreted as UTF-8 is all in
    the same Unicode script.  This is not foolproof, unlike the LC_MESSAGES
    check, but it's better than what we have now for such systems.  It
    likely is foolproof for non-Latin locales, as any message will have a
    bunch of characters in that locale, and no ASCII Latin ones.  For a
    Latin locale, these ASCII letters could be intermixed with the UTF-8
    ones, causing potential ambiguity.

commit e51dd1e997306e7c46e38366e393dfce1e3ac91d
Author: Karl Williamson <[email protected]>
Date:   Tue Nov 14 22:27:06 2017 -0700

    Remove uncompilable code
    
    This code was never compiled because of a misspelling in the #ifdef.
    No problem surfaced, so just remove it.  The next commit adds a different
    check.

commit 1abb8c87877d1a8ad05f1982b37441cf7c1859c0
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 8 19:11:52 2018 -0700

    XXX rethink empty script_run

commit 9618b06bb26040e06cfcce1438f4e57725ccd118
Author: Karl Williamson <[email protected]>
Date:   Mon Jan 8 19:08:54 2018 -0700

    perl.c: Move initialization of inversion lists
    
    This is now done very early in the file, as it may be needed for
    initializing the locale handling.

commit 6f794cc5312301a921275f2b4d9fa279b8d676e1
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 14:09:24 2018 -0700

    isSCRIPT_RUN: Document in perlintern

commit c2b1bd40f73bbf65b11b76c5fd2e722ff8c29e14
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 14:08:47 2018 -0700

    isSCRIPT_RUN: A sequence of entirely Inherited chars is Inherited

commit 3883697d493c3654283d40c507f94a5cb524a031
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 14:07:43 2018 -0700

    regexec.c: Add comment

commit 71bb7067ce0e8ccbbc51702a58ba7a197ac3a0cc
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 14:05:23 2018 -0700

    Fix bug in isSCRIPT_RUN with digit following unassigned
    
    This was being treated as a run, but shouldn't be one.

commit 3930507a64b6ca99f1464a803da2ed9f80f35874
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 13:00:06 2018 -0700

    isSCRIPT_RUN: Can short cut if not in UTF-8
    
    All characters representable by single bytes are either Common or Latin,
    so must be a script run.  If we aren't asking for what the script is we
    can return immediately.  If we are, the run is Latin if any character in
    it is Latin, otherwise is Common.
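
The shortcut can be sketched like this, assuming each byte is a Latin-1 code point (as for a non-UTF-8 string on an ASCII platform).  The enum and function names are hypothetical and are not the ones in regexec.c.

```c
#include <assert.h>
#include <stddef.h>

enum byte_script { SCRIPT_COMMON, SCRIPT_LATIN };

/* Code points below 0x100 whose Unicode Script property is Latin:
 * A-Z, a-z, U+00AA, U+00BA, and U+00C0..U+00FF except U+00D7 and U+00F7. */
static int is_latin1_latin(unsigned char c)
{
    return (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A)
        || c == 0xAA || c == 0xBA
        || (c >= 0xC0 && c != 0xD7 && c != 0xF7);
}

/* A string of single-byte characters is always a script run, so all that
 * is left to decide is which script: Latin if any character is Latin,
 * otherwise Common. */
enum byte_script script_of_byte_run(const unsigned char *s, size_t len)
{
    size_t i;

    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (is_latin1_latin(s[i]))
            return SCRIPT_LATIN;
    }
    return SCRIPT_COMMON;
}
```

When the caller does not ask for the script at all, even this loop is unnecessary and the answer "is a run" can be returned immediately.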

commit 1f5420c4c1a8b923eeeae26f7faa7de97381f13e
Author: Karl Williamson <[email protected]>
Date:   Sat Jan 6 21:16:15 2018 -0700

    Give isSCRIPT_RUN() an extra parameter
    
    This allows it to return the script of the run.

commit 4d4bd006361d4f078c3b9207fc94c452143fe974
Author: Karl Williamson <[email protected]>
Date:   Sat Jan 6 16:15:12 2018 -0700

    charclasslists.h: script enums visible to CORE,EXT
    
    This exposes the enum definitions for the script extensions property to
    the perl code and extensions, for use in future commits.

commit 2a27d735a9d8c4eeb0d4f32fa2e28a9a149d9c90
Author: Karl Williamson <[email protected]>
Date:   Sat Jan 6 16:13:06 2018 -0700

    regen/mk_invlists.pl: Allow override of where enums get defined
    
    This adds code so that the enums defined by this, which are ordinarily
    only used by regexec.c, can be specified to be somewhere else instead.

commit 35ca3d806b25135eac49e17ee22ddf9ca0439080
Author: Karl Williamson <[email protected]>
Date:   Sat Jan 6 16:09:57 2018 -0700

    regen/mk_invlists.pl: Allow multiple files to access
    
    This changes the code so that the symbols defined by this program
    can be #define'd in more than one file.

commit 92b38f42e3ee4664f0e14da0cf3595ff96b0a05b
Author: Karl Williamson <[email protected]>
Date:   Thu Jan 18 14:02:33 2018 -0700

    regexec.c: Fix typo in comment

commit fd5264c31ce46b816a4137daa40e6925c43b8ce7
Author: Karl Williamson <[email protected]>
Date:   Sat Jan 6 16:18:45 2018 -0700

    Fix bug in script runs that start with Common
    
    This is a follow-on to 8535a06fea02528fe726855a139fcbd360d1fc6e.  That
    fixed one case in which things did not work properly when the first
    character was in the Common script.  It did not catch the case where a
    later character in the string was non-Common and from a script that has
    its own set of digits; this commit fixes that.
    
    This just entails moving a block of code to slightly earlier.

commit d8ef750aaa2a832ad918be91d4cac48e47f0172c
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 10 17:10:09 2018 -0700

    locale.c: Make sure variable is always defined
    
    A future commit assumes this variable is there even on non-DEBUGGING
    builds.  #define it to 0 for those.

commit 69b5a3b63455af22268e2a05a0eb84b1fddb823d
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 17 17:01:00 2018 -0700

    my_atof(): Lock dot radix
    
    This commit shows some redundant checks.  It examines the text and if it
    finds a dot in the middle of the number, and the locale is expecting
    something else, it toggles LC_NUMERIC to be the C locale so that the dot
    is understood.  However, during further parsing, grok_numeric_radix()
    gets called and sees that the locale shouldn't be C, and toggles it
    back.  That ordinarily would cause the dot to not be recognized, but
    this function always recognizes a dot no matter what the locale.  So
    none of our tests fails.  I'm not sure if this is always the case, and I
    don't understand this area of the code all that well, but there is a
    simple way to cause grok_numeric_radix to not change the locale back,
    and that is to call the macro LOCK_LC_NUMERIC_STANDARD() when changing
    it the first time in my_atof().  The purpose of this macro is precisely
    this situation, so that recursed calls don't try to override the
    decisions of the outer calls.
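
The locking idea in this message can be modelled with a toy state machine. This is a sketch under stated assumptions, not Perl's implementation: numeric_state, toy_lock_standard, toy_unlock and toy_set_underlying are hypothetical names, and no real locale is changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for interpreter state; not Perl's real variables. */
typedef struct {
    bool in_c_locale;  /* true while LC_NUMERIC is conceptually "C" */
    bool locked;       /* true while an outer caller has pinned that choice */
} numeric_state;

/* Outer caller (the role my_atof() plays): switch to "C" and pin it. */
static void toy_lock_standard(numeric_state *st)
{
    assert(st != NULL);
    st->in_c_locale = true;
    st->locked = true;
}

/* Outer caller releases the pin when it has finished parsing. */
static void toy_unlock(numeric_state *st)
{
    assert(st != NULL);
    st->locked = false;
}

/* Inner callee (the role grok_numeric_radix() plays): it would like the
 * underlying locale, but leaves the state alone while it is locked.
 * Returns true only if it actually changed the state. */
static bool toy_set_underlying(numeric_state *st)
{
    assert(st != NULL);
    if (st->locked) {
        return false;
    }
    st->in_c_locale = false;
    return true;
}
```

While the lock is held, the inner call becomes a no-op, so the dot chosen by the outer call keeps being understood.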

commit f84dd337e4c8b82bf83f588e965795fbb9507fa1
Author: Karl Williamson <[email protected]>
Date:   Wed Jan 24 15:57:30 2018 -0700

    hints/hpux.sh: HP-UX mbrlen() and mbrtowc() don't work
    
    In spite of there being man pages for these, the #include file doesn't
    define the mbstate_t type which is required for a parameter to these
    functions.
    
    Perhaps the Configure probe could be enhanced so it doesn't return
    defined unless these can be successfully compiled, but for now use the
    hints file.

commit 530401da4af586fd50df6ec55d4d1f8f4594e73c
Author: Karl Williamson <[email protected]>
Date:   Sun Jan 21 10:08:33 2018 -0700

    perlembed: Fix typos
    
    Perl is capitalized when referring to the language; lowercased when
    referring to a particular executable.

-----------------------------------------------------------------------

-- 
Perl5 Master Repository
