[perl.git] branch smoke-me/khw-new_locale, created. v5.24.0-RC1-32-g46d6c85

Karl Williamson Tue, 19 Apr 2016 12:27:54 -0700

In perl.git, the branch smoke-me/khw-new_locale has been created

<http://perl5.git.perl.org/perl.git/commitdiff/46d6c8578f21f9fed1378b888e8117f005121c01?hp=0000000000000000000000000000000000000000>


        at  46d6c8578f21f9fed1378b888e8117f005121c01 (commit)

- Log -----------------------------------------------------------------
commit 46d6c8578f21f9fed1378b888e8117f005121c01
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 19 13:26:02 2016 -0600

    comment

M       locale.c

commit d33d215c6e44ff60ed1c9a3ccbba851d1ab00904
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 19 13:25:14 2016 -0600

    comment

M       locale.c

commit f289a465116237ae61a85350c632a1b42d558da8
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 13 14:07:22 2016 -0600

    adapt

M       locale.c

commit e89b052a4ea533abe66d17e664e37a94fb4b2881
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 12 14:28:57 2016 -0600

    locale.c: XXX not so aggressive guess incre
    
    On platforms where  strxfrm() is not well-behaved and it fails because
    it needs a larger buffer, prior to this commit, the size was doubled
    before trying again.  This could require a lot of memory on large
    inputs.  This commit changes it so it is not so aggressive.  I think the
    size prediction is better due to a recent commit, and there isn't much
    of a downside in not gobbling up memory so fast (although the excess is
    soon freed).

M       locale.c

commit 0004d93cb1562d657dce82d9704975a894abe862
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 12 14:26:53 2016 -0600

    locale.c: Add some debugging statements

M       locale.c

commit caeda719bfdc41370eee4bbf25708574f0046fc6
Author: Karl Williamson <[email protected]>
Date:   Thu Apr 14 11:53:51 2016 -0600

    locale.c: Minor cleanup
    
    This replaces an expression with what I think is an easier to understand
    macro, and eliminates a couple of temporary variables that just
    cluttered things up.

M       locale.c

commit 54a0b67575c6974879ae8d09e7401a6b600fa412
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 12 14:19:21 2016 -0600

    locale.c: Fix some debugging so will output during initialization
    
    Because the command line options are currently parsed after the locale
    initialization is done, an environment variable is read to allow
    debugging of the function that is called to do the initialization.
    However, any functions that it calls, prior to this commit, were unaware
    of this and so did not output debugging.  This commit fixes most of
    them.

M       locale.c

commit ce5cd7a8d55f89d79cd999947107fef8bf48d2b9
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 12 13:54:32 2016 -0600

    perllocale: Document collation changes

M       pod/perllocale.pod

commit d99bfc78386c85ebda40e1f934e154a4b96d6184
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 12 13:51:48 2016 -0600

    perllocale: Change headings so two aren't identical
    
    Two HTML anchors in this pod were identical, which isn't a problem
    unless you try to link to one of them, as the next commit does.

M       pod/perllocale.pod

commit 300bea9b8b600d7e279f0e74a562ea7f9edf7839
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 12 12:49:36 2016 -0600

    mv function from locale.c to mathoms.c
    
    The previous commit turns the function being moved into a mere
    wrapper that is no longer called in core.  In case someone is still
    calling it, it is retained, but moved to mathoms.c.

M       embed.fnc
M       embed.h
M       locale.c
M       mathoms.c
M       proto.h

commit cf964de27c46db8887ab1fdba8b94c0e22f28e8c
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 12 12:17:48 2016 -0600

    XXX tests Do better locale collation in UTF-8 locales
    
    strxfrm() works reasonably well on some platforms under UTF-8 locales.
    It will assume that every string passed to it is in UTF-8.  This commit
    changes perl to make sure that strxfrm's expectations are met.
    
    Likewise, under a non-UTF-8 locale, strxfrm() expects a non-UTF-8
    string, and this commit makes sure of that.  If the passed string
    contains code points representable only in UTF-8, they are changed into
    the highest collating code point that doesn't require UTF-8.  This
    provides seamless operation, as they end up collating after every
    non-UTF-8 code point.  If two transformed strings compare equal, perl
    already uses the un-transformed versions to break ties, and there, these
    faked-up strings will collate after everything else, and in code point
    order amongst themselves.

M       embed.fnc
M       embed.h
M       embedvar.h
M       intrpvar.h
M       locale.c
M       proto.h
M       sv.c
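
The substitution described for non-UTF-8 locales can be pictured with a
minimal sketch.  It is not perl's code: the function name is hypothetical,
the input is taken as an array of code points rather than a perl string,
and the byte that collates highest in the locale is assumed to be already
known and passed in as 'highest'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: prepare code points for strxfrm() under a
 * single-byte (non-UTF-8) locale.  A code point above 255 cannot be
 * represented there, so it is replaced by 'highest', the byte assumed
 * to collate last in the locale (finding it is outside this sketch).
 * 'out' must have room for n + 1 bytes. */
static void squash_for_single_byte_locale(const uint32_t *cps, size_t n,
                                          unsigned char highest,
                                          unsigned char *out)
{
    size_t i;

    assert(cps != NULL && out != NULL);

    for (i = 0; i < n; i++)
        out[i] = cps[i] > 0xFF ? highest : (unsigned char) cps[i];
    out[n] = '\0';
}
```

Because every such code point maps to the same byte, two different inputs
can produce identical output; that is where the tie-break on the original
strings, mentioned in the commit message, comes in.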

commit 0bc6ad9732f654b496e0ce6cb2741a1b30bcda73
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 12 11:21:40 2016 -0600

    XXX delta, RT Change calculation of locale collation constants
    
    Every time a new collation locale is set, two constants are calculated
    that are used in predicting how much space is needed in the
    transformation of a string by strxfrm().  The transformed string is
    roughly linear with the length of the input string, so we are
    calculating 'm' and 'b' such that
    
        transformed_length = m * input_length + b
    
    Space is allocated based on this prediction.  If it is too small, the
    strxfrm() will fail, and we will have to increase the allotted amount
    and try again.  It's better to get the prediction right to avoid
    multiple, expensive strxfrm() calls.
    
    Prior to this commit, the calculation was not rigorous, and failed on
    some platforms that don't have a fully conforming strxfrm().
    
    With this commit, perl no longer panics if a locale has an apparently
    defective collation, but instead silently ignores it.  It could be
    argued that a warning should be raised instead.
    
    This commit fixes [perl #121734].

M       locale.c
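
One way such constants could be derived is to transform two probe strings
of different lengths and fit a line through the results.  The sketch below
is an illustration under assumptions, not the calculation in locale.c: the
helper names are hypothetical, and it assumes a strxfrm() that reports the
required length when given a zero-size buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length strxfrm() needs for s, excluding the NUL.
 * C99 permits a NULL destination when the size is 0. */
static size_t xfrm_len(const char *s)
{
    return strxfrm(NULL, s, 0);
}

/* Hypothetical helper: fit  transformed_length = m * input_length + b
 * through two probes made of len1 and len2 copies of the character c.
 * Returns 0 on success, or -1 if the longer probe transforms to
 * something shorter than the short one (collation looks defective). */
static int estimate_collxfrm_constants(char c, size_t len1, size_t len2,
                                       size_t *m, size_t *b)
{
    char probe[64];
    size_t x1, x2, span;

    assert(len1 > 0 && len1 < len2 && len2 < sizeof probe);

    memset(probe, c, len2);
    probe[len2] = '\0';
    x2 = xfrm_len(probe);               /* longer probe */

    probe[len1] = '\0';
    x1 = xfrm_len(probe);               /* shorter probe */

    if (x2 < x1)
        return -1;

    span = len2 - len1;
    *m = (x2 - x1 + span - 1) / span;   /* slope, rounded up */
    *b = (x1 > *m * len1) ? x1 - *m * len1 : 0;
    return 0;
}
```

Rounding the slope up makes the prediction err on the large side for the
two probes, which costs a little memory but avoids a second strxfrm() call.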

commit 1cb81b975cf60a01b7713a1cc951e434fe75135b
Author: Karl Williamson <[email protected]>
Date:   Mon Apr 11 19:11:07 2016 -0600

    locale.c: Change algorithm for strxfrm() trials
    
    Deciding how big a buffer to give to strxfrm() is largely guesswork.
    If the buffer is too small, the call fails.  Prior to this commit, the
    buffer size was doubled and then strxfrm() was called again, looping
    until it worked, or we used too much memory.
    
    Each time a new locale is made, we try to minimize the necessity of
    doing this by calculating numbers 'm' and 'b' that can be plugged into
    the equation
    
        mx + b
    
    where 'x' is the size of the string passed to strxfrm().  strxfrm() is
    roughly linear with respect to its input's length, so this generally
    works without us having to do many loops to get a large enough size.
    
    But on many systems, strxfrm(), in failing, returns how much space you
    should have given it.  On such systems, we can just use that number on
    the 2nd try and not have to keep guessing.  This commit changes to do
    that.
    
    But on other systems this doesn't work.  So the original method is
    retained if the 2nd try didn't work (or the return value of the original
    strxfrm() is such that we know immediately that it isn't well behaved).

M       locale.c
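
The retry strategy described above can be sketched as follows.  This is an
illustration, not the code in locale.c: the function name, the cap of 32
attempts, and the use of malloc()/realloc() are assumptions made for the
example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: transform s for collation.  'guess' is the
 * initial buffer size (for example m * strlen(s) + b).  Returns a
 * malloc'd NUL-terminated string, or NULL on failure; caller frees. */
static char *xfrm_with_retry(const char *s, size_t guess)
{
    size_t size = guess ? guess : 1;
    char *buf;
    int tries;

    assert(s != NULL);

    buf = malloc(size);
    if (!buf)
        return NULL;

    for (tries = 0; tries < 32; tries++) {
        size_t need = strxfrm(buf, s, size);
        char *bigger;

        if (need < size)
            return buf;             /* it fit, including the NUL */

        /* After the first failure, trust a plausible return value as
         * the required length; after that, fall back to doubling. */
        if (tries == 0 && need < SIZE_MAX / 2)
            size = need + 1;
        else if (size <= SIZE_MAX / 2)
            size *= 2;
        else
            break;

        bigger = realloc(buf, size);
        if (!bigger)
            break;
        buf = bigger;
    }

    free(buf);
    return NULL;
}
```

On a well-behaved strxfrm() the loop finishes on the second call at most;
the doubling branch only matters where the return value cannot be trusted.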

commit 6ce73761a0e3ee4a8a11ab558a52d1b36a6306ca
Author: Karl Williamson <[email protected]>
Date:   Sat Apr 9 20:40:48 2016 -0600

    locale.c: Free over-allocated space early
    
    We may over-allocate space in the buffers passed to strxfrm().  This frees it
    now instead of waiting for the whole block to be freed sometime later.
    This can be a significant amount of memory if the input string to
    strxfrm() is long.

M       locale.c
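
A minimal sketch of the idea, assuming a plain malloc()'d buffer rather
than perl's own allocation macros; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: buf was malloc'd with 'allocated' bytes and
 * holds a NUL-terminated transformed string.  Hand the unused tail
 * back to the allocator right away.  Returns the (possibly moved)
 * buffer; if realloc() fails, the original buffer is still valid and
 * is returned unchanged. */
static char *trim_to_fit(char *buf, size_t allocated)
{
    size_t used = strlen(buf) + 1;
    char *smaller;

    assert(used <= allocated);

    if (used == allocated)
        return buf;

    smaller = realloc(buf, used);
    return smaller ? smaller : buf;
}
```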

commit 3dbb06015b8f87af2d732887d40a6717e2679263
Author: Karl Williamson <[email protected]>
Date:   Sat Apr 9 20:36:01 2016 -0600

    locale.c:  White-space only
    
    Outdent and reflow because the previous commit removed an enclosing
    block.

M       locale.c

commit 749cbbb314cf126b9999c469cbf38d7ad5be9462
Author: Karl Williamson <[email protected]>
Date:   Sat Apr 9 15:52:05 2016 -0600

    XXX Tests Use different algorithm in mem_collxfrm() to handle embedded NULs
    
    Perl uses strxfrm() to handle collation.  This C library function
    expects a NUL-terminated input string.  But Perl accepts interior NUL
    characters, so something has to happen.
    
    Until this commit, what happened was that each NUL-terminated
    sub-segment would be individually passed to strxfrm(), with the results
    concatenated together to form the transformation of the whole string
    with NULs ignored.  But this isn't guaranteed to give good results, as
    strxfrm() is highly context sensitive, and needs the whole string, not
    segments, to work properly.  The way strxfrm() works, more or less, is
    that it returns a string consisting of the primary weights, in order,
    of the characters of the input, concatenated with the secondary weights,
    and so on.  Giving strxfrm() only substrings defeats this.
    
    Another possibility would be to just remove the NULs before transforming
    the string.  The problem with this method is that it screws up the
    context.  In some locales, two adjacent characters can behave
    differently than if they were separated.
    
    This commit instead replaces each NUL with a \001.
    \001 is almost certainly going to behave like we expect a NUL would if
    it were legal.  Just about every locale treats low code points as
    controls, to be ignored in at least primary weighting.
    
    And this method gives the expected sort order.  This is because perl
    uses the original strings as a tie breaker.  So, given two strings, one
    that originally had \001, and one that differed only in that it had \000
    instead, they both will get the same transformation, so will sort equal
    there, but the tie breaker will cause the one with NULs to sort first.
    
    As stated in the comments, we could go through the first 256 code points
    to determine the lowest collating one, instead of assuming it is \001.
    But this is a lot of work (UTF-8ness must be considered) and it will be
    extremely rare that the answer isn't going to be \001.

M       embed.fnc
M       locale.c
M       proto.h
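
The NUL replacement and the tie-break on the original strings can be
sketched together.  This is not mem_collxfrm() itself: both function names
are hypothetical, and strcoll() stands in for comparing two strxfrm()
results, which orders strings the same way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy len bytes of src into a new NUL-terminated
 * buffer, replacing each interior NUL with \001 so the whole string
 * can be handed to the C library in one call.  Caller frees. */
static char *replace_nuls(const char *src, size_t len)
{
    char *out = malloc(len + 1);
    size_t i;

    if (!out)
        return NULL;

    for (i = 0; i < len; i++)
        out[i] = (src[i] == '\0') ? '\001' : src[i];
    out[len] = '\0';
    return out;
}

/* Hypothetical comparison: collate the sanitized copies, then break
 * ties on the original bytes, so a string containing \000 sorts
 * before an otherwise identical one containing \001.  If allocation
 * fails, only the byte-wise comparison is used. */
static int collate_with_nuls(const char *a, size_t alen,
                             const char *b, size_t blen)
{
    char *sa = replace_nuls(a, alen);
    char *sb = replace_nuls(b, blen);
    int cmp = 0;

    if (sa && sb)
        cmp = strcoll(sa, sb);
    free(sa);
    free(sb);

    if (cmp == 0) {
        size_t n = alen < blen ? alen : blen;

        cmp = memcmp(a, b, n);
        if (cmp == 0 && alen != blen)
            cmp = alen < blen ? -1 : 1;
    }
    return cmp;
}
```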

commit 48a7af23ce6ee7c9c5fd8bb4660dcb6d26f98274
Author: Karl Williamson <[email protected]>
Date:   Sat Apr 9 15:03:48 2016 -0600

    locale.c, sv.c: Add some comments
    
    And a couple empty lines

M       locale.c
M       sv.c

commit 97977e76bba4d52b86f7619927a72242a6a2b74c
Author: Karl Williamson <[email protected]>
Date:   Sat Apr 9 15:16:59 2016 -0600

    locale.c: Some nano-optimizations
    
    Reorder two branches so the most likely is tested before the much less
    likely, and add some UNLIKELY()

M       locale.c

commit 5d9d5d0265c06528447d50c4d8a7e36b6a2e62b4
Author: Karl Williamson <[email protected]>
Date:   Sat Apr 9 14:47:21 2016 -0600

    locale.c: Clarify a debugging statement

M       locale.c

commit 917528fd9b02a806c4381826b41cc9d1c11d73f6
Author: Karl Williamson <[email protected]>
Date:   Fri Apr 8 13:46:24 2016 -0600

    XXX 5.25 strxfrm, cautions

M       ext/POSIX/lib/POSIX.pod

commit 954392da8720867ead32088bd3483655b6623428
Author: Tony Cook <[email protected]>
Date:   Mon Mar 21 12:12:58 2016 +1100

    add d_duplocale and i_locale Configure probes

M       Configure
M       Cross/config.sh-arm-linux
M       NetWare/config.wc
M       Porting/Glossary
M       Porting/config.sh
M       config_h.SH
M       configure.com
M       plan9/config_sh.sample
M       symbian/config.sh
M       uconfig.h
M       uconfig.sh
M       uconfig64.sh
M       win32/config.ce
M       win32/config.gc
M       win32/config.vc
-----------------------------------------------------------------------

--
Perl5 Master Repository
