In perl.git, the branch smoke-me/khw-new_locale has been created
<http://perl5.git.perl.org/perl.git/commitdiff/e2b415cde165779e626a6c564da8900b20cf7879?hp=0000000000000000000000000000000000000000>
at e2b415cde165779e626a6c564da8900b20cf7879 (commit)
- Log -----------------------------------------------------------------
commit e2b415cde165779e626a6c564da8900b20cf7879
Author: Karl Williamson <[email protected]>
Date: Wed Apr 13 14:07:22 2016 -0600
adapt
M locale.c
commit 67dc442f3c8242e13ffd40095a9a6fc4fd6e1828
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 14:28:57 2016 -0600
locale.c: XXX not so aggressive guess incre
On platforms where strxfrm() is not well-behaved and it fails because
it needs a larger buffer, prior to this commit, the size was doubled
before trying again. This could require a lot of memory on large
inputs. This commit makes the growth less aggressive. I think the
size prediction is better due to a recent commit, and there isn't much
of a downside in not gobbling up memory so fast (although the excess is
soon freed).
M locale.c
commit 421fca46b54f37bdbb0c8e4c10a27be8307c991d
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 14:26:53 2016 -0600
locale.c: Add some debugging statements
M locale.c
commit 4dc8bccdb301981dc11efd79706b484c32f03cca
Author: Karl Williamson <[email protected]>
Date: Thu Apr 14 11:53:51 2016 -0600
locale.c: Minor cleanup
This replaces an expression with what I think is an easier to understand
macro, and eliminates a couple of temporary variables that just
cluttered things up.
M locale.c
commit f36b067e171d8c62dd319d2aa337396ac198a364
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 14:19:21 2016 -0600
locale.c: Fix some debugging so will output during initialization
Because the command line options are currently parsed after the locale
initialization is done, an environment variable is read to allow
debugging of the function that is called to do the initialization.
However, any functions that it calls, prior to this commit, were unaware
of this and so did not output debugging. This commit fixes most of
them.
M locale.c
commit 16802308f3ce6bcc12bb5cda262f5a750c5cc388
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 13:54:32 2016 -0600
perllocale: Document collation changes
M pod/perllocale.pod
commit dcaea06a125001f8381870a1f3065648b4825afb
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 13:51:48 2016 -0600
perllocale: Change headings so two aren't identical
Two HTML anchors in this pod were identical, which isn't a problem
unless you try to link to one of them, as the next commit does.
M pod/perllocale.pod
commit 52d72ffb2b5e6411d9f1e7e71094eae53cc78d8e
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 12:49:36 2016 -0600
mv function from locale.c to mathoms.c
The previous commit turns the function being moved here into just a
wrapper that is not called from core. Just in case someone is calling
it, it is retained, but moved to mathoms.c.
M embed.fnc
M embed.h
M locale.c
M mathoms.c
M proto.h
commit cd74b0ab1afd53d1ec497f082e3970149f72d51c
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 12:17:48 2016 -0600
XXX tests Do better locale collation in UTF-8 locales
strxfrm() works reasonably well on some platforms under UTF-8 locales.
It will assume that every string passed to it is in UTF-8. This commit
changes perl to make sure that strxfrm's expectations are met.
Likewise under a non-UTF-8 locale, strxfrm is expecting a non-UTF-8
string. And this commit makes sure of that. If the passed string
contains code points representable only in UTF-8, they are changed into
the highest collating code point that doesn't require UTF-8. This
provides seamless operation, as they end up collating after every
non-UTF-8 code point. If two transformed strings compare equal, perl
already uses the un-transformed versions to break ties, and there, these
faked-up strings will collate after everything else, and in code point
order amongst themselves.
M embed.fnc
M embed.h
M embedvar.h
M intrpvar.h
M locale.c
M proto.h
M sv.c
commit fdfcb1f1f15c6dce742fe93ddf47502cf68ac48c
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 11:21:40 2016 -0600
XXX delta, RT Change calculation of locale collation constants
Every time a new collation locale is set, two constants are calculated
that are used in predicting how much space is needed in the
transformation of a string by strxfrm(). The transformed string is
roughly linear with the length of the input string, so we are
calculating 'm' and 'b' such that
transformed_length = m * input_length + b
Space is allocated based on this prediction. If it is too small, the
strxfrm() will fail, and we will have to increase the allotted amount
and try again. It's better to get the prediction right to avoid
multiple, expensive strxfrm() calls.
Prior to this commit, the calculation was not rigorous, and failed on
some platforms that don't have a fully conforming strxfrm().
This commit changes perl to not panic if a locale has an apparently
defective collation, but instead to silently ignore it. It could be
argued that a warning should be raised instead.
This commit fixes [perl #121734].
M locale.c
commit 2f4351c6da3169141491137461e6962d87485e6d
Author: Karl Williamson <[email protected]>
Date: Mon Apr 11 19:11:07 2016 -0600
locale.c: Change algorithm for strxfrm() trials
It's kind of guesswork deciding how big a buffer to give to strxfrm().
If you give it too small a one, it will fail. Prior to this commit, the
buffer size was doubled and then strxfrm() was called again, looping
until it worked, or we used too much memory.
Each time a new locale is made, we try to minimize the necessity of
doing this by calculating numbers 'm' and 'b' that can be plugged into
the equation
m * x + b
where 'x' is the size of the string passed to strxfrm(). strxfrm() is
roughly linear with respect to its input's length, so this generally
works without us having to do many loops to get a large enough size.
But on many systems, strxfrm(), in failing, returns how much space you
should have given it. On such systems, we can just use that number on
the 2nd try and not have to keep guessing. This commit changes perl to
do that.
But on other systems this doesn't work. So the original method is
retained if the 2nd try didn't work (or the return value of the original
strxfrm() is such that we know immediately that it isn't well behaved).
M locale.c
commit 726749784a67fb46afca1a96ea707051cebfc40d
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 20:40:48 2016 -0600
locale.c: Free over-allocated space early
We may over-malloc space for the buffers passed to strxfrm(). This frees it
now instead of waiting for the whole block to be freed sometime later.
This can be a significant amount of memory if the input string to
strxfrm() is long.
M locale.c
commit 5f3bfa9895bb56c69df9eb963de3be3a9f084981
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 20:36:01 2016 -0600
locale.c: White-space only
Outdent and reflow because the previous commit removed an enclosing
block.
M locale.c
commit 52df7932eca4f034f90b52ec10a3d7cd4e462527
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 15:52:05 2016 -0600
XXX Tests Use different algorithm in mem_collxfrm() to handle embedded NULs
Perl uses strxfrm() to handle collation. This C library function
expects a NUL-terminated input string. But Perl accepts interior NUL
characters, so something has to happen.
Until this commit, what happened was that each NUL-terminated
sub-segment would be individually passed to strxfrm(), with the results
concatenated together to form the transformation of the whole string
with NULs ignored. But this isn't guaranteed to give good results, as
strxfrm() is highly context sensitive, and needs the whole string, not
segments, to work properly. The way strxfrm() works, more or less, is
that it returns a string consisting of the primary weights, in order,
of the characters of the input, concatenated with the secondary weights,
and so on. Giving strxfrm() only substrings defeats this.
Another possibility would be to just remove the NULs before transforming
the string. The problem with this method is that it screws up the
context. In some locales, two adjacent characters can behave
differently than if they were separated.
What this commit does is replace each NUL with a \001.
\001 is almost certainly going to behave like we expect a NUL would if
it were legal. Just about every locale treats low code points as
controls, to be ignored in at least primary weighting.
And this method gives the expected sort order. This is because perl
uses the original strings as a tie breaker. So, given two strings, one
that originally had \001, and one that differed only in that it had \000
instead, they both will get the same transformation, so will sort equal
there, but the tie breaker will cause the one with NULs to sort first.
As stated in the comments, we could go through the first 256 code points
to determine the lowest collating one, instead of assuming it is \001.
But this is a lot of work (UTF-8ness must be considered) and it will be
extremely rare that the answer isn't going to be \001.
M embed.fnc
M locale.c
M proto.h
commit 926c567be0c93cd31de1468369f567ae8178b493
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 15:03:48 2016 -0600
locale.c, sv.c: Add some comments
And a couple of empty lines.
M locale.c
M sv.c
commit 5886831703e0e75b60a13dd8ac58c1a79aee071e
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 15:16:59 2016 -0600
locale.c: Some nano-optimizations
Reorder two branches so the most likely is tested before the much less
likely, and add some UNLIKELY() hints.
M locale.c
commit fe89eefc2074f490d1dabc8f6e374b24d07086ca
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 14:47:21 2016 -0600
locale.c: Clarify a debugging statement
M locale.c
commit e4624b99c4f0c1007d5023168eb3131467f3a5c3
Author: Karl Williamson <[email protected]>
Date: Fri Apr 8 13:46:24 2016 -0600
XXX 5.25 strxfrm, cautions
M ext/POSIX/lib/POSIX.pod
commit 78cc9eb823bee5732e9421803d1b65f3601a3792
Author: Tony Cook <[email protected]>
Date: Mon Mar 21 12:12:58 2016 +1100
add d_duplocale and i_locale Configure probes
M Configure
M Cross/config.sh-arm-linux
M NetWare/config.wc
M Porting/Glossary
M Porting/config.sh
M config_h.SH
M configure.com
M plan9/config_sh.sample
M symbian/config.sh
M uconfig.h
M uconfig.sh
M uconfig64.sh
M win32/config.ce
M win32/config.gc
M win32/config.vc
-----------------------------------------------------------------------
--
Perl5 Master Repository