In perl.git, the branch smoke-me/khw-new_locale has been created
<http://perl5.git.perl.org/perl.git/commitdiff/275f5e1e59f1bee34779473a4d664b7de4336ff5?hp=0000000000000000000000000000000000000000>
at 275f5e1e59f1bee34779473a4d664b7de4336ff5 (commit)
- Log -----------------------------------------------------------------
commit 275f5e1e59f1bee34779473a4d664b7de4336ff5
Author: Karl Williamson <[email protected]>
Date: Wed Apr 13 10:43:38 2016 -0600
f
M locale.c
commit cc71c63fcfdb553647845a69d63dbe55a07c94b1
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 14:28:57 2016 -0600
locale.c: XXX
When strxfrm() fails because it needs a larger buffer, prior to this
commit the size was doubled before trying again. This could require a
lot of memory on large inputs. This commit makes the growth less
aggressive. Two changes, I think, warrant this. One is that the formula
used to guess how large a buffer is needed is more accurate than before,
so our guess is unlikely to be far off. The second is that on many
platforms we can calculate precisely how much is needed, which we do
after the buffer has proved too small twice.
M locale.c
commit bbe98f0a3624a84e1ce4be00da439cd1092e4679
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 14:26:53 2016 -0600
locale.c: Add some debugging statements
M locale.c
commit 5792d0511b1f719e0c9001c6701a0617ffb9f800
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 14:19:21 2016 -0600
locale.c: Fix some debugging so it will output during initialization
Because the command line options are currently parsed after the locale
initialization is done, an environment variable is read to allow
debugging of the function that is called to do the initialization.
However, any functions that it calls, prior to this commit, were unaware
of this and so did not output debugging. This commit fixes most of
them.
M locale.c
commit 1ae74422490af32b77cf3ce57b30f437a7d9baaa
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 13:54:32 2016 -0600
perllocale: Document collation changes
M pod/perllocale.pod
commit 57b960281d7470c0570e8e537f8268c57fbb5396
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 13:51:48 2016 -0600
perllocale: Change headings so two aren't identical
Two HTML anchors in this pod were identical, which isn't a problem
unless you try to link to one of them.
M pod/perllocale.pod
commit 1f626206e162c5d37283028893f785a9778f42fd
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 12:49:36 2016 -0600
mv function from locale.c to mathoms.c
The previous commit turned the function being moved into a mere wrapper
that is not called in core. In case someone is calling it, it is
retained, but moved to mathoms.c
M embed.fnc
M embed.h
M locale.c
M mathoms.c
M proto.h
commit adeaaf7c529820737a109f62d294f1832796cb8e
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 12:17:48 2016 -0600
Do better locale collation in UTF-8 locales
strxfrm() works reasonably well on some platforms under UTF-8 locales.
It will assume that every string passed to it is in UTF-8. This commit
changes perl to make sure that strxfrm's expectations are met.
Likewise under a non-UTF-8 locale, strxfrm is expecting a non-UTF-8
string. And this commit makes sure of that. If the passed string
contains code points representable only in UTF-8, they are changed into
the highest collating code point that doesn't require UTF-8. This
provides seamless operation, as they end up collating after every
non-UTF-8 code point. If two transformed strings compare equal, perl
already uses the un-transformed versions to break ties, and there, these
faked-up strings will collate after everything else, and in code point
order amongst themselves.
M embed.fnc
M embed.h
M embedvar.h
M intrpvar.h
M locale.c
M proto.h
M sv.c
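The non-UTF-8 case described above can be illustrated with a minimal sketch. It is not the code in locale.c: the function name, the use of an array of already-decoded code points, and the `highest` parameter (standing in for the highest collating single-byte code point, however it is determined) are all assumptions made for illustration. Code points are assumed to be non-zero.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: build a byte string suitable for strxfrm() in a
 * non-UTF-8 locale. Any code point above 0xFF cannot be represented in
 * a single byte, so it is replaced by `highest`, which the caller
 * supplies as the highest collating single-byte code point.
 * Returns a malloc'd NUL-terminated buffer (caller frees), or NULL. */
static unsigned char *clamp_for_byte_locale(const uint32_t *cps, size_t n,
                                            unsigned char highest)
{
    unsigned char *out = malloc(n + 1);
    size_t i;

    if (!out)
        return NULL;

    for (i = 0; i < n; i++)
        out[i] = cps[i] > 0xFF ? highest : (unsigned char)cps[i];

    out[n] = '\0';
    return out;
}
```

Because every replaced code point maps to the same byte, distinct inputs can transform identically; the tie-break on the untransformed strings mentioned in the commit message is what separates them.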
commit f1ee08ff50ca7c01f4a053ca54b84de1333b0b15
Author: Karl Williamson <[email protected]>
Date: Tue Apr 12 11:21:40 2016 -0600
XXX delta, RT Change calculation of locale collation constants
Every time a new collation locale is set, two constants are calculated
that are used in an equation determining how much space to pre-allocate
for the results of strxfrm() on strings that need to be collated. If
these are too small, it is not the end of the world, as we will increase
the space until we have enough. But each time we do, we have to throw
away an expensive strxfrm() result. So it's good to get these constants
set so that rarely happens.
The length of the transformed string is roughly linear in the length of
the input string, so we are calculating 'm' and 'b' such that
transformed_length = m * input_length + b
Prior to this commit, the calculation was not rigorous, and failed on
some platforms that don't have a fully conforming strxfrm().
This commit changes perl to not panic if a locale has an apparently
defective collation, but instead to silently ignore it. It could be
argued that a warning should be raised instead.
This commit fixes [perl #121734].
M locale.c
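As a rough illustration of the linear estimate in the commit above, the sketch below derives 'm' and 'b' from two sample measurements (input length, transformed length). It is an assumption-laden sketch, not the calculation in locale.c: the struct, the function names, the two-sample approach and the round-up of 'm' are all hypothetical choices. A sample pair in which the longer input yields a shorter transformation is treated as defective and reported through `ok` rather than aborting.

```c
#include <stddef.h>

/* Hypothetical container for the constants in
 *     transformed_length = m * input_length + b              */
struct xfrm_fit {
    size_t m;
    size_t b;
    int ok;     /* 0 if the samples look defective */
};

/* Fit a line through (x1, y1) and (x2, y2), where x is an input length
 * and y the measured transformed length. 'm' is rounded up so that the
 * estimate is never below either sample. */
static struct xfrm_fit fit_from_samples(size_t x1, size_t y1,
                                        size_t x2, size_t y2)
{
    struct xfrm_fit fit = { 0, 0, 0 };
    size_t dx, dy;

    if (x2 <= x1 || y2 < y1)
        return fit;                 /* defective: ignore quietly */

    dx = x2 - x1;
    dy = y2 - y1;
    fit.m = (dy + dx - 1) / dx;     /* ceiling division */
    fit.b = (y1 > fit.m * x1) ? y1 - fit.m * x1 : 0;
    fit.ok = 1;
    return fit;
}

/* Buffer size to pre-allocate, including room for the trailing NUL. */
static size_t estimate_xfrm_size(struct xfrm_fit fit, size_t input_len)
{
    return fit.m * input_len + fit.b + 1;
}
```

On platforms where it is supported, the sample lengths could be measured with `strxfrm(NULL, s, 0)`, which the C standard defines as returning the required length without writing anything.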
commit 5d93ccf08d996f0a4dd2b4531e1c2a9cc89b70e2
Author: Karl Williamson <[email protected]>
Date: Mon Apr 11 19:11:07 2016 -0600
locale.c: Change algorithm for strxfrm() trials
Deciding how big a buffer to give to strxfrm() is largely guesswork.
If the buffer is too small, the call fails. Prior to this commit, the
buffer size was doubled and then strxfrm() was called again, looping
until it worked, or we used too much memory.
Each time a new locale is made, we try to minimize the necessity of
doing this by calculating numbers 'm' and 'b' that can be plugged into
the equation
mx + b
where 'x' is the size of the string passed to strxfrm(). The length of
strxfrm()'s output is roughly linear in its input's length, so this
generally works without us having to loop many times to get a large
enough size.
But on many systems, strxfrm(), in failing, returns how much space you
should have given it. On such systems, we can just use that number on
the 2nd try and not have to keep guessing. This commit changes to do
that.
But on other systems this doesn't work. So the original method is
retained if the 2nd try didn't work (or the return value of the original
strxfrm() is such that we know immediately that it isn't well behaved).
M locale.c
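The retry strategy described in the commit above might look roughly like the following. This is a minimal sketch using plain malloc()/realloc() rather than perl's own allocation macros, and `xfrm_with_retry` is a hypothetical name. It relies on the C standard's rule that strxfrm() returns the length the full result needs (excluding the NUL), trusts that value once, and falls back to doubling if the second attempt still fails.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: transform `src` with strxfrm(), starting from a
 * guessed buffer size. Returns a malloc'd NUL-terminated result that
 * the caller must free, or NULL on allocation failure. */
static char *xfrm_with_retry(const char *src, size_t guess)
{
    size_t size = guess ? guess : 1;
    char *buf = malloc(size);
    int failures = 0;

    if (!buf)
        return NULL;

    for (;;) {
        size_t needed = strxfrm(buf, src, size);
        char *bigger;

        if (needed < size)
            return buf;             /* result and its NUL fit */

        if (failures++ == 0 && needed < (size_t)-1) {
            /* First failure: trust the size strxfrm() asked for. */
            size = needed + 1;
        } else {
            /* The reported size did not work: fall back to doubling. */
            if (size > (size_t)-1 / 2) {
                free(buf);
                return NULL;
            }
            size *= 2;
        }

        bigger = realloc(buf, size);
        if (!bigger) {
            free(buf);
            return NULL;
        }
        buf = bigger;
    }
}
```

With a well-behaved strxfrm() the loop runs at most twice; the doubling branch only matters on platforms whose return value cannot be relied on.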
commit e2a94edef234da512f0ad3df10217b990046488b
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 20:40:48 2016 -0600
locale.c: Free over-allocated space early
We may malloc more space than needed for the buffers given to strxfrm().
This frees the excess now instead of waiting for the whole block to be
freed sometime later.
This can be a significant amount of memory if the input string to
strxfrm() is long.
M locale.c
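A minimal sketch of returning the unused tail of an over-sized buffer, assuming plain realloc() rather than whatever allocation wrappers locale.c actually uses; `shrink_to_fit` is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: trim a malloc'd buffer holding a NUL-terminated
 * string of `used_len` bytes down to exactly what it needs. If the
 * shrinking realloc() fails, the original buffer is still valid, so it
 * is returned unchanged. */
static char *shrink_to_fit(char *buf, size_t used_len)
{
    char *smaller = realloc(buf, used_len + 1);

    return smaller ? smaller : buf;
}
```

The saving is the difference between the guessed size and the actual transformed length, which grows with the input.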
commit 11211b79d87b982b6895c9217264f2ba0d2c271f
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 20:36:01 2016 -0600
locale.c: White-space only
Outdent and reflow because the previous commit removed an enclosing
block.
M locale.c
commit adabade5bc229c24a9b905b954089ee1cca6cd9b
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 15:52:05 2016 -0600
Use different algorithm in mem_collxfrm() to handle embedded NULs
Perl uses strxfrm() to handle collation. This C library function
expects a NUL-terminated input string. But Perl accepts interior NUL
characters, so something has to happen.
Until this commit, what happened was that each NUL-terminated
sub-segment would be individually passed to strxfrm(), with the results
concatenated together to form the transformation of the whole string
with NULs ignored. But this isn't guaranteed to give good results, as
strxfrm() is highly context sensitive, and needs the whole string, not
segments, to work properly. The result of strxfrm() is likely to be
something like several modified copies of the entire input string
concatenated together, with the first copy being the primary collation
weights, the second being the secondary weights, etc. Giving strxfrm()
only substrings defeats this.
Another possibility is to just remove the NULs before transforming the
string. The problem with this method is that in some locales, two
adjacent characters can behave differently than if they were separated,
so removing the NUL screws up the context strxfrm() may need.
What this commit does is replace each NUL with a \001.
This is almost certainly going to behave like we expect a NUL would if
it were legal. Just about every locale treats low code points as
controls, to be ignored in primary weighting, and perhaps secondary as
well.
If two strings compare identically, and one initially had NULs and the
other \001's, then the tie breaker is to compare the original strings,
so the NUL string would sort earlier than the \001 one. Hence this
method gives the desired results.
As stated in the comments, we could go through the first 256 code points
to determine the lowest collating one, instead of assuming it is \001.
But this is a lot of work (UTF-8ness must be considered) and it will be
extremely rare that the answer isn't going to be \001.
M embed.fnc
M locale.c
M proto.h
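The NUL substitution described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the body of mem_collxfrm(): the helper name is hypothetical, and the replacement byte is hard-coded to \001 as the commit message assumes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes of `src`, which may contain
 * interior NULs, replacing each NUL with \001 so that the whole string
 * can be handed to strxfrm() in one call. Returns a malloc'd
 * NUL-terminated copy that the caller must free, or NULL. */
static char *replace_nuls(const char *src, size_t len)
{
    char *copy = malloc(len + 1);
    size_t i;

    if (!copy)
        return NULL;

    for (i = 0; i < len; i++)
        copy[i] = (src[i] == '\0') ? '\001' : src[i];

    copy[len] = '\0';
    return copy;
}
```

The copy keeps the original length, so character adjacency is preserved for strxfrm(); the original bytes remain available for the tie-break comparison.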
commit 2a8015617e20907ee4bd9922556f824480d731fa
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 15:03:48 2016 -0600
locale.c, sv.c: Add some comments
And a couple empty lines
M locale.c
M sv.c
commit 5c18318891d860f529f46836efb0bc4c15fa658f
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 15:16:59 2016 -0600
locale.c: Some nano-optimizations
Reorder two branches so the most likely is tested before the much less
likely, and add some UNLIKELY()
M locale.c
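For readers unfamiliar with this kind of branch hint, here is a small sketch. `MY_UNLIKELY` is a stand-in defined locally for illustration (assuming a GCC-compatible compiler provides `__builtin_expect`), and `next_size` with its sentinel check is a hypothetical function, not code from locale.c.

```c
#include <stddef.h>

/* Stand-in for a branch-prediction hint macro. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define MY_UNLIKELY(cond) (cond)
#endif

/* Hypothetical example: decide the next buffer size after a transform
 * attempt reported that `needed` bytes (excluding the NUL) are required.
 * The common case is tested first; the rare case carries the hint. */
static size_t next_size(size_t size, size_t needed)
{
    if (needed < size)
        return size;                /* most likely: it already fit */

    if (MY_UNLIKELY(needed == (size_t)-1))
        return 0;                   /* rare: unusable return value */

    return needed + 1;
}
```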
commit d528ec795fa236f7e9a7b1aa1c4f7dce60b8c1aa
Author: Karl Williamson <[email protected]>
Date: Sat Apr 9 14:47:21 2016 -0600
locale.c: Clarify a debugging statement
M locale.c
commit 03164b4cd079dd513a56a3a7d07079817b7eaaef
Author: Karl Williamson <[email protected]>
Date: Fri Apr 8 13:46:24 2016 -0600
XXX 5.25 strxfrm, cautions
M ext/POSIX/lib/POSIX.pod
commit 6c2d84be46283279182378415f03e9fc1f21cf39
Author: Tony Cook <[email protected]>
Date: Mon Mar 21 12:12:58 2016 +1100
add d_duplocale and i_locale Configure probes
M Configure
M Cross/config.sh-arm-linux
M NetWare/config.wc
M Porting/Glossary
M Porting/config.sh
M config_h.SH
M configure.com
M plan9/config_sh.sample
M symbian/config.sh
M uconfig.h
M uconfig.sh
M uconfig64.sh
M win32/config.ce
M win32/config.gc
M win32/config.vc
-----------------------------------------------------------------------
--
Perl5 Master Repository