In perl.git, the branch smoke-me/khw-core has been created <https://perl5.git.perl.org/perl.git/commitdiff/23b3611f082ae3fc4986158b381cf5a7d9922374?hp=0000000000000000000000000000000000000000>
at 23b3611f082ae3fc4986158b381cf5a7d9922374 (commit)

- Log -----------------------------------------------------------------
commit 23b3611f082ae3fc4986158b381cf5a7d9922374
Author: Karl Williamson <k...@cpan.org>
Date:   Tue May 1 16:42:29 2018 -0600

    regcomp.c: Simplify

    Under /a pattern matching, the matches of the [:posix:] classes are
    restricted to the ASCII range. Previously, in a time/space trade-off
    that favored space, we created the list of matching characters at
    pattern compilation time by ANDing the full-range Posix class with
    the set of ASCII characters. But now, the tables for just the
    ASCII-range classes are generated anyway, so there's no need to do
    that compilation-time intersection. This slightly simplifies the
    code.

commit 0cf3c7f63232b206e9a9183e706337a70960ccb7
Author: Karl Williamson <k...@cpan.org>
Date:   Tue May 1 15:47:11 2018 -0600

    mktables: Add guard against Unicode breakage

    This adds a check that a new Unicode version doesn't create a
    rational number that is too close to a current rational for our
    existing floating point precision. Should this happen, we can
    increase the precision we use.

commit 2939641d324c8fd3b458434e380b48198b55be59
Author: Karl Williamson <k...@cpan.org>
Date:   Tue May 1 15:24:19 2018 -0600

    Add tests for qr/\p{}/

    This adds tests for nv=integer, where 'integer' is expressed in %e.

commit d8243a85e2e75430780629e62259bb5b9bd1fdc4
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Apr 30 19:05:54 2018 -0600

    utf8.c: Handle qr!\p{nv=6/8}!

    I thought this worked before, but it turns out it never did. This
    commit allows the rational number specified in looking up the
    Numeric Value property to not be in lowest possible terms. Unicode
    even furnishes some of its data in non-lowest form, so we should
    accept this.
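As an illustration of the idea in commit d8243a85 (accepting \p{nv=6/8} as equivalent to \p{nv=3/4}), one plausible way to canonicalize a rational before lookup is to divide both parts by their greatest common divisor. The sketch below is only that: reduce_rational() and gcd_ul() are hypothetical names, and nothing here is taken from utf8.c.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper (not from the Perl source): reduce *num / *den to
 * lowest terms in place, so that 6/8 and 3/4 yield the same pair. */
static void reduce_rational(unsigned long *num, unsigned long *den)
{
    unsigned long g;

    assert(num != 0 && den != 0 && *den != 0);
    g = gcd_ul(*num, *den);   /* gcd(0, den) == den, so 0/den becomes 0/1 */
    *num /= g;
    *den /= g;
}
```

With num = 6 and den = 8, a call to reduce_rational(&num, &den) leaves num = 3 and den = 4, which could then be formatted and looked up as the canonical key.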
commit 8e21e26f5c7624a19d62dc78ead914e42f2ff6c3
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Apr 30 10:39:46 2018 -0600

    utf8.c: Use \p{nv=float}

    Now that the float data is available to us (in the previous commit),
    we can take advantage of it, and avoid swash creation. We just use
    the perl atof() to convert the input string to an NV, and then
    convert back to a string, but in guaranteed canonical form. Then we
    look that up.

commit 2b5db90dbdc0fa0e4c630c007f8c81509c571263
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Apr 26 12:29:54 2018 -0600

    regen/mk_invlists.pl: Add \p{nv=float} data

    The previous commit revised how nv=float is handled. This commit
    adds data for handling that to charclass_invlists.h, so that the
    next commit can use that and avoid swash creation.

commit 2def2c2f2ae03f4a1d67861742326b5ad00f7adf
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Apr 29 21:08:37 2018 -0600

    Revise \p{nv=float} lookup

    The Numeric Value property allows one to find all code points that
    have a certain numeric value. An example would be to match against
    any character in any of the world's scripts which is effectively
    equivalent to the digit zero.

    It is documented that we accept either integers (like \p{nv=9}) or
    rationals (like \p{nv=1/2}). But we also accept floating point
    representations in case a conversion to numeric has happened. I
    think it is right that we not document these and their vagaries.
    One reason is that Unicode might someday create a new rational
    number that, to the precision we currently accept, is
    indistinguishable from an existing one, so that we would have to
    increase the precision.

    But there was a bug I introduced years ago. I thought that in order
    for a float to be considered to match a close rational, that 3
    significant digits of precision would be needed, like .667 to match
    2/3. That still seems reasonable. But I didn't implement that
    concept.
    Instead, prior to this commit, it was 3 (not necessarily
    significant) digits, so that for 1/160, it would match .001.

    This commit corrects that, and makes the lookup simpler. mktables
    will use sprintf %e to get the number normalized and having the 3
    significant digits required. At runtime, a floating number is
    normalized using the same format, and the result looked up in a
    hash. This eliminates the need to worry about matching within some
    epsilon.

    Further simplifications in utf8_heavy.pl are achieved by making a
    more precise definition as to what an acceptable number looks like,
    so we don't have to check later to see if what matched really was
    one.

commit 836b8d71bd21ea31e50b480a9acda0ae03b07e19
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Apr 20 00:41:18 2018 -0600

    regcomp

commit c4dc784f6dbbca3808bcfc100bf053a0638db76e
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 25 21:18:59 2018 -0600

    regen/mk_invlists.pl: Add to list of props to keep together

    Using the same idea as pp_hot.c, the Unicode properties actually
    used by perl are attempted to be kept together so that paging in one
    is likely to page in others. A few were omitted prior to this
    commit.

commit ea68e4ed1b659815d12acc63eb886b9fdf3f6e69
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Apr 26 02:08:53 2018 -0600

    regen/mk_invlists.pl: Create synonyms for perl props

    This allows our code to not have to be so precise as to which alias
    for a property it uses.

commit fd60db6d4d73b165dbe4b3f4401bcd570afd80f0
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Apr 26 02:02:05 2018 -0600

    regen/mk_invlists.pl: Prefer certain property names

    This sorts various properties to be first, so that their names will
    be used instead of others. This gives more stability to the core
    using particular names: a new version of the Unicode standard is
    less likely to come up with a different name, which, if it did, the
    core would have to change to use it.
    The preferred names are available in all Unicode versions

commit d6d5ba4fcaf2baf491c80e08437f1db2baa1e131
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Apr 26 01:55:34 2018 -0600

    regen/mk_invlists.pl: Add comment

commit fab88b70309d60aeae3116088050c6ab84fe0d9b
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 25 20:58:47 2018 -0600

    regen/mk_invlists.pl: Remove some unnecessary #if's

    Things aren't actually getting switched here, so no need for them.

commit 715bb74edaf78c9abde56723ad4d66a7d5f6a58f
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 25 16:53:07 2018 -0600

    regen/mk_invlists.pl: Change die into warning

    I found an instance in compiling early Unicode releases where this
    circumstance is legitimate

commit 98be866c74dbd61afaa1bc554ca19909ef0de773
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Apr 30 10:06:14 2018 -0600

    utf8.c: Use mnemonic variable name

commit 78184b20c38761075dd4ba31208674692a5c499e
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 25 16:51:22 2018 -0600

    utf8.c: Fix typo in comment

commit d2bbe444770eef30fdeb1ae5c305aa3f75d080c1
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 25 16:36:09 2018 -0600

    regen/mk_invlists.pl: Slight speed up

    Instead of checking each time if an element already exists in an
    array before adding it, just add it, and afterwards remove all
    redundant ones.

commit 242322576b65ac56975ae744b89b8e750473d03a
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Apr 29 21:14:48 2018 -0600

    utf8.c: Use variable instead of repeating expression

    Set a variable to the result of this expression which is used in
    multiple places.

commit d33babddc311858e78e06726a0eb2923444fa3d4
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 25 16:20:50 2018 -0600

    Remove support for qr/\p{_CanonDCIJ}

    This is the third and final obsolete property that is being removed
    in 3 sequential commits.
    The property is not used in cpan, and is being removed as part of
    the cleanup instigated because another of the 3 would require extra
    code to handle if we were to keep it around.

commit 491210fed09dadfd6e1f2c33e82cbb927e97df48
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 25 14:21:04 2018 -0600

    Remove support for qr/\p{_Comb_Above}/

    This property is no longer used in the core, nor in cpan, and is
    marked as for core use only, not necessarily stable. I have kept it
    around because it was work to remove it, but now the revamping of
    the property lookup scheme was causing failures with a similar
    property, and the previous commit removed that one. There are just
    three of these properties, and I think it's time to remove support
    for all three. The next commit will do the same for the third one.

commit 7b9d27be749f2d2252aa3b2853fa1711d92ad73c
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 25 15:07:14 2018 -0600

    Remove qr/\p{_Case_Ignorable}/

    This property is no longer used in the core, nor in cpan, and is
    marked as for core use only, not necessarily stable. I have kept it
    around because it was work to remove it, but now the revamping of
    the property lookup scheme was causing failures with it, when
    compiling on early Unicode releases. That could be fixed with extra
    work, but simply removing it also fixes the problem and avoids
    future maintenance costs.

commit 19c6ad3d5c324785b717793ce32f92e6f702932a
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 25 13:27:34 2018 -0600

    qr/\p{...}/: Rmv redundant text from warning msg detail

    This text is emitted when compiling a pattern using a deprecated
    property. The text is added detail to the main text of the message
    (which isn't changing), and is redundant because it just says it's
    deprecated, and the main message already says that.
commit 7331af34460b6011ca240c5b6e9862c0c9d69507
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Apr 25 12:49:19 2018 -0600

    Unicode::UCD: Avoid uninit message

    I found a case where this array can be empty, so add a test for that
    to avoid trying to look at the first (non-existent) element.

commit c166c77e6e7fda22f13942ae4d65362166b98c26
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Apr 24 22:02:21 2018 -0600

    regen/mph.pl: Add comment to generated code

    That code is uni_keywords.h

commit b9ad5a53cc61a3f9558cb399331b3dc23a35fbdf
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Apr 20 11:37:20 2018 -0600

    regen/charset_translations.pl: #if indent is 2 spaces

    It was instead making it 3,7,11...

commit e5fbb3016e2c63731f67114a1b73e9c340f89ae8
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Apr 20 11:26:27 2018 -0600

    Make the SCX enums public

    These enums are scheduled to be used outside the files that they now
    are defined in.

commit 7be7f7dd26c6f9cae7832d53ce45d0b6493e65d3
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Apr 24 17:46:03 2018 -0600

    regen/mk_invlists.pl: Omit unnecessary #if's

    In places, #endifs were unconditionally added followed by the same
    #ifdef they just ended.

commit 79804a5c1a71873d0c59db6097efaa48ce33b2aa
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Apr 20 10:59:40 2018 -0600

    regen/mk_invlists.pl: uni_keywords.c no longer exists

    So no need to do an #ifdef for it.

commit d3dfd995a49e849d93d685ab52fc320aca683f2d
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Apr 24 17:00:13 2018 -0600

    regen/mk_invlists.pl: depends on mk_PL_charclass.pl

    The previous 2 commits show that this script is subtly dependent on
    mk_PL_charclass.pl. Make that explicit.
commit ce6533188f8bbb47209d0429c9a9b97b5ced4409
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Apr 24 16:55:40 2018 -0600

    regen/mk_PL_charclass.pl: White-space only

    Outdent code that had its surrounding block removed

commit 790ba44940a2b8fcbca841fad5654a196904165a
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Apr 24 16:47:58 2018 -0600

    regen/mk_PL_charclass.pl: Revamp

    The change in 5.28 to having precompiled Unicode properties leaves
    this program with a chicken-and-egg problem. Prior to this commit,
    it used those properties to construct its output, relying on them to
    be using the latest Unicode data, but the code that generates the
    tables from that data uses the output of this program, with
    potentially disastrous results. This commit changes to use the data
    itself, through Unicode::UCD.

commit 7bf806ffc70fb097ff5883e5a5743feb59f5bf7b
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Apr 24 15:30:05 2018 -0600

    regen/mk_PL_charclass.pl: sort output table

    This makes it easier to verify that future commits don't change
    anything.

commit 964897c6177f29f6b1ad44199154b0fae2c437a6
Author: Karl Williamson <k...@cpan.org>
Date:   Tue May 1 16:11:39 2018 -0600

    numeric.c: White-space only

    Outdent after the previous commit removed an enclosing block

commit f2d30f819670243f226aca4f7d23665ee4040b66
Author: Karl Williamson <k...@cpan.org>
Date:   Tue May 1 14:23:23 2018 -0600

    grok_atoUV: allow non-C strings and document

    This changes the internal function grok_atoUV() to not require its
    input to be NUL-terminated. That means the existing calls to it must
    be changed to set the ending position before calling it, as some did
    already.

    This function is recommended to use in a couple of pods, but it
    wasn't documented in perlintern. This commit does that as well.
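To make the "non-C strings" point in commit f2d30f81 concrete, here is a small standalone sketch of parsing an unsigned integer from a buffer delimited by an explicit end pointer rather than a NUL byte. It is not grok_atoUV() and does not reproduce its signature or its exact acceptance rules; parse_uint_bounded() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not Perl's grok_atoUV(): parse the decimal digits
 * in [start, end) without ever reading *end, so the buffer need not be
 * NUL-terminated. Returns false on an empty range, a non-digit, or
 * overflow; on success stores the value in *out. */
static bool parse_uint_bounded(const char *start, const char *end,
                               unsigned long *out)
{
    unsigned long value = 0;
    const char *p;

    assert(start != NULL && end != NULL && out != NULL && start <= end);
    if (start == end)
        return false;

    for (p = start; p < end; p++) {
        unsigned long digit;

        if (*p < '0' || *p > '9')
            return false;
        digit = (unsigned long)(*p - '0');
        if (value > (ULONG_MAX - digit) / 10)
            return false;       /* value * 10 + digit would overflow */
        value = value * 10 + digit;
    }
    *out = value;
    return true;
}
```

The caller passes the end position explicitly, which mirrors the commit's remark that callers must set the ending position before the call; the loop bound p < end is what removes the dependence on a terminating NUL.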
commit 3ff3d32dc447728230d64a4808069c03486d4166
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Apr 30 10:46:01 2018 -0600

    Create my_atof3()

    This is like my_atof2(), but with an extra argument signifying the
    length of the input string to parse. If that length is 0, it uses
    strlen() to determine it. Then my_atof2() just calls my_atof3() with
    a zero final parameter.

    And this commit just uses the bulk of the current my_atof2() as the
    core of my_atof3(). Changes were needed however, because it relied
    on NUL-termination in a number of places.

    This allows one to convert a string that isn't necessarily
    NUL-terminated to an NV.

commit a9fc4579b29d8e3111642c71c6b68a4340640cbe
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Apr 30 11:48:46 2018 -0600

    embed.fnc: Fix my_atof2() entry

    This was using the incorrect formal parameter name. It did not
    generate an error because the function declares a variable with the
    incorrect name, so that this was actually asserting on the wrong
    thing.

commit c769ca3d9fc3104c823f5445846c17a9a250f8b9
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Feb 23 11:18:56 2018 -0700

    XXX combine with something else

    pp.c: Add blank line

-----------------------------------------------------------------------

--
Perl5 Master Repository