In perl.git, the branch smoke-me/khw-core has been created <https://perl5.git.perl.org/perl.git/commitdiff/743b6ca19e5c571abec4cc8b5e0877ea30af537b?hp=0000000000000000000000000000000000000000>
at 743b6ca19e5c571abec4cc8b5e0877ea30af537b (commit)

- Log -----------------------------------------------------------------
commit 743b6ca19e5c571abec4cc8b5e0877ea30af537b
Author: Karl Williamson <k...@cpan.org>
Date: Sun May 6 09:08:06 2018 -0600

t/porting/regen.t: Add test for new uni_keywords.h

commit 30cb9b0fa4b3514254f5f39a016cd2ac94e08d34
Author: Karl Williamson <k...@cpan.org>
Date: Sat May 5 22:07:55 2018 -0600

regen/mk_invlists.pl: Fix outdated comments

commit 9dc5476bf6017820fd9a88fcde48a8961486850d
Author: Karl Williamson <k...@cpan.org>
Date: Sat May 5 21:21:45 2018 -0600

regen/mk_invlists.pl: use re 'qr/aa'

This makes sure that all patterns in this file are compiled under /aa. Doing this can catch bugs: the bug fixed by the previous commit would have been caught if we had done this.

commit a4783fb2193e2c3dab8296d30cb4ea657d0451f1
Author: Karl Williamson <k...@cpan.org>
Date: Sat May 5 20:46:21 2018 -0600

regen/mk_invlists.pl: Fix chicken and egg problem

The problem here is that it was using a regular expression pattern to determine if a code point is the integer 0. When a new Unicode release comes along and adds a new block of decimal digits, this routine should be run before the interpreter is compiled for real, but the pattern won't know about the new block, so the check would fail. Solve the problem by using only Unicode::UCD to discover this info, and not a pattern.

commit 1158b614b7a19bf10b1740bd47b6a4e2e3d5fb35
Author: Karl Williamson <k...@cpan.org>
Date: Sat May 5 19:53:18 2018 -0600

mktables: Add, change some comments

commit a390add465d5df4ce84256ae8b402e51d4fa61eb
Author: Karl Williamson <k...@cpan.org>
Date: Sat May 5 12:13:37 2018 -0600

utf8.c: Use a more generic enum instead of explicit ptr

This changes, where possible, the reference to an inversion list from its specific name to an enum value (or a #define to an enum value) which is an offset into a list of inversion lists.
This seems slightly more robust to me, as we don't have to know the precise name of the table, but can use an enum value for which #defines may create synonyms. Some versions of Unicode may not have the precise name, but regen/mk_invlists.pl creates synonyms where possible, so the chances of it being undefined go down.

Currently there is an inconsistency in the tables' names. Some recent ones all begin with 'PL_'. That was when I thought these tables were all going to be public. But then it turned out that they could just be defined in one file (utf8.c), so the prefix is probably unnecessary. Older tables didn't have that prefix, and haven't changed. I'm not sure how it will or should turn out.

commit dff12247092e76f6af7388eeabf5e0880c8c4ce8
Author: Karl Williamson <k...@cpan.org>
Date: Sat May 5 12:01:27 2018 -0600

utf8.c: Reorder some initialization code

This puts the code into various related groups.

commit a9022eee5e4a513e078bc8fc3e97ea8100b199cf
Author: Karl Williamson <k...@cpan.org>
Date: Sat May 5 11:38:18 2018 -0600

utf8.c: Fix \p{} work on old Unicodes

This change to use one #define instead of a synonym causes the code to work unchanged on any Unicode version. The synonym isn't defined in very old Unicodes, so this wouldn't compile for them.

commit 2ee52ad108a5adb49858ef16a001fc3db76bb52e
Author: Karl Williamson <k...@cpan.org>
Date: Sat May 5 11:28:09 2018 -0600

utf8.c: qr/\p{}/ Handle Unihan numeric properties

The Unihan database is not shipped with perl due to its size. But we allow someone to copy its files into the unicore directory and recompile perl in order to get access to its properties. Some of those properties are numeric, which, like the nv property, require special handling in utf8.c. This commit adds that handling.
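The a390add change in this log replaces references to a specifically named inversion list with an enum value that indexes a list of inversion lists, with #define synonyms layered on top. The C sketch below illustrates that idea only; the enum names, the `prop_tables` array, and the tiny ASCII-only lists are hypothetical and are not the contents of perl's generated headers. An inversion list here is a sorted array of code points at which set membership flips, so a code point is in the set when an odd number of entries are at or below it.

```c
#include <assert.h>
#include <stddef.h>

#define N_ELEMS(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical property indices, standing in for what a generator such
 * as regen/mk_invlists.pl might emit. Not perl's real enum. */
enum prop_index {
    PROP_ALPHA,
    PROP_DIGIT,
    PROP_SPACE,
    PROP_COUNT
};

/* Synonyms: alternate spellings resolve to the same table index. */
#define PROP_ALPHABETIC PROP_ALPHA
#define PROP_ND         PROP_DIGIT

/* An inversion list: sorted code points where membership toggles. */
struct invlist {
    const unsigned *starts;
    size_t len;
};

/* Toy ASCII-only data: [A-Za-z], [0-9], [\t-\r ] */
static const unsigned alpha_starts[] = { 0x41, 0x5B, 0x61, 0x7B };
static const unsigned digit_starts[] = { 0x30, 0x3A };
static const unsigned space_starts[] = { 0x09, 0x0E, 0x20, 0x21 };

/* One array of inversion lists, indexed by the enum. */
static const struct invlist prop_tables[PROP_COUNT] = {
    [PROP_ALPHA] = { alpha_starts, N_ELEMS(alpha_starts) },
    [PROP_DIGIT] = { digit_starts, N_ELEMS(digit_starts) },
    [PROP_SPACE] = { space_starts, N_ELEMS(space_starts) },
};

/* Binary search for how many entries are <= cp; odd means "in the set". */
static int invlist_contains(const struct invlist *l, unsigned cp)
{
    size_t lo = 0, hi = l->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (l->starts[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (int)(lo & 1);
}

/* Callers name a property by enum value (or synonym), not by variable. */
static int prop_contains(enum prop_index which, unsigned cp)
{
    assert((unsigned)which < (unsigned)PROP_COUNT);
    return invlist_contains(&prop_tables[which], cp);
}
```

With this layout, code written against PROP_ND or PROP_DIGIT reaches the same table, so a data set that lacks one spelling can still compile as long as some synonym maps to the index.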
commit be0a00edd128ba8dbb96dc82f0de34adc5236851
Author: Karl Williamson <k...@cpan.org>
Date: Fri May 4 22:25:54 2018 -0600

mktables: Handle cjkiicore properly

This property is not normally compiled by perl, but an installation may choose to use it. It was failing some tests because this is a special property that is like a perl dual-var: it is both binary and non-binary, and commit 346f9bfbe12 forgot that.

commit 1caeedf042be23d2105bc83261c80fdcbbd6206b
Author: Karl Williamson <k...@cpan.org>
Date: Fri May 4 21:26:31 2018 -0600

PATCH: [perl #133175] script run free from wrong pool panic

Setting the pointer to NULL after freeing it signals to the code in later iterations that it has already been freed.

commit 045b2702237f86159a6997fed6a95163dc5ebfe1
Author: Karl Williamson <k...@cpan.org>
Date: Tue May 1 17:26:42 2018 -0600

regen/mk_invlists.pl: Fix-ups for early Unicode versions

In some of these, certain properties aren't defined yet, so they have no entries. Just add a check for that, and compensate.

commit 2d6d50a0b32ce180cab03c5384ba1d1f534d0549
Author: Karl Williamson <k...@cpan.org>
Date: Tue May 1 16:42:29 2018 -0600

regcomp.c: Simplify

Under /a pattern matching, the matches of the [:posix:] classes are restricted to the ASCII range. Previously, in a time/space trade-off that favored space, we created the list of matching characters at pattern compilation time by ANDing the full-range Posix class with the set of ASCII characters. But now the tables for just the ASCII-range classes are generated anyway, so there's no need to do that compilation-time intersection. This slightly simplifies the code.

commit eaf915a33ac6b66097fbd6f5a5c0f872cf0cc67f
Author: Karl Williamson <k...@cpan.org>
Date: Tue May 1 15:47:11 2018 -0600

mktables: Add guard against Unicode breakage

This adds a check that a new Unicode version doesn't create a rational number that is too close to a current rational for our existing floating point precision.
Should this happen, we can increase the precision we use.

commit eebe620ad6c29a0eaa2ea2827b56dcd248458978
Author: Karl Williamson <k...@cpan.org>
Date: Tue May 1 15:24:19 2018 -0600

Add tests for qr/\p{}/

This adds tests for nv=integer, where 'integer' is expressed in %e.

commit 65a7e777edd2c114c05e7e40adacdb0f5e5b0d3b
Author: Karl Williamson <k...@cpan.org>
Date: Mon Apr 30 19:05:54 2018 -0600

utf8.c: Handle qr!\p{nv=6/8}!

I thought this worked before, but it turns out it never did. This commit allows the rational number specified when looking up the Numeric Value property to not be in lowest possible terms. Unicode even furnishes some of its data in non-lowest form, so we should accept this.

commit 809b6b625641ed8ab3e9d1d0e21543a70892c60e
Author: Karl Williamson <k...@cpan.org>
Date: Mon Apr 30 10:39:46 2018 -0600

utf8.c: Use \p{nv=float}

Now that the float data is available to us (from the previous commit), we can take advantage of it and avoid swash creation. We just use the perl atof() to convert the input string to an NV, and then convert it back to a string, but in guaranteed canonical form. Then we look that up.

commit d46e9c5da76a5ccdc8d1c7b7a8a3c9aaa1db0dc6
Author: Karl Williamson <k...@cpan.org>
Date: Thu Apr 26 12:29:54 2018 -0600

regen/mk_invlists.pl: Add \p{nv=float} data

The previous commit revised how nv=float is handled. This commit adds data for handling that to charclass_invlists.h, so that the next commit can use it and avoid swash creation.

commit 8f05e6682fb6f6b36d6ed9eda8e21bd40c100c77
Author: Karl Williamson <k...@cpan.org>
Date: Sun Apr 29 21:08:37 2018 -0600

Revise \p{nv=float} lookup

The Numeric Value property allows one to find all code points that have a certain numeric value. An example would be to match against any character in any of the world's scripts which is effectively equivalent to the digit zero. It is documented that we accept either integers (like \p{nv=9}) or rationals (like \p{nv=1/2}).
But we also accept floating point representations in case a conversion to numeric has happened. I think it is right that we not document these and their vagaries. One reason is that Unicode might someday create a new rational number that, to the precision we currently accept, is indistinguishable from an existing one, so that we would have to increase the precision.

But there was a bug I introduced years ago. I thought that for a float to be considered to match a close rational, 3 significant digits of precision would be needed, like .667 to match 2/3. That still seems reasonable. But I didn't implement that concept. Instead, prior to this commit, it was 3 (not necessarily significant) digits, so that for 1/160, it would match .001.

This commit corrects that, and makes the lookup simpler. mktables uses sprintf %e to normalize the number to the required 3 significant digits. At runtime, a floating point number is normalized using the same format, and the result is looked up in a hash. This eliminates the need to worry about matching within some epsilon. Further simplifications in utf8_heavy.pl are achieved by making a more precise definition of what an acceptable number looks like, so we don't have to check later whether what matched really was one.

commit e85cc6749f4f936c9f8a79e2b69a92ffc81bc6ad
Author: Karl Williamson <k...@cpan.org>
Date: Wed Apr 25 21:18:59 2018 -0600

regen/mk_invlists.pl: Add to list of props to keep together

Using the same idea as pp_hot.c, we attempt to keep together the Unicode properties actually used by perl, so that paging in one is likely to page in others. A few were omitted prior to this commit.

commit db6e1c8d2febafb0cb0a3bf95c73ee7a990e73b4
Author: Karl Williamson <k...@cpan.org>
Date: Thu Apr 26 02:08:53 2018 -0600

regen/mk_invlists.pl: Create synonyms for perl props

This allows our code to not have to be so precise as to which alias for a property it uses.
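The Numeric Value commits above (8f05e66, 809b6b6, 65a7e77, and the eaf915a guard) describe three ideas: normalize a float to 3 significant digits with a %e-style format and look up the resulting string, reduce a rational such as 6/8 to lowest terms before looking it up, and reject data in which two distinct rationals normalize to the same string. The C sketch below puts them side by side under stated assumptions: the table contents, the exact "%.2e" format, and the linear scan (standing in for the hash lookup the commit mentions) are illustrative choices, not perl's actual code in utf8.c, utf8_heavy.pl, or mktables.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sample of Numeric Value data, stored in lowest terms. */
struct rational { long num, den; };
static const struct rational nv_table[] = {
    { 0, 1 }, { 1, 2 }, { 2, 3 }, { 3, 4 }, { 1, 160 },
};
#define NV_COUNT (sizeof nv_table / sizeof nv_table[0])

/* Canonical key: 3 significant digits in exponent form, e.g. "6.67e-01". */
static void nv_key(double v, char *buf, size_t len)
{
    snprintf(buf, len, "%.2e", v);
}

static long gcd(long a, long b)
{
    a = labs(a);
    b = labs(b);
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Guard in the spirit of eaf915a: returns 1 if two entries share a key,
 * meaning the precision would have to be increased. */
static int nv_keys_collide(void)
{
    char a[32], b[32];
    for (size_t i = 0; i < NV_COUNT; i++)
        for (size_t j = i + 1; j < NV_COUNT; j++) {
            nv_key((double)nv_table[i].num / nv_table[i].den, a, sizeof a);
            nv_key((double)nv_table[j].num / nv_table[j].den, b, sizeof b);
            if (strcmp(a, b) == 0)
                return 1;
        }
    return 0;
}

/* Look up num/den, accepting forms not in lowest terms such as 6/8. */
static int nv_lookup_rational(long num, long den)
{
    assert(den != 0);
    if (den < 0) {          /* keep the sign on the numerator */
        num = -num;
        den = -den;
    }
    long g = gcd(num, den); /* >= 1 because den != 0 */
    num /= g;
    den /= g;
    for (size_t i = 0; i < NV_COUNT; i++)
        if (nv_table[i].num == num && nv_table[i].den == den)
            return (int)i;
    return -1;
}

/* Look up a float string: convert, normalize, compare keys. */
static int nv_lookup_float(const char *s)
{
    char want[32], have[32];
    nv_key(strtod(s, NULL), want, sizeof want);
    for (size_t i = 0; i < NV_COUNT; i++) {
        nv_key((double)nv_table[i].num / nv_table[i].den, have, sizeof have);
        if (strcmp(want, have) == 0)
            return (int)i;
    }
    return -1;
}
```

Because the table keys and the runtime input pass through the same format, 0.667 and .6667 both land on the key for 2/3, while 0.006 (three decimal places but only one significant digit) no longer matches 1/160. The commit has mktables compute the keys once at build time; this sketch recomputes them on every call only to stay short.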
commit 0bd2e10de9e2742b6693f3f530c35e0e0997a913
Author: Karl Williamson <k...@cpan.org>
Date: Thu Apr 26 02:02:05 2018 -0600

regen/mk_invlists.pl: Prefer certain property names

This sorts various properties to be first, so that their names will be used instead of others. This gives more stability to the core's use of particular names: a new version of the Unicode standard is less likely to come up with a different name, which, if it did, the core would have to change to use. The preferred names are available in all Unicode versions.

commit e2bff3f50b19eac6a4d9399e28a408752818492f
Author: Karl Williamson <k...@cpan.org>
Date: Thu Apr 26 01:55:34 2018 -0600

regen/mk_invlists.pl: Add comment

commit 27ed9f2c200ef58e907156bcb75f621f77062921
Author: Karl Williamson <k...@cpan.org>
Date: Wed Apr 25 20:58:47 2018 -0600

regen/mk_invlists.pl: Remove some unnecessary #if's

Things aren't actually getting switched here, so there is no need for them.

commit 165eb7d2cebe5425d959575d23a4b14a183e8912
Author: Karl Williamson <k...@cpan.org>
Date: Wed Apr 25 16:53:07 2018 -0600

regen/mk_invlists.pl: Change die into warning

I found an instance, when compiling early Unicode releases, where this circumstance is legitimate.

commit 1fa927891ac65108a5dcdf334a2117db0fd56861
Author: Karl Williamson <k...@cpan.org>
Date: Mon Apr 30 10:06:14 2018 -0600

utf8.c: Use mnemonic variable name

commit 79ccbc27822114986daa681d60477613e3f9eff8
Author: Karl Williamson <k...@cpan.org>
Date: Wed Apr 25 16:51:22 2018 -0600

utf8.c: Fix typo in comment

commit 400c63b7af065f29f610f9d01bc419803192fc3b
Author: Karl Williamson <k...@cpan.org>
Date: Wed Apr 25 16:36:09 2018 -0600

regen/mk_invlists.pl: Slight speed up

Instead of checking each time whether an element already exists in an array before adding it, just add it, and afterwards remove all redundant ones.
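The 400c63b speed-up trades a membership check on every insertion for a single cleanup pass at the end. The change itself is Perl code in regen/mk_invlists.pl, which this does not reproduce; the C sketch below only shows the same trade-off, assuming element order does not matter: append unconditionally, then sort and drop adjacent duplicates. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for unsigned ints, written to avoid overflow. */
static int cmp_unsigned(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a;
    unsigned y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Append without checking for an existing copy: O(1) per element. */
static size_t push(unsigned *v, size_t n, size_t cap, unsigned item)
{
    assert(n < cap);
    v[n] = item;
    return n + 1;
}

/* One pass after all appends: sort, then keep the first of each run.
 * Returns the new length; the surviving elements end up sorted. */
static size_t remove_duplicates(unsigned *v, size_t n)
{
    if (n == 0)
        return 0;
    qsort(v, n, sizeof *v, cmp_unsigned);
    size_t out = 1;
    for (size_t i = 1; i < n; i++)
        if (v[i] != v[out - 1])
            v[out++] = v[i];
    return out;
}
```

Checking before each push costs a linear scan per element, roughly quadratic overall, while the sort-based cleanup is O(n log n). If insertion order mattered, a hash set consulted during the cleanup pass would preserve it instead.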
commit 89cebcf3f936d57870ac5b6d1a1c81a12575a88b
Author: Karl Williamson <k...@cpan.org>
Date: Sun Apr 29 21:14:48 2018 -0600

utf8.c: Use variable instead of repeating expression

Set a variable to the result of this expression, which is used in multiple places.

commit 73b559ca79e4db0d3bb97eedfa3b5ec15c76017c
Author: Karl Williamson <k...@cpan.org>
Date: Wed Apr 25 16:20:50 2018 -0600

Remove support for qr/\p{_CanonDCIJ}/

This is the third and final obsolete property being removed in 3 sequential commits. The property is not used on CPAN, and is being removed as part of the cleanup instigated because another of the 3 would require extra code to handle if we were to keep it around.

commit fc4634d310fe592d81eb37ea4fab005bca7e6eac
Author: Karl Williamson <k...@cpan.org>
Date: Wed Apr 25 14:21:04 2018 -0600

Remove support for qr/\p{_Comb_Above}/

This property is no longer used in the core, nor on CPAN, and is marked as for core use only, not necessarily stable. I had kept it around because removing it was work, but now the revamping of the property lookup scheme was causing failures with a similar property, and the previous commit removed that one. There are just three of these properties, and I think it's time to remove support for all three. The next commit will do the same for the third one.

commit 832ef61c1992178f259f0b2a809e48049ae27268
Author: Karl Williamson <k...@cpan.org>
Date: Wed Apr 25 15:07:14 2018 -0600

Remove qr/\p{_Case_Ignorable}/

This property is no longer used in the core, nor on CPAN, and is marked as for core use only, not necessarily stable. I had kept it around because removing it was work, but now the revamping of the property lookup scheme was causing failures with it when compiling on early Unicode releases. That could be fixed with extra work, but simply removing it also fixes the problem and avoids future maintenance costs.
commit 3f5c80b30e8d4b39848092342fb85a885ec23c79
Author: Karl Williamson <k...@cpan.org>
Date: Wed Apr 25 13:27:34 2018 -0600

qr/\p{...}/: Rmv redundant text from warning msg detail

This text is emitted when compiling a pattern using a deprecated property. The text is added detail to the main text of the message (which isn't changing), and is redundant because it just says the property is deprecated, which the main message already says.

commit 3abb0d51aca7c14ecf09dddcbdff99047e2dc4c0
Author: Karl Williamson <k...@cpan.org>
Date: Wed Apr 25 12:49:19 2018 -0600

Unicode::UCD: Avoid uninit message

I found a case where this array can be empty, so add a test for that to avoid trying to look at the first (non-existent) element.

commit 2e22d131ecdebaf5dbb9989144b504ea1717e9dc
Author: Karl Williamson <k...@cpan.org>
Date: Tue Apr 24 22:02:21 2018 -0600

regen/mph.pl: Add comment to generated code

That code is uni_keywords.h.

commit a1bc151896b0085b96b271807c121fc568f5e98b
Author: Karl Williamson <k...@cpan.org>
Date: Fri Apr 20 11:37:20 2018 -0600

regen/charset_translations.pl: #if indent is 2 spaces

It was instead making it 3, 7, 11...

commit 9d2c511a683d948c0bb441b0746aa25904357649
Author: Karl Williamson <k...@cpan.org>
Date: Fri Apr 20 11:26:27 2018 -0600

Make the SCX enums public

These enums are scheduled to be used outside the files they are currently defined in.

commit 6b11eb64d8b9a4bbc645d0b8a2a33c9f71f4e933
Author: Karl Williamson <k...@cpan.org>
Date: Tue Apr 24 17:46:03 2018 -0600

regen/mk_invlists.pl: Omit unnecessary #if's

In places, #endifs were unconditionally added, followed by the same #ifdef they had just ended.

commit e9fecd5cdb4309ccd8534bfb2041a633a79c1b99
Author: Karl Williamson <k...@cpan.org>
Date: Fri Apr 20 10:59:40 2018 -0600

regen/mk_invlists.pl: uni_keywords.c no longer exists

So there is no need to do an #ifdef for it.
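6b11eb6 stops the generator from closing a conditional only to reopen the same one immediately afterwards. One way to get that behavior, sketched here in C under the assumption that each generated line carries a single guard macro, is to remember the currently open condition and emit #endif/#ifdef only when it changes. The struct, the helper names, and the guard macros mentioned below are hypothetical; regen/mk_invlists.pl is written in Perl and its actual output format may differ.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One generated line plus the macro that must be defined for it. */
struct guarded_line {
    const char *cond;
    const char *text;
};

/* Append formatted text to buf, asserting that it fits. */
static void append(char *buf, size_t cap, size_t *used, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *used, cap - *used, fmt, ap);
    va_end(ap);
    assert(n >= 0 && (size_t)n < cap - *used);
    *used += (size_t)n;
}

/* Emit lines, opening a new #ifdef only when the guard changes, so a run
 * of lines sharing a guard gets a single #ifdef/#endif pair.
 * Returns the number of characters written (excluding the NUL). */
static size_t emit_guarded(const struct guarded_line *lines, size_t n,
                           char *buf, size_t cap)
{
    size_t used = 0;
    const char *open = NULL;   /* guard currently in effect, if any */

    assert(cap > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (open == NULL || strcmp(open, lines[i].cond) != 0) {
            if (open != NULL)
                append(buf, cap, &used, "#endif\n");
            append(buf, cap, &used, "#ifdef %s\n", lines[i].cond);
            open = lines[i].cond;
        }
        append(buf, cap, &used, "%s\n", lines[i].text);
    }
    if (open != NULL)
        append(buf, cap, &used, "#endif\n");
    return used;
}
```

For instance, two consecutive lines guarded by PERL_IN_UTF8_C followed by one guarded by PERL_IN_REGCOMP_C produce one #ifdef/#endif pair per run rather than one per line.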
commit b17fe1e1bc673bf480c24f2cb1a390b75e889c4f
Author: Karl Williamson <k...@cpan.org>
Date: Tue Apr 24 17:00:13 2018 -0600

regen/mk_invlists.pl: depends on mk_PL_charclass.pl

The previous 2 commits show that this script is subtly dependent on mk_PL_charclass.pl. Make that explicit.

commit c83f3e932b67430b6a26dacf8558837064d66b16
Author: Karl Williamson <k...@cpan.org>
Date: Tue Apr 24 16:55:40 2018 -0600

regen/mk_PL_charclass.pl: White-space only

Outdent code that had its surrounding block removed.

commit 31aba025ab59e5093a40adeb656cbba0914f3cb6
Author: Karl Williamson <k...@cpan.org>
Date: Tue Apr 24 16:47:58 2018 -0600

regen/mk_PL_charclass.pl: Revamp

The change in 5.28 to having precompiled Unicode properties leaves this program with a chicken-and-egg problem. Prior to this commit, it used those properties to construct its output, relying on them to be using the latest Unicode data, but the code that generates the tables from that data uses the output of this program, with potentially disastrous results. This commit changes it to use the data itself, through Unicode::UCD.

commit abb222f5fa50d491645c4e65c57241ecdc24d6a2
Author: Karl Williamson <k...@cpan.org>
Date: Tue Apr 24 15:30:05 2018 -0600

regen/mk_PL_charclass.pl: sort output table

This makes it easier to verify that future commits don't change anything.

commit b9e05e5b69eba85761699fad9724fccf9d8f922a
Author: Karl Williamson <k...@cpan.org>
Date: Tue May 1 16:11:39 2018 -0600

numeric.c: White-space only

Outdent after the previous commit removed an enclosing block.

commit d27b3c3e8b6b018f11801187efc2342f8290e8bb
Author: Karl Williamson <k...@cpan.org>
Date: Tue May 1 14:23:23 2018 -0600

grok_atoUV: allow non-C strings and document

This changes the internal function grok_atoUV() to not require its input to be NUL-terminated. That means the existing calls to it must be changed to set the ending position before calling it, as some already did.
This function is recommended for use in a couple of pods, but it wasn't documented in perlintern. This commit does that as well.

commit 64f27c42a5394d1a58684ea72063ce725b52eeae
Author: Karl Williamson <k...@cpan.org>
Date: Mon Apr 30 10:46:01 2018 -0600

Create my_atof3()

This is like my_atof2(), but with an extra argument signifying the length of the input string to parse. If that length is 0, it uses strlen() to determine it. my_atof2() then just calls my_atof3() with a zero final parameter. This commit uses the bulk of the current my_atof2() as the core of my_atof3(). Changes were needed, however, because it relied on NUL-termination in a number of places. This allows one to convert a string that isn't necessarily NUL-terminated to an NV.

commit 716d5b7246bf2d7dc430c22b0ad7a74423d1f296
Author: Karl Williamson <k...@cpan.org>
Date: Mon Apr 30 11:48:46 2018 -0600

embed.fnc: Fix my_atof2() entry

This was using the incorrect formal parameter name. It did not generate an error because the function declares a variable with the incorrect name, so this was actually asserting on the wrong thing.

-----------------------------------------------------------------------

--
Perl5 Master Repository