In perl.git, the branch blead has been updated

<https://perl5.git.perl.org/perl.git/commitdiff/be76079c87db438e1123ff79ee161badcb258605?hp=9e8e4a84c278536d3094b33ba0a7af5b04b31430>

- Log -----------------------------------------------------------------
commit be76079c87db438e1123ff79ee161badcb258605
Merge: 9e8e4a84c2 4c404f2639
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Feb 14 22:14:12 2019 -0700

    Merge branch 'incore' into blead
    
    This branch moves the handling of user-defined \p{} properties from
    lib/utf8_heavy.pl into regcomp.c  (rewriting it in C).  This fixes a
    bunch of bugs, and removes all uses of swashes from regular expression
    compilation and execution.

commit 4c404f263914b5bf989d64b86ad715cc085b84a0
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Aug 21 22:27:19 2018 -0600

    Remove relics of regex swash use
    
    This removes the most obvious and easy things that are no longer needed
    since regexes no longer use swashes at all.
    
    tr/// continues, for the time being, to use swashes, so not all swash
    handling is removable now.  But tr/// doesn't use inversion lists, and
    so a bunch of code is ripped out here.  Other code could have been, but
    I did only the relatively easy stuff.  The rest can be ripped out all at
    once when tr/// is stops using swashes.

commit 4ebed06a4c0245fffc2d13602f9b0373e0d5f49e
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Feb 14 12:34:49 2019 -0700

    Use mnemonics for array indices
    
    The element at say, [0] is a particular thing.  This commit changes to
    use a mnemonic instead of [0], for clarity

commit 2410ba250b11206fd38a8e7d612d247e21ccfb11
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Aug 23 13:54:48 2018 -0600

    regcomp.c: Arrays no longer need PL_sv_undef placeholders
    
    An empty entry is now just NULL.

commit 6ed02bb63446ea2cd1d4561d73bac2adc5484275
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Aug 21 20:12:00 2018 -0600

    regcomp.c: Simplify args passing for ANYOF nodes
    
    A swash is no longer used, so we can remove some elements from the array
    of data that gets stored with the compiled pattern for use in runtime
    matching.  This is the first step in more simplifications.
    
    Since a swash isn't used, this change also requires regexec.c to change
    to use a straight inversion list lookup.  This has the salutary effect
    of eliminating a conversion between code point and UTF-8.

commit 36c2b2aa3853cfaf4237aec66eccdc6faaf9fe58
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Feb 14 12:16:13 2019 -0700

    Add .t for testing user-defined \p{} races

commit a2fe6cf28f613880c73ad6fec4d2cbf4ebe5975f
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Aug 6 17:00:40 2018 -0600

    t/re/regexp_unicode_prop.t: Make sure sub called only once
    
    User-defined properties are supposed to be called just once for /i and
    once for non-/i.  This adds tests for that.
    
    It turns out that this was broken in blead.

commit 3b071feee62d0713bd7e9f33098c084e3ee4fdeb
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Aug 24 12:34:18 2018 -0600

    t/re/regexp_unicode_prop.t: Add tests
    
    Add some tests.  These test various error conditions that haven't been
    tested before.

commit e4f9f79853e160e0e5d0cbde06c1a60e8e85a94e
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Aug 15 17:11:15 2018 -0600

    t/re/regexp_unicode_prop.t: Test that can have nested pkgs
    
    That is, in \p{user-defined}

commit 61ac831b8aceb59292577c3d45b6504952ead7cc
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Aug 15 17:09:45 2018 -0600

    t/re/regexp_unicode_prop.t: Add some stress
    
    This adds some trailing spaces and comments in expansion of
    \p{user-defined}/ to verify things work.

commit c413aa1fa6bdc64c4a1ca5967a4f45c0ea003da8
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Aug 15 17:07:51 2018 -0600

    t/op/taint.t: Add test

commit de5659380e7c1c711fbe4cef1ee09c7460bd5f55
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Aug 23 14:05:29 2018 -0600

    regcomp.c: Add some potential code that's #ifdef'd out
    
    This is in case we ever need it.  This checks for portability in the
    code points specified in user-defined properties.  Previously there was
    a check, but I couldn't get a warning to trigger unless there was also
    overflow.  So that means the pattern compile failed due to the overflow,
    and the portability warning was superfluous.  But, one can have
    non-portable code points without overflow; just the old method didn't
    properly detect them.  If we do ever need to detect and report on them,
    the code is mostly written and in this commit.

commit 73b95840bb1b55d761ec2dd075d2a8c37fa94bf4
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Aug 20 18:31:04 2018 -0600

    Move \p{user-defined} to core from utf8_heavy.pl
    
    This large commit moves the handling of user-defined properties to C
    code.  This should speed it up, but the main reason to do this is to
    stop using swashes in this case, leaving only tr/// using them.  Once
    that too is converted, all swash handling can be ripped out of perl.
    
    Doing this in perl has caused some nasty interactions that will now be
    fixed automatically.
    
    The change is not entirely transparent, however (besides speed and the
    possibility of removing these interactions).  perldelta in this commit
    details these.

commit dd52e3cc434f4c6a495379f06a99d35da217eecb
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Aug 15 16:11:04 2018 -0600

    Add global hash to handle \p{user-defined}
    
    A global hash has to be specially handled.  The keys can't be shared,
    and all the SVs stored into it must be in its thread.  This commit adds
    the hash, and initialization, and macros for context change, but doesn't
    use them.  The code to deal with this is entirely confined to regcomp.c.

commit 8310e7fa48c5bce320e9c36df267f587d84cebce
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Aug 15 15:45:14 2018 -0600

    Add mutex for dealing with qr/\p{user-defined}/
    
    This will be used in future commits

commit 3c5142a9e5aa4720092b95914808c82b46ca78b2
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Aug 6 17:39:35 2018 -0600

    regcomp.c: Add/reword some comments/white-space

commit ba7ca5a8b3ea4d8573f365a77fb525f46e812a19
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Aug 3 14:12:49 2018 -0600

    regcomp.c: Change variable name
    
    The new name more closely corresponds with its use.

-----------------------------------------------------------------------

Summary of changes:
 MANIFEST                   |    1 +
 dosish.h                   |    2 +-
 embed.fnc                  |   38 +-
 embed.h                    |   12 +-
 embedvar.h                 |    6 +
 lib/utf8_heavy.pl          |    3 -
 makedef.pl                 |    2 +
 perl.c                     |    1 +
 perl.h                     |    8 +
 perlapi.h                  |    6 +
 perlvars.h                 |   11 +
 pod/perldelta.pod          |   66 ++
 pod/perlunicode.pod        |    3 +-
 proto.h                    |   30 +-
 regcomp.c                  | 1620 +++++++++++++++++++++++++++++---------------
 regexec.c                  |   39 +-
 t/op/taint.t               |   20 +-
 t/re/anyof.t               |   10 +-
 t/re/regexp_unicode_prop.t |  117 +++-
 t/re/user_prop_race_thr.t  |  117 ++++
 unixish.h                  |    4 +-
 utf8.c                     |  511 +-------------
 utf8.h                     |    5 -
 23 files changed, 1495 insertions(+), 1137 deletions(-)
 create mode 100644 t/re/user_prop_race_thr.t

diff --git a/MANIFEST b/MANIFEST
index 9e58fbefa1..80e676c49b 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5943,6 +5943,7 @@ t/re/uniprops07.t         Test unicode \p{} regex 
constructs
 t/re/uniprops08.t              Test unicode \p{} regex constructs
 t/re/uniprops09.t              Test unicode \p{} regex constructs
 t/re/uniprops10.t              Test unicode \p{} regex constructs
+t/re/user_prop_race_thr.t      Test races in user-defined \p{} under threads
 t/README                       Instructions for regression tests
 t/run/cloexec.t                        Test close-on-exec.
 t/run/dtrace.pl                        For dtrace.t
diff --git a/dosish.h b/dosish.h
index 16ee9b7359..dff759b309 100644
--- a/dosish.h
+++ b/dosish.h
@@ -51,7 +51,7 @@
 #  define PERL_SYS_TERM_BODY()                         \
     HINTS_REFCNT_TERM; KEYWORD_PLUGIN_MUTEX_TERM;      \
     OP_CHECK_MUTEX_TERM; OP_REFCNT_TERM; PERLIO_TERM;  \
-    MALLOC_TERM; LOCALE_TERM;
+    MALLOC_TERM; LOCALE_TERM; USER_PROP_MUTEX_TERM;
 #endif
 #define dXSUB_SYS dNOOP
 
diff --git a/embed.fnc b/embed.fnc
index 9d4a8461f5..07a38a1b57 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1343,9 +1343,6 @@ Apmb      |OP*    |ref            |NULLOK OP* o|I32 type
 s      |OP*    |refkids        |NULLOK OP* o|I32 type
 #endif
 Ap     |void   |regdump        |NN const regexp* r
-ApM    |SV*    |regclass_swash |NULLOK const regexp *prog \
-                               |NN const struct regnode *node|bool doinit \
-                               |NULLOK SV **listsvp|NULLOK SV **altsvp
 #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_PERL_C) || 
defined(PERL_IN_UTF8_C)
 EXpR   |SV*    |_new_invlist_C_array|NN const UV* const list
 EXMp   |bool   |_invlistEQ     |NN SV* const a|NN SV* const b|const bool 
complement_b
@@ -1735,19 +1732,12 @@ EXpM    |void   |_invlist_union_maybe_complement_2nd    
    \
 EXmM   |void   |_invlist_subtract|NN SV* const a|NN SV* const b|NN SV** result
 EXpM   |void   |_invlist_invert|NN SV* const invlist
 EXMpR  |SV*    |_new_invlist   |IV initial_size
-EXMpR  |SV*    |_swash_to_invlist      |NN SV* const swash
 EXMpR  |SV*    |_add_range_to_invlist  |NULLOK SV* invlist|UV start|UV end
 EXMpR  |SV*    |_setup_canned_invlist|const STRLEN size|const UV element0|NN 
UV** other_elements_ptr
-EXMpn  |void   |_invlist_populate_swatch   |NN SV* const invlist|const UV 
start|const UV end|NN U8* swatch
 #endif
 #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_SV_C)
 EMpX   |SV*    |invlist_clone  |NN SV* const invlist|NULLOK SV* newlist
 #endif
-#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || 
defined(PERL_IN_UTF8_C) || defined(PERL_IN_TOKE_C)
-EXp    |SV*    |_core_swash_init|NN const char* pkg|NN const char* name \
-               |NN SV* listsv|I32 minbits|I32 none \
-               |NULLOK SV* invlist|NULLOK U8* const flags_p
-#endif
 #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || 
defined(PERL_IN_TOKE_C) || defined(PERL_IN_UTF8_C) || defined(PERL_IN_PP_C)
 EiMRn  |UV*    |invlist_array  |NN SV* const invlist
 EiMRn  |bool   |is_invlist     |NN SV* const invlist
@@ -1755,7 +1745,6 @@ EiMRn     |bool*  |get_invlist_offset_addr|NN SV* invlist
 EiMRn  |UV     |_invlist_len   |NN SV* const invlist
 EMiRn  |bool   |_invlist_contains_cp|NN SV* const invlist|const UV cp
 EXpMRn |SSize_t|_invlist_search        |NN SV* const invlist|const UV cp
-EXMpR  |SV*    |_get_swash_invlist|NN SV* const swash
 #endif
 #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C)
 EXpM   |SV*    |_get_regclass_nonbitmap_data                              \
@@ -2385,10 +2374,8 @@ Es       |regnode_offset|regbranch       |NN 
RExC_state_t *pRExC_state \
 Es     |void    |set_ANYOF_arg |NN RExC_state_t* const pRExC_state \
                                |NN regnode* const node                    \
                                |NULLOK SV* const cp_list                  \
-                               |NULLOK SV* const runtime_defns            \
-                               |NULLOK SV* const only_utf8_locale_list    \
-                               |NULLOK SV* const swash                    \
-                               |const bool has_user_defined_property
+                               |NULLOK SV* const runtime_defns            \
+                               |NULLOK SV* const only_utf8_locale_list
 Es     |void   |output_posix_warnings                                      \
                                |NN RExC_state_t *pRExC_state               \
                                |NN AV* posix_warnings
@@ -2513,10 +2500,23 @@ EnsR    |int    |edit_distance  |NN const UV *src       
            \
                                |const STRLEN x                     \
                                |const STRLEN y                     \
                                |const SSize_t maxDistance
-EXp    |SV *   |parse_uniprop_string|NN const char * const name           \
-                                    |const Size_t name_len                \
-                                    |const bool to_fold                   \
-                                    |NN bool * invert
+EpX    |SV *   |parse_uniprop_string|NN const char * const name            \
+                                    |const Size_t name_len                 \
+                                    |const bool is_utf8                    \
+                                    |const bool to_fold                    \
+                                    |const bool runtime                    \
+                                    |NN bool * user_defined_ptr            \
+                                    |NN SV * msg                           \
+                                    |const STRLEN level
+EXp    |SV *   |handle_user_defined_property|NN const char * name          \
+                                            |const STRLEN name_len         \
+                                            |const bool is_utf8            \
+                                            |const bool to_fold            \
+                                            |const bool runtime            \
+                                            |NN SV* contents               \
+                                            |NN bool *user_defined_ptr     \
+                                            |NN SV * msg                   \
+                                            |const STRLEN level
 #  ifdef DEBUGGING
 Ep     |int    |re_indentf     |NN const char *fmt|U32 depth|...
 Es     |void        |regdump_intflags|NULLOK const char *lead| const U32 flags
diff --git a/embed.h b/embed.h
index 4df6fa0b0f..fa1a3766eb 100644
--- a/embed.h
+++ b/embed.h
@@ -631,7 +631,6 @@
 #define reg_named_buff_firstkey(a,b)   Perl_reg_named_buff_firstkey(aTHX_ a,b)
 #define reg_named_buff_nextkey(a,b)    Perl_reg_named_buff_nextkey(aTHX_ a,b)
 #define reg_named_buff_scalar(a,b)     Perl_reg_named_buff_scalar(aTHX_ a,b)
-#define regclass_swash(a,b,c,d,e)      Perl_regclass_swash(aTHX_ a,b,c,d,e)
 #define regdump(a)             Perl_regdump(aTHX_ a)
 #define regexec_flags(a,b,c,d,e,f,g,h) Perl_regexec_flags(aTHX_ 
a,b,c,d,e,f,g,h)
 #define regfree_internal(a)    Perl_regfree_internal(aTHX_ a)
@@ -1188,6 +1187,7 @@
 #define handle_named_backref(a,b,c,d)  S_handle_named_backref(aTHX_ a,b,c,d)
 #define handle_possible_posix(a,b,c,d,e)       S_handle_possible_posix(aTHX_ 
a,b,c,d,e)
 #define handle_regex_sets(a,b,c,d,e)   S_handle_regex_sets(aTHX_ a,b,c,d,e)
+#define handle_user_defined_property(a,b,c,d,e,f,g,h,i)        
Perl_handle_user_defined_property(aTHX_ a,b,c,d,e,f,g,h,i)
 #define invlist_contents(a,b)  S_invlist_contents(aTHX_ a,b)
 #define invlist_highest                S_invlist_highest
 #define invlist_is_iterating   S_invlist_is_iterating
@@ -1201,7 +1201,7 @@
 #define nextchar(a)            S_nextchar(aTHX_ a)
 #define output_posix_warnings(a,b)     S_output_posix_warnings(aTHX_ a,b)
 #define parse_lparen_question_flags(a) S_parse_lparen_question_flags(aTHX_ a)
-#define parse_uniprop_string(a,b,c,d)  Perl_parse_uniprop_string(aTHX_ a,b,c,d)
+#define parse_uniprop_string(a,b,c,d,e,f,g,h)  Perl_parse_uniprop_string(aTHX_ 
a,b,c,d,e,f,g,h)
 #define populate_ANYOF_from_invlist(a,b)       
S_populate_ANYOF_from_invlist(aTHX_ a,b)
 #define reg(a,b,c,d)           S_reg(aTHX_ a,b,c,d)
 #define reg2Lanode(a,b,c,d)    S_reg2Lanode(aTHX_ a,b,c,d)
@@ -1218,7 +1218,7 @@
 #define regpiece(a,b,c)                S_regpiece(aTHX_ a,b,c)
 #define regtail(a,b,c,d)       S_regtail(aTHX_ a,b,c,d)
 #define scan_commit(a,b,c,d)   S_scan_commit(aTHX_ a,b,c,d)
-#define set_ANYOF_arg(a,b,c,d,e,f,g)   S_set_ANYOF_arg(aTHX_ a,b,c,d,e,f,g)
+#define set_ANYOF_arg(a,b,c,d,e)       S_set_ANYOF_arg(aTHX_ a,b,c,d,e)
 #define set_regex_pv(a,b)      S_set_regex_pv(aTHX_ a,b)
 #define skip_to_be_ignored_text(a,b,c) S_skip_to_be_ignored_text(aTHX_ a,b,c)
 #define ssc_add_range(a,b,c)   S_ssc_add_range(aTHX_ a,b,c)
@@ -1250,7 +1250,6 @@
 #define regprop(a,b,c,d,e)     Perl_regprop(aTHX_ a,b,c,d,e)
 #  endif
 #  if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || 
defined(PERL_IN_TOKE_C) || defined(PERL_IN_UTF8_C) || defined(PERL_IN_PP_C)
-#define _get_swash_invlist(a)  Perl__get_swash_invlist(aTHX_ a)
 #define _invlist_contains_cp   S__invlist_contains_cp
 #define _invlist_len           S__invlist_len
 #define _invlist_search                Perl__invlist_search
@@ -1258,9 +1257,6 @@
 #define invlist_array          S_invlist_array
 #define is_invlist             S_is_invlist
 #  endif
-#  if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || 
defined(PERL_IN_UTF8_C) || defined(PERL_IN_TOKE_C)
-#define _core_swash_init(a,b,c,d,e,f,g)        Perl__core_swash_init(aTHX_ 
a,b,c,d,e,f,g)
-#  endif
 #  if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_SV_C)
 #define invlist_clone(a,b)     Perl_invlist_clone(aTHX_ a,b)
 #  endif
@@ -1275,11 +1271,9 @@
 #define _add_range_to_invlist(a,b,c)   Perl__add_range_to_invlist(aTHX_ a,b,c)
 #define _invlist_intersection_maybe_complement_2nd(a,b,c,d)    
Perl__invlist_intersection_maybe_complement_2nd(aTHX_ a,b,c,d)
 #define _invlist_invert(a)     Perl__invlist_invert(aTHX_ a)
-#define _invlist_populate_swatch       Perl__invlist_populate_swatch
 #define _invlist_union_maybe_complement_2nd(a,b,c,d)   
Perl__invlist_union_maybe_complement_2nd(aTHX_ a,b,c,d)
 #define _new_invlist(a)                Perl__new_invlist(aTHX_ a)
 #define _setup_canned_invlist(a,b,c)   Perl__setup_canned_invlist(aTHX_ a,b,c)
-#define _swash_to_invlist(a)   Perl__swash_to_invlist(aTHX_ a)
 #  endif
 #  if defined(PERL_IN_REGEXEC_C)
 #define advance_one_LB(a,b,c)  S_advance_one_LB(aTHX_ a,b,c)
diff --git a/embedvar.h b/embedvar.h
index 787d046a34..705be5ddf2 100644
--- a/embedvar.h
+++ b/embedvar.h
@@ -468,6 +468,12 @@
 #define PL_Gtimesbase          (my_vars->Gtimesbase)
 #define PL_use_safe_putenv     (my_vars->Guse_safe_putenv)
 #define PL_Guse_safe_putenv    (my_vars->Guse_safe_putenv)
+#define PL_user_def_props      (my_vars->Guser_def_props)
+#define PL_Guser_def_props     (my_vars->Guser_def_props)
+#define PL_user_def_props_aTHX (my_vars->Guser_def_props_aTHX)
+#define PL_Guser_def_props_aTHX        (my_vars->Guser_def_props_aTHX)
+#define PL_user_prop_mutex     (my_vars->Guser_prop_mutex)
+#define PL_Guser_prop_mutex    (my_vars->Guser_prop_mutex)
 #define PL_utf8_charname_begin (my_vars->Gutf8_charname_begin)
 #define PL_Gutf8_charname_begin        (my_vars->Gutf8_charname_begin)
 #define PL_utf8_charname_continue      (my_vars->Gutf8_charname_continue)
diff --git a/lib/utf8_heavy.pl b/lib/utf8_heavy.pl
index 8882cf4d84..22cee9e4af 100644
--- a/lib/utf8_heavy.pl
+++ b/lib/utf8_heavy.pl
@@ -75,9 +75,6 @@ sub _loose_name ($) {
         ##
         ## Callers of swash_init:
         ##     op.c:pmtrans             -- for tr/// and y///
-        ##     regexec.c:regclass_swash -- for /[]/, \p, and \P
-        ##     utf8.c:is_utf8_common    -- for common Unicode properties
-        ##     utf8.c:S__to_utf8_case   -- for lc, uc, ucfirst, etc. and //i
         ##     Unicode::UCD::prop_invlist
         ##     Unicode::UCD::prop_invmap
         ##
diff --git a/makedef.pl b/makedef.pl
index e5ee6b9085..2e4e6dcda0 100644
--- a/makedef.pl
+++ b/makedef.pl
@@ -352,6 +352,8 @@ if ($define{'PERL_USE_SAFE_PUTENV'}) {
 
 unless ($define{'USE_ITHREADS'}) {
     ++$skip{PL_thr_key};
+    ++$skip{PL_user_prop_mutex};
+    ++$skip{PL_user_def_props_aTHX};
 }
 
 # USE_5005THREADS symbols. Kept as reference for easier removal
diff --git a/perl.c b/perl.c
index d82e1e720a..3c49f9650f 100644
--- a/perl.c
+++ b/perl.c
@@ -95,6 +95,7 @@ S_init_tls_and_interp(PerlInterpreter *my_perl)
         KEYWORD_PLUGIN_MUTEX_INIT;
        HINTS_REFCNT_INIT;
         LOCALE_INIT;
+        USER_PROP_MUTEX_INIT;
        MUTEX_INIT(&PL_dollarzero_mutex);
        MUTEX_INIT(&PL_my_ctx_mutex);
 #  endif
diff --git a/perl.h b/perl.h
index dd66b120af..757fc7083c 100644
--- a/perl.h
+++ b/perl.h
@@ -5991,11 +5991,19 @@ typedef struct am_table_short AMTS;
 #  define KEYWORD_PLUGIN_MUTEX_LOCK    MUTEX_LOCK(&PL_keyword_plugin_mutex)
 #  define KEYWORD_PLUGIN_MUTEX_UNLOCK  MUTEX_UNLOCK(&PL_keyword_plugin_mutex)
 #  define KEYWORD_PLUGIN_MUTEX_TERM    MUTEX_DESTROY(&PL_keyword_plugin_mutex)
+#  define USER_PROP_MUTEX_INIT    MUTEX_INIT(&PL_user_prop_mutex)
+#  define USER_PROP_MUTEX_LOCK    MUTEX_LOCK(&PL_user_prop_mutex)
+#  define USER_PROP_MUTEX_UNLOCK  MUTEX_UNLOCK(&PL_user_prop_mutex)
+#  define USER_PROP_MUTEX_TERM    MUTEX_DESTROY(&PL_user_prop_mutex)
 #else
 #  define KEYWORD_PLUGIN_MUTEX_INIT    NOOP
 #  define KEYWORD_PLUGIN_MUTEX_LOCK    NOOP
 #  define KEYWORD_PLUGIN_MUTEX_UNLOCK  NOOP
 #  define KEYWORD_PLUGIN_MUTEX_TERM    NOOP
+#  define USER_PROP_MUTEX_INIT    NOOP
+#  define USER_PROP_MUTEX_LOCK    NOOP
+#  define USER_PROP_MUTEX_UNLOCK  NOOP
+#  define USER_PROP_MUTEX_TERM    NOOP
 #endif
 
 #ifdef USE_LOCALE /* These locale things are all subject to change */
diff --git a/perlapi.h b/perlapi.h
index de4267aa75..f08bd60a42 100644
--- a/perlapi.h
+++ b/perlapi.h
@@ -215,6 +215,12 @@ END_EXTERN_C
 #define PL_timesbase           (*Perl_Gtimesbase_ptr(NULL))
 #undef  PL_use_safe_putenv
 #define PL_use_safe_putenv     (*Perl_Guse_safe_putenv_ptr(NULL))
+#undef  PL_user_def_props
+#define PL_user_def_props      (*Perl_Guser_def_props_ptr(NULL))
+#undef  PL_user_def_props_aTHX
+#define PL_user_def_props_aTHX (*Perl_Guser_def_props_aTHX_ptr(NULL))
+#undef  PL_user_prop_mutex
+#define PL_user_prop_mutex     (*Perl_Guser_prop_mutex_ptr(NULL))
 #undef  PL_utf8_charname_begin
 #define PL_utf8_charname_begin (*Perl_Gutf8_charname_begin_ptr(NULL))
 #undef  PL_utf8_charname_continue
diff --git a/perlvars.h b/perlvars.h
index 8a4ff6a47b..51c939e128 100644
--- a/perlvars.h
+++ b/perlvars.h
@@ -307,6 +307,17 @@ PERLVAR(G, utf8_mark,      SV *)
 PERLVAR(G, InBitmap,   SV *)
 PERLVAR(G, CCC_non0_non230,    SV *)
 
+/* Definitions of user-defined \p{} properties, as the subs that define them
+ * are only called once */
+PERLVARI(G, user_def_props,    HV *, NULL)
+
+#if defined(USE_ITHREADS)
+PERLVAR(G, user_def_props_aTHX, PerlInterpreter *)  /* aTHX that user_def_props
+                                                       was defined in */
+PERLVAR(G, user_prop_mutex, perl_mutex)    /* Mutex for manipulating
+                                              PL_user_defined_properties */
+#endif
+
 /* Everything that folds to a given character, for case insensitivity regex
  * matching */
 PERLVAR(G, utf8_foldclosures, SV *)
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 6bed9abe59..74c9bdf799 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -316,6 +316,72 @@ trees.
 
 Avoid leak in multiconcat with overloading. [perl #133789]
 
+=item *
+
+The handling of user-defined C<\p{}> properties (see
+L<perlunicode/User-Defined Character Properties>) has been rewritten to
+be in C (instead of Perl).  This speeds things up, but in the process
+several inconsistencies and bug fixes are made.
+
+=over
+
+=item 1
+
+A few error messages have minor wording changes.  This is essentially
+because the new way is integrated into the regex error handling
+mechanism that marks the position in the input at which the error
+occurred.  That was not possible previously.  The messages now also
+contain additional back-trace-like information in case the error occurs
+deep in nested calls.
+
+=item 2
+
+A user-defined property is implemented as a perl subroutine with certain
+highly constrained naming conventions.  It was documented previously
+that the sub would be in the current package if the package was
+unspecified.  This turned out not to be true in all cases, but now it
+is.
+
+=item 3
+
+All recursive calls are treated as infinite recursion.  Previously they
+would cause the interpreter to panic.  Now, they cause the regex pattern
+to fail to compile.
+
+=item 4
+
+Similarly, any other error likely would lead to a panic; now to just the
+pattern failing to compile.
+
+=item 5
+
+The old mechanism did not detect illegal ranges in the definition of the
+property.  Now, the range max must not be smaller than the range min.
+Otherwise, the pattern fails to compile.
+
+=item 6
+
+The intention was to have each sub called only once during the lifetime
+of the program, so that a property's definition is immutable.  This was
+relaxed so that it could be called once for all /i compilations, and
+potentially a second time for non-/i (the sub is passed a parameter
+indicating which).  However, in practice there were instances when this
+was broken, and multiple calls were possible.  Those have been fixed.
+Now (besides the /i,non-/i cases) the only way a sub can be called
+multiple times is if some component of it has not been defined yet.  For
+example, suppose we have sub IsA() whose definition is known at compile
+time, and it in turn calls isB() whose definition is not yet known.
+isA() will be called each time a pattern it appears in is compiled.  If
+isA() also calls isC() and that definition is known, isC() will be
+called just once.
+
+=item 7
+
+There were some races and very long hangs should one thread be compiling
+the same property as another simultaneously.  These have now been fixed.
+
+=back
+
 =back
 
 =head1 Acknowledgements
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index a7f87a1ae6..d6931e4d02 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -965,7 +965,8 @@ A single hexadecimal number denoting a code point to 
include.
 =item *
 
 Two hexadecimal numbers separated by horizontal whitespace (space or
-tabular characters) denoting a range of code points to include.
+tabular characters) denoting a range of code points to include.  The
+second number must not be smaller than the first.
 
 =item *
 
diff --git a/proto.h b/proto.h
index adf1ef5d40..680733cf9d 100644
--- a/proto.h
+++ b/proto.h
@@ -2862,9 +2862,6 @@ PERL_CALLCONV SV* Perl_reg_qr_package(pTHX_ REGEXP * 
const rx);
 PERL_CALLCONV REGEXP*  Perl_reg_temp_copy(pTHX_ REGEXP* dsv, REGEXP* ssv);
 #define PERL_ARGS_ASSERT_REG_TEMP_COPY \
        assert(ssv)
-PERL_CALLCONV SV*      Perl_regclass_swash(pTHX_ const regexp *prog, const 
struct regnode *node, bool doinit, SV **listsvp, SV **altsvp);
-#define PERL_ARGS_ASSERT_REGCLASS_SWASH        \
-       assert(node)
 PERL_CALLCONV void     Perl_regdump(pTHX_ const regexp* r);
 #define PERL_ARGS_ASSERT_REGDUMP       \
        assert(r)
@@ -5446,6 +5443,9 @@ STATIC int        S_handle_possible_posix(pTHX_ 
RExC_state_t *pRExC_state, const char*
 STATIC regnode_offset  S_handle_regex_sets(pTHX_ RExC_state_t *pRExC_state, SV 
** return_invlist, I32 *flagp, U32 depth, char * const oregcomp_parse);
 #define PERL_ARGS_ASSERT_HANDLE_REGEX_SETS     \
        assert(pRExC_state); assert(flagp); assert(oregcomp_parse)
+PERL_CALLCONV SV *     Perl_handle_user_defined_property(pTHX_ const char * 
name, const STRLEN name_len, const bool is_utf8, const bool to_fold, const bool 
runtime, SV* contents, bool *user_defined_ptr, SV * msg, const STRLEN level);
+#define PERL_ARGS_ASSERT_HANDLE_USER_DEFINED_PROPERTY  \
+       assert(name); assert(contents); assert(user_defined_ptr); assert(msg)
 STATIC SV*     S_invlist_contents(pTHX_ SV* const invlist, const bool 
traditional_style)
                        __attribute__warn_unused_result__;
 #define PERL_ARGS_ASSERT_INVLIST_CONTENTS      \
@@ -5503,9 +5503,9 @@ STATIC void       S_output_posix_warnings(pTHX_ 
RExC_state_t *pRExC_state, AV* posix_w
 STATIC void    S_parse_lparen_question_flags(pTHX_ RExC_state_t *pRExC_state);
 #define PERL_ARGS_ASSERT_PARSE_LPAREN_QUESTION_FLAGS   \
        assert(pRExC_state)
-PERL_CALLCONV SV *     Perl_parse_uniprop_string(pTHX_ const char * const 
name, const Size_t name_len, const bool to_fold, bool * invert);
+PERL_CALLCONV SV *     Perl_parse_uniprop_string(pTHX_ const char * const 
name, const Size_t name_len, const bool is_utf8, const bool to_fold, const bool 
runtime, bool * user_defined_ptr, SV * msg, const STRLEN level);
 #define PERL_ARGS_ASSERT_PARSE_UNIPROP_STRING  \
-       assert(name); assert(invert)
+       assert(name); assert(user_defined_ptr); assert(msg)
 STATIC void    S_populate_ANYOF_from_invlist(pTHX_ regnode *node, SV** 
invlist_ptr);
 #define PERL_ARGS_ASSERT_POPULATE_ANYOF_FROM_INVLIST   \
        assert(node); assert(invlist_ptr)
@@ -5561,7 +5561,7 @@ STATIC void       S_regtail(pTHX_ RExC_state_t * 
pRExC_state, const regnode_offset p,
 STATIC void    S_scan_commit(pTHX_ const RExC_state_t *pRExC_state, struct 
scan_data_t *data, SSize_t *minlenp, int is_inf);
 #define PERL_ARGS_ASSERT_SCAN_COMMIT   \
        assert(pRExC_state); assert(data); assert(minlenp)
-STATIC void    S_set_ANYOF_arg(pTHX_ RExC_state_t* const pRExC_state, regnode* 
const node, SV* const cp_list, SV* const runtime_defns, SV* const 
only_utf8_locale_list, SV* const swash, const bool has_user_defined_property);
+STATIC void    S_set_ANYOF_arg(pTHX_ RExC_state_t* const pRExC_state, regnode* 
const node, SV* const cp_list, SV* const runtime_defns, SV* const 
only_utf8_locale_list);
 #define PERL_ARGS_ASSERT_SET_ANYOF_ARG \
        assert(pRExC_state); assert(node)
 STATIC void    S_set_regex_pv(pTHX_ RExC_state_t *pRExC_state, REGEXP *Rx);
@@ -5651,11 +5651,6 @@ PERL_CALLCONV void       Perl_regprop(pTHX_ const regexp 
*prog, SV* sv, const regnode*
        assert(sv); assert(o)
 #endif
 #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || 
defined(PERL_IN_TOKE_C) || defined(PERL_IN_UTF8_C) || defined(PERL_IN_PP_C)
-PERL_CALLCONV SV*      Perl__get_swash_invlist(pTHX_ SV* const swash)
-                       __attribute__warn_unused_result__;
-#define PERL_ARGS_ASSERT__GET_SWASH_INVLIST    \
-       assert(swash)
-
 #ifndef PERL_NO_INLINE_FUNCTIONS
 PERL_STATIC_INLINE bool        S__invlist_contains_cp(SV* const invlist, const 
UV cp)
                        __attribute__warn_unused_result__;
@@ -5696,11 +5691,6 @@ PERL_STATIC_INLINE bool  S_is_invlist(SV* const invlist)
        assert(invlist)
 #endif
 
-#endif
-#if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || 
defined(PERL_IN_UTF8_C) || defined(PERL_IN_TOKE_C)
-PERL_CALLCONV SV*      Perl__core_swash_init(pTHX_ const char* pkg, const 
char* name, SV* listsv, I32 minbits, I32 none, SV* invlist, U8* const flags_p);
-#define PERL_ARGS_ASSERT__CORE_SWASH_INIT      \
-       assert(pkg); assert(name); assert(listsv)
 #endif
 #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_SV_C)
 PERL_CALLCONV SV*      Perl_invlist_clone(pTHX_ SV* const invlist, SV* 
newlist);
@@ -5747,9 +5737,6 @@ PERL_CALLCONV void        
Perl__invlist_intersection_maybe_complement_2nd(pTHX_ SV* con
 PERL_CALLCONV void     Perl__invlist_invert(pTHX_ SV* const invlist);
 #define PERL_ARGS_ASSERT__INVLIST_INVERT       \
        assert(invlist)
-PERL_CALLCONV void     Perl__invlist_populate_swatch(SV* const invlist, const 
UV start, const UV end, U8* swatch);
-#define PERL_ARGS_ASSERT__INVLIST_POPULATE_SWATCH      \
-       assert(invlist); assert(swatch)
 /* PERL_CALLCONV void  _invlist_subtract(pTHX_ SV* const a, SV* const b, SV** 
result); */
 /* PERL_CALLCONV void  _invlist_union(pTHX_ SV* const a, SV* const b, SV** 
output); */
 PERL_CALLCONV void     Perl__invlist_union_maybe_complement_2nd(pTHX_ SV* 
const a, SV* const b, const bool complement_b, SV** output);
@@ -5763,11 +5750,6 @@ PERL_CALLCONV SV*        
Perl__setup_canned_invlist(pTHX_ const STRLEN size, const UV e
 #define PERL_ARGS_ASSERT__SETUP_CANNED_INVLIST \
        assert(other_elements_ptr)
 
-PERL_CALLCONV SV*      Perl__swash_to_invlist(pTHX_ SV* const swash)
-                       __attribute__warn_unused_result__;
-#define PERL_ARGS_ASSERT__SWASH_TO_INVLIST     \
-       assert(swash)
-
 #endif
 #if defined(PERL_IN_REGEXEC_C)
 STATIC LB_enum S_advance_one_LB(pTHX_ U8 ** curpos, const U8 * const strend, 
const bool utf8_target)
diff --git a/regcomp.c b/regcomp.c
index b434edb241..41d2582aad 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -1546,6 +1546,10 @@ S_ssc_is_cp_posixl_init(const RExC_state_t *pRExC_state,
     return TRUE;
 }
 
+#define INVLIST_INDEX 0
+#define ONLY_LOCALE_MATCHES_INDEX 1
+#define DEFERRED_USER_DEFINED_INDEX 2
+
 STATIC SV*
 S_get_ANYOF_cp_list_for_ssc(pTHX_ const RExC_state_t *pRExC_state,
                                const regnode_charclass* const node)
@@ -1571,28 +1575,24 @@ S_get_ANYOF_cp_list_for_ssc(pTHX_ const RExC_state_t 
*pRExC_state,
         SV **const ary = AvARRAY(av);
         assert(RExC_rxi->data->what[n] == 's');
 
-        if (ary[1] && ary[1] != &PL_sv_undef) { /* Has compile-time swash */
-            invlist = sv_2mortal(invlist_clone(_get_swash_invlist(ary[1]), 
NULL));
-        }
-        else if (ary[0] && ary[0] != &PL_sv_undef) {
+        if (av_tindex_skip_len_mg(av) >= DEFERRED_USER_DEFINED_INDEX) {
 
-            /* Here, no compile-time swash, and there are things that won't be
-             * known until runtime -- we have to assume it could be anything */
+            /* Here there are things that won't be known until runtime -- we
+             * have to assume it could be anything */
             invlist = sv_2mortal(_new_invlist(1));
             return _add_range_to_invlist(invlist, 0, UV_MAX);
         }
-        else if (ary[3] && ary[3] != &PL_sv_undef) {
+        else if (ary[INVLIST_INDEX]) {
 
-            /* Here no compile-time swash, and no run-time only data.  Use the
-             * node's inversion list */
-            invlist = sv_2mortal(invlist_clone(ary[3], NULL));
+            /* Use the node's inversion list */
+            invlist = sv_2mortal(invlist_clone(ary[INVLIST_INDEX], NULL));
         }
 
         /* Get the code points valid only under UTF-8 locales */
-        if ((ANYOF_FLAGS(node) & ANYOFL_FOLD)
-            && ary[2] && ary[2] != &PL_sv_undef)
+        if (   (ANYOF_FLAGS(node) & ANYOFL_FOLD)
+            &&  av_tindex_skip_len_mg(av) >= ONLY_LOCALE_MATCHES_INDEX)
         {
-            only_utf8_locale_invlist = ary[2];
+            only_utf8_locale_invlist = ary[ONLY_LOCALE_MATCHES_INDEX];
         }
     }
 
@@ -2109,8 +2109,7 @@ S_ssc_finalize(pTHX_ RExC_state_t *pRExC_state, 
regnode_ssc *ssc)
 
     populate_ANYOF_from_invlist( (regnode *) ssc, &invlist);
 
-    set_ANYOF_arg(pRExC_state, (regnode *) ssc, invlist,
-                                NULL, NULL, NULL, FALSE);
+    set_ANYOF_arg(pRExC_state, (regnode *) ssc, invlist, NULL, NULL);
 
     /* Make sure is clone-safe */
     ssc->invlist = NULL;
@@ -9130,9 +9129,7 @@ Perl__new_invlist(pTHX_ IV initial_size)
        initial_size = 10;
     }
 
-    /* Allocate the initial space */
     new_list = newSV_type(SVt_INVLIST);
-
     initialize_invlist_guts(new_list, initial_size);
 
     return new_list;
@@ -9387,100 +9384,6 @@ Perl__invlist_search(SV* const invlist, const UV cp)
     return high;
 }
 
-void
-Perl__invlist_populate_swatch(SV* const invlist,
-                              const UV start, const UV end, U8* swatch)
-{
-    /* populates a swatch of a swash the same way swatch_get() does in utf8.c,
-     * but is used when the swash has an inversion list.  This makes this much
-     * faster, as it uses a binary search instead of a linear one.  This is
-     * intimately tied to that function, and perhaps should be in utf8.c,
-     * except it is intimately tied to inversion lists as well.  It assumes
-     * that <swatch> is all 0's on input */
-
-    UV current = start;
-    const IV len = _invlist_len(invlist);
-    IV i;
-    const UV * array;
-
-    PERL_ARGS_ASSERT__INVLIST_POPULATE_SWATCH;
-
-    if (len == 0) { /* Empty inversion list */
-        return;
-    }
-
-    array = invlist_array(invlist);
-
-    /* Find which element it is */
-    i = _invlist_search(invlist, start);
-
-    /* We populate from <start> to <end> */
-    while (current < end) {
-        UV upper;
-
-       /* The inversion list gives the results for every possible code point
-        * after the first one in the list.  Only those ranges whose index is
-        * even are ones that the inversion list matches.  For the odd ones,
-        * and if the initial code point is not in the list, we have to skip
-        * forward to the next element */
-        if (i == -1 || ! ELEMENT_RANGE_MATCHES_INVLIST(i)) {
-            i++;
-            if (i >= len) { /* Finished if beyond the end of the array */
-                return;
-            }
-            current = array[i];
-           if (current >= end) {   /* Finished if beyond the end of what we
-                                      are populating */
-                if (LIKELY(end < UV_MAX)) {
-                    return;
-                }
-
-                /* We get here when the upper bound is the maximum
-                 * representable on the machine, and we are looking for just
-                 * that code point.  Have to special case it */
-                i = len;
-                goto join_end_of_list;
-            }
-        }
-        assert(current >= start);
-
-       /* The current range ends one below the next one, except don't go past
-        * <end> */
-        i++;
-        upper = (i < len && array[i] < end) ? array[i] : end;
-
-       /* Here we are in a range that matches.  Populate a bit in the 3-bit U8
-        * for each code point in it */
-        for (; current < upper; current++) {
-            const STRLEN offset = (STRLEN)(current - start);
-            swatch[offset >> 3] |= 1 << (offset & 7);
-        }
-
-      join_end_of_list:
-
-       /* Quit if at the end of the list */
-        if (i >= len) {
-
-           /* But first, have to deal with the highest possible code point on
-            * the platform.  The previous code assumes that <end> is one
-            * beyond where we want to populate, but that is impossible at the
-            * platform's infinity, so have to handle it specially */
-            if (UNLIKELY(end == UV_MAX && 
ELEMENT_RANGE_MATCHES_INVLIST(len-1)))
-           {
-                const STRLEN offset = (STRLEN)(end - start);
-                swatch[offset >> 3] |= 1 << (offset & 7);
-            }
-            return;
-        }
-
-       /* Advance to the next range, which will be for code points not in the
-        * inversion list */
-        current = array[i];
-    }
-
-    return;
-}
-
 void
 Perl__invlist_union_maybe_complement_2nd(pTHX_ SV* const a, SV* const b,
                                          const bool complement_b, SV** output)
@@ -10317,18 +10220,15 @@ Perl__invlist_invert(pTHX_ SV* const invlist)
 SV*
 Perl_invlist_clone(pTHX_ SV* const invlist, SV* new_invlist)
 {
-
     /* Return a new inversion list that is a copy of the input one, which is
      * unchanged.  The new list will not be mortal even if the old one was. */
 
-    const STRLEN nominal_length = _invlist_len(invlist);    /* Why not +1 XXX 
*/
+    const STRLEN nominal_length = _invlist_len(invlist);
     const STRLEN physical_length = SvCUR(invlist);
     const bool offset = *(get_invlist_offset_addr(invlist));
 
     PERL_ARGS_ASSERT_INVLIST_CLONE;
 
-    /* Need to allocate extra space to accommodate Perl's addition of a
-     * trailing NUL to SvPV's, since it thinks they are always strings */
     if (new_invlist == NULL) {
         new_invlist = _new_invlist(nominal_length);
     }
@@ -16559,7 +16459,7 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, 
U32 depth,
      *
      * ANYOF nodes contain a bit map for the first NUM_ANYOF_CODE_POINTS
      * characters, with the corresponding bit set if that character is in the
-     * list.  For characters above this, a range list or swash is used.  There
+     * list.  For characters above this, an inversion list is used.  There
      * are extra bits for \w, etc. in locale ANYOFs, as what these match is not
      * determinable at compile time
      *
@@ -16578,7 +16478,8 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, 
U32 depth,
     STRLEN numlen;
     int namedclass = OOB_NAMEDCLASS;
     char *rangebegin = NULL;
-    SV *listsv = NULL;
+    SV *listsv = NULL;      /* List of \p{user-defined} whose definitions
+                               aren't available at the time this was called */
     STRLEN initial_listsv_len = 0; /* Kind of a kludge to see if it is more
                                      than just initialized.  */
     SV* properties = NULL;    /* Code points that match \p{} \P{} */
@@ -16607,14 +16508,6 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
     const bool skip_white = cBOOL(   ret_invlist
                                   || (RExC_flags & RXf_PMf_EXTENDED_MORE));
 
-    /* Unicode properties are stored in a swash; this holds the current one
-     * being parsed.  If this swash is the only above-latin1 component of the
-     * character class, an optimization is to pass it directly on to the
-     * execution engine.  Otherwise, it is set to NULL to indicate that there
-     * are other things in the class that have to be dealt with at execution
-     * time */
-    SV* swash = NULL;          /* Code points that match \p{} \P{} */
-
     /* inversion list of code points this node matches only when the target
      * string is in UTF-8.  These are all non-ASCII, < 256.  (Because is under
      * /d) */
@@ -16696,7 +16589,9 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, 
U32 depth,
     allow_multi_folds = FALSE;
 #endif
 
-    listsv = newSVpvs_flags("# comment\n", SVs_TEMP);
+    /* We include the /i status at the beginning of this so that we can
+     * know it at runtime */
+    listsv = sv_2mortal(Perl_newSVpvf(aTHX_ "#%d\n", cBOOL(FOLD)));
     initial_listsv_len = SvCUR(listsv);
     SvTEMP_off(listsv); /* Grr, TEMPs and mortals are conflated.  */
 
@@ -16935,17 +16830,6 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
            case 'P':
                {
                char *e;
-                char *i;
-
-                /* We will handle any undefined properties ourselves */
-                U8 swash_init_flags = _CORE_SWASH_INIT_RETURN_IF_UNDEF
-                                       /* And we actually would prefer to get
-                                        * the straight inversion list of the
-                                        * swash, since we will be accessing it
-                                        * anyway, to save a little time */
-                                      |_CORE_SWASH_INIT_ACCEPT_INVLIST;
-
-                SvREFCNT_dec(swash); /* Free any left-overs */
 
                /* \p means they want Unicode semantics */
                REQUIRE_UNI_RULES(flagp, 0);
@@ -17001,140 +16885,49 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
                }
                {
                     char* name = RExC_parse;
-                    char* base_name;    /* name after any packages are 
stripped */
-                    char* lookup_name = NULL;
-                    const char * const colon_colon = "::";
-                    bool invert;
-
-                    SV* invlist;
-
-                    /* Temporary workaround for [perl #133136].  For this
-                    * precise input that is in the .t that is failing, load
-                    * utf8.pm, which is what the test wants, so that that
-                    * .t passes */
-                    if (     memEQs(RExC_start, e + 1 - RExC_start,
-                                    "foo\\p{Alnum}")
-                        && ! hv_common(GvHVn(PL_incgv),
-                                       NULL,
-                                       "utf8.pm", sizeof("utf8.pm") - 1,
-                                       0, HV_FETCH_ISEXISTS, NULL, 0))
-                    {
-                        require_pv("utf8.pm");
-                    }
-                    invlist = parse_uniprop_string(name, n, FOLD, &invert);
-                    if (invlist) {
-                        if (invert) {
-                            value ^= 'P' ^ 'p';
-                        }
-                    }
-                    else {
 
-                    /* Try to get the definition of the property into
-                     * <invlist>.  If /i is in effect, the effective property
-                     * will have its name be <__NAME_i>.  The design is
-                     * discussed in commit
-                     * 2f833f5208e26b208886e51e09e2c072b5eabb46 */
-                    name = savepv(Perl_form(aTHX_ "%.*s", (int)n, RExC_parse));
-                    SAVEFREEPV(name);
-
-                    for (i = RExC_parse; i < RExC_parse + n; i++) {
-                        if (isCNTRL(*i) && *i != '\t') {
-                            RExC_parse = e + 1;
-                            vFAIL2("Can't find Unicode property definition 
\"%s\"", name);
+                    /* Any message returned about expanding the definition */
+                    SV* msg = newSVpvs_flags("", SVs_TEMP);
+
+                    /* If set TRUE, the property is user-defined as opposed to
+                     * official Unicode */
+                    bool user_defined = FALSE;
+
+                    SV * prop_definition = parse_uniprop_string(
+                                            name, n, UTF, FOLD,
+                                            FALSE, /* This is compile-time */
+                                            &user_defined,
+                                            msg,
+                                            0 /* Base level */
+                                           );
+                    if (SvCUR(msg)) {   /* Assumes any error causes a msg */
+                        assert(prop_definition == NULL);
+                        RExC_parse = e + 1;
+                        if (SvUTF8(msg)) {  /* msg being UTF-8 makes the whole
+                                               thing so, or else the display is
+                                               mojibake */
+                            RExC_utf8 = TRUE;
                         }
+                       /* diag_listed_as: Can't find Unicode property 
definition "%s" in regex; marked by <-- HERE in m/%s/ */
+                        vFAIL2utf8f("%" UTF8f, UTF8fARG(SvUTF8(msg),
+                                    SvCUR(msg), SvPVX(msg)));
                     }
 
-                    if (FOLD) {
-                        lookup_name = savepv(Perl_form(aTHX_ "__%s_i", name));
-
-                        /* The function call just below that uses this can fail
-                         * to return, leaking memory if we don't do this */
-                        SAVEFREEPV(lookup_name);
-                    }
-
-                    /* Look up the property name, and get its swash and
-                     * inversion list, if the property is found  */
-                    swash = _core_swash_init("utf8",
-                                             (lookup_name)
-                                              ? lookup_name
-                                              : name,
-                                             &PL_sv_undef,
-                                             1, /* binary */
-                                             0, /* not tr/// */
-                                             NULL, /* No inversion list */
-                                             &swash_init_flags
-                                            );
-                    if (! swash || ! (invlist = _get_swash_invlist(swash))) {
-                        HV* curpkg = (IN_PERL_COMPILETIME)
-                                      ? PL_curstash
-                                      : CopSTASH(PL_curcop);
-                        UV final_n = n;
-                        bool has_pkg;
-
-                        if (swash) {    /* Got a swash but no inversion list.
-                                           Something is likely wrong that will
-                                           be sorted-out later */
-                            SvREFCNT_dec_NN(swash);
-                            swash = NULL;
-                        }
+                    if (! is_invlist(prop_definition)) {
 
-                        /* Here didn't find it.  It could be a an error (like a
-                         * typo) in specifying a Unicode property, or it could
-                         * be a user-defined property that will be available at
-                         * run-time.  The names of these must begin with 'In'
-                         * or 'Is' (after any packages are stripped off).  So
-                         * if not one of those, or if we accept only
-                         * compile-time properties, is an error; otherwise add
-                         * it to the list for run-time look up. */
-                        if ((base_name = rninstr(name, name + n,
-                                                 colon_colon, colon_colon + 
2)))
-                        { /* Has ::.  We know this must be a user-defined
-                             property */
-                            base_name += 2;
-                            final_n -= base_name - name;
-                            has_pkg = TRUE;
+                        /* Here, the definition isn't known, so we have gotten
+                         * returned a string that will be evaluated if and when
+                         * encountered at runtime.  We add it to the list of
+                         * such properties, along with whether it should be
+                         * complemented or not */
+                        if (value == 'P') {
+                            sv_catpvs(listsv, "!");
                         }
                         else {
-                            base_name = name;
-                            has_pkg = FALSE;
-                        }
-
-                        if (   final_n < 3
-                            || base_name[0] != 'I'
-                            || (base_name[1] != 's' && base_name[1] != 'n')
-                            || ret_invlist)
-                        {
-                            const char * const msg
-                                = (has_pkg)
-                                  ? "Illegal user-defined property name"
-                                  : "Can't find Unicode property definition";
-                            RExC_parse = e + 1;
-
-                            /* diag_listed_as: Can't find Unicode property 
definition "%s" */
-                            vFAIL3utf8f("%s \"%" UTF8f "\"",
-                                msg, UTF8fARG(UTF, n, name));
+                            sv_catpvs(listsv, "+");
                         }
+                        sv_catsv(listsv, prop_definition);
 
-                        /* If the property name doesn't already have a package
-                         * name, add the current one to it so that it can be
-                         * referred to outside it. [perl #121777] */
-                        if (! has_pkg && curpkg) {
-                            char* pkgname = HvNAME(curpkg);
-                            if (memNEs(pkgname, HvNAMELEN(curpkg), "main")) {
-                                char* full_name = Perl_form(aTHX_
-                                                            "%s::%s",
-                                                            pkgname,
-                                                            name);
-                                n = strlen(full_name);
-                                name = savepvn(full_name, n);
-                                SAVEFREEPV(name);
-                            }
-                        }
-                        Perl_sv_catpvf(aTHX_ listsv, "%cutf8::%s%" UTF8f 
"%s\n",
-                                        (value == 'p' ? '+' : '!'),
-                                        (FOLD) ? "__" : "",
-                                        UTF8fARG(UTF, n, name),
-                                        (FOLD) ? "_i" : "");
                         has_runtime_dependency |= HAS_USER_DEFINED_PROPERTY;
 
                         /* We don't know yet what this matches, so have to flag
@@ -17142,27 +16935,32 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
                         anyof_flags |= 
ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP;
                     }
                     else {
+                        assert (prop_definition && 
is_invlist(prop_definition));
 
-                        /* Here, did get the swash and its inversion list.  If
-                         * the swash is from a user-defined property, then this
-                         * whole character class should be regarded as such */
-                        if (swash_init_flags
-                            & _CORE_SWASH_INIT_USER_DEFINED_PROPERTY)
+                        /* Here we do have the complete property definition
+                         *
+                         * Temporary workaround for [perl #133136].  For this
+                         * precise input that is in the .t that is failing,
+                         * load utf8.pm, which is what the test wants, so that
+                         * that .t passes */
+                        if (     memEQs(RExC_start, e + 1 - RExC_start,
+                                        "foo\\p{Alnum}")
+                            && ! hv_common(GvHVn(PL_incgv),
+                                           NULL,
+                                           "utf8.pm", sizeof("utf8.pm") - 1,
+                                           0, HV_FETCH_ISEXISTS, NULL, 0))
                         {
-                            has_runtime_dependency |= 
HAS_USER_DEFINED_PROPERTY;
+                            require_pv("utf8.pm");
                         }
-                    }
-                    }
-                    if (invlist) {
-                        if (! (has_runtime_dependency
-                                                & HAS_USER_DEFINED_PROPERTY) &&
+
+                        if (! user_defined &&
                             /* We warn on matching an above-Unicode code point
                              * if the match would return true, except don't
                              * warn for \p{All}, which has exactly one element
                              * = 0 */
-                            (_invlist_contains_cp(invlist, 0x110000)
-                                && (! (_invlist_len(invlist) == 1
-                                       && *invlist_array(invlist) == 0))))
+                            (_invlist_contains_cp(prop_definition, 0x110000)
+                                && (! (_invlist_len(prop_definition) == 1
+                                       && *invlist_array(prop_definition) == 
0))))
                         {
                             warn_super = TRUE;
                         }
@@ -17170,23 +16968,11 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
                         /* Invert if asking for the complement */
                         if (value == 'P') {
                            _invlist_union_complement_2nd(properties,
-                                                          invlist,
+                                                          prop_definition,
                                                           &properties);
-
-                            /* The swash can't be used as-is, because we've
-                            * inverted things; delay removing it to here after
-                            * have copied its invlist above */
-                            if (! swash) {
-                                SvREFCNT_dec_NN(invlist);
-                            }
-                            SvREFCNT_dec(swash);
-                            swash = NULL;
                         }
                         else {
-                            _invlist_union(properties, invlist, &properties);
-                            if (! swash) {
-                                SvREFCNT_dec_NN(invlist);
-                            }
+                            _invlist_union(properties, prop_definition, 
&properties);
                        }
                     }
                 }
@@ -18002,8 +17788,7 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, 
U32 depth,
 
     /* And combine the result (if any) with any inversion lists from posix
      * classes.  The lists are kept separate up to now because we don't want to
-     * fold the classes (folding of those is automatically handled by the swash
-     * fetching code) */
+     * fold the classes */
     if (simple_posixes) {   /* These are the classes known to be unaffected by
                                /a, /aa, and /d */
         if (cp_list) {
@@ -18184,10 +17969,9 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
      * folded until runtime */
 
     /* If we didn't do folding, it's because some information isn't available
-     * until runtime; set the run-time fold flag for these.  (We don't have to
-     * worry about properties folding, as that is taken care of by the swash
-     * fetching).  We know to set the flag if we have a non-NULL list for UTF-8
-     * locales, or the class matches at least one 0-255 range code point */
+     * until runtime; set the run-time fold flag for these  We know to set the
+     * flag if we have a non-NULL list for UTF-8 locales, or the class matches
+     * at least one 0-255 range code point */
     if (LOC && FOLD) {
 
         /* Some things on the list might be unconditionally included because of
@@ -18237,18 +18021,12 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
     {
         _invlist_invert(cp_list);
 
-        /* Any swash can't be used as-is, because we've inverted things */
-        if (swash) {
-            SvREFCNT_dec_NN(swash);
-            swash = NULL;
-        }
-
-        invert = FALSE;
+       /* Clear the invert flag since have just done it here */
+       invert = FALSE;
     }
 
     if (ret_invlist) {
         *ret_invlist = cp_list;
-        SvREFCNT_dec(swash);
 
         return RExC_emit;
     }
@@ -19043,23 +18821,10 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 
*flagp, U32 depth,
         ANYOF_FLAGS(REGNODE_p(ret)) |= 
ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP;
     }
 
-    /* If there is a swash and more than one element, we can't use the swash in
-     * the optimization below. */
-    if (swash && element_count > 1) {
-       SvREFCNT_dec_NN(swash);
-       swash = NULL;
-    }
-
-    /* Note that the optimization of using 'swash' if it is the only thing in
-     * the class doesn't have us change swash at all, so it can include things
-     * that are also in the bitmap; otherwise we have purposely deleted that
-     * duplicate information */
     set_ANYOF_arg(pRExC_state, REGNODE_p(ret), cp_list,
                   (HAS_NONLOCALE_RUNTIME_PROPERTY_DEFINITION)
                    ? listsv : NULL,
-                  only_utf8_locale_list,
-                  swash, cBOOL(has_runtime_dependency
-                                                & HAS_USER_DEFINED_PROPERTY));
+                  only_utf8_locale_list);
     return ret;
 
   not_anyof:
@@ -19080,31 +18845,21 @@ S_set_ANYOF_arg(pTHX_ RExC_state_t* const pRExC_state,
                 regnode* const node,
                 SV* const cp_list,
                 SV* const runtime_defns,
-                SV* const only_utf8_locale_list,
-                SV* const swash,
-                const bool has_user_defined_property)
+                SV* const only_utf8_locale_list)
 {
     /* Sets the arg field of an ANYOF-type node 'node', using information about
      * the node passed-in.  If there is nothing outside the node's bitmap, the
      * arg is set to ANYOF_ONLY_HAS_BITMAP.  Otherwise, it sets the argument to
      * the count returned by add_data(), having allocated and stored an array,
-     * av, that that count references, as follows:
-     *  av[0] stores the character class description in its textual form.
-     *        This is used later (regexec.c:Perl_regclass_swash()) to
-     *        initialize the appropriate swash, and is also useful for dumping
-     *        the regnode.  This is set to &PL_sv_undef if the textual
-     *        description is not needed at run-time (as happens if the other
-     *        elements completely define the class)
-     *  av[1] if &PL_sv_undef, is a placeholder to later contain the swash
-     *        computed from av[0].  But if no further computation need be done,
-     *        the swash is stored here now (and av[0] is &PL_sv_undef).
-     *  av[2] stores the inversion list of code points that match only if the
-     *        current locale is UTF-8
-     *  av[3] stores the cp_list inversion list for use in addition or instead
-     *        of av[0]; used only if cp_list exists and av[1] is &PL_sv_undef.
-     *        (Otherwise everything needed is already in av[0] and av[1])
-     *  av[4] is set if any component of the class is from a user-defined
-     *        property; used only if av[3] exists */
+     * av, as follows:
+     *
+     *  av[0] stores the inversion list defining this class as far as known at
+     *        this time, or PL_sv_undef if nothing definite is now known.
+     *  av[1] stores the inversion list of code points that match only if the
+     *        current locale is UTF-8, or if none, PL_sv_undef if there is an
+     *        av[2], or no entry otherwise.
+     *  av[2] stores the list of user-defined properties whose subroutine
+     *        definitions aren't known at this time, or no entry if none. */
 
     UV n;
 
@@ -19119,26 +18874,16 @@ S_set_ANYOF_arg(pTHX_ RExC_state_t* const pRExC_state,
        AV * const av = newAV();
        SV *rv;
 
-       av_store(av, 0, (runtime_defns)
-                       ? SvREFCNT_inc(runtime_defns) : &PL_sv_undef);
-       if (swash) {
-           assert(cp_list);
-           av_store(av, 1, swash);
-           SvREFCNT_dec_NN(cp_list);
-       }
-       else {
-           av_store(av, 1, &PL_sv_undef);
-           if (cp_list) {
-               av_store(av, 3, cp_list);
-               av_store(av, 4, newSVuv(has_user_defined_property));
-           }
-       }
+        if (cp_list) {
+            av_store(av, INVLIST_INDEX, cp_list);
+        }
 
         if (only_utf8_locale_list) {
-           av_store(av, 2, only_utf8_locale_list);
+            av_store(av, ONLY_LOCALE_MATCHES_INDEX, only_utf8_locale_list);
         }
-        else {
-           av_store(av, 2, &PL_sv_undef);
+
+        if (runtime_defns) {
+            av_store(av, DEFERRED_USER_DEFINED_INDEX, 
SvREFCNT_inc(runtime_defns));
         }
 
        rv = newRV_noinc(MUTABLE_SV(av));
@@ -19159,14 +18904,15 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp 
*prog,
 
 {
     /* For internal core use only.
-     * Returns the swash for the input 'node' in the regex 'prog'.
-     * If <doinit> is 'true', will attempt to create the swash if not already
-     *   done.
+     * Returns the inversion list for the input 'node' in the regex 'prog'.
+     * If <doinit> is 'true', will attempt to create the inversion list if not
+     *    already done.
      * If <listsvp> is non-null, will return the printable contents of the
-     *    swash.  This can be used to get debugging information even before the
-     *    swash exists, by calling this function with 'doinit' set to false, in
-     *    which case the components that will be used to eventually create the
-     *    swash are returned  (in a printable form).
+     *    property definition.  This can be used to get debugging information
+     *    even before the inversion list exists, by calling this function with
+     *    'doinit' set to false, in which case the components that will be used
+     *    to eventually create the inversion list are returned  (in a printable
+     *    form).
      * If <only_utf8_locale_ptr> is not NULL, it is where this routine is to
      *    store an inversion list of code points that should match only if the
      *    execution-time locale is a UTF-8 one.
@@ -19174,18 +18920,17 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp 
*prog,
      *    inversion list of the code points that would be instead returned in
      *    <listsvp> if this were NULL.  Thus, what gets output in <listsvp>
      *    when this parameter is used, is just the non-code point data that
-     *    will go into creating the swash.  This currently should be just
+     *    will go into creating the inversion list.  This currently should be 
just
      *    user-defined properties whose definitions were not known at compile
      *    time.  Using this parameter allows for easier manipulation of the
-     *    swash's data by the caller.  It is illegal to call this function with
-     *    this parameter set, but not <listsvp>
+     *    inversion list's data by the caller.  It is illegal to call this
+     *    function with this parameter set, but not <listsvp>
      *
      * Tied intimately to how S_set_ANYOF_arg sets up the data structure.  Note
-     * that, in spite of this function's name, the swash it returns may include
-     * the bitmap data as well */
+     * that, in spite of this function's name, the inversion list it returns
+     * may include the bitmap data as well */
 
-    SV *sw  = NULL;
-    SV *si  = NULL;         /* Input swash initialization string */
+    SV *si  = NULL;         /* Input initialization string */
     SV* invlist = NULL;
 
     RXi_GET_DECL(prog, progi);
@@ -19201,69 +18946,73 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp 
*prog,
            SV * const rv = MUTABLE_SV(data->data[n]);
            AV * const av = MUTABLE_AV(SvRV(rv));
            SV **const ary = AvARRAY(av);
-           U8 swash_init_flags = _CORE_SWASH_INIT_ACCEPT_INVLIST;
 
-           si = *ary;  /* ary[0] = the string to initialize the swash with */
+            invlist = ary[INVLIST_INDEX];
 
-            if (av_tindex_skip_len_mg(av) >= 2) {
-                if (only_utf8_locale_ptr
-                    && ary[2]
-                    && ary[2] != &PL_sv_undef)
-                {
-                    *only_utf8_locale_ptr = ary[2];
-                }
-                else {
-                    assert(only_utf8_locale_ptr);
-                    *only_utf8_locale_ptr = NULL;
-                }
-
-                /* Elements 3 and 4 are either both present or both absent. [3]
-                 * is any inversion list generated at compile time; [4]
-                 * indicates if that inversion list has any user-defined
-                 * properties in it. */
-                if (av_tindex_skip_len_mg(av) >= 3) {
-                    invlist = ary[3];
-                    if (SvUV(ary[4])) {
-                        swash_init_flags |= 
_CORE_SWASH_INIT_USER_DEFINED_PROPERTY;
+            if (av_tindex_skip_len_mg(av) >= ONLY_LOCALE_MATCHES_INDEX) {
+                *only_utf8_locale_ptr = ary[ONLY_LOCALE_MATCHES_INDEX];
+            }
+
+            if (av_tindex_skip_len_mg(av) >= DEFERRED_USER_DEFINED_INDEX) {
+                si = ary[DEFERRED_USER_DEFINED_INDEX];
+            }
+
+           if (doinit && (si || invlist)) {
+                if (si) {
+                    bool user_defined;
+                    SV * msg = newSVpvs_flags("", SVs_TEMP);
+
+                    SV * prop_definition = handle_user_defined_property(
+                            "", 0, FALSE,   /* There is no \p{}, \P{} */
+                            SvPVX_const(si)[1] - '0',   /* /i or not has been
+                                                           stored here for just
+                                                           this occasion */
+                            TRUE,           /* run time */
+                            si,             /* The property definition  */
+                            &user_defined,
+                            msg,
+                            0               /* base level call */
+                           );
+
+                    if (SvCUR(msg)) {
+                        assert(prop_definition == NULL);
+
+                        Perl_croak(aTHX_ "%" UTF8f,
+                                UTF8fARG(SvUTF8(msg), SvCUR(msg), SvPVX(msg)));
                     }
-                }
-                else {
-                    invlist = NULL;
-                }
-           }
 
-           /* Element [1] is reserved for the set-up swash.  If already there,
-            * return it; if not, create it and store it there */
-           if (ary[1] && SvROK(ary[1])) {
-               sw = ary[1];
-           }
-           else if (doinit && ((si && si != &PL_sv_undef)
-                                 || (invlist && invlist != &PL_sv_undef))) {
-               assert(si);
-               sw = _core_swash_init("utf8", /* the utf8 package */
-                                     "", /* nameless */
-                                     si,
-                                     1, /* binary */
-                                     0, /* not from tr/// */
-                                     invlist,
-                                     &swash_init_flags);
-               (void)av_store(av, 1, sw);
+                    if (invlist) {
+                        _invlist_union(invlist, prop_definition, &invlist);
+                        SvREFCNT_dec_NN(prop_definition);
+                    }
+                    else {
+                        invlist = prop_definition;
+                    }
+
+                    assert(ONLY_LOCALE_MATCHES_INDEX == 1 + INVLIST_INDEX);
+                    assert(DEFERRED_USER_DEFINED_INDEX == 1
+                                                + ONLY_LOCALE_MATCHES_INDEX);
+
+                    av_store(av, INVLIST_INDEX, invlist);
+                    av_fill(av, (ary[ONLY_LOCALE_MATCHES_INDEX])
+                                 ? ONLY_LOCALE_MATCHES_INDEX:
+                                 INVLIST_INDEX);
+                    si = NULL;
+                }
            }
        }
     }
 
-    /* If requested, return a printable version of what this swash matches */
+    /* If requested, return a printable version of what this ANYOF node matches
+     * */
     if (listsvp) {
        SV* matches_string = NULL;
 
-        /* The swash should be used, if possible, to get the data, as it
-         * contains the resolved data.  But this function can be called at
-         * compile-time, before everything gets resolved, in which case we
-         * return the currently best available information, which is the string
-         * that will eventually be used to do that resolving, 'si' */
-       if ((! sw || (invlist = _get_swash_invlist(sw)) == NULL)
-            && (si && si != &PL_sv_undef))
-        {
+        /* This function can be called at compile-time, before everything gets
+         * resolved, in which case we return the currently best available
+         * information, which is the string that will eventually be used to do
+         * that resolving, 'si' */
+       if (si) {
             /* Here, we only have 'si' (and possibly some passed-in data in
              * 'invlist', which is handled below)  If the caller only wants
              * 'si', use that.  */
@@ -19356,12 +19105,10 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp 
*prog,
                 if (SvCUR(matches_string)) {  /* Get rid of trailing blank */
                     SvCUR_set(matches_string, SvCUR(matches_string) - 1);
                 }
-            } /* end of has an 'si' but no swash */
+            } /* end of has an 'si' */
        }
 
-        /* If we have a swash in place, its equivalent inversion list was above
-         * placed into 'invlist'.  If not, this variable may contain a stored
-         * inversion list which is information beyond what is in 'si' */
+        /* Add the stuff that's already known */
         if (invlist) {
 
             /* Again, if the caller doesn't want the output inversion list, put
@@ -19385,7 +19132,7 @@ Perl__get_regclass_nonbitmap_data(pTHX_ const regexp 
*prog,
        *listsvp = matches_string;
     }
 
-    return sw;
+    return invlist;
 }
 #endif /* !defined(PERL_IN_XSUB_RE) || defined(PLUGGABLE_RE_EXTENSION) */
 
@@ -21932,6 +21679,15 @@ S_dumpuntil(pTHX_ const regexp *r, const regnode 
*start, const regnode *node,
 void
 Perl_init_uniprops(pTHX)
 {
+    PL_user_def_props = newHV();
+
+#ifdef USE_ITHREADS
+
+    HvSHAREKEYS_off(PL_user_def_props);
+    PL_user_def_props_aTHX = aTHX;
+
+#endif
+
     /* Set up the inversion list global variables */
 
     PL_XPosix_ptrs[_CC_ASCII] = _new_invlist_C_array(uni_prop_ptrs[UNI_ASCII]);
@@ -22011,39 +21767,444 @@ Perl_init_uniprops(pTHX)
 #endif
 }
 
-SV *
-Perl_parse_uniprop_string(pTHX_ const char * const name, const Size_t name_len,
-                                const bool to_fold, bool * invert)
+#if 0
+
+This code was mainly added for backcompat to give a warning for non-portable
+code points in user-defined properties.  But experiments showed that the
+warning in earlier perls were only omitted on overflow, which should be an
+error, so there really isnt a backcompat issue, and actually adding the
+warning when none was present before might cause breakage, for little gain.  So
+khw left this code in, but not enabled.  Tests were never added.
+
+embed.fnc entry:
+Ei     |const char *|get_extended_utf8_msg|const UV cp
+
+PERL_STATIC_INLINE const char *
+S_get_extended_utf8_msg(pTHX_ const UV cp)
 {
-    /* Parse the interior meat of \p{} passed to this in 'name' with length
-     * 'name_len', and return an inversion list if a property with 'name' is
-     * found, or NULL if not.  'name' point to the input with leading and
-     * trailing space trimmed.  'to_fold' indicates if /i is in effect.
+    U8 dummy[UTF8_MAXBYTES + 1];
+    HV *msgs;
+    SV **msg;
+
+    uvchr_to_utf8_flags_msgs(dummy, cp, UNICODE_WARN_PERL_EXTENDED,
+                             &msgs);
+
+    msg = hv_fetchs(msgs, "text", 0);
+    assert(msg);
+
+    (void) sv_2mortal((SV *) msgs);
+
+    return SvPVX(*msg);
+}
+
+#endif
+
+SV *
+Perl_handle_user_defined_property(pTHX_
+
+    /* Parses the contents of a user-defined property definition; returning the
+     * expanded definition if possible.  If so, the return is an inversion
+     * list.
      *
-     * When the return is an inversion list, '*invert' will be set to a boolean
-     * indicating if it should be inverted or not
+     * If there are subroutines that are part of the expansion and which aren't
+     * known at the time of the call to this function, this returns what
+     * parse_uniprop_string() returned for the first one encountered.
      *
-     * This currently doesn't handle all cases.  A NULL return indicates the
-     * caller should try a different approach
-     */
+     * If an error was found, NULL is returned, and 'msg' gets a suitable
+     * message appended to it.  (Appending allows the back trace of how we got
+     * to the faulty definition to be displayed through nested calls of
+     * user-defined subs.)
+     *
+     * The caller IS responsible for freeing any returned SV.
+     *
+     * The syntax of the contents is pretty much described in perlunicode.pod,
+     * but we also allow comments on each line */
+
+    const char * name,          /* Name of property */
+    const STRLEN name_len,      /* The name's length in bytes */
+    const bool is_utf8,         /* ? Is 'name' encoded in UTF-8 */
+    const bool to_fold,         /* ? Is this under /i */
+    const bool runtime,         /* ? Are we in compile- or run-time */
+    SV* contents,               /* The property's definition */
+    bool *user_defined_ptr,     /* This will be set TRUE as we wouldn't be
+                                   getting called unless this is thought to be
+                                   a user-defined property */
+    SV * msg,                   /* Any error or warning msg(s) are appended to
+                                   this */
+    const STRLEN level)         /* Recursion level of this call */
+{
+    STRLEN len;
+    const char * string         = SvPV_const(contents, len);
+    const char * const e        = string + len;
+    const bool is_contents_utf8 = cBOOL(SvUTF8(contents));
+    const STRLEN msgs_length_on_entry = SvCUR(msg);
+
+    const char * s0 = string;   /* Points to first byte in the current line
+                                   being parsed in 'string' */
+    const char overflow_msg[] = "Code point too large in \"";
+    SV* running_definition = NULL;
+
+    PERL_ARGS_ASSERT_HANDLE_USER_DEFINED_PROPERTY;
+
+    *user_defined_ptr = TRUE;
+
+    /* Look at each line */
+    while (s0 < e) {
+        const char * s;     /* Current byte */
+        char op = '+';      /* Default operation is 'union' */
+        IV   min = 0;       /* range begin code point */
+        IV   max = -1;      /* and range end */
+        SV* this_definition;
+
+        /* Skip comment lines */
+        if (*s0 == '#') {
+            s0 = strchr(s0, '\n');
+            if (s0 == NULL) {
+                break;
+            }
+            s0++;
+            continue;
+        }
 
-    char* lookup_name;
-    bool stricter = FALSE;
-    bool is_nv_type = FALSE;         /* nv= or numeric_value=, or possibly one
-                                        of the cjk numeric properties (though
-                                        it requires extra effort to compile
-                                        them) */
-    unsigned int i;
-    unsigned int j = 0, lookup_len;
-    int equals_pos = -1;        /* Where the '=' is found, or negative if none 
*/
-    int slash_pos = -1;        /* Where the '/' is found, or negative if none 
*/
-    int table_index = 0;
-    bool starts_with_In_or_Is = FALSE;
-    Size_t lookup_offset = 0;
+        /* For backcompat, allow an empty first line */
+        if (*s0 == '\n') {
+            s0++;
+            continue;
+        }
+
+        /* First character in the line may optionally be the operation */
+        if (   *s0 == '+'
+            || *s0 == '!'
+            || *s0 == '-'
+            || *s0 == '&')
+        {
+            op = *s0++;
+        }
+
+        /* If the line is one or two hex digits separated by blank space, its
+         * a range; otherwise it is either another user-defined property or an
+         * error */
+
+        s = s0;
+
+        if (! isXDIGIT(*s)) {
+            goto check_if_property;
+        }
+
+        do { /* Each new hex digit will add 4 bits. */
+            if (min > ( (IV) MAX_LEGAL_CP >> 4)) {
+                s = strchr(s, '\n');
+                if (s == NULL) {
+                    s = e;
+                }
+                if (SvCUR(msg) > 0) sv_catpvs(msg, "; ");
+                sv_catpv(msg, overflow_msg);
+                Perl_sv_catpvf(aTHX_ msg, "%" UTF8f,
+                                     UTF8fARG(is_contents_utf8, s - s0, s0));
+                sv_catpvs(msg, "\"");
+                goto return_msg;
+            }
+
+            /* Accumulate this digit into the value */
+            min = (min << 4) + READ_XDIGIT(s);
+        } while (isXDIGIT(*s));
+
+        while (isBLANK(*s)) { s++; }
+
+        /* We allow comments at the end of the line */
+        if (*s == '#') {
+            s = strchr(s, '\n');
+            if (s == NULL) {
+                s = e;
+            }
+            s++;
+        }
+        else if (s < e && *s != '\n') {
+            if (! isXDIGIT(*s)) {
+                goto check_if_property;
+            }
+
+            /* Look for the high point of the range */
+            max = 0;
+            do {
+                if (max > ( (IV) MAX_LEGAL_CP >> 4)) {
+                    s = strchr(s, '\n');
+                    if (s == NULL) {
+                        s = e;
+                    }
+                    if (SvCUR(msg) > 0) sv_catpvs(msg, "; ");
+                    sv_catpv(msg, overflow_msg);
+                    Perl_sv_catpvf(aTHX_ msg, "%" UTF8f,
+                                      UTF8fARG(is_contents_utf8, s - s0, s0));
+                    sv_catpvs(msg, "\"");
+                    goto return_msg;
+                }
+
+                max = (max << 4) + READ_XDIGIT(s);
+            } while (isXDIGIT(*s));
+
+            while (isBLANK(*s)) { s++; }
+
+            if (*s == '#') {
+                s = strchr(s, '\n');
+                if (s == NULL) {
+                    s = e;
+                }
+            }
+            else if (s < e && *s != '\n') {
+                goto check_if_property;
+            }
+        }
+
+        if (max == -1) {    /* The line only had one entry */
+            max = min;
+        }
+        else if (max < min) {
+            if (SvCUR(msg) > 0) sv_catpvs(msg, "; ");
+            sv_catpvs(msg, "Illegal range in \"");
+            Perl_sv_catpvf(aTHX_ msg, "%" UTF8f,
+                                UTF8fARG(is_contents_utf8, s - s0, s0));
+            sv_catpvs(msg, "\"");
+            goto return_msg;
+        }
+
+#if 0   /* See explanation at definition above of get_extended_utf8_msg() */
+
+        if (   UNICODE_IS_PERL_EXTENDED(min)
+            || UNICODE_IS_PERL_EXTENDED(max))
+        {
+            if (SvCUR(msg) > 0) sv_catpvs(msg, "; ");
+
+            /* If both code points are non-portable, warn only on the lower
+             * one. */
+            sv_catpv(msg, get_extended_utf8_msg(
+                                            (UNICODE_IS_PERL_EXTENDED(min))
+                                            ? min : max));
+            sv_catpvs(msg, " in \"");
+            Perl_sv_catpvf(aTHX_ msg, "%" UTF8f,
+                                 UTF8fARG(is_contents_utf8, s - s0, s0));
+            sv_catpvs(msg, "\"");
+        }
+
+#endif
+
+        /* Here, this line contains a legal range */
+        this_definition = sv_2mortal(_new_invlist(2));
+        this_definition = _add_range_to_invlist(this_definition, min, max);
+        goto calculate;
+
+      check_if_property:
+
+        /* Here it isn't a legal range line.  See if it is a legal property
+         * line.  First find the end of the meat of the line */
+        s = strpbrk(s, "#\n");
+        if (s == NULL) {
+            s = e;
+        }
+
+        /* Ignore trailing blanks in keeping with the requirements of
+         * parse_uniprop_string() */
+        s--;
+        while (s > s0 && isBLANK_A(*s)) {
+            s--;
+        }
+        s++;
+
+        this_definition = parse_uniprop_string(s0, s - s0,
+                                               is_utf8, to_fold, runtime,
+                                               user_defined_ptr, msg,
+                                               (name_len == 0)
+                                                ? level /* Don't increase level
+                                                           if input is empty */
+                                                : level + 1
+                                              );
+        if (this_definition == NULL) {
+            goto return_msg;    /* 'msg' should have had the reason appended to
+                                   it by the above call */
+        }
+
+        if (! is_invlist(this_definition)) {    /* Unknown at this time */
+            return newSVsv(this_definition);
+        }
+
+        if (*s != '\n') {
+            s = strchr(s, '\n');
+            if (s == NULL) {
+                s = e;
+            }
+        }
+
+      calculate:
+
+        switch (op) {
+            case '+':
+                _invlist_union(running_definition, this_definition,
+                                                        &running_definition);
+                break;
+            case '-':
+                _invlist_subtract(running_definition, this_definition,
+                                                        &running_definition);
+                break;
+            case '&':
+                _invlist_intersection(running_definition, this_definition,
+                                                        &running_definition);
+                break;
+            case '!':
+                _invlist_union_complement_2nd(running_definition,
+                                        this_definition, &running_definition);
+                break;
+            default:
+                Perl_croak(aTHX_ "panic: %s: %d: Unexpected operation %d",
+                                 __FILE__, __LINE__, op);
+                break;
+        }
+
+        /* Position past the '\n' */
+        s0 = s + 1;
+    }   /* End of loop through the lines of 'contents' */
+
+    /* Here, we processed all the lines in 'contents' without error.  If we
+     * didn't add any warnings, simply return success */
+    if (msgs_length_on_entry == SvCUR(msg)) {
+
+        /* If the expansion was empty, the answer isn't nothing: its an empty
+         * inversion list */
+        if (running_definition == NULL) {
+            running_definition = _new_invlist(1);
+        }
+
+        return running_definition;
+    }
+
+    /* Otherwise, add some explanatory text, but we will return success */
+
+  return_msg:
+
+    if (name_len > 0) {
+        sv_catpvs(msg, " in expansion of ");
+        Perl_sv_catpvf(aTHX_ msg, "%" UTF8f, UTF8fARG(is_utf8, name_len, 
name));
+    }
+
+    return running_definition;
+}
+
+/* As explained below, certain operations need to take place in the first
+ * thread created.  These macros switch contexts */
+#ifdef USE_ITHREADS
+#  define DECLARATION_FOR_GLOBAL_CONTEXT                                    \
+                                        PerlInterpreter * save_aTHX = aTHX;
+#  define SWITCH_TO_GLOBAL_CONTEXT                                          \
+                           PERL_SET_CONTEXT((aTHX = PL_user_def_props_aTHX))
+#  define RESTORE_CONTEXT  PERL_SET_CONTEXT((aTHX = save_aTHX));
+#  define CUR_CONTEXT      aTHX
+#  define ORIGINAL_CONTEXT save_aTHX
+#else
+#  define DECLARATION_FOR_GLOBAL_CONTEXT
+#  define SWITCH_TO_GLOBAL_CONTEXT          NOOP
+#  define RESTORE_CONTEXT                   NOOP
+#  define CUR_CONTEXT                       NULL
+#  define ORIGINAL_CONTEXT                  NULL
+#endif
+
+STATIC void
+S_delete_recursion_entry(pTHX_ void *key)
+{
+    /* Deletes the entry used to detect recursion when expanding user-defined
+     * properties.  This is a function so it can be set up to be called even if
+     * the program unexpectedly quits */
+
+    SV ** current_entry;
+    const STRLEN key_len = strlen((const char *) key);
+    DECLARATION_FOR_GLOBAL_CONTEXT;
+
+    SWITCH_TO_GLOBAL_CONTEXT;
+
+    /* If the entry is one of these types, it is a permanent entry, and not the
+     * one used to detect recursions.  This function should delete only the
+     * recursion entry */
+    current_entry = hv_fetch(PL_user_def_props, (const char *) key, key_len, 
0);
+    if (     current_entry
+        && ! is_invlist(*current_entry)
+        && ! SvPOK(*current_entry))
+    {
+        (void) hv_delete(PL_user_def_props, (const char *) key, key_len,
+                                                                    G_DISCARD);
+    }
+
+    RESTORE_CONTEXT;
+}
+
+SV *
+Perl_parse_uniprop_string(pTHX_
+
+    /* Parse the interior of a \p{}, \P{}.  Returns its definition if knowable
+     * now.  If so, the return is an inversion list.
+     *
+     * If the property is user-defined, it is a subroutine, which in turn
+     * may call other subroutines.  This function will call the whole nest of
+     * them to get the definition they return; if some aren't known at the time
+     * of the call to this function, the fully qualified name of the highest
+     * level sub is returned.  It is an error to call this function at runtime
+     * without every sub defined.
+     *
+     * If an error was found, NULL is returned, and 'msg' gets a suitable
+     * message appended to it.  (Appending allows the back trace of how we got
+     * to the faulty definition to be displayed through nested calls of
+     * user-defined subs.)
+     *
+     * The caller should NOT try to free any returned inversion list.
+     *
+     * Other parameters will be set on return as described below */
+
+    const char * const name,    /* The first non-blank in the \p{}, \P{} */
+    const Size_t name_len,      /* Its length in bytes, not including any
+                                   trailing space */
+    const bool is_utf8,         /* ? Is 'name' encoded in UTF-8 */
+    const bool to_fold,         /* ? Is this under /i */
+    const bool runtime,         /* TRUE if this is being called at run time */
+    bool *user_defined_ptr,     /* Upon return from this function it will be
+                                   set to TRUE if any component is a
+                                   user-defined property */
+    SV * msg,                   /* Any error or warning msg(s) are appended to
+                                   this */
+   const STRLEN level)          /* Recursion level of this call */
+{
+    char* lookup_name;          /* normalized name for lookup in our tables */
+    unsigned lookup_len;        /* Its length */
+    bool stricter = FALSE;      /* Some properties have stricter name
+                                   normalization rules, which we decide upon
+                                   based on parsing */
+
+    /* nv= or numeric_value=, or possibly one of the cjk numeric properties
+     * (though it requires extra effort to download them from Unicode and
+     * compile perl to know about them) */
+    bool is_nv_type = FALSE;
+
+    unsigned int i, j = 0;
+    int equals_pos = -1;    /* Where the '=' is found, or negative if none */
+    int slash_pos  = -1;    /* Where the '/' is found, or negative if none */
+    int table_index = 0;    /* The entry number for this property in the table
+                               of all Unicode property names */
+    bool starts_with_In_or_Is = FALSE;  /* ? Does the name start with 'In' or
+                                             'Is' */
+    Size_t lookup_offset = 0;   /* Used to ignore the first few characters of
+                                   the normalized name in certain situations */
+    Size_t non_pkg_begin = 0;   /* Offset of first byte in 'name' that isn't
+                                   part of a package name */
+    bool could_be_user_defined = TRUE;  /* ? Could this be a user-defined
+                                             property rather than a Unicode
... 1886 lines suppressed ...

-- 
Perl5 Master Repository

Reply via email to