[perl.git] branch blead, updated. v5.25.2-160-gcd478ec

Karl Williamson Sat, 16 Jul 2016 19:31:15 -0700

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/cd478ec9347b67c2eba38dd567a139c416b84793?hp=f6203e997f3012b8aab4cd35fe49f58e4d71fb8c>


- Log -----------------------------------------------------------------
commit cd478ec9347b67c2eba38dd567a139c416b84793
Merge: f6203e9 0854ea0
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 16 17:43:19 2016 -0600

    Merge regcomp changes branch into blead
    
    This branch reimplements add_range_to_invlist(), adds a test file,
    improves the -Dr display of bracketed character classes in compiled
    patterns (fixing a 5.24 regression in that display), and finds another
    optimization for bracketed character classes under /d.

commit 0854ea0b9abfd9ff71c9dca1b5a5765dad2a20bd
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 5 19:32:19 2016 -0600

    regcomp.c: Change some append_range_to_invlist() calls
    
    Now that add_range_to_invlist() is more efficient, some current calls to
    append_range_to_invlist() can be changed to make things slightly
    clearer.

M       regcomp.c

commit d1e2fe7b8496f259e898cf2bbdcd94103adf12c7
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 27 17:11:35 2016 -0600

    regcomp.c: Reimplement add_range_to_invlist()
    
    Previously, this function was essentially implemented by turning the new
    range into an inversion list and then unioning that with the existing
    inversion list.  This was easy to write, but expensive, requiring memory
    allocations, copies, and so on.
    
    After discussing this with Tony Cook, I have reimplemented it.  It now
    resorts to the union in only one uncommon case: a new range that begins
    at code point 0 when the existing list does not.  That case is somewhat
    tricky, and given its rarity there is no need to replicate that
    trickiness in yet another place.
    
    Otherwise, the function manipulates the inversion list in place.  Some
    cases are no-ops, some change just a single array element, and some move
    elements within the array, extending it or splicing it.  All of these are
    cheaper than the previous implementation.
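    A minimal sketch of the idea, not the regcomp.c code: adding one range
    to a sorted inversion list in place, with no temporary list and no
    union.  It assumes a hypothetical invlist_t holding a plain array of
    unsigned long code points with spare capacity, and it sidesteps the
    "extends to infinity" case by requiring end < ULONG_MAX.  Unlike the
    committed code, which works through the cases one by one and falls back
    to the union when the range begins at 0, this sketch handles every case,
    start == 0 included, with two binary searches and the parity of the
    resulting indices.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical inversion list: a sorted array of code points.  Elements at
 * even indices start ranges that are in the set; odd indices start gaps. */
typedef struct {
    unsigned long *array;
    size_t len;       /* elements in use */
    size_t capacity;  /* elements allocated */
} invlist_t;

/* Index of the first element >= cp. */
static size_t first_at_or_above(const invlist_t *il, unsigned long cp)
{
    size_t lo = 0, hi = il->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (il->array[mid] < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Index of the first element > cp, which is also how many elements are <= cp. */
static size_t first_above(const invlist_t *il, unsigned long cp)
{
    size_t lo = 0, hi = il->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (il->array[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* cp is in the set iff an odd number of elements are <= cp. */
static bool invlist_contains(const invlist_t *il, unsigned long cp)
{
    return first_above(il, cp) % 2 == 1;
}

/* Add start..end inclusive, in place.  Assumes start <= end < ULONG_MAX, so
 * the "extends to infinity" case never arises.  Returns false, leaving the
 * list unchanged, if the array lacks room. */
static bool invlist_add_range(invlist_t *il, unsigned long start,
                              unsigned long end)
{
    assert(start <= end && end < ULONG_MAX);

    unsigned long after = end + 1;      /* first code point past the range */
    size_t i = first_at_or_above(il, start);
    size_t j = first_above(il, after);  /* elements i..j-1 are swallowed */

    /* i elements lie below start.  If i is even, start-1 is not in the set
     * (or start is 0), so start must open a new range. */
    size_t need_start = (i % 2 == 0);
    /* j elements are <= after.  If j is even, 'after' is not in the set, so
     * it must be written as the first code point past the merged range. */
    size_t need_after = (j % 2 == 0);
    size_t added = need_start + need_after;
    size_t new_len = i + added + (il->len - j);

    if (new_len > il->capacity)
        return false;

    /* Slide the untouched tail into place, then write the new endpoints. */
    memmove(il->array + i + added, il->array + j,
            (il->len - j) * sizeof *il->array);
    if (need_start)
        il->array[i] = start;
    if (need_after)
        il->array[i + need_start] = after;
    il->len = new_len;
    return true;
}
```

    For example, adding 8..9 to the set {5..7, 10..19} (stored as 5, 8, 10,
    20) finds i = 1 and j = 3, writes no new endpoints, and closes the gap,
    leaving the single range 5..19 (stored as 5, 20).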

M       embed.fnc
M       proto.h
M       regcomp.c

commit a7560fcde85b95349d490d9974746f6006f9f11c
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 13 15:50:32 2016 -0600

    t/re/anyof.t: add tests for ranges
    
    These exercise the code changes coming in the next commit, and show that
    those changes don't alter what gets compiled.

M       t/re/anyof.t

commit 02517e3f826bb5fb01dfc68e372f98bec694a499
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 12 21:15:07 2016 -0600

    regcomp.c: Refactor code dealing with m/[...]/d
    
    This consolidates some code that deals with bracketed character classes
    under /d.  As a result, some throw-away steps can be omitted, and the
    code is no longer scattered about.  The earlier version skipped some
    steps if the class was to be inverted.  It turns out that this wasn't
    necessary; the real reason was that the dump of the compiled pattern was
    unclear.  Previous commits have fixed that, so this now handles inverted
    character classes as well.

M       regcomp.c
M       t/re/anyof.t

commit d555b9dd665a725cb821a516235ceaa30169c3e6
Author: Karl Williamson <[email protected]>
Date:   Fri Jul 15 22:06:49 2016 -0600

    regcomp.c: Improve -Dr output
    
    This improves the -Dr output for a bracketed character class under /d
    that contains things that match only when not in UTF-8.

M       regcomp.c
M       t/re/anyof.t

commit 72e868c8f167ec109319c7c8e60e236e34bd786f
Author: Karl Williamson <[email protected]>
Date:   Fri Jul 15 22:05:25 2016 -0600

    t/re/pat.t: Add some tests
    
    I was unsure whether classes like [^\S ] get adequate testing, so I
    added a few tests to make sure.  In them, the bracketed class should
    compile to a POSIX class minus a character normally in that class.

M       t/re/pat.t

commit 1744d18db101f04d58371a79e41ad24f71ab5d09
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 5 19:41:30 2016 -0600

    regcomp.[ch]: Comments, White-space, only
    
    This indents code and reflows the comments to account for the enclosing
    block added by the previous commit.
    
    At the same time, it makes some other miscellaneous white-space changes,
    and adds or revises other comments.

M       regcomp.c
M       regcomp.h

commit 58f79e73de521cbcbc1519c6043ee33bbac4082b
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 5 19:37:46 2016 -0600

    regcomp.c: Improve -Dr output
    
    Previously, when dumping the compiled ANYOF node under -Dr, the output
    could show something like \x1B-\x1B.  A single-element range should be
    dumped as a single element.
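    For illustration only, a hypothetical helper showing the display rule
    this commit applies: a range read from an inversion list as
    invlist[i]..invlist[i+1]-1 is printed as one code point when both ends
    are equal.  The format_range() name and the \xHH spelling are
    assumptions chosen to match the example above, not the actual
    regcomp.c routine.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: write the inclusive range start..end into buf as "\xHH" or
 * "\xHH-\xHH".  Returns the value snprintf returns (characters needed). */
static int format_range(char *buf, size_t size, unsigned start, unsigned end)
{
    assert(start <= end);
    if (start == end)   /* single-element range: print it once, no "-" */
        return snprintf(buf, size, "\\x%02X", start);
    return snprintf(buf, size, "\\x%02X-\\x%02X", start, end);
}
```

    With this rule, format_range(buf, sizeof buf, 0x1B, 0x1B) produces \x1B
    rather than \x1B-\x1B.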

M       regcomp.c
M       t/re/anyof.t

commit ecfe5375bc04ce2ebb56d6452d1c18ec1a54a68f
Author: Karl Williamson <[email protected]>
Date:   Mon Jul 11 21:40:13 2016 -0600

    anyof.t: Add tests for previous commit
    
    The tests are being kept separate in case we want to put the previous
    commit into a maintenance release.  The tests need infrastructure added
    since the release of 5.24, so they can't easily be backported.

M       t/re/anyof.t

commit 753b2c6a60a81dacbe59e2041e30e8302484dc2d
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 6 11:52:01 2016 -0600

    Fix -Dr output regression
    
    Several commits in the 5.23 series improved the display of the compiled
    ANYOF regnodes, but introduced two bugs.  One affects \p{Any} and similar
    constructs that match the entire range 0-255.  That range is omitted, so
    it looks as though \p{Any} matches only code points above 255.  Note that
    this affects only what gets displayed under -Dr.  What actually gets
    compiled has been, and still is, fine.
    
    The other is that when displaying a pattern that still has unresolved,
    complemented user-defined properties, the output doesn't properly show
    that the whole thing is complemented.  That is, the output looks as
    though it doesn't obey De Morgan's laws.
    
    The fixes to these are quite intertwined, and so I didn't try to
    separate them.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit 847612ec9ecf3874e383107fcfd457a8ba962672
Author: Karl Williamson <[email protected]>
Date:   Mon Jul 11 11:54:56 2016 -0600

    Add t/re/anyof.t
    
    The ANYOF regnode is by far the most complicated to compile in regular
    expression patterns.  This new test file makes sure that neither what
    gets compiled nor how the compiled node is dumped as human-readable
    output gets changed accidentally.
    
    I created tests for this from what changed in commit
    b77aba85f6ec3b2a2341077b14f39261c5753cea
    "Revamp -Dr handling of /[...]/", and from trying to exercise all of its
    branches using gcov.
    
    Creating this test file revealed some bugs and infelicitous
    representations from that commit.  These will be fixed in the next few
    commits, with tests added as each change is made.

M       MANIFEST
A       t/re/anyof.t
-----------------------------------------------------------------------

Summary of changes:
 MANIFEST     |   1 +
 embed.fnc    |   5 +-
 embed.h      |   2 +-
 proto.h      |   4 +-
 regcomp.c    | 757 ++++++++++++++++++++++++++++++++++++++++-------------------
 regcomp.h    |   2 +-
 t/re/anyof.t | 368 +++++++++++++++++++++++++++++
 t/re/pat.t   |  15 +-
 8 files changed, 910 insertions(+), 244 deletions(-)
 create mode 100644 t/re/anyof.t
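
The regcomp.c diff below rewrites the comment describing inversion lists.
As a hedged illustration of two points from that comment (a code point is
in the set iff an odd number of elements are <= it, and a list can be
inverted by flipping a flag because a leading 0 is always reserved), here
is a small sketch.  The flip_invlist_t type and fl_* functions are
hypothetical; Perl's real list is an SV pointing to an array of UVs, with
the offset flag kept in its header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: storage[0] is reserved and always 0.  When 'offset'
 * is false the logical list starts at storage[0] (so it begins with 0);
 * when true it starts at storage[1].  'stored' counts every element,
 * including the reserved 0. */
typedef struct {
    unsigned long *storage;
    size_t stored;
    bool offset;
} flip_invlist_t;

static const unsigned long *fl_array(const flip_invlist_t *il)
{
    return il->storage + (il->offset ? 1 : 0);
}

static size_t fl_len(const flip_invlist_t *il)
{
    return il->stored - (il->offset ? 1 : 0);
}

/* Complement the set without copying: toggling the flag either exposes or
 * hides the leading 0, which is exactly the inversion rule. */
static void fl_invert(flip_invlist_t *il)
{
    assert(il->stored >= 1 && il->storage[0] == 0);
    il->offset = !il->offset;
}

/* cp is in the set iff an odd number of elements are <= cp, because
 * even-indexed elements start the ranges that are in the set. */
static bool fl_contains(const flip_invlist_t *il, unsigned long cp)
{
    const unsigned long *a = fl_array(il);
    size_t lo = 0, hi = fl_len(il);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo % 2 == 1;
}
```

Inverting {10..19}, stored as 0, 10, 20 with the offset set, just clears
the flag, so the logical list becomes 0, 10, 20 and describes 0..9 plus
everything from 20 upward.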

diff --git a/MANIFEST b/MANIFEST
index 05933a9..773835f 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5617,6 +5617,7 @@ t/porting/regen.t         Check that regen.pl doesn't need running
 t/porting/ss_dup.t             Check that sv.c:ss_dup handle everything
 t/porting/test_bootstrap.t     Test that the instructions for test bootstrapping aren't accidentally overlooked.
 t/porting/utils.t              Check that utility scripts still compile
+t/re/anyof.t                   See if bracketed char classes [...] compile properly
 t/re/charset.t                 See if regex modifiers like /d, /u work properly
 t/re/fold_grind.t              See if case folding works properly
 t/re/no_utf8_pm.t              Verify utf8.pm doesn't get loaded unless required
diff --git a/embed.fnc b/embed.fnc
index 9458575..3ae18a3 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1584,7 +1584,7 @@ EXmM      |void   |_invlist_subtract|NN SV* const a|NN SV* const b|NN SV** result
 EXpM   |void   |_invlist_invert|NN SV* const invlist
 EXMpR  |SV*    |_new_invlist   |IV initial_size
 EXMpR  |SV*    |_swash_to_invlist      |NN SV* const swash
-EXMpR  |SV*    |_add_range_to_invlist  |NULLOK SV* invlist|const UV start|const UV end
+EXMpR  |SV*    |_add_range_to_invlist  |NULLOK SV* invlist|UV start|UV end
 EXMpR  |SV*    |_setup_canned_invlist|const STRLEN size|const UV element0|NN UV** other_elements_ptr
 EXMpn  |void   |_invlist_populate_swatch   |NN SV* const invlist|const UV start|const UV end|NN U8* swatch
 #endif
@@ -2300,7 +2300,8 @@ Es        |bool   |put_charclass_bitmap_innards|NN SV* sv             \
                                |NN char* bitmap                    \
                                |NULLOK SV* nonbitmap_invlist       \
                                |NULLOK SV* only_utf8_locale_invlist\
-                               |NULLOK const regnode * const node
+                               |NULLOK const regnode * const node  \
+                               |const bool force_as_is_display
 Es     |SV*    |put_charclass_bitmap_innards_common                \
                                |NN SV* invlist                     \
                                |NULLOK SV* posixes                 \
diff --git a/embed.h b/embed.h
index a666cd3..f7511ec 100644
--- a/embed.h
+++ b/embed.h
@@ -971,7 +971,7 @@
 #define dump_trie_interim_list(a,b,c,d,e)      S_dump_trie_interim_list(aTHX_ a,b,c,d,e)
 #define dump_trie_interim_table(a,b,c,d,e)     S_dump_trie_interim_table(aTHX_ a,b,c,d,e)
 #define dumpuntil(a,b,c,d,e,f,g,h)     S_dumpuntil(aTHX_ a,b,c,d,e,f,g,h)
-#define put_charclass_bitmap_innards(a,b,c,d,e)        S_put_charclass_bitmap_innards(aTHX_ a,b,c,d,e)
+#define put_charclass_bitmap_innards(a,b,c,d,e,f)      S_put_charclass_bitmap_innards(aTHX_ a,b,c,d,e,f)
 #define put_charclass_bitmap_innards_common(a,b,c,d,e,f)       S_put_charclass_bitmap_innards_common(aTHX_ a,b,c,d,e,f)
 #define put_charclass_bitmap_innards_invlist(a,b)      S_put_charclass_bitmap_innards_invlist(aTHX_ a,b)
 #define put_code_point(a,b)    S_put_code_point(aTHX_ a,b)
diff --git a/proto.h b/proto.h
index b03d2ab..5900122 100644
--- a/proto.h
+++ b/proto.h
@@ -3979,7 +3979,7 @@ STATIC void       S_dump_trie_interim_table(pTHX_ const struct _reg_trie_data *trie, H
 STATIC const regnode*  S_dumpuntil(pTHX_ const regexp *r, const regnode *start, const regnode *node, const regnode *last, const regnode *plast, SV* sv, I32 indent, U32 depth);
 #define PERL_ARGS_ASSERT_DUMPUNTIL     \
        assert(r); assert(start); assert(node); assert(sv)
-STATIC bool    S_put_charclass_bitmap_innards(pTHX_ SV* sv, char* bitmap, SV* nonbitmap_invlist, SV* only_utf8_locale_invlist, const regnode * const node);
+STATIC bool    S_put_charclass_bitmap_innards(pTHX_ SV* sv, char* bitmap, SV* nonbitmap_invlist, SV* only_utf8_locale_invlist, const regnode * const node, const bool force_as_is_display);
 #define PERL_ARGS_ASSERT_PUT_CHARCLASS_BITMAP_INNARDS  \
        assert(sv); assert(bitmap)
 STATIC SV*     S_put_charclass_bitmap_innards_common(pTHX_ SV* invlist, SV* posixes, SV* only_utf8, SV* not_utf8, SV* only_utf8_locale, const bool invert);
@@ -5170,7 +5170,7 @@ PERL_STATIC_INLINE I32    S_regcurly(const char *s)
 
 #endif
 #if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_UTF8_C)
-PERL_CALLCONV SV*      Perl__add_range_to_invlist(pTHX_ SV* invlist, const UV start, const UV end)
+PERL_CALLCONV SV*      Perl__add_range_to_invlist(pTHX_ SV* invlist, UV start, UV end)
                        __attribute__warn_unused_result__;
 
 /* PERL_CALLCONV void  _invlist_intersection(pTHX_ SV* const a, SV* const b, SV** i); */
diff --git a/regcomp.c b/regcomp.c
index 91e1c9a..33d5f7a 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -1283,8 +1283,8 @@ S_ssc_anything(pTHX_ regnode_ssc *ssc)
 
     assert(is_ANYOF_SYNTHETIC(ssc));
 
-    ssc->invlist = sv_2mortal(_new_invlist(2)); /* mortalize so won't leak */
-    _append_range_to_invlist(ssc->invlist, 0, UV_MAX);
+    /* mortalize so won't leak */
+    ssc->invlist = sv_2mortal(_add_range_to_invlist(NULL, 0, UV_MAX));
     ANYOF_FLAGS(ssc) |= SSC_MATCHES_EMPTY_STRING;  /* Plus matches empty */
 }
 
@@ -8324,33 +8324,47 @@ S_reg_scan_name(pTHX_ RExC_state_t *pRExC_state, U32 flags)
  * as an SVt_INVLIST scalar.
  *
  * An inversion list for Unicode is an array of code points, sorted by ordinal
- * number.  The zeroth element is the first code point in the list.  The 1th
- * element is the first element beyond that not in the list.  In other words,
- * the first range is
- *  invlist[0]..(invlist[1]-1)
- * The other ranges follow.  Thus every element whose index is divisible by two
- * marks the beginning of a range that is in the list, and every element not
- * divisible by two marks the beginning of a range not in the list.  A single
- * element inversion list that contains the single code point N generally
- * consists of two elements
- *  invlist[0] == N
- *  invlist[1] == N+1
- * (The exception is when N is the highest representable value on the
- * machine, in which case the list containing just it would be a single
- * element, itself.  By extension, if the last range in the list extends to
- * infinity, then the first element of that range will be in the inversion list
- * at a position that is divisible by two, and is the final element in the
- * list.)
+ * number.  Each element gives the code point that begins a range that extends
+ * up-to but not including the code point given by the next element.  The final
+ * element gives the first code point of a range that extends to the platform's
+ * infinity.  The even-numbered elements (invlist[0], invlist[2], invlist[4],
+ * ...) give ranges whose code points are all in the inversion list.  We say
+ * that those ranges are in the set.  The odd-numbered elements give ranges
+ * whose code points are not in the inversion list, and hence not in the set.
+ * Thus, element [0] is the first code point in the list.  Element [1]
+ * is the first code point beyond that not in the list; and element [2] is the
+ * first code point beyond that that is in the list.  In other words, the first
+ * range is invlist[0]..(invlist[1]-1), and all code points in that range are
+ * in the inversion list.  The second range is invlist[1]..(invlist[2]-1), and
+ * all code points in that range are not in the inversion list.  The third
+ * range invlist[2]..(invlist[3]-1) gives code points that are in the inversion
+ * list, and so forth.  Thus every element whose index is divisible by two
+ * gives the beginning of a range that is in the list, and every element whose
+ * index is not divisible by two gives the beginning of a range not in the
+ * list.  If the final element's index is divisible by two, the inversion list
+ * extends to the platform's infinity; otherwise the highest code point in the
+ * inversion list is the contents of that element minus 1.
+ *
+ * A range that contains just a single code point N will look like
+ *  invlist[i]   == N
+ *  invlist[i+1] == N+1
+ *
+ * If N is UV_MAX (the highest representable code point on the machine), N+1 is
+ * impossible to represent, so element [i+1] is omitted.  The single element
+ * inversion list
+ *  invlist[0] == UV_MAX
+ * contains just UV_MAX, but is interpreted as matching to infinity.
+ *
  * Taking the complement (inverting) an inversion list is quite simple, if the
  * first element is 0, remove it; otherwise add a 0 element at the beginning.
  * This implementation reserves an element at the beginning of each inversion
  * list to always contain 0; there is an additional flag in the header which
  * indicates if the list begins at the 0, or is offset to begin at the next
- * element.
+ * element.  This means that the inversion list can be inverted without any
+ * copying; just flip the flag.
  *
  * More about inversion lists can be found in "Unicode Demystified"
  * Chapter 13 by Richard Gillam, published by Addison-Wesley.
- * More will be coming when functionality is added later.
  *
  * The inversion list data structure is currently implemented as an SV pointing
  * to an array of UVs that the SV thinks are bytes.  This allows us to have an
@@ -8671,7 +8685,7 @@ S__append_range_to_invlist(pTHX_ SV* const invlist,
 
        UV final_element = len - 1;
        array = invlist_array(invlist);
-       if (array[final_element] > start
+       if (   array[final_element] > start
            || ELEMENT_RANGE_MATCHES_INVLIST(final_element))
        {
           Perl_croak(aTHX_ "panic: attempting to append to an inversion list, but wasn't at the end of the list, final=%"UVuf", start=%"UVuf", match=%c",
@@ -8679,10 +8693,10 @@ S__append_range_to_invlist(pTHX_ SV* const invlist,
                     ELEMENT_RANGE_MATCHES_INVLIST(final_element) ? 't' : 'f');
        }
 
-       /* Here, it is a legal append.  If the new range begins with the first
-        * value not in the set, it is extending the set, so the new first
-        * value not in the set is one greater than the newly extended range.
-        * */
+        /* Here, it is a legal append.  If the new range begins 1 above the end
+         * of the range below it, it is extending the range below it, so the
+         * new first value not in the set is one greater than the newly
+         * extended range.  */
         offset = *get_invlist_offset_addr(invlist);
        if (array[final_element] == start) {
            if (end != UV_MAX) {
@@ -8690,7 +8704,8 @@ S__append_range_to_invlist(pTHX_ SV* const invlist,
            }
            else {
                /* But if the end is the maximum representable on the machine,
-                * just let the range that this would extend to have no end */
+                 * assume that infinity was actually what was meant.  Just let
+                 * the range that this would extend to have no end */
                invlist_set_len(invlist, len - 1, offset);
            }
            return;
@@ -8970,8 +8985,7 @@ Perl__invlist_union_maybe_complement_2nd(pTHX_ SV* const a, SV* const b,
          * It's easiest to create a new inversion list that matches everything.
          * */
         if (complement_b) {
-            SV* everything = _new_invlist(1);
-            _append_range_to_invlist(everything, 0, UV_MAX);
+            SV* everything = _add_range_to_invlist(NULL, 0, UV_MAX);
 
             /* If the output didn't exist, just point it at the new list */
             if (*output == NULL) {
@@ -9388,12 +9402,13 @@ Perl__invlist_intersection_maybe_complement_2nd(pTHX_ SV* const a, SV* const b,
        }
 
     }
+
     /* The loop above increments the index into exactly one of the input lists
      * each iteration, and ends when either index gets to its list end.  That
      * means the other index is lower than its end, and so something is
      * remaining in that one.  We increment 'count', as explained below, if the
-     * exhausted list was in its set.  (i_a and i_b each currently index the element
-     * beyond the one we care about.) */
+     * exhausted list was in its set.  (i_a and i_b each currently index the
+     * element beyond the one we care about.) */
     if (   (i_a == len_a && PREV_RANGE_MATCHES_INVLIST(i_a))
         || (i_b == len_b && PREV_RANGE_MATCHES_INVLIST(i_b)))
     {
@@ -9492,50 +9507,261 @@ Perl__invlist_intersection_maybe_complement_2nd(pTHX_ SV* const a, SV* const b,
 }
 
 SV*
-Perl__add_range_to_invlist(pTHX_ SV* invlist, const UV start, const UV end)
+Perl__add_range_to_invlist(pTHX_ SV* invlist, UV start, UV end)
 {
     /* Add the range from 'start' to 'end' inclusive to the inversion list's
      * set.  A pointer to the inversion list is returned.  This may actually be
      * a new list, in which case the passed in one has been destroyed.  The
      * passed-in inversion list can be NULL, in which case a new one is created
-     * with just the one range in it */
-
-    SV* range_invlist;
-    UV len;
-
+     * with just the one range in it.  The new list is not necessarily
+     * NUL-terminated.  Space is not freed if the inversion list shrinks as a
+     * result of this function.  The gain would not be large, and in many
+     * cases, this is called multiple times on a single inversion list, so
+     * anything freed may almost immediately be needed again.
+     *
+     * This used to mostly call the 'union' routine, but that is much more
+     * heavyweight than really needed for a single range addition */
+
+    UV* array;              /* The array implementing the inversion list */
+    UV len;                 /* How many elements in 'array' */
+    SSize_t i_s;            /* index into the invlist array where 'start'
+                               should go */
+    SSize_t i_e = 0;        /* And the index where 'end' should go */
+    UV cur_highest;         /* The highest code point in the inversion list
+                               upon entry to this function */
+
+    /* This range becomes the whole inversion list if none already existed */
     if (invlist == NULL) {
        invlist = _new_invlist(2);
-       len = 0;
+        _append_range_to_invlist(invlist, start, end);
+        return invlist;
     }
-    else {
-       len = _invlist_len(invlist);
+
+    /* Likewise, if the inversion list is currently empty */
+    len = _invlist_len(invlist);
+    if (len == 0) {
+        _append_range_to_invlist(invlist, start, end);
+        return invlist;
     }
 
-    /* If comes after the final entry actually in the list, can just append it
-     * to the end, */
-    if (len == 0
-       || (! ELEMENT_RANGE_MATCHES_INVLIST(len - 1)
-            && start >= invlist_array(invlist)[len - 1]))
-    {
-       _append_range_to_invlist(invlist, start, end);
-       return invlist;
+    /* Starting here, we have to know the internals of the list */
+    array = invlist_array(invlist);
+
+    /* If the new range ends higher than the current highest ... */
+    cur_highest = invlist_highest(invlist);
+    if (end > cur_highest) {
+
+        /* If the whole range is higher, we can just append it */
+        if (start > cur_highest) {
+            _append_range_to_invlist(invlist, start, end);
+            return invlist;
+        }
+
+        /* Otherwise, add the portion that is higher ... */
+        _append_range_to_invlist(invlist, cur_highest + 1, end);
+
+        /* ... and continue on below to handle the rest.  As a result of the
+         * above append, we know that the index of the end of the range is the
+         * final even numbered one of the array.  Recall that the final element
+         * always starts a range that extends to infinity.  If that range is in
+         * the set (meaning the set goes from here to infinity), it will be an
+         * even index, but if it isn't in the set, it's odd, and the final
+         * range in the set is one less, which is even. */
+        if (end == UV_MAX) {
+            i_e = len;
+        }
+        else {
+            i_e = len - 2;
+        }
+    }
+
+    /* We have dealt with appending, now see about prepending.  If the new
+     * range starts lower than the current lowest ... */
+    if (start < array[0]) {
+
+        /* Adding something which has 0 in it is somewhat tricky, and uncommon.
+         * Let the union code handle it, rather than having to know the
+         * trickiness in two code places.  */
+        if (UNLIKELY(start == 0)) {
+            SV* range_invlist;
+
+            range_invlist = _new_invlist(2);
+            _append_range_to_invlist(range_invlist, start, end);
+
+            _invlist_union(invlist, range_invlist, &invlist);
+
+            SvREFCNT_dec_NN(range_invlist);
+
+            return invlist;
+        }
+
+        /* If the whole new range comes before the first entry, and doesn't
+         * extend it, we have to insert it as an additional range */
+        if (end < array[0] - 1) {
+            i_s = i_e = -1;
+            goto splice_in_new_range;
+        }
+
+        /* Here the new range adjoins the existing first range, extending it
+         * downwards. */
+        array[0] = start;
+
+        /* And continue on below to handle the rest.  We know that the index of
+         * the beginning of the range is the first one of the array */
+        i_s = 0;
+    }
+    else { /* Not prepending any part of the new range to the existing list.
+            * Find where in the list it should go.  This finds i_s, such that:
+            *     invlist[i_s] <= start < array[i_s+1]
+            */
+        i_s = _invlist_search(invlist, start);
     }
 
-    /* Here, can't just append things, create and return a new inversion list
-     * which is the union of this range and the existing inversion list.  (If
-     * the new range is well-behaved wrt to the old one, we could just insert
-     * it, doing a Move() down on the tail of the old one (potentially growing
-     * it first).  But to determine that means we would have the extra
-     * (possibly throw-away) work of first finding where the new one goes and
-     * whether it disrupts (splits) an existing range, so it doesn't appear to
-     * me (khw) that it's worth it) */
-    range_invlist = _new_invlist(2);
-    _append_range_to_invlist(range_invlist, start, end);
+    /* At this point, any extending before the beginning of the inversion list
+     * and/or after the end has been done.  This has made it so that, in the
+     * code below, each endpoint of the new range is either in a range that is
+     * in the set, or is in a gap between two ranges that are.  This means we
+     * don't have to worry about exceeding the array bounds.
+     *
+     * Find where in the list the new range ends (but we can skip this if we
+     * have already determined what it is, or if it will be the same as i_s,
+     * which we already have computed) */
+    if (i_e == 0) {
+        i_e = (start == end)
+              ? i_s
+              : _invlist_search(invlist, end);
+    }
+
+    /* Here generally invlist[i_e] <= end < array[i_e+1].  But if invlist[i_e]
+     * is a range that goes to infinity there is no element at invlist[i_e+1],
+     * so only the first relation holds. */
 
-    _invlist_union(invlist, range_invlist, &invlist);
+    if ( ! ELEMENT_RANGE_MATCHES_INVLIST(i_s)) {
 
-    /* The temporary can be freed */
-    SvREFCNT_dec_NN(range_invlist);
+        /* Here, the ranges on either side of the beginning of the new range
+         * are in the set, and this range starts in the gap between them.
+         *
+         * The new range extends the range above it downwards if the new range
+         * ends at or above that range's start */
+        const bool extends_the_range_above = (   end == UV_MAX
+                                              || end + 1 >= array[i_s+1]);
+
+        /* The new range extends the range below it upwards if it begins just
+         * after where that range ends */
+        if (start == array[i_s]) {
+
+            /* If the new range fills the entire gap between the other ranges,
+             * they will get merged together.  Other ranges may also get
+             * merged, depending on how many of them the new range spans.  In
+             * the general case, we do the merge later, just once, after we
+             * figure out how many to merge.  But in the case where the new
+             * range exactly spans just this one gap (possibly extending into
+             * the one above), we do the merge here, and an early exit.  This
+             * is done here to avoid having to special case later. */
+            if (i_e - i_s <= 1) {
+
+                /* If i_e - i_s == 1, it means that the new range terminates
+                 * within the range above, and hence 'extends_the_range_above'
+                 * must be true.  (If the range above it extends to infinity,
+                 * 'i_s+2' will be above the array's limit, but 'len-i_s-2'
+                 * will be 0, so no harm done.) */
+                if (extends_the_range_above) {
+                    Move(array + i_s + 2, array + i_s, len - i_s - 2, UV);
+                    invlist_set_len(invlist,
+                                    len - 2,
+                                    *(get_invlist_offset_addr(invlist)));
+                    return invlist;
+                }
+
+                /* Here, i_e must == i_s.  We keep them in sync, as they apply
+                 * to the same range, and below we are about to decrement i_s
+                 * */
+                i_e--;
+            }
+
+            /* Here, the new range is adjacent to the one below.  (It may also
+             * span beyond the range above, but that will get resolved later.)
+             * Extend the range below to include this one. */
+            array[i_s] = (end == UV_MAX) ? UV_MAX : end + 1;
+            i_s--;
+            start = array[i_s];
+        }
+        else if (extends_the_range_above) {
+
+            /* Here the new range only extends the range above it, but not the
+             * one below.  It merges with the one above.  Again, we keep i_e
+             * and i_s in sync if they point to the same range */
+            if (i_e == i_s) {
+                i_e++;
+            }
+            i_s++;
+            array[i_s] = start;
+        }
+    }
+
+    /* Here, we've dealt with the new range start extending any adjoining
+     * existing ranges.
+     *
+     * If the new range extends to infinity, it is now the final one,
+     * regardless of what was there before */
+    if (UNLIKELY(end == UV_MAX)) {
+        invlist_set_len(invlist, i_s + 1, *(get_invlist_offset_addr(invlist)));
+        return invlist;
+    }
+
+    /* If i_e started as == i_s, it has also been dealt with,
+     * and been updated to the new i_s, which will fail the following if */
+    if (! ELEMENT_RANGE_MATCHES_INVLIST(i_e)) {
+
+        /* Here, the ranges on either side of the end of the new range are in
+         * the set, and this range ends in the gap between them.
+         *
+         * If this range is adjacent to (hence extends) the range above it, it
+         * becomes part of that range; likewise if it extends the range below,
+         * it becomes part of that range */
+        if (end + 1 == array[i_e+1]) {
+            i_e++;
+            array[i_e] = start;
+        }
+        else if (start <= array[i_e]) {
+            array[i_e] = end + 1;
+            i_e--;
+        }
+    }
+
+    if (i_s == i_e) {
+
+        /* If the range fits entirely in an existing range (as possibly already
+         * extended above), it doesn't add anything new */
+        if (ELEMENT_RANGE_MATCHES_INVLIST(i_s)) {
+            return invlist;
+        }
+
+        /* Here, no part of the range is in the list.  Must add it.  It will
+         * occupy 2 more slots */
+      splice_in_new_range:
+
+        invlist_extend(invlist, len + 2);
+        array = invlist_array(invlist);
+        /* Move the rest of the array down two slots. Don't include any
+         * trailing NUL */
+        Move(array + i_e + 1, array + i_e + 3, len - i_e - 1, UV);
+
+        /* Do the actual splice */
+        array[i_e+1] = start;
+        array[i_e+2] = end + 1;
+        invlist_set_len(invlist, len + 2, *(get_invlist_offset_addr(invlist)));
+        return invlist;
+    }
+
+    /* Here the new range crossed the boundaries of a pre-existing range.  The
+     * code above has adjusted things so that both ends are in ranges that are
+     * in the set.  This means everything in between must also be in the set.
+     * Just squash things together */
+    Move(array + i_e + 1, array + i_s + 1, len - i_e - 1, UV);
+    invlist_set_len(invlist,
+                    len - i_e + i_s,
+                    *(get_invlist_offset_addr(invlist)));
 
     return invlist;
 }
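For readers following the reimplemented add_range_to_invlist() above, here is a minimal sketch of the same in-place idea. It makes several assumptions:

- It uses a hypothetical fixed-capacity array of uint32_t boundaries rather than Perl's SV-backed inversion list, and skips its special handling of 0.
- UINT32_MAX stands in for UV_MAX ("infinity").
- The names invlist_t, invlist_add_range() and invlist_contains() are made up for illustration.

It does not reproduce the case analysis in regcomp.c. Instead it drops every existing boundary lying between start and end+1. It then re-inserts start and/or end+1 only where the parity of the surrounding elements says a boundary is needed, so no temporary list or union is built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity inversion list: elems[] is strictly
 * increasing; even indices start ranges that are in the set, odd indices
 * start ranges that are not.  UINT32_MAX stands in for "infinity". */
#define INVLIST_CAP 64

typedef struct {
    uint32_t elems[INVLIST_CAP];
    size_t   len;
} invlist_t;

/* Number of elements <= v */
static size_t count_at_most(const invlist_t *l, uint32_t v)
{
    size_t i = 0;
    while (i < l->len && l->elems[i] <= v)
        i++;
    return i;
}

/* A code point is in the set iff an odd number of elements are <= it */
static int invlist_contains(const invlist_t *l, uint32_t cp)
{
    return (count_at_most(l, cp) & 1) != 0;
}

/* Adds [start, end] in place, without building a temporary list */
static void invlist_add_range(invlist_t *l, uint32_t start, uint32_t end)
{
    assert(start <= end);

    const int to_infinity = (end == UINT32_MAX);

    /* Elements in [lo, hi) lie within start .. end+1 and are dropped */
    size_t lo = (start == 0) ? 0 : count_at_most(l, start - 1);
    size_t hi = to_infinity ? l->len : count_at_most(l, end + 1);

    uint32_t boundaries[2];
    size_t n_new = 0;
    if ((lo & 1) == 0)                      /* start - 1 is outside the set */
        boundaries[n_new++] = start;
    if (! to_infinity && (hi & 1) == 0)     /* end + 1 is outside the set */
        boundaries[n_new++] = end + 1;

    size_t tail = l->len - hi;
    assert(lo + n_new + tail <= INVLIST_CAP);

    /* Slide the untouched tail into place, then write the new boundaries */
    memmove(&l->elems[lo + n_new], &l->elems[hi], tail * sizeof l->elems[0]);
    memcpy(&l->elems[lo], boundaries, n_new * sizeof boundaries[0]);
    l->len = lo + n_new + tail;
}
```

Adding 0x102-0x104 and then 0x106 yields the boundaries 0x102 0x105 0x106 0x107, which is the shape behind the expected 'ANYOF[0102-0104 0106]' in the new t/re/anyof.t. Adding 0x105 afterwards collapses them back to 0x102 0x107.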
@@ -9561,7 +9787,7 @@ Perl__setup_canned_invlist(pTHX_ const STRLEN size, const UV element0,
 
     PERL_ARGS_ASSERT__SETUP_CANNED_INVLIST;
 
-    _append_range_to_invlist(invlist, element0, element0);
+    invlist = add_cp_to_invlist(invlist, element0);
     offset = *get_invlist_offset_addr(invlist);
 
     invlist_set_len(invlist, size, offset);
@@ -16303,9 +16529,9 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
             else if (! SIZE_ONLY) {
 
                 /* Here, not in pass1 (in that pass we skip calculating the
-                 * contents of this class), and is /l, or is a POSIX class for
-                 * which /l doesn't matter (or is a Unicode property, which is
-                 * skipped here). */
+                 * contents of this class), and is not /l, or is a POSIX class
+                 * for which /l doesn't matter (or is a Unicode property, which
+                 * is skipped here). */
                 if (namedclass >= ANYOF_POSIXL_MAX) {  /* If a special class */
                     if (namedclass != ANYOF_UNIPROP) { /* UNIPROP = \p and \P */
 
@@ -16330,9 +16556,9 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
                                 &cp_list);
                     }
                 }
-                else if (UNI_SEMANTICS
+                else if (  UNI_SEMANTICS
                         || classnum == _CC_ASCII
-                        || (DEPENDS_SEMANTICS && (classnum == _CC_DIGIT
+                        || (DEPENDS_SEMANTICS && (   classnum == _CC_DIGIT
                                                   || classnum == _CC_XDIGIT)))
                 {
                     /* We usually have to worry about /d and /a affecting what
@@ -17129,76 +17355,156 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
        SvREFCNT_dec_NN(cp_foldable_list);
     }
 
-    /* And combine the result (if any) with any inversion list from posix
+    /* And combine the result (if any) with any inversion lists from posix
      * classes.  The lists are kept separate up to now because we don't want to
      * fold the classes (folding of those is automatically handled by the swash
      * fetching code) */
-    if (simple_posixes) {
-        _invlist_union(cp_list, simple_posixes, &cp_list);
-        SvREFCNT_dec_NN(simple_posixes);
+    if (simple_posixes) {   /* These are the classes known to be unaffected by
+                               /a, /aa, and /d */
+        if (cp_list) {
+            _invlist_union(cp_list, simple_posixes, &cp_list);
+            SvREFCNT_dec_NN(simple_posixes);
+        }
+        else {
+            cp_list = simple_posixes;
+        }
     }
     if (posixes || nposixes) {
-        if (posixes && AT_LEAST_ASCII_RESTRICTED) {
+
+        /* We have to adjust /a and /aa */
+        if (AT_LEAST_ASCII_RESTRICTED) {
+
             /* Under /a and /aa, nothing above ASCII matches these */
-            _invlist_intersection(posixes,
-                                  PL_XPosix_ptrs[_CC_ASCII],
-                                  &posixes);
-        }
-        if (nposixes) {
-            if (DEPENDS_SEMANTICS) {
-                /* Under /d, everything in the upper half of the Latin1 range
-                 * matches these complements */
-                ANYOF_FLAGS(ret) |= ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER;
+            if (posixes) {
+                _invlist_intersection(posixes,
+                                    PL_XPosix_ptrs[_CC_ASCII],
+                                    &posixes);
             }
-            else if (AT_LEAST_ASCII_RESTRICTED) {
-                /* Under /a and /aa, everything above ASCII matches these
-                 * complements */
+
+            /* Under /a and /aa, everything above ASCII matches these
+             * complements */
+            if (nposixes) {
                 _invlist_union_complement_2nd(nposixes,
                                               PL_XPosix_ptrs[_CC_ASCII],
                                               &nposixes);
             }
-            if (posixes) {
-                _invlist_union(posixes, nposixes, &posixes);
-                SvREFCNT_dec_NN(nposixes);
-            }
-            else {
-                posixes = nposixes;
-            }
         }
+
         if (! DEPENDS_SEMANTICS) {
-            if (cp_list) {
-                _invlist_union(cp_list, posixes, &cp_list);
-                SvREFCNT_dec_NN(posixes);
+
+            /* For everything but /d, we can just add the current 'posixes' and
+             * 'nposixes' to the main list */
+            if (posixes) {
+                if (cp_list) {
+                    _invlist_union(cp_list, posixes, &cp_list);
+                    SvREFCNT_dec_NN(posixes);
+                }
+                else {
+                    cp_list = posixes;
+                }
             }
-            else {
-                cp_list = posixes;
+            if (nposixes) {
+                if (cp_list) {
+                    _invlist_union(cp_list, nposixes, &cp_list);
+                    SvREFCNT_dec_NN(nposixes);
+                }
+                else {
+                    cp_list = nposixes;
+                }
             }
         }
         else {
-            /* Under /d, we put into a separate list the Latin1 things that
-             * match only when the target string is utf8 */
-            SV* nonascii_but_latin1_properties = NULL;
-            _invlist_intersection(posixes, PL_UpperLatin1,
-                                  &nonascii_but_latin1_properties);
-            _invlist_subtract(posixes, nonascii_but_latin1_properties,
-                              &posixes);
-            if (cp_list) {
-                _invlist_union(cp_list, posixes, &cp_list);
-                SvREFCNT_dec_NN(posixes);
-            }
-            else {
-                cp_list = posixes;
-            }
+            /* Under /d, things like \w match upper Latin1 characters only if
+             * the target string is in UTF-8.  But things like \W match all the
+             * upper Latin1 characters if the target string is not in UTF-8.
+             *
+             * Handle the case where there is something like \W separately */
+            if (nposixes) {
+                SV* only_non_utf8_list = invlist_clone(PL_UpperLatin1);
+
+                /* A complemented posix class matches all upper Latin1
+                 * characters if not in UTF-8.  And it matches just certain
+                 * ones when in UTF-8.  That means those certain ones are
+                 * matched regardless, so can just be added to the
+                 * unconditional list */
+                if (cp_list) {
+                    _invlist_union(cp_list, nposixes, &cp_list);
+                    SvREFCNT_dec_NN(nposixes);
+                    nposixes = NULL;
+                }
+                else {
+                    cp_list = nposixes;
+                }
+
+                /* Likewise for 'posixes' */
+                _invlist_union(posixes, cp_list, &cp_list);
+
+                /* Likewise for anything else in the range that matched only
+                 * under UTF-8 */
+                if (has_upper_latin1_only_utf8_matches) {
+                    _invlist_union(cp_list,
+                                   has_upper_latin1_only_utf8_matches,
+                                   &cp_list);
+                    SvREFCNT_dec_NN(has_upper_latin1_only_utf8_matches);
+                    has_upper_latin1_only_utf8_matches = NULL;
+                }
 
-            if (has_upper_latin1_only_utf8_matches) {
-                _invlist_union(has_upper_latin1_only_utf8_matches,
-                               nonascii_but_latin1_properties,
-                               &has_upper_latin1_only_utf8_matches);
-                SvREFCNT_dec_NN(nonascii_but_latin1_properties);
+                /* If we don't match all the upper Latin1 characters regardless
+                 * of UTF-8ness, we have to set a flag to match the rest when
+                 * not in UTF-8 */
+                _invlist_subtract(only_non_utf8_list, cp_list,
+                                  &only_non_utf8_list);
+                if (_invlist_len(only_non_utf8_list) != 0) {
+                    ANYOF_FLAGS(ret) |= ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER;
+                }
             }
             else {
-                has_upper_latin1_only_utf8_matches
-                                            = nonascii_but_latin1_properties;
+                /* Here there were no complemented posix classes.  That means
+                 * the upper Latin1 characters in 'posixes' match only when the
+                 * target string is in UTF-8.  So we have to add them to the
+                 * list of those types of code points, while adding the
+                 * remainder to the unconditional list.
+                 *
+                 * First calculate what they are */
+                SV* nonascii_but_latin1_properties = NULL;
+                _invlist_intersection(posixes, PL_UpperLatin1,
+                                      &nonascii_but_latin1_properties);
+
+                /* And add them to the final list of such characters. */
+                if (has_upper_latin1_only_utf8_matches) {
+                    _invlist_union(has_upper_latin1_only_utf8_matches,
+                                   nonascii_but_latin1_properties,
+                                   &has_upper_latin1_only_utf8_matches);
+                    SvREFCNT_dec_NN(nonascii_but_latin1_properties);
+                }
+                else {
+                    has_upper_latin1_only_utf8_matches
+                                                = nonascii_but_latin1_properties;
+                }
+
+                /* Remove them from what now becomes the unconditional list */
+                _invlist_subtract(posixes, nonascii_but_latin1_properties,
+                                  &posixes);
+
+                /* And the remainder are the unconditional ones */
+                if (cp_list) {
+                    _invlist_union(cp_list, posixes, &cp_list);
+                    SvREFCNT_dec_NN(posixes);
+                    posixes = NULL;
+                }
+                else {
+                    cp_list = posixes;
+                }
+
+                /* Get rid of any characters that we now know are matched
+                 * unconditionally from the conditional list */
+                _invlist_subtract(has_upper_latin1_only_utf8_matches,
+                                  cp_list,
+                                  &has_upper_latin1_only_utf8_matches);
+                if (_invlist_len(has_upper_latin1_only_utf8_matches) == 0) {
+                    SvREFCNT_dec_NN(has_upper_latin1_only_utf8_matches);
+                    has_upper_latin1_only_utf8_matches = NULL;
+                }
             }
         }
     }
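The /d branch above amounts to set algebra over the upper Latin-1 range. The sketch below is a hypothetical model of it, with these simplifications:

- It covers only code points 0x00-0xFF, using a 256-bit bitmap instead of inversion lists.
- It ignores folding and everything above Latin-1.
- The names l1set, merge_d() and all_upper_if_not_utf8 are invented. The last one stands in for ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER.
- It assumes the utf8_only input holds only upper-Latin-1 code points, as has_upper_latin1_only_utf8_matches does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 256-bit set: bit cp of w[cp / 64] is set iff cp is a member */
typedef struct { uint64_t w[4]; } l1set;

static l1set l1_empty_set(void) { l1set s = {{0, 0, 0, 0}}; return s; }

static l1set l1_range(unsigned lo, unsigned hi)   /* [lo, hi], hi <= 0xFF */
{
    l1set s = l1_empty_set();
    for (unsigned cp = lo; cp <= hi; cp++)
        s.w[cp / 64] |= UINT64_C(1) << (cp % 64);
    return s;
}

static bool l1_has(l1set s, unsigned cp)
{
    return (s.w[cp / 64] >> (cp % 64)) & 1;
}

static l1set l1_or(l1set a, l1set b)
{
    for (int i = 0; i < 4; i++) a.w[i] |= b.w[i];
    return a;
}

static l1set l1_and(l1set a, l1set b)
{
    for (int i = 0; i < 4; i++) a.w[i] &= b.w[i];
    return a;
}

static l1set l1_minus(l1set a, l1set b)
{
    for (int i = 0; i < 4; i++) a.w[i] &= ~b.w[i];
    return a;
}

static bool l1_is_empty(l1set a)
{
    return (a.w[0] | a.w[1] | a.w[2] | a.w[3]) == 0;
}

typedef struct {
    l1set always;                /* matches whatever the target's encoding */
    l1set utf8_only;             /* matches only if the target is UTF-8 */
    bool  all_upper_if_not_utf8; /* models the ANYOF_SHARED_d_... flag */
} d_result;

static d_result merge_d(l1set cp_list, l1set posixes,
                        bool have_nposixes, l1set nposixes, l1set utf8_only)
{
    const l1set upper = l1_range(0x80, 0xFF);
    d_result r;

    if (have_nposixes) {
        /* Non-UTF-8 targets match every upper-Latin-1 code point, so whatever
         * the UTF-8 rules add (nposixes, posixes, utf8_only) matches either way */
        r.always = l1_or(l1_or(cp_list, nposixes), l1_or(posixes, utf8_only));
        r.utf8_only = l1_empty_set();
        /* The flag is needed only for upper Latin-1 not already unconditional */
        r.all_upper_if_not_utf8 = ! l1_is_empty(l1_minus(upper, r.always));
    }
    else {
        /* Upper-Latin-1 members of 'posixes' match only under UTF-8 */
        l1set cond = l1_and(posixes, upper);
        r.always = l1_or(cp_list, l1_minus(posixes, cond));
        r.utf8_only = l1_minus(l1_or(utf8_only, cond), r.always);
        r.all_upper_if_not_utf8 = false;
    }
    return r;
}
```

Under this model, a complement that already covers all of 0x80-0xFF leaves the flag clear. A conditional code point that also appears unconditionally is dropped from utf8_only. Both outcomes reduce the cases in which the node has to become ANYOFD.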
@@ -17288,79 +17594,14 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth,
             invlist_iterfinish(cp_list);
         }
     }
-
-#define MATCHES_ALL_NON_UTF8_NON_ASCII(ret)                                 \
-    (   DEPENDS_SEMANTICS                                                   \
-     && (ANYOF_FLAGS(ret)                                                   \
-        & ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER))
-
-    /* See if we can simplify things under /d */
-    if (   has_upper_latin1_only_utf8_matches
-        || MATCHES_ALL_NON_UTF8_NON_ASCII(ret))
+    else if (   DEPENDS_SEMANTICS
+             && (    has_upper_latin1_only_utf8_matches
+                 || (ANYOF_FLAGS(ret) & ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER)))
     {
-        /* But not if we are inverting, as that screws it up */
-        if (! invert) {
-            if (has_upper_latin1_only_utf8_matches) {
-                if (MATCHES_ALL_NON_UTF8_NON_ASCII(ret)) {
-
-                    /* Here, we have both the flag and inversion list.  Any
-                     * character in 'has_upper_latin1_only_utf8_matches'
-                     * matches when UTF-8 is in effect, but it also matches
-                     * when UTF-8 is not in effect because of
-                     * MATCHES_ALL_NON_UTF8_NON_ASCII.  Therefore it matches
-                     * unconditionally, so can be added to the regular list,
-                     * and 'has_upper_latin1_only_utf8_matches' cleared */
-                    _invlist_union(cp_list,
-                                   has_upper_latin1_only_utf8_matches,
-                                   &cp_list);
-                    SvREFCNT_dec_NN(has_upper_latin1_only_utf8_matches);
-                    has_upper_latin1_only_utf8_matches = NULL;
-                }
-                else if (cp_list) {
-
-                    /* Here, 'cp_list' gives chars that always match, and
-                     * 'has_upper_latin1_only_utf8_matches' gives chars that
-                     * were specified to match only if the target string is in
-                     * UTF-8.  It may be that these overlap, so we can subtract
-                     * the unconditionally matching from the conditional ones,
-                     * to make the conditional list as small as possible,
-                     * perhaps even clearing it, in which case more
-                     * optimizations are possible later */
-                    _invlist_subtract(has_upper_latin1_only_utf8_matches,
-                                      cp_list,
-                                      &has_upper_latin1_only_utf8_matches);
-                    if (_invlist_len(has_upper_latin1_only_utf8_matches) == 0) {
-                        SvREFCNT_dec_NN(has_upper_latin1_only_utf8_matches);
-                        has_upper_latin1_only_utf8_matches = NULL;
-                    }
-                }
-            }
-
-            /* Similarly, if the unconditional matches include every upper
-             * latin1 character, we can clear that flag to permit later
-             * optimizations */
-            if (cp_list && MATCHES_ALL_NON_UTF8_NON_ASCII(ret)) {
-                SV* only_non_utf8_list = invlist_clone(PL_UpperLatin1);
-                _invlist_subtract(only_non_utf8_list, cp_list,
-                                  &only_non_utf8_list);
-                if (_invlist_len(only_non_utf8_list) == 0) {
-                    ANYOF_FLAGS(ret) &= ~ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER;
-                }
-                SvREFCNT_dec_NN(only_non_utf8_list);
-                only_non_utf8_list = NULL;;
-            }
-        }
-
-        /* If we haven't gotten rid of all conditional matching, we change the
-         * regnode type to indicate that */
-        if (   has_upper_latin1_only_utf8_matches
-            || MATCHES_ALL_NON_UTF8_NON_ASCII(ret))
-        {
-            OP(ret) = ANYOFD;
-            optimizable = FALSE;
-        }
+        OP(ret) = ANYOFD;
+        optimizable = FALSE;
     }
-#undef MATCHES_ALL_NON_UTF8_NON_ASCII
+
 
     /* Optimize inverted simple patterns (e.g. [^a-z]) when everything is known
      * at compile time.  Besides not inverting folded locale now, we can't
@@ -18699,7 +18940,8 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o, const regmatch_
                                                  : TRIE_BITMAP(trie)),
                                                 NULL,
                                                 NULL,
-                                                NULL
+                                                NULL,
+                                                FALSE
                                                );
             sv_catpvs(sv, "]");
         }
@@ -18798,6 +19040,8 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o, const regmatch_
         /* And things that aren't in the bitmap, but are small enough to be */
         SV* bitmap_range_not_in_bitmap = NULL;
 
+        const bool inverted = flags & ANYOF_INVERT;
+
        if (OP(o) == ANYOFL) {
             if (ANYOFL_UTF8_LOCALE_REQD(flags)) {
                 sv_catpvs(sv, "{utf8-locale-reqd}");
@@ -18842,21 +19086,37 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o, const regmatch_
                                               ANYOF_BITMAP(o),
                                               bitmap_range_not_in_bitmap,
                                               only_utf8_locale_invlist,
-                                              o);
+                                              o,
+
+                                              /* Can't try inverting for a
+                                               * better display if there are
+                                               * things that haven't been
+                                               * resolved */
+                                              unresolved != NULL);
         SvREFCNT_dec(bitmap_range_not_in_bitmap);
 
         /* If there are user-defined properties which haven't been defined yet,
-         * output them, in a separate [] from the bitmap range stuff */
+         * output them.  If the result is not to be inverted, it is clearest to
+         * output them in a separate [] from the bitmap range stuff.  If the
+         * result is to be complemented, we have to show everything in one [],
+         * as the inversion applies to the whole thing.  Use {braces} to
+         * separate them from anything in the bitmap and anything above the
+         * bitmap. */
         if (unresolved) {
-            if (do_sep) {
-                Perl_sv_catpvf(aTHX_ sv,"%s][%s",PL_colors[1],PL_colors[0]);
+            if (inverted) {
+                if (! do_sep) { /* If didn't output anything in the bitmap */
+                    sv_catpvs(sv, "^");
+                }
+                sv_catpvs(sv, "{");
             }
-            if (flags & ANYOF_INVERT) {
-                sv_catpvs(sv, "^");
+            else if (do_sep) {
+                Perl_sv_catpvf(aTHX_ sv,"%s][%s",PL_colors[1],PL_colors[0]);
             }
             sv_catsv(sv, unresolved);
-            do_sep = TRUE;
-            SvREFCNT_dec_NN(unresolved);
+            if (inverted) {
+                sv_catpvs(sv, "}");
+            }
+            do_sep = ! inverted;
         }
 
         /* And, finally, add the above-the-bitmap stuff */
@@ -18873,9 +19133,11 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o, const regmatch_
                 Perl_sv_catpvf(aTHX_ sv,"%s][%s",PL_colors[1],PL_colors[0]);
             }
 
-            /* And, for easy of understanding, it is always output not-shown as
-             * complemented */
-            if (flags & ANYOF_INVERT) {
+            /* And, for ease of understanding, it is shown in the
+             * uncomplemented form if possible.  The one exception is if
+             * there are unresolved items, where the inversion has to be
+             * delayed until runtime */
+            if (inverted && ! unresolved) {
                 _invlist_invert(nonbitmap_invlist);
                 _invlist_subtract(nonbitmap_invlist, PL_InBitmap, &nonbitmap_invlist);
             }
@@ -18912,6 +19174,8 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o, const regmatch_
 
         /* And finally the matching, closing ']' */
        Perl_sv_catpvf(aTHX_ sv, "%s]", PL_colors[1]);
+
+        SvREFCNT_dec(unresolved);
     }
     else if (k == POSIXD || k == NPOSIXD) {
         U8 index = FLAGS(o) * 2;
@@ -19725,30 +19989,36 @@ S_put_range(pTHX_ SV *sv, UV start, const UV end, const bool allow_literals)
          * mnemonic names.  Split off any of those at the beginning and end of
          * the range to print mnemonically.  It isn't possible for many of
          * these to be in a row, so this won't overwhelm with output */
-        while (isMNEMONIC_CNTRL(start) && start <= end) {
-            put_code_point(sv, start);
-            start++;
-        }
-        if (start < end && isMNEMONIC_CNTRL(end)) {
-
-            /* Here, the final character in the range has a mnemonic name.
-             * Work backwards from the end to find the final non-mnemonic */
-            UV temp_end = end - 1;
-            while (isMNEMONIC_CNTRL(temp_end)) {
-                temp_end--;
+        if (   start <= end
+            && (isMNEMONIC_CNTRL(start) || isMNEMONIC_CNTRL(end)))
+        {
+            while (isMNEMONIC_CNTRL(start) && start <= end) {
+                put_code_point(sv, start);
+                start++;
             }
 
-            /* And separately output the interior range that doesn't start or
-             * end with mnemonics */
-            put_range(sv, start, temp_end, FALSE);
+            /* If this didn't take care of the whole range ... */
+            if (start <= end) {
 
-            /* Then output the mnemonic trailing controls */
-            start = temp_end + 1;
-            while (start <= end) {
-                put_code_point(sv, start);
-                start++;
+                /* Look backwards from the end to find the final non-mnemonic
+                 * */
+                UV temp_end = end;
+                while (isMNEMONIC_CNTRL(temp_end)) {
+                    temp_end--;
+                }
+
+                /* And separately output the interior range that doesn't start
+                 * or end with mnemonics */
+                put_range(sv, start, temp_end, FALSE);
+
+                /* Then output the mnemonic trailing controls */
+                start = temp_end + 1;
+                while (start <= end) {
+                    put_code_point(sv, start);
+                    start++;
+                }
+                break;
             }
-            break;
         }
 
         /* As a final resort, output the range or subrange as hex. */
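The reworked put_range() hunk prints the controls that have mnemonic names one at a time when they sit at either end of a range. It prints only the interior as hex. Below is a simplified sketch of that split, with these assumptions:

- The mnemonic controls are taken to be \a \b \t \n \f \r and \e. This stands in for isMNEMONIC_CNTRL() and is not copied from Perl's headers.
- Output goes into a plain char buffer rather than an SV.
- mnemonic(), put_cp() and put_range_sketch() are hypothetical.
- Unlike the real function, it never prints printable characters literally.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed mnemonic controls (stand-in for isMNEMONIC_CNTRL()) */
static const char *mnemonic(unsigned cp)
{
    switch (cp) {
    case 0x07: return "\\a";
    case 0x08: return "\\b";
    case 0x09: return "\\t";
    case 0x0A: return "\\n";
    case 0x0C: return "\\f";
    case 0x0D: return "\\r";
    case 0x1B: return "\\e";
    default:   return NULL;
    }
}

/* Appends one code point to the NUL-terminated buffer 'out' */
static void put_cp(char *out, size_t cap, unsigned cp)
{
    const char *m = mnemonic(cp);
    size_t used = strlen(out);
    if (m)
        snprintf(out + used, cap - used, "%s", m);
    else
        snprintf(out + used, cap - used, "\\x%02X", cp);
}

/* Appends [start, end] (end <= 0xFF), splitting off mnemonic controls at
 * either end of the range */
static void put_range_sketch(char *out, size_t cap, unsigned start, unsigned end)
{
    assert(start <= end && end <= 0xFF);

    /* Leading mnemonic controls, one at a time */
    while (start <= end && mnemonic(start)) {
        put_cp(out, cap, start);
        start++;
    }
    if (start > end)        /* the whole range was mnemonic controls */
        return;

    /* Work back from the end to the last non-mnemonic code point; this
     * stops no lower than 'start', which is known not to be one */
    unsigned temp_end = end;
    while (mnemonic(temp_end))
        temp_end--;

    /* The interior, as one hex code point or a hex range */
    put_cp(out, cap, start);
    if (temp_end > start) {
        size_t used = strlen(out);
        snprintf(out + used, cap - used, "-\\x%02X", temp_end);
    }

    /* Trailing mnemonic controls, one at a time */
    for (unsigned cp = temp_end + 1; cp <= end; cp++)
        put_cp(out, cap, cp);
}
```

With this mnemonic set, [\x{07}-\x{0B}] comes out as \a\b\t\n\x0B, matching the expectation in t/re/anyof.t. The early return once start passes end corresponds to the new `if (start <= end)` check: at that point the whole range has already been printed.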
@@ -19836,7 +20106,9 @@ S_put_charclass_bitmap_innards_common(pTHX_
 )
 {
     /* Create and return an SV containing a displayable version of the bitmap
-     * and associated information determined by the input parameters. */
+     * and associated information determined by the input parameters.  If the
+     * output would have been only the inversion indicator '^', NULL is instead
+     * returned. */
 
     SV * output;
 
@@ -19895,9 +20167,8 @@ S_put_charclass_bitmap_innards_common(pTHX_
         }
     }
 
-    /* If the only thing we output is the '^', clear it */
     if (invert && SvCUR(output) == 1) {
-        SvCUR_set(output, 0);
+        return NULL;
     }
 
     return output;
@@ -19908,7 +20179,8 @@ S_put_charclass_bitmap_innards(pTHX_ SV *sv,
                                      char *bitmap,
                                      SV *nonbitmap_invlist,
                                      SV *only_utf8_locale_invlist,
-                                     const regnode * const node)
+                                     const regnode * const node,
+                                     const bool force_as_is_display)
 {
     /* Appends to 'sv' a displayable version of the innards of the bracketed
      * character class defined by the other arguments:
@@ -19924,13 +20196,16 @@ S_put_charclass_bitmap_innards(pTHX_ SV *sv,
      *  'node' is the regex pattern node.  It is needed only when the above two
      *      parameters are not null, and is passed so that this routine can
      *      tease apart the various reasons for them.
+     *  'force_as_is_display' is TRUE if this routine should definitely NOT try
+     *      to invert things to see if that leads to a cleaner display.  If
+     *      FALSE, this routine is free to use its judgment about doing this.
      *
      * It returns TRUE if there was actually something output.  (It may be that
      * the bitmap, etc is empty.)
      *
      * When called for outputting the bitmap of a non-ANYOF node, just pass the
-     * bitmap, with the succeeding parameters set to NULL.
-     *
+     * bitmap, with the succeeding parameters set to NULL, and the final one to
+     * FALSE.
      */
 
     /* In general, it tries to display the 'cleanest' representation of the
@@ -19938,7 +20213,7 @@ S_put_charclass_bitmap_innards(pTHX_ SV *sv,
      * whether the class itself is to be inverted.  However,  there are some
      * cases where it can't try inverting, as what actually matches isn't known
      * until runtime, and hence the inversion isn't either. */
-    bool inverting_allowed = TRUE;
+    bool inverting_allowed = ! force_as_is_display;
 
     int i;
     STRLEN orig_sv_cur = SvCUR(sv);
@@ -19954,7 +20229,7 @@ S_put_charclass_bitmap_innards(pTHX_ SV *sv,
                                        is UTF-8 */
 
     SV* as_is_display;      /* The output string when we take the inputs
-                              literally */
+                               literally */
     SV* inverted_display;   /* The output string when we invert the inputs */
 
     U8 flags = (node) ? ANYOF_FLAGS(node) : 0;
@@ -20067,7 +20342,10 @@ S_put_charclass_bitmap_innards(pTHX_ SV *sv,
 
     /* If have to take the output as-is, just do that */
     if (! inverting_allowed) {
-        sv_catsv(sv, as_is_display);
+        if (as_is_display) {
+            sv_catsv(sv, as_is_display);
+            SvREFCNT_dec_NN(as_is_display);
+        }
     }
     else { /* But otherwise, create the output again on the inverted input, and
               use whichever version is shorter */
@@ -20103,10 +20381,13 @@ S_put_charclass_bitmap_innards(pTHX_ SV *sv,
             _invlist_invert(only_utf8);
             _invlist_intersection(only_utf8, PL_UpperLatin1, &only_utf8);
         }
+        else if (not_utf8) {
 
-        if (not_utf8) {
-            _invlist_invert(not_utf8);
-            _invlist_intersection(not_utf8, PL_UpperLatin1, &not_utf8);
+            /* If a code point matches iff the target string is not in UTF-8,
+             * then complementing the result has it not match iff not in UTF-8,
+             * which is the same thing as matching iff it is UTF-8. */
+            only_utf8 = not_utf8;
+            not_utf8 = NULL;
         }
 
         if (only_utf8_locale) {
@@ -20125,17 +20406,19 @@ S_put_charclass_bitmap_innards(pTHX_ SV *sv,
 
         /* Use the shortest representation, taking into account our bias
          * against showing it inverted */
-        if (SvCUR(inverted_display) + inverted_bias
-            < SvCUR(as_is_display) + as_is_bias)
+        if (   inverted_display
+            && (   ! as_is_display
+                || (  SvCUR(inverted_display) + inverted_bias
+                    < SvCUR(as_is_display)    + as_is_bias)))
         {
            sv_catsv(sv, inverted_display);
         }
-        else {
+        else if (as_is_display) {
            sv_catsv(sv, as_is_display);
         }
 
-        SvREFCNT_dec_NN(as_is_display);
-        SvREFCNT_dec_NN(inverted_display);
+        SvREFCNT_dec(as_is_display);
+        SvREFCNT_dec(inverted_display);
     }
 
     SvREFCNT_dec_NN(invlist);
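put_charclass_bitmap_innards_common() now returns NULL instead of a lone '^'. The caller therefore has to cope with either display being absent before comparing lengths.

The sketch below models that choice with plain C strings. NULL plays the role of a NULL SV, and the two bias arguments play the roles of as_is_bias and inverted_bias. The function name pick_display() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the string to append, or NULL if there is nothing to show.
 * A NULL argument models a display that came back NULL because it would
 * have consisted of only the '^'. */
static const char *pick_display(const char *as_is, size_t as_is_bias,
                                const char *inverted, size_t inverted_bias)
{
    if (   inverted
        && (   ! as_is
            || strlen(inverted) + inverted_bias < strlen(as_is) + as_is_bias))
    {
        return inverted;
    }
    return as_is;       /* may itself be NULL */
}
```

Because the comparison is strict, a tie goes to the as-is form, which matches the stated bias against showing the class inverted.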
diff --git a/regcomp.h b/regcomp.h
index 79c2853..ec0c9f8 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -522,7 +522,7 @@ struct regnode_ssc {
                |ANYOFL_SHARED_UTF8_LOCALE_fold_HAS_MATCHES_nonfold_REQD))   \
              == ANYOFL_SHARED_UTF8_LOCALE_fold_HAS_MATCHES_nonfold_REQD)
 
-/* Spare:                                       0x10 */
+/* Spare: Be sure to change ANYOF_FLAGS_ALL if this gets used  0x10 */
 
 /* If set, the node matches every code point NUM_ANYOF_CODE_POINTS and above.
  * Can be in an SSC */
diff --git a/t/re/anyof.t b/t/re/anyof.t
new file mode 100644
index 0000000..978ec73
--- /dev/null
+++ b/t/re/anyof.t
@@ -0,0 +1,368 @@
+use utf8;
+
+# This tests that the ANYOF nodes generated by bracketed character classes are
+# as expected.  The representation of these is not guaranteed, and this test
+# may need to be updated if it changes.  But it is here to make sure that no
+# unexpected changes occur.  These could come from faulty generation of the
+# node, or faulty display of them (or both).  Because these causes come from
+# very different parts of the regex compiler, it is unlikely that a commit
+# would change both of them, so this test will adequately serve to test both.
+
+BEGIN {
+    chdir 't' if -d 't';
+    @INC = ('../lib','.','../ext/re');
+    require Config; import Config;
+    require './test.pl';
+    skip_all('no re module') unless defined &DynaLoader::boot_DynaLoader;
+}
+
+# An array is used instead of a hash, so that the tests are carried out in the
+# order given by this file.  Even-numbered indices are the regexes to compile.
+# The next higher element is the expected compilation result.
+#
+# It is painful to port some of these to EBCDIC, as not only do the code point
+# numbers change (for those < 256), but the order changes, as the compiled
+# version is sorted by native code point order.  On EBCDIC, \r comes before
+# \n, and 'k' before 'K', for example.  So, the tests where there are
+# differences are skipped on EBCDIC.  They are all at the beginning of the
+# array, and a special marker entry is used to delimit the boundary between
+# skipped and not skipped.
+
+my @tests = (
+    '[[{]' => 'ANYOF[[\{]',
+    '[^\S ]' => 'ANYOFD[\t\n\x0B\f\r{utf8}\x85\xA0][1680 2000-200A 2028-2029 202F 205F 3000]',
+    '[^\n\r]' => 'ANYOF[^\n\r][0100-INFINITY]',
+    '[^\/\|,\$\%%\@\ \%"\<\>\:\#\&\*\{\}\[\]\(\)]' => 'ANYOF[^ "#$%&()*,/:<>@[\]\{|\}][0100-INFINITY]',
+    '[^[:^print:][:^ascii:]]' => 'ANYOF[\x20-\x7E]',
+    '[ [:blank:]]' => 'ANYOFD[\t {utf8}\xA0][1680 2000-200A 202F 205F 3000]',
+    '[_[:^blank:]]' => 'ANYOFD[^\t {utf8}\xA0][0100-167F 1681-1FFF 200B-202E 2030-205E 2060-2FFF 3001-INFINITY]',
+    '[\xA0[:^blank:]]' => 'ANYOF[^\t ][0100-167F 1681-1FFF 200B-202E 2030-205E 2060-2FFF 3001-INFINITY]',
+    '[ [:blank:]]' => 'ANYOFD[\t {utf8}\xA0][1680 2000-200A 202F 205F 3000]',
+    '[_[:^blank:]]' => 'ANYOFD[^\t {utf8}\xA0][0100-167F 1681-1FFF 200B-202E 2030-205E 2060-2FFF 3001-INFINITY]',
+    '[\xA0[:^blank:]]' => 'ANYOF[^\t ][0100-167F 1681-1FFF 200B-202E 2030-205E 2060-2FFF 3001-INFINITY]',
+    '(?d:[_[:^blank:]])' => 'ANYOFD[^\t {utf8}\xA0][0100-167F 1681-1FFF 200B-202E 2030-205E 2060-2FFF 3001-INFINITY]',
+    '[\x{07}-\x{0B}]' => 'ANYOF[\a\b\t\n\x0B]',
+    '(?il:[\x{212A}])' => 'ANYOFL{i}[{utf8 locale}Kk][212A]',
+    '(?il:(?[\x{212A}]))' => 'ANYOFL{utf8-locale-reqd}[Kk][212A]',
+
+    'ebcdic_ok_below_this_marker',
+
+    '(?l:[\x{212A}])' => 'ANYOFL[212A]',
+    '(?l:[\s\x{212A}])' => 'ANYOFL[\s][1680 2000-200A 2028-2029 202F 205F 212A 3000]',
+    '(?l:[^\S\x{202F}])' => 'ANYOFL[^\\S][1680 2000-200A 2028-2029 205F 3000]',
+    '(?i:[^:])' => 'ANYOF[^:][0100-INFINITY]',
+    '[\p{Any}]' => 'ANYOF[\x00-\xFF][0100-10FFFF]',
+    '[\p{IsMyRuntimeProperty}]' => 'ANYOF[+utf8::IsMyRuntimeProperty]',
+    '[^\p{IsMyRuntimeProperty}]' => 'ANYOF[^{+utf8::IsMyRuntimeProperty}]',
+    '[a\p{IsMyRuntimeProperty}]' => 'ANYOF[a][+utf8::IsMyRuntimeProperty]',
+    '[^a\p{IsMyRuntimeProperty}]' => 'ANYOF[^a{+utf8::IsMyRuntimeProperty}]',
+    '[^a\x{100}\p{IsMyRuntimeProperty}]' => 'ANYOF[^a{+utf8::IsMyRuntimeProperty}0100]',
+    '[{INFINITY_minus_1}]' => 'ANYOF[INFINITY_minus_1]',
+    '[{INFINITY}]' => 'ANYOF[INFINITY-INFINITY]',
+    '[\x{102}\x{104}]' => 'ANYOF[0102 0104]',
+    '[\x{104}\x{102}]' => 'ANYOF[0102 0104]',
+    '[\x{103}\x{102}]' => 'ANYOF[0102-0103]',
+    '[\x{00}-{INFINITY_minus_1}]' => 'ANYOF[\x00-\xFF][0100-INFINITY_minus_1]',
+    '[\x{00}-{INFINITY}]' => 'SANY',
+    '[\x{101}-{INFINITY_minus_1}]' => 'ANYOF[0101-INFINITY_minus_1]',
+    '[\x{101}-{INFINITY}]' => 'ANYOF[0101-INFINITY]',
+    '[\x{104}\x{102}\x{103}]' => 'ANYOF[0102-0104]',
+    '[\x{102}-\x{104}\x{101}]' => 'ANYOF[0101-0104]',
+    '[\x{102}-\x{104}\x{102}]' => 'ANYOF[0102-0104]',
+    '[\x{102}-\x{104}\x{103}]' => 'ANYOF[0102-0104]',
+    '[\x{102}-\x{104}\x{104}]' => 'ANYOF[0102-0104]',
+    '[\x{102}-\x{104}\x{105}]' => 'ANYOF[0102-0105]',
+    '[\x{102}-\x{104}\x{106}]' => 'ANYOF[0102-0104 0106]',
+    '[\x{102}-\x{104}{INFINITY_minus_1}]' => 'ANYOF[0102-0104 INFINITY_minus_1]',
+    '[\x{102}-\x{104}{INFINITY}]' => 'ANYOF[0102-0104 INFINITY-INFINITY]',
+    '[\x{102}-\x{104}\x{101}-{INFINITY_minus_1}]' => 'ANYOF[0101-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{101}-{INFINITY}]' => 'ANYOF[0101-INFINITY]',
+    '[\x{102}-\x{104}\x{102}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{102}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{102}-\x{104}\x{103}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{103}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{102}-\x{104}\x{104}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{104}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{102}-\x{104}\x{105}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{105}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{102}-\x{104}\x{106}-{INFINITY_minus_1}]' => 'ANYOF[0102-0104 0106-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{106}-{INFINITY}]' => 'ANYOF[0102-0104 0106-INFINITY]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}]' => 'ANYOF[0101-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}]' => 'ANYOF[0102-0105 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{106}]' => 'ANYOF[0102-0104 0106 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{107}]' => 'ANYOF[0102-0104 0107-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{108}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{109}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{10A}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{10B}]' => 'ANYOF[0102-0104 0108-010B]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}{INFINITY_minus_1}]' => 'ANYOF[0102-0104 0108-010A INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}{INFINITY}]' => 'ANYOF[0102-0104 0108-010A INFINITY-INFINITY]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{102}]' => 'ANYOF[0101-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{103}]' => 'ANYOF[0101-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{104}]' => 'ANYOF[0101-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{105}]' => 'ANYOF[0101-0105 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{106}]' => 'ANYOF[0101-0106 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{107}]' => 'ANYOF[0101-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{108}]' => 'ANYOF[0101-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{109}]' => 'ANYOF[0101-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{10A}]' => 'ANYOF[0101-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{10B}]' => 'ANYOF[0101-010B]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-{INFINITY_minus_1}]' => 'ANYOF[0101-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{101}-{INFINITY}]' => 'ANYOF[0101-INFINITY]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{102}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{103}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{104}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{105}]' => 'ANYOF[0102-0105 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{106}]' => 'ANYOF[0102-0106 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{107}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{108}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{109}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{10A}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{10B}]' => 'ANYOF[0102-010B]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{10C}]' => 'ANYOF[0102-010C]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{102}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{104}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{105}]' => 'ANYOF[0102-0105 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{106}]' => 'ANYOF[0102-0106 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{107}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{108}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{109}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{10A}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{10B}]' => 'ANYOF[0102-010B]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{10C}]' => 'ANYOF[0102-010C]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{103}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}]' => 'ANYOF[0102-0104 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{105}]' => 'ANYOF[0102-0105 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{106}]' => 'ANYOF[0102-0106 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{107}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{108}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{109}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{10A}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{10B}]' => 'ANYOF[0102-010B]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{10C}]' => 'ANYOF[0102-010C]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{104}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}]' => 'ANYOF[0102-0105 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{106}]' => 'ANYOF[0102-0106 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{107}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{108}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{109}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{10A}]' => 'ANYOF[0102-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{10B}]' => 'ANYOF[0102-010B]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{10C}]' => 'ANYOF[0102-010C]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{105}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{106}]' => 'ANYOF[0102-0104 0106 0108-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{107}]' => 'ANYOF[0102-0104 0106-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{108}]' => 'ANYOF[0102-0104 0106-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{109}]' => 'ANYOF[0102-0104 0106-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{10A}]' => 'ANYOF[0102-0104 0106-010A]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{10B}]' => 'ANYOF[0102-0104 0106-010B]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{10C}]' => 'ANYOF[0102-0104 0106-010C]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{106}-{INFINITY_minus_1}]' => 'ANYOF[0102-0104 0106-INFINITY_minus_1]',
+    '[\x{102}-\x{104}\x{108}-\x{10A}\x{106}-{INFINITY}]' => 'ANYOF[0102-0104 0106-INFINITY]',
+    '[\x{10C}-{INFINITY}{INFINITY_minus_1}]' => 'ANYOF[010C-INFINITY]',
+    '[\x{10C}-{INFINITY}{INFINITY}]' => 'ANYOF[010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}\x{104}]' => 'ANYOF[0102 0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{104}\x{102}]' => 'ANYOF[0102 0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{103}\x{102}]' => 'ANYOF[0102-0103 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{00}-{INFINITY_minus_1}]' => 'SANY',
+    '[\x{10C}-{INFINITY}\x{00}-{INFINITY}]' => 'SANY',
+    '[\x{10C}-{INFINITY}\x{101}-{INFINITY_minus_1}]' => 'ANYOF[0101-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{101}-{INFINITY}]' => 'ANYOF[0101-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{104}\x{102}\x{103}]' => 'ANYOF[0102-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{101}]' => 'ANYOF[0101-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{102}]' => 'ANYOF[0102-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{103}]' => 'ANYOF[0102-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{104}]' => 'ANYOF[0102-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{105}]' => 'ANYOF[0102-0105 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{106}]' => 'ANYOF[0102-0104 0106 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}{INFINITY_minus_1}]' => 'ANYOF[0102-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}{INFINITY}]' => 'ANYOF[0102-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{101}-{INFINITY_minus_1}]' => 'ANYOF[0101-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{101}-{INFINITY}]' => 'ANYOF[0101-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{102}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{102}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{103}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{103}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{104}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{104}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{105}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{105}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{106}-{INFINITY_minus_1}]' => 'ANYOF[0102-0104 0106-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{106}-{INFINITY}]' => 'ANYOF[0102-0104 0106-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}]' => 'ANYOF[0101-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}]' => 'ANYOF[0102-0105 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{106}]' => 'ANYOF[0102-0104 0106 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{107}]' => 'ANYOF[0102-0104 0107-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{108}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{109}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{10A}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{10B}]' => 'ANYOF[0102-0104 0108-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}{INFINITY_minus_1}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}{INFINITY}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{102}]' => 'ANYOF[0101-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{103}]' => 'ANYOF[0101-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{104}]' => 'ANYOF[0101-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{105}]' => 'ANYOF[0101-0105 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{106}]' => 'ANYOF[0101-0106 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{107}]' => 'ANYOF[0101-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{108}]' => 'ANYOF[0101-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{109}]' => 'ANYOF[0101-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{10A}]' => 'ANYOF[0101-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-\x{10B}]' => 'ANYOF[0101-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-{INFINITY_minus_1}]' => 'ANYOF[0101-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{101}-{INFINITY}]' => 'ANYOF[0101-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{102}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{103}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{104}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{105}]' => 'ANYOF[0102-0105 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{106}]' => 'ANYOF[0102-0106 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{107}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{108}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{109}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{10A}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{10B}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-\x{10C}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{102}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{104}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{105}]' => 'ANYOF[0102-0105 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{106}]' => 'ANYOF[0102-0106 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{107}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{108}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{109}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{10A}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{10B}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-\x{10C}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{103}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}]' => 'ANYOF[0102-0104 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{105}]' => 'ANYOF[0102-0105 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{106}]' => 'ANYOF[0102-0106 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{107}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{108}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{109}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{10A}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{10B}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}-\x{10C}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{104}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}]' => 'ANYOF[0102-0105 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{106}]' => 'ANYOF[0102-0106 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{107}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{108}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{109}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{10A}]' => 'ANYOF[0102-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{10B}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}-\x{10C}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}-{INFINITY_minus_1}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{105}-{INFINITY}]' => 'ANYOF[0102-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{106}]' => 'ANYOF[0102-0104 0106 0108-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{107}]' => 'ANYOF[0102-0104 0106-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{108}]' => 'ANYOF[0102-0104 0106-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{109}]' => 'ANYOF[0102-0104 0106-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{10A}]' => 'ANYOF[0102-0104 0106-010A 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{10B}]' => 'ANYOF[0102-0104 0106-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{106}-\x{10C}]' => 'ANYOF[0102-0104 0106-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{106}-{INFINITY_minus_1}]' => 'ANYOF[0102-0104 0106-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{108}-\x{10A}\x{106}-{INFINITY}]' => 'ANYOF[0102-0104 0106-INFINITY]',
+    '[\x{106}-{INFINITY}\x{104}]' => 'ANYOF[0104 0106-INFINITY]',
+    '[\x{106}-{INFINITY}\x{105}]' => 'ANYOF[0105-INFINITY]',
+    '[\x{106}-{INFINITY}\x{106}]' => 'ANYOF[0106-INFINITY]',
+    '[\x{106}-{INFINITY}\x{107}]' => 'ANYOF[0106-INFINITY]',
+    '[\x{106}-{INFINITY}\x{104}-\x{105}]' => 'ANYOF[0104-INFINITY]',
+    '[\x{106}-{INFINITY}\x{104}-\x{106}]' => 'ANYOF[0104-INFINITY]',
+    '[\x{106}-{INFINITY}\x{104}-\x{107}]' => 'ANYOF[0104-INFINITY]',
+    '[\x{106}-{INFINITY}\x{104}-{INFINITY_minus_1}]' => 'ANYOF[0104-INFINITY]',
+    '[\x{106}-{INFINITY}\x{104}-{INFINITY}]' => 'ANYOF[0104-INFINITY]',
+    '[\x{106}-{INFINITY}\x{105}-\x{106}]' => 'ANYOF[0105-INFINITY]',
+    '[\x{106}-{INFINITY}\x{105}-\x{107}]' => 'ANYOF[0105-INFINITY]',
+    '[\x{106}-{INFINITY}\x{105}-{INFINITY_minus_1}]' => 'ANYOF[0105-INFINITY]',
+    '[\x{106}-{INFINITY}\x{105}-{INFINITY}]' => 'ANYOF[0105-INFINITY]',
+    '[\x{106}-{INFINITY}\x{106}-\x{107}]' => 'ANYOF[0106-INFINITY]',
+    '[\x{106}-{INFINITY}\x{106}-{INFINITY_minus_1}]' => 'ANYOF[0106-INFINITY]',
+    '[\x{106}-{INFINITY}\x{106}-{INFINITY}]' => 'ANYOF[0106-INFINITY]',
+    '[\x{106}-{INFINITY}\x{107}-\x{107}]' => 'ANYOF[0106-INFINITY]',
+    '[\x{106}-{INFINITY}\x{107}-{INFINITY_minus_1}]' => 'ANYOF[0106-INFINITY]',
+    '[\x{106}-{INFINITY}\x{107}-{INFINITY}]' => 'ANYOF[0106-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{100}]' => 'ANYOF[0100 0102-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{101}]' => 'ANYOF[0101-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{102}]' => 'ANYOF[0102-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{103}]' => 'ANYOF[0102-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{104}]' => 'ANYOF[0102-0104 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{105}]' => 'ANYOF[0102-0105 010C-INFINITY]',
+    '[\x{10C}-{INFINITY}\x{102}-\x{104}\x{106}]' => 'ANYOF[0102-0104 0106 010C-INFINITY]',
+);
+
+# 2**32-1 or 2**64-1
+my $highest_cp_string = "F" x (($Config{uvsize} < 8) ? 8 : 16);
+
+my $next_highest_cp_string = $highest_cp_string =~ s/ F $ /E/xr;
+
+my $highest_cp = "\\x{$highest_cp_string}";
+my $next_highest_cp = "\\x{$next_highest_cp_string}";
+
+plan(scalar (@tests - 1) / 2);  # -1 because of the marker.
+
+my $skip_ebcdic = $::IS_EBCDIC;
+while (defined (my $test = shift @tests)) {
+
+    if ($test eq 'ebcdic_ok_below_this_marker') {
+        $skip_ebcdic = 0;
+        next;
+    }
+
+    my $expected = shift @tests;
+
+    SKIP: {
+        skip("test not ported to EBCDIC", 1) if $skip_ebcdic;
+
+        my $display_expected = $expected
+                                  =~ s/ INFINITY_minus_1 /$next_highest_cp/xgr;
+
+        # Convert platform-independent values to what is suitable for the
+        # platform
+        $test =~ s/{INFINITY}/$highest_cp/g;
+        $test =~ s/{INFINITY_minus_1}/$next_highest_cp/g;
+
+        $test = "qr/$test/";
+        my $actual_test = "use re qw(Debug COMPILE); $test";
+
+        my $result = fresh_perl($actual_test);
+        if ($? != 0) {  # Re-run so as to display STDERR.
+            fail($test);
+            fresh_perl($actual_test, { stderr => 0, verbose => 1 });
+            next;
+        }
+
+        # The Debug output will come back as a bunch of lines.  We are
+        # interested only in the line after /Final program/
+        my @lines = split /\n/, $result;
+        while (defined ($_ = shift @lines)) {
+            next unless /Final program/;
+            $_ = shift @lines;
+
+            s/ \s* \( \d+ \) \s* //x;   # Get rid of the node branch
+            s/ ^ \s* \d+ : \s* //x;     # ... And the node number
+
+            # Use platform-independent values
+            s/$highest_cp_string/INFINITY/g;
+            s/$next_highest_cp_string/INFINITY_minus_1/g;
+
+            is($_, $expected,
+               "Verify compilation of $test displays as $display_expected");
+            last;   # Discard the rest of this test's output
+        }
+    }
+}
diff --git a/t/re/pat.t b/t/re/pat.t
index e788b20..f6bd04a 100644
--- a/t/re/pat.t
+++ b/t/re/pat.t
@@ -23,7 +23,7 @@ BEGIN {
     skip_all_without_unicode_tables();
 }
 
-plan tests => 790;  # Update this when adding/deleting tests.
+plan tests => 796;  # Update this when adding/deleting tests.
 
 run_tests() unless caller;
 
@@ -973,6 +973,19 @@ sub run_tests {
         @b = grep /\s/, @a;
         @c = grep /[\s]/, @a;
         is("@b", "@c", $message);
+
+        # Test an inverted posix class with a char also in the class.
+        my $nbsp = chr utf8::unicode_to_native(0xA0);
+        my $non_s = chr utf8::unicode_to_native(0xA1);
+        my $pat_string = "[^\\S ]";
+        unlike(" ", qr/$pat_string/, "Verify ' ' !~ /$pat_string/");
+        like("\t", qr/$pat_string/, "Verify '\\t' =~ /$pat_string/");
+        unlike($nbsp, qr/$pat_string/, "Verify non-utf8-NBSP !~ /$pat_string/");
+        utf8::upgrade($nbsp);
+        like($nbsp, qr/$pat_string/, "Verify utf8-NBSP =~ /$pat_string/");
+        unlike($non_s, qr/$pat_string/, "Verify non-utf8-inverted-bang !~ /$pat_string/");
+        utf8::upgrade($non_s);
+        unlike($non_s, qr/$pat_string/, "Verify utf8-inverted-bang !~ /$pat_string/");
     }
     {
         my $message = '\D, [\D], \d, [\d]';

--
Perl5 Master Repository
