#1
>0xa0 is indeed a non-breaking space in 8859 and Unicode. I don't know 
>what the EBCDIC equivalent is ... a quick Google suggests that it might 
>be 0x41:
I tried it.

TESTOUT1:
----------------
/\H*\h+\V?\v{3,4}/
    \x09\x20\xa0X\x0a\x0b\x0c\x0d\x0a
 0: \x09 \xa0X\x0a\x0b\x0c\x0d
    \x09\x20\xa0\x0a\x0b\x0c\x0d\x0a
 0: \x09 \xa0\x0a\x0b\x0c\x0d
    \x09\x20\xa0\x0a\x0b\x0c
 0: \x09 \xa0\x0a\x0b\x0c
    ** Failers
No match
    \x09\x20\xa0\x0a\x0b
No match

My tests:
------------
/\H*\h+\V?\v{3,4}/
    \x05\x40\x41X\x15\x0b\x0c\x0d\x15
No match
    \x05\x40\x41\x15\x0b\x0c\x0d\x15
 0: \x05  \x15\x0b\x0c\x0d
    \x05\x40\x41\x15\x0b\x0c
 0: \x05  \x15\x0b\x0c
    ** Failers
No match
    \x05\x40\x41\x15\x0b
No match

0x41 is not recognized as any of \h, \t, \v (not even as \s, but that is 
consistent with ASCII).
0x25 is not recognized as anything either (but it is recognized as part of 
<any>, <bsr_unicode>).
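
For reference, the byte correspondences behind the two test runs can be 
checked with a small Python sketch. It assumes Python's built-in cp037 codec 
as a stand-in for the EBCDIC code page; the code page actually used on the 
mainframe may map some bytes differently, so this only illustrates the 
mapping and says nothing about what PCRE does internally. The helper name is 
hypothetical.

```python
# Assumption: Python's built-in "cp037" codec stands in for the EBCDIC
# code page; other EBCDIC variants may map some of these bytes differently.
EBCDIC_CODEC = "cp037"


def ebcdic_byte_to_unicode(byte_value: int) -> str:
    """Decode one EBCDIC byte to its Unicode character (hypothetical helper)."""
    return bytes([byte_value]).decode(EBCDIC_CODEC)


# Bytes that appear in the EBCDIC test subjects above, plus 0x25.
for b in (0x05, 0x40, 0x41, 0x15, 0x25, 0x0B, 0x0C, 0x0D):
    ch = ebcdic_byte_to_unicode(b)
    print(f"EBCDIC 0x{b:02x} -> U+{ord(ch):04X}")
```

Under that assumption 0x41 decodes to U+00A0 (the non-breaking space), 0x15 
to U+0085 and 0x25 to U+000A, which are the characters the original tests 
exercise with 0xa0, 0x85 and 0x0a.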

I cannot accurately reproduce all the tests that involve 0x85, nor those that 
involve 0xa0.  Unless we think that something could and should be done about 
those, I would close my tests for testinput1 and testinput2.  What do you think?

#2
An interesting point:  The perlre document in perldoc (5.20) states: (The 
following all specify the same class of three characters: [-az], [az-], and 
[a\-z]. All are different from [a-z], which specifies a class containing 
twenty-six characters, even on EBCDIC-based character sets.) 



Apparently, Perl somehow recognizes [a-z] and treats it as a special case in 
EBCDIC, ignoring the gaps of non-letters.  This is news to me.  Did you know 
that?  I intend to ask on the perl-mvs forum what they do about it.
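
To make the gaps concrete, here is a rough Python sketch that counts what a 
naive code-point range from 'a' to 'z' would cover in EBCDIC. It again 
assumes the cp037 codec as the EBCDIC variant, and the function name is 
hypothetical; it does not show how Perl implements the special case.

```python
import string

# Assumption: "cp037" stands in for the EBCDIC code page under discussion.
EBCDIC_CODEC = "cp037"


def ebcdic_range_report(first: str, last: str):
    """Split the raw EBCDIC byte range first..last into letters and gaps.

    Hypothetical helper: returns (low byte, high byte, letters, non-letters).
    """
    lo = first.encode(EBCDIC_CODEC)[0]
    hi = last.encode(EBCDIC_CODEC)[0]
    # Decode every byte value in the inclusive range lo..hi.
    chars = bytes(range(lo, hi + 1)).decode(EBCDIC_CODEC)
    letters = [c for c in chars if c in string.ascii_lowercase]
    gaps = [c for c in chars if c not in string.ascii_lowercase]
    return lo, hi, letters, gaps


lo, hi, letters, gaps = ebcdic_range_report("a", "z")
print(f"range 0x{lo:02x}..0x{hi:02x}: {hi - lo + 1} code points, "
      f"{len(letters)} letters, {len(gaps)} non-letters in the gaps")
```

With cp037 the range runs from 0x81 to 0xa9, that is 41 code points, of which 
26 are the letters a-z and 15 fall in the gaps after 'i' and 'r'. A literal 
byte range would therefore match 15 extra characters unless [a-z] is treated 
specially.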
Ze'ev Atlas
  
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
