I ran pcretest with the -C option and got:
PCRE version 8.37 2015-04-28
Compiled with
  EBCDIC code support: LF is 0x25
  8-bit support
  No UTF-8 support
  No Unicode properties support
  No just-in-time compiler support
  Newline sequence is a non-standard value: 0x0015
  \R matches all Unicode newlines
  Internal link size = 2
  POSIX malloc threshold = 10
  Parentheses nest limit = 250
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack

This reminded me of an old conversation we had some time ago:

<snip>
I've been reading various web pages about EBCDIC systems, and they suggest that
the NL EBCDIC character (0x15) is used as the equivalent of ASCII LF, though
EBCDIC does have its own LF character (0x25), which was mentioned as sometimes
used. Ze'ev's experience suggests that 0x15 is used in his environment.

Unicode has a NEL character 0x85, which I guess should be equivalent to EBCDIC's
0x25 when 0x15 is NL, as suggested in
http://unicode.org/standard/reports/tr13/tr13-5.html

I am not sure why the user who contributed the original EBCDIC patch used the
name CHAR_NL rather than CHAR_LF for the character represented by '\n', but I
guess it was because it is usually the NL character. I think I will change that
name, and introduce CHAR_NEL for the other character.

Ze'ev: please can you check that '\n' really is 0x15 in your environment?
<snip>

I wrote a little program that reads my test file into a chunk of memory, prints
it, and dumps the memory.  You will see below that the equivalent of \n is
indeed 21 = 0x15.  My conclusion is that we need a way to tell PCRE that \n is
not the LF character but the NL character, or otherwise to dictate what \n
should be.
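
For anyone who wants to repeat the check, here is a minimal sketch of a program of that kind. It is not the program I used: the names format_row, read_file and dump_memory are hypothetical, it prints file offsets rather than memory addresses, and the text column simply shows whatever the native code page considers printable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#define ROW_BYTES 16
#define ROW_TEXT_MAX 96  /* room for address, hex column, text column, NUL */

/* Format one row as "ADDR | XX XX ... | text". Bytes missing from a short
   final row are shown as "__". (Hypothetical helper, for illustration.) */
void format_row(char *dst, size_t dstsz, unsigned long addr,
                const unsigned char *p, size_t n)
{
    assert(dstsz >= ROW_TEXT_MAX && n <= ROW_BYTES);
    size_t used = (size_t)snprintf(dst, dstsz, "%08lX | ", addr);
    for (size_t i = 0; i < ROW_BYTES; i++) {
        if (i < n)
            used += (size_t)snprintf(dst + used, dstsz - used, "%02X ",
                                     (unsigned)p[i]);
        else
            used += (size_t)snprintf(dst + used, dstsz - used, "__ ");
    }
    used += (size_t)snprintf(dst + used, dstsz - used, "| ");
    /* Text column: printable bytes in the native code page, blank otherwise. */
    for (size_t i = 0; i < n; i++)
        dst[used++] = isprint(p[i]) ? (char)p[i] : ' ';
    dst[used] = '\0';
}

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
unsigned char *read_file(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;
    size_t cap = 4096, n = 0;
    unsigned char *buf = malloc(cap);
    while (buf != NULL) {
        n += fread(buf + n, 1, cap - n, fp);
        if (n < cap) break;               /* short read: EOF or error */
        unsigned char *tmp = realloc(buf, cap *= 2);
        if (tmp == NULL) { free(buf); buf = NULL; }
        else buf = tmp;
    }
    fclose(fp);
    *len = n;
    return buf;
}

/* Print the buffer as text, then as a hex dump, 16 bytes per row. */
void dump_memory(FILE *out, const unsigned char *buf, size_t len)
{
    char row[ROW_TEXT_MAX];
    fwrite(buf, 1, len, out);
    fputc('\n', out);
    for (size_t off = 0; off < len; off += ROW_BYTES) {
        size_t n = len - off < ROW_BYTES ? len - off : ROW_BYTES;
        format_row(row, sizeof row, (unsigned long)off, buf + off, n);
        fprintf(out, "%s\n", row);
    }
}
```

Calling dump_memory(stdout, buf, len) on the buffer returned by read_file() gives output in the same general shape as the dump in this message. Printing (unsigned char)'\n' with "%02X" in the same program would also show which value the compiler itself uses for '\n'.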

/(?<=foo\n)¬bar/Im
    foo\x0Fbarbar
    ***Failers
    rhubarb
    barbell
    abc\nbarton

/¬(?<=foo\n)bar/Im
    foo\x15barbar
    ***Failers
    rhubarb
    barbell
    abc\nbarton

/(?<=foo\x0F)¬bar/Im
    foo\x0Fbarbar
    ***Failers
    rhubarb
    barbell
    abc\nbarton

/¬(?<=foo\x15)bar/Im
    foo\x15barbar
    ***Failers
    rhubarb
    barbell
    abc\nbarton

/(?>¬abc)/Im
    abc
    def\nabc
    *** Failers
    defabc

/(?<=ab(c+)d)ef/

/(?<=ab(?<=c+)d)ef/

/(?<=ab(c|de)f)g/

082FF5E0 | 61 4D 6F 4C 7E 86 96 96 E0 95 5D 5F 82 81 99 61 | /(?<=foo\n)¬bar/
082FF5F0 | C9 94 15 40 40 40 40 86 96 96 E0 A7 F0 C6 82 81 | Im     foo\x0Fba
082FF600 | 99 82 81 99 40 15 40 40 40 40 5C 5C 5C C6 81 89 | rbar      ***Fai
082FF610 | 93 85 99 A2 15 40 40 40 40 99 88 A4 82 81 99 82 | lers     rhubarb
082FF620 | 15 40 40 40 40 82 81 99 82 85 93 93 15 40 40 40 |      barbell
082FF630 | 40 81 82 83 E0 95 82 81 99 A3 96 95 15 15 61 5F |  abc\nbarton  /¬
082FF640 | 4D 6F 4C 7E 86 96 96 E0 95 5D 82 81 99 61 C9 94 | (?<=foo\n)bar/Im
082FF650 | 15 40 40 40 40 86 96 96 E0 A7 F1 F5 82 81 99 82 |      foo\x15barb
082FF660 | 81 99 40 15 40 40 40 40 5C 5C 5C C6 81 89 93 85 | ar      ***Faile
082FF670 | 99 A2 15 40 40 40 40 99 88 A4 82 81 99 82 15 40 | rs     rhubarb
082FF680 | 40 40 40 82 81 99 82 85 93 93 15 40 40 40 40 81 |    barbell     a
082FF690 | 82 83 E0 95 82 81 99 A3 96 95 15 15 61 4D 6F 4C | bc\nbarton  /(?<
082FF6A0 | 7E 86 96 96 E0 A7 F0 C6 5D 5F 82 81 99 61 C9 94 | =foo\x0F)¬bar/Im
082FF6B0 | 15 40 40 40 40 86 96 96 E0 A7 F0 C6 82 81 99 82 |      foo\x0Fbarb
082FF6C0 | 81 99 40 15 40 40 40 40 5C 5C 5C C6 81 89 93 85 | ar      ***Faile
082FF6D0 | 99 A2 15 40 40 40 40 99 88 A4 82 81 99 82 15 40 | rs     rhubarb
082FF6E0 | 40 40 40 82 81 99 82 85 93 93 15 40 40 40 40 81 |    barbell     a
082FF6F0 | 82 83 E0 95 82 81 99 A3 96 95 15 15 61 5F 4D 6F | bc\nbarton  /¬(?
082FF700 | 4C 7E 86 96 96 E0 A7 F1 F5 5D 82 81 99 61 C9 94 | <=foo\x15)bar/Im
082FF710 | 15 40 40 40 40 86 96 96 E0 A7 F1 F5 82 81 99 82 |      foo\x15barb
082FF720 | 81 99 40 15 40 40 40 40 5C 5C 5C C6 81 89 93 85 | ar      ***Faile
082FF730 | 99 A2 15 40 40 40 40 99 88 A4 82 81 99 82 15 40 | rs     rhubarb
082FF740 | 40 40 40 82 81 99 82 85 93 93 15 40 40 40 40 81 |    barbell     a
082FF750 | 82 83 E0 95 82 81 99 A3 96 95 15 15 61 4D 6F 6E | bc\nbarton  /(?>
082FF760 | 5F 81 82 83 5D 61 C9 94 15 40 40 40 40 81 82 83 | ¬abc)/Im     abc
082FF770 | 15 40 40 40 40 84 85 86 E0 95 81 82 83 15 40 40 |      def\nabc
082FF780 | 40 40 5C 5C 5C 40 C6 81 89 93 85 99 A2 15 40 40 |   *** Failers
082FF790 | 40 40 84 85 86 81 82 83 15 15 61 4D 6F 4C 7E 81 |   defabc  /(?<=a
082FF7A0 | 82 4D 83 4E 5D 84 5D 85 86 61 15 15 61 4D 6F 4C | b(c+)d)ef/  /(?<
082FF7B0 | 7E 81 82 4D 6F 4C 7E 83 4E 5D 84 5D 85 86 61 15 | =ab(?<=c+)d)ef/
082FF7C0 | 15 61 4D 6F 4C 7E 81 82 4D 83 4F 84 85 5D 86 5D |  /(?<=ab(c|de)f)
082FF7D0 | 87 61 15 __ __ __ __ __ __ __ __ __ __ __ __ __ | g/

Ze'ev Atlas


      From: Ze'ev Atlas <[email protected]>
 To: "[email protected]" <[email protected]> 
 Sent: Thursday, May 28, 2015 3:31 PM
 Subject: Re: PCRE on EBCDIC tests
   
\n is definitely not defined correctly.
I ran 4 tests.  Note that newline is defined correctly as 21 = 0x15 in EBCDIC.

/(?<=foo\n)¬bar/Im
Capturing subpattern count = 0
Max lookbehind = 4
Contains explicit CR or LF match
Options: multiline
No first char
Need char = 'r'
    foo\x0Fbarbar
No match
    ***Failers
No match
    rhubarb
No match
    barbell
No match
    abc\nbarton
No match

/¬(?<=foo\n)bar/Im
Capturing subpattern count = 0
Max lookbehind = 4
Contains explicit CR or LF match
Options: multiline
First char at start or follows newline
Need char = 'r'
    foo\x15barbar
No match
    ***Failers
No match
    rhubarb
No match
    barbell
No match
    abc\nbarton
No match

/(?<=foo\x0F)¬bar/Im
Capturing subpattern count = 0
Max lookbehind = 4
Options: multiline
No first char
Need char = 'r'
    foo\x0Fbarbar
No match
    ***Failers
No match
    rhubarb
No match
    barbell
No match
    abc\nbarton
No match

/¬(?<=foo\x15)bar/Im
Capturing subpattern count = 0
Max lookbehind = 4
Options: multiline
First char at start or follows newline
Need char = 'r'
    foo\x15barbar
 0: bar
    ***Failers
No match
    rhubarb
No match
    barbell
No match
    abc\nbarton
No match

In my config.h I have:

#ifndef NEWLINE
#define NEWLINE 21
#endif
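
To connect this setting to the -C output at the top of this message ("LF is 0x25" together with "Newline sequence is a non-standard value: 0x0015"), here is a small sketch of one way a NEWLINE value could be classified against a build's CR and LF code points. It is only my reading of that output, not pcretest's actual code; struct codepage and describe_newline are hypothetical names, and PCRE's special newline settings are not handled.

```c
#include <stdio.h>

/* Hypothetical description of a build's CR and LF code points;
   this is not a PCRE type. */
struct codepage {
    int cr;
    int lf;
};

/* Describe a config.h NEWLINE value relative to the build's CR and LF.
   Anything that is not CR, LF, or the two-byte CR LF pair is reported
   as non-standard. Special settings are outside this sketch. */
void describe_newline(char *dst, size_t dstsz, int newline, struct codepage cp)
{
    if (newline == cp.cr)
        snprintf(dst, dstsz, "CR");
    else if (newline == cp.lf)
        snprintf(dst, dstsz, "LF");
    else if (newline == ((cp.cr << 8) | cp.lf))
        snprintf(dst, dstsz, "CRLF");
    else
        snprintf(dst, dstsz, "a non-standard value: 0x%04x",
                 (unsigned)newline);
}
```

With cr = 0x0D and lf = 0x25, as the -C output reports for this build, NEWLINE 21 comes out as "a non-standard value: 0x0015"; with lf = 0x15 the same value would be described as "LF".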


  
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
