Hi
Thanks Philip

>I see from the pcretest.c file that
>an output file is opened with mode "wb", i.e. "binary". Maybe if you
>change this to just "w" it might behave differently. The output mode is
>held in a macro called OUTPUT_MODE, defined separately for Windows and
>non-Windows (though the same value is used for both for OUTPUT_MODE).
>There is also INPUT_MODE, which is set non-binary for Windows and binary
>for everything else. 


I changed both INPUT_MODE and OUTPUT_MODE under the NATIVE_ZOS option (the option
is in the z/OS config.h; the macros are in pcretest.c [or TESTD in my library])
to plain 'r' and 'w' respectively. Now the ["incorrect"] results are printed
correctly.

Here are some preliminary results for a small part of input1.txt that
demonstrate what I would have to deal with.

As I'd suspected, the /i modifier does not work correctly under EBCDIC. I will
have to investigate how the EBCDIC tables are set up and how the logic
identifies upper-case/lower-case pairs. In particular, I will check whether the
logic knows how to work with a non-contiguous encoding, and whether it knows
that x'C1' ('A') corresponds to x'81' ('a'), and so on.

The pattern /abcd\t\n\r\f\a\e\071\x3b\$\\\?caxyz/ correctly does NOT MATCH
abcd\t\n\r\f\a\e9;\$\\?caxyz, because octal 071 IS NOT the numeral 9 in EBCDIC
(x'F9' is), so I have to say that 'pcre' gets this right, even though the result
is obviously different from the one in the ASCII world.

As I have noted before, we have to use the character '¬' (x'5F') instead of
'^' (x'B0').

The pattern /¬(b+|a){1,2}?bc/ does not match bbc, which is one of many things
I'll have to investigate.

Similarly, the pattern /¬(b*|ba){1,2}?bc/ does not match bbabc, although it
does match babc, giving
 0: babc
 1: ba
like the ASCII version. There are similar results for similar patterns later.

The pattern /¬\ca\cA\c[\c{\c:/ does not match \x01\x01\e;z. This is somewhat
surprising because the shortcuts should be correct... I'll have to review the
EBCDIC tables, what exactly \c[, \c{ and \c: mean in EBCDIC, and how pcre
interprets them. I suspect that they conform to the old 037 rather than 1047,
but even that is not a good enough explanation.

Besides resolving these issues and others that will come to light, I will have
to develop EBCDIC-specific tests.

BTW, x'FF' is EO (Eight Ones) in EBCDIC, and thus it correctly, if abruptly,
stops the FTP transfer.

 
Ze'ev Atlas
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
