[pcre-dev] [Bug 791] UTF-8 support does not work on EBCDIC platforms

Martin Jerabek Fri, 19 Dec 2008 03:41:56 -0800

http://bugs.exim.org/show_bug.cgi?id=791




--- Comment #4 from Martin Jerabek <[email protected]>  
2008-12-17 13:20:19 ---
On 17.12.2008 12:03, Philip Hazel wrote:
> One other thought struck me: have you considered compiling two different 
> versions of PCRE? One would be for use in the EBCDIC case, the other for 
> use in the UTF-8 case.
This was my first thought, because for our purposes we only need a UTF-8 
version of PCRE, but trying to be a good open-source citizen, I intended 
to strive for a general solution. If it is acceptable to you, I will 
modify the sources so that all character constants are replaced with 
macros, which are defined either as normal literals (e.g. '*') or as 
UTF-8 literals (e.g. '\x2A'), depending on --enable-utf8:

- If --enable-utf8 (or --enable-unicode-properties) is *not* passed to 
configure, the macros evaluate to normal character literals just as they 
are used now. This is correct for ASCII and EBCDIC platforms and 
independent of --enable-ebcdic. On every platform, the "native" code 
page is used. On EBCDIC platforms, --enable-ebcdic would still have to 
be passed to configure.

- If --enable-utf8 is passed, the macros evaluate to ASCII/UTF-8 codes 
such as '\x2A' for asterisk on all platforms. This works on both 
non-EBCDIC platforms and on EBCDIC platforms in UTF-8 mode. If 
--enable-ebcdic is also passed, a warning is issued saying that the 
resulting PCRE library will *only* support UTF-8 and not EBCDIC. 
Alternatively configure could return an error in this case to make sure 
that the warning is not overlooked, and we could introduce a new option 
like --enable-utf8-ebcdic to compile a UTF-8-only PCRE library on EBCDIC 
platforms. In this case all appropriate functions would return an error 
if PCRE_UTF8 is not passed to them.
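
The two cases above could be sketched roughly as follows. This is only an 
illustration: the macro names (CHAR_ASTERISK and so on) and the 
SUPPORT_UTF8 symbol are assumptions standing in for whatever the sources 
and configure would actually use, and is_simple_quantifier() is a made-up 
helper that shows code comparing against the macros instead of against 
bare literals.

```c
/* Hypothetical sketch: SUPPORT_UTF8 stands in for the symbol that
   configure would define when --enable-utf8 is given. */
#ifdef SUPPORT_UTF8
/* UTF-8 build: fixed ASCII/UTF-8 code points, independent of the
   compiler's native code page. */
#define CHAR_ASTERISK       '\x2A'
#define CHAR_PLUS           '\x2B'
#define CHAR_QUESTION_MARK  '\x3F'
#else
/* Native build: ordinary literals, so the compiler's own code page
   (ASCII or EBCDIC) decides the values. */
#define CHAR_ASTERISK       '*'
#define CHAR_PLUS           '+'
#define CHAR_QUESTION_MARK  '?'
#endif

/* Hypothetical helper: returns 1 if c is one of the single-character
   quantifiers, 0 otherwise. It never mentions a bare literal. */
static int is_simple_quantifier(unsigned char c)
{
  return c == CHAR_ASTERISK || c == CHAR_PLUS || c == CHAR_QUESTION_MARK;
}
```

On an ASCII platform both branches yield the same values. The choice only 
matters on an EBCDIC platform, where '*' is 0x5C rather than 0x2A.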

> It would be easy to rename all the functions by 
> defining some macros, so that you could link both versions into the same 
> product.
Do you really mean to replace *all* PCRE function names with macros 
which would then be defined, e.g., with and without _ebcdic 
appended? I think this is rather messy but necessary if someone wanted 
to use both an EBCDIC-only and a UTF-8-only PCRE library in the same 
process. I got bitten too often by violations of the One Definition Rule 
to risk having symbols with identical names in the same process, because 
the run-time linker will probably choose the wrong one. I would rather 
avoid that and put the burden of implementing this on the poor soul who 
really needs this. ;-)
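
As a rough illustration of what such renaming might look like, assuming a 
build-time symbol BUILD_EBCDIC_VARIANT and a wrapper macro VARIANT() 
(both hypothetical, as is the demo_variant_name function, which is not a 
real PCRE entry point):

```c
/* Hypothetical: BUILD_EBCDIC_VARIANT would be defined by the build
   system when compiling the EBCDIC-only copy of the library. */
#ifdef BUILD_EBCDIC_VARIANT
#define VARIANT_SUFFIX _ebcdic
#else
#define VARIANT_SUFFIX _utf8
#endif

/* Two-level expansion so that VARIANT_SUFFIX is expanded before the
   token pasting takes place. */
#define PASTE2(a, b) a##b
#define PASTE(a, b) PASTE2(a, b)
#define VARIANT(name) PASTE(name, VARIANT_SUFFIX)

/* Stand-in for a library function. After preprocessing, the symbol is
   demo_variant_name_utf8 or demo_variant_name_ebcdic, so both copies
   can be linked into one process without a name clash. */
const char *VARIANT(demo_variant_name)(void)
{
#ifdef BUILD_EBCDIC_VARIANT
  return "ebcdic";
#else
  return "utf8";
#endif
}
```

Every exported name would need the same treatment, which is the messy 
part.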

Many thanks for your help
Martin


-- 
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
